
Thursday, June 26, 2025

Analytical engine

From Wikipedia, the free encyclopedia
Portion of the calculating machine with a printing mechanism of the analytical engine, built by Charles Babbage, as displayed at the Science Museum (London)

The analytical engine was a proposed digital mechanical general-purpose computer designed by English mathematician and computer pioneer Charles Babbage. It was first described in 1837 as the successor to Babbage's difference engine, which was a design for a simpler mechanical calculator.

The analytical engine incorporated an arithmetic logic unit, control flow in the form of conditional branching and loops, and integrated memory, making it the first design for a general-purpose computer that could be described in modern terms as Turing-complete. In other words, the structure of the analytical engine was essentially the same as that which has dominated computer design in the electronic era. The analytical engine is one of the most successful achievements of Charles Babbage.

Babbage was never able to complete construction of any of his machines due to conflicts with his chief engineer and inadequate funding. It was not until 1941 that Konrad Zuse built the first general-purpose computer, Z3, more than a century after Babbage had proposed the pioneering analytical engine in 1837.

Design

Two types of punched cards used to program the machine. Foreground: 'operational cards', for inputting instructions; background: 'variable cards', for inputting data

Babbage's first attempt at a mechanical computing device, the difference engine, was a special-purpose machine designed to tabulate logarithms and trigonometric functions by evaluating finite differences to create approximating polynomials. Construction of this machine was never completed; Babbage had conflicts with his chief engineer, Joseph Clement, and ultimately the British government withdrew its funding for the project.

During this project, Babbage realised that a much more general design, the analytical engine, was possible. The work on the design of the analytical engine started around 1833.

The input, consisting of programs ("formulae") and data, was to be provided to the machine via punched cards, a method being used at the time to direct mechanical looms such as the Jacquard loom. For output, the machine would have a printer, a curve plotter, and a bell. The machine would also be able to punch numbers onto cards to be read in later. It employed ordinary base-10 fixed-point arithmetic.

There was to be a store (that is, a memory) capable of holding 1,000 numbers of 40 decimal digits each (ca. 16.6 kB). An arithmetic unit (the "mill") would be able to perform all four arithmetic operations, plus comparisons and optionally square roots. Initially (1838) it was conceived as a difference engine curved back upon itself, in a generally circular layout, with the long store exiting off to one side. Later drawings (1858) depict a regularised grid layout. Like the central processing unit (CPU) in a modern computer, the mill would rely upon its own internal procedures, roughly equivalent to microcode in modern CPUs, to be stored in the form of pegs inserted into rotating drums called "barrels", to carry out some of the more complex instructions the user's program might specify.
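
That storage estimate is easy to verify (a back-of-the-envelope sketch in Python; the figures come from the paragraph above, not from anything Babbage specified):

import math

# 1,000 numbers of 40 decimal digits each
digits = 1000 * 40
# each decimal digit carries log2(10) ~ 3.32 bits of information
bits = digits * math.log2(10)
print(f"{bits / 8 / 1000:.1f} kB")  # prints 16.6 kB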

The programming language to be employed by users was akin to modern day assembly languages. Loops and conditional branching were possible, and so the language as conceived would have been Turing-complete as later defined by Alan Turing. Three different types of punch cards were used: one for arithmetical operations, one for numerical constants, and one for load and store operations, transferring numbers from the store to the arithmetical unit or back. There were three separate readers for the three types of cards. Babbage developed some two dozen programs for the analytical engine between 1837 and 1840, and one program later. These programs treat polynomials, iterative formulas, Gaussian elimination, and Bernoulli numbers.

In 1842, the Italian mathematician Luigi Federico Menabrea published a description of the engine in French, based on lectures Babbage gave when he visited Turin in 1840. In 1843, the description was translated into English and extensively annotated by Ada Lovelace, who had become interested in the engine eight years earlier. In recognition of her additions to Menabrea's paper, which included a way to calculate Bernoulli numbers using the machine (widely considered to be the first complete computer program), she has been described by many as the first computer programmer, although others have challenged this view.

Construction

Late in his life, Babbage sought ways to build a simplified version of the machine, and assembled a small part of it before his death in 1871.

In 1878, a committee of the British Association for the Advancement of Science described the analytical engine as "a marvel of mechanical ingenuity", but recommended against constructing it. The committee acknowledged the usefulness and value of the machine, but could not estimate the cost of building it, and were unsure whether the machine would function correctly after being built.

Henry Babbage's analytical engine mill, built in 1910, in the Science Museum (London)

Intermittently from 1880 to 1910, Babbage's son Henry Prevost Babbage was constructing a part of the mill and the printing apparatus. In 1910, it was able to calculate a (faulty) list of multiples of pi. This constituted only a small part of the whole engine; it was not programmable and had no storage. (Popular images of this section have sometimes been mislabelled, implying that it was the entire mill or even the entire engine.) Henry Babbage's "analytical engine mill" is on display at the Science Museum in London. Henry also proposed building a demonstration version of the full engine, with a smaller storage capacity: "perhaps for a first machine ten (columns) would do, with fifteen wheels in each". Such a version could manipulate 20 numbers of 25 digits each, and what it could be told to do with those numbers could still be impressive. "It is only a question of cards and time", wrote Henry Babbage in 1888, "... and there is no reason why (twenty thousand) cards should not be used if necessary, in an analytical engine for the purposes of the mathematician".

In 1991, the London Science Museum built a complete and working specimen of Babbage's Difference Engine No. 2, a design that incorporated refinements Babbage discovered during the development of the analytical engine. This machine was built using materials and engineering tolerances that would have been available to Babbage, quelling the suggestion that Babbage's designs could not have been produced using the manufacturing technology of his time.

In October 2010, John Graham-Cumming started a "Plan 28" campaign to raise funds by "public subscription" to enable serious historical and academic study of Babbage's plans, with a view to building and testing a fully working virtual design, which would in turn enable construction of the physical analytical engine. As of May 2016, actual construction had not been attempted, since no consistent understanding could yet be obtained from Babbage's original design drawings. In particular, it was unclear whether the design could handle the indexed variables required for Lovelace's Bernoulli program. By 2017, the "Plan 28" effort reported that a searchable database of all catalogued material was available, and that an initial review of Babbage's voluminous Scribbling Books had been completed.

Many of Babbage's original drawings have been digitised and are publicly available online.

Instruction set

Plan diagram of the analytical engine from 1840

Babbage is not known to have written down an explicit set of instructions for the engine in the manner of a modern processor manual. Instead he showed his programs as lists of states during their execution, showing what operator was run at each step with little indication of how the control flow would be guided.

Allan G. Bromley has assumed that the card deck could be read in forwards and backwards directions as a function of conditional branching after testing for conditions, which would make the engine Turing-complete:

...the cards could be ordered to move forward and reverse (and hence to loop)...

The introduction for the first time, in 1845, of user operations for a variety of service functions including, most importantly, an effective system for user control of looping in user programs. There is no indication how the direction of turning of the operation and variable cards is specified. In the absence of other evidence I have had to adopt the minimal default assumption that both the operation and variable cards can only be turned backward as is necessary to implement the loops used in Babbage's sample programs. There would be no mechanical or microprogramming difficulty in placing the direction of motion under the control of the user.

In their emulator of the engine, Fourmilab say:

The Engine's Card Reader is not constrained to simply process the cards in a chain one after another from start to finish. It can, in addition, directed by the very cards it reads and advised by whether the Mill's run-up lever is activated, either advance the card chain forward, skipping the intervening cards, or backward, causing previously-read cards to be processed once again.

This emulator does provide a written symbolic instruction set, though this has been constructed by its authors rather than based on Babbage's original works. For example, a factorial program would be written as:

N0 6
N1 1
N2 1
×
L1
L0
S1
–
L0
L2
S0
L2
L0
CB?11

where the CB is the conditional branch instruction or "combination card" used to make the control flow jump, in this case backward by 11 cards.
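
The effect of those fourteen cards can be captured in a few lines of Python (a sketch of my own, not Fourmilab's emulator; the run-up lever is modelled as the mill signalling a negative result):

# V0 holds the counter, V1 the running product, V2 the constant 1
def factorial_cards(n=6):
    v0, v1, v2 = n, 1, 1        # N0 6, N1 1, N2 1
    while True:
        v1 = v1 * v0            # multiply: L1, L0, S1
        v0 = v0 - v2            # subtract: L0, L2, S0
        run_up = (v2 - v0) < 0  # L2, L0: negative result sets the lever
        if not run_up:          # CB?11: branch back 11 cards while set
            return v1

print(factorial_cards(6))       # 720

When V0 has counted down to 1, the final subtraction V2 − V0 is no longer negative, the lever stays down, and execution falls through with 6! = 720 in V1.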

Influence

Predicted influence

Babbage understood that the existence of an automatic computer would kindle interest in the field now known as algorithmic efficiency, writing in his Passages from the Life of a Philosopher, "As soon as an analytical engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will then arise—By what course of calculation can these results be arrived at by the machine in the shortest time?"

Computer science

From 1872, Henry Babbage continued diligently with his father's work, and then intermittently after his retirement in 1875.

Percy Ludgate published his own design for an analytical engine in 1909 and wrote about Babbage's engine in 1914. Ludgate's design was drawn up in detail but never built, and the drawings have never been found. His engine would have been much smaller than Babbage's (about 230 L (8.1 cu ft)) and hypothetically capable of multiplying two 20-decimal-digit numbers in about six seconds.

In his work Essays on Automatics (1914), Leonardo Torres Quevedo, inspired by Babbage, designed a theoretical electromechanical calculating machine which was to be controlled by a read-only program. The paper also contains the idea of floating-point arithmetic. In 1920, to celebrate the 100th anniversary of the invention of the arithmometer, Torres presented in Paris the electromechanical arithmometer, which consisted of an arithmetic unit connected to a (possibly remote) typewriter, on which commands could be typed and the results printed automatically.

Vannevar Bush's paper Instrumental Analysis (1936) included several references to Babbage's work. In the same year he started the Rapid Arithmetical Machine project to investigate the problems of constructing an electronic digital computer.

Despite this groundwork, Babbage's work fell into historical obscurity, and the analytical engine was unknown to the builders of the electromechanical and electronic computing machines of the 1930s and 1940s when they began their work, so many of the architectural innovations Babbage had proposed had to be re-invented. Howard Aiken, who built the quickly obsolete electromechanical calculator the Harvard Mark I between 1937 and 1945, praised Babbage's work, likely as a way of enhancing his own stature, but knew nothing of the analytical engine's architecture during the construction of the Mark I, and considered his visit to the constructed portion of the analytical engine "the greatest disappointment of my life". The Mark I showed no influence from the analytical engine and lacked its most prescient architectural feature, conditional branching. J. Presper Eckert and John W. Mauchly similarly were not aware of the details of Babbage's analytical engine work prior to the completion of their design for the first electronic general-purpose computer, the ENIAC.

Comparison to other early computers

If the analytical engine had been built, it would have been digital, programmable and Turing-complete. It would, however, have been very slow. Luigi Federico Menabrea reported in Sketch of the Analytical Engine: "Mr. Babbage believes he can, by his engine, form the product of two numbers, each containing twenty figures, in three minutes". By comparison, the Harvard Mark I could perform the same task in just six seconds (though it is debatable whether that computer was Turing-complete; the ENIAC, which was, would also have been faster). A modern CPU could do the same thing in under a billionth of a second.
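
Working those figures through (a trivial check, using only the numbers quoted above):

engine, mark1, modern = 180.0, 6.0, 1e-9  # seconds per 20-digit multiplication
print(engine / mark1)            # 30.0: the Mark I was about 30x faster
print(f"{engine / modern:.1e}")  # 1.8e+11: a modern CPU is ~10^11x faster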

Name | First operational | Numeral system | Computing mechanism | Programming | Turing complete | Memory
Difference engine | Not built until the 1990s (design 1820s) | Decimal | Mechanical | Not programmable; initial numerical constants of polynomial differences set physically | No | Physical state of wheels in axes
Analytical engine | Not built (design 1830s) | Decimal | Mechanical | Program-controlled by punched cards | Yes (design; never built) | Physical state of wheels in axes
Ludgate's analytical engine | Not built (design 1909) | Decimal | Mechanical | Program-controlled by punched cards | Yes (not built) | Physical state of rods
Torres' analytical machine | 1920 | Decimal | Electro-mechanical | Not programmable; input and output settings specified by patch cables | No | Mechanical relays
Zuse Z1 (Germany) | 1939 | Binary floating point | Mechanical | Program-controlled by punched 35 mm film stock (no conditional branch) | No | Physical state of rods
Bombe (Poland, UK, US) | 1939 (Polish), March 1940 (British), May 1943 (US) | Character computations | Electro-mechanical | Not programmable; cipher input settings specified by patch cables | No | Physical state of rotors
Zuse Z2 (Germany) | 1940 | Binary fixed point | Electro-mechanical (mechanical memory) | Program-controlled by punched 35 mm film stock (no conditional branch) | No | Physical state of rods
Zuse Z3 (Germany) | May 1941 | Binary floating point | Electro-mechanical | Program-controlled by punched 35 mm film stock (no conditional branch) | In theory (1998) | Mechanical relays
Atanasoff–Berry computer (US) | 1942 | Binary | Electronic | Not programmable; linear system coefficients input using punched cards | No | Regenerative capacitor memory
Colossus Mark 1 (UK) | December 1943 | Binary | Electronic | Program-controlled by patch cables and switches | No | Thermionic valves (vacuum tubes) and thyratrons
Harvard Mark I – IBM ASCC (US) | May 1944 | Decimal | Electro-mechanical | Program-controlled by 24-channel punched paper tape (no conditional branch) | Debatable | Mechanical relays
Colossus Mark 2 (UK) | 1 June 1944 | Binary | Electronic | Program-controlled by patch cables and switches | Conjectured | Thermionic valves (vacuum tubes) and thyratrons
Zuse Z4 (Germany) | March 1945 (or 1948) | Binary floating point | Electro-mechanical | Program-controlled by punched 35 mm film stock | In 1950 | Mechanical relays
ENIAC (US) | December 1945 | Decimal | Electronic | Program-controlled by patch cables and switches | Yes | Vacuum tube triode flip-flops
Manchester Baby (UK) | June 1948 | Binary | Electronic | Binary program entered into memory by keyboard (first electronic stored-program digital computer) | Yes | Williams cathode ray tube
EDSAC (UK) | May 1949 | Binary | Electronic | Five-bit opcode and variable-length operand (first stored-program computer offering computing services to a wide community) | Yes | Mercury delay lines

In popular culture
  • The cyberpunk novelists William Gibson and Bruce Sterling co-authored a steampunk novel of alternative history titled The Difference Engine in which Babbage's difference and analytical engines became available to Victorian society. The novel explores the consequences and implications of the early introduction of computational technology.
  • Moriarty by Modem, a short story by Jack Nimersheim, describes an alternative history where Babbage's analytical engine was indeed completed and had been deemed highly classified by the British government. The characters of Sherlock Holmes and Moriarty had in reality been a set of prototype programs written for the analytical engine. This short story follows Holmes as his program is implemented on modern computers and he is forced to compete against his nemesis yet again in the modern counterparts of Babbage's analytical engine.
  • A similar setting to The Difference Engine is used by Sydney Padua in the webcomic The Thrilling Adventures of Lovelace and Babbage. It features an alternative history where Ada Lovelace and Babbage have built the analytical engine and use it to fight crime at Queen Victoria's request. The comic is based on thorough research on the biographies of and correspondence between Babbage and Lovelace, which is then twisted for humorous effect.
  • The Orion's Arm online project features the Machina Babbagenseii, fully sentient Babbage-inspired mechanical computers. Each is the size of a large asteroid, only capable of surviving in microgravity conditions, and processes data at 0.5% the speed of a human brain.
  • Charles Babbage and Ada Lovelace appear in an episode of Doctor Who, "Spyfall Part 2", where the engine is displayed and referenced.

WorldWide Telescope

From Wikipedia, the free encyclopedia
 
WorldWide Telescope
Original author(s): Jonathan Fay, Curtis Wong
Developer(s): Microsoft Research, .NET Foundation, American Astronomical Society
Initial release: February 27, 2008
Stable release: 6.1.2.0 / July 12, 2022
Repository: (on GitHub)
Written in: C#
Operating system: Microsoft Windows; web app version available
Platform: .NET Framework, Web platform
Available in: English, Chinese, Spanish, German, Russian, Hindi
Type: Visualization software
License: MIT License
Website: worldwidetelescope.org

WorldWide Telescope (WWT) is an open-source set of applications, data and cloud services, originally created by Microsoft Research but now an open source project hosted on GitHub. The .NET Foundation holds the copyright and the project is managed by the American Astronomical Society and has been supported by grants from the Moore Foundation and National Science Foundation. WWT displays astronomical, earth and planetary data allowing visual navigation through the 3-dimensional (3D) Universe. Users are able to navigate the sky by panning and zooming, or explore the 3D universe from the surface of Earth to past the Cosmic microwave background (CMB), viewing both visual imagery and scientific data (academic papers, etc.) about that area and the objects in it. Data is curated from hundreds of different data sources, but its open data nature allows users to explore any third party data that conforms to a WWT supported format. With the rich source of multi-spectral all-sky images it is possible to view the sky in many wavelengths of light. The software utilizes Microsoft's Visual Experience Engine technologies to function. WWT can also be used to visualize arbitrary or abstract data sets and time series data.

WWT is completely free and currently comes in two versions: a native application that runs under Microsoft Windows (this version can use the specialized capabilities of a computer graphics card to render up to a half million data points), and a web client based on HTML5 and WebGL. The web client uses a responsive design which allows people to use it on smartphones and on desktops. The Windows desktop application is a high-performance system which scales from a desktop to large multi-channel full dome digital planetariums.

The WWT project began in 2002 at Microsoft Research and Johns Hopkins University. Database researcher Jim Gray had developed a satellite Earth-images database (Terraserver) and wanted to apply a similar technique to organizing the many disparate astronomical databases of sky images. WWT was announced at the TED Conference in Monterey, California in February 2008. As of 2016, WWT had been downloaded by at least 10 million active users.

As of February 2012, the earth science applications of WWT are showcased and supported by the Layerscape community collaboration website, also created by Microsoft Research. Since WWT went open source, Layerscape communities have been brought into the WWT application and re-branded simply as "communities".

Features

Modes

WorldWide Telescope has six main modes. These are Sky, Earth, Planets, Panoramas, Solar System and Sandbox.

Earth

Earth mode allows users to view a 3D model of the Earth, similar to NASA World Wind, Microsoft Virtual Earth and Google Earth. The Earth mode has a default data set with near global coverage and resolution down to sub-meter in high-population centers. Unlike most Earth viewers, WorldWide Telescope supports many different map projections including Mercator, Equirectangular and Tessellated Octahedral Adaptive Subdivision Transform (TOAST). There are also map layers for seasonal, night, streets, hybrid and science oriented Moderate-Resolution Imaging Spectroradiometer (MODIS) imagery. The new layer manager can be used to add data visualization on the Earth or other planets.

Planets

Planets mode currently allows users to view 3D models of eight celestial bodies: Venus, Mars, Jupiter, the Galilean moons of Jupiter, and Earth's Moon. It also allows users to view a Mandelbrot set.

Sky

Sky mode is the main feature of the software. It allows users to view high-quality images of outer space from many space- and Earth-based telescopes. Each image is shown at its actual position in the sky. There are over 200 full-sky images in spectral bands ranging from radio to gamma rays. There are also thousands of individual study images of various astronomical objects from space telescopes such as the Hubble Space Telescope, the Spitzer Space Telescope in infrared, the Chandra X-ray Observatory, COBE, WMAP, ROSAT, IRAS, and GALEX, as well as many other space- and ground-based telescopes. Sky mode also shows the Sun, Moon, planets, and their moons in their current positions.

Users can add their own image data from FITS files, or convert them to standard image formats such as JPEG, PNG, or TIFF. These images can be tagged with Astronomy Visualization Metadata (AVM).
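
For instance, converting a FITS image to PNG takes only a few lines with standard Python astronomy tools (a sketch using astropy and matplotlib; the file names are placeholders):

from astropy.io import fits
import matplotlib.pyplot as plt
import numpy as np

data = fits.getdata("m31.fits")   # hypothetical input image
data = np.nan_to_num(data)        # blank FITS pixels arrive as NaN
plt.imsave("m31.png", data, cmap="gray", origin="lower")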

Panoramas

The Panorama mode allows users to view several panoramas from remote robotic rovers, including the Curiosity rover and the Mars Exploration Rovers, as well as panoramas taken by Apollo program astronauts.

Users can include their own panoramas, created with gigapixel panorama tools such as HDView, or with single-shot spherical cameras such as the Ricoh Theta.

Solar System

This mode displays the major Solar System objects from the Sun to Pluto, Jupiter's moons, the orbits of all Solar System moons, and all 550,000+ minor planets positioned with their correct scale, position and phase. The user can move forward and backward in time at various rates, or type in a time and date for which to view the positions of the planets, and can select the viewing location. The program can show the Solar System the way it would look from any location at any time between 1 AD and 4000 AD. Using this tool a user can watch an eclipse (e.g., the 2017 total solar eclipse), occultation, or astronomical alignment, and preview where the best spot might be to observe a future event. In this mode it is possible to zoom away from the Solar System, through the Milky Way, and out into the cosmos to see a hypothetical view of the entire known universe. Other bodies, spacecraft and orbital reference frames can be added and visualized in the Solar System mode using the layer manager.

Users can also query the Minor Planet Center for the orbits of minor bodies in the Solar System.

Sandbox

The Sandbox mode allows users to view arbitrary 3D models (in OBJ or 3DS format) in an empty universe. For instance, this is useful for exploring 3D objects such as molecular data.

Local user content

WorldWide Telescope was designed as a professional research environment and as such it facilitates viewing of user data. Virtually all of the data types and visualizations in WorldWide Telescope can be run using supplied user data either locally or over the network. Any of the above viewing modes allow the user to browse and load equirectangular, fisheye, or dome master images to be viewed as planet surfaces, sky images or panoramas. Images with Astronomy Visualization Metadata (AVM) can be loaded and registered to their location in the sky. Images without AVM can be shown on the sky but the user must align the images in the sky by moving, scaling and rotating the images until star patterns align. Once the images are aligned they can be saved to collections for later viewing and sharing. The layer manager can be used to add vector or image data to planet surfaces or in orbit.

Layer Manager

Introduced in the Aphelion release, the Layer Manager manages relative reference frames, allowing data and images to be placed on Earth, the planets, moons, the sky, or anywhere else in the universe. Data can be loaded from files, linked live with Microsoft Excel, or pasted in from other applications. Layers support 3D points, Well-known text representation of geometry (WKT), shape files, 3D models, orbital elements, image layers, and more. Time series data can be viewed to see smoothly animated events over time. Reference frames can contain orbital information, allowing 3D models or other data to be plotted at their correct location over time.
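
A comparable workflow is available programmatically today through the AAS-maintained pywwt package; the sketch below (assuming pywwt's Jupyter widget and its add_table_layer API, and using a made-up two-row catalogue) plots points in the Sky reference frame:

from astropy.table import Table
from pywwt.jupyter import WWTJupyterWidget

wwt = WWTJupyterWidget()  # display this object in a notebook cell

# hypothetical catalogue with RA/Dec columns in degrees
tab = Table({'ra': [10.68, 83.82], 'dec': [41.27, -5.39]})
layer = wwt.layers.add_table_layer(table=tab, frame='Sky',
                                   lon_att='ra', lat_att='dec')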

Use for amateur astronomy

The program allows the selection of a telescope and camera and can preview the field of view against the sky. Using ASCOM the user can connect a computer-controlled telescope or an astronomical pointing device such as Meade's MySky, and then either control or follow it. The large selection of catalog objects and 1 arc-second-per-pixel imagery allow an astrophotographer to select and plan a photograph and find a suitable guide star using the multi-chip FOV indicator.

Tours

WorldWide Telescope contains a multimedia authoring environment that allows users or educators to create tours with a simple slide-based paradigm. The slides can have a begin and end camera position allowing for easy Ken Burns Effects. Pictures, objects, and text can be added to the slides, and tours can have both background music and voice-overs with separate volume control. The layer manager can be used in conjunction with a tour to publish user data visualizations with annotations and animations. One of the tours featured was made by a six-year-old boy, while other tours are made by astrophysicists such as Dr. Alyssa A. Goodman of the Center for Astrophysics | Harvard & Smithsonian and Dr. Robert L. Hurt of Caltech/JPL.

Communities

Communities are a way of allowing organizations and communities to add their own images, tours, catalogs and research materials to the WorldWide Telescope interface. The concept is similar to subscribing to an RSS feed, except the contents are astronomical metadata.

Virtual observatory

The WorldWide Telescope was designed to be the embodiment of a rich virtual observatory client envisioned by Turing Award winner Jim Gray and JHU astrophysicist and co-principal investigator for the US National Virtual Observatory, Alex Szalay in their paper titled "The WorldWide Telescope". The WorldWide Telescope program makes use of IVOA standards for inter-operating with data providers to provide its image, search and catalog data. Rather than concentrate all data into one database, the WorldWide Telescope sources its data from all over the web and the available content grows as more VO compliant data sources are placed on the web.

Full dome planetarium support

Visualization using WorldWide Telescope at the Hayden Planetarium

The WorldWide Telescope Windows client application supports both single and multichannel full-dome video projection, allowing it to power full-dome digital planetarium systems. It is currently installed in several world-class planetariums, where it runs on turn-key planetarium systems. It can also be used to create a stand-alone planetarium by using the included tools for calibration, alignment, and blending. This allows using consumer DLP projectors to create a projection system with resolution, performance and functionality comparable to high-end turnkey solutions, at a fraction of the cost. The University of Washington pioneered this approach with the UW Planetarium. WorldWide Telescope can also be used in single-channel mode from a laptop, using a mirror dome or fisheye projector to display on inflatable domes, or even on user-constructed low-cost planetariums for which plans are available on their website.

Reception

WorldWide Telescope was praised before its announcement in a post by blogger Robert Scoble, who said the demo had made him cry. He later called it "the most fabulous thing I’ve seen Microsoft do in years."

Dr. Roy Gould of the Center for Astrophysics | Harvard & Smithsonian said:

"The WorldWide Telescope takes the best images from the greatest telescopes on Earth ... and in space ... and assembles them into a seamless, holistic view of the universe. This new resource will change the way we do astronomy ... the way we teach astronomy ... and, most importantly, I think it's going to change the way we see ourselves in the universe,"..."The creators of the WorldWide Telescope have now given us a way to have a dialogue with our universe."

A PC World review of the original beta concluded that WorldWide Telescope "has a few shortcomings" but "is a phenomenal resource for enthusiasts, students, and teachers." It also believed the product to be "far beyond Google's current offerings."

Prior to the cross-platform web client release, at least one reviewer regretted the lack of support for non-Windows operating systems, the slow speed at which imagery loads, and the lack of KML support.

Awards

  • 365: AIGA Annual Design Competitions 29, experience design category
  • I.D. Magazine 2009 Annual Design Review, Best of Category: Interactive
Astroinformatics

From Wikipedia, the free encyclopedia
Hyperion proto-supercluster unveiled by measurements and examination of archive data

Astroinformatics is an interdisciplinary field of study involving the combination of astronomy, data science, machine learning, informatics, and information/communications technologies. The field is closely related to astrostatistics.

Data-driven astronomy (DDA) refers to the use of data science in astronomy. Several outputs of telescopic observations and sky surveys are taken into consideration, and approaches related to data mining and big data management are used to analyze, filter, and normalize the data sets, which are then used for classification, prediction, and anomaly detection by advanced statistical approaches, digital image processing, and machine learning. The output of these processes is used by astronomers and space scientists to study and identify patterns, anomalies, and movements in outer space, and to formulate theories and make discoveries about the cosmos.

Background

Astroinformatics is primarily focused on developing the tools, methods, and applications of computational science, data science, machine learning, and statistics for research and education in data-oriented astronomy. Early efforts in this direction included data discovery, metadata standards development, data modeling, astronomical data dictionary development, data access, information retrieval, data integration, and data mining in the astronomical Virtual Observatory initiatives. Further development of the field, along with astronomy community endorsement, was presented to the National Research Council (United States) in 2009 in the astroinformatics "state of the profession" position paper for the 2010 Astronomy and Astrophysics Decadal Survey. That position paper provided the basis for the subsequent, more detailed exposition of the field in the Informatics Journal paper Astroinformatics: Data-Oriented Astronomy Research and Education.

Astroinformatics as a distinct field of research was inspired by work in the fields of Geoinformatics, Cheminformatics, Bioinformatics, and through the eScience work of Jim Gray (computer scientist) at Microsoft Research, whose legacy was remembered and continued through the Jim Gray eScience Awards.

Although the primary focus of astroinformatics is on the large worldwide distributed collection of digital astronomical databases, image archives, and research tools, the field recognizes the importance of legacy data sets as well, using modern technologies to preserve and analyze historical astronomical observations. Some astroinformatics practitioners help to digitize historical and recent astronomical observations and images in a large database for efficient retrieval through web-based interfaces. Another aim is to help develop new methods and software for astronomers, as well as to help facilitate the process and analysis of the rapidly growing amount of data in the field of astronomy.

Astroinformatics is described as the "fourth paradigm" of astronomical research. There are many research areas involved with astroinformatics, such as data mining, machine learning, statistics, visualization, scientific data management, and semantic science. Data mining and machine learning play significant roles in astroinformatics as a scientific research discipline due to their focus on "knowledge discovery from data" (KDD) and "learning from data".

The amount of data collected from astronomical sky surveys has grown from gigabytes to terabytes throughout the past decade and is predicted to grow in the next decade into hundreds of petabytes with the Large Synoptic Survey Telescope and into the exabytes with the Square Kilometre Array. This plethora of new data both enables and challenges effective astronomical research, so new approaches are required. In part due to this, data-driven science is becoming a recognized academic discipline. Consequently, astronomy (and other scientific disciplines) are developing information-intensive and data-intensive sub-disciplines to an extent that these sub-disciplines are now becoming (or have already become) standalone research disciplines and full-fledged academic programs. While many institutes of education do not boast an astroinformatics program, such programs most likely will be developed in the near future.

Informatics has recently been defined as "the use of digital data, information, and related services for research and knowledge generation". However, the more commonly used definition is "informatics is the discipline of organizing, accessing, integrating, and mining data from multiple sources for discovery and decision support." Therefore, the discipline of astroinformatics includes many naturally related specialties, including data modeling, data organization, and more. It may also include transformation and normalization methods for data integration and information visualization, as well as knowledge extraction, indexing techniques, information retrieval, and data mining methods. Classification schemes (e.g., taxonomies, ontologies, folksonomies, and/or collaborative tagging) plus astrostatistics will also be heavily involved. Citizen science projects (such as Galaxy Zoo) also contribute highly valued novelty discovery, feature meta-tagging, and object characterization within large astronomy data sets. All of these specialties enable scientific discovery across varied massive data collections, collaborative research, and data re-use, in both research and learning environments.

In 2007, the Galaxy Zoo project was launched for the morphological classification of a large number of galaxies. In this project, 900,000 images taken by the Sloan Digital Sky Survey (SDSS) over the preceding seven years were considered for classification. The task was to study each picture of a galaxy, classify it as elliptical or spiral, and determine whether it was spinning or not. The team of astrophysicists led by Kevin Schawinski at Oxford University was in charge of this project, and Schawinski and his colleague Chris Lintott estimated that it would take such a team 3–5 years to complete the work. There they came up with the idea of using machine learning and data science techniques for analyzing and classifying the images.

In 2012, two position papers were presented to the Council of the American Astronomical Society that led to the establishment of formal working groups in astroinformatics and astrostatistics for the profession of astronomy within the US and elsewhere.

Astroinformatics provides a natural context for the integration of education and research. The experience of research can now be implemented within the classroom to establish and grow data literacy through the easy re-use of data. It also has many other uses, such as repurposing archival data for new projects, literature-data links, intelligent retrieval of information, and many others.

Methodology

The data retrieved from sky surveys are first subjected to preprocessing, in which redundancies are removed and the data are filtered. Feature extraction is then performed on the filtered data set, and the extracted features are passed on to the subsequent analyses. These data come from a number of renowned sky surveys.

The data volumes of such sky surveys range from 3 TB to almost 4.6 EB. The data mining tasks involved in managing and manipulating the data include classification, regression, clustering, anomaly detection, and time-series analysis, with several approaches and applications for each of these methods.

Classification

Classification is used for specific identifications and categorizations of astronomical data, such as spectral classification, photometric classification, morphological classification, and classification of solar activity. Supervised machine-learning classifiers are typical of the approaches applied; a toy illustration is sketched below.
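
A minimal sketch of such a classifier (invented colour-index data, not a real survey catalogue; scikit-learn is assumed to be installed):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(0, 1, size=(n, 2))   # hypothetical u-g and g-r colours
# synthetic labels: one class is slightly bluer on average
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")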

Regression

Regression is used to make predictions from the retrieved data through statistical trends and statistical modeling; applications include estimating photometric redshifts and measuring the physical parameters of stars.

Clustering

Clustering classifies objects based on a similarity measure; in astronomy it is used both for classification and for the detection of special or rare objects.

Anomaly detection

Anomaly detection is used for detecting irregularities in a data set; here the technique is applied to detect rare or special objects.

Time-series analysis

Time-series analysis helps in analyzing trends and predicting outputs over time; it is used for trend prediction and for novelty detection (the detection of previously unknown data or phenomena).
