
Thursday, June 26, 2025

WorldWide Telescope

From Wikipedia, the free encyclopedia
 
WorldWide Telescope
Original author(s): Jonathan Fay, Curtis Wong
Developer(s): Microsoft Research, .NET Foundation, American Astronomical Society
Initial release: February 27, 2008
Stable release: 6.1.2.0 (July 12, 2022)
Written in: C#
Operating system: Microsoft Windows; web app version available
Platform: .NET Framework, Web platform
Available in: English, Chinese, Spanish, German, Russian, Hindi
Type: Visualization software
License: MIT License
Website: worldwidetelescope.org

WorldWide Telescope (WWT) is an open-source set of applications, data, and cloud services, originally created by Microsoft Research and now hosted on GitHub. The .NET Foundation holds the copyright, the project is managed by the American Astronomical Society, and it has been supported by grants from the Moore Foundation and the National Science Foundation. WWT displays astronomical, Earth, and planetary data, allowing visual navigation through the three-dimensional (3D) Universe. Users can navigate the sky by panning and zooming, or explore the 3D universe from the surface of Earth out past the cosmic microwave background (CMB), viewing both visual imagery and scientific data (academic papers, etc.) about an area and the objects in it. Data is curated from hundreds of different sources, but WWT's open data nature allows users to explore any third-party data that conforms to a supported format. With its rich source of multi-spectral all-sky images, it is possible to view the sky in many wavelengths of light. The software uses Microsoft's Visual Experience Engine technologies. WWT can also be used to visualize arbitrary or abstract data sets and time series data.

WWT is completely free and currently comes in two versions: a native application that runs under Microsoft Windows (this version can use the specialized capabilities of a computer graphics card to render up to a half million data points), and a web client based on HTML5 and WebGL. The web client uses a responsive design which allows people to use it on smartphones and on desktops. The Windows desktop application is a high-performance system which scales from a desktop to large multi-channel full dome digital planetariums.

The WWT project began in 2002 at Microsoft Research and Johns Hopkins University. Database researcher Jim Gray had developed a satellite Earth-imagery database (TerraServer) and wanted to apply a similar technique to organizing the many disparate astronomical databases of sky images. WWT was announced at the TED Conference in Monterey, California, in February 2008. As of 2016, WWT had been downloaded by at least 10 million active users.

As of February 2012, the earth science applications of WWT were showcased and supported by the Layerscape community collaboration website, also created by Microsoft Research. Since WWT went open source, Layerscape communities have been brought into the WWT application and re-branded simply as "Communities".

Features

Modes

WorldWide Telescope has six main modes. These are Sky, Earth, Planets, Panoramas, Solar System and Sandbox.

Earth

Earth mode allows users to view a 3D model of the Earth, similar to NASA World Wind, Microsoft Virtual Earth and Google Earth. The Earth mode has a default data set with near global coverage and resolution down to sub-meter in high-population centers. Unlike most Earth viewers, WorldWide Telescope supports many different map projections including Mercator, Equirectangular and Tessellated Octahedral Adaptive Subdivision Transform (TOAST). There are also map layers for seasonal, night, streets, hybrid and science oriented Moderate-Resolution Imaging Spectroradiometer (MODIS) imagery. The new layer manager can be used to add data visualization on the Earth or other planets.

Planets

Planets mode currently allows users to view 3D models of eight celestial bodies: Venus, Mars, Jupiter, the Galilean moons of Jupiter, and Earth's Moon. It also allows users to view a Mandelbrot set.

Sky

Sky mode is the main feature of the software. It allows users to view high-quality images of outer space from many space- and Earth-based telescopes. Each image is shown at its actual position in the sky. There are over 200 full-sky images in spectral bands ranging from radio to gamma rays. There are also thousands of individual study images of various astronomical objects from space telescopes such as the Hubble Space Telescope, the Spitzer Space Telescope (infrared), the Chandra X-ray Observatory, COBE, WMAP, ROSAT, IRAS, and GALEX, as well as many other space- and ground-based telescopes. Sky mode also shows the Sun, Moon, planets, and their moons in their current positions.

Users can add their own image data from FITS files, or convert them to standard image formats such as JPEG, PNG, or TIFF. These images can be tagged with Astronomy Visualization Metadata (AVM).
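For readers who want to script that conversion step, the sketch below is one possible approach using the astropy and matplotlib Python libraries; it is not part of WWT itself, and the file name and the asinh stretch are assumptions made for the example.

# Hypothetical example: convert a single-extension FITS image ("m51.fits",
# an assumed file name) into a PNG suitable for standard image viewers.
import numpy as np
from astropy.io import fits
from matplotlib import pyplot as plt

with fits.open("m51.fits") as hdul:
    data = hdul[0].data.astype(float)

# Simple asinh stretch so faint structure survives the 8-bit conversion.
scaled = np.arcsinh(data - np.nanmin(data))
scaled /= np.nanmax(scaled)

plt.imsave("m51.png", scaled, cmap="gray", origin="lower")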

Panoramas

Panoramas mode allows users to view several panoramas from robotic rovers, including the Curiosity rover and the Mars Exploration Rovers, as well as panoramas taken by Apollo program astronauts.

Users can include their own panoramas, such as gigapixel panoramas created with tools like HDView, or images from single-shot spherical cameras such as the Ricoh Theta.

Solar System

This mode displays the major Solar System objects from the Sun to Pluto, Jupiter's moons, the orbits of all Solar System moons, and all 550,000+ minor planets, positioned with their correct scale, position, and phase. The user can move forward and backward in time at various rates, or type in a time and date for which to view the positions of the planets, and can select a viewing location. The program can show the Solar System the way it would look from any location at any time between 1 AD and 4000 AD. Using this tool a user can watch an eclipse (e.g., the 2017 total solar eclipse), occultation, or astronomical alignment, and preview where the best spot might be to observe a future event. In this mode it is possible to zoom away from the Solar System, through the Milky Way, and out into the cosmos to see a hypothetical view of the entire known universe. Other bodies, spacecraft, and orbital reference frames can be added and visualized in Solar System mode using the layer manager.

Users can query the Minor Planet Center for the orbits of minor bodies in the Solar System.
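As an illustration of that kind of query performed outside WWT, the rough sketch below uses the astroquery Python package to ask the Minor Planet Center for the orbital elements of Ceres; the element key names are assumptions about the service's JSON response, so they are read defensively.

# Hypothetical example: fetch orbital elements for a minor planet from the
# Minor Planet Center via astroquery (independent of WWT's own query tools).
from astroquery.mpc import MPC

results = MPC.query_object('asteroid', name='Ceres')
if results:
    elements = results[0]
    for key in ('semimajor_axis', 'eccentricity', 'inclination'):
        # Key names are assumed; .get avoids a failure if they differ.
        print(key, elements.get(key))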

Sandbox

The Sandbox mode allows users to view arbitrary 3D models (OBJ or 3DS formats) in an empty universe. This is useful, for instance, for exploring 3D objects such as molecular models.

Local user content

WorldWide Telescope was designed as a professional research environment and as such it facilitates viewing of user data. Virtually all of the data types and visualizations in WorldWide Telescope can be run using supplied user data either locally or over the network. Any of the above viewing modes allow the user to browse and load equirectangular, fisheye, or dome master images to be viewed as planet surfaces, sky images or panoramas. Images with Astronomy Visualization Metadata (AVM) can be loaded and registered to their location in the sky. Images without AVM can be shown on the sky but the user must align the images in the sky by moving, scaling and rotating the images until star patterns align. Once the images are aligned they can be saved to collections for later viewing and sharing. The layer manager can be used to add vector or image data to planet surfaces or in orbit.

Layer Manager

Introduced in the Aphelion release, the Layer Manager allows management of relative reference frames, allowing data and images to be placed on Earth, the planets, moons, the sky, or anywhere else in the universe. Data can be loaded from files, linked live with Microsoft Excel, or pasted in from other applications. Layers support 3D points, Well-known text representations of geometry (WKT), shapefiles, 3D models, orbital elements, image layers, and more. Time series data can be viewed as smoothly animated events over time. Reference frames can contain orbital information, allowing 3D models or other data to be plotted at their correct locations over time.
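A comparable workflow can be scripted with the pywwt Python client, as in the rough sketch below; the CSV file, its column names, and the time-series attribute settings are assumptions for the example rather than a definitive recipe.

# Hypothetical example: a scripted table layer with time animation, loosely
# mirroring the desktop Layer Manager workflow (assumed file and columns).
from astropy.table import Table
from pywwt.jupyter import WWTJupyterWidget

wwt = WWTJupyterWidget()   # display this object in a Jupyter notebook cell

events = Table.read("earthquakes.csv", format="ascii.csv")
layer = wwt.layers.add_table_layer(table=events, frame="Earth",
                                   lon_att="longitude", lat_att="latitude")

# Animate the points over time, as the Layer Manager does for time series.
layer.time_series = True
layer.time_att = "time"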

Use for amateur astronomy

The program allows the selection of a telescope and camera and can preview the field of view against the sky. Using ASCOM the user can connect a computer-controlled telescope or an astronomical pointing device such as Meade's MySky, and then either control or follow it. The large selection of catalog objects and 1 arc-second-per-pixel imagery allow an astrophotographer to select and plan a photograph and find a suitable guide star using the multi-chip FOV indicator.

Tours

WorldWide Telescope contains a multimedia authoring environment that allows users or educators to create tours with a simple slide-based paradigm. The slides can have a begin and end camera position allowing for easy Ken Burns Effects. Pictures, objects, and text can be added to the slides, and tours can have both background music and voice-overs with separate volume control. The layer manager can be used in conjunction with a tour to publish user data visualizations with annotations and animations. One of the tours featured was made by a six-year-old boy, while other tours are made by astrophysicists such as Dr. Alyssa A. Goodman of the Center for Astrophysics | Harvard & Smithsonian and Dr. Robert L. Hurt of Caltech/JPL.

Communities

Communities are a way of allowing organizations and communities to add their own images, tours, catalogs, and research materials to the WorldWide Telescope interface. The concept is similar to subscribing to an RSS feed, except the contents are astronomical metadata.

Virtual observatory

The WorldWide Telescope was designed to be the embodiment of a rich virtual observatory client envisioned by Turing Award winner Jim Gray and JHU astrophysicist and co-principal investigator for the US National Virtual Observatory, Alex Szalay in their paper titled "The WorldWide Telescope". The WorldWide Telescope program makes use of IVOA standards for inter-operating with data providers to provide its image, search and catalog data. Rather than concentrate all data into one database, the WorldWide Telescope sources its data from all over the web and the available content grows as more VO compliant data sources are placed on the web.

Full dome planetarium support

Visualization using WorldWide Telescope at the Hayden Planetarium

The WorldWide Telescope Windows client application supports both single- and multi-channel full-dome video projection, allowing it to power full-dome digital planetarium systems. It is currently installed in several world-class planetariums, where it runs on turn-key planetarium systems. It can also be used to create a stand-alone planetarium by using the included tools for calibration, alignment, and blending. This allows using consumer DLP projectors to create a projection system with resolution, performance, and functionality comparable to high-end turnkey solutions, at a fraction of the cost. The University of Washington pioneered this approach with the UW Planetarium. WorldWide Telescope can also be used in single-channel mode from a laptop with a mirror dome or fisheye projector to display on inflatable domes, or even on user-constructed low-cost planetariums for which plans are available on the project website.

Reception

WorldWide Telescope was praised before its announcement in a post by blogger Robert Scoble, who said the demo had made him cry. He later called it "the most fabulous thing I’ve seen Microsoft do in years."

Dr. Roy Gould of the Center for Astrophysics | Harvard & Smithsonian said:

"The WorldWide Telescope takes the best images from the greatest telescopes on Earth ... and in space ... and assembles them into a seamless, holistic view of the universe. This new resource will change the way we do astronomy ... the way we teach astronomy ... and, most importantly, I think it's going to change the way we see ourselves in the universe,"..."The creators of the WorldWide Telescope have now given us a way to have a dialogue with our universe."

A PC World review of the original beta concluded that WorldWide Telescope "has a few shortcomings" but "is a phenomenal resource for enthusiasts, students, and teachers." It also believed the product to be "far beyond Google's current offerings."

Prior to the cross-platform web client release, at least one reviewer regretted the lack of support for non-Windows operating systems, the slow speed at which imagery loads, and the lack of KML support.

Awards

  • 365: AIGA Annual Design Competitions 29, experience design category
  • I.D. Magazine 2009 Annual Design Review, Best of Category: Interactive
    Astroinformatics

    From Wikipedia, the free encyclopedia
    Hyperion proto-supercluster unveiled by measurements and examination of archive data

    Astroinformatics is an interdisciplinary field of study involving the combination of astronomy, data science, machine learning, informatics, and information/communications technologies. The field is closely related to astrostatistics.

    Data-driven astronomy (DDA) refers to the use of data science in astronomy. Outputs of telescopic observations and sky surveys are taken into consideration, and approaches related to data mining and big data management are used to analyze, filter, and normalize the data sets, which are then used for classification, prediction, and anomaly detection by advanced statistical approaches, digital image processing, and machine learning. The output of these processes is used by astronomers and space scientists to study and identify patterns, anomalies, and movements in outer space and to develop theories and discoveries about the cosmos.

    Background

    Astroinformatics is primarily focused on developing the tools, methods, and applications of computational science, data science, machine learning, and statistics for research and education in data-oriented astronomy. Early efforts in this direction included data discovery, metadata standards development, data modeling, astronomical data dictionary development, data access, information retrieval, data integration, and data mining in the astronomical Virtual Observatory initiatives. Further development of the field, along with astronomy community endorsement, was presented to the National Research Council (United States) in 2009 in the astroinformatics "state of the profession" position paper for the 2010 Astronomy and Astrophysics Decadal Survey. That position paper provided the basis for the subsequent more detailed exposition of the field in the Informatics Journal paper Astroinformatics: Data-Oriented Astronomy Research and Education.

    Astroinformatics as a distinct field of research was inspired by work in the fields of geoinformatics, cheminformatics, and bioinformatics, and through the eScience work of Jim Gray at Microsoft Research, whose legacy is remembered and continued through the Jim Gray eScience Awards.

    Although the primary focus of astroinformatics is on the large worldwide distributed collection of digital astronomical databases, image archives, and research tools, the field recognizes the importance of legacy data sets as well, using modern technologies to preserve and analyze historical astronomical observations. Some astroinformatics practitioners help to digitize historical and recent astronomical observations and images into large databases for efficient retrieval through web-based interfaces. Another aim is to help develop new methods and software for astronomers, as well as to help facilitate the processing and analysis of the rapidly growing amount of data in the field of astronomy.

    Astroinformatics is described as the "fourth paradigm" of astronomical research. There are many research areas involved with astroinformatics, such as data mining, machine learning, statistics, visualization, scientific data management, and semantic science. Data mining and machine learning play significant roles in astroinformatics as a scientific research discipline due to their focus on "knowledge discovery from data" (KDD) and "learning from data".

    The amount of data collected from astronomical sky surveys has grown from gigabytes to terabytes throughout the past decade and is predicted to grow in the next decade into hundreds of petabytes with the Large Synoptic Survey Telescope and into the exabytes with the Square Kilometre Array. This plethora of new data both enables and challenges effective astronomical research. Therefore, new approaches are required. In part due to this, data-driven science is becoming a recognized academic discipline. Consequently, astronomy, like other scientific disciplines, is developing information-intensive and data-intensive sub-disciplines to the extent that these sub-disciplines are now becoming (or have already become) standalone research disciplines and full-fledged academic programs. While many institutes of education do not boast an astroinformatics program, such programs most likely will be developed in the near future.

    Informatics has been recently defined as "the use of digital data, information, and related services for research and knowledge generation". However, the more commonly used definition is "informatics is the discipline of organizing, accessing, integrating, and mining data from multiple sources for discovery and decision support." Therefore, the discipline of astroinformatics includes many naturally related specialties, including data modeling, data organization, etc. It may also include transformation and normalization methods for data integration and information visualization, as well as knowledge extraction, indexing techniques, information retrieval, and data mining methods. Classification schemes (e.g., taxonomies, ontologies, folksonomies, and/or collaborative tagging) plus astrostatistics will also be heavily involved. Citizen science projects (such as Galaxy Zoo) also contribute highly valued novelty discovery, feature meta-tagging, and object characterization within large astronomy data sets. All of these specialties enable scientific discovery across varied massive data collections, collaborative research, and data re-use, in both research and learning environments.

    In 2007, the Galaxy Zoo project was launched for the morphological classification of a large number of galaxies. In this project, 900,000 images taken by the Sloan Digital Sky Survey (SDSS) over the preceding seven years were considered for classification. The task was to study each picture of a galaxy, classify it as elliptical or spiral, and determine whether it was spinning or not. The team of astrophysicists led by Kevin Schawinski at Oxford University was in charge of this project, and Schawinski and his colleague Chris Lintott estimated that it would take such a team 3–5 years to complete the work. This led to the idea of using machine learning and data science techniques for analyzing and classifying the images.

    In 2012, two position papers were presented to the Council of the American Astronomical Society that led to the establishment of formal working groups in astroinformatics and Astrostatistics for the profession of astronomy within the US and elsewhere.

    Astroinformatics provides a natural context for the integration of education and research. The experience of research can now be implemented within the classroom to establish and grow data literacy through the easy re-use of data. It also has many other uses, such as repurposing archival data for new projects, literature-data links, intelligent retrieval of information, and many others.

    Methodology

    The data retrieved from sky surveys are first passed through data preprocessing, in which redundancies are removed and the data are filtered. Feature extraction is then performed on the filtered data set before further processing. Several well-known sky surveys, ranging from the Sloan Digital Sky Survey (SDSS) to the upcoming Large Synoptic Survey Telescope (LSST) and the Square Kilometre Array (SKA), supply these data.

    The size of the data from the above-mentioned sky surveys ranges from 3 TB to almost 4.6 EB. Data mining tasks involved in the management and manipulation of the data include classification, regression, clustering, anomaly detection, and time-series analysis, and several approaches and applications exist for each of these methods.

    Classification

    Classification is used for the specific identification and categorization of astronomical data, such as spectral classification, photometric classification, morphological classification, and classification of solar activity. A range of supervised machine-learning approaches are applied to these tasks.
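    As a rough illustration of one such supervised approach (not drawn from the article), the Python sketch below trains a random-forest classifier on invented two-colour photometry for a synthetic star/galaxy split.

# Hypothetical example: random-forest classification of synthetic "star" vs.
# "galaxy" samples described by two made-up photometric colours.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
stars = rng.normal([0.3, 0.1], 0.05, size=(500, 2))
galaxies = rng.normal([0.8, 0.5], 0.10, size=(500, 2))
X = np.vstack([stars, galaxies])
y = np.array([0] * 500 + [1] * 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))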

    Regression

    Regression is used to make predictions from the retrieved data through statistical trends and statistical modeling. It is used, for example, to estimate photometric redshifts and to measure the physical parameters of stars. A range of statistical and machine-learning regression approaches are applied to these tasks.
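    A minimal sketch of the regression idea, using invented five-band magnitudes and a random-forest regressor; the linear colour-redshift relation is fabricated purely for illustration.

# Hypothetical example: regress a synthetic "redshift" from made-up magnitudes.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
mags = rng.uniform(18, 24, size=(1000, 5))              # five synthetic bands
z = 0.1 * (mags[:, 0] - mags[:, 4]) + rng.normal(0, 0.02, 1000)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print("cross-validated R^2:", cross_val_score(model, mags, z, cv=5))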

    Clustering

    Clustering groups objects based on a similarity metric. In astronomy it is used for classification as well as for the detection of special or rare objects. A range of unsupervised learning approaches are applied to these tasks.
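    One common unsupervised approach is sketched here with k-means on invented colour-colour data; the two synthetic populations stand in for, say, blue and red galaxy sequences.

# Hypothetical example: k-means clustering of two synthetic colour populations.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
blue = rng.normal([0.2, 0.0], 0.05, size=(300, 2))
red = rng.normal([0.9, 0.6], 0.05, size=(300, 2))
colours = np.vstack([blue, red])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(colours)
print("cluster centres:\n", km.cluster_centers_)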

    Anomaly detection

    Anomaly detection is used for detecting irregularities in a data set; in astronomy it is applied chiefly to detect rare or special objects. Several outlier-detection approaches are used for this purpose.
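    As a rough illustration, the sketch below flags outliers in synthetic three-band photometry with an isolation forest, one generic outlier-detection technique; all numbers are invented.

# Hypothetical example: isolation-forest outlier detection on synthetic data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
normal = rng.normal(0.0, 1.0, size=(1000, 3))
outliers = rng.uniform(-6, 6, size=(10, 3))
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = iso.predict(X)            # -1 marks points flagged as anomalous
print("flagged:", int((labels == -1).sum()), "of", len(X))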

    Time-series analysis

    Time-series analysis helps in analyzing trends and predicting outputs over time. It is used for trend prediction and for novelty detection (detection of unknown data).
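    A minimal sketch of one such technique: recovering the period of an invented, unevenly sampled light curve with a Lomb-Scargle periodogram from astropy; the 12.5-day period and noise level are assumptions for the example.

# Hypothetical example: period recovery from a synthetic light curve.
import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0, 100, 300))                   # days, uneven sampling
y = 1.0 + 0.3 * np.sin(2 * np.pi * t / 12.5) + rng.normal(0, 0.05, t.size)

frequency, power = LombScargle(t, y).autopower()
best_period = 1.0 / frequency[np.argmax(power)]
print(f"recovered period: {best_period:.2f} days (true value 12.5)")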

    Wednesday, June 25, 2025

    Data science

    From Wikipedia, the free encyclopedia
    The existence of Comet NEOWISE (here depicted as a series of red dots) was discovered by analyzing astronomical survey data acquired by a space telescope, the Wide-field Infrared Survey Explorer.

    Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms and systems to extract or extrapolate knowledge from potentially noisy, structured, or unstructured data.

    Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession.

    Data science is "a concept to unify statistics, data analysis, informatics, and their related methods" to "understand and analyze actual phenomena" with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge. However, data science is different from computer science and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

    A data scientist is a professional who creates programming code and combines it with statistical knowledge to summarize data.

    Foundations

    Data science is an interdisciplinary field focused on extracting knowledge from typically large data sets and applying the knowledge from that data to solve problems in other application domains. The field encompasses preparing data for analysis, formulating data science problems, analyzing data, and summarizing these findings. As such, it incorporates skills from computer science, mathematics, data visualization, graphic design, communication, and business.

    Vasant Dhar writes that statistics emphasizes quantitative data and description. In contrast, data science deals with quantitative and qualitative data (e.g., from images, text, sensors, transactions, customer information, etc.) and emphasizes prediction and action. Andrew Gelman of Columbia University has described statistics as a non-essential part of data science. Stanford professor David Donoho writes that data science is not distinguished from statistics by the size of datasets or use of computing and that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data-science program. He describes data science as an applied field growing out of traditional statistics.

    Etymology

    Early usage

    In 1962, John Tukey described a field he called "data analysis", which resembles modern data science. In 1985, in a lecture given to the Chinese Academy of Sciences in Beijing, C. F. Jeff Wu used the term "data science" for the first time as an alternative name for statistics. Later, attendees at a 1992 statistics symposium at the University of Montpellier II acknowledged the emergence of a new discipline focused on data of various origins and forms, combining established concepts and principles of statistics and data analysis with computing.

    The term "data science" has been traced back to 1974, when Peter Naur proposed it as an alternative name to computer science. In 1996, the International Federation of Classification Societies became the first conference to specifically feature data science as a topic. However, the definition was still in flux. After the 1985 lecture at the Chinese Academy of Sciences in Beijing, in 1997 C. F. Jeff Wu again suggested that statistics should be renamed data science. He reasoned that a new name would help statistics shed inaccurate stereotypes, such as being synonymous with accounting or limited to describing data. In 1998, Hayashi Chikio argued for data science as a new, interdisciplinary concept, with three aspects: data design, collection, and analysis.

    Modern usage

    In 2012, technologists Thomas H. Davenport and DJ Patil declared "Data Scientist: The Sexiest Job of the 21st Century", a catchphrase that was picked up even by major-city newspapers like the New York Times and the Boston Globe. A decade later, they reaffirmed it, stating that "the job is more in demand than ever with employers".

    The modern conception of data science as an independent discipline is sometimes attributed to William S. Cleveland. In 2014, the American Statistical Association's Section on Statistical Learning and Data Mining changed its name to the Section on Statistical Learning and Data Science, reflecting the ascendant popularity of data science.

    The professional title of "data scientist" has been attributed to DJ Patil and Jeff Hammerbacher in 2008. Though it was used by the National Science Board in their 2005 report "Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century", it referred broadly to any key role in managing a digital data collection.

    Data science and data analysis

    Summary statistics and scatterplots of the Datasaurus dozen data set, an example of the usefulness of exploratory data analysis.
    Data science is at the intersection of mathematics, computer science and domain expertise.

    Data analysis typically involves working with structured datasets to answer specific questions or solve specific problems. This can involve tasks such as data cleaning and data visualization to summarize data and develop hypotheses about relationships between variables. Data analysts typically use statistical methods to test these hypotheses and draw conclusions from the data.

    Data science involves working with larger datasets that often require advanced computational and statistical methods to analyze. Data scientists often work with unstructured data such as text or images and use machine learning algorithms to build predictive models. Data science often uses statistical analysis, data preprocessing, and supervised learning.
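    The contrast can be made concrete with a small sketch: descriptive analysis of a structured table with pandas versus a simple predictive model with scikit-learn (the tiny customer table below is invented for illustration).

# Hypothetical example: descriptive analysis vs. a simple predictive model.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "age":     [23, 45, 31, 52, 29, 61],
    "spend":   [120, 430, 260, 510, 180, 620],
    "churned": [0, 1, 0, 1, 0, 1],
})

# Data analysis: summarise structured data to describe what has happened.
print(df.groupby("churned")[["age", "spend"]].mean())

# Data science: fit a model that predicts the outcome for new records.
model = LogisticRegression().fit(df[["age", "spend"]], df["churned"])
print(model.predict([[40, 300]]))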

    Cloud computing for data science

    A cloud-based architecture for enabling big data analytics. Data flows from various sources, such as personal computers, laptops, and smart phones, through cloud services for processing and analysis, finally leading to various big data applications.

    Cloud computing can offer access to large amounts of computational power and storage. In big data, where volumes of information are continually generated and processed, these platforms can be used to handle complex and resource-intensive analytical tasks.

    Some distributed computing frameworks are designed to handle big data workloads. These frameworks can enable data scientists to process and analyze large datasets in parallel, which can reduce processing times.

    Ethical consideration in data science

    Data science involves collecting, processing, and analyzing data which often includes personal and sensitive information. Ethical concerns include potential privacy violations, bias perpetuation, and negative societal impacts.

    Machine learning models can amplify existing biases present in training data, leading to discriminatory or unfair outcomes.

    Data modeling

    From Wikipedia, the free encyclopedia
    The data modeling process. The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the start point for interface or database design.

    Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques. It may be applied as part of the broader Model-driven engineering (MDE) concept.

    Overview

    Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system.

    There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system. The data requirements are initially recorded as a conceptual data model which is essentially a set of technology independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model to a physical data model that organizes the data into tables, and accounts for access, performance and storage details. Data modeling defines not just data elements, but also their structures and the relationships between them.
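    A compressed sketch of that progression, using an invented "customer places order" example: the conceptual statement appears as a comment, the logical model as Python dataclasses, and the physical model as SQLite table definitions (names and types are assumptions for illustration).

# Hypothetical example: conceptual -> logical -> physical for one small model.
import sqlite3
from dataclasses import dataclass

# Conceptual (technology independent): a Customer places Orders.

@dataclass
class Customer:            # logical: entity types, attributes, relationships
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int       # relationship back to Customer
    total: float

# Physical: tables, keys, and storage details in a concrete database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE "order" (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL
    );
""")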

    Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization, e.g., using data modeling:

    • to assist business analysts, programmers, testers, manual writers, IT package selectors, engineers, managers, related organizations and clients to understand and use an agreed-upon semi-formal model that encompasses the concepts of the organization and how they relate to one another
    • to manage data as a resource
    • to integrate information systems
    • to design databases/data warehouses (aka data repositories)

    Data modelling may be performed during various types of projects and in multiple phases of projects. Data models are progressive; there is no such thing as the final data model for a business or application. Instead, a data model should be considered a living document that will change in response to a changing business. The data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time. Whitten et al. (2004) determined two types of data modelling:

    • Strategic data modelling: This is part of the creation of an information systems strategy, which defines an overall vision and architecture for information systems. Information technology engineering is a methodology that embraces this approach.
    • Data modelling during systems analysis: In systems analysis logical data models are created as part of the development of new databases.

    Data modelling is also used as a technique for detailing business requirements for specific databases. It is sometimes called database modelling because a data model is eventually implemented in a database.

    Topics

    Data models

    How data models deliver benefit.

    Data models provide a framework for data to be used within information systems by providing specific definitions and formats. If a data model is used consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data seamlessly. The results of this are indicated in the diagram. However, systems and interfaces are often expensive to build, operate, and maintain. They may also constrain the business rather than support it. This may occur when the quality of the data models implemented in systems and interfaces is poor.

    Some common problems found in data models are:

    • Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces. So, business rules need to be implemented in a flexible way that does not result in complicated dependencies, rather the data model should be flexible enough so that changes in the business can be implemented within the data model in a relatively quick and efficient way.
    • Entity types are often not identified, or are identified incorrectly. This can lead to replication of data, data structure and functionality, together with the attendant costs of that duplication in development and maintenance. Therefore, data definitions should be made as explicit and easy to understand as possible to minimize misinterpretation and duplication.
    • Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25 and 70% of the cost of current systems. Required interfaces should be considered inherently while designing a data model, as a data model on its own would not be usable without interfaces within different systems.
    • Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data have not been standardised. To obtain optimal value from an implemented data model, it is very important to define standards that will ensure that data models will both meet business needs and be consistent.

    Conceptual, logical and physical schemas

    The ANSI/SPARC three-level architecture. This shows that a data model can be an external model (or view), a conceptual model, or a physical model. This is not the only way to look at data models, but it is a useful way, particularly when comparing models.

    In 1975 ANSI described three kinds of data-model instance:

    • Conceptual schema: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model. Simply described, a conceptual schema is the first step in organizing the data requirements.
    • Logical schema: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags. The logical schema and conceptual schema are sometimes implemented as one and the same.
    • Physical schema: describes the physical means used to store data. This is concerned with partitions, CPUs, tablespaces, and the like.

    According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual schema. The table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of course, the structures must remain consistent across all schemas of the same data model.

    Data modeling process

    Data modeling in the context of business process integration.

    In the context of business process integration (see figure), data modeling complements business process modeling, and ultimately results in database generation.

    The process of designing a database involves producing the previously described three types of schemas – conceptual, logical, and physical. The database design documented in these schemas is converted through a Data Definition Language, which can then be used to generate a database. A fully attributed data model contains detailed attributes (descriptions) for every entity within it. The term "database design" can describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term "database design" could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the Database Management System or DBMS.

    In the process, system interfaces account for 25% to 70% of the development and support costs of current systems. The primary reason for this cost is that these systems do not share a common data model. If data models are developed on a system by system basis, then not only is the same analysis repeated in overlapping areas, but further analysis must be performed to create the interfaces between them. Most systems within an organization contain the same basic data, redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can minimize rework with minimal modifications for the purposes of different systems within the organization.

    Modeling methodologies

    Data models represent information areas of interest. While there are many ways to create data models, according to Len Silverston (1997) only two modeling methodologies stand out, top-down and bottom-up:

    • Bottom-up models, or View Integration models, are often the result of a reengineering effort. They usually start with existing data structures: forms, fields on application screens, or reports. These models are usually physical, application-specific, and incomplete from an enterprise perspective. They may not promote data sharing, especially if they are built without reference to other parts of the organization.
    • Top-down logical data models, on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a logical model, but the model serves as a reference point or template.

    Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model. In many environments, the distinction between a logical data model and a physical data model is blurred. In addition, some CASE tools don't make a distinction between logical and physical data models.

    Entity–relationship diagrams

    Example of an IDEF1X entity–relationship diagram used to model IDEF1X itself. The name of the view is mm. The domain hierarchy and constraints are also given. The constraints are expressed as sentences in the formal theory of the meta model.

    There are several notations for data modeling. The actual model is frequently called "entity–relationship model", because it depicts data in terms of the entities and relationships described in the data. An entity–relationship model (ERM) is an abstract conceptual representation of structured data. Entity–relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, often a relational database, and its requirements in a top-down fashion.

    These models are being used in the first stage of information system design during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain universe of discourse i.e. the area of interest.

    Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results. Notable techniques include entity–relationship modeling and IDEF1X, discussed above, and generic and semantic data modeling, discussed below.

    Generic data modeling

    Example of a Generic data model.

    Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. The definition of the generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as a 'classification relation', being a binary relation between an individual thing and a kind of thing (a class), and a 'part-whole relation', being a binary relation between two things, one with the role of part, the other with the role of whole, regardless of the kind of things that are related.

    Given an extensible list of classes, this allows the classification of any individual thing and the specification of part-whole relations for any individual object. By standardizing an extensible list of relation types, a generic data model enables the expression of an unlimited number of kinds of facts and approaches the capabilities of natural languages. Conventional data models, on the other hand, have a fixed and limited domain scope, because the instantiation (usage) of such a model only allows expressions of the kinds of facts that are predefined in the model.
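    A minimal sketch of the idea in Python, with a single classification relation and a single part-whole relation that can connect any individual to any class or whole without extending the schema; all names are invented.

# Hypothetical example: two generic relation types instead of fixed entities.
from dataclasses import dataclass

@dataclass(frozen=True)
class Thing:               # any individual thing
    name: str

@dataclass(frozen=True)
class Kind:                # any kind of thing (a class)
    name: str

classification = []        # (individual thing, kind) pairs
part_whole = []            # (part, whole) pairs

def classify(thing: Thing, kind: Kind) -> None:
    classification.append((thing, kind))

def make_part_of(part: Thing, whole: Thing) -> None:
    part_whole.append((part, whole))

# New kinds of facts need no schema change, only new tuples.
wheel, car = Thing("wheel-42"), Thing("car-7")
classify(car, Kind("passenger car"))
make_part_of(wheel, car)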

    Semantic data modeling

    The logical data structure of a DBMS, whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data, because it is limited in scope and biased toward the implementation strategy employed by the DBMS. That is, unless the semantic data model is implemented in the database on purpose, a choice which may slightly impact performance but generally vastly improves productivity.

    Semantic data models.

    Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques. That is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure the real world, in terms of resources, ideas, events, etc., is symbolically defined by its description within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.

    The purpose of semantic data modeling is to create a structural model of a piece of the real world, called "universe of discourse". For this, three fundamental structural relations are considered:

    • Classification/instantiation: Objects with some structural similarity are described as instances of classes
    • Aggregation/decomposition: Composed objects are obtained by joining their parts
    • Generalization/specialization: Distinct classes with some common properties are reconsidered in a more generic class with the common attributes

    A semantic data model can be used to serve many purposes, such as:

    • Planning of data resources
    • Building of shareable databases
    • Evaluation of vendor software
    • Integration of existing databases

    The overall goal of semantic data models is to capture more meaning of data by integrating relational concepts with more powerful abstraction concepts known from the artificial intelligence field. The idea is to provide high-level modeling primitives as integral parts of a data model in order to facilitate the representation of real-world situations.

    Structural inequality in education

    Structural inequality has been identified as the bias that is built into the structure of organizations, institutions, governments, or social networks. Structural inequality occurs when the fabric of organizations, institutions, governments or social networks contains an embedded bias which provides advantages for some members and marginalizes or produces disadvantages for other members. This can involve property rights, status, or unequal access to health care, housing, education and other physical or financial resources or opportunities. Structural inequality is believed to be an embedded part of the culture of the United States due to the history of slavery and the subsequent suppression of equal civil rights of minority races. Structural inequality has been encouraged and maintained in the society of the United States through structured institutions such as the public school system with the goal of maintaining the existing structure of wealth, employment opportunities, and social standing of the races by keeping minority students from high academic achievement in high school and college as well as in the workforce of the country. In the attempt to equalize allocation of state funding, policymakers evaluate the elements of disparity to determine an equalization of funding throughout school districts.

    Policymakers have to determine a formula based on per-pupil revenue and student need. Critical race theorists argue that ongoing oppression of minorities in the public school system and the corporate workforce limits academic and career success. The public school system maintains structural inequality through such practices as tracking of students, standardized assessment tests, and a teaching force that does not represent the diversity of the student body. See also social inequality, educational inequality, racism, discrimination, and oppression. Social inequality occurs when certain groups in a society do not have equal social status. Aspects of social status involve property rights, voting rights, freedom of speech and freedom of assembly, access to health care, and education, as well as many other social commodities.

    Education: student tracking

    Education is the base for equality. Specifically in the structuring of schools, the concept of tracking is believed by some scholars to create a social disparity in providing students an equal education. Schools have been found to have a unique acculturative process that helps to pattern self-perceptions and world views. Schools not only provide education but also a setting for students to develop into adults, form future social status and roles, and maintain social and organizational structures of society. Tracking is an educational term that indicates where students will be placed during their secondary school years. "Depending on how early students are separated into these tracks, determines the difficulty in changing from one track to another" (Grob, 2003, p. 202).

    Tracking or sorting categorizes students into different groups based on standardized test scores. These groups or tracks are vocational, general, and academic. Students are sorted into groups that will determine educational and vocational outcomes for the future. The sorting that occurs in the educational system parallels the hierarchical social and economic structures in society. Thus, students are viewed and treated differently according to their individual track. Each track has a designed curriculum that is meant to fit the unique educational and social needs of each sorted group. Consequently, the information taught as well as the expectations of the teachers differ based on the track resulting in the creation of dissimilar classroom cultures.

    Access to college

    Not only the classes that students take, but the school they are enrolled in has been shown to have an effect on their educational success and social mobility, especially ability to graduate from college. Simply being enrolled in a school with less access to resources, or in an area with a high concentration of racial minorities, makes one much less likely to gain access to prestigious four-year colleges. For example, there are far fewer first time freshmen within the University of California (UC) system who graduate from schools where the majority population is an underrepresented racial minority group. Students from these schools comprise only 22.1% of the first time freshmen within the UC system, whereas students from majority white schools make up 65.3% of the first time freshman population. At more prestigious schools, like UC Berkeley, the division is even more pronounced. Only 15.2% of first time freshmen who attend the university came from schools with a high percentage of underrepresented minorities.

    Issues of structural inequality are probably also at fault for the low numbers of students from underserved backgrounds graduating from college. Out of the entire population of low-income youth in the US, only 13% receive a bachelor's degree by the time they are 28. Students from racial minorities are similarly disadvantaged. Hispanic students are half as likely to attend college as white students, and black students are 25% less likely. Despite increased attention and educational reform, this gap has increased in the past 30 years.

    The costs required to attend college also contribute to the structural inequality in education. The higher educational system in the United States relies on public funding to support the universities. However, even with the public funding, policymakers have voiced their desire to have universities become less dependent on government funding and to compete for other sources of funding. The result of this could dissuade many students from low-income backgrounds from attending higher institutions because they cannot afford to attend. In a 2013 study by the National Center for Education Statistics, only 49% of students from low-income families who graduated from high school immediately enrolled in college. In comparison, students from high-income families had an 80% immediate college enrollment rate. Furthermore, in another 2013 report, over 58% of low-income families were minorities. In a survey supported by the Bill and Melinda Gates Foundation, researchers discovered that 6 in 10 students who dropped out did so because they could not pay the cost of attendance themselves and had no help from their families.

    Access to technology

    Gaps in the availability of technology, the digital divide, are gradually decreasing as more people purchase home computers and the ratio of students to computers within schools continues to decrease. However, inequities in access to technology still exist due to the lack of teacher training and, subsequently, confidence in the use of technological tools; the diverse needs of students; and administrative pressures to increase test scores. These inequities are noticeably different between high-need (HN) and low-need (LN) populations. In a survey of teachers participating in an e-Learning for Educators online professional development workshop, Chapman finds that HN schools need increased access and teacher training in technology resources. Though results vary in their level of significance, teachers of non-HN schools report more confidence in having adequate technical abilities to simply participate in the workshop; later surveys showed that teachers of HN schools were less likely than teachers of non-HN schools to report that "they use, or will use, technology in the classroom more after the workshop". Additionally, teachers from HN schools report less access to technology as well as lower technical skills and abilities (p. 246). Even when teachers in low-SES schools had confidence in their technical skills, they faced other obstacles, including larger numbers of English language learners and at-risk students, larger numbers of students with limited computer experience, and greater pressure to increase test scores and adhere to policy mandates.

    Other structural inequalities in access to technology exist in differences in the ratio of students to computers within public schools. Correlations show that as the number of minorities enrolled in a school increases, so too does the ratio of students to computers: 4.0:1 in schools with 50% or more minority enrollment versus 3.1:1 in schools with 6% or less minority enrollment (as cited in Warschauer, 2010, pp. 188–189). Within school structures, low-socioeconomic-status (SES) schools tended to have less stable teaching staff, administrative staff, and IT support staff, which contributed to teachers being less likely to incorporate technology in their curriculum for lack of support.

    Disabilities

    The challenge of the new millennium will include a realignment in focus to include "the curriculum as disabled, rather than students, their insights in translating principles of universal design, which originated in architecture, to education commensurate with advances characterized as a major paradigm shift."

    According to the Individuals with Disabilities Education Act (IDEA), children with disabilities have the right to a free appropriate public education in the Least Restrictive Environment (LRE). The LRE means that children with disabilities must be educated in regular classrooms with their non-disabled peers with the appropriate supports and services.

    An individual with a disability is also protected under the Americans with Disabilities Act (ADA), which defines such a person as anyone who has a physical or mental impairment that substantially limits one or more major life activities. Assistive technology, which supports individuals with disabilities across a wide range of areas from cognitive to physical limitations, plays an important role.

    School finance

    School finance is another area where social injustice and inequality might exist. Districts in wealthier areas typically receive more Average Daily Attendance (ADA) funds for total (e.g. restricted and unrestricted) expenditures per pupil than socio-economically disadvantaged districts; therefore, a wealthier school district will receive more funding than a socio-economically disadvantaged one. "Most U.S. schools are underfunded. Schools in low wealth states and districts are especially hard hit, with inadequate instructional materials, little technology, unsafe buildings, and less-qualified teachers" (p. 31). The method in which funds are distributed or allocated within a school district can also be of concern. De facto segregation can occur in districts or educational organizations that passively promote racial segregation. Epstein (2006) stated that "Two years after the victorious Supreme Court decision against segregation, Oakland's"... "school board increased Oakland's segregation by spending $40 million from a bond election to build..." a "... High School, and then establishing a ten-mile long, two-mile wide attendance boundary, which effectively excluded almost every black and Latino student in the city" (p. 28).

    History of state funding in U.S education

    Since the early 19th century, policymakers have developed a plethora of educational programs, each with its own particular structural inequality. The mechanisms involved in allocating state funding have changed significantly over time. In the past, public schools were funded primarily by local property taxes, supplemented by other state sources. In the early 19th century, policymakers recognized that relying on property taxes could lead to significant disparities in the amount of funding per student from district to district.

    Thus, policymakers began to analyze the elements of disparity, such as the number of teachers and the quality of facilities and materials, and sought means to address them. To address the disparity, some states implemented Flat Grants, which typically allocate funding based on the number of teachers. However, this often magnified the disparity, since wealthy communities would have fewer students per teacher.
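    As a minimal illustration with hypothetical numbers (not drawn from the source), suppose each district receives a flat grant of $G$ dollars per teacher. Per-pupil funding is then

    \[
    \text{funding per pupil} = \frac{G}{\text{students per teacher}}
    \]

    so a wealthy district averaging 20 students per teacher receives $G/20$ per pupil, while a poorer district averaging 30 students per teacher receives only $G/30$, roughly a third less, even though both receive the same amount per teacher.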

    In their attempt to reduce disparity, policymakers in 1920 designed what they called the Foundation Program. Its stated purpose was to equalize per-pupil revenue across districts by setting a target per-pupil revenue level and having the state supply funding to bring underserved districts up to that level. Some analysts characterized the program as a hoax because its structure allowed wealthier districts to exceed the target per-pupil revenue level.
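    A minimal sketch of the logic behind a typical foundation formula, using illustrative symbols that are not from the source: let $F$ be the target (foundation) per-pupil revenue, $r$ the required local tax rate, and $V_d$ the assessed property value per pupil in district $d$. State aid per pupil is then

    \[
    A_d = \max\left(0,\ F - r \cdot V_d\right)
    \]

    which tops every district up to $F$ but places no ceiling on what a district may raise: a wealthy district with large $V_d$ can tax itself above $r$ and spend well beyond $F$, which is the structural loophole critics pointed to.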

    Policymakers also designed Categorical Programs to aid students with particular categories of need. These programs target disparities in poor districts but do not take district wealth into account. Over time, policymakers began to allocate funding that considers pupil needs along with the wealth of the district.

    Healthcare

    Inequality that negatively affects health and wellness among minority races is highly correlated with income, wealth, social capital, and, indirectly, education. Researchers have identified significant gaps in mortality rates between African Americans and Caucasian Americans. There have not been significant changes in the major factors, income, wealth, social capital and psycho-social environment, and socioeconomic status, that would positively affect the existing inequality. Studies have noted significant correlations between these factors and major health issues; for example, poor socioeconomic status is strongly correlated with cardiovascular disease.

    Social inequalities

    When discussing structural inequality, we must also consider how hegemonic social structures can support institutional and technological inequalities. In the realm of education, studies have suggested that a parent's level of educational attainment influences the educational attainment of that parent's child. The level of education one receives also tends to be correlated with social capital, income, and criminal activity. These findings suggest that simply being the child of someone who is well educated places a child in an advantageous position. This in turn means that the children of new migrants and other groups who have historically been less educated, and who have significantly fewer resources at their disposal, will be less likely to achieve higher levels of education. Because education plays a role in income, social capital, criminal activity, and even the educational attainment of others, a positive feedback loop can arise in which the lack of education perpetuates itself throughout a social class or group.

    The outcomes can be highly problematic at the K-12 level as well. Looking back to school funding, when the majority of funding has to come from local school districts, poorer districts end up less adequately funded than wealthier districts. The children who attend these schools, which struggle to provide a quality education with more students per teacher and less access to technology, tend to be poorly prepared to select and attend a college or university. When these students fail to pursue higher education, they are less likely to encourage their own children to do so and more likely to be poorer. These individuals then live in traditionally poorer neighborhoods, sending their children to underfunded schools that are ill-prepared to gear students toward higher education, further perpetuating a cycle of poor districts and disadvantaged social groups.

    Historical

    The structural inequality of tracking in the educational system is the foundation of inequalities instituted in other social and organizational structures. Tracking is a term in the educational vernacular for the practice of determining where students will be placed during their secondary school years. Traditionally, the most heavily tracked subjects are math and English. Students are categorized into different groups based on their standardized test scores. Tracking is justified by the following four assumptions:

    1. Students learn better in an academically equal group.
    2. Positive self-attitudes develop in homogeneous groups, especially for slower students, when ability differences within the group are not large.
    3. Group placement based on individual past performance and ability is fair, accurate, and appropriate for future learning.
    4. Homogeneous groups ease the teaching process.

    Race, ethnicity, and socio-economic class limit exposure to advanced academic knowledge, thus limiting advanced educational opportunities. A disproportionate number of minority students are placed in low-track courses, and the content of low-track courses is markedly different. Low- and average-track students typically have limited exposure to "high-status" academic material, so the possibility of academic achievement and subsequent success is significantly limited. The tracking phenomenon in schools tends to perpetuate prejudices, misconceptions, and inequalities affecting poor and minority people in society. Schools provide both an education and a setting for students to develop into adults, form future societal roles, and maintain the social and organizational structures of society. Tracking in the public educational system parallels the hierarchical social and economic structures in society. Schools have a unique acculturative process that helps to pattern self-perceptions and world views. The expectations of teachers and the information taught differ by track. Thus, dissimilar classroom cultures, different dissemination of knowledge, and unequal educational opportunities are created.

    The cycle of academic tracking and oppression of minority races depends on the use of standardized testing. IQ tests are frequently the foundation that determines an individual's group placement. However, research has found the accuracy of IQ tests to be flawed. Tests, by design, only indicate a student's placement along a high-to-low continuum, not their actual achievement. The tests have also been found to be culturally biased, so language and experience differences affect test outcomes, with lower-class and minority children consistently scoring lower. This leads to inaccurate judgements of students' abilities.

    Standardized tests were developed by eugenicists to determine who would best fill societal roles and professions. Tests were originally designed to identify the intellectual elite of British society, and this original intent unconsciously began the sorting dynamic. Tests were used to help societies fill important roles. In America, standardized tests were designed to sort students based on responses to test questions that were, and are, racially biased. These tests do not factor in the experiential and cultural knowledge or general ability of students. Students are placed in vocational, general, or academic tracks based on test scores. Students' futures are determined by their tracks, and they are viewed and treated differently according to their individual track. Tracks are hierarchical in nature and create, consciously for some and unconsciously for others, the damaging effects of labeling students as fast or slow, bright or special education, average or below average.

    Corporate America has an interest in maintaining the use of standardized tests in public school systems, thereby protecting a potential future workforce drawn from high-tracked, successful, high-income students while a disproportionate number of minority students are eliminated through poor academic achievement. Standardized testing is also big business. Although it is often argued that standardized testing is an economical method of evaluating students, the real cost is staggering, estimated at $20 billion annually in indirect and direct costs, an amount that does not factor in the social and emotional costs.

    Standardized tests remain a frequently used and expected evaluative method for a variety of reasons. American culture is interested in intelligence and potential. Standardized testing also provides an economic advantage to some stakeholders, such as prestigious universities, which use standardized test numbers as part of their marketing plans. Finally, standardized testing maintains the status quo of the established social system.

    Teacher and counselor judgements have been shown to be just as inaccurate as standardized tests. Teachers and counselors may be responsible for analyzing and making recommendations for a large number of students. Research has found that factors such as appearance, language, behavior, and grooming, as well as academic potential, are all considered in group-placement decisions. This leads to a disproportionate number of lower-class and minority children being placed unfairly into lower-track groups.

    Teacher diversity is limited by policies that create often-unattainable requirements for bilingual instructors. For example, bilingual instructors may be unable to pass basic educational skills tests because they cannot write rapidly enough to complete the essay portions. Limiting resources for bilingual or English-as-a-second-language students, by providing primarily English-speaking teachers, limits learning simply by restricting the dissemination of knowledge. Restructuring the educational system and encouraging prospective bilingual teachers are two ways to ensure diversity among the teaching workforce, increase the distribution of knowledge, and increase the potential for continued academic success of minority students.

    Possible solutions to tracking and standardized testing:

    • Legal action against standardized tests, based on discrimination against poor and minority students, following precedent set in the state of Massachusetts.
    • Curricula designed to be age-, culture-, and language-appropriate.
    • Recruit and train a diverse and highly skilled, culturally competent teaching force.
    • Elimination of norm-referenced testing.
    • Community-constructed and culturally appropriate assessment tests.
    • Explore critical race theory within the educational system to identify how race and racism are part of the structural inequality of the public school system.
    • Create alternative teacher education certification programs that allow teachers to work while earning credentials.
