Wednesday, June 25, 2025

Data science

From Wikipedia, the free encyclopedia
The existence of Comet NEOWISE was discovered by analyzing astronomical survey data acquired by a space telescope, the Wide-field Infrared Survey Explorer.

Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, data processing, scientific visualization, algorithms and systems to extract or extrapolate knowledge from potentially noisy, structured, or unstructured data.

Data science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine). Data science is multifaceted and can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession.

Data science is "a concept to unify statistics, data analysis, informatics, and their related methods" to "understand and analyze actual phenomena" with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge. However, data science is different from computer science and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

A data scientist is a professional who creates programming code and combines it with statistical knowledge to summarize data.

Foundations

Data science is an interdisciplinary field focused on extracting knowledge from typically large data sets and applying the knowledge from that data to solve problems in other application domains. The field encompasses preparing data for analysis, formulating data science problems, analyzing data, and summarizing these findings. As such, it incorporates skills from computer science, mathematics, data visualization, graphic design, communication, and business.

Vasant Dhar writes that statistics emphasizes quantitative data and description. In contrast, data science deals with quantitative and qualitative data (e.g., from images, text, sensors, transactions, customer information, etc.) and emphasizes prediction and action. Andrew Gelman of Columbia University has described statistics as a non-essential part of data science. Stanford professor David Donoho writes that data science is not distinguished from statistics by the size of datasets or use of computing and that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data-science program. He describes data science as an applied field growing out of traditional statistics.

Etymology

Early usage

In 1962, John Tukey described a field he called "data analysis", which resembles modern data science. In 1985, in a lecture given to the Chinese Academy of Sciences in Beijing, C. F. Jeff Wu used the term "data science" for the first time as an alternative name for statistics. Later, attendees at a 1992 statistics symposium at the University of Montpellier II acknowledged the emergence of a new discipline focused on data of various origins and forms, combining established concepts and principles of statistics and data analysis with computing.

The term "data science" has been traced back to 1974, when Peter Naur proposed it as an alternative name to computer science. In 1996, the International Federation of Classification Societies became the first conference to specifically feature data science as a topic. However, the definition was still in flux. After the 1985 lecture at the Chinese Academy of Sciences in Beijing, in 1997 C. F. Jeff Wu again suggested that statistics should be renamed data science. He reasoned that a new name would help statistics shed inaccurate stereotypes, such as being synonymous with accounting or limited to describing data. In 1998, Hayashi Chikio argued for data science as a new, interdisciplinary concept, with three aspects: data design, collection, and analysis.

Modern usage

In 2012, technologists Thomas H. Davenport and DJ Patil declared "Data Scientist: The Sexiest Job of the 21st Century", a catchphrase that was picked up even by major-city newspapers like the New York Times and the Boston Globe. A decade later, they reaffirmed it, stating that "the job is more in demand than ever with employers".

The modern conception of data science as an independent discipline is sometimes attributed to William S. Cleveland. In 2014, the American Statistical Association's Section on Statistical Learning and Data Mining changed its name to the Section on Statistical Learning and Data Science, reflecting the ascendant popularity of data science.

The professional title of "data scientist" has been attributed to DJ Patil and Jeff Hammerbacher in 2008. Though it was used by the National Science Board in their 2005 report "Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century", it referred broadly to any key role in managing a digital data collection.

Data science and data analysis

Summary statistics and scatterplots for the Datasaurus dozen data set, an example of the usefulness of exploratory data analysis.
Data science is at the intersection of mathematics, computer science and domain expertise.

Data analysis typically involves working with structured datasets to answer specific questions or solve specific problems. This can involve tasks such as data cleaning and data visualization to summarize data and develop hypotheses about relationships between variables. Data analysts typically use statistical methods to test these hypotheses and draw conclusions from the data.
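
As an illustration, here is a minimal sketch (not from the original article) of such a workflow in Python with pandas and SciPy; the data, column names, and test choice are hypothetical assumptions.

```python
# A minimal sketch of a typical data-analysis workflow on a small
# structured dataset (all data here is hypothetical): clean the data,
# summarize it, and test a hypothesis about a group difference.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "score": [71.0, 68.5, None, 80.2, 77.9, 83.1],
})

# Data cleaning: drop rows with missing measurements.
df = df.dropna(subset=["score"])

# Summary statistics to develop a hypothesis about the two groups.
print(df.groupby("group")["score"].describe())

# Test the hypothesis that the groups have equal means (Welch's t-test).
a = df.loc[df["group"] == "A", "score"]
b = df.loc[df["group"] == "B", "score"]
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```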

Data science involves working with larger datasets that often require advanced computational and statistical methods to analyze. Data scientists often work with unstructured data such as text or images and use machine learning algorithms to build predictive models. Data science often uses statistical analysis, data preprocessing, and supervised learning.
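
The following is a minimal supervised-learning sketch in Python with scikit-learn; the dataset and the choice of model are illustrative assumptions, not a prescribed method.

```python
# A minimal sketch of the supervised-learning workflow described above.
# The built-in digits dataset stands in for a larger dataset; the model
# and its parameters are illustrative choices.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Hold out a test set so the model is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```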

Cloud computing for data science

A cloud-based architecture for enabling big data analytics. Data flows from various sources, such as personal computers, laptops, and smartphones, through cloud services for processing and analysis, finally leading to various big data applications.

Cloud computing can offer access to large amounts of computational power and storage. In big data, where volumes of information are continually generated and processed, these platforms can be used to handle complex and resource-intensive analytical tasks.

Some distributed computing frameworks are designed to handle big data workloads. These frameworks can enable data scientists to process and analyze large datasets in parallel, which can reduce processing times.
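
As a hedged example, the sketch below uses one such framework, Apache Spark, through its Python API (pyspark); the input file and column names are hypothetical.

```python
# A sketch of parallel processing with one such framework, Apache Spark,
# via its Python API (pyspark). The input file and column names are
# hypothetical; Spark partitions the data across workers and runs the
# aggregation in parallel.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("big-data-sketch").getOrCreate()

events = spark.read.csv("events.csv", header=True, inferSchema=True)

daily = (
    events.groupBy("event_date")
    .agg(
        F.count("*").alias("n_events"),
        F.avg("duration").alias("avg_duration"),
    )
)
daily.show()

spark.stop()
```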

Ethical considerations in data science

Data science involves collecting, processing, and analyzing data which often includes personal and sensitive information. Ethical concerns include potential privacy violations, bias perpetuation, and negative societal impacts.

Machine learning models can amplify existing biases present in training data, leading to discriminatory or unfair outcomes.
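
One common, simple check for this, sketched below, is to compare a model's positive-prediction rates across demographic groups; the predictions and group labels are hypothetical, and a real audit would use more data and more than one fairness metric.

```python
# A minimal sketch of one simple bias check: comparing a model's
# positive-prediction rates across demographic groups (a demographic
# parity check). All data here is hypothetical.
import pandas as pd

preds = pd.DataFrame({
    "group": ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "predicted_positive": [1, 1, 1, 0, 0, 0, 1, 0],
})

# Positive-prediction rate per group.
rates = preds.groupby("group")["predicted_positive"].mean()
print(rates)

# A large gap suggests the model treats the groups very differently.
print("parity gap:", rates.max() - rates.min())
```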

Data modeling

From Wikipedia, the free encyclopedia
The data modeling process. The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the starting point for interface or database design.

Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques. It may be applied as part of the broader model-driven engineering (MDE) concept.

Overview

Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system.

There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system. The data requirements are initially recorded as a conceptual data model which is essentially a set of technology independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model to a physical data model that organizes the data into tables, and accounts for access, performance and storage details. Data modeling defines not just data elements, but also their structures and the relationships between them.

Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization, e.g., using data modeling:

  • to assist business analysts, programmers, testers, manual writers, IT package selectors, engineers, managers, related organizations and clients to understand and use an agreed-upon semi-formal model that encompasses the concepts of the organization and how they relate to one another
  • to manage data as a resource
  • to integrate information systems
  • to design databases/data warehouses (aka data repositories)

Data modelling may be performed during various types of projects and in multiple phases of projects. Data models are progressive; there is no such thing as the final data model for a business or application. Instead, a data model should be considered a living document that will change in response to a changing business. The data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time. Whitten et al. (2004) identified two types of data modelling:

  • Strategic data modelling: This is part of the creation of an information systems strategy, which defines an overall vision and architecture for information systems. Information technology engineering is a methodology that embraces this approach.
  • Data modelling during systems analysis: In systems analysis logical data models are created as part of the development of new databases.

Data modelling is also used as a technique for detailing business requirements for specific databases. It is sometimes called database modelling because a data model is eventually implemented in a database.

Topics

Data models

How data models deliver benefit.

Data models provide a framework for data to be used within information systems by providing specific definitions and formats. If a data model is used consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data seamlessly. The results of this are indicated in the diagram. However, systems and interfaces are often expensive to build, operate, and maintain. They may also constrain the business rather than support it. This may occur when the quality of the data models implemented in systems and interfaces is poor.

Some common problems found in data models are:

  • Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces. Business rules therefore need to be implemented in a flexible way that does not result in complicated dependencies; rather, the data model should be flexible enough that changes in the business can be implemented within it relatively quickly and efficiently.
  • Entity types are often not identified, or are identified incorrectly. This can lead to replication of data, data structure and functionality, together with the attendant costs of that duplication in development and maintenance. Therefore, data definitions should be made as explicit and easy to understand as possible to minimize misinterpretation and duplication.
  • Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25% and 70% of the cost of current systems. Required interfaces should be considered while designing a data model, as a data model on its own is not usable without interfaces between different systems.
  • Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data have not been standardised. To obtain optimal value from an implemented data model, it is very important to define standards that will ensure that data models will both meet business needs and be consistent.

Conceptual, logical and physical schemas

The ANSI/SPARC three-level architecture. This shows that a data model can be an external model (or view), a conceptual model, or a physical model. This is not the only way to look at data models, but it is a useful way, particularly when comparing models.

In 1975 ANSI described three kinds of data-model instance:

  • Conceptual schema: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model. Simply described, a conceptual schema is the first step in organizing the data requirements.
  • Logical schema: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags. The logical schema and conceptual schema are sometimes implemented as one and the same.
  • Physical schema: describes the physical means used to store data. This is concerned with partitions, CPUs, tablespaces, and the like.

According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual schema. The table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of course, the structures must remain consistent across all schemas of the same data model.

Data modeling process

Data modeling in the context of business process integration.

In the context of business process integration (see figure), data modeling complements business process modeling, and ultimately results in database generation.

The process of designing a database involves producing the previously described three types of schemas – conceptual, logical, and physical. The database design documented in these schemas is converted through a Data Definition Language, which can then be used to generate a database. A fully attributed data model contains detailed attributes (descriptions) for every entity within it. The term "database design" can describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term "database design" could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the Database Management System or DBMS.
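
As a hedged sketch of this conversion, the following Python example uses SQLAlchemy (an assumption; any tool that emits DDL would serve) to declare a small logical model of entity types, attributes, a relationship, and integrity rules, and then generate the physical schema's DDL; the entity names are hypothetical.

```python
# A sketch of the logical-to-physical step: a small logical model
# declared with SQLAlchemy, which then emits the DDL for the physical
# schema. SQLAlchemy and the entity names are assumptions for
# illustration.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)       # attribute + key rule
    name = Column(String(100), nullable=False)   # integrity rule: required

class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    # The customer-order relationship, expressed physically as a foreign key.
    customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)

# Generate the database: the DDL (CREATE TABLE ...) is derived from the
# model and executed against the target engine; echo=True prints the DDL.
engine = create_engine("sqlite:///:memory:", echo=True)
Base.metadata.create_all(engine)
```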

In practice, system interfaces account for 25% to 70% of the development and support costs of current systems. The primary reason for this cost is that these systems do not share a common data model. If data models are developed on a system-by-system basis, then not only is the same analysis repeated in overlapping areas, but further analysis must be performed to create the interfaces between them. Most systems within an organization contain the same basic data, redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can minimize rework with minimal modifications for the purposes of different systems within the organization.

Modeling methodologies

Data models represent information areas of interest. While there are many ways to create data models, according to Len Silverston (1997) only two modeling methodologies stand out, top-down and bottom-up:

  • Bottom-up models or View Integration models are often the result of a reengineering effort. They usually start with existing data structures: forms, fields on application screens, or reports. These models are usually physical, application-specific, and incomplete from an enterprise perspective. They may not promote data sharing, especially if they are built without reference to other parts of the organization.
  • Top-down logical data models, on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a logical model, but the model serves as a reference point or template.

Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model. In many environments, the distinction between a logical data model and a physical data model is blurred. In addition, some CASE tools don't make a distinction between logical and physical data models.

Entity–relationship diagrams

Example of an IDEF1X entity–relationship diagram used to model IDEF1X itself. The name of the view is mm. The domain hierarchy and constraints are also given. The constraints are expressed as sentences in the formal theory of the meta model.

There are several notations for data modeling. The actual model is frequently called "entity–relationship model", because it depicts data in terms of the entities and relationships described in the data. An entity–relationship model (ERM) is an abstract conceptual representation of structured data. Entity–relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, often a relational database, and its requirements in a top-down fashion.

These models are used in the first stage of information system design, during requirements analysis, to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e., an overview and classification of used terms and their relationships) for a certain universe of discourse, i.e., the area of interest.

Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results.

Generic data modeling

Example of a Generic data model.

Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. The definition of the generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as a 'classification relation', being a binary relation between an individual thing and a kind of thing (a class), and a 'part-whole relation', being a binary relation between two things, one with the role of part, the other with the role of whole, regardless of the kind of things that are related.

Given an extensible list of classes, this allows the classification of any individual thing and the specification of part-whole relations for any individual object. By standardizing an extensible list of relation types, a generic data model enables the expression of an unlimited number of kinds of facts and approaches the capabilities of natural languages. Conventional data models, on the other hand, have a fixed and limited domain scope, because the instantiation (usage) of such a model only allows expressions of the kinds of facts that are predefined in the model.
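
A minimal sketch, assuming hypothetical classes and things, of how facts look under a generic data model with standardized relation types:

```python
# A minimal sketch of a generic data model: a small set of standardized
# relation types plus facts stored as (subject, relation, object)
# triples. All names are hypothetical; new kinds of facts need no
# schema change, only (possibly) new classes.
from typing import NamedTuple

class Fact(NamedTuple):
    subject: str
    relation: str  # must be one of the standardized relation types
    obj: str

RELATION_TYPES = {"classification", "part-whole"}

facts = [
    # An individual thing classified against a class.
    Fact("pump P-101", "classification", "centrifugal pump"),
    # A part-whole relation between two individual things.
    Fact("impeller I-7", "part-whole", "pump P-101"),
]

for f in facts:
    assert f.relation in RELATION_TYPES
    print(f"{f.subject} --{f.relation}--> {f.obj}")
```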

Semantic data modeling

The logical data structure of a DBMS, whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data, because it is limited in scope and biased toward the implementation strategy employed by the DBMS. That is, unless the semantic data model is deliberately implemented in the database, a choice which may slightly impact performance but generally vastly improves productivity.

Semantic data models.

Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques. That is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure the real world, in terms of resources, ideas, events, etc., is symbolically defined by its description within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.

The purpose of semantic data modeling is to create a structural model of a piece of the real world, called the "universe of discourse". For this, three fundamental structural relations are considered (see the sketch after this list):

  • Classification/instantiation: Objects with some structural similarity are described as instances of classes
  • Aggregation/decomposition: Composed objects are obtained by joining their parts
  • Generalization/specialization: Distinct classes with some common properties are reconsidered in a more generic class with the common attributes
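
A minimal sketch of how these three relations map onto common object-oriented constructs; the vehicle domain is hypothetical.

```python
# A minimal sketch mapping the three structural relations onto common
# object-oriented constructs; the vehicle domain is hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Vehicle:            # generalization: the common attributes
    owner: str

@dataclass
class Wheel:
    diameter_cm: float

@dataclass
class Car(Vehicle):       # specialization of the generic Vehicle class
    # Aggregation: a Car is composed of Wheel parts.
    wheels: List[Wheel] = field(default_factory=list)

# Classification/instantiation: this object is an instance of Car.
car = Car(owner="Ada")
car.wheels = [Wheel(diameter_cm=43.0) for _ in range(4)]
print(isinstance(car, Vehicle), len(car.wheels))
```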

A semantic data model can be used to serve many purposes, such as:

  • Planning of data resources
  • Building of shareable databases
  • Evaluation of vendor software
  • Integration of existing databases

The overall goal of semantic data models is to capture more meaning of data by integrating relational concepts with more powerful abstraction concepts known from the artificial intelligence field. The idea is to provide high-level modeling primitives as integral parts of a data model in order to facilitate the representation of real-world situations.

Structural inequality in education

Structural inequality is the bias built into the structure of organizations, institutions, governments, or social networks: an embedded bias that provides advantages for some members and marginalizes or produces disadvantages for others. This can involve property rights, status, or unequal access to health care, housing, education, and other physical or financial resources or opportunities. Structural inequality is believed to be an embedded part of the culture of the United States due to the history of slavery and the subsequent suppression of equal civil rights for minority races. It has been encouraged and maintained in U.S. society through structured institutions such as the public school system, with the goal of maintaining the existing structure of wealth, employment opportunities, and social standing of the races by keeping minority students from high academic achievement in high school and college as well as in the workforce. In the attempt to equalize the allocation of state funding, policymakers evaluate the elements of disparity to determine an equalization of funding throughout school districts.

Policymakers have to determine a formula based on per-pupil revenue and student need. Critical race theory examines the ongoing oppression of minorities in the public school system and the corporate workforce that limits academic and career success. The public school system maintains structural inequality through such practices as the tracking of students, standardized assessment tests, and a teaching force that does not represent the diversity of the student body. See also social inequality, educational inequality, racism, discrimination, and oppression. Social inequality occurs when certain groups in a society do not have equal social status. Aspects of social status involve property rights, voting rights, freedom of speech and freedom of assembly, access to health care, and education, as well as many other social commodities.

Education: student tracking

Education is often regarded as a basis for equality. Specifically, in the structuring of schools, the concept of tracking is believed by some scholars to create a social disparity in providing students an equal education. Schools have been found to have a unique acculturative process that helps to pattern self-perceptions and world views. Schools not only provide education but also a setting for students to develop into adults, form future social status and roles, and maintain the social and organizational structures of society. Tracking is an educational term that indicates where students will be placed during their secondary school years.[3] "Depending on how early students are separated into these tracks, determines the difficulty in changing from one track to another" (Grob, 2003, p. 202).

Tracking or sorting categorizes students into different groups based on standardized test scores. These groups or tracks are vocational, general, and academic. Students are sorted into groups that will determine educational and vocational outcomes for the future. The sorting that occurs in the educational system parallels the hierarchical social and economic structures in society. Thus, students are viewed and treated differently according to their individual track. Each track has a designed curriculum that is meant to fit the unique educational and social needs of each sorted group. Consequently, the information taught as well as the expectations of the teachers differ based on the track resulting in the creation of dissimilar classroom cultures.

Access to college

Not only the classes that students take but also the school they are enrolled in has been shown to have an effect on their educational success and social mobility, especially the ability to graduate from college. Simply being enrolled in a school with less access to resources, or in an area with a high concentration of racial minorities, makes one much less likely to gain access to prestigious four-year colleges. For example, there are far fewer first-time freshmen within the University of California (UC) system who graduate from schools where the majority population is an underrepresented racial minority group. Students from these schools comprise only 22.1% of the first-time freshmen within the UC system, whereas students from majority-white schools make up 65.3% of the first-time freshman population. At more prestigious schools, like UC Berkeley, the division is even more pronounced. Only 15.2% of first-time freshmen who attend the university came from schools with a high percentage of underrepresented minorities.

Issues of structural inequality are probably also at fault for the low numbers of students from underserved backgrounds graduating from college. Out of the entire population of low-income youth in the US, only 13% receive a bachelor's degree by the time they are 28. Students from racial minorities are similarly disadvantaged. Hispanic students are half as likely to attend college as white students, and black students are 25% less likely. Despite increased attention and educational reform, this gap has increased in the past 30 years.

The costs required to attend college also contribute to structural inequality in education. The higher educational system in the United States relies on public funding to support its universities. However, even with public funding, policymakers have voiced their desire to have universities become less dependent on government funding and compete for other sources of funding. This could deter many students from low-income backgrounds from attending higher institutions due to an inability to pay. In a 2013 study by the National Center for Education Statistics, only 49% of students from low-income families who graduated from high school immediately enrolled in college. In comparison, students from high-income families had an 80% immediate college enrollment rate. Furthermore, in another 2013 report, over 58% of low-income families were minorities. In a survey supported by the Bill and Melinda Gates Foundation, researchers discovered that 6 in 10 students who dropped out did so because of an inability to pay the cost of attending on their own and without help from their families.

Access to technology

Gaps in the availability of technology, the digital divide, are gradually decreasing as more people purchase home computers and the ratio of students to computers within schools continues to decrease. However, inequities in access to technology still exist due to the lack of teacher training and, subsequently, confidence in the use of technological tools; the diverse needs of students; and administrative pressures to increase test scores. These inequities are noticeably different between high-need (HN) and low-need (LN) populations. In a survey of teachers participating in an e-Learning for Educators online professional development workshop, Chapman finds that HN schools need increased access and teacher training in technology resources. Though results vary in their level of significance, teachers of non-HN schools report more confidence in having adequate technical abilities to simply participate in the workshop; later surveys showed that teachers of HN schools were less likely than teachers of non-HN schools to report that "they use, or will use, technology in the classroom more after the workshop". Additionally, teachers from HN schools report less access to technology as well as lower technical skills and abilities (p. 246). Even when teachers in low-SES schools had confidence in their technical skills, they faced other obstacles, including larger numbers of English language learners and at-risk students, larger numbers of students with limited computer experience, and greater pressure to increase test scores and adhere to policy mandates.

Other structural inequalities in access to technology exist in differences in the ratio of students to computers within public schools. Correlations show that as the number of minorities enrolled in a school increases, so, too, does the ratio of students to computers: 4.0:1 in schools with 50% or more minority enrollment versus 3.1:1 in schools with 6% or less minority enrollment (as cited in Warschauer, 2010, pp. 188-189). Within school structures, low-socioeconomic-status (SES) schools tended to have less stable teaching, administrative, and IT support staff, which contributed to teachers being less likely to incorporate technology in their curriculum for lack of support.

Disabilities

The challenge of the new millennium will include a realignment in focus to include "the curriculum as disabled, rather than students, their insights in translating principles of universal design, which originated in architecture, to education commensurate with advances characterized as a major paradigm shift."

According to the Individuals with Disabilities Education Act (IDEA), children with disabilities have the right to a free appropriate public education in the Least Restrictive Environment (LRE). The LRE means that children with disabilities must be educated in regular classrooms with their non-disabled peers with the appropriate supports and services.

An individual with a disability is also protected under the Americans with Disabilities Act (ADA), which defines such an individual as any person who has a physical or mental impairment that substantially limits one or more major life activities. Assistive technology, which supports individuals with disabilities across a wide range of areas from cognitive to physical limitations, plays an important role.

School finance

School finance is another area where social injustice and inequality might exist. Districts in wealthier areas typically receive more Average Daily Attendance (ADA) funds for total (e.g., restricted and unrestricted) expenditures per pupil than socio-economically disadvantaged districts; therefore, a wealthier school district will receive more funding than a socio-economically disadvantaged one. "Most U.S. schools are underfunded. Schools in low wealth states and districts are especially hard hit, with inadequate instructional materials, little technology, unsafe buildings, and less-qualified teachers" (p. 31). The method in which funds are distributed or allocated within a school district can also be of concern. De facto segregation can occur in districts or educational organizations that passively promote racial segregation. Epstein (2006) stated that "Two years after the victorious Supreme Court decision against segregation, Oakland's"... "school board increased Oakland's segregation by spending $40 million from a bond election to build..." a "... High School, and then establishing a ten-mile long, two-mile wide attendance boundary, which effectively excluded almost every black and Latino student in the city" (p. 28).

History of state funding in U.S. education

Since the early 19th century, policymakers have developed a plethora of educational programs, each with its own particular structural inequality. The mechanisms involved in the allocation of state funding have changed significantly over time. In the past, public schools were primarily funded by property taxes, supplemented by other state sources. In the early 19th century, policymakers recognized that reliance on district property taxes could lead to significant disparities in the amount of funding per student.

Thus, policymakers began to analyze the elements of disparity, such as the number of teachers and the quality of facilities and materials, and sought means to address them. To address the disparity, some states implemented Flat Grants, which typically allocate funding based on the number of teachers. However, this often magnified the disparity, since wealthy communities would have fewer students per teacher.

In their attempt to reduce disparity, policymakers in the 1920s designed what they called the Foundation Program. Its stated purpose was to equalize per-pupil revenue across districts. The goal was to be achieved by setting a target per-pupil revenue level, with the state supplying funding to equalize revenue in underserved districts. Some analysts characterized the program as a hoax, because its structure allowed wealthier districts to exceed the target per-pupil revenue level.

Policymakers also designed Categorical Programs in order to aid students with particular categories of need. The purpose of these programs is to target disparity in poor districts; they do not take district wealth into account. Over time, policymakers began to allocate funding that takes into consideration pupil needs along with the wealth of the district.

Healthcare

Inequality that negatively affects health and wellness among minority races is highly correlated with income, wealth, social capital, and, indirectly, education. Researchers have identified significant gaps in the mortality rates of African Americans and Caucasian Americans. There have not been significant changes in the major factors of income, wealth, social capital/psycho-social environment, and socioeconomic status that would positively impact the existing inequality. Studies have noted significant correlations between these factors and major health issues. For example, poor socioeconomic status is strongly correlated with cardiovascular disease.

Social inequalities

When discussing the issue of structural inequality, we must also consider how hegemonic social structures can support institutional and technological inequalities. In the realm of education, studies have suggested that a parent's level of educational attainment influences the educational attainment of that parent's child. The level of education one receives also tends to be correlated with social capital, income, and criminal activity. These findings suggest that simply being the child of someone who is well educated places the child in an advantageous position. This in turn means that the children of new migrants and other groups who have historically been less educated, and who have significantly fewer resources at their disposal, will be less likely to achieve higher levels of education. Because education plays a role in income, social capital, criminal activity, and even the educational attainment of others, a positive feedback loop becomes possible in which the lack of education perpetuates itself throughout a social class or group.

The outcomes can be highly problematic at the K-12 level as well. Looking back to school funding: when the majority of funding has to come from local school districts, poorer districts end up less adequately funded than wealthier districts. The children who attend these schools face schools that struggle to provide a quality education, with more students per teacher, less access to technology, and less ability to prepare students for selecting and attending college or university. When students who were unprepared to attend higher education fail to do so, they are less likely to encourage their own children to pursue higher education and more likely to be poorer. These individuals will then live in traditionally poorer neighborhoods, sending their children to underfunded schools that are ill-prepared to gear students towards higher education, further perpetuating a cycle of poor districts and disadvantaged social groups.

Historical

The structural inequality of tracking in the educational system is the foundation of the inequalities instituted in other social and organizational structures. Tracking is a term in the educational vernacular for determining where students will be placed during their secondary school years. Traditionally, the most tracked subjects are math and English. Students are categorized into different groups based on their standardized test scores. Tracking is justified by the following four assumptions:

  1. Students learn better in an academically equal group.
  2. Positive self-attitudes are developed in homogenous groups, especially for slower students who are not exposed to a high degree of ability differences.
  3. Fair and accurate group placement is appropriate for future learning based on individual past performance and ability.
  4. Homogenous groups ease the teaching process.

Race, ethnicity, and socio-economic class limit exposure to advanced academic knowledge, thus limiting advanced educational opportunities. A disproportionate number of minority students are placed in low-track courses. The content of low-track courses is markedly different. Low- and average-track students typically have limited exposure to "high-status" academic material; thus, the possibility of academic achievement and subsequent success is significantly limited. The tracking phenomenon in schools tends to perpetuate prejudices, misconceptions, and inequalities against poor and minority people in society. Schools provide both an education and a setting for students to develop into adults, form future societal roles, and maintain the social and organizational structures of society. Tracking in the public educational system parallels the hierarchical social and economic structures in society. Schools have a unique acculturative process that helps to pattern self-perceptions and world views. The expectations of the teachers and the information taught differ based on tracks. Thus, dissimilar classroom cultures, different dissemination of knowledge, and unequal educational opportunities are created.

The cycle of academic tracking and the oppression of minority races depends on the use of standardized testing. IQ tests are frequently the foundation that determines an individual's group placement. However, research has found the accuracy of IQ tests to be flawed. Tests, by design, only indicate a student's placement along a high-to-low continuum and not their actual achievement. The tests have also been found to be culturally biased; therefore, language and experience differences affect test outcomes, with lower-class and minority children consistently having lower scores. This leads to inaccurate judgements of students' abilities.

Standardized tests were developed by eugenicists to determine who would best fill societal roles and professions. Tests were originally designed to identify the intellectuals of British society. This original intent unconsciously began the sorting dynamic. Tests were used to assist societies in filling important roles. In America, standardized tests were designed to sort students based on responses to test questions that were, and are, racially biased. These tests do not factor in the experiential and cultural knowledge or general ability of the students. Students are placed in vocational, general, or academic tracks based on test scores. Students' futures are determined by tracks, and they are viewed and treated differently according to their individual track. Tracks are hierarchical in nature and create, consciously for some and unconsciously for others, the damaging effects of labeling students as fast or slow; bright or special education; average or below average.

Corporate America has an interest in maintaining the use of standardized tests in public school systems, thereby protecting a potential future workforce drawn from high-tracked, successful, high-income students while eliminating, through poor academic achievement, a disproportionate number of minority students. Standardized testing is also big business. Although it is often argued that standardized testing is an economical method of evaluating students, the real cost is staggering, estimated at $20 billion annually in direct and indirect costs, an amount that does not factor in the social and emotional costs.

Standardized tests remain a frequently used and expected evaluative method for a variety of reasons. The American culture is interested in intelligence and potential. Standardized testing also provides an economic advantage to some stakeholders, such as prestigious universities, that use standardized test numbers as part of their marketing plan. Finally, standardized testing maintains the status quo of the established social system.

Teacher and counselor judgements have been shown to be just as inaccurate as standardized tests. Teachers and counselors may be responsible for analyzing and making recommendations for a large number of students. Research has found that factors such as appearance, language, behavior, and grooming, as well as academic potential, are all considered in the analysis and decision on group placement. This leads to a disproportionate number of lower-class and minority children being placed unfairly into lower-track groups.

Teacher diversity is limited by policies that create often-unattainable requirements for bilingual instructors. For example, bilingual instructors may be unable to pass basic educational skills tests because of an inability to write rapidly enough to complete the essay portions of the tests. Limiting resources, in the form of providing primarily English-speaking teachers for bilingual or English-as-a-second-language students, limits learning simply by restricting the dissemination of knowledge. Restructuring the educational system, as well as encouraging prospective bilingual teachers, are two ways to ensure diversity among the teaching workforce, increase the distribution of knowledge, and increase the potential and continued academic success of minority students.

Possible solutions to tracking and standardized testing:

  • Legal action against standardized tests based on discrimination against poor and minority students, following precedent set in the state of Massachusetts.
  • Curricula designed as age, culture, and language appropriate.
  • Recruit and train a diverse and highly skilled, culturally competent teaching force.
  • Elimination of norm-referenced testing.
  • Community-constructed and culturally appropriate assessment tests.
  • Explore critical race theory within the educational system to identify how race and racism are part of the structural inequality of the public school system.
  • Create alternative teacher education certification programs that allow teachers to work while earning credentials.

Diversity in computing

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Diversity_in_computing

Diversity in computing refers to the representation and inclusion of underrepresented groups, such as women, people of color, individuals with disabilities, and LGBTQ+ individuals, in the field of computing. The computing sector, like other STEM fields, lacks diversity in the United States.

Despite women constituting around half of the U.S. population, they are still not properly represented in the computing sector. Racial minorities, such as African Americans, Hispanics, and American Indians or Alaska Natives, also remain significantly underrepresented in the computing sector.

Two issues that cause the lack of diversity are:

  1. Pipeline: the lack of early access to resources
  2. Culture: exclusivity and discrimination in the workplace

The lack of diversity can also be attributed to limited early exposure to resources, as students who do not already have computer skills upon entering college are at a disadvantage in computing majors. There is also the issue of discrimination and harassment faced in the workplace, which affects all underrepresented groups. For example, studies have shown that 50% of women reported experiencing sexual harassment in tech companies.

As technology is becoming omnipresent, diversity in the tech field could help institutions reduce inequalities in society. To make the field more diverse, organizations need to address both issues. There are multiple organizations and initiatives working towards increasing diversity in computing by providing resources, mentorship, and support, and by fostering a sense of belonging for minority groups; examples include EarSketch, Black Girls Code, and ColorStack. Institutions are also implementing strategies such as summer bridge programs, tutoring, academic advising, financial support, and curriculum reform to support diversity in STEM. Along with institutions, educators can help cultivate a sense of confidence in underrepresented students interested in pursuing computing, for example by emphasizing a growth mindset, rejecting the idea that some individuals have innate talent, and establishing inclusive learning environments.

Statistics

In 2019, women represented 50.8% of the total population of the United States, but made up only 25.6% of computer and mathematical occupations and 27% of computer and information systems manager occupations. African Americans represented 13.4% of the population, but held 8.4% of computer and mathematical occupations. Hispanic or Latino people made up 18.3% of the population, but constituted only 7.5% of the people in these jobs. Meanwhile, white people, standing at 60.4%-76.5% of the population of the United States, represented 67% of computer and mathematical occupations and 77% of computer and information systems manager occupations. Asians, representing 5.9% of the population, held 22% of computer and mathematical jobs and were 14.3% of all computer and information systems managers.

In 2021, women made up 51% of the total population aged 18 to 74 years old, yet only accounted for 35% of STEM occupations. Additionally, while individuals with disabilities made up 9% of the population, they accounted for 3% of STEM occupations. Hispanics, Blacks, and American Indians or Alaska Natives collectively only accounted for 24% of STEM occupations in 2021 while making up 31% of the total population.

In addition to occupational disparities, there are differences in representation in postsecondary science and engineering education. Women earning associate's or bachelor's degrees in science and engineering accounted for approximately half of the total number of degrees in 2020, which was proportional to their share of the population for the age range of 18 – 34 years. In contrast, women only accounted for 46% of science and engineering master's degrees and 41% of science and engineering doctoral degrees. Hispanics, Blacks, and American Indians or Alaska Natives as a group face a similar gap between their share of the population and proportion of degrees earned, with them collectively making up 37% of the college age population in 2021, yet only 26% of bachelor's degrees in science and engineering, 24% of master's degrees in science and engineering, and 16% of doctoral degrees in science and engineering awarded in 2020. On top of the degree gap, data indicates that only 38% of women who major in computer science actually end up working in the computer science field, in contrast to 53% of men.

A 2021 report indicates that approximately 57% of women working in tech responded that they have experienced gender discrimination in the workplace, in contrast with men, of whom only approximately 10% reported experiencing gender discrimination. Additionally, 48% of women reported experiencing discrimination over their technical abilities, in contrast with only 24% of men reporting the same. The report also found that 48% of Black respondents indicated that they experienced racial discrimination in the tech workplace. Hispanic respondents followed at 30%, Asian/Pacific Islanders at 25%, Asian Indians at 23%, and White respondents at 9%.

In a 2022 survey available on Stack Overflow, approximately 2% of all respondents identified as transgender or described their gender in their own words. In addition, approximately 16% of all respondents identified using an option other than "Straight/Heterosexual". Additionally, 10.6% of respondents identified as having a concentration and/or memory disorder, 10.3% as having an anxiety disorder, and 9.7% as having a mood or emotional disorder.

When it comes to career mobility, a 2022 report found that there is a gap in promotions given in the tech industry to women in comparison to men. The report found that for every 100 men promoted to manager, only 52 women were given the same promotion.

Factors contributing to underrepresentation

There are two reported reasons for the lack of participation of women and minorities in the computing sector. The first reason is the lack of early exposure to resources like computers, internet connections and experiences such as computer courses. Research shows that the digital divide acts as a factor; students who do not already have computer skills upon entering college are at a disadvantage in computing majors, and access to computers is influenced by demographics, such as ethnic background. The problem of lack of resources is compounded with lack of exposure to courses and information that can lead to a successful computing career. A survey of students at University of Maryland Eastern Shore and Howard University, two historically black universities, found that the majority of students were not "counseled about computer related careers" either before or during college. The same study (this time only surveying UMES students) found that fewer women than men had learned about computers and programming in high school. The researchers have concluded that these factors could contribute to lower numbers of women and minorities choosing to pursue computing degrees.

Another reported issue that leads to the homogeneity of the computing sector is the cultural issue of discrimination in the workplace and how minorities are treated. For participants to excel in a tech-related course or career, their sense of belonging matters more than pre-gained knowledge. That was reflected in "The Great Resignation" that took place in the US during the COVID-19 pandemic. In a survey of 2,030 workers between the ages of 18 and 28 conducted in July 2021, 50% said they had left or wanted to leave their tech or IT job "because the company culture made them feel unwelcome or uncomfortable," with a higher percentage of women and Asian, Black, and Hispanic respondents each saying they had such an experience. In most cases, the workplaces not only lack a sense of belonging but are also unsafe. Research conducted by Dice, a tech career hub, showed that more than 50% of women faced sexual harassment in tech companies. A pilot program conducted to understand the different elements that affect minorities during a STEM course showed that increased mentorship and support were important factors for the completion of the course.

One of the biggest factors halting the increase of diversity in STEM education is awareness. Many experts feel that increasing awareness is a strong first step towards enacting change at a higher level. One of the most common outreach methods is on-campus workshops at colleges. These workshops are effective because they instill awareness in people who are just coming into and learning about the field, fostering inclusivity. Students leaving a workshop at a West Virginia university reported that they had been unaware of the problems facing diverse people in STEM, particularly people with disabilities.

Effects on different groups

Black People

Gaming

Black gamers are put into a unique position when entering gaming spaces: they are often represented incorrectly while constantly at risk of being harassed for a wide variety of reasons. When they are represented, which is less often than their presence in the real world would suggest, it typically comes at the price of being stereotyped into two categories: an athlete, a criminal, or both. If they call out these issues, there is typically heavy backlash. One such example comes from The Sims community. When its black player base called out issues about hair texture representation or entered Sims community spaces, they typically faced racial attacks and microaggressions, or saw storylines of characters that looked like them built on prevalent stereotypes of black people. The solution to their issues did not come from the creators, but rather from groups of black Sims players coming together to make their own spaces in order to have somewhere to go. Moreover, black content creators occupy a unique space within the gaming world: they need to maintain a level of blackness that allows people to be comfortable watching their content, but in creating who they are as creators, they inherently create space for racialized comments that fill their comment sections. Whenever they do ask for bigger changes, companies take a race-blind approach, ignoring the problems within the communities they allow to exist. When black people are included, it is mostly because the games being played are already part of African American culture, and such streams are often considered "diversity nights" for black creators.[25]

Artificial Intelligence

The issues lying dormant in the training data of large language models such as ChatGPT can be seen in how those models treat Black people. Former Google AI ethicist Timnit Gebru was pushed out of Google over a paper describing several concerns shared by AI ethicists: the carbon footprint of large models, the way ever-larger datasets absorb the insensitive vocabulary of the early internet, and the effort required to retrain a model if something fails. There is already clear evidence that AI models carry latent biases, such as asserting that white men make the best scientists. When this was discovered, OpenAI quickly blocked questions that directly pertained to race rather than fixing the underlying issue. Beauty standards show the same pattern: BeautyAI, built as a supposedly unbiased judge for a beauty contest, solicited submissions from around the world, yet of its 44 winners, 38 were white and only one finalist had noticeably darker skin. The submissions were also mined for information about health factors affecting the users, and because “healthy”-looking entrants were ranked toward the front, the model effectively learned that darker-skinned people are generally less healthy. Both of these models were trained on data that carries biases against people of color. A lack of representation among the people who develop these models means fewer perspectives are considered for inclusion; if initial testing is done only on coworkers, the models may go untested against many real-world scenarios from the very beginning.
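Such latent associations can be probed quantitatively. The Python sketch below illustrates the general idea behind embedding-association (WEAT-style) bias tests: measure whether a target word's vector sits closer to one attribute word than to another. The vectors here are invented toy values for illustration only; in practice they would come from a trained model's embedding layer.

  import numpy as np

  def cosine(u, v):
      # Cosine similarity between two embedding vectors.
      return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

  # Toy embeddings -- placeholders, not real model weights.
  emb = {
      "scientist": np.array([0.9, 0.1, 0.2]),
      "white":     np.array([0.8, 0.2, 0.1]),
      "Black":     np.array([0.1, 0.9, 0.3]),
  }

  def association(target, attr_a, attr_b):
      # Positive score: the model places `target` closer to attr_a than to attr_b.
      return cosine(emb[target], emb[attr_a]) - cosine(emb[target], emb[attr_b])

  print(association("scientist", "white", "Black"))  # positive here: a skewed association

A nonzero score on real embeddings is the kind of latent bias described above; blocking race-related questions hides the symptom without changing the underlying geometry.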

Surveillance

Black and Latinx communities have frequently been the targets of new surveillance and risk-assessment technologies that have brought more arrests to these communities. Police have used such tools to target communities of color for decades. One of the earliest examples within the borders of the United States itself came directly after the attacks on the Twin Towers, when the New York Police Department used community leaders, taxi drivers, and extensive databases to connect people together in search of potential terrorists living in the United States. Much of this work has been organized through a program called CompStat, and many precincts have been encouraged to adopt it because it identifies high-crime areas and concentrates police where crime is expected to happen, leading to still more arrests. Over time, entire states have attempted to build gang databases based on such risk assessments, producing situations in which children less than a year old were recorded as "self-identified gang members." This creates confusion and distrust in these communities, which can in turn lead to even more violence and arrests. Such programs have been used throughout the United States, in places such as Boston, Massachusetts; Salinas, California; and, most clearly, Camden, New Jersey. With the exception of Boston, most of these places have not provided social services to those caught in these cycles of violence, preferring instead to send them to prison. For the computers, this cycle is a positive feedback loop, as the sketch below illustrates, and it does not help these communities.
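That feedback loop can be made concrete with a small simulation. In this hedged Python sketch (all numbers invented for illustration), two neighborhoods have identical true crime rates, but patrols go wherever the most crime has already been recorded, and crime is far more likely to be recorded where the patrols are:

  import random

  random.seed(0)
  TRUE_RATE = 0.3          # identical underlying crime rate in both areas
  recorded = [12, 10]      # slightly uneven historical records

  for day in range(500):
      watched = recorded.index(max(recorded))   # patrol the "hotter" area
      for area in (0, 1):
          if random.random() < TRUE_RATE:       # a crime actually occurs
              # Recorded almost certainly under patrol, rarely otherwise.
              p_record = 0.9 if area == watched else 0.1
              if random.random() < p_record:
                  recorded[area] += 1

  print(recorded)  # roughly [150, 25]: the small initial gap compounds

Because one area starts with slightly more records, it is patrolled nearly every day, so its recorded crime grows several times faster, which in turn guarantees it keeps being patrolled, despite identical underlying crime rates.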

Social Media

People of African descent throughout the world face a much higher risk of harassment on the internet:

  1. The two countries with the highest reported levels of cyberbullying are Kenya and Nigeria, where around 70% of users say they have received hate during their time on the internet.
  2. Tweets containing discriminatory ideas are linked to hate-crime rates in the areas where the tweets were made.
  3. Black people are more likely to report that the attacks they receive on the internet are based chiefly on their race.

Being Black on the internet is thus inherently tied to receiving racially charged hatred. Moreover, because of the lax moderation of many popular social media sites (such as Twitter), white nationalists have many ways to come together and spread hatred through large hate waves that target people of color, and Black women most of all.

Increasing diversity

Institutions working to improve diversity in the computing sector are focusing on increasing access to resources and building a sense of belonging for minorities. One such effort is EarSketch, an educational coding platform that lets users produce music by coding in JavaScript and Python. Its aim is to spark interest in programming and computer science among a wider range of students and "to attract different demographics, especially girls." The nonprofit Black Girls Code works to encourage and empower Black girls and girls of color to enter the world of computing by teaching them how to code. ColorStack, an American nonprofit for minority students in computer science, works toward similar goals through mentorship and multiple in-person and virtual support programs. Another way to widen access to resources is to increase equality in access to computers: students who use computers in school settings are more likely to use them outside the classroom, so bringing computers into the classroom improves students' computer literacy.
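To give a sense of how EarSketch lowers the barrier to entry, here is a rough sketch of what a short script looks like in its Python mode. It runs inside the EarSketch web environment rather than as standalone Python, and the sound-clip constant names below should be treated as illustrative stand-ins for the platform's built-in library:

  from earsketch import *

  setTempo(120)  # beats per minute

  # fitMedia(clip, track, startMeasure, endMeasure) lays a clip on a track.
  fitMedia(HIPHOP_SYNTHPLUCKLEAD_005, 1, 1, 5)      # illustrative clip name

  # makeBeat plays a pattern string ("0" = hit, "-" = rest) on one measure.
  makeBeat(OS_SNARE03, 2, 1, "0---0---0-0-0---")    # illustrative clip name

A few lines like these produce an audible multi-measure loop, which is what lets beginners hear results immediately instead of wading through setup first.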

Educators have a significant impact on how students perceive the fields of engineering and computing, as well as on how students judge their own capabilities within them. According to the American Association of University Women (AAUW), there are several things teachers can do to cultivate confidence in underrepresented individuals interested in pursuing an education or career in computing:

  1. Emphasize that engineering skills and abilities can be acquired through learning. In other words, emphasize the idea of a growth mindset.
  2. Portray obstacles and challenges as universal experiences, rather than indicators of unsuitability for engineering or computing.
  3. Increase accessibility to computing for people from diverse backgrounds and reject the notion that some individuals are inherently better suited to the field.
  4. Highlight the varied and extensive applications of engineering and computing.
  5. Establish inclusive environments for girls in math, science, engineering, and computing where they're encouraged to tinker with technology and develop confidence in their programming and design skills.

Another way for educators to effect change is through intervention methods that have been shown to have a positive impact on the issue. These can be implemented by institutions rather than individuals and have shown considerable promise. Ten of them have been heavily researched, and they are as follows:

  1. Summer Bridge: Programs held between a prospective student's senior year of high school and freshman year of college, meant to help students from low-income families adjust to college life and get ahead.
  2. Mentoring: Each student is paired with a mentor they can trust, who helps when the student is struggling and encourages individual successes.
  3. Research Experience: Students participate in research on or off campus during their undergraduate years; this has been found to greatly increase the likelihood of pursuing a graduate degree compared to students who do not participate in research.
  4. Tutoring: One of the most common academic interventions, in which a student seeks out a knowledgeable individual for extra instruction and practice.
  5. Career Counseling and Awareness: A connection to someone in the field a student is trying to join is extremely important; when an institution helps connect students with someone in their prospective career, those students are more likely to stay in the field.
  6. Learning Center: An on-campus learning center where students can acquire skills, such as studying and note-taking, that help them succeed in school generally, free of charge.
  7. Workshops and Seminars: Short on-campus classes and meetings that focus on particular skills or on research presented by visiting professors from other universities; workshops can cover knowledge outside the regular curriculum.
  8. Academic Advising: Higher-quality academic advising is a large factor in student retention; students who feel adequately supported and are paced correctly are much more likely to finish their degrees.
  9. Financial Support: Financial aid through merit scholarships or other outside scholarship opportunities has been found to increase retention rates among students.
  10. Curriculum and Instructional Reform: Finding the parts of a program that are meant to “weed out” students and refactoring them to be challenging but rewarding.

These methods are not enough on their own to adequately diversify the talent pool, but they have shown promise as partial solutions. They are most effective when used in an integrated manner: the more of them that are studied and applied together, the closer STEM educators will come to a solution.

Since workplace discrimination contributes to the lack of diversity in STEM, reducing it would increase diversity in the sector. Big tech companies such as Microsoft and Facebook are publishing diversity reports and investing in programs to make their workforces more diverse.

Dedicating resources to initiatives that promote workplace diversity is a good start, but there is more that tech companies can do. The AAUW has published a set of proposals for STEM employers aimed at enhancing diversity within their organizations:

  1. Sustain effective management practices that are equitable, consistent, and promote a healthy work environment.
  2. Administer and advocate for diversity and affirmative action policies.
  3. Minimize the detrimental effects of gender bias.
  4. Foster a sense of inclusion and belonging.
  5. Allow employees the opportunity to work on projects or initiatives that have social significance.
