Data warehouse

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Data_warehouse
Data warehouse and data mart overview, with data marts shown in the top right.

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is a core component of business intelligence. Data warehouses are central repositories of data integrated from disparate sources. They store current and historical data organized so that it is easy to create reports, run queries, and draw insights from the data. Unlike operational databases, they are intended to be used by analysts and managers to support organizational decision-making.

The basic architecture of a data warehouse

The data stored in the warehouse is uploaded from operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing and other operations to ensure data quality before it is used in the data warehouse for reporting.

The two main approaches for building a data warehouse system are extract, transform, load (ETL) and extract, load, transform (ELT).

Components

The environment for data warehouses and marts includes the following:

  • Source systems of data (often, the company's operational databases, such as relational databases);
  • Data integration technology and processes to extract data from source systems, transform them, and load them into a data mart or warehouse;
  • Architectures to store data in the warehouse or marts;
  • Tools and applications for varied users;
  • Metadata, data quality, and governance processes. Metadata includes data sources (database, table, and column names), refresh schedules and data usage measures.

Operational databases

Operational databases are optimized for the preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity–relationship model. Operational system designers generally follow the rules of database normalization to ensure data integrity. Fully normalized database designs (that is, those satisfying all the normal forms) often result in information from a business transaction being stored in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected by each transaction. To improve performance, older data are periodically purged.

Data warehouses are optimized for analytic access patterns, which usually involve selecting specific fields rather than all fields as is common in operational databases. Because of these differences in access, operational databases (loosely, OLTP) benefit from the use of a row-oriented database management system (DBMS), whereas analytics databases (loosely, OLAP) benefit from the use of a column-oriented DBMS. Operational systems maintain a snapshot of the business, while warehouses maintain historic data through ETL processes that periodically migrate data from the operational systems to the warehouse.

Online analytical processing (OLAP) is characterized by a low rate of transactions and complex queries that involve aggregations. Response time is an effective performance measure of OLAP systems. OLAP applications are widely used for data mining. OLAP databases store aggregated, historical data in multi-dimensional schemas (usually star schemas). OLAP systems typically have a data latency of a few hours, while data mart latency is closer to one day. The OLAP approach is used to analyze multidimensional data from multiple sources and perspectives. The three basic operations in OLAP are roll-up (consolidation), drill-down, and slicing & dicing.
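To make the three operations concrete, here is a minimal Python sketch (not from the article; the field names and numbers are invented for illustration): roll-up consolidates state-level facts to the region level, a slice fixes one dimension, and drill-down recovers the detail behind a total.

    from collections import defaultdict

    # Each fact: (region, state, product, units_sold). Hypothetical data.
    facts = [
        ("West", "CA", "laptop", 120),
        ("West", "WA", "laptop",  80),
        ("West", "CA", "phone",  200),
        ("East", "NY", "laptop",  90),
    ]

    # Roll-up (consolidation): aggregate units from the state level up to regions.
    by_region = defaultdict(int)
    for region, state, product, units in facts:
        by_region[region] += units
    print(dict(by_region))       # {'West': 400, 'East': 90}

    # Slice: fix one dimension (product == 'laptop') and keep the rest.
    laptop_slice = [f for f in facts if f[2] == "laptop"]
    print(len(laptop_slice))     # 3

    # Drill-down: from the 'West' total back to its state-level detail.
    west_by_state = defaultdict(int)
    for region, state, product, units in facts:
        if region == "West":
            west_by_state[state] += units
    print(dict(west_by_state))   # {'CA': 320, 'WA': 80}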

Online transaction processing (OLTP) is characterized by a large number of short online transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize fast query processing and maintaining data integrity in multi-access environments. For OLTP systems, performance is measured in transactions per second. OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity–relationship model (usually 3NF). Normalization is the norm for data modeling in these systems.

Predictive analytics is about finding and quantifying hidden patterns in the data, using complex mathematical models to predict future outcomes. By contrast, OLAP focuses on historical data analysis and is reactive. Predictive systems are also used for customer relationship management (CRM).

Data marts

A data mart is a simple form of data warehouse focused on a single subject or functional area. Hence it draws data from a limited number of sources, such as sales, finance or marketing. Data marts are often built and controlled by a single department within an organization. The sources could be internal operational systems, a central data warehouse, or external data. As with warehouses, stored data is usually not normalized.

Difference between data warehouse and data mart

  Attribute                  Data warehouse   Data mart
  Scope of the data          enterprise       department
  Number of subject areas    multiple         single
  How difficult to build     difficult        easy
  Memory required            larger           limited

Types of data marts include dependent, independent, and hybrid data marts.

Variants

ETL

The typical extract, transform, load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data.
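As a concrete (and purely illustrative) sketch of these layers, the following Python fragment stages raw extracts, conforms them in an integration step, and lands them in a tiny star schema. All names and data are hypothetical assumptions, not a real warehouse API.

    # Hypothetical ETL sketch: staging -> integration -> warehouse (star schema).

    # Staging layer: raw rows, kept exactly as extracted from two source systems.
    staging = [
        {"cust": "ACME Corp ", "amount": "100.50", "src": "crm"},
        {"cust": "acme corp",  "amount": "49.50",  "src": "billing"},
    ]

    # Integration layer: cleanse and conform (trim, normalize case, cast types)
    # so both sources agree on one representation.
    integrated = [
        {"customer": row["cust"].strip().title(), "amount": float(row["amount"])}
        for row in staging
    ]

    # Warehouse layer: a customer dimension plus a fact table keyed by it;
    # the combination forms a minimal star schema.
    dim_customer = {}   # customer name -> surrogate key
    fact_sales = []     # (customer_key, amount)
    for row in integrated:
        key = dim_customer.setdefault(row["customer"], len(dim_customer) + 1)
        fact_sales.append((key, row["amount"]))

    print(dim_customer)  # {'Acme Corp': 1}
    print(fact_sales)    # [(1, 100.5), (1, 49.5)]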

The main source of the data is cleansed, transformed, catalogued, and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform, and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition of data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.

ELT

ELT-based data warehouse architecture

ELT-based data warehousing dispenses with a separate ETL tool for data transformation. Instead, it maintains a staging area inside the data warehouse itself. In this approach, data is extracted from heterogeneous source systems and loaded directly into the data warehouse before any transformation occurs. All necessary transformations are then handled inside the data warehouse itself, and the transformed data is loaded into target tables in the same data warehouse.
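To contrast with the ETL sketch above, here is a hypothetical Python/SQLite sketch in which raw data is loaded into the warehouse first and the transformation runs inside the database, which is the defining feature of ELT. The table and column names are invented for illustration.

    import sqlite3

    con = sqlite3.connect(":memory:")   # stand-in for the warehouse database

    # Extract + Load: raw, untransformed rows go straight into a staging table
    # that lives inside the warehouse itself.
    con.execute("CREATE TABLE raw_sales (cust TEXT, amount TEXT)")
    con.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                    [(" ACME Corp", "100.50"), ("acme corp", "49.50")])

    # Transform: all cleansing and conforming happens in-database, producing
    # the target table from the staged raw data.
    con.execute("""
        CREATE TABLE sales AS
        SELECT TRIM(LOWER(cust)) AS customer, CAST(amount AS REAL) AS amount
        FROM raw_sales
    """)

    print(con.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer").fetchall())
    # [('acme corp', 150.0)]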

Benefits

A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

  • Integrate data from multiple sources into a single database and data model. Consolidating data into a single database means a single query engine can be used to present the data, for example in an operational data store.
  • Mitigate the problem of isolation-level lock contention in transaction processing systems caused by long-running analysis queries in transaction processing databases.
  • Maintain data history, even if the source transaction systems do not.
  • Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization grows by merger.
  • Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
  • Present the organization's information consistently.
  • Provide a single common data model for all data of interest regardless of data source.
  • Restructure the data so that it makes sense to the business users.
  • Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
  • Add value to operational business applications, notably customer relationship management (CRM) systems.
  • Make decision–support queries easier to write.
  • Organize and disambiguate repetitive data.

History

The concept of data warehousing dates back to the late 1980s[7] when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations, it was typical for multiple decision support environments to operate independently. Though each environment served different users, they often required much of the same stored data. The process of gathering, cleaning and integrating data from various sources, usually from long-term existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from "data marts" that was tailored for ready access by users.

Additionally, with the publication of The IRM Imperative (Wiley & Sons, 1991) by James M. Kerr, the idea of managing and putting a dollar value on an organization's data resources and then reporting that value as an asset on a balance sheet became popular. In the book, Kerr described a way to populate subject-area databases from data derived from transaction-driven systems to create a storage area where summary data could be further leveraged to inform executive decision-making. This concept served to promote further thinking of how a data warehouse could be developed and managed in a practical way within any enterprise.

Key developments in early years of data warehousing:

  • 1960s – General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
  • 1970s – ACNielsen and IRI provide dimensional data marts for retail sales.
  • 1970s – Bill Inmon begins to define and discuss the term Data Warehouse.
  • 1975 – Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL. It is the first platform designed for building Information Centers (a forerunner of contemporary data warehouse technology).
  • 1983 – Teradata introduces the DBC/1012 database computer specifically designed for decision support.
  • 1984 – Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases a hardware/software package and GUI for business users to create a database management and analytic system.
  • 1988 – Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system" where they introduce the term "business data warehouse".
  • 1990 – Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
  • 1991 – James M. Kerr authors The IRM Imperative, which suggests data resources could be reported as an asset on a balance sheet, furthering commercial interest in the establishment of data warehouses.
  • 1991 – Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
  • 1992 – Bill Inmon publishes the book Building the Data Warehouse.
  • 1995 – The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
  • 1996 – Ralph Kimball publishes the book The Data Warehouse Toolkit.
  • 1998 – Focal modeling is implemented as an ensemble (hybrid) data warehouse modeling approach, with Patrik Lager as one of the main drivers.
  • 2000 – Dan Linstedt releases in the public domain the Data vault modeling, conceived in 1990 as an alternative to Inmon and Kimball to provide long-term historical storage of data coming in from multiple operational systems, with emphasis on tracing, auditing and resilience to change of the source data model.
  • 2008 – Bill Inmon, along with Derek Strauss and Genia Neushloss, publishes DW 2.0: The Architecture for the Next Generation of Data Warehousing, explaining his top-down approach to data warehousing and coining the term "data warehousing 2.0".
  • 2008 – Anchor modeling was formalized in a paper presented at the International Conference on Conceptual Modeling, and won the best paper award.
  • 2012 – Bill Inmon develops and makes public a technology known as "textual disambiguation", which applies context to raw text and reformats the raw text and context into a standard database format. Once raw text is passed through textual disambiguation, it can easily and efficiently be accessed and analyzed by standard business intelligence technology. Textual disambiguation is accomplished through the execution of textual ETL. It is useful wherever raw text is found, such as in documents, Hadoop, email, and so forth.
  • 2013 – Data vault 2.0 was released, having some minor changes to the modeling method, as well as integration with best practices from other methodologies, architectures and implementations including agile and CMMI principles.

Data organization

Facts

A fact is a value or measurement in the system being managed.

Raw facts are ones reported by the reporting entity. For example, in a mobile telephone system, if a base transceiver station (BTS) receives 1,000 requests for traffic channel allocation, allocates 820, and rejects the rest, it could report three facts to a management system:

  • tch_req_total = 1000
  • tch_req_success = 820
  • tch_req_fail = 180

Raw facts are aggregated to higher levels in various dimensions to extract information more relevant to the service or business. These are called aggregated facts or summaries.

For example, if there are three BTSs in a city, then the facts above can be aggregated to the city level in the network dimension. For example:

  • tch_req_success_city = tch_req_success_bts1 + tch_req_success_bts2 + tch_req_success_bts3
  • avg_tch_req_success_city = (tch_req_success_bts1 + tch_req_success_bts2 + tch_req_success_bts3) / 3
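A small Python sketch of this aggregation (the per-BTS numbers below are invented for illustration):

    # Aggregating raw per-BTS facts to the city level (hypothetical numbers).
    bts_facts = {
        "bts1": {"tch_req_success": 820},
        "bts2": {"tch_req_success": 700},
        "bts3": {"tch_req_success": 910},
    }

    tch_req_success_city = sum(f["tch_req_success"] for f in bts_facts.values())
    avg_tch_req_success_city = tch_req_success_city / len(bts_facts)

    print(tch_req_success_city)       # 2430
    print(avg_tch_req_success_city)   # 810.0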

Dimensional versus normalized approach for storage of data

The two most important approaches to storing data in a warehouse are dimensional and normalized. The dimensional approach uses a star schema, as proposed by Ralph Kimball. The normalized approach, also called the third normal form (3NF) approach, is an entity–relationship normalized model, proposed by Bill Inmon.

Dimensional approach

In a dimensional approach, transaction data is partitioned into "facts", which are usually numeric transaction data, and "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the total price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.

This dimensional approach makes data easier to understand and speeds up data retrieval. Dimensional structures are easy for business users to understand because the structure is divided into measurements/facts and context/dimensions. Facts are related to the organization's business processes and operational system, and dimensions are the context about them (Kimball 2008). Another advantage is that the dimensional model does not have to involve a relational database every time; thus, this type of modeling technique is very useful for end-user queries in the data warehouse.

The model of facts and dimensions can also be understood as a data cube, where dimensions are the categorical coordinates in a multi-dimensional cube and each fact is a value at those coordinates.
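A minimal illustration of the cube view (the dimension values and facts below are hypothetical):

    # A data cube as a mapping from dimension coordinates to a fact value.
    # Coordinates: (order_date, product, region) -> units_sold
    cube = {
        ("2024-01", "laptop", "West"): 120,
        ("2024-01", "phone",  "West"): 200,
        ("2024-02", "laptop", "East"):  90,
    }

    # Reading one cell: the fact at a specific point in the cube.
    print(cube[("2024-01", "laptop", "West")])   # 120

    # Aggregating along dimensions (all months and regions, product 'laptop').
    laptop_total = sum(units for (month, product, region), units in cube.items()
                       if product == "laptop")
    print(laptop_total)                          # 210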

The main disadvantages of the dimensional approach are:

  1. It is complicated to maintain the integrity of facts and dimensions when loading the data warehouse with data from different operational systems.
  2. It is difficult to modify the warehouse structure if the organization changes the way it does business.

Normalized approach

In the normalized approach, the data in the warehouse are stored following, to a degree, database normalization rules. Normalized relational database tables are grouped into subject areas (for example, customers, products and finance). When used in large enterprises, the result is dozens of tables linked by a web of joins (Kimball 2008).

The main advantage of this approach is that it is straightforward to add information into the database. Disadvantages include that, because of the large number of tables, it can be difficult for users to join data from different sources into meaningful information and to access the information without a precise understanding of the data sources and the data structure of the data warehouse.

Both normalized and dimensional models can be represented in entity–relationship diagrams because both contain joined relational tables. The difference between them is the degree of normalization. These approaches are not mutually exclusive, and there are other approaches. Dimensional approaches can involve normalizing data to a degree (Kimball 2008).

In Information-Driven Business, Robert Hillard compares the two approaches based on the information needs of the business problem. He concludes that normalized models hold far more information than their dimensional equivalents (even when the same fields are used in both models) but at the cost of usability. The technique measures information quantity in terms of information entropy and usability in terms of the Small Worlds data transformation measure.

Design methods

Bottom-up design

In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. These data marts can then be integrated to create a comprehensive data warehouse. The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.

Top-down design

The top-down approach is designed using a normalized enterprise data model. "Atomic" data, that is, data at the greatest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse.

Hybrid design

Data warehouses often resemble a hub-and-spoke architecture. Legacy systems feeding the warehouse often include customer relationship management and enterprise resource planning systems, which generate large amounts of data. To consolidate these various data models, and to facilitate the extract-transform-load process, data warehouses often make use of an operational data store, the information from which is parsed into the actual data warehouse. To reduce data redundancy, larger systems often store the data in a normalized way. Data marts for specific reports can then be built on top of the data warehouse.

A hybrid (also called ensemble) data warehouse database is kept in third normal form to eliminate data redundancy. A normal relational database, however, is not efficient for business intelligence reports, where dimensional modelling is prevalent. Small data marts can draw data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required. The data warehouse provides a single source of information from which the data marts can read, providing a wide range of business information. The hybrid architecture allows a data warehouse to be replaced with a master data management repository where operational (not static) information could reside.

The data vault modeling components follow a hub-and-spoke architecture. This modeling style is a hybrid design, consisting of best practices from both third normal form and star schema. The data vault model is not a true third normal form and breaks some of its rules, but it is a top-down architecture with a bottom-up design. The data vault model is geared to be strictly a data warehouse; it is not geared to be end-user accessible, and when built it still requires the use of a data mart or star schema-based release area for business purposes.

Characteristics

The basic features that define the data in a data warehouse include subject orientation, data integration, time variance, nonvolatility, and data granularity.

Subject-oriented

Unlike the operational systems, the data in the data warehouse revolves around the subjects of the enterprise. Subject orientation is not database normalization; rather, data is gathered and organized by subject, which makes it particularly useful for decision-making.

Integrated

The data found within the data warehouse is integrated. Since it comes from several operational systems, all inconsistencies must be removed. Consistency must be enforced in naming conventions, measurement of variables, encoding structures, physical attributes of data, and so forth.

Time-variant

While operational systems reflect current values as they support day-to-day operations, data warehouse data represents a long time horizon (up to ten years), which means it stores mostly historical data. It is mainly meant for data mining and forecasting. For example, a user searching for the buying pattern of a specific customer needs to look at data on both current and past purchases.

Nonvolatile

The data in the data warehouse is read-only, which means it cannot be updated, created, or deleted (unless there is a regulatory or statutory obligation to do so).

Options

Aggregation

In the data warehouse process, data can be aggregated in data marts at different levels of abstraction. The user may start by looking at the total sale units of a product in an entire region, then look at the states in that region, and finally examine the individual stores in a certain state. Therefore, the analysis typically starts at a higher level and drills down to lower levels of detail.
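A hypothetical Python sketch of this drill-down path (the records are invented for illustration):

    from collections import defaultdict

    # Each record: (region, state, store, units_sold). Hypothetical data.
    sales = [
        ("West", "CA", "store-1", 120),
        ("West", "CA", "store-2",  60),
        ("West", "WA", "store-3",  80),
    ]

    # Drill down one level at a time: region, then region+state, then +store.
    for level in range(1, 4):
        totals = defaultdict(int)
        for rec in sales:
            totals[rec[:level]] += rec[3]
        print(dict(totals))
    # level 1: {('West',): 260}
    # level 2: {('West', 'CA'): 180, ('West', 'WA'): 80}
    # level 3: one total per individual store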

Virtualization

With data virtualization, the data used remains in its original locations and real-time access is established to allow analytics across multiple sources creating a virtual data warehouse. This can aid in resolving some technical difficulties such as compatibility problems when combining data from various platforms, lowering the risk of error caused by faulty data, and guaranteeing that the newest data is used. Furthermore, avoiding the creation of a new database containing personal information can make it easier to comply with privacy regulations. However, with data virtualization, the connection to all necessary data sources must be operational as there is no local copy of the data, which is one of the main drawbacks of the approach.

Architecture

Organizations construct and organize data warehouses in numerous ways. The hardware utilized, the software created, and the data resources specifically required for correct functionality are the main components of the data warehouse architecture. All data warehouses pass through multiple phases in which the requirements of the organization are modified and fine-tuned.

Evolution in organization use

These terms refer to the level of sophistication of a data warehouse:

Offline operational data warehouse
Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented database.
Offline data warehouse
Data warehouses at this stage are updated from data in the operational systems on a regular basis and the data warehouse data are stored in a data structure designed to facilitate reporting.
On-time data warehouse
Data warehouses at this stage (online integrated data warehousing) are real-time: the data in the warehouse is updated for every transaction performed on the source data.
Integrated data warehouse
These data warehouses assemble data from different areas of business, so users can look up the information they need across other systems.

Biotechnology

From Wikipedia, the free encyclopedia
A biologist conducting research in a biotechnology laboratory

Biotechnology is a multidisciplinary field that involves the integration of natural sciences and engineering sciences in order to achieve the application of organisms and parts thereof for products and services. Specialists in the field are known as biotechnologists.

The term biotechnology was first used by Károly Ereky in 1919 to refer to the production of products from raw materials with the aid of living organisms. The core principle of biotechnology involves harnessing biological systems and organisms, such as bacteria, yeast, and plants, to perform specific tasks or produce valuable substances.

Biotechnology has had a significant impact on many areas of society, from medicine to agriculture to environmental science. One of the key techniques used in biotechnology is genetic engineering, which allows scientists to modify the genetic makeup of organisms to achieve desired outcomes. This can involve inserting genes from one organism into another, and consequently creating new traits or modifying existing ones.

Other important techniques used in biotechnology include tissue culture, which allows researchers to grow cells and tissues in the lab for research and medical purposes, and fermentation, which is used to produce a wide range of products such as beer, wine, and cheese.

The applications of biotechnology are diverse and have led to the development of products like life-saving drugs, biofuels, genetically modified crops, and innovative materials. It has also been used to address environmental challenges, such as developing biodegradable plastics and using microorganisms to clean up contaminated sites.

Biotechnology is a rapidly evolving field with significant potential to address pressing global challenges and improve the quality of life for people around the world; however, despite its numerous benefits, it also poses ethical and societal challenges, such as questions around genetic modification and intellectual property rights. As a result, there is ongoing debate and regulation surrounding the use and application of biotechnology in various industries and fields.

Definition

The concept of biotechnology encompasses a wide range of procedures for modifying living organisms for human purposes, going back to domestication of animals, cultivation of plants, and "improvements" to these through breeding programs that employ artificial selection and hybridization. Modern usage also includes genetic engineering, as well as cell and tissue culture technologies. The American Chemical Society defines biotechnology as the application of biological organisms, systems, or processes by various industries to learning about the science of life and the improvement of the value of materials and organisms, such as pharmaceuticals, crops, and livestock. According to the European Federation of Biotechnology, biotechnology is the integration of natural science and organisms, cells, parts thereof, and molecular analogues for products and services. Biotechnology is based on the basic biological sciences (e.g., molecular biology, biochemistry, cell biology, embryology, genetics, microbiology) and conversely provides methods to support and perform basic research in biology.

Principles of tissue engineering: a visual representation demonstrating the creation of functional tissues using a combination of engineering and biological concepts.

Biotechnology is research and development in the laboratory that uses bioinformatics for exploration, extraction, exploitation, and production from any living organism and any source of biomass by means of biochemical engineering. High value-added products can be planned (reproduced by biosynthesis, for example), forecasted, formulated, developed, manufactured, and marketed for the purpose of sustainable operations (for a return on the substantial initial investment in R&D) and of gaining durable patent rights (exclusive rights for sales; before this, national and international approval must be obtained from the results of animal and human experiments, especially in the pharmaceutical branch of biotechnology, to prevent any undetected side effects or safety concerns). In short, the utilization of biological processes, organisms, or systems to produce products that are anticipated to improve human lives is termed biotechnology.

By contrast, bioengineering is generally thought of as a related field that more heavily emphasizes higher systems approaches (not necessarily the altering or using of biological materials directly) for interfacing with and utilizing living things. Bioengineering is the application of the principles of engineering and natural sciences to tissues, cells, and molecules. This can be considered as the use of knowledge from working with and manipulating biology to achieve a result that can improve functions in plants and animals. Relatedly, biomedical engineering is an overlapping field that often draws upon and applies biotechnology (by various definitions), especially in certain sub-fields of biomedical or chemical engineering such as tissue engineering, biopharmaceutical engineering, and genetic engineering.

History

Brewing was an early application of biotechnology.

Although not normally what first comes to mind, many forms of human-derived agriculture clearly fit the broad definition of "utilizing a biotechnological system to make products". Indeed, the cultivation of plants may be viewed as the earliest biotechnological enterprise.

Agriculture has been theorized to have become the dominant way of producing food since the Neolithic Revolution. Through early biotechnology, the earliest farmers selected and bred the best-suited crops (e.g., those with the highest yields) to produce enough food to support a growing population. As crops and fields became increasingly large and difficult to maintain, it was discovered that specific organisms and their by-products could effectively fertilize, restore nitrogen, and control pests. Throughout the history of agriculture, farmers have inadvertently altered the genetics of their crops through introducing them to new environments and breeding them with other plants — one of the first forms of biotechnology.

Similar processes underlie the early fermentation of beer, which was introduced in early Mesopotamia, Egypt, China, and India and still uses the same basic biological methods. In brewing, malted grains (containing enzymes) convert starch from grains into sugar, and specific yeasts are then added to produce beer. In this process, carbohydrates in the grains break down into alcohols, such as ethanol. Later, other cultures developed the process of lactic acid fermentation, which produced other preserved foods, such as soy sauce. Fermentation was also used in this time period to produce leavened bread. Although the process of fermentation was not fully understood until Louis Pasteur's work in 1857, it is still the first known use of biotechnology to convert a food source into another form.

Before Charles Darwin's work and life, animal and plant scientists had already used selective breeding. Darwin added to that body of work with his scientific observations about the ability of such selection to change species. These accounts contributed to Darwin's theory of natural selection.

For thousands of years, humans have used selective breeding to improve the production of crops and livestock to use them for food. In selective breeding, organisms with desirable characteristics are mated to produce offspring with the same characteristics. For example, this technique was used with corn to produce the largest and sweetest crops.

In the early twentieth century scientists gained a greater understanding of microbiology and explored ways of manufacturing specific products. In 1917, Chaim Weizmann first used a pure microbiological culture in an industrial process, fermenting corn starch with Clostridium acetobutylicum to produce acetone, which the United Kingdom desperately needed to manufacture explosives during World War I.

Biotechnology has also led to the development of antibiotics. In 1928, Alexander Fleming discovered the mold Penicillium. His work led to the purification, by Howard Florey, Ernst Boris Chain and Norman Heatley, of the antibiotic formed by the mold, what we today know as penicillin. In 1940, penicillin became available for medicinal use to treat bacterial infections in humans.

The field of modern biotechnology is generally thought of as having been born in 1971 when Paul Berg's (Stanford) experiments in gene splicing had early success. Herbert W. Boyer (Univ. Calif. at San Francisco) and Stanley N. Cohen (Stanford) significantly advanced the new technology in 1972 by transferring genetic material into a bacterium, such that the imported material would be reproduced. The commercial viability of a biotechnology industry was significantly expanded on June 16, 1980, when the United States Supreme Court ruled that a genetically modified microorganism could be patented in the case of Diamond v. Chakrabarty. Indian-born Ananda Chakrabarty, working for General Electric, had modified a bacterium (of the genus Pseudomonas) capable of breaking down crude oil, which he proposed to use in treating oil spills. (Chakrabarty's work did not involve gene manipulation but rather the transfer of entire plasmids between strains of the Pseudomonas bacterium.)

The MOSFET was invented at Bell Labs between 1955 and 1960. Two years later, in 1962, Leland C. Clark and Champ Lyons invented the first biosensor. Biosensor MOSFETs were later developed, and they have since been widely used to measure physical, chemical, biological and environmental parameters. The first BioFET was the ion-sensitive field-effect transistor (ISFET), invented by Piet Bergveld in 1970. It is a special type of MOSFET, in which the metal gate is replaced by an ion-sensitive membrane, an electrolyte solution, and a reference electrode. The ISFET is widely used in biomedical applications, such as the detection of DNA hybridization, biomarker detection from blood, antibody detection, glucose measurement, pH sensing, and genetic technology.

By the mid-1980s, other BioFETs had been developed, including the gas sensor FET (GASFET), pressure sensor FET (PRESSFET), chemical field-effect transistor (ChemFET), reference ISFET (REFET), enzyme-modified FET (ENFET) and immunologically modified FET (IMFET). By the early 2000s, BioFETs such as the DNA field-effect transistor (DNAFET), gene-modified FET (GenFET) and cell-potential BioFET (CPFET) had been developed.

A factor influencing the biotechnology sector's success is improved intellectual property rights legislation—and enforcement—worldwide, as well as strengthened demand for medical and pharmaceutical products.

Rising demand for biofuels is expected to be good news for the biotechnology sector, with the Department of Energy estimating ethanol usage could reduce U.S. petroleum-derived fuel consumption by up to 30% by 2030. The biotechnology sector has allowed the U.S. farming industry to rapidly increase its supply of corn and soybeans—the main inputs into biofuels—by developing genetically modified seeds that resist pests and drought. By increasing farm productivity, biotechnology boosts biofuel production.

Examples

Biotechnology has applications in four major industrial areas: health care (medical); crop production and agriculture; non-food (industrial) uses of crops and other products (e.g., biodegradable plastics, vegetable oil, biofuels); and environmental uses.

For example, one application of biotechnology is the directed use of microorganisms for the manufacture of organic products (examples include beer and milk products). Another example is using naturally present bacteria by the mining industry in bioleaching. Biotechnology is also used to recycle, treat waste, clean up sites contaminated by industrial activities (bioremediation), and also to produce biological weapons.

A series of derived terms have been coined to identify several branches of biotechnology, for example:

  • Bioinformatics (or "gold biotechnology") is an interdisciplinary field that addresses biological problems using computational techniques, and makes the rapid organization as well as analysis of biological data possible. The field may also be referred to as computational biology, and can be defined as, "conceptualizing biology in terms of molecules and then applying informatics techniques to understand and organize the information associated with these molecules, on a large scale". Bioinformatics plays a key role in various areas, such as functional genomics, structural genomics, and proteomics, and forms a key component in the biotechnology and pharmaceutical sector.
  • Blue biotechnology is based on the exploitation of sea resources to create products and industrial applications. This branch of biotechnology is mostly used by the refining and combustion industries, principally for the production of bio-oils with photosynthetic micro-algae.
  • Green biotechnology is biotechnology applied to agricultural processes. An example would be the selection and domestication of plants via micropropagation. Another example is the designing of transgenic plants to grow under specific environments in the presence (or absence) of chemicals. One hope is that green biotechnology might produce more environmentally friendly solutions than traditional industrial agriculture. An example of this is the engineering of a plant to express a pesticide, thereby ending the need for external application of pesticides, as with Bt corn. Whether or not green biotechnology products such as this are ultimately more environmentally friendly is a topic of considerable debate. Green biotechnology is commonly considered the next phase of the green revolution and a platform for eradicating world hunger: it aims to produce plants that are more fertile and more resistant to biotic and abiotic stress, and it promotes environmentally friendly fertilizers and biopesticides; its main focus is the development of agriculture. On the other hand, some uses of green biotechnology involve microorganisms to clean and reduce waste.
  • Red biotechnology is the use of biotechnology in the medical and pharmaceutical industries and in health preservation. This branch involves the production of vaccines and antibiotics, regenerative therapies, the creation of artificial organs, and new diagnostics of diseases, as well as the development of hormones, stem cells, antibodies, siRNA, and diagnostic tests.
  • White biotechnology, also known as industrial biotechnology, is biotechnology applied to industrial processes. An example is the designing of an organism to produce a useful chemical. Another example is the use of enzymes as industrial catalysts to either produce valuable chemicals or destroy hazardous/polluting chemicals. White biotechnology tends to consume fewer resources than traditional processes used to produce industrial goods.
  • "Yellow biotechnology" refers to the use of biotechnology in food production (food industry), for example in making wine (winemaking), cheese (cheesemaking), and beer (brewing) by fermentation. It has also been used to refer to biotechnology applied to insects. This includes biotechnology-based approaches for the control of harmful insects, the characterisation and utilisation of active ingredients or genes of insects for research, or application in agriculture and medicine and various other approaches.
  • Gray biotechnology is dedicated to environmental applications, focused on the maintenance of biodiversity and the removal of pollutants.
  • Brown biotechnology is related to the management of arid lands and deserts. One application is the creation of enhanced seeds that resist extreme environmental conditions of arid regions, which is related to the innovation, creation of agriculture techniques and management of resources.
  • Violet biotechnology is related to legal, ethical and philosophical issues around biotechnology.
  • Microbial biotechnology has been proposed for the rapidly emerging area of biotechnology applications in space and microgravity (the space bioeconomy).
  • Dark biotechnology is the color associated with bioterrorism, biological weapons and biowarfare, which use microorganisms and toxins to cause disease and death in humans, livestock and crops.

Medicine

In medicine, modern biotechnology has many applications in areas such as pharmaceutical drug discovery and production, pharmacogenomics, and genetic testing (or genetic screening). In 2021, nearly 40% of the total company value of pharmaceutical biotech companies worldwide was in companies active in oncology, with neurology and rare diseases being the other two big application areas.

DNA microarray chip – some can do as many as a million blood tests at once.

Pharmacogenomics (a combination of pharmacology and genomics) is the technology that analyses how genetic makeup affects an individual's response to drugs. Researchers in the field investigate the influence of genetic variation on drug responses in patients by correlating gene expression or single-nucleotide polymorphisms with a drug's efficacy or toxicity. The purpose of pharmacogenomics is to develop rational means to optimize drug therapy, with respect to the patients' genotype, to ensure maximum efficacy with minimal adverse effects. Such approaches promise the advent of "personalized medicine", in which drugs and drug combinations are optimized for each individual's unique genetic makeup.

Computer-generated image of insulin hexamers highlighting the threefold symmetry, the zinc ions holding it together, and the histidine residues involved in zinc binding

Biotechnology has contributed to the discovery and manufacturing of traditional small-molecule pharmaceutical drugs as well as drugs that are the product of biotechnology: biopharmaceutics. Modern biotechnology can be used to manufacture existing medicines relatively easily and cheaply. The first genetically engineered products were medicines designed to treat human diseases. To cite one example, in 1978 Genentech developed synthetic human insulin by joining its gene with a plasmid vector inserted into the bacterium Escherichia coli. Insulin, widely used for the treatment of diabetes, was previously extracted from the pancreases of abattoir animals (cattle or pigs). The genetically engineered bacteria are able to produce large quantities of synthetic human insulin at relatively low cost. Biotechnology has also enabled emerging therapeutics like gene therapy. The application of biotechnology to basic science (for example, through the Human Genome Project) has also dramatically improved our understanding of biology, and as our scientific knowledge of normal and disease biology has increased, so has our ability to develop new medicines to treat previously untreatable diseases.

Genetic testing allows the genetic diagnosis of vulnerabilities to inherited diseases, and can also be used to determine a child's parentage (genetic mother and father) or in general a person's ancestry. In addition to studying chromosomes to the level of individual genes, genetic testing in a broader sense includes biochemical tests for the possible presence of genetic diseases, or mutant forms of genes associated with increased risk of developing genetic disorders. Genetic testing identifies changes in chromosomes, genes, or proteins. Most of the time, testing is used to find changes that are associated with inherited disorders. The results of a genetic test can confirm or rule out a suspected genetic condition or help determine a person's chance of developing or passing on a genetic disorder. As of 2011 several hundred genetic tests were in use. Since genetic testing may open up ethical or psychological problems, genetic testing is often accompanied by genetic counseling.

Agriculture

Genetically modified crops ("GM crops", or "biotech crops") are plants used in agriculture, the DNA of which has been modified with genetic engineering techniques. In most cases, the main aim is to introduce a new trait that does not occur naturally in the species. Biotechnology firms can contribute to future food security by improving the nutrition and viability of urban agriculture. Furthermore, the protection of intellectual property rights encourages private sector investment in agrobiotechnology.

Examples in food crops include resistance to certain pests, diseases, stressful environmental conditions, resistance to chemical treatments (e.g. resistance to a herbicide), reduction of spoilage, or improving the nutrient profile of the crop. Examples in non-food crops include production of pharmaceutical agents, biofuels, and other industrially useful goods, as well as for bioremediation.

Farmers have widely adopted GM technology. Between 1996 and 2011, the total surface area of land cultivated with GM crops had increased by a factor of 94, from 17,000 to 1,600,000 square kilometers (4,200,000 to 395,400,000 acres). 10% of the world's crop lands were planted with GM crops in 2010. As of 2011, 11 different transgenic crops were grown commercially on 395 million acres (160 million hectares) in 29 countries such as the US, Brazil, Argentina, India, Canada, China, Paraguay, Pakistan, South Africa, Uruguay, Bolivia, Australia, Philippines, Myanmar, Burkina Faso, Mexico and Spain.

Genetically modified foods are foods produced from organisms that have had specific changes introduced into their DNA with the methods of genetic engineering. These techniques have allowed for the introduction of new crop traits as well as a far greater control over a food's genetic structure than previously afforded by methods such as selective breeding and mutation breeding. Commercial sale of genetically modified foods began in 1994, when Calgene first marketed its Flavr Savr delayed ripening tomato. To date, genetic modification of foods has primarily focused on cash crops in high demand by farmers, such as soybean, corn, canola, and cottonseed oil. These have been engineered for resistance to pathogens and herbicides and for better nutrient profiles. GM livestock have also been experimentally developed; in November 2013 none were available on the market, but in 2015 the FDA approved the first GM salmon for commercial production and consumption.

There is a scientific consensus that currently available food derived from GM crops poses no greater risk to human health than conventional food, but that each GM food needs to be tested on a case-by-case basis before introduction. Nonetheless, members of the public are much less likely than scientists to perceive GM foods as safe. The legal and regulatory status of GM foods varies by country, with some nations banning or restricting them, and others permitting them with widely differing degrees of regulation.

GM crops also provide a number of ecological benefits, if not used in excess. Insect-resistant crops have proven to lower pesticide usage, therefore reducing the environmental impact of pesticides as a whole. However, opponents have objected to GM crops per se on several grounds, including environmental concerns, whether food produced from GM crops is safe, whether GM crops are needed to address the world's food needs, and economic concerns raised by the fact these organisms are subject to intellectual property law.

Biotechnology has several applications in the realm of food security. Crops like Golden rice are engineered to have higher nutritional content, and there is potential for food products with longer shelf lives. Though not a form of agricultural biotechnology, vaccines can help prevent diseases found in animal agriculture. Additionally, agricultural biotechnology can expedite breeding processes in order to yield faster results and provide greater quantities of food. Transgenic biofortification in cereals has been considered as a promising method to combat malnutrition in India and other countries.

Industrial

Industrial biotechnology (known mainly in Europe as white biotechnology) is the application of biotechnology for industrial purposes, including industrial fermentation. It includes the practice of using cells such as microorganisms, or components of cells like enzymes, to generate industrially useful products in sectors such as chemicals, food and feed, detergents, paper and pulp, textiles and biofuels. In recent decades, significant progress has been made in creating genetically modified organisms (GMOs) that enhance the diversity of applications and the economic viability of industrial biotechnology. By using renewable raw materials to produce a variety of chemicals and fuels, industrial biotechnology is actively advancing towards lowering greenhouse gas emissions and moving away from a petrochemical-based economy.

Synthetic biology is considered one of the essential cornerstones in industrial biotechnology due to its financial and sustainable contribution to the manufacturing sector. Jointly biotechnology and synthetic biology play a crucial role in generating cost-effective products with nature-friendly features by using bio-based production instead of fossil-based. Synthetic biology can be used to engineer model microorganisms, such as Escherichia coli, by genome editing tools to enhance their ability to produce bio-based products, such as bioproduction of medicines and biofuels. For instance, E. coli and Saccharomyces cerevisiae in a consortium could be used as industrial microbes to produce precursors of the chemotherapeutic agent paclitaxel by applying the metabolic engineering in a co-culture approach to exploit the benefits from the two microbes.

Another example of a synthetic biology application in industrial biotechnology is the re-engineering of the metabolic pathways of E. coli by CRISPR and CRISPRi systems toward the production of a chemical known as 1,4-butanediol, which is used in fiber manufacturing. To produce 1,4-butanediol, researchers altered the metabolic regulation of Escherichia coli by using CRISPR to induce a point mutation in the gltA gene, knock out the sad gene, and knock in six genes (cat1, sucD, 4hbd, cat2, bld, and bdh), while the CRISPRi system was used to knock down the three competing genes (gabD, ybgC, and tesB) that affect the biosynthesis pathway of 1,4-butanediol. As a result, the yield of 1,4-butanediol significantly increased from 0.9 to 1.8 g/L.

Environmental

Environmental biotechnology includes various disciplines that play an essential role in reducing environmental waste and providing environmentally safe processes, such as biofiltration and biodegradation. The environment can be affected by biotechnologies, both positively and adversely. Vallero and others have argued that the difference between beneficial biotechnology (e.g., bioremediation to clean up an oil spill or hazardous chemical leak) and the adverse effects stemming from biotechnological enterprises (e.g., flow of genetic material from transgenic organisms into wild strains) can be seen as applications and implications, respectively. Cleaning up environmental wastes is an example of an application of environmental biotechnology, whereas loss of biodiversity or loss of containment of a harmful microbe are examples of environmental implications of biotechnology.

Many cities have installed CityTrees, which use biotechnology to filter pollutants from urban atmospheres.

Regulation

The regulation of genetic engineering concerns approaches taken by governments to assess and manage the risks associated with the use of genetic engineering technology, and the development and release of genetically modified organisms (GMO), including genetically modified crops and genetically modified fish. There are differences in the regulation of GMOs between countries, with some of the most marked differences occurring between the US and Europe. Regulation varies in a given country depending on the intended use of the products of the genetic engineering. For example, a crop not intended for food use is generally not reviewed by authorities responsible for food safety. The European Union differentiates between approval for cultivation within the EU and approval for import and processing. While only a few GMOs have been approved for cultivation in the EU, a number of GMOs have been approved for import and processing. The cultivation of GMOs has triggered a debate about the coexistence of GM and non-GM crops. Depending on the coexistence regulations, incentives for the cultivation of GM crops differ.

Database for the GMOs used in the EU

The EUginius (European GMO Initiative for a Unified Database System) database is intended to help companies, interested private users and competent authorities to find precise information on the presence, detection and identification of GMOs used in the European Union. The information is provided in English.

Learning

Central New York Biotech Accelerator, Upstate Medical University

In 1988, after prompting from the United States Congress, the National Institute of General Medical Sciences (National Institutes of Health) (NIGMS) instituted a funding mechanism for biotechnology training. Universities nationwide compete for these funds to establish Biotechnology Training Programs (BTPs). Each successful application is generally funded for five years then must be competitively renewed. Graduate students in turn compete for acceptance into a BTP; if accepted, then stipend, tuition and health insurance support are provided for two or three years during the course of their PhD thesis work. Nineteen institutions offer NIGMS supported BTPs. Biotechnology training is also offered at the undergraduate level and in community colleges.
