Thursday, April 27, 2023

Metadata

In the 21st century, metadata typically refers to digital forms, but traditional card catalogs contain metadata, with cards holding information about books in a library (author, title, subject, etc.).
 
Metadata can come in different layers: This physical herbarium record of Cenchrus ciliaris consists of the specimens as well as metadata about them, while the barcode points to a digital record with metadata about the physical record.

Metadata (or metainformation) is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including:

  • Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.
  • Structural metadata – metadata about containers of data, indicating how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials.
  • Administrative metadata – the information to help manage a resource, like resource type, permissions, and when and how it was created.
  • Reference metadata – the information about the contents and quality of statistical data.
  • Statistical metadata – also called process data, may describe processes that collect, process, or produce statistical data.
  • Legal metadata – provides information about the creator, copyright holder, and public licensing, if provided.

Metadata is not strictly bound to one of these categories, as it can describe a piece of data in many other ways.

History

Metadata has various purposes. It can help users find relevant information and discover resources. It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information". Metadata of telecommunication activities, including Internet traffic, is widely collected by national governmental organizations. This data is used for traffic analysis and can be used for mass surveillance.

Metadata was traditionally used in the card catalogs of libraries until the 1980s when libraries converted their catalog data to digital databases. In the 2000s, as data and information were increasingly stored digitally, this digital data was described using metadata standards.

The first description of "meta data" for computer systems purportedly came in 1967 from MIT Center for International Studies experts David Griffel and Stuart McIntosh: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data."

Unique metadata standards exist for different disciplines (e.g., museum collections, digital audio files, websites, etc.). Describing the contents and context of data or data files increases its usefulness. For example, a web page may include metadata specifying what software language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about the subject. This metadata can automatically improve the reader's experience and make it easier for users to find the web page online. A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc.

In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations.

Definition

Metadata means "data about data". It is defined as data that provides information about one or more aspects of other data, and it is used to summarize basic information that can make tracking and working with specific data easier. Some examples include:

  • Means of creation of the data
  • Purpose of the data
  • Time and date of creation
  • Creator or author of the data
  • Location on a computer network where the data was created
  • Standards used
  • File size
  • Data quality
  • Source of the data
  • Process used to create the data

For example, a digital image may include metadata that describes the size of the image, its color depth, resolution, when it was created, the shutter speed, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document. Metadata within web pages can also contain descriptions of page content, as well as keywords linked to the content. These descriptors are often called "meta tags", and they were the primary factor in determining the order of web search results until the late 1990s. Search engines reduced their reliance on meta tags in the late 1990s because of "keyword stuffing", whereby meta tags were being widely misused to trick search engines into ranking some websites as more relevant than they really were.
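The meta tags described above are plain HTML attributes, so they can be extracted with the Python standard library alone. The sketch below (the page content is invented for illustration) collects `name`/`content` pairs from `<meta>` elements:

```python
from html.parser import HTMLParser

# Minimal sketch: extract descriptive "meta tags" from an HTML page.
class MetaTagExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:  # e.g. "description", "keywords", "author"
                self.meta[name] = attrs.get("content", "")

page = """
<html><head>
<meta name="description" content="An overview of metadata.">
<meta name="keywords" content="metadata, cataloging, Dublin Core">
</head><body>Page text.</body></html>
"""

parser = MetaTagExtractor()
parser.feed(page)
print(parser.meta["keywords"])  # metadata, cataloging, Dublin Core
```

This is exactly the kind of metadata that early search engines ranked on, and that keyword stuffing abused: nothing ties the `keywords` value to the page's actual content.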

Metadata can be stored and managed in a database, often called a metadata registry or metadata repository. However, without context and a point of reference, it might be impossible to identify metadata just by looking at it. For example, by itself, a database containing several numbers, all 13 digits long, could be the results of calculations or a list of numbers to plug into an equation; without any other context, the numbers themselves can be perceived as the data. But given the context that this database is a log of a book collection, those 13-digit numbers may now be identified as ISBNs: information that refers to the book but is not itself the information within the book. The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of Programming Language Concepts", where it is clear that he uses the term in the ISO/IEC 11179 "traditional" sense of structural metadata, i.e. "data about the containers of data", rather than the alternative sense of "content about individual instances of data content", or metacontent, the type of data usually found in library catalogs. Since then the fields of information management, information science, information technology, librarianship, and GIS have widely adopted the term. In these fields, the word metadata is defined as "data about data". While this is the generally accepted definition, various disciplines have adopted their own more specific explanations and uses of the term.
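The ISBN example above also shows that metadata often carries internal structure of its own: the last digit of an ISBN-13 is a check digit, so the 13-digit numbers in that hypothetical database could even be validated mechanically. A short sketch:

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit: digits are weighted 1,3,1,3,...
    and the weighted sum must be divisible by 10."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0

print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("9780306406158"))      # False (bad check digit)
```

Passing this check does not prove the numbers are ISBNs, of course; it only makes the interpretation more plausible, which is the point: metadata needs context to be identified as such.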

Slate reported in 2013 that the United States government's interpretation of "metadata" could be broad, and might include message content such as the subject lines of emails.

Types

While metadata applications are manifold, covering a large variety of fields, there are specialized and well-accepted models for specifying types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata. Structural metadata describes the structure of database objects such as tables, columns, keys, and indexes. Guide metadata helps humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into three categories: technical metadata (or internal metadata), business metadata (or external metadata), and process metadata.

NISO distinguishes three types of metadata: descriptive, structural, and administrative. Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. Structural metadata describes how the components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, administrative metadata gives information to help manage the resource, such as technical information about the file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. Rights management metadata explains intellectual property rights, while preservation metadata contains information needed to preserve and save a resource.

Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production.

An additional type of metadata that is beginning to receive more development is accessibility metadata. Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile. Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users, and of information that fits those needs, as a major gap in providing universal access solutions. Those types of information are accessibility metadata. Schema.org has incorporated several accessibility properties based on the IMS Global Access for All Information Model Data Element Specification. The Wiki page WebSchemas/Accessibility lists several properties and their values. While efforts to describe and standardize the varied accessibility needs of information seekers are becoming more robust, their adoption into established metadata schemas has been slower. For example, Dublin Core (DC)'s "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia, and DC's "format" could be used to identify resources available in braille, audio, or large-print formats, but there is more work to be done.

Structures

Metadata (metacontent) or, more correctly, the vocabularies used to assemble metadata (metacontent) statements, is typically structured according to a standardized concept using a well-defined metadata scheme, including metadata standards and metadata models. Tools such as controlled vocabularies, taxonomies, thesauri, data dictionaries, and metadata registries can be used to apply further standardization to the metadata. Structural metadata commonality is also of paramount importance in data model development and in database design.

Syntax

Metadata (metacontent) syntax refers to the rules created to structure the fields or elements of metadata (metacontent). A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, HTML, XML, and RDF.
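As a concrete illustration of one of those syntaxes, the sketch below builds a tiny Dublin Core record in XML using only Python's standard library. The record values are invented for illustration; only three of the fifteen DC elements are shown, under the standard DC element namespace:

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set namespace (the 15 "classic" elements).
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("record")
for element, value in [
    ("title", "Extension of Programming Language Concepts"),
    ("creator", "Philip Bagley"),
    ("date", "1968"),
]:
    child = ET.SubElement(record, f"{{{DC}}}{element}")
    child.text = value

# Serialize the record; the same DC scheme could equally be rendered
# as plain text, HTML meta tags, or RDF.
print(ET.tostring(record, encoding="unicode"))
```

The point is that the scheme (which elements exist and what they mean) is fixed by Dublin Core, while the syntax (XML here) is interchangeable.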

A common example of (guide) metacontent is the bibliographic classification, the subject, the Dewey Decimal class number. There is always an implied statement in any "classification" of some object. To classify an object as, for example, Dewey class number 514 (Topology) (i.e. books having the number 514 on their spine), the implied statement is: "<book><subject heading><514>". This is a subject-predicate-object triple, or more precisely, a class-attribute-value triple. The first two elements of the triple (class, attribute) are pieces of structural metadata with a defined semantic. The third element is a value, preferably from some controlled vocabulary, some reference (master) data. The combination of the metadata and master data elements results in a statement which is a metacontent statement, i.e. "metacontent = metadata + master data". All of these elements can be thought of as "vocabulary". Both metadata and master data are vocabularies that can be assembled into metacontent statements. There are many sources of these vocabularies, both meta and master data: UML, EDIFACT, XSD, Dewey/UDC/LoC, SKOS, ISO-25964, Pantone, Linnaean Binomial Nomenclature, etc. Using controlled vocabularies for the components of metacontent statements, whether for indexing or finding, is endorsed by ISO 25964: "If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved." This is particularly relevant when considering search engines of the internet, such as Google. The process indexes pages and then matches text strings using its complex algorithm; there is no intelligence or "inferencing" occurring, just the illusion thereof.
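The class-attribute-value triple above can be sketched directly. In this toy example (the vocabulary is a two-entry slice of Dewey, invented for illustration), the class and attribute come from structural metadata, and the value must come from the controlled vocabulary:

```python
# A toy controlled vocabulary (master data): a slice of Dewey classes.
DEWEY = {"514": "Topology", "512": "Algebra"}

def classify(obj_class: str, attribute: str, value: str) -> tuple:
    """Build a metacontent statement as a class-attribute-value triple,
    rejecting values outside the controlled vocabulary."""
    if value not in DEWEY:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return (obj_class, attribute, value)

triple = classify("book", "subject heading", "514")
print(triple)            # ('book', 'subject heading', '514')
print(DEWEY[triple[2]])  # Topology
```

Rejecting out-of-vocabulary values is what makes indexer and searcher converge on the same term, which is the behavior the ISO 25964 quotation endorses.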

Hierarchical, linear, and planar schemata

Metadata schemata can be hierarchical in nature, where relationships exist between metadata elements and elements are nested so that parent-child relationships exist between the elements. An example of a hierarchical metadata schema is the IEEE LOM schema, in which metadata elements may belong to a parent metadata element. Metadata schemata can also be one-dimensional, or linear, where each element is completely discrete from other elements and classified according to one dimension only. An example of a linear metadata schema is the Dublin Core schema, which is one-dimensional. Metadata schemata are often two-dimensional, or planar, where each element is completely discrete from other elements but classified according to two orthogonal dimensions.
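The hierarchical/linear distinction can be made concrete with nested versus flat records. The field names below are loosely modeled on IEEE LOM and Dublin Core but are simplified for illustration:

```python
# Hierarchical schema instance (IEEE LOM-like): elements nest under parents.
hierarchical = {
    "general": {"title": "Intro to Metadata", "language": "en"},
    "lifecycle": {
        "version": "1.0",
        "contribute": {"role": "author", "entity": "A. Author"},
    },
}

# Linear schema instance (Dublin Core-like): every element is flat and discrete.
linear = {
    "title": "Intro to Metadata",
    "language": "en",
    "creator": "A. Author",
}

def depth(node) -> int:
    """Nesting depth of a schema instance; a linear schema has depth 1."""
    if not isinstance(node, dict):
        return 0
    return 1 + max(depth(v) for v in node.values())

print(depth(hierarchical))  # 3
print(depth(linear))        # 1
```

A planar schema would add a second orthogonal classification axis to the flat case, e.g. keying each element by both element name and audience.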

Granularity

The degree to which the data or metadata is structured is referred to as "granularity". "Granularity" refers to how much detail is provided. Metadata with a high granularity allows for deeper, more detailed, and more structured information and enables a greater level of technical manipulation. A lower level of granularity means that metadata can be created at considerably lower cost but will not provide as detailed information. The major impact of granularity is not only on creation and capture, but moreover on maintenance costs: when metadata structures become outdated, access to the referred data is effectively lost as well. Hence granularity must take into account the effort to create the metadata as well as the effort to maintain it.

Hypermapping

In all cases where the metadata schemata exceed the planar depiction, some type of hypermapping is required to enable display and view of metadata according to chosen aspect and to serve special views. Hypermapping frequently applies to layering of geographical and geological information overlays.

Standards

International standards apply to metadata. Much work is being done in the national and international standards communities, especially ANSI (American National Standards Institute) and ISO (International Organization for Standardization), to reach consensus on standardizing metadata and registries. The core metadata registry standard is ISO/IEC 11179 Metadata Registries (MDR); the framework for the standard is described in ISO/IEC 11179-1:2004. A new edition of Part 1 was revised to align with the current edition of Part 3, ISO/IEC 11179-3:2013, which extends the MDR to support the registration of Concept Systems (see ISO/IEC 11179).

This standard specifies a schema for recording both the meaning and technical structure of the data for unambiguous usage by humans and computers. The ISO/IEC 11179 standard refers to metadata as information objects about data, or "data about data". In ISO/IEC 11179 Part 3, the information objects are data about Data Elements, Value Domains, and other reusable semantic and representational information objects that describe the meaning and technical details of a data item. The standard also prescribes the details for a metadata registry, and for registering and administering the information objects within a metadata registry. ISO/IEC 11179 Part 3 also has provisions for describing compound structures that are derivations of other data elements, for example through calculations, collections of one or more data elements, or other forms of derived data.

While this standard originally described itself as a "data element" registry, its purpose is to support describing and registering metadata content independently of any particular application, lending the descriptions to being discovered and reused by humans or computers in developing new applications and databases, or for analysis of data collected in accordance with the registered metadata content. This standard has become the general basis for other kinds of metadata registries, reusing and extending the registration and administration portion of the standard.

The Geospatial community has a tradition of specialized geospatial metadata standards, particularly building on traditions of map- and image-libraries and catalogs. Formal metadata is usually essential for geospatial data, as common text-processing approaches are not applicable.

The Dublin Core metadata terms are a set of vocabulary terms that can be used to describe resources for the purposes of discovery. The original set of 15 classic metadata terms, known as the Dublin Core Metadata Element Set are endorsed in the following standards documents:

  • IETF RFC 5013
  • ISO Standard 15836-2009
  • NISO Standard Z39.85.

The W3C Data Catalog Vocabulary (DCAT) is an RDF vocabulary that supplements Dublin Core with classes for Dataset, Data Service, Catalog, and Catalog Record. DCAT also uses elements from FOAF, PROV-O, and OWL-Time. DCAT provides an RDF model to support the typical structure of a catalog that contains records, each describing a dataset or service.

Although not a standard, Microformat (also mentioned in the section metadata on the internet below) is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata. Microformat follows XHTML and HTML standards but is not a standard in itself. One advocate of microformats, Tantek Çelik, characterized a problem with alternative approaches:

Here's a new language we want you to learn, and now you need to output these additional files on your server. It's a hassle. (Microformats) lower the barrier to entry.

Use

Photographs

Metadata may be written into a digital photo file to identify who owns it, copyright and contact information, what brand or model of camera created the file, along with exposure information (shutter speed, f-stop, etc.) and descriptive information such as keywords about the photo, making the file or image searchable on a computer or the Internet. Some metadata is created by the camera, such as color space, color channels, exposure time, and aperture (EXIF), while some is input by the photographer or by software after downloading to a computer. Most digital cameras write metadata about the model number, shutter speed, etc., and some enable you to edit it; this functionality has been available on most Nikon DSLRs since the Nikon D3, on most new Canon cameras since the Canon EOS 7D, and on most Pentax DSLRs since the Pentax K-3. Metadata can be used to make organizing in post-production easier with the use of key-wording. Filters can be used to analyze a specific set of photographs and create selections based on criteria like rating or capture time. On devices with geolocation capabilities such as GPS (smartphones in particular), the location the photo was taken from may also be included.
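One concrete detail worth knowing when sorting photos by capture time: EXIF stores dates as "YYYY:MM:DD HH:MM:SS", with colons in the date part. The sketch below uses an invented field dictionary standing in for values a real EXIF reader would return:

```python
from datetime import datetime

# Hypothetical EXIF-like fields, as a camera might write them.
exif_fields = {
    "Model": "Example Camera",
    "ExposureTime": "1/250",
    "FNumber": "8.0",
    "DateTimeOriginal": "2023:04:27 14:31:08",  # note colons in the date
}

# Parse the EXIF date format so photos can be sorted by capture time.
taken = datetime.strptime(exif_fields["DateTimeOriginal"], "%Y:%m:%d %H:%M:%S")
print(taken.year, taken.month, taken.day)  # 2023 4 27
```

Naive string sorting happens to work for this format too, but parsing to `datetime` allows filtering by ranges, as photo-management filters do.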

Photographic metadata standards are governed by the organizations that develop them. They include, but are not limited to:

  • IPTC Information Interchange Model IIM (International Press Telecommunications Council)
  • IPTC Core Schema for XMP
  • XMP – Extensible Metadata Platform (an ISO standard)
  • Exif – Exchangeable image file format, Maintained by CIPA (Camera & Imaging Products Association) and published by JEITA (Japan Electronics and Information Technology Industries Association)
  • Dublin Core (Dublin Core Metadata Initiative – DCMI)
  • PLUS (Picture Licensing Universal System)
  • VRA Core (Visual Resource Association)

Telecommunications

Information on the times, origins, and destinations of phone calls, electronic messages, instant messages, and other modes of telecommunication, as opposed to message content, is another form of metadata. Bulk collection of this call detail record metadata by intelligence agencies has proven controversial after disclosures by Edward Snowden that certain intelligence agencies such as the NSA had been (and perhaps still are) keeping online metadata on millions of internet users for up to a year, regardless of whether they were ever persons of interest to the agency.

Video

Metadata is particularly useful in video, where information about its contents (such as transcripts of conversations and text descriptions of its scenes) is not directly understandable by a computer, but where an efficient search of the content is desirable. This is particularly useful in video applications such as Automatic Number Plate Recognition and Vehicle Recognition Identification software, wherein license plate data is saved and used to create reports and alerts. There are two sources from which video metadata is derived: (1) operationally gathered metadata, that is, information about the content produced, such as the type of equipment, software, date, and location; and (2) human-authored metadata, created to improve search engine visibility, discoverability, and audience engagement, and to provide advertising opportunities to video publishers. Today most professional video editing software has access to metadata. Avid's MetaSync and Adobe's Bridge are two prime examples of this.

Geospatial metadata

Geospatial metadata relates to Geographic Information Systems (GIS) files, maps, images, and other data that is location-based. Metadata is used in GIS to document the characteristics and attributes of geographic data, such as database files and data that is developed within a GIS. It includes details like who developed the data, when it was collected, how it was processed, and what formats it is available in, providing the context needed for the data to be used effectively.

Creation

Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when an object was created, who created it, when it was last updated, file size, and file extension. In this context an object refers to any of the following:

  • A physical item such as a book, CD, DVD, a paper map, chair, table, flower pot, etc.
  • An electronic file such as a digital image, digital photo, electronic document, program file, database table, etc.
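For an electronic file, the elementary metadata listed above can be captured automatically from the filesystem. A minimal sketch using only the standard library (the media-type guess is inferred from the extension, not the content):

```python
import os
import tempfile
import mimetypes
from datetime import datetime, timezone

def elementary_metadata(path: str) -> dict:
    """Capture elementary metadata a computer records for a file:
    size, extension, guessed media type, and last-update time."""
    st = os.stat(path)
    return {
        "file_size": st.st_size,
        "extension": os.path.splitext(path)[1],
        "media_type": mimetypes.guess_type(path)[0],
        "last_updated": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }

# Demonstrate on a throwaway file.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(b"hello metadata")
    name = f.name

meta = elementary_metadata(name)
print(meta["file_size"], meta["extension"])  # 14 .txt
os.remove(name)
```

Note that "who created it" is not portably available from `os.stat` alone; on POSIX systems the owner can be resolved from `st.st_uid`, which is why creator information is often supplied by applications rather than the filesystem.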

A metadata engine collects, stores and analyzes information about data and metadata (data about data) in use within a domain.

Data virtualization

Data virtualization emerged in the 2000s as a new software technology to complete the virtualization "stack" in the enterprise. Metadata is used in data virtualization servers, which are enterprise infrastructure components alongside database and application servers. Metadata in these servers is saved in a persistent repository and describes business objects in various enterprise systems and applications. Structural metadata commonality is also important to support data virtualization.

Statistics and census services

Standardization and harmonization work has brought advantages to industry efforts to build metadata systems in the statistical community. Several metadata guidelines and standards such as the European Statistics Code of Practice and ISO 17369:2013 (Statistical Data and Metadata Exchange or SDMX) provide key principles for how businesses, government bodies, and other entities should manage statistical data and metadata. Entities such as Eurostat, European System of Central Banks, and the U.S. Environmental Protection Agency have implemented these and other such standards and guidelines with the goal of improving "efficiency when managing statistical business processes".

Library and information science

Metadata has been used in various ways as a means of cataloging items in libraries in both digital and analog formats. Such data helps classify, aggregate, identify, and locate a particular book, DVD, magazine, or any object a library might hold in its collection. Until the 1980s, many library catalogs used 3x5 inch cards in file drawers to display a book's title, author, subject matter, and an abbreviated alpha-numeric string (call number) which indicated the physical location of the book within the library's shelves. The Dewey Decimal System employed by libraries for the classification of library materials by subject is an early example of metadata usage. Beginning in the 1980s and 1990s, many libraries replaced these paper file cards with computer databases, which make it much easier and faster for users to do keyword searches. Another form of older metadata collection is the US Census Bureau's "Long Form", which asks questions used to create demographic data and find patterns of distribution. Libraries employ metadata in library catalogues, most commonly as part of an Integrated Library Management System (ILMS). Metadata is obtained by cataloging resources such as books, periodicals, DVDs, web pages, or digital images, and is stored in the ILMS using the MARC metadata standard. The purpose is to direct patrons to the physical or electronic location of the items or areas they seek, as well as to provide a description of the item(s) in question.

More recent and specialized instances of library metadata include the establishment of digital libraries, including e-print repositories and digital image libraries. While often based on library principles, the focus on non-librarian use, especially in providing metadata, means they do not follow traditional or common cataloging approaches. Given the custom nature of included materials, metadata fields are often specially created, e.g. taxonomic classification fields, location fields, keywords, or copyright statements. Standard file information such as file size and format is usually automatically included. Library operation has for decades been a key topic in efforts toward international standardization. Standards for metadata in digital libraries include Dublin Core, METS, MODS, DDI, DOI, URN, PREMIS schema, EML, and OAI-PMH. Leading libraries around the world publish hints about their metadata standards strategies. The use and creation of metadata in library and information science also extends to scientific publications:

Science

Metadata for scientific publications is often created by journal publishers and citation databases such as PubMed and Web of Science. The data contained within manuscripts, or accompanying them as supplementary material, is less often subject to metadata creation, though it may be submitted to, for example, biomedical databases after publication. The original authors and database curators then become responsible for metadata creation, with the assistance of automated processes. Comprehensive metadata for all experimental data is the foundation of the FAIR Guiding Principles, the standards for ensuring research data are findable, accessible, interoperable, and reusable.

Such metadata can then be utilized, complemented, and made accessible in useful ways. OpenAlex is a free online index of over 200 million scientific documents that integrates and provides metadata such as sources, citations, author information, scientific fields, and research topics. Its API and open source website can be used for metascience, scientometrics, and novel tools that query this semantic web of papers. Another project under development, Scholia, uses the metadata of scientific publications for various visualizations and aggregation features such as providing a simple user interface summarizing literature about a specific feature of the SARS-CoV-2 virus using Wikidata's "main subject" property.

In research labor, transparent metadata about authors' contributions to works has been proposed, e.g. the role played in the production of the paper, the level of contribution, and the responsibilities.

Moreover, various metadata about scientific outputs can be created or complemented. For instance, scite.ai attempts to track and link citations of papers as 'Supporting', 'Mentioning' or 'Contrasting' the study. Other examples include developments of alternative metrics, which, beyond providing help for assessment and findability, also aggregate many of the public discussions about a scientific paper on social media such as Reddit, citations on Wikipedia, and reports about the study in the news media, as well as a call for showing whether or not the original findings are confirmed or could be reproduced.

Museums

Metadata in a museum context is the information that trained cultural documentation specialists, such as archivists, librarians, museum registrars and curators, create to index, structure, describe, identify, or otherwise specify works of art, architecture, cultural objects and their images. Descriptive metadata is most commonly used in museum contexts for object identification and resource recovery purposes.

Usage

Metadata is developed and applied within collecting institutions and museums in order to:

  • Facilitate resource discovery and execute search queries.
  • Create digital archives that store information relating to various aspects of museum collections and cultural objects, and serve archival and managerial purposes.
  • Provide public audiences access to cultural objects through publishing digital content online.

Standards

Many museums and cultural heritage centers recognize that given the diversity of artworks and cultural objects, no single model or standard suffices to describe and catalog cultural works. For example, a sculpted Indigenous artifact could be classified as an artwork, an archaeological artifact, or an Indigenous heritage item. The early stages of standardization in archiving, description and cataloging within the museum community began in the late 1990s with the development of standards such as Categories for the Description of Works of Art (CDWA), Spectrum, CIDOC Conceptual Reference Model (CRM), Cataloging Cultural Objects (CCO) and the CDWA Lite XML schema. These standards use HTML and XML markup languages for machine processing, publication and implementation. The Anglo-American Cataloguing Rules (AACR), originally developed for characterizing books, have also been applied to cultural objects, works of art and architecture.

Standards, such as the CCO, are integrated within a Museum's Collections Management System (CMS), a database through which museums are able to manage their collections, acquisitions, loans and conservation. Scholars and professionals in the field note that the "quickly evolving landscape of standards and technologies" creates challenges for cultural documentarians, specifically non-technically trained professionals.

Most collecting institutions and museums use a relational database to categorize cultural works and their images. Relational databases and metadata work to document and describe the complex relationships amongst cultural objects and multi-faceted works of art, as well as between objects and places, people, and artistic movements. Relational database structures are also beneficial within collecting institutions and museums because they allow for archivists to make a clear distinction between cultural objects and their images; an unclear distinction could lead to confusing and inaccurate searches.

Cultural objects

An object's materiality, function, and purpose, as well as its size (e.g., measurements such as height, width, and weight), storage requirements (e.g., a climate-controlled environment), and the focus of the museum and collection, influence the descriptive depth of the data attributed to the object by cultural documentarians. Established institutional cataloging practices, the goals and expertise of cultural documentarians, and database structure also influence the information ascribed to cultural objects and the ways in which cultural objects are categorized. Additionally, museums often employ standardized commercial collection management software that prescribes and limits the ways in which archivists can describe artworks and cultural objects. Collecting institutions and museums also use controlled vocabularies to describe the cultural objects and artworks in their collections. The Getty Vocabularies and the Library of Congress controlled vocabularies are reputable within the museum community and are recommended by CCO standards. Museums are encouraged to use controlled vocabularies that are contextual and relevant to their collections and that enhance the functionality of their digital information systems. Controlled vocabularies are beneficial within databases because they provide a high level of consistency, improving resource retrieval. Metadata structures, including controlled vocabularies, reflect the ontologies of the systems from which they were created. Often the processes through which cultural objects are described and categorized through metadata in museums do not reflect the perspectives of the maker communities.
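The consistency benefit of a controlled vocabulary can be sketched in code. This is a minimal, illustrative example, not any museum system's actual vocabulary: the terms and synonyms below are invented for the demonstration.

```python
# Minimal sketch: mapping catalogers' free-text terms onto preferred terms
# from a small controlled vocabulary. Terms and synonyms are invented.
CONTROLLED_VOCAB = {
    "ceramic": {"pottery", "earthenware", "ceramics"},
    "textile": {"fabric", "cloth", "weaving"},
}

def normalize_term(free_text):
    """Map a free-text term to its preferred vocabulary term, or None."""
    term = free_text.strip().lower()
    for preferred, synonyms in CONTROLLED_VOCAB.items():
        if term == preferred or term in synonyms:
            return preferred
    return None  # not in the vocabulary; flag for curatorial review

print(normalize_term("Pottery"))  # ceramic
print(normalize_term("cloth"))    # textile
```

Because every synonym resolves to one preferred term, a search for "ceramic" retrieves records regardless of which synonym the cataloger originally typed.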

Online content

Metadata has been instrumental in the creation of digital information systems and archives within museums and has made it easier for museums to publish digital content online. This has enabled audiences who might not have had access to cultural objects, due to geographic or economic barriers, to access them. In the 2000s, as more museums adopted archival standards and created intricate databases, discussions about linked data between museum databases arose in the museum, archival, and library science communities. Collection management systems (CMS) and digital asset management tools can be local or shared systems. Digital humanities scholars note many benefits of interoperability between museum databases and collections, while also acknowledging the difficulties of achieving such interoperability.

Law

United States

Problems involving metadata in litigation in the United States are becoming widespread. Courts have looked at various questions involving metadata, including the discoverability of metadata by parties. The Federal Rules of Civil Procedure have specific rules for discovery of electronically stored information, and subsequent case law applying those rules has elucidated the litigant's duty to produce metadata when litigating in federal court. In October 2009, the Arizona Supreme Court ruled that metadata records are public records. Document metadata has proven particularly important in legal environments in which litigation has requested metadata that can include sensitive information detrimental to a party in court. Using metadata removal tools to "clean" or redact documents can mitigate the risks of unwittingly sending sensitive data. This process partially (see data remanence) protects law firms from the potentially damaging leaking of sensitive data through electronic discovery.

Opinion polls have shown that 45% of Americans are "not at all confident" in the ability of social media sites to ensure their personal data is secure and 40% say that social media sites should not be able to store any information on individuals. 76% of Americans say that they are not confident that the information advertising agencies collect on them is secure and 50% say that online advertising agencies should not be allowed to record any of their information at all.

Australia

In Australia, the need to strengthen national security has resulted in the introduction of a new metadata storage law. This law means that both security and policing agencies will be allowed to access up to two years of an individual's metadata, with the aim of making it easier to prevent terrorist attacks and serious crimes.

Legislation

Legislative metadata has been the subject of some discussion in law.gov forums such as workshops held by the Legal Information Institute at the Cornell Law School on 22 and 23 March 2010. The documentation for these forums is titled, "Suggested metadata practices for legislation and regulations".

A handful of key points have been outlined by these discussions, section headings of which are listed as follows:

  • General Considerations
  • Document Structure
  • Document Contents
  • Metadata (elements of)
  • Layering
  • Point-in-time versus post-hoc

Healthcare

Australian medical research pioneered the definition of metadata for applications in health care. That approach offered the first recognized attempt to adhere to international standards in medical sciences, instead of defining a proprietary standard under the World Health Organization (WHO) umbrella. The medical community, however, has yet to approve the need to follow metadata standards, despite research that supports them.

Biomedical research

Research studies in the fields of biomedicine and molecular biology frequently yield large quantities of data, including results of genome or meta-genome sequencing, proteomics data, and even notes or plans created during the course of research itself. Each data type involves its own variety of metadata and the processes necessary to produce these metadata. General metadata standards, such as ISA-Tab, allow researchers to create and exchange experimental metadata in consistent formats. Specific experimental approaches frequently have their own metadata standards and systems: metadata standards for mass spectrometry include mzML and SPLASH, while XML-based standards such as PDBML and SRA XML serve as standards for macromolecular structure and sequencing data, respectively.
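Formats such as ISA-Tab record experimental metadata as structured, tab-delimited tables. The sketch below illustrates that general idea only; the column names and values are invented for the example and do not follow the official ISA-Tab specification.

```python
import csv
import io

# Illustrative sketch of tab-delimited experimental metadata, loosely in the
# spirit of an ISA-Tab study file. Columns and values are invented, not the
# official specification.
samples = [
    {"Sample Name": "S1", "Organism": "Escherichia coli",
     "Assay": "mass spectrometry"},
    {"Sample Name": "S2", "Organism": "Escherichia coli",
     "Assay": "genome sequencing"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=samples[0].keys(), delimiter="\t")
writer.writeheader()
writer.writerows(samples)
print(buf.getvalue())
```

Because the format is plain tab-separated text, such files can be created and exchanged with ordinary spreadsheet or scripting tools, which is part of what makes general metadata standards practical for data sharing.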

The products of biomedical research are generally realized as peer-reviewed manuscripts and these publications are yet another source of data (see #In science).

Data warehousing

A data warehouse (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to manage and store the data. Data warehouses differ from business intelligence (BI) systems, because BI systems are designed to use data to create reports and analyze the information, to provide strategic guidance to management. Metadata is an important tool in how data is stored in data warehouses. The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, "cleaned", and timely data, extracted from various operational systems in an organization. The extracted data are integrated in the data warehouse environment to provide an enterprise-wide perspective, and are structured in a way that serves the reporting and analytic requirements. The design of structural metadata commonality using a data modeling method, such as entity-relationship diagramming, is important in any data warehouse development effort; such models detail the metadata on each piece of data in the data warehouse. An essential component of a data warehouse/business intelligence system is the metadata, along with the tools to manage and retrieve it. Ralph Kimball describes metadata as the DNA of the data warehouse, as metadata defines the elements of the data warehouse and how they work together.

Kimball et al. refer to three main categories of metadata: technical metadata, business metadata, and process metadata. Technical metadata is primarily definitional, while business metadata and process metadata are primarily descriptive. The categories sometimes overlap.

  • Technical metadata defines the objects and processes in a DW/BI system, as seen from a technical point of view. The technical metadata includes the system metadata, which defines the data structures such as tables, fields, data types, indexes, and partitions in the relational engine, as well as databases, dimensions, measures, and data mining models. Technical metadata defines the data model and the way it is displayed for the users, with the reports, schedules, distribution lists, and user security rights.
  • Business metadata is content from the data warehouse described in more user-friendly terms. The business metadata tells you what data you have, where they come from, what they mean and what their relationship is to other data in the data warehouse. Business metadata may also serve as documentation for the DW/BI system. Users who browse the data warehouse are primarily viewing the business metadata.
  • Process metadata is used to describe the results of various operations in the data warehouse. Within the ETL process, all key data from tasks is logged on execution. This includes start time, end time, CPU seconds used, disk reads, disk writes, and rows processed. When troubleshooting the ETL or query process, this sort of data becomes valuable. Process metadata is the fact measurement when building and using a DW/BI system. Some organizations make a living out of collecting and selling this sort of data to companies – in that case, the process metadata becomes the business metadata for the fact and dimension tables. Collecting process metadata is in the interest of business people who can use the data to identify the users of their products, which products they are using, and what level of service they are receiving.
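The kind of per-run process metadata described above (start time, end time, rows processed) can be sketched as a small logging wrapper. This is a minimal illustration, not any warehouse product's API; the task and field names are invented.

```python
import time

# Minimal sketch of capturing process metadata for an ETL task: timings and
# row counts recorded per run. Task and field names are illustrative only.
def run_etl_step(name, rows):
    meta = {"task": name, "start": time.time()}
    processed = 0
    for _ in rows:            # stand-in for real extract/transform/load work
        processed += 1
    meta["end"] = time.time()
    meta["rows_processed"] = processed
    meta["elapsed_s"] = meta["end"] - meta["start"]
    return meta

log = run_etl_step("load_sales_fact", range(1000))
print(log["rows_processed"])  # 1000
```

Accumulating such records over many runs yields exactly the fact-style measurements described above: which jobs ran, how long they took, and how much data they moved.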

Internet

The HTML format used to define web pages allows for the inclusion of a variety of types of metadata, from basic descriptive text, dates, and keywords to more advanced metadata schemes such as the Dublin Core, e-GMS, and AGLS standards. Pages and files can also be geotagged with coordinates, and can be categorized or tagged, including collaboratively, as with folksonomies.

When media has identifiers set or when such can be generated, information such as file tags and descriptions can be pulled or scraped from the Internet – for example about movies. Various online databases are aggregated and provide metadata for various data. The collaboratively built Wikidata has identifiers not just for media but also abstract concepts, various objects, and other entities, that can be looked up by humans and machines to retrieve useful information and to link knowledge in other knowledge bases and databases.

Metadata may be included in the page's header or in a separate file. Microformats allow metadata to be added to on-page data in a way that regular web users do not see, but that computers, web crawlers, and search engines can readily access. Many search engines are cautious about using metadata in their ranking algorithms because of the exploitation of metadata through search engine optimization (SEO) to improve rankings; see the meta element article for further discussion. This cautious attitude may be justified, as people, according to Doctorow, do not exercise care and diligence when creating their own metadata, and metadata is part of a competitive environment in which it is used to promote the metadata creators' own purposes. Studies show that search engines respond to web pages with metadata implementations, and Google has an announcement on its site showing the meta tags that its search engine understands. The enterprise search startup Swiftype recognizes metadata as a relevance signal that webmasters can implement for their website-specific search engine, and has even released its own extension, known as Meta Tags 2.
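Page-level metadata of the kind described above lives in `<meta>` elements in the HTML head. The sketch below extracts such metadata (including a Dublin Core-style `DC.title` name) with only the Python standard library; the sample page and its tag values are invented for the example.

```python
from html.parser import HTMLParser

# Sketch: collecting <meta name="..." content="..."> pairs from an HTML page.
class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

page = """<html><head>
<meta name="description" content="An article about metadata">
<meta name="keywords" content="metadata, cataloging">
<meta name="DC.title" content="Metadata">
</head><body></body></html>"""

p = MetaExtractor()
p.feed(page)
print(p.meta["DC.title"])  # Metadata
```

This is essentially what a crawler does when it harvests descriptive metadata from pages, which is also why such fields are easy to stuff for SEO purposes.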

Broadcast industry

In the broadcast industry, metadata is linked to audio and video broadcast media to:

  • identify the media: clip or playlist names, duration, timecode, etc.
  • describe the content: notes regarding the quality of video content, rating, description (for example, during a sport event, keywords like goal, red card will be associated to some clips)
  • classify media: metadata allows producers to sort the media or to easily and quickly find video content (a TV news program could urgently need archive content for a story). For example, the BBC has a large subject classification system, Lonclass, a customized version of the more general-purpose Universal Decimal Classification.

This metadata can be linked to the video media through video servers. Most major broadcast sporting events, such as the FIFA World Cup or the Olympic Games, use this metadata to distribute their video content to TV stations through keywords. It is often the host broadcaster who is in charge of organizing metadata through its International Broadcast Centre and its video servers. The metadata is recorded with the images and entered by metadata operators ("loggers"), who associate metadata live in metadata grids through software such as Multicam (LSM) or IPDirector, used during the FIFA World Cup and the Olympic Games.

Geography

Metadata that describes geographic objects in electronic storage or format (such as datasets, maps, features, or documents with a geospatial component) has a history dating back to at least 1994. This class of metadata is described more fully in the geospatial metadata article.

Ecology and environment

Ecological and environmental metadata is intended to document the "who, what, when, where, why, and how" of data collection for a particular study. This typically means which organization or institution collected the data, what type of data, which date(s) the data was collected, the rationale for the data collection, and the methodology used for the data collection. Metadata should be generated in a format commonly used by the most relevant science community, such as Darwin Core, Ecological Metadata Language, or Dublin Core. Metadata editing tools exist to facilitate metadata generation (e.g. Metavist, Mercury, Morpho). Metadata should describe the provenance of the data (where they originated, as well as any transformations the data underwent) and how to give credit for (cite) the data products.
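The "who, what, when, where" structure of such metadata is typically serialized as XML. The sketch below shows the general shape only; the element names and values are invented for the illustration and simplified far beyond the actual Ecological Metadata Language schema.

```python
import xml.etree.ElementTree as ET

# Simplified sketch of who/what/when/where dataset metadata as XML, loosely
# in the spirit of Ecological Metadata Language. Element names are
# illustrative, not the real EML schema.
dataset = ET.Element("dataset")
ET.SubElement(dataset, "title").text = "Stream temperature survey"
ET.SubElement(dataset, "creator").text = "Example Field Station"
ET.SubElement(dataset, "methods").text = "Hourly logger readings"
coverage = ET.SubElement(dataset, "coverage")
ET.SubElement(coverage, "temporal").text = "2022-06-01/2022-08-31"
ET.SubElement(coverage, "geographic").text = "45.52N 122.68W"

print(ET.tostring(dataset, encoding="unicode"))
```

Metadata editors such as those named above generate documents of this general form against the real schemas, so that the resulting records validate and can be harvested by data catalogs.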

Digital music

When first released in 1982, Compact Discs contained only a Table of Contents (TOC), listing the number of tracks on the disc and their length in samples. Fourteen years later, in 1996, a revision of the CD Red Book standard added CD-Text to carry additional metadata, but CD-Text was not widely adopted. Shortly thereafter, it became common for personal computers to retrieve metadata from external sources (e.g., CDDB, Gracenote) based on the TOC.

Digital audio files superseded physical music formats such as cassette tapes and CDs in the 2000s. Digital audio files can be labeled with more information than can be contained in just the file name. That descriptive information is called the audio tag or, more generally, audio metadata. Computer programs specializing in adding or modifying this information are called tag editors. Metadata can be used to name, describe, catalog, and indicate ownership or copyright for a digital audio file, and its presence makes it much easier to locate a specific audio file within a group, typically through use of a search engine that accesses the metadata. As different digital audio formats were developed, attempts were made to standardize a specific location within the digital files where this information could be stored.

As a result, almost all digital audio formats, including MP3, Broadcast WAV, and AIFF files, have similar standardized locations that can be populated with metadata. The metadata for compressed and uncompressed digital music is often encoded in the ID3 tag. Common tag-editing libraries such as TagLib support the MP3, Ogg Vorbis, FLAC, MPC, Speex, WavPack, TrueAudio, WAV, AIFF, MP4, and ASF file formats.
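As a concrete example of such a standardized location, the original ID3v1 tag is a fixed 128-byte block at the end of an MP3 file: a "TAG" marker, then 30-byte title, artist, and album fields, a 4-byte year, a comment, and a genre byte. The sketch below builds a synthetic tag and parses it back, rather than reading a real MP3 file.

```python
# Sketch: writing and reading an ID3v1 tag, the fixed 128-byte block at the
# end of many MP3 files. A synthetic tag is built in memory for the demo.
def make_id3v1(title, artist, album, year):
    def field(s, n):
        return s.encode("latin-1")[:n].ljust(n, b"\x00")
    return (b"TAG" + field(title, 30) + field(artist, 30)
            + field(album, 30) + field(year, 4)
            + field("", 30) + bytes([255]))  # empty comment, genre byte

def parse_id3v1(block):
    assert block[:3] == b"TAG" and len(block) == 128
    clean = lambda b: b.split(b"\x00")[0].decode("latin-1")
    return {
        "title": clean(block[3:33]),
        "artist": clean(block[33:63]),
        "album": clean(block[63:93]),
        "year": clean(block[93:97]),
    }

tag = make_id3v1("Example Song", "Example Artist", "Example Album", "1999")
print(parse_id3v1(tag)["title"])  # Example Song
```

The fixed offsets are what make the tag "standardized": any player or tag editor can read the last 128 bytes of the file and know exactly where each field lies. (Later ID3v2 tags are more flexible and sit at the start of the file.)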

Cloud applications

With the availability of cloud applications, which include those to add metadata to content, metadata is increasingly available over the Internet.

Administration and management

Storage

Metadata can be stored either internally, in the same file or structure as the data (this is also called embedded metadata), or externally, in a separate file or field from the described data. A data repository typically stores the metadata detached from the data but can be designed to support embedded metadata approaches. Each option has advantages and disadvantages:

  • Internal storage means metadata always travels as part of the data it describes; thus, metadata is always available with the data and can be manipulated locally. This method creates redundancy (precluding normalization) and does not allow all of a system's metadata to be managed in one place. It arguably increases consistency, since the metadata can be readily changed whenever the data is changed.
  • External storage allows collocating metadata for all the contents, for example in a database, for more efficient searching and management. Redundancy can be avoided by normalizing the metadata's organization. In this approach, metadata can be united with the content when information is transferred, for example in Streaming media; or can be referenced (for example, as a web link) from the transferred content. On the downside, the division of the metadata from the data content, especially in standalone files that refer to their source metadata elsewhere, increases the opportunities for misalignments between the two, as changes to either may not be reflected in the other.
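The external approach is often realized as a "sidecar" file kept next to the data it describes. The sketch below illustrates this with an invented naming convention (`<file>.meta.json`) and invented metadata fields; it also shows where the misalignment risk comes from, since nothing forces the two files to change together.

```python
import json
import os
import tempfile

# Sketch of external ("sidecar") metadata: a JSON file stored alongside the
# data file it describes. The naming convention and fields are illustrative.
data_dir = tempfile.mkdtemp()
data_path = os.path.join(data_dir, "survey.csv")
meta_path = data_path + ".meta.json"   # assumed sidecar convention

with open(data_path, "w") as f:
    f.write("site,temp\nA,12.5\n")
with open(meta_path, "w") as f:
    json.dump({"title": "Survey data", "created": "2023-04-27",
               "units": {"temp": "celsius"}}, f)

# A consumer must read both files and keep them in sync; editing one without
# the other is exactly the misalignment risk of external storage.
with open(meta_path) as f:
    print(json.load(f)["units"]["temp"])  # celsius
```

Embedded metadata avoids this split by writing the same fields into the data file itself, at the cost of the redundancy and decentralized management noted above.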

Metadata can be stored in either human-readable or binary form. Storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. However, text-based formats are rarely optimized for storage capacity, communication time, or processing speed. A binary metadata format enables efficiency in all these respects, but requires special software to convert the binary information into human-readable content.

Database management

Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:

  • Tables of all tables in a database, their names, sizes, and number of rows in each table.
  • Tables of columns in each database, what tables they are used in, and the type of data stored in each column.

In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog, called the information schema, but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata. Programmatic access to metadata is possible using APIs such as JDBC, or SchemaCrawler.
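SQLite is a convenient example of database-specific catalog access: instead of the SQL-standard information schema, it exposes its catalog through the `sqlite_master` table and `PRAGMA table_info`. The table created below is invented for the demonstration.

```python
import sqlite3

# Sketch: querying a database's own catalog. SQLite exposes its catalog as
# the sqlite_master table rather than the SQL-standard information schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")

# Metadata about the tables in the database...
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['books']

# ...and about the columns of each table.
for cid, name, coltype, *_ in conn.execute("PRAGMA table_info(books)"):
    print(name, coltype)
```

The same information is reachable through `information_schema.tables` and `information_schema.columns` on databases that implement the standard schema, or through generic APIs such as JDBC's `DatabaseMetaData`.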

Popular culture

One of the first satirical examinations of the concept of metadata as we understand it today is American science fiction author Hal Draper's short story "MS Fnd in a Lbry" (1961). Here, the knowledge of all mankind is condensed into an object the size of a desk drawer; however, the magnitude of the metadata (e.g., catalog of catalogs of..., as well as indexes and histories) eventually leads to dire yet humorous consequences for the human race. As a cautionary tale, the story prefigures the modern consequences of allowing metadata to become more important than the real data it is concerned with, and the risks inherent in that eventuality.

Information history

https://en.wikipedia.org/wiki/Information_history

Fragment of an inscribed clay cone of Urukagina

Information history may refer to the history of each of the categories listed below, or to combinations of them. It should be recognized that the understanding of, for example, libraries as information systems only goes back to about 1950; the application of the term information to earlier systems or societies is a retronym.

The word and concept "information"

The Latin roots and Greek origins of the word "information" are presented by Capurro & Hjørland (2003). References to "formation or molding of the mind or character, training, instruction, teaching" date from the 14th century in both English (according to the Oxford English Dictionary) and other European languages. In the transition from the Middle Ages to modernity, the use of the concept of information reflected a fundamental turn in its epistemological basis: from "giving a (substantial) form to matter" to "communicating something to someone". Peters (1988, pp. 12–13) concludes:

Information was readily deployed in empiricist psychology (though it played a less important role than other words such as impression or idea) because it seemed to describe the mechanics of sensation: objects in the world inform the senses. But sensation is entirely different from "form" – the one is sensual, the other intellectual; the one is subjective, the other objective. My sensation of things is fleeting, elusive, and idiosyncratic [sic]. For Hume, especially, sensory experience is a swirl of impressions cut off from any sure link to the real world... In any case, the empiricist problematic was how the mind is informed by sensations of the world. At first informed meant shaped by; later it came to mean received reports from. As its site of action drifted from cosmos to consciousness, the term's sense shifted from unities (Aristotle's forms) to units (of sensation). Information came less and less to refer to internal ordering or formation, since empiricism allowed for no preexisting intellectual forms outside of sensation itself. Instead, information came to refer to the fragmentary, fluctuating, haphazard stuff of sense. Information, like the early modern worldview in general, shifted from a divinely ordered cosmos to a system governed by the motion of corpuscles. Under the tutelage of empiricism, information gradually moved from structure to stuff, from form to substance, from intellectual order to sensory impulses.

In the modern era, the most important influence on the concept of information is derived from the Information theory developed by Claude Shannon and others. This theory, however, reflects a fundamental contradiction. Northrup (1993) wrote:

Thus, actually two conflicting metaphors are being used: The well-known metaphor of information as a quantity, like water in the water-pipe, is at work, but so is a second metaphor, that of information as a choice, a choice made by an information provider, and a forced choice made by an information receiver. Actually, the second metaphor implies that the information sent isn't necessarily equal to the information received, because any choice implies a comparison with a list of possibilities, i.e., a list of possible meanings. Here, meaning is involved, thus spoiling the idea of information as a pure "Ding an sich." Thus, much of the confusion regarding the concept of information seems to be related to the basic confusion of metaphors in Shannon's theory: is information an autonomous quantity, or is information always per se information to an observer? Actually, I don't think that Shannon himself chose one of the two definitions. Logically speaking, his theory implied information as a subjective phenomenon. But this had so wide-ranging epistemological impacts that Shannon didn't seem to fully realize this logical fact. Consequently, he continued to use metaphors about information as if it were an objective substance. This is the basic, inherent contradiction in Shannon's information theory. (Northrup, 1993, p. 5)

In their seminal book The Study of Information: Interdisciplinary Messages, Machlup and Mansfield (1983) collected key views on the interdisciplinary controversy in computer science, artificial intelligence, library and information science, linguistics, psychology, and physics, as well as in the social sciences. Machlup (1983, p. 660) himself disagrees with the use of the concept of information in the context of signal transmission, the basic senses of information in his view all referring "to telling something or to the something that is being told. Information is addressed to human minds and is received by human minds." All other senses, including its use with regard to nonhuman organisms as well as to society as a whole, are, according to Machlup, metaphoric and, as in the case of cybernetics, anthropomorphic.

Hjørland (2007) describes the fundamental difference between objective and subjective views of information and argues that the subjective view has been supported by, among others, Bateson, Yovits, Spang-Hanssen, Brier, Buckland, Goguen, and Hjørland. Hjørland provided the following example:

A stone on a field could contain different information for different people (or from one situation to another). It is not possible for information systems to map all the stone's possible information for every individual. Nor is any one mapping the one "true" mapping. But people have different educational backgrounds and play different roles in the division of labor in society. A stone in a field typically represents one kind of information for the geologist, another for the archaeologist. The information from the stone can be mapped into different collective knowledge structures produced by, e.g., geology and archaeology. Information can be identified, described, and represented in information systems for different domains of knowledge. Of course, there is much uncertainty and there are many difficult problems in determining whether a thing is informative or not for a domain. Some domains have a high degree of consensus and rather explicit criteria of relevance. Other domains have different, conflicting paradigms, each containing its own more or less implicit view of the informativeness of different kinds of information sources. (Hjørland, 1997, p. 111, emphasis in original).

Academic discipline

Information history is an emerging discipline related to, but broader than, library history. An important introduction and review was made by Alistair Black (2006). Toni Weller is also a prolific scholar in this field; see, for example, Weller (2007, 2008, 2010a, and 2010b). As part of her work, Weller has argued that there are important links between the modern information age and its historical precedents. A description from Russia is given by Volodin (2000).

Alistair Black (2006, p. 445) wrote: "This chapter explores issues of discipline definition and legitimacy by segmenting information history into its various components:

  • The history of print and written culture, including relatively long-established areas such as the histories of libraries and librarianship, book history, publishing history, and the history of reading.
  • The history of more recent information disciplines and practice, that is to say, the history of information management, information systems, and information science.
  • The history of contiguous areas, such as the history of the information society and information infrastructure, necessarily enveloping communication history (including telecommunications history) and the history of information policy.
  • The history of information as social history, with emphasis on the importance of informal information networks."

"Bodies influential in the field include the American Library Association’s Round Table on Library History, the Library History Section of the International Federation of Library Associations and Institutions (IFLA), and, in the U.K., the Library and Information History Group of the Chartered Institute of Library and Information Professionals (CILIP). Each of these bodies has been busy in recent years, running conferences and seminars, and initiating scholarly projects. Active library history groups function in many other countries, including Germany (The Wolfenbuttel Round Table on Library History, the History of the Book and the History of Media, located at the Herzog August Bibliothek), Denmark (The Danish Society for Library History, located at the Royal School of Library and Information Science), Finland (The Library History Research Group, University of Tamepere), and Norway (The Norwegian Society for Book and Library History). Sweden has no official group dedicated to the subject, but interest is generated by the existence of a museum of librarianship in Bods, established by the Library Museum Society and directed by Magnus Torstensson. Activity in Argentina, where, as in Europe and the U.S., a "new library history" has developed, is described by Parada (2004)." (Black (2006, p. 447).

Journals

  • Information & Culture (previously Libraries & the Cultural Record, Libraries & Culture)
  • Library & Information History (until 2008: Library History; until 1967: Library Association. Library History Group. Newsletter)

Information technology (IT)

The term IT is ambiguous, although it is mostly synonymous with computer technology. Haigh (2011, pp. 432-433) wrote:

"In fact, the great majority of references to information technology have always been concerned with computers, although the exact meaning has shifted over time (Kline, 2006). The phrase received its first prominent usage in a Harvard Business Review article (Haigh, 2001b; Leavitt & Whisler, 1958) intended to promote a technocratic vision for the future of business management. Its initial definition was at the conjunction of computers, operations research methods, and simulation techniques. Having failed initially to gain much traction (unlike related terms of a similar vintage such as information systems, information processing, and information science) it was revived in policy and economic circles in the 1970s with a new meaning. Information technology now described the expected convergence of the computing, media, and telecommunications industries (and their technologies), understood within the broader context of a wave of enthusiasm for the computer revolution, post-industrial society, information society (Webster, 1995), and other fashionable expressions of the belief that new electronic technologies were bringing a profound rupture with the past. As it spread broadly during the 1980s, IT increasingly lost its association with communications (and, alas, any vestigial connection to the idea of anybody actually being informed of anything) to become a new and more pretentious way of saying "computer". The final step in this process is the recent surge in references to "information and communication technologies" or ICTs, a coinage that makes sense only if one assumes that a technology can inform without communicating".

Some people use the term information technology for technologies used before the development of the computer; this, however, is to use the term as a retronym.

 

Library and information science

Library and information science(s) or studies (LIS) is an interdisciplinary field of study that deals generally with organization, access, collection, and protection/regulation of information, whether in physical (e.g. art, legal proceedings, etc.) or digital forms.

In spite of various trends to merge the two fields, some consider the two original disciplines, library science and information science, to be separate. However, it is common today to use the terms synonymously or to drop the term "library" and to speak about information departments or I-schools. There have also been attempts to revive the concept of documentation and to speak of Library, information and documentation studies (or science).

Relations between library science, information science and LIS

Tefko Saracevic (1992, p. 13) argued that library science and information science are separate fields:

The common ground between library science and information science, which is a strong one, is in the sharing of their social role and in their general concern with the problems of effective utilization of graphic records. But there are also very significant differences in several critical respects, among them in: (1) selection of problems addressed and in the way they were defined; (2) theoretical questions asked and frameworks established; (3) the nature and degree of experimentation and empirical development and the resulting practical knowledge/competencies derived; (4) tools and approaches used; and (5) the nature and strength of interdisciplinary relations established and the dependence of the progress and evolution of interdisciplinary approaches. All of these differences warrant the conclusion that librarianship and information science are two different fields in a strong interdisciplinary relation, rather than one and the same field, or one being a special case of the other.

Another indication of the different uses of the two terms is the indexing in UMI's Dissertations Abstracts. As of November 2011, Dissertations Abstracts Online indexed 4,888 dissertations with the descriptor LIBRARY SCIENCE and 9,053 with the descriptor INFORMATION SCIENCE. For the year 2009 the numbers were 104 and 514, respectively. 891 dissertations were indexed with both terms (36 in 2009).

It should be considered that information science grew out of documentation science and therefore has a tradition of considering scientific and scholarly communication, bibliographic databases, subject knowledge, terminology, etc. Library science, on the other hand, has mostly concentrated on libraries and their internal processes and best practices. It is also relevant that information science used to be done by scientists, while librarianship has been split between public libraries and scholarly research libraries. Library schools have mainly educated librarians for public libraries and shown little interest in scientific communication and documentation. When information scientists entered library schools from 1964 onward, they brought with them competencies in information retrieval in subject databases, including concepts such as recall and precision, Boolean search techniques, query formulation, and related issues. Subject bibliographic databases and citation indexes provided a major step forward in information dissemination, and also in the curriculum at library schools.
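The retrieval measures mentioned above, recall and precision, can be made concrete with a short sketch. The document identifiers below are invented for illustration; only the two standard definitions are assumed.

```python
# Precision and recall for a single query: a system retrieves a set of
# documents, and some subset of the whole collection is actually relevant.
def precision_recall(retrieved: set, relevant: set) -> tuple:
    hits = retrieved & relevant              # relevant documents actually found
    precision = len(hits) / len(retrieved)   # fraction of retrieved items that are relevant
    recall = len(hits) / len(relevant)       # fraction of relevant items that were retrieved
    return precision, recall

# Hypothetical query: 4 documents retrieved, 5 relevant in the collection,
# of which 2 (d2, d3) were found.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d3", "d5", "d6", "d7"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.4
```

The tension between the two measures (broadening a Boolean query raises recall but typically lowers precision) is exactly the trade-off the information scientists brought into the library-school curriculum.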

Julian Warner (2010) suggests that the information and computer science tradition in information retrieval may broadly be characterized as query transformation, with the query articulated verbally by the user in advance of searching and then transformed by a system into a set of records. Librarianship and indexing, on the other hand, have placed an implicit stress on selection power, enabling the user to make relevant selections.

Library science

Library science (often termed library studies, bibliothecography, and library economy) is an interdisciplinary or multidisciplinary field that applies the practices, perspectives, and tools of management, information technology, education, and other areas to libraries; the collection, organization, preservation, and dissemination of information resources; and the political economy of information. Martin Schrettinger, a Bavarian librarian, coined the term for the discipline in his work (1808–1828) Versuch eines vollständigen Lehrbuchs der Bibliothek-Wissenschaft oder Anleitung zur vollkommenen Geschäftsführung eines Bibliothekars. Rather than classifying information based on nature-oriented elements, as had previously been done in his Bavarian library, Schrettinger organized books in alphabetical order. The first American school for library science was founded by Melvil Dewey at Columbia University in 1887.

Historically, library science has also included archival science. This includes how information resources are organized to serve the needs of selected user groups, how people interact with classification systems and technology, how information is acquired, evaluated and applied by people in and outside libraries as well as cross-culturally, how people are trained and educated for careers in libraries, the ethics that guide library service and organization, the legal status of libraries and information resources, and the applied science of computer technology used in documentation and records management.

There is no generally agreed-upon distinction between the terms library science and librarianship. To a certain extent, they are interchangeable, perhaps differing most significantly in connotation. The term library and information studies (alternatively library and information science), abbreviated as LIS, is most often used; most librarians consider it only a terminological variation, intended to emphasize the scientific and technical foundations of the subject and its relationship with information science. LIS should not be confused with information theory, the mathematical study of the concept of information. Library philosophy has been contrasted with library science as the study of the aims and justifications of librarianship, as opposed to the development and refinement of techniques.

Difficulties defining LIS

"The question, 'What is library and information science?' does not elicit responses of the same internal conceptual coherence as similar inquiries as to the nature of other fields, e.g., 'What is chemistry?', 'What is economics?', 'What is medicine?' Each of those fields, though broad in scope, has clear ties to basic concerns of their field. [...] Neither LIS theory nor practice is perceived to be monolithic nor unified by a common literature or set of professional skills. Occasionally, LIS scholars (many of whom do not self-identify as members of an interreading LIS community, or prefer names other than LIS), attempt, but are unable, to find core concepts in common. Some believe that computing and internetworking concepts and skills underlie virtually every important aspect of LIS, indeed see LIS as a sub-field of computer science! [Footnote III.1] Others claim that LIS is principally a social science accompanied by practical skills such as ethnography and interviewing. Historically, traditions of public service, bibliography, documentalism, and information science have viewed their mission, their philosophical toolsets, and their domain of research differently. Still others deny the existence of a greater metropolitan LIS, viewing LIS instead as a loosely organized collection of specialized interests often unified by nothing more than their shared (and fought-over) use of the descriptor information. Indeed, claims occasionally arise to the effect that the field even has no theory of its own." (Konrad, 2007, p. 652–653).

A multidisciplinary, interdisciplinary or monodisciplinary field?

The Swedish researcher Emin Tengström (1993) described cross-disciplinary research as a process, not a state or structure, and differentiated three levels of ambition regarding cross-disciplinary research.

What is described here is a view of social fields as dynamic and changing. Library and information science is viewed as a field that started as a multidisciplinary field based on literature, psychology, sociology, management, computer science etc., which is developing towards an academic discipline in its own right. However, the following quote seems to indicate that LIS is actually developing in the opposite direction:

Chua & Yang (2008) studied papers published in Journal of the American Society for Information Science and Technology in the period 1988–1997 and found, among other things: "Top authors have grown in diversity from those being affiliated predominantly with library/information-related departments to include those from information systems management, information technology, business, and the humanities. Amid heterogeneous clusters of collaboration among top authors, strongly connected crossdisciplinary coauthor pairs have become more prevalent. Correspondingly, the distribution of top keywords’ occurrences that leans heavily on core information science has shifted towards other subdisciplines such as information technology and sociobehavioral science."

A more recent study revealed that 31% of the papers published in 31 LIS journals from 2007 through 2012 were by authors in academic departments of library and information science (i.e., those offering degree programs accredited by the American Library Association or similar professional organizations in other countries). Faculty in departments of computer science (10%), management (10%), communication (3%), the other social sciences (9%), and the other natural sciences (7%) were also represented. Nearly one-quarter of the papers in the 31 journals were by practicing librarians, and 6% were by others in non-academic (e.g., corporate) positions.

As a field with its own body of interrelated concepts, techniques, journals, and professional associations, LIS is clearly a discipline. But by the nature of its subject matter and methods LIS is just as clearly an interdiscipline, drawing on many adjacent fields (see below).

A fragmented adhocracy

Richard Whitley (1984, 2000) classified scientific fields according to their intellectual and social organization and described management studies as a 'fragmented adhocracy', a field with a low level of coordination around a diffuse set of goals and a non-specialized terminology; but with strong connections to the practice in the business sector. Åström (2006) applied this conception to the description of LIS.

Scattering of the literature

Meho & Spurgin (2005) found that in a list of 2,625 items published between 1982 and 2002 by 68 faculty members of 18 schools of library and information science, only 10 databases provided significant coverage of the LIS literature. Results also show that restricting the data sources to one, two, or even three databases leads to inaccurate rankings and erroneous conclusions. Because no database provides comprehensive coverage of the LIS literature, researchers must rely on a wide range of disciplinary and multidisciplinary databases for ranking and other research purposes. Even when the nine most comprehensive databases in LIS were searched and combined, 27.0% (or 710 of 2,635) of the publications remained unfound.

The study confirms earlier research that LIS literature is highly scattered and is not limited to standard LIS databases. What was not known or verified before, however, is that a significant amount of this literature is indexed in the interdisciplinary or multidisciplinary databases of Inside Conferences and INSPEC. Other interdisciplinary databases, such as America: History and Life, were also found to be very useful and complementary to traditional LIS databases, particularly in the areas of archives and library history. (Meho & Spurgin, 2005, p. 1329).

The unique concern of library and information science

"Concern for people becoming informed is not unique to LIS, and thus is insufficient to differentiate LIS from other fields. LIS are a part of a larger enterprise." (Konrad, 2007, p. 655).

"The unique concern of LIS is recognized as: Statement of the core concern of LIS: Humans becoming informed (constructing meaning) via intermediation between inquirers and instrumented records. No other field has this as its concern." (Konrad, 2007, p. 660)

"Note that the promiscuous term information does not appear in the above statement circumscribing the field's central concerns: The detrimental effects of the ambiguity this term provokes are discussed above (Part III). Furner [Furner 2004, 427] has shown that discourse in the field is improved where specific terms are utilized in place of the i-word for specific senses of that term." (Konrad, 2007, p. 661).

Michael Buckland wrote: "Educational programs in library, information and documentation are concerned with what people know, are not limited to technology, and require wide-ranging expertise. They differ fundamentally and importantly from computer science programs and from the information systems programs found in business schools."

Bawden and Robinson argue that while information science has overlaps with numerous other disciplines with an interest in studying communication, it is unique in that it is concerned with all aspects of the communication chain. For example, computer science may be interested in indexing and retrieval, sociology in user studies, and publishing (business) in dissemination, whereas information science studies all of these individual areas and the interactions between them.

The organization of information and information resources is one of the fundamental aspects of LIS and is an example of both LIS's uniqueness and its multidisciplinary origins. Some of the main tools used by LIS to provide access to the digital resources of modern times (particularly theory relating to indexing and classification) originated in the 19th century to assist humanity's effort to make its intellectual output accessible by recording, identifying, and providing bibliographic control of printed knowledge. The origins of some of these tools are even earlier. For example, in the 17th century, during the 'golden age of libraries', publishers and sellers seeking to take advantage of the burgeoning book trade developed descriptive catalogs of their wares for distribution, a practice that was adopted and further extended by many libraries of the time to cover areas like philosophy, the sciences, linguistics, medicine, etc. In this way, a business concern of publishers, keeping track of and advertising inventory, was developed into a system for organizing and preserving information by the library.

The development of metadata is another area that exemplifies the aim of LIS to be something more than a mishmash of several disciplines, the uniqueness Bawden and Robinson describe. Pre-Internet classification and cataloging systems were mainly concerned with two objectives: 1. to provide rich bibliographic descriptions and relations between information objects, and 2. to facilitate sharing of this bibliographic information across library boundaries. The development of the Internet and the information explosion that followed found many communities needing mechanisms for the description, authentication, and management of their information. These communities developed taxonomies and controlled vocabularies to describe their knowledge, as well as unique information architectures to communicate these classifications, and libraries found themselves acting as liaisons or translators between these metadata systems. Of course, the concerns of cataloging in the Internet era have gone beyond simple bibliographic descriptions. The need for descriptive information about the ownership and copyright of a digital product (a publishing concern) and descriptions of the different formats and accessibility features of a resource (a sociological concern) show the continued development and cross-disciplinary necessity of resource description.
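A rich bibliographic description of the kind discussed above can be sketched as a simple record keyed by Dublin Core element names (a real, widely used descriptive-metadata vocabulary); the values here are invented for illustration, and the "dc:"/"dcterms:" prefixes follow common convention.

```python
# A minimal descriptive-metadata record using Dublin Core element names.
# Descriptive elements support discovery; the rights element carries
# legal metadata; the provenance term is administrative in character.
record = {
    "dc:title": "Advis pour dresser une bibliothèque",
    "dc:creator": "Naudé, Gabriel",
    "dc:date": "1627",
    "dc:language": "fr",
    "dc:type": "Text",
    "dc:rights": "Public domain",
}

# Administrative metadata can live alongside the descriptive elements.
record["dcterms:provenance"] = "Digitized from a print original"

print(record["dc:creator"])  # Naudé, Gabriel
```

Because the element names are standardized, two institutions exchanging such records can interpret each other's fields without negotiating a shared schema first, which is exactly the cross-boundary sharing objective described above.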

In the 21st century, the usage of open data, open source, and open protocols like OAI-PMH has allowed thousands of libraries and institutions to collaborate on the production of global metadata services previously offered only by increasingly expensive commercial proprietary products. Examples include BASE and Unpaywall, which automate the search for an academic paper across thousands of repositories by libraries and research institutions.
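The OAI-PMH protocol mentioned above is simple enough to sketch: a harvester issues plain HTTP GET requests with a handful of standard parameters. The `ListRecords` verb and the mandatory `oai_dc` (unqualified Dublin Core) metadata prefix are defined by the protocol itself; the repository base URL below is a placeholder.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; real repositories publish their own.
BASE_URL = "https://repository.example.org/oai"

def list_records_url(base: str, from_date: str, until_date: str) -> str:
    """Build an OAI-PMH ListRecords request for selective harvesting."""
    params = {
        "verb": "ListRecords",        # harvest full metadata records
        "metadataPrefix": "oai_dc",   # unqualified Dublin Core, supported by all repositories
        "from": from_date,            # restrict by record datestamp (inclusive)
        "until": until_date,
    }
    return f"{base}?{urlencode(params)}"

print(list_records_url(BASE_URL, "2023-01-01", "2023-12-31"))
```

A harvester like BASE aggregates metadata by issuing such requests against many repositories and merging the returned Dublin Core records into one searchable index.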

Christopher M. Owusu-Ansah argued that many African universities have employed distance education to expand access to education, and that digital libraries can ensure seamless access to information for distance learners.

LIS theories

Julian Warner (2010, p. 4–5) suggests that

Two paradigms, the cognitive and the physical, have been distinguished in information retrieval research, but they share the assumption of the value of delivering relevant records (Ellis 1984, 19; Belkin and Vickery 1985, 114). For the purpose of discussion here, they can be considered a single heterogeneous paradigm, linked but not united by this common assumption. The value placed on query transformation is dissonant with common practice, where users may prefer to explore an area and may value fully informed exploration. Some dissenting research discussions have been more congruent with practice, advocating explorative capability—the ability to explore and make discriminations between representations of objects—as the fundamental design principle for information retrieval systems.

Among other approaches, Evidence Based Library and Information Practice should also be mentioned.

Theory and practice

Many practicing librarians do not contribute to LIS scholarship but focus on daily operations within their own libraries or library systems. Other practicing librarians, particularly in academic libraries, do perform original scholarly LIS research and contribute to the academic end of the field.

Whether or not individual professional librarians contribute to scholarly research and publication, many are involved with and contribute to the advancement of the profession and of library science through local, state, regional, national, and international library or information organizations.

Library science is very closely related to issues of knowledge organization; however, the latter is a broader term that covers how knowledge is represented and stored (computer science/linguistics), how it might be automatically processed (artificial intelligence), and how it is organized outside the library in global systems such as the internet. In addition, library science typically refers to a specific community engaged in managing holdings as they are found in university and government libraries, while knowledge organization, in general, refers to this and also to other communities (such as publishers) and other systems (such as the Internet). The library system is thus one socio-technical structure for knowledge organization.

The terms information organization and knowledge organization are often used synonymously. The fundamentals of their study (particularly theory relating to indexing and classification) and many of the main tools used by the disciplines in modern times to provide access to digital resources (abstracting, metadata, resource description, systematic and alphabetic subject description, and terminology) originated in the 19th century and were developed, in part, to assist in making humanity's intellectual output accessible by recording, identifying, and providing bibliographic control of printed knowledge.

Information has been published that analyses the relations between the philosophy of information (PI), library and information science (LIS), and social epistemology (SE).

Ethics

Practicing library professionals and members of the American Library Association recognize and abide by the ALA Code of Ethics. According to the American Library Association, "In a political system grounded in an informed citizenry, we are members of a profession explicitly committed to intellectual freedom and freedom of access to information. We have a special obligation to ensure the free flow of information and ideas to present and future generations." The ALA Code of Ethics was adopted in the winter of 1939, and updated on June 29, 2021.

Education and training

Academic courses in library science include collection management, information systems and technology, research methods, information literacy, cataloging and classification, preservation, reference, statistics and management. Library science is constantly evolving, incorporating new topics like database management, information architecture and information management, among others. With the mounting acceptance of Wikipedia as a valued and reliable reference source, many libraries, museums, and archives have introduced the role of Wikipedian in residence. As a result, some universities are including coursework relating to Wikipedia and Knowledge Management in their MLIS programs.

Most schools in the US only offer a master's degree in library science or an MLIS and do not offer an undergraduate degree in the subject. About fifty schools have this graduate program, and seven are still being ranked. Many have online programs, which makes attending more convenient if the college is not in a student's immediate vicinity. According to US News' online journal, the University of Illinois is at the top of the list of best MLIS programs provided by universities. Second is the University of North Carolina and third is the University of Washington.

Most professional library jobs require a professional post-baccalaureate degree in library science or one of its equivalent terms. In the United States and Canada the certification usually comes from a master's degree granted by an ALA-accredited institution, so even non-scholarly librarians have an original academic background. In the United Kingdom, however, there have been moves to broaden the entry requirements to professional library posts, such that qualifications in, or experience of, a number of other disciplines have become more acceptable. In Australia, a number of institutions offer degrees accepted by the ALIA (Australian Library and Information Association). Global standards of accreditation or certification in librarianship have yet to be developed.

In academic regalia in the United States, the color for library science is lemon.

The Master of Library and Information Science (MLIS) is the master's degree that is required for most professional librarian positions in the United States and Canada. The MLIS is a relatively recent degree; older and still common degree designations for librarians are the Master of Library Science (MLS) and the Master of Science in Library Science (MSLS). According to the American Library Association (ALA), "The master's degree in library and information studies is frequently referred to as the MLS; however, ALA-accredited degrees have various names such as Master of Arts, Master of Librarianship, Master of Library and Information Studies, or Master of Science. The degree name is determined by the program. The [ALA] Committee for Accreditation evaluates programs based on their adherence to the Standards for Accreditation of Master's Programs in Library and Information Studies, not based on the name of the degree."

Employment outlook and opportunities

According to U.S. News & World Report, library and information science ranked as one of the "Best Careers of 2008". The median annual salary for 2020 was reported by the U.S. Bureau of Labor Statistics as $60,820 in the United States. Additional salary breakdowns available by metropolitan area show that the San Jose-Sunnyvale-Santa Clara metropolitan area has the highest average salary at $86,380. In September 2021, the BLS projected growth for the field "to grow 9 percent from 2020 to 2030", which is "about as fast as the average for all occupations". The 2010–2011 Occupational Outlook Handbook states, "Workers in this occupation tend to be older than workers in the rest of the economy. As a result, there may be more workers retiring from this occupation than other occupations. However, relatively large numbers of graduates from MLS programs may cause competition in some areas and for some jobs."

Types of librarianship

Public

The study of librarianship for public libraries covers issues such as cataloging; collection development for a diverse community; information literacy; readers' advisory; community standards; public services-focused librarianship; serving a diverse community of adults, children, and teens; intellectual freedom; censorship; and legal and budgeting issues. The public library as a commons or public sphere based on the work of Jürgen Habermas has become a central metaphor in the 21st century.

Most people are familiar with municipal public libraries, but there are, in fact, four different types of public libraries: association libraries, municipal public libraries, school district libraries, and special district public libraries. It is important to be able to distinguish among the four. Each receives its funding through different sources, each is established by a different set of voters, and not all are subject to municipal civil service governance.

School

The study of school librarianship covers library services for children in primary through secondary school. In some regions, the local government may have stricter standards for the education and certification of school librarians (who are often considered a special case of teacher), than for other librarians, and the educational program will include those local criteria. School librarianship may also include issues of intellectual freedom, pedagogy, information literacy, and how to build a cooperative curriculum with the teaching staff.

Academic

The study of academic librarianship covers library services for colleges and universities. Issues of special importance to the field may include copyright; technology, digital libraries, and digital repositories; academic freedom; open access to scholarly works; as well as specialized knowledge of subject areas important to the institution and the relevant reference works. Librarians often divide their focus individually, serving as liaisons to particular schools within a college or university.

Some academic librarians are considered faculty, and hold similar academic ranks to those of professors, while others are not. In either case, the minimal qualification is a Master of Arts in Library Studies or a Master of Arts in Library Science. Some academic libraries may only require a master's degree in a specific academic field or a related field, such as educational technology.

Archival

The study of archives includes the training of archivists, librarians specially trained to maintain and build archives of records intended for historical preservation. Special issues include physical preservation, conservation, and restoration of materials and mass deacidification; specialist catalogs; solo work; access; and appraisal. Many archivists are also trained historians specializing in the period covered by the archive.

The archival mission includes three major goals: To identify papers and records with enduring value, preserve the identified papers, and make the papers available to others.

There are significant differences between libraries and archives, including differences in collections, records creation, item acquisition, and preferred behavior in the institution. The major difference in collections is that library collections typically comprise published items (books, magazines, etc.), while archival collections are usually unpublished works (letters, diaries, etc.). In managing their collections, libraries will categorize items individually, but archival items never stand alone. An archival record gains meaning and importance from its relationship to the entire collection; therefore archival items are usually received by the archive in a group or batch. Library collections are created by many individuals, as each author and illustrator creates their own publication; in contrast, an archive usually collects the records of one person, family, institution, or organization, so archival items have fewer authorial sources.

Another difference between a library and an archive is that library materials are created intentionally by authors or others: a writer chooses to write and publish a book, for example. Archival materials are not created with an archive in mind. Instead, the items in an archive are what remains after a business, institution, or person conducts their normal activities. The letters, documents, receipts, ledger books, etc. were created to carry out daily tasks, not to populate a future archive.

As for item acquisition, libraries receive items individually, but archival items will usually become part of the archive's collection as a cohesive group.

Behavior in an archive differs from behavior in a library, as well. In most libraries, patrons are allowed and encouraged to browse the stacks, because the books are openly available to the public. Archival items almost never circulate, and someone interested in viewing documents must request them of the archivist and may only view them in a closed reading room. Those who wish to visit an archive will usually begin with an entrance interview. This is an opportunity for the archivist to register the researcher, confirm their identity, and determine their research needs. This is also the opportune time for the archivist to review reading room rules, which vary but typically include policies on privacy, photocopying, the use of finding aids, and restrictions on food, drinks, and other activities or items that could damage the archival materials.

Special

Special libraries are libraries established to meet the highly specialized requirements of professional or business groups. A library may be considered special because of its specialized collection, its subject focus, its particular group of users, or the type of its parent organization. Libraries that serve only a particular group of users, such as lawyers, doctors, or nurses, are called professional libraries. Special librarianship encompasses almost every other form of librarianship as well, including service in medical libraries (and hospitals or medical schools), corporations, news agencies, government organizations, and other special collections. The issues at these libraries are specific to their industries but may include solo work, corporate financing, specialized collection development, and extensive self-promotion to potential patrons. Special librarians have their own professional organization, the Special Libraries Association (SLA).

The library of the National Center for Atmospheric Research (NCAR) is considered a special library. Its mission is to support, preserve, make accessible, and collaborate in the scholarly research and educational outreach activities of UCAR/NCAR.

Another is the Federal Bureau of Investigation Library. According to its website, "The FBI Library supports the FBI in its statutory mission to uphold the law through the investigation of violations of federal criminal law; to protect the United States from foreign intelligence and terrorist activities; and to provide leadership and law enforcement assistance to federal, state, local, and international agencies."

A further example is the classified CIA Library, a resource for employees of the Central Intelligence Agency that contains over 125,000 written materials, subscribes to around 1,700 periodicals, and holds collections in three areas: Historical Intelligence, Circulating, and Reference. In February 1997, three librarians working at the institution spoke to Information Outlook, a publication of the SLA, discussing the library's creation in 1947, its importance in disseminating information to employees despite a small staff, and how it organizes its materials. In May 2021, an unnamed gay librarian at the agency appeared in a recruitment video.

Preservation

Preservation librarians most often work in academic libraries. Their focus is on the management of preservation activities that seek to maintain access to content within books, manuscripts, archival materials, and other library resources. Examples of activities managed by preservation librarians include binding, conservation, digital and analog reformatting, digital preservation, and environmental monitoring.

History

17th century

Portrait of Gabriel Naudé, author of Advis pour dresser une bibliothèque (1627), later translated into English in 1661

The earliest text on "library operations", Advice on Establishing a Library was published in 1627 by French librarian and scholar Gabriel Naudé. Naudé wrote prolifically, producing works on many subjects including politics, religion, history, and the supernatural. He put into practice all the ideas put forth in Advice when given the opportunity to build and maintain the library of Cardinal Jules Mazarin.

19th century


Martin Schrettinger wrote the second textbook (the first in Germany) on the subject from 1808 to 1829.

Thomas Jefferson, whose library at Monticello consisted of thousands of books, devised a classification system inspired by the Baconian method, which grouped books more or less by subject rather than alphabetically, as had previously been done.

The Jefferson collection provided the start of what became the Library of Congress.

The first American school of librarianship opened at Columbia University on January 5, 1887, as the School of Library Economy, under the leadership of Melvil Dewey, noted for his 1876 decimal classification. The term library economy was common in the U.S. until 1942, with the term library science predominant through much of the 20th century. Key events are described in "History of American Library Science: Its Origins and Early Development."

20th century

Later, the term was used in the title of S. R. Ranganathan's The Five Laws of Library Science, published in 1931, and in the title of Lee Pierce Butler's 1933 book, An Introduction to Library Science (University of Chicago Press).

S. R. Ranganathan conceived the five laws of library science and the development of the first major analytical-synthetic classification system, the colon classification.

In the United States, Lee Pierce Butler's new approach advocated research using quantitative methods and ideas in the social sciences with the aim of using librarianship to address society's information needs. He was one of the first faculty at the University of Chicago Graduate Library School, which changed the structure and focus of education for librarianship in the twentieth century. This research agenda went against the more procedure-based approach of the "library economy," which was mostly confined to practical problems in the administration of libraries.

William Stetson Merrill's A Code for Classifiers, released in several editions from 1914 to 1939, is an example of a more pragmatic approach, in which arguments stemming from in-depth knowledge of each field of study are employed to recommend a system of classification. While Ranganathan's approach was philosophical, it was also tied more to the day-to-day business of running a library. A reworking of Ranganathan's laws was published in 1995 that removes the constant references to books. Michael Gorman's Our Enduring Values: Librarianship in the 21st Century outlines eight principles necessary for library professionals and incorporates knowledge and information in all their forms, allowing digital information to be considered.

In the English-speaking world the term "library science" seems to have been used for the first time in India in the 1916 book Punjab Library Primer, written by Asa Don Dickinson and published by the University of Punjab, Lahore, Pakistan. This university was the first in Asia to begin teaching "library science". The Punjab Library Primer was the first textbook on library science published in English anywhere in the world. The first textbook in the United States was the Manual of Library Economy by James Duff Brown, published in 1903. In 1923, C. C. Williamson, who was appointed by the Carnegie Corporation, published an assessment of library science education entitled "The Williamson Report," which recommended that universities provide library science training. This report had a significant impact on library science training and education. Library research and practical work in the area of information science have remained largely distinct, both in training and in research interests.

From Library Science to LIS

By the late 1960s, mainly due to the meteoric rise of computing power and the new academic disciplines that formed around it, academic institutions began to add the term "information science" to their names. The first school to do this was at the University of Pittsburgh in 1964. More schools followed during the 1970s and 1980s, and by the 1990s almost all library schools in the USA had added information science to their names. Although there are exceptions, similar developments have taken place in other parts of the world. In Denmark, for example, the 'Royal School of Librarianship' changed its English name to The Royal School of Library and Information Science in 1997.

21st century

The digital age has transformed how information is accessed and retrieved. "The library is now a part of a complex and dynamic educational, recreational, and informational infrastructure." Mobile devices and applications with wireless networking, high-speed computers and networks, and cloud computing have deeply impacted and developed information science and information services. The evolution of the library sciences maintains its mission of access equity and community space, while adding new means of information retrieval that demand information literacy skills. All catalogs, databases, and a growing number of books are available on the Internet. In addition, expanding free access to open-source journals and sources such as Wikipedia has fundamentally impacted how information is accessed. Information literacy is the ability to "determine the extent of information needed, access the needed information effectively and efficiently, evaluate information and its sources critically, incorporate selected information into one's knowledge base, use information effectively to accomplish a specific purpose, and understand the economic, legal, and social issues surrounding the use of information, and access and use information ethically and legally."

Journals

(See also the List of LIS Journals in India page, Category:Library science journals, and Journal Citation Reports for listings ranked by impact factor.)

Some core journals in LIS are:

Important bibliographical databases in LIS include, among others, the Social Sciences Citation Index and Library and Information Science Abstracts.

Conferences

This is a list of some of the major conferences in the field.

Common subfields

An advertisement for a full Professor in information science at the Royal School of Library and Information Science, spring 2011, provides one view of which subdisciplines are well-established: "The research and teaching/supervision must be within some (and at least one) of these well-established information science areas [...]"

There are other ways to identify subfields within LIS, for example bibliometric mapping and comparative studies of curricula. Bibliometric maps of LIS have been produced by, among others, Vickery & Vickery (1987, frontispiece), White & McCain (1998), Åström (2002, 2006), and Hassan-Montero & Herrero-Solana (2007). An example of a curriculum study is Kajberg & Lørring (2005), which reports the following data (p. 234): "Degree of overlap of the ten curricular themes with subject areas in the current curricula of responding LIS schools [...]"

There is often an overlap between these subfields of LIS and other fields of study. Most information retrieval research, for example, belongs to computer science. Knowledge management is considered a subfield of management or organizational studies.
