Communication studies (or communication science) is an academic discipline that deals with processes of human communication and behavior, patterns of communication in interpersonal relationships, social interactions and communication in different cultures. Communication is commonly defined as giving, receiving or exchanging ideas, information, signals or messages through appropriate media, enabling individuals or groups to persuade, to seek information, to give information or to express emotions effectively. Communication studies is a social science that uses various methods of empirical investigation and critical analysis to develop a body of knowledge that encompasses a range of topics, from face-to-face conversation at a level of individual agency and interaction to social and cultural communication systems at a macro level.
Scholarly communication theorists focus primarily on refining the theoretical understanding of communication, examining statistics to help substantiate claims. The range of social scientific methods to study communication has been expanding. Communication researchers draw upon a variety of qualitative and quantitative techniques. The linguistic and cultural turns of the mid-20th century led to increasingly interpretative, hermeneutic, and philosophic approaches towards the analysis of communication. Conversely, the end of the 1990s and the beginning of the 2000s have
seen the rise of new analytically, mathematically, and computationally
focused techniques.
Communication, a natural human behavior, became a topic of study in the 20th century. As communication technologies developed, so did the serious study of
communication. During this time, a renewed interest in the study of rhetoric,
including persuasion and public address, emerged, ultimately laying the
foundation for several forms of communication studies we know today. The focus of communication studies developed further in the 20th
century, eventually including means of communication such as mass
communication, interpersonal communication, and oral interpretation. When World War I
ended, the interest in studying communication intensified. The methods
of communication used during the war challenged many people's beliefs
about the limits of war that existed before these events. During this
period, innovations were invented that no one had ever seen before, such
as aircraft, telephones, and throat microphones. However, newly discovered ways of communicating, especially the use of Morse code through portable Morse code machines, helped troops communicate at a much faster pace than ever before. This in turn sparked ideas for even more advanced means of communication that would later be created and discovered.
The social sciences were fully recognized as legitimate disciplines after World War II. Before being established as its own discipline, communication studies was formed from three other major fields: psychology, sociology, and political science. Communication studies focuses on communication as central to the human experience, which involves understanding how people behave in creating, exchanging, and interpreting messages. Today, the discipline also encompasses more modern areas of study, such as gender and communication, intercultural communication, political communication, health communication, and organizational communication.
Wilbur Schramm is considered the founder of the field of communication studies in the United States. Schramm was hugely influential in establishing communication as a field
of study and in forming departments of communication studies across
universities in the United States. He was the first individual to identify himself as a communication
scholar; he created the first academic degree-granting programs with
communication in their name; and he trained the first generation of
communication scholars. Schramm had a background in English literature and developed
communication studies partly by merging existing programs in speech
communication, rhetoric, and journalism. He also edited a textbook, The Process and Effects of Mass Communication (1954), that helped define the field, partly by claiming Paul Lazarsfeld, Harold Lasswell, Carl Hovland, and Kurt Lewin as its founding fathers.
Schramm established three important communication institutes: the Institute of Communications Research (University of Illinois at Urbana-Champaign), the Institute for Communication Research (Stanford University), and the East-West Communication Institute (Honolulu). The patterns of scholarly work in communication studies that were set in motion at these institutes continue to this day. Many of Schramm's students, such as Everett Rogers and David Berlo went on to make important contributions of their own.
Associations related to Communication Studies were founded or
expanded during the 1950s. The National Society for the Study of
Communication (NSSC) was founded in 1950 to encourage scholars to pursue
communication research as a social science. This Association launched the Journal of Communication
in the same year as its founding. Like many communication associations
founded in this decade, the association's name changed as the field
evolved. In 1968 the name changed to the International Communication Association (ICA).
In the United States
Undergraduate curricula aim to prepare students to interrogate the
nature of communication in society and the development of communication
as a specific field.
Many colleges in the United States offer a variety of majors
within communication studies, including programs of study in the areas
mentioned above. Communication studies is often perceived by many in
society as primarily centered on the media arts; however, graduates of
communication studies can pursue careers in areas ranging from media
arts to public advocacy, marketing, and non-profit organizations.
In Canada
With the early influence of federal institutional inquiries, notably the 1951 Massey Commission, which "investigated the overall state of culture in Canada", the study of communication in Canada has frequently focused on the
development of a cohesive national culture, and on infrastructural
empires of social and material circulation. Although influenced by the
American Communication tradition and British Cultural Studies, Communication studies in Canada has been more directly oriented toward the state and the policy apparatus, for example the Canadian Radio-television and Telecommunications Commission. Influential thinkers from the Canadian communication tradition include Harold Innis, Marshall McLuhan, Florian Sauvageau, Gertrude Robinson, Marc Raboy, Dallas Smythe, James R. Taylor, François Cooren, Gail Guthrie Valaskakis and George Grant.
Communication studies within Canada is a relatively new discipline; however, there are programs and departments to support and teach this topic in about 13 Canadian universities and many colleges as well. Communication et information, from Laval, and the Canadian Journal of Communication, from McGill University in Montréal, are two journals in the field published in Canada. There are also organizations and associations, both national and in Québec, that serve the specific interests of these academics. These journals and associations include representatives from the communication industry, the government, and the public as a whole.
Recent critiques have focused on the homogeneity of communication scholarship. For example, Chakravartty, et al. (2018) find that white scholars comprise the vast majority of publications, citations, and editorial positions. From a post-colonial perspective, this state is problematic because communication studies engage with a wide range of social justice concerns.
Business communication emerged as a field of study in the late 20th
century, due to the centrality of communication within business
relationships. The scope of the field is difficult to define because
communication is used in various ways among employers, employees,
consumers, and brands. Because of this, the focus of the field is usually placed on the
demands of employers, as reflected in the revision of the American Assembly of Collegiate Schools of Business standards to emphasize written and oral communication as an important characteristic in the curriculum. Business communication studies therefore revolve around the ever-changing aspects of written and oral communication directly related to the field of business. The implementation of modern business communication curricula is enhancing the study of business communication as a whole, while better preparing students to communicate effectively in the business community.
Health communication
is a multidisciplinary field that applies "communication evidence,
strategy, theory, and creativity" to advance the well-being of people
and populations. The term was first coined in 1975 by the International
Communication Association and, in 1997, health communication was
officially recognized in the broader fields of Public Health Education
and Health Promotion by the American Public Health Association. The discipline integrates components of various theories and models,
with a focus on social marketing. It uses marketing to develop
"activities and interventions designed to positively change behaviors." This emergence affected several dynamics of the healthcare system. It
raised awareness of various avenues, including promotional activities
and communication with health professionals' employees, patients, and
constituents. "Efforts to create marketing-oriented organizations called
for the widespread dissemination of information", putting a spotlight
on theories of "communication, the communication process, and the
techniques that were being utilized to communicate in other settings." Now, health care organizations of all types are using tools such as social media. "Uses include communicating with the community and
patients; enhancing organizational visibility; marketing products and
services; establishing a venue for acquiring news about activities,
promotions, and fund-raising; providing a channel for patient resources
and education; and providing customer service and support."
A digital library, also referred to as an online library, an internet library, a digital repository, a library without walls, or a digital collection, is an online database of digital resources that can include text, still images, audio, video, digital documents, or other digital media formats or a library accessible through the internet. Objects can consist of digitized content, such as print or photographs, as well as originally produced digital content, including word processor files or social media posts. In addition to storing content, digital libraries provide mechanisms for the organization, searching, and retrieving
of content from the collection. Digital libraries can vary immensely in
size and scope, and can be maintained by individuals or organizations. The digital content may be stored locally, or accessed remotely via computer networks. These information retrieval systems are able to exchange information with each other through interoperability and sustainability.
History
The early history of digital libraries is not well documented, but
several key thinkers are connected to the emergence of the concept. Predecessors include Paul Otlet and Henri La Fontaine's Mundaneum,
an attempt begun in 1895 to gather and systematically catalogue the
world's knowledge, with the hope of bringing about world peace. The visions of the digital library were largely realized a century later during the great expansion of the Internet.[5]
Vannevar Bush and J. C. R. Licklider
were two pioneers who advanced this idea into contemporary technology.
Bush had supported research that led to the bomb that was dropped on Hiroshima.
After seeing the city's destruction, he wanted to create a machine that
would show how technology can lead to understanding instead of
destruction. This machine would include a desk with two screens,
switches and buttons, and a keyboard. He named this the "Memex". With it, individuals would be able to access stored books and files at great speed. In 1956, the Ford Foundation funded Licklider to analyze how libraries could be improved with technology. Almost a decade later, his book Libraries of the Future set out his vision. He wanted to create a system that would use
computers and networks, thereby ensuring the accessibility of human
knowledge for human needs and ensuring automatic feedback for machine
purposes. This system contained three components: the corpus of
knowledge, the question, and the answer. Licklider called it a
procognitive system.
Early projects centered on the creation of an electronic card catalogue known as Online Public Access Catalog (OPAC). By the 1980s, the success of these endeavors resulted in OPAC replacing the traditional card catalog
in many academic, public and special libraries. This permitted
libraries to undertake additional rewarding co-operative efforts to
support resource sharing and expand access to library materials beyond
an individual library.
An early example of a digital library is the Education Resources Information Center (ERIC), a database of education citations, abstracts and texts that was created in 1964 and made available online through DIALOG in 1969.
Early attempts at creating a model for digital libraries included the DELOS Digital Library Reference Model and the 5S Framework.
Terminology
The term digital library was first popularized by the NSF/DARPA/NASA Digital Libraries Initiative in 1994. With the availability of computer networks, information resources are expected to remain distributed and be accessed as needed,
whereas in Vannevar Bush's essay As We May Think (1945) they were to be collected and kept within the researcher's Memex.
The term virtual library was initially used interchangeably with digital library,
but is now primarily used for libraries that are virtual in other
senses (such as libraries which aggregate distributed content). In the
early days of digital libraries, there was discussion of the
similarities and differences among the terms digital, virtual, and electronic.
A distinction is often made between content that was created in a digital format, known as born-digital, and information that has been converted from a physical medium, e.g. paper, through digitization. Not all electronic content is in digital data format. The term hybrid library is sometimes used for libraries that have both physical collections and electronic collections. For example, American Memory is a digital library within the Library of Congress.
Some important digital libraries also serve as long term archives, such as arXiv and the Internet Archive. Others, such as the Digital Public Library of America, seek to make digital information from various institutions widely accessible online.
Many academic libraries are actively involved in building
repositories of their institution's books, papers, theses, and other
works that can be digitized or were 'born digital'. Many of these
repositories are made available to the general public with few
restrictions, in accordance with the goals of open access,
in contrast to the publication of research in commercial journals,
where the publishers usually limit access rights. Irrespective of access
rights, institutional, truly free, and corporate repositories can be
referred to as digital libraries. Institutional repository software is designed for archiving, organizing, and searching a library's content. Popular open-source solutions include DSpace, Greenstone Digital Library (GSDL), EPrints, Digital Commons, and the Fedora Commons-based systems Islandora and Samvera.
National library collections
Legal deposit is often covered by copyright
legislation and sometimes by laws specific to legal deposit, and
requires that one or more copies of all material published in a country
should be submitted for preservation in an institution, typically the national library. Since the advent of electronic documents, legislation has had to be amended to cover the new formats, such as the 2016 amendment to the Copyright Act 1968 in Australia.
Since then various types of electronic depositories have been built. The British Library's Publisher Submission Portal and the German model at the Deutsche Nationalbibliothek
have one deposit point for a network of libraries, but public access is
only available in the reading rooms in the libraries. The Australian National edeposit system has the same features, but also allows for remote access by the general public for most of the content.
Digital archives
Physical archives differ from physical libraries in several ways. Traditionally, archives are defined as:
Containing primary sources
of information (typically letters and papers directly produced by an
individual or organization) rather than the secondary sources found in a
library (books, periodicals, etc.).
Having their contents organized in groups rather than individual items.
Having unique contents.
The technology used to create digital libraries is even more
revolutionary for archives since it breaks down the second and third of
these general rules. In other words, "digital archives" or "online
archives" will still generally contain primary sources, but they are
likely to be described individually rather than (or in addition to) in
groups or collections. Further, because they are digital, their contents
are easily reproducible and may indeed have been reproduced from
elsewhere. The Oxford Text Archive is generally considered to be the oldest digital archive of academic physical primary source materials.
Archives differ from libraries in the nature of the materials
held. Libraries collect individual published books and serials, or
bounded sets of individual items. The books and journals held by
libraries are not unique, since multiple copies exist and any given copy
will generally prove as satisfactory as any other copy. The material in
archives and manuscript libraries is "the unique records of corporate
bodies and the papers of individuals and families".
A fundamental characteristic of archives is that they have to
keep the context in which their records have been created and the
network of relationships between them in order to preserve their
informative content and provide understandable and useful information
over time. The fundamental characteristic of archives resides in their
hierarchical organization expressing the context by means of the archival bond.
Archival descriptions are the fundamental means to describe,
understand, retrieve and access archival material. At the digital level,
archival descriptions are usually encoded by means of the Encoded Archival Description
XML format. The EAD is a standardized electronic representation of
archival description which makes it possible to provide union access to
detailed archival descriptions and resources in repositories distributed
throughout the world.
Given the importance of archives, a dedicated formal model, called NEsted SeTs for Object Hierarchies (NESTOR), built around their peculiar constituents, has been defined. NESTOR is
based on the idea of expressing the hierarchical relationships between
objects through the inclusion property between sets, in contrast to the
binary relation between nodes exploited by the tree. NESTOR has been
used to formally extend the 5S model to define a digital archive as a
specific case of digital library able to take into consideration the
peculiar features of archives.
In-game book libraries are virtual collections of written works that players can read, share, or interact with inside a video game. Unlike programming libraries, these libraries function as narrative or educational spaces, often mirroring real-world libraries and sometimes providing access to texts that are censored or unavailable in the player's region. The most prominent example is The Uncensored Library, a Minecraft map that distributes banned books worldwide. Another example is NaNa-Library, known for its novels and designed to give exposure to lesser-known writers.
Features of digital libraries
The advantages of digital libraries as a means of easily and rapidly
accessing books, archives and images of various types are now widely
recognized by commercial interests and public bodies alike.
Traditional libraries are limited by storage space; digital
libraries have the potential to store much more information, simply
because digital information requires very little physical space to
contain it. As such, the cost of maintaining a digital library can be much lower
than that of a traditional library. A physical library must spend large
sums of money paying for staff, book maintenance, rent, and additional
books. Digital libraries may reduce or, in some instances, do away with
these fees. Both types of library require cataloging input to allow
users to locate and retrieve material. Digital libraries may be more
willing to adopt innovations in technology providing users with
improvements in electronic and audio book technology as well as
presenting new forms of communication such as wikis and blogs;
conventional libraries may consider that providing online access to
their OPAC catalog is sufficient. An important advantage of digital conversion is increased accessibility to users. Digital libraries also increase availability to individuals who may not be traditional patrons of a library, due to geographic location or organizational affiliation.
No physical boundary: The user of a digital library need not go to the library physically; people from all over the world can gain
access to the same information, as long as an Internet connection is
available.
Round the clock availability: A major advantage of digital libraries is that people can gain access 24/7 to the information.
Multiple access: The same resources can be used simultaneously by a number of institutions and patrons. This may not be the case for copyrighted material: a library may have a license for "lending out" only one copy at a time; this is achieved with a system of digital rights management where a resource can become inaccessible after expiration of the lending period or after the lender chooses to make it inaccessible (equivalent to returning the resource). A minimal sketch of this lending model appears after this list.
Information retrieval: The user is able to use any search term (word, phrase, title, name, subject) to search the entire collection. Digital libraries can provide very user-friendly interfaces, giving clickable access to their resources.
Preservation and conservation: Digitization is not a long-term
preservation solution for physical collections, but does succeed in
providing access copies for materials that would otherwise fall to
degradation from repeated use. Digitized collections and born-digital
objects pose many preservation and conservation concerns that analog
materials do not. See § Digital preservation for examples.
Space: Whereas traditional libraries are limited by storage space,
digital libraries have the potential to store much more information,
simply because digital information requires very little physical space
to contain it, and media storage technologies are more affordable than
ever before.
Added value: Certain characteristics of objects, primarily the
quality of images, may be improved. Digitization can enhance legibility
and remove visible flaws such as stains and discoloration.
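The one-copy-at-a-time lending model mentioned under "Multiple access" above can be sketched roughly as follows; this is a toy illustration of a loan pool with expiring loans, not a real digital rights management system, which would also have to enforce inaccessibility on the patron's device.

```python
# Toy sketch of licensed digital lending: at most `licensed_copies` loans at once.
from datetime import datetime, timedelta

class LendingPool:
    def __init__(self, licensed_copies=1, loan_days=14):
        self.licensed_copies = licensed_copies
        self.loan_days = loan_days
        self.loans = {}                                   # patron -> loan expiry time

    def _drop_expired(self):
        now = datetime.utcnow()
        self.loans = {p: t for p, t in self.loans.items() if t > now}

    def borrow(self, patron):
        self._drop_expired()                              # expired loans free up copies
        if len(self.loans) >= self.licensed_copies:
            return None                                   # every licensed copy is checked out
        expiry = datetime.utcnow() + timedelta(days=self.loan_days)
        self.loans[patron] = expiry
        return expiry

    def check_in(self, patron):
        self.loans.pop(patron, None)                      # equivalent to returning the resource

pool = LendingPool()
print(pool.borrow("alice"))   # granted, with an expiry date
print(pool.borrow("bob"))     # None: the single licensed copy is already on loan
```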
Digital libraries use a variety of software packages, including some tailored for children's educational games. Institutional repository software focuses primarily on the ingest, preservation, and access of locally produced documents, particularly locally produced academic outputs.
This software may be proprietary, as is the case with the Library of
Congress which uses Digiboard and CTS to manage digital content.
The design and implementation of digital libraries are constructed so that computer systems and software can make use of the information when it is exchanged. These are referred to as semantic digital libraries. Semantic libraries are also used to socialize with different communities from a mass of social networks. DjDL is a type of semantic digital library. Keyword-based and semantic search are the two main types of searches. The semantic search provides a tool that creates a group for the augmentation and refinement of keyword-based search. Conceptual knowledge used in DjDL is centered around two forms: the subject ontology and the set of concept search patterns based on the ontology. The three types of ontologies associated with this search are bibliographic ontologies, community-aware ontologies, and subject ontologies.
Metadata
In traditional libraries, the ability to find works of interest is
directly related to how well they were cataloged. While cataloging
electronic works digitized from a library's existing holdings may be as
simple as copying or moving a record from the print to the electronic
form, complex and born-digital works require substantially more effort.
To handle the growing volume of electronic publications, new tools and
technologies have to be designed to allow effective automated semantic
classification and searching. While full-text search can be used for some items, there are many common catalog searches which cannot be performed using full text, including:
finding texts which are translations of other texts
differentiating between editions/volumes of a text/periodical
Most digital libraries provide a search interface which allows resources to be found. These resources are typically deep web (or invisible web) resources since they frequently cannot be located by search engine crawlers. Some digital libraries create special pages or sitemaps to allow search engines to find all their resources. Digital libraries frequently use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their metadata to other digital libraries, and search engines like Google Scholar, Yahoo! and Scirus can also use OAI-PMH to find these deep web resources. As with physical libraries, relatively little is known about how users actually select books.
There are two general strategies for searching a federation of digital libraries: distributed searching and searching previously harvested metadata.
Distributed searching typically involves a client sending
multiple search requests in parallel to a number of servers in the
federation. The results are gathered, duplicates are eliminated or
clustered, and the remaining items are sorted and presented back to the
client. Protocols like Z39.50
are frequently used in distributed searching. A benefit to this
approach is that the resource-intensive tasks of indexing and storage
are left to the respective servers in the federation. A drawback to this
approach is that the search mechanism is limited by the different
indexing and ranking capabilities of each database, making it difficult to assemble a combined result consisting of the most relevant items.
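A minimal sketch of this gather, de-duplicate, and re-rank pattern is given below. It assumes each member library exposes a hypothetical JSON search endpoint; real federations more commonly speak protocols such as Z39.50 or SRU, and the URLs and the "records", "title", and "score" fields here are illustrative assumptions, not a real API.

```python
# Sketch of distributed (federated) searching against hypothetical endpoints.
import concurrent.futures
import json
import urllib.parse
import urllib.request

FEDERATION = [
    "https://library-a.example.org/search",   # hypothetical endpoints
    "https://library-b.example.org/search",
]

def query_server(base_url, term):
    """Send one search request to one server and return its result records."""
    url = f"{base_url}?{urllib.parse.urlencode({'q': term})}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["records"]

def federated_search(term):
    # 1. Send the query to every server in the federation in parallel.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda u: query_server(u, term), FEDERATION))
    # 2. Merge, de-duplicate (here naively by title), and re-rank locally.
    merged = {}
    for records in result_lists:
        for rec in records:
            merged.setdefault(rec["title"].lower(), rec)
    return sorted(merged.values(), key=lambda r: r.get("score", 0), reverse=True)
```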
Searching over previously harvested metadata involves searching a locally stored index
of information that has previously been collected from the libraries in
the federation. When a search is performed, the search mechanism does
not need to make connections with the digital libraries it is
searching—it already has a local representation of the information. This
approach requires the creation of an indexing and harvesting mechanism
which operates regularly, connecting to all the digital libraries and
querying the whole collection in order to discover new and updated
resources. OAI-PMH
is frequently used by digital libraries for allowing metadata to be
harvested. A benefit to this approach is that the search mechanism has
full control over indexing and ranking algorithms, possibly allowing
more consistent results. A drawback is that harvesting and indexing
systems are more resource-intensive and therefore expensive.
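The harvesting step can be sketched as follows. The OAI-PMH verbs, resumption tokens, and Dublin Core namespace are part of the real protocol, but the repository URL below is a placeholder and error handling is omitted.

```python
# Sketch of OAI-PMH harvesting: page through ListRecords responses via
# resumption tokens and collect Dublin Core titles into a local index.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(base_url):
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    titles = []
    while True:
        url = f"{base_url}?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            root = ET.fromstring(resp.read())
        for record in root.iter(f"{OAI}record"):
            title = record.find(f".//{DC}title")
            if title is not None and title.text:
                titles.append(title.text)
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break                         # no more pages to harvest
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    return titles

# titles = harvest_titles("https://repository.example.org/oai")  # hypothetical endpoint
```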
Digital preservation aims to ensure that digital media and
information systems are still interpretable into the indefinite future. Each necessary component of this must be migrated, preserved or emulated. Typically lower levels of systems (floppy disks
for example) are emulated, bit-streams (the actual files stored in the
disks) are preserved and operating systems are emulated as a virtual machine.
Only where the meaning and content of digital media and information
systems are well understood is migration possible, as is the case for
office documents. However, at least one organization, the WiderNet Project, has created an offline digital library, the eGranary, by reproducing materials on a 6 TB hard drive. Instead of a bit-stream environment, the digital library contains a built-in proxy server and search engine so the digital materials can be accessed using a web browser. Also, the materials are not preserved for the future. The eGranary is
intended for use in places or situations where Internet connectivity is
very slow, non-existent, unreliable, unsuitable or too expensive.
In the past few years, procedures for digitizing
books at high speed and comparatively low cost have improved
considerably with the result that it is now possible to digitize
millions of books per year. The Google book-scanning project is also working with libraries to offer digitized books, pushing the digitized-book realm forward.
Copyright and licensing
Digital libraries are hampered by copyright
law because, unlike with traditional printed works, the laws of digital
copyright are still being formed. The republication of material on the
web by libraries may require permission from rights holders, and there
is a conflict of interest between libraries and the publishers who may
wish to create online versions of their acquired content for commercial
purposes. In 2010, it was estimated that twenty-three percent of books
in existence were created before 1923 and thus out of copyright. Of
those printed after this date, only five percent were still in print as
of 2010. Thus, approximately seventy-two percent of books were neither in the public domain nor still in print, and so were not readily available to the public.
There is a dilution of responsibility that occurs as a result of
the distributed nature of digital resources. Complex intellectual
property matters may become involved since digital material is not
always owned by a library. The content is, in many cases, public domain or self-generated content only. Some digital libraries, such as Project Gutenberg,
work to digitize out-of-copyright works and make them freely available
to the public. An estimate of the number of distinct books still existent in library catalogues from 2000 BC to 1960 has been made.
The Fair Use Provisions (17 USC § 107) under the Copyright Act of 1976
provide specific guidelines under which circumstances libraries are
allowed to copy digital resources. Four factors that constitute fair use
are "Purpose of the use, Nature of the work, Amount or substantiality
used and Market impact".
Some digital libraries acquire a license to lend their resources.
This may involve the restriction of lending out only one copy at a time
for each license, and applying a system of digital rights management for this purpose.
The Digital Millennium Copyright Act
of 1998 was an act created in the United States to attempt to deal with
the introduction of digital works. This Act incorporates two treaties
from the year 1996. It criminalizes the attempt to circumvent measures
which limit access to copyrighted materials. It also criminalizes the
act of attempting to circumvent access control. This act provides an exemption for nonprofit libraries and archives
which allows up to three copies to be made, one of which may be digital.
This may not be made public or distributed on the web, however.
Further, it allows libraries and archives to copy a work if its format
becomes obsolete.
Copyright issues persist. As such, proposals have been put
forward suggesting that digital libraries be exempt from copyright law.
Although this would be very beneficial to the public, it may have a
negative economic effect and authors may be less inclined to create new
works.
Another issue that complicates matters is the desire of some
publishing houses to restrict the use of digital materials such as e-books
purchased by libraries. Whereas with printed books, the library owns
the book until it can no longer be circulated, publishers want to limit
the number of times an e-book can be checked out before the library
would need to repurchase that book. "[HarperCollins] began licensing use
of each e-book copy for a maximum of 26 loans. This affects only the
most popular titles and has no practical effect on others. After the
limit is reached, the library can repurchase access rights at a lower
cost than the original price." While, from a publishing perspective, this may sound like a good balance between library lending and protecting publishers from a feared decrease in book sales, libraries are not set up to monitor their collections in this way. They acknowledge the increased demand for digital materials
available to patrons and the desire of a digital library to become
expanded to include best sellers, but publisher licensing may hinder the
process.
Recommendation systems
Many digital libraries offer recommender systems to reduce information overload and help their users discover relevant literature. Some examples of digital libraries offering recommender systems are IEEE Xplore, Europeana, and GESIS Sowiport. The recommender systems work mostly on the basis of content-based filtering, but other approaches such as collaborative filtering and citation-based recommendations are also used. Beel et al. report that there are more than 90 different recommendation
approaches for digital libraries, presented in more than 200 research articles.
Typically, digital libraries develop and maintain their own
recommender systems based on existing search and recommendation
frameworks such as Apache Lucene or Apache Mahout.
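As a simple illustration of content-based filtering, the sketch below ranks catalogue items by TF-IDF cosine similarity to the item a user is viewing. It assumes scikit-learn is available, and the catalogue entries are invented for illustration; a production system would typically build on a framework such as Apache Lucene, as noted above.

```python
# Sketch of content-based filtering over a small, invented catalogue.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = {
    "doc1": "metadata harvesting for distributed digital libraries",
    "doc2": "digital preservation of born-digital archival records",
    "doc3": "collaborative filtering for scholarly recommender systems",
}

ids = list(documents)
matrix = TfidfVectorizer().fit_transform(list(documents.values()))  # one TF-IDF vector per item

def recommend(doc_id, top_n=2):
    """Return the IDs of the items most similar to the one a user is reading."""
    scores = cosine_similarity(matrix[ids.index(doc_id)], matrix).ravel()
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in ranked if i != doc_id][:top_n]

print(recommend("doc1"))
```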
Drawbacks of digital libraries
Digital libraries, or at least their digital collections, also have brought their own problems and challenges in areas such as:
Exorbitant cost of building/maintaining the terabytes of storage,
servers, and redundancies necessary for a functional digital collection.
There are many large scale digitisation projects that perpetuate these problems.
Future development
Large scale digitization projects are underway at Google, the Million Book Project, and Internet Archive. With continued improvements in book handling and presentation technologies such as optical character recognition
and development of alternative depositories and business models,
digital libraries are rapidly growing in popularity. Just as libraries
have ventured into audio and video collections, so have digital
libraries such as the Internet Archive. In 2016, the Google Books project received a court victory allowing it to proceed with its book-scanning project, which had been challenged by the Authors Guild. This helped open the road for libraries to work with Google to better
reach patrons who are accustomed to computerized information.
According to Larry Lannom, Director of Information Management Technology at the nonprofit Corporation for National Research Initiatives
(CNRI), "all the problems associated with digital libraries are wrapped
up in archiving". He goes on to state, "If in 100 years people can
still read your article, we'll have solved the problem." Daniel Akst, author of The Webster Chronicle, proposes that "the future of libraries—and of information—is digital". Peter Lyman and Hal Varian, information scientists at the University of California, Berkeley,
estimate that "the world's total yearly production of print, film,
optical, and magnetic content would require roughly 1.5 billion
gigabytes of storage". Therefore, they believe that "soon it will be
technologically possible for an average person to access virtually all
recorded information".
Digital archives are an evolving medium and they develop under
various circumstances. Alongside large scale repositories, other digital
archiving projects have also evolved in response to needs in research
and research communication on various institutional levels. For example, during the COVID-19 pandemic, libraries
and higher education institutions have launched digital archiving
projects to document life during the pandemic, thus creating a digital,
cultural record of collective memories from the period. Researchers have also utilized digital archiving to create specialized research databases.
These databases compile digital records for use on international and
interdisciplinary levels. COVID CORPUS, launched in October 2020, is an
example of such a database, built in response to scientific
communication needs in light of the pandemic. Beyond academia, digital collections have also recently been developed
to appeal to a more general audience, as is the case with the Selected
General Audience Content of the Internet-First University Press
developed by Cornell University. This general-audience database contains
specialized research information but is digitally organized for
accessibility. The establishment of these archives has facilitated specialized forms
of digital recordkeeping to fulfill various niches in online, research-based communication.
Drug design, often referred to as rational drug design or simply rational design, is the inventive process of finding new medications based on the knowledge of a biological target. The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient. In the most basic sense, drug design involves the design of molecules that are complementary in shape and charge
to the biomolecular target with which they interact and therefore will
bind to it. Drug design frequently but not necessarily relies on computer modeling techniques. This type of modeling is sometimes referred to as computer-aided drug design. Finally, drug design that relies on the knowledge of the three-dimensional structure of the biomolecular target is known as structure-based drug design. In addition to small molecules, biopharmaceuticals including peptides and especially therapeutic antibodies
are an increasingly important class of drugs and computational methods
for improving the affinity, selectivity, and stability of these
protein-based therapeutics have also been developed.
Definition
The phrase "drug design" is similar to ligand design (i.e., design of a molecule that will bind tightly to its target). Although design techniques for prediction of binding affinity are
reasonably successful, there are many other properties, such as bioavailability, metabolic half-life, and side effects,
that first must be optimized before a ligand can become a safe and
effective drug. These other characteristics are often difficult to
predict with rational design techniques.
Due to high attrition rates, especially during clinical phases of drug development, more attention is being focused early in the drug design process on selecting candidate drugs whose physicochemical
properties are predicted to result in fewer complications during
development and hence more likely to lead to an approved, marketed drug. Furthermore, in vitro experiments complemented with computation methods are increasingly used in early drug discovery to select compounds with more favorable ADME (absorption, distribution, metabolism, and excretion) and toxicological profiles.
Drug targets
A biomolecular target (most commonly a protein or a nucleic acid) is a key molecule involved in a particular metabolic or signaling pathway that is associated with a specific disease condition or pathology, or with the infectivity or survival of a microbial pathogen. Potential drug targets are not necessarily disease causing but must by definition be disease modifying. In some cases, small molecules
will be designed to enhance or inhibit the target function in the
specific disease modifying pathway. Small molecules (for example
receptor agonists, antagonists, inverse agonists, or modulators; enzyme activators or inhibitors; or ion channel openers or blockers) will be designed that are complementary to the binding site of the target. Small molecules (drugs) can be designed so as not to affect any other important "off-target" molecules (often referred to as antitargets) since drug interactions with off-target molecules may lead to undesirable side effects. Due to similarities in binding sites, closely related targets identified through sequence homology have the highest chance of cross reactivity and hence highest side effect potential.
Most commonly, drugs are organic small molecules produced through chemical synthesis, but biopolymer-based drugs (also known as biopharmaceuticals) produced through biological processes are becoming increasingly more common. In addition, mRNA-based gene silencing technologies may have therapeutic applications. For example, nanomedicines based on mRNA can streamline and expedite
the drug development process, enabling transient and localized
expression of immunostimulatory molecules. In vitro transcribed (IVT) mRNA allows for delivery to various
accessible cell types via the blood or alternative pathways. The use of
IVT mRNA serves to convey specific genetic information into a person's
cells, with the primary objective of preventing or altering a particular
disease.
Drug discovery
Phenotypic drug discovery
Phenotypic drug discovery
is a traditional drug discovery method, also known as forward
pharmacology or classical pharmacology. It uses the process of
phenotypic screening on collections of synthetic small molecules,
natural products, or extracts within chemical libraries to pinpoint
substances exhibiting beneficial therapeutic effects. With this method, the in vivo or in vitro functional activity of drugs (such as extracts or natural products) is discovered first, and target identification is performed afterwards. Phenotypic discovery uses a practical and
target-independent approach to generate initial leads, aiming to
discover pharmacologically active compounds and therapeutics that
operate through novel drug mechanisms. This method allows the exploration of disease phenotypes to find
potential treatments for conditions with unknown, complex, or
multifactorial origins, where the understanding of molecular targets is
insufficient for effective intervention.
Rational drug discovery
Rational drug design (also called reverse pharmacology)
begins with a hypothesis that modulation of a specific biological
target may have therapeutic value. In order for a biomolecule to be
selected as a drug target, two essential pieces of information are
required. The first is evidence that modulation of the target will be
disease modifying. This knowledge may come from, for example, disease
linkage studies that show an association between mutations in the
biological target and certain disease states. The second is that the target is capable of binding to a small molecule
and that its activity can be modulated by the small molecule.
Once a suitable target has been identified, the target is normally cloned, produced, and purified. The purified protein is then used to establish a screening assay. In addition, the three-dimensional structure of the target may be determined.
The search for small molecules that bind to the target is begun
by screening libraries of potential drug compounds. This may be done by
using the screening assay (a "wet screen"). In addition, if the
structure of the target is available, a virtual screen may be performed of candidate drugs. Ideally, the candidate drug compounds should be "drug-like", that is they should possess properties that are predicted to lead to oral bioavailability, adequate chemical and metabolic stability, and minimal toxic effects. Several methods are available to estimate druglikeness such as Lipinski's Rule of Five and a range of scoring methods such as lipophilic efficiency. Several methods for predicting drug metabolism have also been proposed in the scientific literature.
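As an illustration, a druglikeness filter based on Lipinski's Rule of Five might be sketched as follows. It assumes the open-source RDKit toolkit is installed; the thresholds are the standard rule-of-five cutoffs, and aspirin is used only as an example input.

```python
# Sketch of a Lipinski Rule of Five druglikeness check using RDKit.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_rule_of_five(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False                               # unparseable SMILES string
    violations = sum([
        Descriptors.MolWt(mol) > 500,              # molecular weight <= 500 Da
        Descriptors.MolLogP(mol) > 5,              # octanol-water logP <= 5
        Lipinski.NumHDonors(mol) > 5,              # <= 5 hydrogen-bond donors
        Lipinski.NumHAcceptors(mol) > 10,          # <= 10 hydrogen-bond acceptors
    ])
    return violations <= 1                         # one violation is commonly tolerated

print(passes_rule_of_five("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> True
```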
Due to the large number of drug properties that must be simultaneously optimized during the design process, multi-objective optimization techniques are sometimes employed. Finally because of the limitations in the current methods for
prediction of activity, drug design is still very much reliant on serendipity and bounded rationality.
Computer-aided drug design
The most fundamental goal in drug design is to predict whether a given molecule will bind to a target and if so how strongly. Molecular mechanics or molecular dynamics is most often used to estimate the strength of the intermolecular interaction between the small molecule and its biological target. These methods are also used to predict the conformation of the small molecule and to model conformational changes in the target that may occur when the small molecule binds to it. Semi-empirical, ab initio quantum chemistry methods, or density functional theory
are often used to provide optimized parameters for the molecular
mechanics calculations and also provide an estimate of the electronic
properties (electrostatic potential, polarizability, etc.) of the drug candidate that will influence binding affinity.
Molecular mechanics methods may also be used to provide
semi-quantitative prediction of the binding affinity. Also,
knowledge-based scoring functions may be used to provide binding affinity estimates. These methods use linear regression, machine learning, neural nets,
or other statistical techniques to derive predictive binding affinity
equations by fitting experimental affinities to computationally derived
interaction energies between the small molecule and the target.
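A minimal sketch of such a regression-fitted scoring function is shown below, assuming scikit-learn is available; the interaction-energy terms and experimental affinities are synthetic numbers chosen purely to illustrate the fitting step, not real data.

```python
# Sketch: fit a linear scoring function from computed interaction terms to
# experimentally measured binding affinities, then score a new candidate.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: computed terms for one protein-ligand complex, e.g.
# [hydrogen-bond term, ionic term, lipophilic contact area, rotatable bonds]
X = np.array([
    [3, 1, 120.0, 4],
    [5, 0, 200.0, 7],
    [2, 2,  80.0, 2],
    [4, 1, 150.0, 5],
])
# Experimentally measured binding free energies (kcal/mol) for the same complexes
y = np.array([-7.2, -9.1, -6.0, -8.3])

model = LinearRegression().fit(X, y)            # fit a weight for each term
predicted = model.predict([[3, 1, 130.0, 3]])   # score a new, unsynthesized candidate
print(model.coef_, model.intercept_, predicted)
```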
Ideally, the computational method will be able to predict
affinity before a compound is synthesized and hence in theory only one
compound needs to be synthesized, saving enormous time and cost. The
reality is that present computational methods are imperfect and provide,
at best, only qualitatively accurate estimates of affinity. In
practice, it requires several iterations of design, synthesis, and
testing before an optimal drug is discovered. Computational methods have
accelerated discovery by reducing the number of iterations required and
have often provided novel structures.
Computer-aided drug design may be used at any of the following stages of drug discovery:
hit identification using virtual screening (structure- or ligand-based design)
hit-to-lead optimization of affinity and selectivity (structure-based design, QSAR, etc.)
lead optimization of other pharmaceutical properties while maintaining affinity
(Figure: flowchart of a usual clustering analysis for structure-based drug design.)
In order to overcome the insufficient prediction of binding affinity
calculated by recent scoring functions, the protein-ligand interaction
and compound 3D structure information are used for analysis. For
structure-based drug design, several post-screening analyses focusing on
protein-ligand interaction have been developed for improving enrichment
and effectively mining potential candidates:
Consensus scoring: selecting candidates by the vote of multiple scoring functions; this may lose the relationship between protein-ligand structural information and the scoring criterion.
Cluster analysis: representing and clustering candidates according to protein-ligand 3D information; this needs a meaningful representation of protein-ligand interactions.
Types
(Figure: drug discovery cycle highlighting both ligand-based (indirect) and structure-based (direct) drug design strategies.)
There are two major types of drug design. The first is referred to as ligand-based drug design and the second, structure-based drug design.
Ligand-based
Ligand-based drug design (or indirect drug design) relies on
knowledge of other molecules that bind to the biological target of
interest. These other molecules may be used to derive a pharmacophore model that defines the minimum necessary structural characteristics a molecule must possess in order to bind to the target. A model of the biological target may be built based on the knowledge of
what binds to it, and this model in turn may be used to design new
molecular entities that interact with the target. Alternatively, a quantitative structure-activity relationship (QSAR), in which a correlation between calculated properties of molecules and their experimentally determined biological activity is established, may be derived. These QSAR relationships in turn may be used to predict the activity of new analogs.
Structure-based
Structure-based drug design (or direct drug design) relies on knowledge of the three dimensional structure of the biological target obtained through methods such as x-ray crystallography or NMR spectroscopy. If an experimental structure of a target is not available, it may be possible to create a homology model
of the target based on the experimental structure of a related protein.
Using the structure of the biological target, candidate drugs that are
predicted to bind with high affinity and selectivity to the target may be designed using interactive graphics and the intuition of a medicinal chemist. Alternatively, various automated computational procedures may be used to suggest new drug candidates.
Current methods for structure-based drug design can be divided roughly into three main categories. The first method is identification of new ligands for a given receptor
by searching large databases of 3D structures of small molecules to find
those fitting the binding pocket of the receptor using fast approximate
docking programs. This method is known as virtual screening.
A second category is de novo design of new ligands. In this
method, ligand molecules are built up within the constraints of the
binding pocket by assembling small pieces in a stepwise manner. These
pieces can be either individual atoms or molecular fragments. The key
advantage of such a method is that novel structures, not contained in
any database, can be suggested. A third method is the optimization of known ligands by evaluating proposed analogs within the binding cavity.
Binding site identification
Binding site identification is the first step in structure based design. If the structure of the target or a sufficiently similar homolog
is determined in the presence of a bound ligand, then the ligand should
be observable in the structure in which case location of the binding
site is trivial. However, there may be unoccupied allosteric binding sites that may be of interest. Furthermore, it may be that only apoprotein
(protein without ligand) structures are available and the reliable
identification of unoccupied sites that have the potential to bind
ligands with high affinity is non-trivial. In brief, binding site
identification usually relies on identification of concave surfaces on the protein that can accommodate drug sized molecules that also possess appropriate "hot spots" (hydrophobic surfaces, hydrogen bonding sites, etc.) that drive ligand binding.
Structure-based drug design attempts to use the structure of proteins
as a basis for designing new ligands by applying the principles of molecular recognition. Selective high affinity binding to the target is generally desirable since it leads to more efficacious
drugs with fewer side effects. Thus, one of the most important
principles for designing or obtaining potential new ligands is to
predict the binding affinity of a certain ligand to its target (and
known antitargets) and use the predicted affinity as a criterion for selection.
One early general-purposed empirical scoring function to describe
the binding energy of ligands to receptors was developed by Böhm. This empirical scoring function took the form:
ΔGbind = ΔG0 + ΔGhb Σh-bonds f(ΔR, Δα) + ΔGionic Σionic f(ΔR, Δα) + ΔGlip |Alipo| + ΔGrot NROT
where:
f(ΔR, Δα) – penalty function accounting for deviations of hydrogen bonds and ionic interactions from ideal geometry
ΔG0 – empirically derived offset that in part
corresponds to the overall loss of translational and rotational entropy
of the ligand upon binding.
ΔGhb – contribution from hydrogen bonding
ΔGionic – contribution from ionic interactions
ΔGlip – contribution from lipophilic interactions where |Alipo| is surface area of lipophilic contact between the ligand and receptor
ΔGrot – entropy penalty due to freezing a rotatable bond in the ligand upon binding
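As an illustration, the simplified sketch below evaluates a Böhm-style score by counting ideal interactions, dropping the geometric penalty terms f(ΔR, Δα); the coefficient values are illustrative approximations drawn from the literature, not authoritative parameters.

```python
# Sketch: evaluate a simplified Böhm-style empirical scoring function.
def bohm_score(n_hbonds, n_ionic, lipo_area, n_rot,
               dG0=5.4, dG_hb=-4.7, dG_ionic=-8.3, dG_lip=-0.17, dG_rot=1.4):
    """Return an estimated binding free energy in kJ/mol (illustrative weights)."""
    return (dG0
            + dG_hb * n_hbonds        # ideal hydrogen bonds
            + dG_ionic * n_ionic      # ideal ionic interactions
            + dG_lip * lipo_area      # lipophilic contact surface |Alipo| in Å^2
            + dG_rot * n_rot)         # rotatable bonds frozen upon binding

print(bohm_score(n_hbonds=3, n_ionic=1, lipo_area=120.0, n_rot=4))
```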
A more general thermodynamic "master" equation is as follows:
ΔGbind = ΔGdesolvation + ΔGmotion + ΔGconfiguration + ΔGinteraction
where:
desolvation – enthalpic penalty for removing the ligand from solvent
motion – entropic penalty for reducing the degrees of freedom when a ligand binds to its receptor
configuration – conformational strain energy required to put the ligand in its "active" conformation
interaction – enthalpic gain for "resolvating" the ligand with its receptor
The basic idea is that the overall binding free energy can be
decomposed into independent components that are known to be important
for the binding process. Each component reflects a certain kind of free
energy alteration during the binding process between a ligand and its
target receptor. The Master Equation is the linear combination of these
components. According to the Gibbs free energy equation, the relation between the dissociation equilibrium constant, Kd, and the components of free energy can be established (ΔGbind = RT ln Kd).
Various computational methods are used to estimate each of the
components of the master equation. For example, the change in polar
surface area upon ligand binding can be used to estimate the desolvation
energy. The number of rotatable bonds frozen upon ligand binding is
proportional to the motion term. The configurational or strain energy
can be estimated using molecular mechanics
calculations. Finally the interaction energy can be estimated using
methods such as the change in nonpolar surface area, statistically derived potentials of mean force,
the number of hydrogen bonds formed, etc. In practice, the components
of the master equation are fit to experimental data using multiple
linear regression. This can be done with a diverse training set
including many types of ligands and receptors to produce a less accurate
but more general "global" model or a more restricted set of ligands and
receptors to produce a more accurate but less general "local" model.
Examples
A particular example of rational drug design involves the use of
three-dimensional information about biomolecules obtained from such
techniques as X-ray crystallography and NMR spectroscopy. Computer-aided
drug design in particular becomes much more tractable when there is a
high-resolution structure of a target protein bound to a potent ligand.
This approach to drug discovery is sometimes referred to as
structure-based drug design. The first unequivocal example of the
application of structure-based drug design leading to an approved drug is the carbonic anhydrase inhibitor dorzolamide, which was approved in 1995.
Types of drug screening include phenotypic screening, high-throughput screening, and virtual screening.
Phenotypic screening is characterized by the process of screening drugs
using cellular or animal disease models to identify compounds that
alter the phenotype and produce beneficial disease-related effects. Emerging
technologies in high-throughput screening substantially enhance
processing speed and decrease the required detection volume. Virtual screening is performed by computer, enabling a large number of molecules to be screened in a short cycle and at low cost. Virtual
screening uses a range of computational methods that empower chemists to
reduce extensive virtual libraries into more manageable sizes.