
Monday, November 27, 2023

Electronic publishing

From Wikipedia, the free encyclopedia

Electronic publishing (also referred to as e-publishing, digital publishing, or online publishing) includes the digital publication of e-books, digital magazines, and the development of digital libraries and catalogues. It also includes the editing of books, journals, and magazines to be read on a screen (computer, e-reader, tablet, or smartphone).

About

Electronic publishing has become common in scientific publishing, where it has been argued that peer-reviewed scientific journals are in the process of being replaced by electronic publishing. It is also becoming common to distribute books, magazines, and newspapers to consumers through tablet reading devices, a market that is growing by millions each year, driven by online vendors such as Apple's iTunes bookstore, Amazon's bookstore for Kindle, and Google Play Books. Market research suggested that half of all magazine and newspaper circulation would be via digital delivery by the end of 2015, and that half of all reading in the United States would be done without paper by then.

Although distribution via the Internet (also known as online publishing or web publishing when in the form of a website) is nowadays strongly associated with electronic publishing, there are many non-network electronic publications, such as encyclopedias on CD and DVD, as well as technical and reference publications relied on by mobile users and others without reliable, high-speed network access. Electronic publishing is also being used for test preparation in developed as well as developing economies (thus partly replacing conventional books), since it allows content and analytics to be combined for the benefit of students. The use of electronic publishing for textbooks may become more prevalent with Apple Books from Apple Inc. and Apple's negotiation with the three largest textbook suppliers in the U.S.

Electronic publishing is increasingly popular in works of fiction. Electronic publishers are able to respond quickly to changing market demand, because the companies do not have to order printed books and have them delivered. E-publishing is also making a wider range of books available, including books that customers would not find in standard book retailers, due to insufficient demand for a traditional "print run". E-publication is enabling new authors to release books that would be unlikely to be profitable for traditional publishers. While the term "electronic publishing" is primarily used in the 2010s to refer to online and web-based publishers, the term has a history of being used to describe the development of new forms of production, distribution, and user interaction in regard to computer-based production of text and other interactive media.

History

Digitization

The first digitization initiative came in 1971 from Michael S. Hart, a student at the University of Illinois, who launched Project Gutenberg to make literature more accessible to everyone through the network. The project took a while to develop: by 1989 there were only 10 texts, typed in manually by Hart himself and a few volunteers. But with the appearance of the Web in 1991 and its ability to connect documents through static pages, the project moved forward quickly, and many more volunteers helped by providing access to public-domain classics.

In the 1970s, the French National Centre for Scientific Research digitized a thousand books on diverse subjects, mostly literature but also philosophy and science, dating from the 12th century to the present. These texts formed the foundation of a large dictionary, the Trésor de la langue française. This collection of e-texts, named Frantext, was first published on compact disc under the brand name Discotext, and then on the World Wide Web in 1998.

Mass-scale digitization

In 1974, American inventor and futurist Raymond Kurzweil developed a scanner equipped with omni-font software that enabled optical character recognition of printed text in virtually any typeface. Digitization projects could then become more ambitious, since the time needed for digitization decreased considerably, and digital libraries were on the rise. All over the world, e-libraries started to emerge.

The ABU (Association des Bibliophiles Universels) was a public digital library project created by the Cnam in 1993. It was the first French digital library on the network; although it has been suspended since 2002, it reproduced over a hundred texts that are still available.

In 1992, the Bibliothèque nationale de France launched a vast digitization program. President François Mitterrand had wanted since 1988 to create a new and innovative digital library, and it launched in 1997 under the name Gallica. By 2014, the digital library was offering 80,255 online books and over a million documents, including prints and manuscripts.

In 2003, Wikisource was launched; the project aspired to constitute a digital, multilingual library complementing the Wikipedia project. It was originally named "Project Sourceberg", a play on words echoing Project Gutenberg. Supported by the Wikimedia Foundation, Wikisource offers digitized texts that have been verified by volunteers.

In December 2004, Google created Google Books, a project to digitize all the books available in the world (over 130 million) and make them accessible online. Ten years later, 25,000,000 books from a hundred countries, in 400 languages, were on the platform. This was possible because, by that time, robotic scanners could digitize around 6,000 books per hour.

In 2008, the prototype of Europeana was launched, and by 2010 the project was giving access to over 10 million digital objects. The Europeana library is a European catalog that offers index cards on millions of digital objects and links to the digital libraries that hold them. In the same year, HathiTrust was created to bring together the contents of many university e-libraries from the United States and Europe, along with Google Books and the Internet Archive. By 2016, over six million users had used HathiTrust.

Electronic publishing

The first digitization projects transferred physical content into digital content. Electronic publishing aims to integrate the whole process of editing and publishing (production, layout, publication) into the digital world.

Alain Mille, in the book Pratiques de l'édition numérique (edited by Michael E. Sinatra and Marcello Vitali-Rosati), argues that the beginnings of the Internet and the Web are at the very core of electronic publishing, since they largely determined the biggest changes in production and distribution patterns. The Internet has a direct effect on publishing, letting creators and users go beyond the traditional chain (writer-editor-publishing house).

Traditional publishing, and especially the creation stage, was first revolutionized by the desktop publishing software that appeared in the 1980s, and by the text databases created for encyclopedias and directories. At the same time, multimedia was developing quickly, combining characteristics of books, audiovisual media, and computing. CDs and DVDs appeared, permitting these dictionaries and encyclopedias to be viewed on computers.

The arrival and democratization of the Internet slowly gave small publishing houses the opportunity to publish their books directly online. Some websites, like Amazon, let users buy e-books; Internet users can also find many educational platforms (free or paid), encyclopedic websites like Wikipedia, and digital magazine platforms. The e-book has thus become more and more accessible on many different devices, such as e-readers and even smartphones. The digital book had, and still has, an important impact on publishing houses and their economic models; the field is still evolving, and publishers have yet to master the new ways of publishing in a digital era.

Online edition

Based on the new communication practices of Web 2.0 and its architecture of participation, online editing opens the door for a community to collaboratively elaborate and improve content on the Internet, while also enriching reading through collective reading practices. Web 2.0 not only links documents together, as Web 1.0 did; it also links people together through social media, which is why it is called the participative (or participatory) Web.

Many tools have been put in place to foster sharing and the collective creation of content. One of them is the Wikipedia encyclopedia, which is edited, corrected, and enhanced by millions of contributors. OpenStreetMap is based on the same principle. Blogs and comment systems are now also recognized as forms of online editing and publishing, since they make new interactions between authors and their readers possible and can be an important method for inspiration as well as visibility.

Process

The electronic publishing process follows some aspects of the traditional paper-based publishing process but differs from traditional publishing in two ways: 1) it does not include using an offset printing press to print the final product and 2) it avoids the distribution of a physical product (e.g., paper books, paper magazines, or paper newspapers). Because the content is electronic, it may be distributed over the Internet and through electronic bookstores, and users can read the material on a range of electronic and digital devices, including desktop computers, laptops, tablet computers, smartphones or e-reader tablets. The consumer may read the published content online on a website, in an application on a tablet device, or in a PDF document on a computer. In some cases, the reader may print the content onto paper using a consumer-grade ink-jet or laser printer or via a print-on-demand system. Some users download digital content to their devices, enabling them to read the content even when their device is not connected to the Internet (e.g., on an airplane flight).

Distributing content electronically as software applications ("apps") has become popular in the 2010s, due to the rapid consumer adoption of smartphones and tablets. At first, native apps for each mobile platform were required to reach all audiences, but in an effort toward universal device compatibility, attention has turned to using HTML5 to create web apps that can run on any browser and function on many devices. The benefit of electronic publishing comes from using three attributes of digital technology: XML tags to define content, style sheets to define the look of content, and metadata (data about data) to describe the content for search engines, thus helping users find the content. (A common example of metadata is the information about a song's songwriter, composer, and genre that is electronically encoded along with most CDs and digital audio files; this metadata makes it easier for music lovers to find the songs they are looking for.) Together, tags, style sheets, and metadata enable "reflowable" content that adapts to various reading devices (tablet, smartphone, e-reader, etc.) or electronic delivery methods.

Because electronic publishing often requires text mark-up (e.g., HyperText Markup Language or some other markup language) to develop online delivery methods, the traditional roles of typesetters and book designers, who created the printing set-ups for paper books, have changed. Designers of digitally published content must have a strong knowledge of mark-up languages, of the variety of reading devices and computers available, and of the ways in which consumers read, view, or access content. However, in the 2010s, new user-friendly design software, such as Adobe Systems' Digital Publishing Suite and Apple's iBooks Author, became available, letting designers publish content in these formats without needing to know detailed programming techniques. The most common file format is .epub, used in many e-book formats; .epub is a free and open standard supported by many publishing programs. Another common format is .folio, which is used by the Adobe Digital Publishing Suite to create content for Apple's iPad tablets and apps.

Academic publishing

After an article is submitted to an academic journal for consideration, there can be a delay ranging from several months to more than two years before it is published in a journal, rendering journals a less than ideal format for disseminating current research. In some fields, such as astronomy and some areas of physics, the role of the journal in disseminating the latest research has largely been replaced by preprint repositories such as arXiv.org. However, scholarly journals still play an important role in quality control and establishing scientific credit. In many instances, the electronic materials uploaded to preprint repositories are still intended for eventual publication in a peer-reviewed journal. There is statistical evidence that electronic publishing provides wider dissemination, because when a journal is available online, a larger number of researchers can access the journal. Even if a professor is working in a university that does not have a certain journal in its library, she may still be able to access the journal online. A number of journals have, while retaining their longstanding peer review process to ensure that the research is done properly, established electronic versions or even moved entirely to electronic publication.

Copyright

In the early 2000s, many of the existing copyright laws were designed around printed books, magazines and newspapers. For example, copyright laws often set limits on how much of a book can be mechanically reproduced or copied. Electronic publishing raises new questions in relation to copyright, because if an e-book or e-journal is available online, millions of Internet users may be able to view a single electronic copy of the document, without any "copies" being made.

Emerging evidence suggests that e-publishing may be more collaborative than traditional paper-based publishing; e-publishing often involves more than one author, and the resulting works are more accessible, since they are published online. At the same time, the availability of published material online opens more doors for plagiarism, unauthorized use, or re-use of the material. Some publishers are trying to address these concerns. For example, in 2011, HarperCollins limited the number of times that one of its e-books could be lent in a public library. Other publishers, such as Penguin, are attempting to incorporate e-book elements into their regular paper publications.

Reference work

From Wikipedia, the free encyclopedia
[Image captions: the Brockhaus Enzyklopädie, the best-known traditional reference book in German-speaking countries; the Lexikon des Mittelalters, a specialised German encyclopedia; the Encyclopædia Britannica, 15th edition, with volumes of the Propædia (green), Micropædia (red), Macropædia (black), and the 2-volume Index (blue).]

A reference work is a non-fiction work, such as a paper, book or periodical (or their electronic equivalents), to which one can refer for information. The information is intended to be found quickly when needed. Such works are usually referred to for particular pieces of information, rather than read beginning to end. The writing style used in these works is informative; the authors avoid use of the first person, and emphasize facts.

Indices are a common navigation feature in many types of reference works. Many reference works are put together by a team of contributors whose work is coordinated by one or more editors, rather than by an individual author. Updated editions are usually published as needed, in some cases annually (Whitaker's Almanack, Who's Who).

Reference works include textbooks, almanacs, atlases, bibliographies, biographical sources, catalogs such as library catalogs and art catalogs, concordances, dictionaries, directories such as business directories and telephone directories, discographies, encyclopedias, filmographies, gazetteers, glossaries, handbooks, indices such as bibliographic indices and citation indices, manuals, research guides, thesauruses, and yearbooks. Many reference works are available in electronic form and can be obtained as reference software, CD-ROMs, DVDs, or online through the Internet. Wikipedia, an online encyclopedia, is both the largest and the most-read reference work in history.

Reference book

In contrast to books that are loaned, a reference book or reference-only book in a library is one that may only be used in the library and may not be borrowed from the library. Many such books are reference works (in the first sense), which are, usually, used briefly or photocopied from, and therefore, do not need to be borrowed. Keeping reference books in the library assures that they will always be available for use on demand. Some reference-only books are too valuable to permit borrowers to take them out. Reference-only items may be shelved in a reference collection located separately from circulating items. Some libraries consist entirely, or to a large extent, of books which may not be borrowed.

Types of reference work

These are the main types and categories of reference work:

  • Abstracting journal – a published summary of articles, theses, reviews, conference proceedings etc. arranged systematically
  • Almanac – an annual publication, listing a set of current, general or specific information about one or multiple subjects
  • Annals – concise historical record in which events are arranged chronologically
  • Atlas – a collection of maps, traditionally bound into book form
  • Bibliography – a systematic list of books and other works such as journal articles on a given subject or which satisfy particular criteria
  • Biographical dictionary – an encyclopedic dictionary limited to biographical information
  • Book of quotations – a collection of quotations satisfying particular criteria, arranged systematically
  • Chronicle/Chronology – a historical account of events arranged in chronological order
  • Compendium – a concise collection of information pertaining to a body of knowledge
  • Concordance – an alphabetical list of the principal words used in a book or body of work
  • Dictionary – a list of words from one or more languages, systematically arranged and giving meanings, etymologies etc.
  • Digest – a summary of information on a particular subject
  • Directory – a systematically arranged list of names, addresses, products, etc.
  • Encyclopaedia – a compendium providing summaries of knowledge either from all branches or from a particular field or discipline
  • Gazetteer – a geographical dictionary or directory used to provide systematic access to a map or atlas
  • Glossary – an alphabetical list of terms in a particular domain of knowledge with the definitions for those terms
  • Handbook – a small or portable book intended to provide ready reference
  • Index – a publication giving systematic access to a body of knowledge
  • Lexicon – a synonym for a dictionary or encyclopaedic dictionary
  • List – a published enumeration of a set of items
  • Manual – a handbook providing instructions in the use of a particular product
  • Phrase book – a collection of ready-made phrases, arranged systematically, usually for a foreign language together with a translation
  • Ready reckoner – a printed book or table containing pre-calculated values
  • Thematic catalogue – an index used to identify musical compositions through the citation of the opening notes
  • Textbook – a reference work containing information about a subject
  • Thesaurus – a reference work for finding synonyms and sometimes antonyms of words
  • Timetable – a published list of schedules giving times for transportation or other events
  • Yearbook – a compendium containing events relating to a specific year

Electronic resources

An electronic resource is a computer program or data that is stored electronically, which is usually found on a computer, including information that is available on the Internet. Libraries offer numerous types of electronic resources including electronic texts such as electronic books and electronic journals, bibliographic databases, institutional repositories, websites, and software applications.

Runtime system

From Wikipedia, the free encyclopedia

Most programming languages have some form of runtime system that provides an environment in which programs run. This environment may address a number of issues, including the management of application memory, how the program accesses variables, mechanisms for passing parameters between procedures, and interfacing with the operating system. The compiler makes assumptions depending on the specific runtime system in order to generate correct code. Typically the runtime system has some responsibility for setting up and managing the stack and heap, and may include features such as garbage collection, threads, or other dynamic features built into the language.

Overview

Every programming language specifies an execution model, and many implement at least part of that model in a runtime system. One possible definition of runtime system behavior, among others, is "any behavior not directly attributable to the program itself". This definition includes putting parameters onto the stack before function calls, parallel execution of related behaviors, and disk I/O.

By this definition, essentially every language has a runtime system, including compiled languages, interpreted languages, and embedded domain-specific languages. Even API-invoked standalone execution models, such as Pthreads (POSIX threads), have a runtime system that implements the execution model's behavior.

Most scholarly papers on runtime systems focus on the implementation details of parallel runtime systems. A notable example of a parallel runtime system is Cilk, a popular parallel programming model. The proto-runtime toolkit was created to simplify the creation of parallel runtime systems.

In addition to execution model behavior, a runtime system may also perform support services such as type checking, debugging, or code generation and optimization.

Concepts similar to a runtime system:

  • Runtime environment – a software platform that provides an environment for executing code (examples: Node.js, .NET Framework)
  • Engine – a component of a runtime environment that executes code by compiling or interpreting it (examples: the JavaScript engines in web browsers, the Java Virtual Machine)
  • Interpreter – a type of engine that reads and executes code line by line, without compiling the entire program beforehand (examples: the CPython interpreter, Ruby MRI, some JavaScript implementations)
  • JIT interpreter – a type of interpreter that dynamically compiles code into machine instructions at runtime, optimizing the code for faster execution (examples: V8, the PyPy interpreter)

Relation to runtime environments

The runtime system is also the gateway through which a running program interacts with the runtime environment. The runtime environment includes not only accessible state values, but also active entities with which the program can interact during execution. For example, environment variables are features of many operating systems, and are part of the runtime environment; a running program can access them via the runtime system. Likewise, hardware devices such as disks or DVD drives are active entities that a program can interact with via a runtime system.
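
As a brief illustration of that last point, here is a minimal sketch in C, assuming a POSIX-style hosted environment: the program reaches a value the operating system placed in its runtime environment by going through the runtime system's standard library.

    #include <stdio.h>
    #include <stdlib.h>

    /* getenv asks the C runtime for a value from the process's
       environment; "HOME" is a conventional POSIX example. */
    int main(void) {
        const char *home = getenv("HOME");   /* may be NULL if unset */
        printf("HOME = %s\n", home ? home : "(not set)");
        return 0;
    }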

One unique application of a runtime environment is its use within an operating system that only allows it to run. In other words, from boot until power-down, the entire OS is dedicated to only the application(s) running within that runtime environment. Any other code that tries to run, or any failures in the application(s), will break the runtime environment. Breaking the runtime environment in turn breaks the OS, stopping all processing and requiring a reboot. If the boot is from read-only memory, an extremely secure, simple, single-mission system is created.

Examples of such directly bundled runtime systems include:

  • Between 1983 and 1984, Digital Research offered several of their business and education applications for the IBM PC on bootable floppy diskettes bundled with SpeedStart CP/M-86, a reduced version of CP/M-86, as the runtime environment.
  • Some stand-alone versions of Ventura Publisher (1986–1993), Artline (1988–1991), Timeworks Publisher (1988–1991) and ViewMAX (1990–1992) contained special runtime versions of Digital Research's GEM as their runtime environment.
  • In the late 1990s, JP Software's command line processor 4DOS was optionally available in a special runtime version to be linked with BATCOMP pre-compiled and encrypted batch jobs, in order to create unmodifiable executables from batch scripts and run them on systems without 4DOS installed.

Examples

The runtime system of the C language is a particular set of instructions inserted by the compiler into the executable image. Among other things, these instructions manage the process stack, create space for local variables, and copy function call parameters onto the top of the stack.
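
As a hedged sketch of what such compiler-inserted startup code does, the following is a drastically simplified entry point in the style of a C runtime's "crt0", assuming a Unix-like target; a real startup file also reads argc/argv from the initial stack, sets up the environment, and runs constructors before calling main.

    /* Minimal sketch of a C runtime entry point ("crt0"); not any
       particular libc's actual code. Built with something like
       cc -nostartfiles so that this _start replaces the default one. */
    extern int main(int argc, char **argv);
    extern void _exit(int status);

    void _start(void) {
        int status = main(0, (char **)0);  /* placeholders for argc/argv */
        _exit(status);                     /* _start must never return */
    }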

There are often no clear criteria for determining which language behaviors are part of the runtime system itself and which can be determined by any particular source program. For example, in C, the setup of the stack is part of the runtime system. It is not determined by the semantics of an individual program because the behavior is globally invariant: it holds over all executions. This systematic behavior implements the execution model of the language, as opposed to implementing semantics of the particular program (in which text is directly translated into code that computes results).

This separation between the semantics of a particular program and the runtime environment is reflected by the different ways of compiling a program: compiling source code to an object file that contains all the functions versus compiling an entire program to an executable binary. The object file will only contain assembly code relevant to the included functions, while the executable binary will contain additional code that implements the runtime environment. The object file, on one hand, may be missing information from the runtime environment that will be resolved by linking. On the other hand, the code in the object file still depends on assumptions in the runtime system; for example, a function may read parameters from a particular register or stack location, depending on the calling convention used by the runtime environment.

Another example is the case of using an application programming interface (API) to interact with a runtime system. The calls to that API look the same as calls to a regular software library, however at some point during the call the execution model changes. The runtime system implements an execution model different from that of the language the library is written in terms of. A person reading the code of a normal library would be able to understand the library's behavior by just knowing the language the library was written in. However, a person reading the code of the API that invokes a runtime system would not be able to understand the behavior of the API call just by knowing the language the call was written in. At some point, via some mechanism, the execution model stops being that of the language the call is written in and switches over to being the execution model implemented by the runtime system. For example, the trap instruction is one method of switching execution models. This difference is what distinguishes an API-invoked execution model, such as Pthreads, from a usual software library. Both Pthreads calls and software library calls are invoked via an API, but Pthreads behavior cannot be understood in terms of the language of the call. Rather, Pthreads calls bring into play an outside execution model, which is implemented by the Pthreads runtime system (this runtime system is often the OS kernel).
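
A minimal pthreads example in C makes this concrete: the calls below look like ordinary library calls, yet pthread_create hands control to the execution model implemented by the pthreads runtime. On most Unix-like systems this is compiled with the -pthread flag.

    #include <pthread.h>
    #include <stdio.h>

    /* The API call looks like a normal function call, but it brings an
       outside execution model (threads) into play. */
    static void *worker(void *arg) {
        printf("hello from thread %s\n", (const char *)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, "A") != 0)
            return 1;
        pthread_join(t, NULL);   /* wait for the worker to finish */
        return 0;
    }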

As an extreme example, the physical CPU itself can be viewed as an implementation of the runtime system of a specific assembly language. In this view, the execution model is implemented by the physical CPU and memory systems. As an analogy, runtime systems for higher-level languages are themselves implemented using some other languages. This creates a hierarchy of runtime systems, with the CPU itself—or actually its logic at the microcode layer or below—acting as the lowest-level runtime system.

Advanced features

Some compiled or interpreted languages provide an interface that allows application code to interact directly with the runtime system. An example is the Thread class in the Java language. The class allows code running in one thread to do things such as start and stop other threads. Normally, core aspects of a language's behavior such as task scheduling and resource management are not accessible in this fashion.

Higher-level behaviors implemented by a runtime system may include tasks such as drawing text on the screen or making an Internet connection. It is often the case that operating systems provide these kinds of behaviors as well, and when available, the runtime system is implemented as an abstraction layer that translates the invocation of the runtime system into an invocation of the operating system. This hides the complexity or variations in the services offered by different operating systems. This also implies that the OS kernel can itself be viewed as a runtime system, and that the set of OS calls that invoke OS behaviors may be viewed as interactions with a runtime system.

In the limit, the runtime system may provide services such as a P-code machine or virtual machine, that hide even the processor's instruction set. This is the approach followed by many interpreted languages such as AWK, and some languages like Java, which are meant to be compiled into some machine-independent intermediate representation code (such as bytecode). This arrangement simplifies the task of language implementation and its adaptation to different machines, and improves efficiency of sophisticated language features such as reflection. It also allows the same program to be executed on any machine without an explicit recompiling step, a feature that has become very important since the proliferation of the World Wide Web. To speed up execution, some runtime systems feature just-in-time compilation to machine code.

A modern aspect of runtime systems is parallel execution behaviors, such as the behaviors exhibited by mutex constructs in Pthreads and parallel section constructs in OpenMP. A runtime system with such parallel execution behaviors may be modularized according to the proto-runtime approach.
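
For instance, a pthread mutex is a parallel-execution behavior supplied by the runtime system rather than by the C language proper; a small sketch:

    #include <pthread.h>
    #include <stdio.h>

    /* Two threads increment a shared counter; the mutex, a service of
       the pthreads runtime, serializes the increments. */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    static void *bump(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);  /* 200000, thanks to the lock */
        return 0;
    }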

History

Notable early examples of runtime systems are the interpreters for BASIC and Lisp. These environments also included a garbage collector. Forth is an early example of a language designed to be compiled into intermediate representation code; its runtime system was a virtual machine that interpreted that code. Another popular, if theoretical, example is Donald Knuth's MIX computer.

In C and later languages that supported dynamic memory allocation, the runtime system also included a library that managed the program's memory pool.
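
That memory-pool library is what a C program exercises every time it calls malloc and free, as in this small sketch:

    #include <stdio.h>
    #include <stdlib.h>

    /* malloc carves a block out of the heap managed by the C runtime's
       allocator; free returns it to the pool. */
    int main(void) {
        double *samples = malloc(8 * sizeof *samples);
        if (samples == NULL)
            return 1;                /* allocation can fail */
        for (int i = 0; i < 8; i++)
            samples[i] = i * 0.5;
        printf("last sample: %f\n", samples[7]);
        free(samples);
        return 0;
    }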

In the object-oriented programming languages, the runtime system was often also responsible for dynamic type checking and resolving method references.

Library (computing)

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Library_(computing)
[Illustration: an application using libvorbisfile to play an Ogg Vorbis file.]

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.

A library is also a collection of implementations of behavior, written in terms of a language, that has a well-defined interface by which the behavior is invoked. For instance, people who want to write a higher-level program can use a library to make system calls instead of implementing those system calls over and over again. In addition, the behavior is provided for reuse by multiple independent programs. A program invokes the library-provided behavior via a mechanism of the language. For example, in a simple imperative language such as C, the behavior in a library is invoked by using C's normal function-call. What distinguishes the call as being to a library function, versus being to another function in the same program, is the way that the code is organized in the system.
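
To illustrate with C: the call to sqrt below is indistinguishable, at the call site, from a call to a function defined in the same program. What makes it a library call is that its implementation lives in the math library, linked in with -lm on most Unix-like systems (cc main.c -lm).

    #include <math.h>
    #include <stdio.h>

    /* sqrt is invoked with C's ordinary function-call mechanism; its
       implementation is supplied by the math library, not this program. */
    int main(void) {
        printf("sqrt(2) = %f\n", sqrt(2.0));
        return 0;
    }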

Library code is organized in such a way that it can be used by multiple programs that have no connection to each other, while code that is part of a program is organized to be used only within that one program. This distinction can gain a hierarchical notion when a program grows large, such as a multi-million-line program. In that case, there may be internal libraries that are reused by independent sub-portions of the large program. The distinguishing feature is that a library is organized for the purposes of being reused by independent programs or sub-programs, and the user only needs to know the interface and not the internal details of the library.

The value of a library lies in the reuse of standardized program elements. When a program invokes a library, it gains the behavior implemented inside that library without having to implement that behavior itself. Libraries encourage the sharing of code in a modular fashion and ease the distribution of the code.

The behavior implemented by a library can be connected to the invoking program at different program lifecycle phases. If the code of the library is accessed during the build of the invoking program, then the library is called a static library. An alternative is to build the executable of the invoking program and distribute that, independently of the library implementation. The library behavior is connected after the executable has been invoked to be executed, either as part of the process of starting the execution, or in the middle of execution. In this case the library is called a dynamic library (loaded at runtime). A dynamic library can be loaded and linked when preparing a program for execution, by the linker. Alternatively, in the middle of execution, an application may explicitly request that a module be loaded.
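
For the last case, loading a module explicitly in the middle of execution, here is a minimal sketch in C using the POSIX dlopen interface (often linked with -ldl); the library name "libfoo.so" and the symbol "foo" are hypothetical placeholders.

    #include <dlfcn.h>
    #include <stdio.h>

    /* Explicitly load a dynamic library at runtime and look up a symbol.
       "libfoo.so" and "foo" are hypothetical names for illustration. */
    int main(void) {
        void *handle = dlopen("libfoo.so", RTLD_NOW);
        if (handle == NULL) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }
        int (*foo)(void) = (int (*)(void))dlsym(handle, "foo");
        if (foo != NULL)
            printf("foo() = %d\n", foo());
        dlclose(handle);
        return 0;
    }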

Most compiled languages have a standard library, although programmers can also create their own custom libraries. Most modern software systems provide libraries that implement the majority of the system services. Such libraries have organized the services which a modern application requires. As such, most code used by modern applications is provided in these system libraries.

History

The idea of a computer library dates back to the first computers created by Charles Babbage. An 1888 paper on his Analytical Engine suggested that computer operations could be punched on separate cards from numerical input. If these operation punch cards were saved for reuse then "by degrees the engine would have a library of its own."

[Photograph: a woman working next to a filing cabinet containing the subroutine library on reels of punched tape for the EDSAC computer.]

In 1947 Goldstine and von Neumann speculated that it would be useful to create a "library" of subroutines for their work on the IAS machine, an early computer that was not yet operational at that time. They envisioned a physical library of magnetic wire recordings, with each wire storing reusable computer code.

Inspired by von Neumann, Wilkes and his team constructed EDSAC. A filing cabinet of punched tape held the subroutine library for this computer. Programs for EDSAC consisted of a main program and a sequence of subroutines copied from the subroutine library. In 1951 the team published the first textbook on programming, The Preparation of Programs for an Electronic Digital Computer, which detailed the creation and the purpose of the library.

COBOL included "primitive capabilities for a library system" in 1959, but Jean Sammet described them as "inadequate library facilities" in retrospect.

JOVIAL had a Communication Pool (COMPOOL), roughly a library of header files.

Another major contributor to the modern library concept came in the form of the subprogram innovation of FORTRAN. FORTRAN subprograms can be compiled independently of each other, but without a shared interface description the compiler cannot check calls across them; so, prior to the introduction of modules in Fortran-90, type checking between FORTRAN subprograms was impossible.

By the mid 1960s, copy and macro libraries for assemblers were common. Starting with the popularity of the IBM System/360, libraries containing other types of text elements, e.g., system parameters, also became common.

Simula was the first object-oriented programming language, and its classes were nearly identical to the modern concept as used in Java, C++, and C#. The class concept of Simula was also a progenitor of the package in Ada and the module of Modula-2. Even when originally developed in 1965, Simula classes could be included in library files and added at compile time.

Linking

Libraries are important in the program linking or binding process, which resolves references known as links or symbols to library modules. The linking process is usually automatically done by a linker or binder program that searches a set of libraries and other modules in a given order. Usually it is not considered an error if a link target can be found multiple times in a given set of libraries. Linking may be done when an executable file is created (static linking), or whenever the program is used at runtime (dynamic linking).

The references being resolved may be addresses for jumps and other routine calls. They may be in the main program, or in one module depending upon another. They are resolved into fixed or relocatable addresses (from a common base) by allocating runtime memory for the memory segments of each module referenced.

Some programming languages use a feature called smart linking whereby the linker is aware of or integrated with the compiler, such that the linker knows how external references are used, and code in a library that is never actually used, even though internally referenced, can be discarded from the compiled application. For example, a program that only uses integers for arithmetic, or does no arithmetic operations at all, can exclude floating-point library routines. This smart-linking feature can lead to smaller application file sizes and reduced memory usage.
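
With the GNU toolchain, a comparable effect can be requested explicitly; as a sketch, compiling with -ffunction-sections places each function in its own section, and linking with --gc-sections lets the linker discard sections that nothing references.

    #include <stdio.h>

    /* Build sketch (GNU toolchain):
         cc -ffunction-sections -c demo.c
         cc demo.o -Wl,--gc-sections -o demo
       heavy_float_math is never referenced, so its section can be
       dropped from the final executable. */
    double heavy_float_math(double x) {
        return x * 3.14159;          /* unused floating-point code */
    }

    int main(void) {
        int total = 2 + 3;           /* only integer arithmetic is used */
        printf("%d\n", total);
        return 0;
    }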

Relocation

Some references in a program or library module are stored in a relative or symbolic form which cannot be resolved until all code and libraries are assigned final static addresses. Relocation is the process of adjusting these references, and is done either by the linker or the loader. In general, relocation cannot be done to individual libraries themselves because the addresses in memory may vary depending on the program using them and other libraries they are combined with. Position-independent code avoids references to absolute addresses and therefore does not require relocation.

Static libraries

When linking is performed during the creation of an executable or another object file, it is known as static linking or early binding. In this case, the linking is usually done by a linker, but may also be done by the compiler. A static library, also known as an archive, is one intended to be statically linked. Originally, only static libraries existed. Static linking must be performed when any modules are recompiled.

All of the modules required by a program are sometimes statically linked and copied into the executable file. This process, and the resulting stand-alone file, is known as a static build of the program. A static build may not need any further relocation if virtual memory is used and no address space layout randomization is desired.

Shared libraries

A shared library or shared object is a file that is intended to be shared by executable files and further shared object files. Modules used by a program are loaded from individual shared objects into memory at load time or runtime, rather than being copied by a linker when it creates a single monolithic executable file for the program.

References to a shared library can also be resolved at link time, meaning that references to the library modules are resolved and the modules are allocated memory when the executable file is created. But often linking of shared libraries is postponed until they are loaded.

Object libraries

Although originally pioneered in the 1960s, dynamic linking did not reach operating systems used by consumers until the late 1980s. It was generally available in some form in most operating systems by the early 1990s. During this same period, object-oriented programming (OOP) was becoming a significant part of the programming landscape. OOP with runtime binding requires additional information that traditional libraries do not supply. In addition to the names and entry points of the code located within, they also require a list of the objects they depend on. This is a side-effect of one of OOP's core concepts, inheritance, which means that parts of the complete definition of any method may be in different places. This is more than simply listing that one library requires the services of another: in a true OOP system, the libraries themselves may not be known at compile time, and vary from system to system.

At the same time many developers worked on the idea of multi-tier programs, in which a "display" running on a desktop computer would use the services of a mainframe or minicomputer for data storage or processing. For instance, a program on a GUI-based computer would send messages to a minicomputer to return small samples of a huge dataset for display. Remote procedure calls (RPC) already handled these tasks, but there was no standard RPC system.

Soon the majority of the minicomputer and mainframe vendors instigated projects to combine the two, producing an OOP library format that could be used anywhere. Such systems were known as object libraries, or distributed objects, if they supported remote access (not all did). Microsoft's COM is an example of such a system for local use. DCOM, a modified version of COM, supports remote access.

For some time object libraries held the status of the "next big thing" in the programming world. There were a number of efforts to create systems that would run across platforms, and companies competed to try to get developers locked into their own system. Examples include IBM's System Object Model (SOM/DSOM), Sun Microsystems' Distributed Objects Everywhere (DOE), NeXT's Portable Distributed Objects (PDO), Digital's ObjectBroker, Microsoft's Component Object Model (COM/DCOM), and any number of CORBA-based systems.

Class libraries

Class libraries are the rough OOP equivalent of older types of code libraries. They contain classes, which describe characteristics and define actions (methods) that involve objects. Class libraries are used to create instances, or objects with their characteristics set to specific values. In some OOP languages, like Java, the distinction is clear, with the classes often contained in library files (like Java's JAR file format) and the instantiated objects residing only in memory (although potentially able to be made persistent in separate files). In others, like Smalltalk, the class libraries are merely the starting point for a system image that includes the entire state of the environment, classes and all instantiated objects.

Today most class libraries are stored in a package repository (such as Maven Central for Java). Client code explicitly declares its dependencies on external libraries in build configuration files (such as a Maven POM in Java).

Remote libraries

Another library technique uses completely separate executables (often in some lightweight form) and calls them using a remote procedure call (RPC) over a network to another computer. This maximizes operating system re-use: the code needed to support the library is the same code being used to provide application support and security for every other program. Additionally, such systems do not require the library to exist on the same machine, but can forward the requests over the network.

However, such an approach means that every library call requires a considerable amount of overhead. RPC calls are much more expensive than calling a shared library that has already been loaded on the same machine. This approach is commonly used in a distributed architecture that makes heavy use of such remote calls, notably client-server systems and application servers such as Enterprise JavaBeans.

Code generation libraries

Code generation libraries are high-level APIs that can generate or transform byte code for Java. They are used by aspect-oriented programming, by some data access frameworks, and in testing to generate dynamic proxy objects. They are also used to intercept field access.

File naming

Most modern Unix-like systems

The system stores libfoo.a and libfoo.so files in directories such as /lib, /usr/lib or /usr/local/lib. The filenames always start with lib, and end with a suffix of .a (archive, static library) or of .so (shared object, dynamically linked library). Some systems might have multiple names for a dynamically linked library. These names typically share the same prefix and have different suffixes indicating the version number. Most of the names are names for symbolic links to the latest version. For example, on some systems libfoo.so.2 would be the filename for the second major interface revision of the dynamically linked library libfoo. The .la files sometimes found in the library directories are libtool archives, not usable by the system as such.

macOS

The system inherits static library conventions from BSD, with the library stored in a .a file, and can use .so-style dynamically linked libraries (with the .dylib suffix instead). Most libraries in macOS, however, consist of "frameworks", placed inside special directories called "bundles" which wrap the library's required files and metadata. For example, a framework called MyFramework would be implemented in a bundle called MyFramework.framework, with MyFramework.framework/MyFramework being either the dynamically linked library file or being a symlink to the dynamically linked library file in MyFramework.framework/Versions/Current/MyFramework.

Microsoft Windows

Dynamic-link libraries usually have the suffix *.DLL, although other file name extensions may identify specific-purpose dynamically linked libraries, e.g. *.OCX for OLE libraries. The interface revisions are either encoded in the file names or abstracted away using COM-object interfaces. Depending on how they are compiled, *.LIB files can be either static libraries or representations of dynamically linkable libraries needed only during compilation, known as "import libraries". Unlike in the UNIX world, which uses different file extensions, when linking against a .LIB file in Windows one must first know whether it is a regular static library or an import library. In the latter case, a .DLL file must be present at runtime.

Linker (computing)

From Wikipedia, the free encyclopedia
[Illustration of the linking process: object files and static libraries are assembled into a new library or executable.]

In computing, a linker or link editor is a computer system program that takes one or more object files (generated by a compiler or an assembler) and combines them into a single executable file, library file, or another "object" file.

A simpler version that writes its output directly to memory is called the loader, though loading is typically considered a separate process.

Overview

Computer programs typically are composed of several parts or modules; these parts/modules need not all be contained within a single object file, and in such cases they refer to each other by means of symbols as addresses into other modules, which are mapped into memory addresses when linked for execution.

While the process of linking is meant to ultimately combine these independent parts, there are many good reasons to develop those separately at the source-level. Among these reasons are the ease of organizing several smaller pieces over a monolithic whole and the ability to better define the purpose and responsibilities of each individual piece, which is essential for managing complexity and increasing long-term maintainability in software architecture.

Typically, an object file can contain three kinds of symbols:

  • defined "external" symbols, sometimes called "public" or "entry" symbols, which allow it to be called by other modules,
  • undefined "external" symbols, which reference other modules where these symbols are defined, and
  • local symbols, used internally within the object file to facilitate relocation.

For most compilers, each object file is the result of compiling one input source code file. When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along.
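
A minimal two-file C sketch shows what "resolving the symbols" means in practice; the file names and the build commands in the comments are illustrative.

    /* add.c -- compiled on its own (cc -c add.c); add.o defines the
       external symbol "add". */
    int add(int a, int b) { return a + b; }

    /* main.c -- compiled on its own (cc -c main.c); main.o records "add"
       as an undefined external symbol, which the linker resolves against
       add.o when the two are combined (cc main.o add.o -o demo). */
    #include <stdio.h>

    int add(int a, int b);           /* declaration only; defined elsewhere */

    int main(void) {
        printf("2 + 3 = %d\n", add(2, 3));
        return 0;
    }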

Linkers can take objects from a collection called a library or runtime library. Most linkers do not include all the object files in a static library in the output executable; they include only those object files from the library that are referenced by other object files or libraries directly or indirectly. But for a shared library, the entire library has to be loaded during runtime as it is not known which functions or methods will be called during runtime. Library linking may thus be an iterative process, with some referenced modules requiring additional modules to be linked, and so on. Libraries exist for diverse purposes, and one or more system libraries are usually linked in by default.

The linker also takes care of arranging the objects in a program's address space. This may involve relocating code that assumes a specific base address into another base. Since a compiler seldom knows where an object will reside, it often assumes a fixed base location (for example, zero). Relocating machine code may involve re-targeting of absolute jumps, loads and stores.

The executable output by the linker may need another relocation pass when it is finally loaded into memory (just before execution). This pass is usually omitted on hardware offering virtual memory: every program is put into its own address space, so there is no conflict even if all programs load at the same base address. This pass may also be omitted if the executable is a position independent executable.

In some operating systems, such as SINTRAN III, the process performed by a linker (assembling object files into a program) was called loading (as in loading executable code onto a file). Additionally, in some operating systems, the same program handles both the jobs of linking and loading a program (dynamic linking).

Dynamic linking

Many operating system environments allow dynamic linking, deferring the resolution of some undefined symbols until a program is run. That means that the executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these. Loading the program will load these objects/libraries as well, and perform a final linking.

This approach offers two advantages:

  • Often-used libraries (for example the standard system libraries) need to be stored in only one location, not duplicated in every single executable file, thus saving limited memory and disk space.
  • If a bug in a library function is corrected by replacing the library or performance is improved, all programs using it dynamically will benefit from the correction after restarting them. Programs that included this function by static linking would have to be re-linked first.

There are also disadvantages:

  • Known on the Windows platform as "DLL hell", an incompatible updated library will break executables that depended on the behavior of the previous version of the library if the newer version is not correctly backward compatible.
  • A program, together with the libraries it uses, might be certified (e.g. as to correctness, documentation requirements, or performance) as a package, but not if components can be replaced (this also argues against automatic OS updates in critical systems; in both cases, the OS and libraries form part of a qualified environment).

Contained or virtual environments may further allow system administrators to mitigate or trade off these individual pros and cons.

Static linking

Static linking is the result of the linker copying all library routines used in the program into the executable image. This may require more disk space and memory than dynamic linking, but is more portable, since it does not require the presence of the library on the system where it runs. Static linking also prevents "DLL hell", since each program includes exactly the versions of library routines that it requires, with no conflict with other programs. A program using just a few routines from a library does not require the entire library to be installed.

Relocation

As the compiler has no information on the layout of objects in the final output, it cannot take advantage of shorter or more efficient instructions that place a requirement on the address of another object. For example, a jump instruction can reference an absolute address or an offset from the current location, and the offset could be expressed with different lengths depending on the distance to the target. By first generating the most conservative instruction (usually the largest relative or absolute variant, depending on platform) and adding relaxation hints, it is possible to substitute shorter or more efficient instructions during the final link. In regard to jump optimizations this is also called automatic jump-sizing. This step can be performed only after all input objects have been read and assigned temporary addresses; the linker relaxation pass subsequently reassigns addresses, which may in turn allow more potential relaxations to occur. In general, the substituted sequences are shorter, which allows this process to always converge on the best solution given a fixed order of objects; if this is not the case, relaxations can conflict, and the linker needs to weigh the advantages of either option.

While instruction relaxation typically occurs at link-time, inner-module relaxation can already take place as part of the optimizing process at compile-time. In some cases, relaxation can also occur at load-time as part of the relocation process or combined with dynamic dead-code elimination techniques.

Linkage editor

In IBM System/360 mainframe environments such as OS/360, including z/OS for the z/Architecture mainframes, this type of program is known as a linkage editor. As the name implies, a linkage editor has the additional capability of allowing the addition, replacement, and/or deletion of individual program sections. Operating systems such as OS/360 have a format for executable load modules containing supplementary data about the component sections of a program, so that an individual program section can be replaced and other parts of the program updated, allowing relocatable addresses and other references to be corrected by the linkage editor as part of the process.

One advantage of this is that it allows a program to be maintained without having to keep all of the intermediate object files, or without having to re-compile program sections that haven't changed. It also permits program updates to be distributed in the form of small files (originally card decks), containing only the object module to be replaced. In such systems, object code is in the form and format of 80-byte punched-card images, so that updates can be introduced into a system using that medium. In later releases of OS/360 and in subsequent systems, load-modules contain additional data about versions of components modules, to create a traceable record of updates. It also allows one to add, change, or remove an overlay structure from an already linked load module.

The term "linkage editor" should not be construed as implying that the program operates in a user-interactive mode like a text editor. It is intended for batch-mode execution, with the editing commands being supplied by the user in sequentially organized files, such as punched cards, DASD, or magnetic tape.

Linkage editing (IBM nomenclature) or consolidation or collection (ICL nomenclature) refers to the linkage editor's or consolidator's act of combining the various pieces into a relocatable binary, whereas the loading and relocation into an absolute binary at the target address is normally considered a separate step.

Linker Control Scripts

Early linkers gave users very limited control over the arrangement of generated output object files. As target systems became more complex, with differing memory requirements such as those of embedded systems, it became necessary to give users control over the generated output, for example to define the base addresses of segments. Linker control scripts were introduced for this purpose.

Common implementations

On Unix and Unix-like systems, the linker is known as "ld". Origins of the name "ld" are "LoaDer" and "Link eDitor". The term "loader" was used to describe the process of loading external symbols from other programs during the process of linking.

GNU linker

The GNU linker (or GNU ld) is the GNU Project's free software implementation of the Unix command ld. GNU ld runs the linker, which creates an executable file (or a library) from object files created during compilation of a software project. A linker script may be passed to GNU ld to exercise greater control over the linking process. The GNU linker is part of the GNU Binary Utilities (binutils). Two versions of ld are provided in binutils: the traditional GNU ld based on bfd, and a "streamlined" ELF-only version called gold.

The command-line and linker script syntaxes of GNU ld are the de facto standard in much of the Unix-like world. The LLVM project's linker, lld, is designed to be drop-in compatible and may be used directly with the GNU compiler. Another drop-in replacement, mold, is a highly parallelized and faster alternative which is also supported by GNU tools.
