
Monday, November 27, 2023

Typesetting

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Typesetting

Movable type on a composing stick on a type case
A specimen sheet issued by William Caslon, letter founder, from the 1728 edition of Cyclopaedia
Diagram of a cast metal sort

Typesetting is the composition of text by means of arranging physical type (or sort) in mechanical systems or glyphs in digital systems representing characters (letters and other symbols). Stored types are retrieved and ordered according to a language's orthography for visual display. Typesetting requires one or more fonts (which are widely but erroneously confused with and substituted for typefaces). One significant effect of typesetting was that the authorship of works could be identified more easily, making it more difficult for copiers to reproduce works without permission.

Pre-digital era

Manual typesetting

During much of the letterpress era, movable type was composed by hand for each page by workers called compositors. A tray with many dividers, called a case, contained cast metal sorts, each with a single letter or symbol, but backwards (so they would print correctly). The compositor assembled these sorts into words, then lines, then pages of text, which were then bound tightly together by a frame, making up a form or page. If done correctly, all letters were of the same height, and a flat surface of type was created. The form was placed in a press and inked, and then printed (an impression made) on paper. Metal type read backwards, from right to left, and a key skill of the compositor was their ability to read this backwards text.

Before the advent of computerized (or digital) typesetting, font sizes were changed by replacing the characters with type of a different size. In letterpress printing, individual letters and punctuation marks were cast on small metal blocks, known as "sorts," and then arranged to form the text for a page. The size of the type was determined by the size of the character on the face of the sort. A compositor would need to physically swap out the sorts for a different size to change the font size.

During typesetting, individual sorts are picked from a type case with the right hand, and set from left to right into a composing stick held in the left hand, appearing to the typesetter as upside down. As seen in the photo of the composing stick, a lower case 'q' looks like a 'd', a lower case 'b' looks like a 'p', a lower case 'p' looks like a 'b' and a lower case 'd' looks like a 'q'. This is reputed to be the origin of the expression "mind your p's and q's". It might just as easily have been "mind your b's and d's".
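The letter swaps described above follow from two reversals: the face of a sort is the printed letter reversed left to right, and turning it upside down in the stick reverses it again, so the compositor effectively sees each letter flipped top to bottom. A minimal Python sketch, assuming a deliberately simplified description of each of the four letters by only two features (the side of the bowl and the direction of the stem); the names GLYPHS, mirror, rotate_180 and as_seen_in_stick are hypothetical:

```python
# Each of the four letters is reduced to two features:
# (side of the stem the bowl sits on, direction the stem extends).
GLYPHS = {
    "b": ("right", "up"),
    "d": ("left", "up"),
    "p": ("right", "down"),
    "q": ("left", "down"),
}
LETTER_FOR = {features: letter for letter, features in GLYPHS.items()}


def mirror(features):
    """Left-right reversal, as cast on the face of a sort."""
    bowl, stem = features
    return ("left" if bowl == "right" else "right", stem)


def rotate_180(features):
    """Turning the glyph upside down: both bowl side and stem direction flip."""
    bowl, stem = features
    return ("left" if bowl == "right" else "right",
            "down" if stem == "up" else "up")


def as_seen_in_stick(letter):
    """Printed letter -> mirrored on the sort -> viewed upside down in the stick."""
    return LETTER_FOR[rotate_180(mirror(GLYPHS[letter]))]


for letter in "qbpd":
    print(letter, "->", as_seen_in_stick(letter))
```

Running it prints q -> d, b -> p, p -> b and d -> q, matching the pairs listed above.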

A forgotten but important part of the process took place after the printing: the expensive sorts had to be redistributed, or sorted, back into the type case so they would be ready for reuse. Errors in sorting could later produce misprints if, say, a p was put into the b compartment.

The diagram of a cast metal sort labels its parts: a face, b body or shank, c point size, 1 shoulder, 2 nick, 3 groove, 4 foot. Wooden printing sorts were used for centuries in combination with metal type. Not shown, and more the concern of the casterman, is the “set”, or width of each sort. Set width, like body size, is measured in points.

In order to extend the working life of type, and to account for the finite sorts in a case of type, copies of forms were cast when anticipating subsequent printings of a text, freeing the costly type for other work. This was particularly prevalent in book and newspaper work, where rotary presses required type forms to wrap around an impression cylinder rather than being set in the bed of a press. In this process, called stereotyping, the entire form is pressed into a fine matrix such as plaster of Paris or papier mâché to create a flong, from which a positive form is cast in type metal.

Advances such as the typewriter and the computer would push the state of the art even further ahead. Still, hand composition and letterpress printing have not fallen completely out of use, and since the introduction of digital typesetting, they have seen a revival as an artisanal pursuit. However, they remain a small niche within the larger typesetting market.

Hot metal typesetting

The time and effort required to manually compose the text led to several efforts in the 19th century to produce mechanical typesetting. While some, such as the Paige compositor, met with limited success, by the end of the 19th century, several methods had been devised whereby an operator working a keyboard or other devices could produce the desired text. Most of the successful systems involved the in-house casting of the type to be used, hence are termed "hot metal" typesetting. The Linotype machine, invented in 1884, used a keyboard to assemble the casting matrices, and cast an entire line of type at a time (hence its name). In the Monotype System, a keyboard was used to punch a paper tape, which was then fed to control a casting machine. The Ludlow Typograph involved hand-set matrices, but otherwise used hot metal. By the early 20th century, the various systems were nearly universal in large newspapers and publishing houses.

Phototypesetting

Linotype CRTronic 360 photosetter, a direct entry machine

Phototypesetting or "cold type" systems first appeared in the early 1960s and rapidly displaced continuous casting machines. These devices consisted of glass or film disks or strips (one per font) that spun in front of a light source to selectively expose characters onto light-sensitive paper. Originally they were driven by pre-punched paper tapes. Later they were connected to computer front ends.

One of the earliest electronic photocomposition systems was introduced by Fairchild Semiconductor. The typesetter typed a line of text on a Fairchild keyboard that had no display. To verify the line's content, it was typed a second time. If the two lines were identical, a bell rang and the machine produced a punched paper tape corresponding to the text. When a block of lines was complete, the typesetter fed the corresponding paper tapes into a phototypesetting device that mechanically set type outlines printed on glass sheets into place for exposure onto a negative film. Photosensitive paper was exposed to light through the negative film, resulting in a column of black type on white paper, or a galley. The galley was then cut up and used to create a mechanical drawing or paste-up of a whole page. A large film negative of the page was then shot and used to make plates for offset printing.
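The double-keying check described above is essentially a compare-then-accept protocol. A minimal Python sketch, assuming that a mismatch simply forced the line to be keyed again (the text does not say how the Fairchild system handled mismatches); verify_line and punch_block are hypothetical names, and the list of accepted lines stands in for the punched paper tape:

```python
from typing import List, Optional, Tuple


def verify_line(first: str, second: str) -> Optional[str]:
    """Return the line if both keyings are identical, otherwise None."""
    return first if first == second else None


def punch_block(keyings: List[Tuple[str, str]]) -> Tuple[List[str], List[int]]:
    """Check a block of double-keyed lines.

    Returns the accepted lines (standing in for the punched paper tape) and the
    indices of lines that would have to be keyed again.
    """
    tape: List[str] = []
    rekey: List[int] = []
    for index, (first, second) in enumerate(keyings):
        line = verify_line(first, second)
        if line is None:
            rekey.append(index)  # mismatch: no bell, nothing punched for this line
        else:
            tape.append(line)    # match: the bell rings and the line is punched
    return tape, rekey


tape, rekey = punch_block([
    ("Typesetting is the composition", "Typesetting is the composition"),
    ("of text by means of arranging", "of text by mean of arranging"),
])
print(tape)   # ['Typesetting is the composition']
print(rekey)  # [1]
```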

Digital era

The next generation of phototypesetting machines to emerge were those that generated characters on a cathode ray tube display. Typical of the type were the Alphanumeric APS2 (1963), IBM 2680 (1967), I.I.I. VideoComp (1973?), Autologic APS5 (1975), and Linotron 202 (1978). These machines were the mainstay of phototypesetting for much of the 1970s and 1980s. Such machines could be "driven online" by a computer front-end system or could take their data from magnetic tape. Type fonts were stored digitally on conventional magnetic disk drives.

Computers excel at automatically typesetting and correcting documents. Character-by-character, computer-aided phototypesetting was, in turn, rapidly rendered obsolete in the 1980s by fully digital systems employing a raster image processor to render an entire page to a single high-resolution digital image, now known as imagesetting.

The first commercially successful laser imagesetter, able to make use of a raster image processor, was the Monotype Lasercomp. ECRM, Compugraphic (later purchased by Agfa) and others rapidly followed suit with machines of their own.

Early minicomputer-based typesetting systems introduced in the 1970s and early 1980s, such as Datalogics Pager, Penta, Atex, Miles 33, Xyvision, troff from Bell Labs, and IBM's Script product with CRT terminals, were better able to drive these electromechanical devices, and used text markup languages to describe type and other page formatting information. The descendants of these text markup languages include SGML, XML and HTML.

The minicomputer systems output columns of text on film for paste-up and eventually produced entire pages and signatures of 4, 8, 16 or more pages using imposition software on devices such as the Israeli-made Scitex Dolev. The data stream used by these systems to drive page layout on printers and imagesetters, often proprietary or specific to a manufacturer or device, drove development of generalized printer control languages, such as Adobe Systems' PostScript and Hewlett-Packard's PCL.
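Imposition software decides which pages share each side of a press sheet so that they come out in reading order once the sheet is folded and bound. A minimal Python sketch of the simplest case, a saddle-stitched booklet built from nested sheets that are each folded once (four pages per sheet); this is an illustrative assumption, not the method of the Scitex Dolev or any particular product, and multi-fold 8- and 16-page signatures follow page orders that depend on the folding scheme. The function name is hypothetical:

```python
from typing import List, Tuple

Side = Tuple[int, int]  # (left page, right page) on one side of a sheet


def saddle_stitch_imposition(page_count: int) -> List[Tuple[Side, Side]]:
    """Return (front, back) page pairs for each sheet, outermost sheet first.

    Assumes a saddle-stitched booklet: sheets are stacked, folded once down
    the middle, and stitched through the fold, so each sheet carries 4 pages.
    """
    if page_count <= 0 or page_count % 4 != 0:
        raise ValueError("page_count must be a positive multiple of 4")
    sheets = []
    for k in range(page_count // 4):
        front = (page_count - 2 * k, 1 + 2 * k)     # outside of this folded sheet
        back = (2 + 2 * k, page_count - 1 - 2 * k)  # inside of this folded sheet
        sheets.append((front, back))
    return sheets


for front, back in saddle_stitch_imposition(8):
    print("front", front, "back", back)
# front (8, 1) back (2, 7)
# front (6, 3) back (4, 5)
```

For an 8-page booklet this places pages 8 and 1 (front) and 2 and 7 (back) on the outer sheet, and pages 6 and 3 (front) and 4 and 5 (back) on the inner sheet.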

Text sample (an extract of the essay The Renaissance of English Art by Oscar Wilde) typeset in Iowan Old Style roman, italics and small caps, adjusted to approximately 10 words per line, with the typeface sized at 14 points on 1.4× leading, with 0.2 points of extra tracking

Computerized typesetting was so rare that BYTE magazine (comparing itself to "the proverbial shoemaker's children who went barefoot") did not use any computers in production until its August 1979 issue used a Compugraphic system for typesetting and page layout. The magazine did not yet accept articles on floppy disks, but hoped to do so "as matters progress". Before the 1980s, practically all typesetting for publishers and advertisers was performed by specialist typesetting companies. These companies performed keyboarding, editing and production of paper or film output, and formed a large component of the graphic arts industry. In the United States, these companies were located in rural Pennsylvania, New England or the Midwest, where labor was cheap and paper was produced nearby, but still within a few hours' travel time of the major publishing centers.

In 1985, with the new concept of WYSIWYG (for What You See Is What You Get) in text editing and word processing on personal computers, desktop publishing became available, starting with the Apple Macintosh, Aldus PageMaker (and later QuarkXPress) and PostScript, and on the PC platform with Xerox Ventura Publisher under DOS as well as PageMaker under Windows. Improvements in software and hardware, and rapidly falling costs, popularized desktop publishing and enabled very fine control of typeset results much less expensively than the dedicated minicomputer systems. At the same time, word processing systems, such as Wang, WordPerfect and Microsoft Word, revolutionized office documents. They did not, however, have the typographic ability or flexibility required for complicated book layout, graphics, mathematics, or advanced hyphenation and justification rules (H and J).

By 2000, the specialist typesetting industry had shrunk because publishers were now capable of integrating typesetting and graphic design on their own in-house computers. Many found the cost of maintaining high standards of typographic design and technical skill made it more economical to outsource to freelancers and graphic design specialists.

The availability of cheap or free fonts made the conversion to do-it-yourself easier, but also opened up a gap between skilled designers and amateurs. The advent of PostScript, supplemented by the PDF file format, provided a universal method of proofing designs and layouts, readable on major computers and operating systems.

QuarkXPress had enjoyed a market share of 95% in the 1990s, but lost its dominance to Adobe InDesign from the mid-2000s onward.

SCRIPT variants

Mural mosaic "Typesetter" at John A. Prior Health Sciences Library in Ohio

IBM created and inspired a family of typesetting languages with names that were derivatives of the word "SCRIPT". Later versions of SCRIPT included advanced features, such as automatic generation of a table of contents and index, multicolumn page layout, footnotes, boxes, automatic hyphenation and spelling verification.
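Automatic index generation amounts to collecting, for each index term, the pages on which it occurs. A minimal Python sketch, assuming the terms to index are supplied as a list and matched by case-insensitive substring search (so "sort" would also match "sorted"); this is not how SCRIPT itself marked or gathered index entries, and build_index is a hypothetical name:

```python
from typing import Dict, Iterable, List


def build_index(pages: List[str], terms: Iterable[str]) -> Dict[str, List[int]]:
    """Map each term to the ascending 1-based page numbers where it appears."""
    index: Dict[str, List[int]] = {}
    # Sort terms alphabetically, ignoring case, as a printed index would be.
    for term in sorted(terms, key=str.lower):
        hits = [
            number
            for number, text in enumerate(pages, start=1)
            if term.lower() in text.lower()  # naive substring match
        ]
        if hits:  # omit terms that never occur
            index[term] = hits
    return index


pages = [
    "Hot metal typesetting and the Linotype machine.",
    "Phototypesetting displaced hot metal.",
    "The Linotype cast a line of type.",
]
print(build_index(pages, ["Linotype", "hot metal"]))
# {'hot metal': [1, 2], 'Linotype': [1, 3]}
```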

NSCRIPT was a port of SCRIPT to OS and TSO from CP-67/CMS SCRIPT.

Waterloo Script was created later at the University of Waterloo (UW). One version of SCRIPT was created at MIT, and the AA/CS at UW took over project development in 1974. The program was first used at UW in 1975. In the 1970s, SCRIPT was the only practical way to word process and format documents using a computer. By the late 1980s, the SCRIPT system had been extended to incorporate various upgrades.

The initial implementation of SCRIPT at UW was documented in the May 1975 issue of the Computing Centre Newsletter, which noted some of the advantages of using SCRIPT:

  1. It easily handles footnotes.
  2. Page numbers can be in Arabic or Roman numerals, and can appear at the top or bottom of the page, in the centre, on the left or on the right, or on the left for even-numbered pages and on the right for odd-numbered pages.
  3. Underscoring or overstriking can be made a function of SCRIPT, thus uncomplicating editor functions.
  4. SCRIPT files are regular OS datasets or CMS files.
  5. Output can be obtained on the printer, or at the terminal…
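The page-numbering options in item 2 can be illustrated with a short function. A minimal Python sketch, assuming a fixed-width text footer; page_label, to_roman and the position names are hypothetical and are not SCRIPT control words, and vertical placement (top or bottom of the page) is left out:

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to lowercase Roman numerals (e.g. 14 -> 'xiv')."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)  # how many of this symbol, and the remainder
        parts.append(symbol * count)
    return "".join(parts)


def page_label(page: int, width: int = 40, roman: bool = False,
               position: str = "outside") -> str:
    """Return a footer line of the given width with the page number placed in it.

    position is "left", "right", "centre", or "outside" (left on even-numbered
    pages, right on odd-numbered pages).
    """
    label = to_roman(page) if roman else str(page)
    if position == "outside":
        position = "left" if page % 2 == 0 else "right"
    if position == "left":
        return label.ljust(width)
    if position == "right":
        return label.rjust(width)
    if position == "centre":
        return label.center(width)
    raise ValueError(f"unknown position: {position!r}")


for page in (1, 2, 3):
    print(repr(page_label(page, width=10)))
print(repr(page_label(14, width=10, roman=True, position="centre")))
```

Resolving "outside" to left or right from the page's parity keeps the even/odd rule in one place, so every other position is a plain alignment.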

The article also pointed out SCRIPT had over 100 commands to assist in formatting documents, though 8 to 10 of these commands were sufficient to complete most formatting jobs. Thus, SCRIPT had many of the capabilities computer users generally associate with contemporary word processors.

SCRIPT/VS was a SCRIPT variant developed at IBM in the 1980s.

DWScript is a version of SCRIPT for MS-DOS, named after its author, D. D. Williams, but was never released to the public and only used internally by IBM.

Script is still available from IBM as part of the Document Composition Facility for the z/OS operating system.

SGML and XML systems

The standard generalized markup language (SGML) was based upon IBM Generalized Markup Language (GML). GML was a set of macros on top of IBM Script. DSSSL is an international standard developed to provide stylesheets for SGML documents.

XML is a successor of SGML. XSL-FO is most often used to generate PDF files from XML files.

The arrival of SGML/XML as the document model made other typesetting engines popular. Such engines include Datalogics Pager, Penta, Miles 33's OASYS, Xyvision's XML Professional Publisher, FrameMaker, and Arbortext. XSL-FO compatible engines include Apache FOP, Antenna House Formatter, and RenderX's XEP. These products allow users to program their SGML/XML typesetting process with the help of scripting languages.

YesLogic's Prince is another such engine, based on CSS Paged Media.

Troff and successors

During the mid-1970s, Joe Ossanna, working at Bell Laboratories, wrote the troff typesetting program to drive a Wang C/A/T phototypesetter owned by the Labs; it was later enhanced by Brian Kernighan to support output to different equipment, such as laser printers. While its use has fallen off, it is still included with a number of Unix and Unix-like systems, and has been used to typeset a number of high-profile technical and computer books. Some versions, as well as a GNU work-alike called groff, are now open source.

TeX and LaTeX

Mathematical text typeset using TeX and the AMS Euler font

The TeX system, developed by Donald E. Knuth at the end of the 1970s, is another widespread and powerful automated typesetting system that has set high standards, especially for typesetting mathematics. LuaTeX and LuaLaTeX are variants of TeX and of LaTeX scriptable in Lua. TeX is considered fairly difficult to learn on its own, and deals more with appearance than structure. The LaTeX macro package, written by Leslie Lamport at the beginning of the 1980s, offered a simpler interface and an easier way to systematically encode the structure of a document. LaTeX markup is widely used in academic circles for published papers and books. Although standard TeX does not provide a graphical interface, there are programs that do. These programs include Scientific WorkPlace and LyX, which are graphical/interactive editors; TeXmacs, while being an independent typesetting system, can also aid the preparation of TeX documents through its export capability.

Other text formatters

GNU TeXmacs (whose name is a combination of TeX and Emacs, although it is independent from both of these programs) is a typesetting system which is at the same time a WYSIWYG word processor.

SILE borrows some algorithms from TeX and relies on other libraries such as HarfBuzz and ICU, with an extensible core engine developed in Lua. By default, SILE's input documents can be composed in a custom LaTeX-inspired markup (SIL) or in XML. With third-party modules, composition in Markdown or Djot is also possible.

Typst, a newer typesetting system, tries to combine simple input markup and the ability to use common programming constructs with high typographic quality in the output. The system has been in beta testing since March 2023 and was presented in July 2023 at the TeX Users Group (TUG) 2023 conference.

Several other text-formatting software packages exist, notably Lout, Patoline, Pollen, and Ant, but they are not widely used.

Electronic publishing


Electronic publishing (also referred to as e-publishing, digital publishing, or online publishing) includes the digital publication of e-books and digital magazines, and the development of digital libraries and catalogues. It also includes the editing of books, journals, and magazines to be posted on a screen (computer, e-reader, tablet, or smartphone).

About

Electronic publishing has become common in scientific publishing, where it has been argued that peer-reviewed scientific journals are in the process of being replaced by electronic publishing. It is also becoming common to distribute books, magazines, and newspapers to consumers through tablet reading devices, a market that is growing by millions each year, driven by online vendors such as Apple's iTunes bookstore, Amazon's bookstore for Kindle, and books in the Google Play Bookstore. Market research suggested that half of all magazine and newspaper circulation would be via digital delivery by the end of 2015 and that half of all reading in the United States would be done without paper by 2015.

Although distribution via the Internet (also known as online publishing or web publishing when in the form of a website) is nowadays strongly associated with electronic publishing, there are many non-network electronic publications, such as encyclopedias on CD and DVD, as well as technical and reference publications relied on by mobile users and others without reliable, high-speed access to a network. Electronic publishing is also used for test preparation in student education in both developed and developing economies (thus partly replacing conventional books), because it combines content with analytics for the benefit of students. The use of electronic publishing for textbooks may become more prevalent with Apple Books from Apple Inc. and Apple's negotiation with the three largest textbook suppliers in the U.S.

Electronic publishing is increasingly popular in works of fiction. Electronic publishers are able to respond quickly to changing market demand, because the companies do not have to order printed books and have them delivered. E-publishing is also making a wider range of books available, including books that customers would not find in standard book retailers, due to insufficient demand for a traditional "print run". E-publication is enabling new authors to release books that would be unlikely to be profitable for traditional publishers. While the term "electronic publishing" is primarily used in the 2010s to refer to online and web-based publishers, the term has a history of being used to describe the development of new forms of production, distribution, and user interaction in regard to computer-based production of text and other interactive media.

History

Digitization

The first digitization initiative was launched in 1971 by Michael S. Hart, a student at the University of Illinois at Urbana-Champaign, who started Project Gutenberg to make literature more accessible to everyone through the Internet. The project grew slowly: in 1989 there were only 10 texts, manually recopied onto a computer by Hart himself and some volunteers. But with the appearance of Web 1.0 in 1991 and its ability to connect documents together through static pages, the project moved quickly forward. Many more volunteers helped develop the project by giving access to public domain classics.

In the 1970s, the French National Centre for Scientific Research digitized a thousand books on diverse subjects, mostly literature but also philosophy and science, dating from the 12th century to the present. These texts laid the foundations of a large dictionary, the Trésor de la langue française au Québec. This corpus of e-texts, named Frantext, was published on a compact disc under the brand name Discotext, and then on the World Wide Web in 1998.

Mass-scale digitization

In 1974, American inventor and futurist Raymond Kurzweil developed a scanner equipped with "omni-font" software that enabled optical character recognition of text printed in virtually any font. Digitization projects could then become more ambitious, since the time needed for digitization decreased considerably, and digital libraries were on the rise. All over the world, e-libraries started to emerge.

The ABU (Association des Bibliophiles Universels) was a public digital library project created by the Cnam in 1993. It was the first French digital library on the network; although the project has been suspended since 2002, the more than one hundred texts it reproduced are still available.

In 1992, the Bibliothèque nationale de France launched a vast digitization program. President François Mitterrand had wanted since 1988 to create a new and innovative digital library, which was launched in 1997 under the name Gallica. In 2014, the digital library was offering 80,255 online books and over a million documents, including prints and manuscripts.

In 2003, Wikisource was launched, aiming to build a digital, multilingual library that would complement the Wikipedia project. It was originally named "Project Sourceberg", a play on the name Project Gutenberg. Supported by the Wikimedia Foundation, Wikisource offers digitized texts that have been verified by volunteers.

In December 2004, Google created Google Books, a project to digitize all the books available in the world (over 130 million books) and make them accessible online. Ten years later, 25,000,000 books from a hundred countries and in 400 languages were on the platform. This was possible because by that time, robotic scanners could digitize around 6,000 books per hour.

In 2008, the prototype of Europeana was launched, and by 2010 the project was giving access to over 10 million digital objects. Europeana is a European catalog that offers records for millions of digital objects and links to the digital libraries that hold them. Also in 2008, HathiTrust was created to bring together the contents of many university e-libraries in the United States and Europe, as well as Google Books and the Internet Archive. By 2016, over six million users had used HathiTrust.

Electronic publishing

The first digitization projects transferred physical content into digital form. Electronic publishing aims to integrate the whole process of editing and publishing (production, layout, publication) in the digital world.

Alain Mille, in the book Pratiques de l'édition numérique (edited by Michael E. Sinatra and Marcello Vitali-Rosati), argues that the beginnings of the Internet and the Web are the very core of electronic publishing, since they largely determined the biggest changes in production and distribution patterns. The Internet has a direct effect on publishing, letting creators and users move beyond the traditional chain (writer, editor, publishing house).

Traditional publishing, and especially its creative stage, was first revolutionized by the new desktop publishing software that appeared in the 1980s, and by the text databases created for encyclopedias and directories. At the same time, multimedia was developing quickly, combining characteristics of books, audiovisual media and computer science. CDs and DVDs appeared, allowing these dictionaries and encyclopedias to be viewed on computers.

The arrival and democratization of the Internet is slowly giving small publishing houses the opportunity to publish their books directly online. Some websites, like Amazon, let their users buy e-books; Internet users can also find many educational platforms (free or not), encyclopedic websites like Wikipedia, and even digital magazine platforms. E-books thus become more and more accessible on many different devices, such as e-readers and even smartphones. The digital book had, and still has, an important impact on publishing houses and their economic models; it is still a moving domain, and publishers have yet to master the new ways of publishing in a digital era.

Online edition

Based on the new communication practices of Web 2.0 and its architecture of participation, online edition opens the door to community collaboration in developing and improving content on the Internet, while also enriching reading through collective reading practices. Web 2.0 not only links documents together, as Web 1.0 did; it also links people together through social media, which is why it is called the Participative (or participatory) Web.

Many tools have been put in place to foster sharing and the collective creation of content. One of them is the Wikipedia encyclopedia, which is edited, corrected and enhanced by millions of contributors. OpenStreetMap is based on the same principle. Blogs and comment systems are now also recognized as forms of online edition and publishing, since they enable new interactions between authors and their readers, which can be an important source of inspiration as well as visibility.

Process

The electronic publishing process follows some aspects of the traditional paper-based publishing process but differs from traditional publishing in two ways: 1) it does not include using an offset printing press to print the final product and 2) it avoids the distribution of a physical product (e.g., paper books, paper magazines, or paper newspapers). Because the content is electronic, it may be distributed over the Internet and through electronic bookstores, and users can read the material on a range of electronic and digital devices, including desktop computers, laptops, tablet computers, smartphones or e-reader tablets. The consumer may read the published content online on a website, in an application on a tablet device, or in a PDF document on a computer. In some cases, the reader may print the content onto paper using a consumer-grade ink-jet or laser printer or via a print-on-demand system. Some users download digital content to their devices, enabling them to read the content even when their device is not connected to the Internet (e.g., on an airplane flight).

Distributing content electronically as software applications ("apps") became popular in the 2010s, due to the rapid consumer adoption of smartphones and tablets. At first, native apps for each mobile platform were required to reach all audiences, but in an effort toward universal device compatibility, attention has turned to using HTML5 to create web apps that can run in any browser and function on many devices. The benefit of electronic publishing comes from using three attributes of digital technology: XML tags to define content, style sheets to define the look of content, and metadata (data about data) to describe the content for search engines, thus helping users to find and locate the content. (A common example of metadata is the information about a song's songwriter, composer and genre that is electronically encoded along with most CDs and digital audio files; this metadata makes it easier for music lovers to find the songs they are looking for.) Together, tags, style sheets, and metadata enable "reflowable" content that adapts to various reading devices (tablet, smartphone, e-reader, etc.) or electronic delivery methods.
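The division of labour between tags, style sheets and metadata can be sketched in a few lines. A minimal Python example, assuming a hypothetical XML document and using small Python dictionaries as stand-ins for real style sheets (actual e-publishing uses CSS or similar); the same tagged content is reflowed for two hypothetical devices, while the metadata is read separately. The names DOCUMENT, STYLES, metadata and render are all illustrative:

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical sample: tags say what each piece of text is, <meta> elements
# describe the document for search, and nothing here says how it should look.
DOCUMENT = """<article>
  <meta name="title">Typesetting</meta>
  <meta name="subject">printing history</meta>
  <heading>Manual typesetting</heading>
  <para>During much of the letterpress era, movable type was composed by hand
  for each page by workers called compositors.</para>
</article>"""

# Hypothetical stand-ins for per-device style sheets.
STYLES = {
    "phone": {"line_width": 30, "heading_case": str.upper},
    "tablet": {"line_width": 60, "heading_case": str.title},
}


def metadata(root: ET.Element) -> dict:
    """Collect <meta name="...">value</meta> pairs into a dict."""
    return {meta.get("name"): meta.text for meta in root.iter("meta")}


def render(root: ET.Element, device: str) -> str:
    """Lay out the same tagged content according to one device's style."""
    style = STYLES[device]
    lines = []
    for element in root:  # direct children only; <meta> is never displayed
        text = " ".join(element.text.split())  # normalize source whitespace
        if element.tag == "heading":
            lines.append(style["heading_case"](text))
        elif element.tag == "para":
            lines.extend(textwrap.wrap(text, width=style["line_width"]))
    return "\n".join(lines)


root = ET.fromstring(DOCUMENT)
print(metadata(root))
print(render(root, "phone"))
print()
print(render(root, "tablet"))
```

Because the layout decisions live only in the style entries, adding another device means adding another entry rather than editing the content.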

Because electronic publishing often requires text markup (e.g., HyperText Markup Language or some other markup language) to develop online delivery methods, the traditional roles of typesetters and book designers, who created the printing set-ups for paper books, have changed. Designers of digitally published content must have a strong knowledge of markup languages, the variety of reading devices and computers available, and the ways in which consumers read, view or access the content. However, in the 2010s, new user-friendly design software, such as Adobe Systems' Digital Publishing Suite and Apple's iBooks Author, became available for designers to publish content in this standard without needing to know detailed programming techniques. The most common file format is .epub, which is supported by many e-book readers. .epub is a free and open standard available in many publishing programs. Another common format is .folio, which is used by the Adobe Digital Publishing Suite to create content for Apple's iPad tablets and apps.

Academic publishing

After an article is submitted to an academic journal for consideration, there can be a delay ranging from several months to more than two years before it is published in a journal, rendering journals a less than ideal format for disseminating current research. In some fields, such as astronomy and some areas of physics, the role of the journal in disseminating the latest research has largely been replaced by preprint repositories such as arXiv.org. However, scholarly journals still play an important role in quality control and establishing scientific credit. In many instances, the electronic materials uploaded to preprint repositories are still intended for eventual publication in a peer-reviewed journal. There is statistical evidence that electronic publishing provides wider dissemination, because when a journal is available online, a larger number of researchers can access the journal. Even if a professor is working in a university that does not have a certain journal in its library, she may still be able to access the journal online. A number of journals have, while retaining their longstanding peer review process to ensure that the research is done properly, established electronic versions or even moved entirely to electronic publication.

Copyright

In the early 2000s, many of the existing copyright laws were designed around printed books, magazines and newspapers. For example, copyright laws often set limits on how much of a book can be mechanically reproduced or copied. Electronic publishing raises new questions in relation to copyright, because if an e-book or e-journal is available online, millions of Internet users may be able to view a single electronic copy of the document, without any "copies" being made.

Emerging evidence suggests that e-publishing may be more collaborative than traditional paper-based publishing; e-publishing often involves more than one author, and the resulting works are more accessible, since they are published online. At the same time, the availability of published material online opens more doors for plagiarism, unauthorized use, or re-use of the material. Some publishers are trying to address these concerns. For example, in 2011, HarperCollins limited the number of times that one of its e-books could be lent in a public library. Other publishers, such as Penguin, are attempting to incorporate e-book elements into their regular paper publications.

Reference work

The Brockhaus Enzyklopädie, the best-known traditional reference book in German-speaking countries
The Lexikon des Mittelalters, a specialised German encyclopedia
Encyclopædia Britannica, 15th edition: volumes of the Propedia (green), Micropedia (red), Macropedia (black), and 2-volume Index (blue)

A reference work is a non-fiction work, such as a paper, book or periodical (or their electronic equivalents), to which one can refer for information. The information is intended to be found quickly when needed. Such works are usually referred to for particular pieces of information, rather than read beginning to end. The writing style used in these works is informative; the authors avoid use of the first person, and emphasize facts.

Indices are a common navigation feature in many types of reference works. Many reference works are put together by a team of contributors whose work is coordinated by one or more editors, rather than by an individual author. Updated editions are usually published as needed, in some cases annually (Whitaker's Almanack, Who's Who).

Reference works include textbooks, almanacs, atlases, bibliographies, biographical sources, catalogs such as library catalogs and art catalogs, concordances, dictionaries, directories such as business directories and telephone directories, discographies, encyclopedias, filmographies, gazetteers, glossaries, handbooks, indices such as bibliographic indices and citation indices, manuals, research guides, thesauruses, and yearbooks. Many reference works are available in electronic form and can be obtained as reference software, CD-ROMs, DVDs, or online through the Internet. Wikipedia, an online encyclopedia, is both the largest and the most-read reference work in history.

Reference book

In contrast to books that are loaned, a reference book or reference-only book in a library is one that may only be used in the library and may not be borrowed from it. Many such books are reference works (in the first sense) that are usually consulted briefly or photocopied from and therefore do not need to be borrowed. Keeping reference books in the library ensures that they will always be available for use on demand. Some reference-only books are too valuable to permit borrowers to take them out. Reference-only items may be shelved in a reference collection located separately from circulating items. Some libraries consist entirely, or to a large extent, of books which may not be borrowed.

Types of reference work

These are the main types and categories of reference work:

  • Abstracting journal – a published summary of articles, theses, reviews, conference proceedings etc. arranged systematically
  • Almanac – an annual publication, listing a set of current, general or specific information about one or multiple subjects
  • Annals – concise historical record in which events are arranged chronologically
  • Atlas – a collection of maps, traditionally bound into book form
  • Bibliography – a systematic list of books and other works such as journal articles on a given subject or which satisfy particular criteria
  • Biographical dictionary – an encyclopedic dictionary limited to biographical information
  • Book of quotations – a collection of quotations satisfying particular criteria, arranged systematically
  • Chronicle/Chronology – a historical account of events arranged in chronological order
  • Compendium – a concise collection of information pertaining to a body of knowledge
  • Concordance – an alphabetical list of the principal words used in a book or body of work
  • Dictionary – a list of words from one or more languages, systematically arranged and giving meanings, etymologies etc.
  • Digest – a summary of information on a particular subject
  • Directory – a systematically arranged list of names, addresses, products, etc.
  • Encyclopaedia – a compendium providing summaries of knowledge either from all branches or from a particular field or discipline
  • Gazetteer – a geographical dictionary or directory used to provide systematic access to a map or atlas
  • Glossary – an alphabetical list of terms in a particular domain of knowledge with the definitions for those terms
  • Handbook – a small or portable book intended to provide ready reference
  • Index – a publication giving systematic access to a body of knowledge
  • Lexicon – a synonym for a dictionary or encyclopaedic dictionary
  • List – a published enumeration of a set of items
  • Manual – a handbook providing instructions in the use of a particular product
  • Phrase book – a collection of ready-made phrases, arranged systematically, usually for a foreign language together with a translation
  • Ready reckoner – a printed book or table containing pre-calculated values
  • Thematic catalogue – an index used to identify musical compositions through the citation of the opening notes
  • Textbook – a reference work containing information about a subject
  • Thesaurus – a reference work for finding synonyms and sometimes antonyms of words
  • Timetable – a published list of schedules giving times for transportation or other events
  • Yearbook – a compendium containing events relating to a specific year

Electronic resources

An electronic resource is a computer program or data stored electronically, usually on a computer, including information available on the Internet. Libraries offer numerous types of electronic resources, including electronic texts such as electronic books and electronic journals, bibliographic databases, institutional repositories, websites, and software applications.

Runtime system

From Wikipedia, the free encyclopedia

Most programming languages have some form of runtime system that provides an environment in which programs run. This environment may address a number of issues, including the management of application memory, how the program accesses variables, mechanisms for passing parameters between procedures, and interfacing with the operating system. The compiler makes assumptions that depend on the specific runtime system in order to generate correct code. Typically the runtime system will have some responsibility for setting up and managing the stack and heap, and may include features such as garbage collection, threads or other dynamic features built into the language.

Overview

Every programming language specifies an execution model, and many implement at least part of that model in a runtime system. One possible definition of runtime system behavior, among others, is "any behavior not directly attributable to the program itself". This definition includes putting parameters onto the stack before function calls, parallel execution of related behaviors, and disk I/O.

By this definition, essentially every language has a runtime system, including compiled languages, interpreted languages, and embedded domain-specific languages. Even API-invoked standalone execution models, such as Pthreads (POSIX threads), have a runtime system that implements the execution model's behavior.

Most scholarly papers on runtime systems focus on the implementation details of parallel runtime systems. A notable example of a parallel runtime system is Cilk, a popular parallel programming model. The proto-runtime toolkit was created to simplify the creation of parallel runtime systems.

In addition to execution model behavior, a runtime system may also perform support services such as type checking, debugging, or code generation and optimization.

Comparison between concepts similar to runtime system:

  • Runtime environment – software platform that provides an environment for executing code. Examples: Node.js, .NET Framework
  • Engine – component of a runtime environment that executes code by compiling or interpreting it. Examples: JavaScript engine in web browsers, Java Virtual Machine
  • Interpreter – type of engine that reads and executes code line by line, without compiling the entire program beforehand. Examples: CPython interpreter, Ruby MRI, JavaScript (in some cases)
  • JIT interpreter – type of interpreter that dynamically compiles code into machine instructions at runtime, optimizing the code for faster execution. Examples: V8, PyPy interpreter

Relation to runtime environments

The runtime system is also the gateway through which a running program interacts with the runtime environment. The runtime environment includes not only accessible state values, but also active entities with which the program can interact during execution. For example, environment variables are features of many operating systems, and are part of the runtime environment; a running program can access them via the runtime system. Likewise, hardware devices such as disks or DVD drives are active entities that a program can interact with via a runtime system.

One distinctive application of a runtime environment is its use within an operating system that allows only that runtime environment to run. In other words, from boot until power-down, the entire OS is dedicated to only the application(s) running within that runtime environment. Any other code that tries to run, or any failure in the application(s), will break the runtime environment. Breaking the runtime environment in turn breaks the OS, stopping all processing and requiring a reboot. If the boot is from read-only memory, an extremely secure, simple, single-mission system is created.

Examples of such directly bundled runtime systems include:

  • Between 1983 and 1984, Digital Research offered several of their business and educational applications for the IBM PC on bootable floppy diskettes bundled with SpeedStart CP/M-86, a reduced version of CP/M-86, as runtime environment.
  • Some stand-alone versions of Ventura Publisher (1986–1993), Artline (1988–1991), Timeworks Publisher (1988–1991) and ViewMAX (1990–1992) contained special runtime versions of Digital Research's GEM as their runtime environment.
  • In the late 1990s, JP Software's command line processor 4DOS was optionally available in a special runtime version to be linked with BATCOMP pre-compiled and encrypted batch jobs in order to create unmodifiable executables from batch scripts and run them on systems without 4DOS installed.

Examples

The runtime system of the C language is a particular set of instructions inserted by the compiler into the executable image. Among other things, these instructions manage the process stack, create space for local variables, and copy function call parameters onto the top of the stack.
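The stack discipline described above can be modeled in a few lines. This is a minimal Python sketch, not real C runtime code: the `Frame` and `CallStack` classes are hypothetical, and real C implementations differ by platform and calling convention (many pass some arguments in registers rather than on the stack). It only shows the bookkeeping: a call pushes a frame holding the parameters and local variables, and a return pops it.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One activation record: the called function, its arguments and its locals."""
    function: str
    args: tuple
    local_vars: dict = field(default_factory=dict)


class CallStack:
    """Toy model of the frame bookkeeping a C runtime performs around calls."""

    def __init__(self) -> None:
        self.frames: list[Frame] = []

    def call(self, function: str, *args) -> Frame:
        # Copy the call's parameters into a new frame pushed on top of the stack.
        frame = Frame(function, args)
        self.frames.append(frame)
        return frame

    def ret(self) -> Frame:
        # Returning pops the callee's frame, releasing its local variables.
        return self.frames.pop()

    def depth(self) -> int:
        return len(self.frames)


stack = CallStack()
main_frame = stack.call("main")
main_frame.local_vars["x"] = 2                       # space for a local variable
callee = stack.call("square", main_frame.local_vars["x"])
callee.local_vars["result"] = callee.args[0] ** 2
print(stack.depth())                                 # 2
finished = stack.ret()
print(finished.local_vars["result"], stack.depth())  # 4 1
```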

There are often no clear criteria for determining which language behaviors are part of the runtime system itself and which can be determined by any particular source program. For example, in C, the setup of the stack is part of the runtime system. It is not determined by the semantics of an individual program because the behavior is globally invariant: it holds over all executions. This systematic behavior implements the execution model of the language, as opposed to implementing semantics of the particular program (in which text is directly translated into code that computes results).

This separation between the semantics of a particular program and the runtime environment is reflected in the different ways of compiling a program: compiling source code to an object file that contains all the functions versus compiling an entire program to an executable binary. The object file contains only the machine code relevant to the included functions, while the executable binary also contains additional code that implements the runtime environment. On one hand, the object file may be missing information from the runtime environment that will be resolved by linking. On the other hand, the code in the object file still depends on assumptions in the runtime system; for example, a function may read parameters from a particular register or stack location, depending on the calling convention used by the runtime environment.

Another example is the case of using an application programming interface (API) to interact with a runtime system. Calls to that API look the same as calls to a regular software library; however, at some point during the call the execution model changes. The runtime system implements an execution model different from that of the language in which the library is written. A person reading the code of a normal library would be able to understand the library's behavior just by knowing the language the library was written in. However, a person reading the code of the API that invokes a runtime system would not be able to understand the behavior of the API call just by knowing the language the call was written in. At some point, via some mechanism, the execution model stops being that of the language the call is written in and switches over to the execution model implemented by the runtime system. For example, the trap instruction is one method of switching execution models. This difference is what distinguishes an API-invoked execution model, such as Pthreads, from a usual software library. Both Pthreads calls and software library calls are invoked via an API, but Pthreads behavior cannot be understood in terms of the language of the call. Rather, Pthreads calls bring into play an outside execution model, which is implemented by the Pthreads runtime system (this runtime system is often the OS kernel).
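As a Python analogy (this is not Pthreads itself), CPython's `threading` module is built on the platform's native threads, which are POSIX threads on Unix-like systems. In the sketch below, calling `start()` reads like an ordinary method call, yet what happens next is decided by the thread scheduler rather than by Python's sequential semantics. The `worker` function and `results` dictionary are illustrative only.

```python
import threading

results: dict[int, int] = {}


def worker(n: int) -> None:
    # An ordinary-looking function; once started in a thread, when it runs
    # relative to the caller is decided by the scheduler, not by program order.
    results[n] = n * n


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()  # looks like a library call, but a new OS thread now runs concurrently
for t in threads:
    t.join()   # block until that thread has finished

print(sorted(results.items()))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

The order in which the four workers actually execute may differ from run to run; only the final contents of `results` are guaranteed once every `join()` has returned.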

As an extreme example, the physical CPU itself can be viewed as an implementation of the runtime system of a specific assembly language. In this view, the execution model is implemented by the physical CPU and memory systems. As an analogy, runtime systems for higher-level languages are themselves implemented using some other languages. This creates a hierarchy of runtime systems, with the CPU itself—or actually its logic at the microcode layer or below—acting as the lowest-level runtime system.

Advanced features

Some compiled or interpreted languages provide an interface that allows application code to interact directly with the runtime system. An example is the Thread class in the Java language. The class allows code (running in one thread) to do things such as start and stop other threads. Normally, core aspects of a language's behavior such as task scheduling and resource management are not accessible in this fashion.
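Python offers a comparable interface to resource management through its standard `gc` module, which lets application code control CPython's cycle-detecting garbage collector. The sketch below, which assumes CPython (other implementations manage memory differently), builds a reference cycle that reference counting alone cannot free and then asks the runtime to collect it. The `Node` class is hypothetical.

```python
import gc
import weakref


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.peer = None


gc.disable()  # keep automatic collection from running at an unpredictable moment

# Build a reference cycle: each node keeps the other alive.
a, b = Node("a"), Node("b")
a.peer, b.peer = b, a
probe = weakref.ref(a)  # observes `a` without keeping it alive

del a, b
alive_before = probe() is not None  # the cycle is still in memory

gc.collect()  # explicitly ask the runtime's cycle collector to run now
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)  # True False
```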

Higher-level behaviors implemented by a runtime system may include tasks such as drawing text on the screen or making an Internet connection. It is often the case that operating systems provide these kinds of behaviors as well, and when available, the runtime system is implemented as an abstraction layer that translates the invocation of the runtime system into an invocation of the operating system. This hides the complexity or variations in the services offered by different operating systems. This also implies that the OS kernel can itself be viewed as a runtime system, and that the set of OS calls that invoke OS behaviors may be viewed as interactions with a runtime system.

In the limit, the runtime system may provide services such as a P-code machine or virtual machine, that hide even the processor's instruction set. This is the approach followed by many interpreted languages such as AWK, and some languages like Java, which are meant to be compiled into some machine-independent intermediate representation code (such as bytecode). This arrangement simplifies the task of language implementation and its adaptation to different machines, and improves efficiency of sophisticated language features such as reflection. It also allows the same program to be executed on any machine without an explicit recompiling step, a feature that has become very important since the proliferation of the World Wide Web. To speed up execution, some runtime systems feature just-in-time compilation to machine code.
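The idea of a machine-independent intermediate code executed by a virtual machine can be sketched with a tiny stack-based interpreter. The opcode names and encoding below are invented for illustration; this is not P-code, JVM bytecode, or any real instruction set. The point is that the same `program` list runs unchanged wherever the interpreter itself runs, which is what hides the host processor's instruction set.

```python
from typing import Optional

# A program is a list of (opcode, operand) pairs; the operand is None when unused.
Instruction = tuple[str, Optional[int]]


def run(program: list[Instruction]) -> int:
    """Interpret a tiny stack-based bytecode and return the value left on top."""
    stack: list[int] = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "SUB":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()


# (2 + 3) * 4 - 1, already "compiled" to the intermediate representation.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4),
           ("MUL", None), ("PUSH", 1), ("SUB", None)]
print(run(program))  # 19
```

A just-in-time compiler would go one step further and translate frequently executed sequences of such instructions into native machine code instead of dispatching on each opcode.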

A modern aspect of runtime systems is parallel execution behaviors, such as the behaviors exhibited by mutex constructs in Pthreads and parallel section constructs in OpenMP. A runtime system with such parallel execution behaviors may be modularized according to the proto-runtime approach.
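As an analogy to a Pthreads mutex, the sketch below uses Python's `threading.Lock` to serialize a read-modify-write on a shared counter. The counter and worker function are illustrative; the lock is what guarantees that every increment is applied exactly once regardless of how the scheduler interleaves the threads.

```python
import threading

counter = 0
lock = threading.Lock()  # mutual-exclusion lock provided by the runtime


def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # Only one thread at a time may execute the read-modify-write below.
        with lock:
            counter += 1


workers = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter)  # 40000
```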

History

Notable early examples of runtime systems are the interpreters for BASIC and Lisp. These environments also included a garbage collector. Forth is an early example of a language designed to be compiled into intermediate representation code; its runtime system was a virtual machine that interpreted that code. Another popular, if theoretical, example is Donald Knuth's MIX computer.

In C and later languages that supported dynamic memory allocation, the runtime system also included a library that managed the program's memory pool.

In the object-oriented programming languages, the runtime system was often also responsible for dynamic type checking and resolving method references.

Library (computing)

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Library_(computing)
Illustration of an application which uses libvorbisfile to play an Ogg Vorbis file

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.

A library is also a collection of implementations of behavior, written in terms of a language, that has a well-defined interface by which the behavior is invoked. For instance, people who want to write a higher-level program can use a library to make system calls instead of implementing those system calls over and over again. In addition, the behavior is provided for reuse by multiple independent programs. A program invokes the library-provided behavior via a mechanism of the language. For example, in a simple imperative language such as C, the behavior in a library is invoked by using C's normal function call. What distinguishes the call as being to a library function, versus being to another function in the same program, is the way that the code is organized in the system.

Library code is organized in such a way that it can be used by multiple programs that have no connection to each other, while code that is part of a program is organized to be used only within that one program. This distinction can gain a hierarchical notion when a program grows large, such as a multi-million-line program. In that case, there may be internal libraries that are reused by independent sub-portions of the large program. The distinguishing feature is that a library is organized for the purposes of being reused by independent programs or sub-programs, and the user only needs to know the interface and not the internal details of the library.

The value of a library lies in the reuse of standardized program elements. When a program invokes a library, it gains the behavior implemented inside that library without having to implement that behavior itself. Libraries encourage the sharing of code in a modular fashion and ease the distribution of the code.

The behavior implemented by a library can be connected to the invoking program at different program lifecycle phases. If the code of the library is accessed during the build of the invoking program, then the library is called a static library. An alternative is to build the executable of the invoking program and distribute that independently of the library implementation. The library behavior is then connected after the executable has been launched, either as part of the process of starting the execution or in the middle of execution. In this case the library is called a dynamic library (loaded at runtime). A dynamic library can be loaded and linked by the linker when preparing a program for execution. Alternatively, in the middle of execution, an application may explicitly request that a module be loaded.
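Python modules are not native shared libraries, but `importlib.import_module` illustrates the same idea of an application explicitly requesting a module in the middle of execution, with the module chosen by a name that need not be known when the program starts. (For native shared libraries, POSIX systems provide `dlopen` for this purpose; the sketch does not use it.) The `load_plugin` helper is hypothetical.

```python
import importlib


def load_plugin(module_name: str):
    # The module is located and loaded when this line runs (unless it was
    # already loaded); a repeated request returns the same module object.
    return importlib.import_module(module_name)


# The name could come from a config file or user input read after startup.
codec = load_plugin("json")
print(codec.dumps({"loaded": True}))  # {"loaded": true}
print(load_plugin("json") is codec)   # True
```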

Most compiled languages have a standard library, although programmers can also create their own custom libraries. Most modern software systems provide libraries that implement the majority of the system services. Such libraries have organized the services which a modern application requires. As such, most code used by modern applications is provided in these system libraries.

History

The idea of a computer library dates back to the first computers created by Charles Babbage. An 1888 paper on his Analytical Engine suggested that computer operations could be punched on separate cards from numerical input. If these operation punch cards were saved for reuse then "by degrees the engine would have a library of its own."

A woman working next to a filing cabinet containing the subroutine library on reels of punched tape for the EDSAC computer.

In 1947 Goldstine and von Neumann speculated that it would be useful to create a "library" of subroutines for their work on the IAS machine, an early computer that was not yet operational at that time. They envisioned a physical library of magnetic wire recordings, with each wire storing reusable computer code.

Inspired by von Neumann, Wilkes and his team constructed EDSAC. A filing cabinet of punched tape held the subroutine library for this computer. Programs for EDSAC consisted of a main program and a sequence of subroutines copied from the subroutine library. In 1951 the team published the first textbook on programming, The Preparation of Programs for an Electronic Digital Computer, which detailed the creation and the purpose of the library.

COBOL included "primitive capabilities for a library system" in 1959, but Jean Sammet described them as "inadequate library facilities" in retrospect.

JOVIAL had a Communication Pool (COMPOOL), roughly a library of header files.

Another major contributor to the modern library concept came in the form of the subprogram innovation of FORTRAN. FORTRAN subprograms can be compiled independently of each other, but the compiler lacked a linker. So prior to the introduction of modules in Fortran-90, type checking between FORTRAN subprograms was impossible.

By the mid 1960s, copy and macro libraries for assemblers were common. Starting with the popularity of the IBM System/360, libraries containing other types of text elements, e.g., system parameters, also became common.

Simula was the first object-oriented programming language, and its classes were nearly identical to the modern concept as used in Java, C++, and C#. The class concept of Simula was also a progenitor of the package in Ada and the module of Modula-2. Even when developed originally in 1965, Simula classes could be included in library files and added at compile time.

Linking

Libraries are important in the program linking or binding process, which resolves references known as links or symbols to library modules. The linking process is usually automatically done by a linker or binder program that searches a set of libraries and other modules in a given order. Usually it is not considered an error if a link target can be found multiple times in a given set of libraries. Linking may be done when an executable file is created (static linking), or whenever the program is used at runtime (dynamic linking).
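The search-in-order behavior can be sketched as a toy symbol resolver. This is deliberately simplified and assumes hypothetical library names: it ignores that pulling in a library module can introduce new undefined symbols, and real linkers differ in how they treat duplicates (duplicate strong definitions among object files are usually an error). It shows only that each symbol is bound to the first library in the search order that defines it, and that a symbol defined nowhere is a link error.

```python
def resolve(undefined: list[str], search_path: list[tuple[str, set[str]]]) -> dict[str, str]:
    """Map each undefined symbol to the first library in search order that defines it."""
    resolved: dict[str, str] = {}
    for symbol in undefined:
        for library, defined in search_path:
            if symbol in defined:
                resolved[symbol] = library  # first match wins; later duplicates are ignored
                break
        else:
            # No library defines the symbol: the classic "undefined reference" error.
            raise LookupError(f"undefined reference to '{symbol}'")
    return resolved


# Hypothetical libraries, searched in this order; both define log_message.
search_path = [
    ("libapp_util.a", {"log_message", "parse_args"}),
    ("libfallback.a", {"log_message", "format_time"}),
]
print(resolve(["parse_args", "log_message", "format_time"], search_path))
# {'parse_args': 'libapp_util.a', 'log_message': 'libapp_util.a', 'format_time': 'libfallback.a'}
```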

The references being resolved may be addresses for jumps and other routine calls. They may be in the main program, or in one module depending upon another. They are resolved into fixed or relocatable addresses (from a common base) by allocating runtime memory for the memory segments of each module referenced.

Some programming languages use a feature called smart linking whereby the linker is aware of or integrated with the compiler, such that the linker knows how external references are used, and code in a library that is never actually used, even though internally referenced, can be discarded from the compiled application. For example, a program that only uses integers for arithmetic, or does no arithmetic operations at all, can exclude floating-point library routines. This smart-linking feature can lead to smaller application file sizes and reduced memory usage.
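One way to picture smart linking is as a reachability search over a call graph starting at the program's entry point; anything not reached can be discarded. The routine names below are hypothetical, and real smart linkers work on their own internal representations, but the sketch shows why library code that is referenced only internally (here, two floating-point routines that call each other) can still be dropped.

```python
from collections import deque


def reachable(call_graph: dict[str, set[str]], entry: str) -> set[str]:
    """Return every routine reachable from `entry` by following call edges."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        routine = queue.popleft()
        for callee in call_graph.get(routine, set()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical program: the float routines reference each other internally,
# but nothing reachable from main calls into them.
call_graph = {
    "main": {"int_add", "print_int"},
    "print_int": {"write"},
    "float_add": {"float_normalize"},
    "float_normalize": {"float_add"},
}
kept = reachable(call_graph, "main")
discarded = set(call_graph) - kept
print(sorted(kept))       # ['int_add', 'main', 'print_int', 'write']
print(sorted(discarded))  # ['float_add', 'float_normalize']
```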

Relocation

Some references in a program or library module are stored in a relative or symbolic form which cannot be resolved until all code and libraries are assigned final static addresses. Relocation is the process of adjusting these references, and is done either by the linker or the loader. In general, relocation cannot be done to individual libraries themselves because the addresses in memory may vary depending on the program using them and other libraries they are combined with. Position-independent code avoids references to absolute addresses and therefore does not require relocation.
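A simplified sketch of relocation, assuming each module records the locations of its internal references as offsets relative to its own start (the `Module` layout here is hypothetical, not a real object-file format such as ELF). Once the modules are placed one after another from a load address, each relative reference is rewritten as an absolute one. Running it with two different load addresses shows why the patched values depend on where the code ends up, which is why relocation cannot be done once on a library by itself.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    size: int
    # {offset of the reference within the module: target offset within the module}
    relative_refs: dict[int, int]


def relocate(modules: list[Module], load_base: int) -> dict[str, dict[int, int]]:
    """Place modules consecutively from load_base and make their references absolute."""
    patched: dict[str, dict[int, int]] = {}
    base = load_base
    for module in modules:
        patched[module.name] = {
            base + offset: base + target
            for offset, target in module.relative_refs.items()
        }
        base += module.size  # the next module starts right after this one
    return patched


modules = [Module("main.o", 0x100, {0x10: 0x40}), Module("util.o", 0x80, {0x08: 0x20})]
for name, refs in relocate(modules, 0x4000).items():
    print(name, {hex(k): hex(v) for k, v in refs.items()})
# main.o {'0x4010': '0x4040'}
# util.o {'0x4108': '0x4120'}
```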

Static libraries

When linking is performed during the creation of an executable or another object file, it is known as static linking or early binding. In this case, the linking is usually done by a linker, but may also be done by the compiler. A static library, also known as an archive, is one intended to be statically linked. Originally, only static libraries existed. Static linking must be performed when any modules are recompiled.

All of the modules required by a program are sometimes statically linked and copied into the executable file. This process, and the resulting stand-alone file, is known as a static build of the program. A static build may not need any further relocation if virtual memory is used and no address space layout randomization is desired.

Shared libraries

A shared library or shared object is a file that is intended to be shared by executable files and further shared object files. Modules used by a program are loaded from individual shared objects into memory at load time or runtime, rather than being copied by a linker when it creates a single monolithic executable file for the program.

Shared libraries can be statically linked during compile-time, meaning that references to the library modules are resolved and the modules are allocated memory when the executable file is created. But often linking of shared libraries is postponed until they are loaded.

Object libraries

Although originally pioneered in the 1960s, dynamic linking did not reach operating systems used by consumers until the late 1980s. It was generally available in some form in most operating systems by the early 1990s. During this same period, object-oriented programming (OOP) was becoming a significant part of the programming landscape. OOP with runtime binding requires additional information that traditional libraries do not supply. In addition to the names and entry points of the code located within, they also require a list of the objects they depend on. This is a side-effect of one of OOP's core concepts, inheritance, which means that parts of the complete definition of any method may be in different places. This is more than simply listing that one library requires the services of another: in a true OOP system, the libraries themselves may not be known at compile time, and vary from system to system.

At the same time many developers worked on the idea of multi-tier programs, in which a "display" running on a desktop computer would use the services of a mainframe or minicomputer for data storage or processing. For instance, a program on a GUI-based computer would send messages to a minicomputer to return small samples of a huge dataset for display. Remote procedure calls (RPC) already handled these tasks, but there was no standard RPC system.

Soon the majority of the minicomputer and mainframe vendors instigated projects to combine the two, producing an OOP library format that could be used anywhere. Such systems were known as object libraries, or distributed objects, if they supported remote access (not all did). Microsoft's COM is an example of such a system for local use. DCOM, a modified version of COM, supports remote access.

For some time object libraries held the status of the "next big thing" in the programming world. There were a number of efforts to create systems that would run across platforms, and companies competed to try to get developers locked into their own system. Examples include IBM's System Object Model (SOM/DSOM), Sun Microsystems' Distributed Objects Everywhere (DOE), NeXT's Portable Distributed Objects (PDO), Digital's ObjectBroker, Microsoft's Component Object Model (COM/DCOM), and any number of CORBA-based systems.

Class libraries

Class libraries are the rough OOP equivalent of older types of code libraries. They contain classes, which describe characteristics and define actions (methods) that involve objects. Class libraries are used to create instances, or objects with their characteristics set to specific values. In some OOP languages, like Java, the distinction is clear, with the classes often contained in library files (like Java's JAR file format) and the instantiated objects residing only in memory (although potentially able to be made persistent in separate files). In others, like Smalltalk, the class libraries are merely the starting point for a system image that includes the entire state of the environment, classes and all instantiated objects.

Today, most class libraries are stored in a package repository (such as Maven Central for Java). Client code explicitly declares its dependencies on external libraries in build configuration files (such as a Maven POM in Java).

Remote libraries

Another library technique uses completely separate executables (often in some lightweight form) and calls them using a remote procedure call (RPC) over a network to another computer. This maximizes operating system re-use: the code needed to support the library is the same code being used to provide application support and security for every other program. Additionally, such systems do not require the library to exist on the same machine, but can forward the requests over the network.

However, such an approach means that every library call requires a considerable amount of overhead. RPC calls are much more expensive than calling a shared library that has already been loaded on the same machine. This approach is commonly used in a distributed architecture that makes heavy use of such remote calls, notably client-server systems and application servers such as Enterprise JavaBeans.
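The extra work behind each remote call can be seen in a small in-process simulation. The message format below is hypothetical (JSON-shaped, but not a conforming JSON-RPC implementation), and the "network" is a direct function call so the sketch runs offline. Even so, every call must be encoded, decoded, dispatched by name, and its result encoded and decoded again, steps that a call into an already loaded shared library skips entirely.

```python
import json
from typing import Any, Callable

# Hypothetical "server side": the procedures the remote library exposes.
PROCEDURES: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def serve(request_bytes: bytes) -> bytes:
    """Decode a request, run the named procedure, and encode the reply."""
    request = json.loads(request_bytes)
    result = PROCEDURES[request["method"]](*request["params"])
    return json.dumps({"result": result}).encode()


def rpc_call(method: str, *params: Any) -> Any:
    """Client stub: marshal the call, 'send' it, and unmarshal the reply."""
    request_bytes = json.dumps({"method": method, "params": params}).encode()
    reply_bytes = serve(request_bytes)  # a real system would cross the network here
    return json.loads(reply_bytes)["result"]


print(rpc_call("add", 2, 3))        # 5
print(rpc_call("upper", "remote"))  # REMOTE
```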

Code generation libraries

Code generation libraries are high-level APIs that can generate or transform byte code for Java. They are used by aspect-oriented programming, some data access frameworks, and for testing to generate dynamic proxy objects. They also are used to intercept field access.

File naming

Most modern Unix-like systems

The system stores libfoo.a and libfoo.so files in directories such as /lib, /usr/lib or /usr/local/lib. The filenames always start with lib, and end with a suffix of .a (archive, static library) or of .so (shared object, dynamically linked library). Some systems might have multiple names for a dynamically linked library. These names typically share the same prefix and have different suffixes indicating the version number. Most of the names are names for symbolic links to the latest version. For example, on some systems libfoo.so.2 would be the filename for the second major interface revision of the dynamically linked library libfoo. The .la files sometimes found in the library directories are libtool archives, not usable by the system as such.
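The naming convention above is regular enough to classify with a pattern. The sketch below is an illustration under stated assumptions: the `parse_library_filename` helper and `LibraryFile` type are hypothetical, and the pattern recognizes only the `lib<name>.a` and `lib<name>.so[.<version>]` forms described in this paragraph, treating anything else (including libtool `.la` files) as not a library.

```python
import re
from typing import NamedTuple, Optional

# lib<name>.a  or  lib<name>.so[.<version>], e.g. libfoo.so.2 or libfoo.so.2.1.0
_LIB_NAME = re.compile(r"^lib(?P<name>.+?)\.(?P<kind>a|so)(?:\.(?P<version>\d+(?:\.\d+)*))?$")


class LibraryFile(NamedTuple):
    name: str
    kind: str               # "static" or "shared"
    version: Optional[str]  # e.g. "2"; None when the name carries no version


def parse_library_filename(filename: str) -> Optional[LibraryFile]:
    """Classify a Unix-style library filename; return None if it does not match."""
    match = _LIB_NAME.match(filename)
    if match is None:
        return None
    kind = "static" if match["kind"] == "a" else "shared"
    if kind == "static" and match["version"]:
        return None  # version suffixes are a shared-object convention
    return LibraryFile(match["name"], kind, match["version"])


print(parse_library_filename("libfoo.so.2"))  # LibraryFile(name='foo', kind='shared', version='2')
print(parse_library_filename("libfoo.a"))     # LibraryFile(name='foo', kind='static', version=None)
print(parse_library_filename("libfoo.la"))    # None
```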

macOS

The system inherits static library conventions from BSD, with the library stored in a .a file, and can use .so-style dynamically linked libraries (with the .dylib suffix instead). Most libraries in macOS, however, consist of "frameworks", placed inside special directories called "bundles" which wrap the library's required files and metadata. For example, a framework called MyFramework would be implemented in a bundle called MyFramework.framework, with MyFramework.framework/MyFramework being either the dynamically linked library file or being a symlink to the dynamically linked library file in MyFramework.framework/Versions/Current/MyFramework.

Microsoft Windows

Dynamic-link libraries usually have the suffix *.DLL,[18] although other file name extensions may identify specific-purpose dynamically linked libraries, e.g. *.OCX for OLE libraries. The interface revisions are either encoded in the file names or abstracted away using COM-object interfaces. Depending on how they are compiled, *.LIB files can be either static libraries or representations of dynamically linkable libraries needed only during compilation, known as "import libraries". Unlike in the UNIX world, which uses different file extensions for the two cases, when linking against a .LIB file in Windows one must first know whether it is a regular static library or an import library. In the latter case, a .DLL file must be present at runtime.

Entropy (information theory)

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Entropy_(information_theory) In info...