Saturday, March 27, 2021

Modular programming

From Wikipedia, the free encyclopedia

Modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality.

A module interface expresses the elements that are provided and required by the module. The elements defined in the interface are detectable by other modules. The implementation contains the working code that corresponds to the elements declared in the interface. Modular programming is closely related to structured programming and object-oriented programming, all having the same goal of facilitating construction of large software programs and systems by decomposition into smaller pieces, and all originating around the 1960s. While the historical usage of these terms has been inconsistent, "modular programming" now refers to the high-level decomposition of the code of an entire program into pieces, structured programming to the low-level use of structured control flow in code, and object-oriented programming to the use of objects, a kind of data structure, for data.
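The split between interface and implementation can be illustrated with a minimal Python sketch. The names Storage, InMemoryStorage and remember_greeting are hypothetical and chosen only for illustration; an abstract base class is one possible way to declare an interface, not the only one.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Interface: the elements this module provides to other modules."""

    @abstractmethod
    def save(self, key, value):
        """Store value under key."""

    @abstractmethod
    def load(self, key):
        """Return the value stored under key."""


class InMemoryStorage(Storage):
    """Implementation: the working code behind the declared interface."""

    def __init__(self):
        self._items = {}  # hidden detail, not part of the interface

    def save(self, key, value):
        self._items[key] = value

    def load(self, key):
        return self._items[key]


def remember_greeting(storage):
    """Client code that relies only on the Storage interface."""
    storage.save("greeting", "hello")
    return storage.load("greeting")


result = remember_greeting(InMemoryStorage())
```

Because remember_greeting touches only the methods declared in Storage, a different implementation (for example one backed by a file) could be substituted without changing the client.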

In object-oriented programming, the use of interfaces as an architectural pattern to construct modules is known as interface-based programming.

Terminology

The term assembly (as in .NET languages like C#, F# or Visual Basic .NET) or package (as in Dart, Go or Java) is sometimes used instead of module. In other implementations, these are distinct concepts; in Python a package is a collection of modules, while Java 9 introduced a new module concept (a collection of packages with enhanced access control).

Furthermore, the term "package" has other uses in software (for example .NET NuGet packages). A component is a similar concept, but typically refers to a higher level; a component is a piece of a whole system, while a module is a piece of an individual program. The scale of the term "module" varies significantly between languages; in Python it is very small-scale and each file is a module, while in Java 9 it is large-scale, where a module is a collection of packages, which are in turn collections of files.

Other terms for modules include unit, used in Pascal dialects.

Language support

Languages that formally support the module concept include Ada, Algol, BlitzMax, C++, C#, Clojure, COBOL, Common Lisp, D, Dart, eC, Erlang, Elixir, Elm, F, F#, Fortran, Go, Haskell, IBM/360 Assembler, Control Language (CL), IBM RPG, Java, MATLAB, ML, Modula, Modula-2, Modula-3, Morpho, NEWP, Oberon, Oberon-2, Objective-C, OCaml, several derivatives of Pascal (Component Pascal, Object Pascal, Turbo Pascal, UCSD Pascal), Perl, PL/I, PureBasic, Python, R, Ruby, Rust, JavaScript, Visual Basic .NET and WebDNA.

Conspicuous examples of languages that lack support for modules are C and, in their original forms, C++ and Pascal. C and C++ do, however, allow separate compilation and declarative interfaces to be specified using header files. Modules were added to Objective-C in iOS 7 (2013) and to C++ with C++20; Pascal was superseded by Modula and Oberon, which included modules from the start, and by various derivatives that included modules. JavaScript has had native modules since ECMAScript 2015.

Modular programming can be performed even where the programming language lacks explicit syntactic features to support named modules, like, for example, in C. This is done by using existing language features, together with, for example, coding conventions, programming idioms and the physical code structure. The IBM System i also uses modules when programming in the Integrated Language Environment (ILE).

Key aspects

With modular programming, concerns are separated such that modules perform logically discrete functions, interacting through well-defined interfaces. Often modules form a directed acyclic graph (DAG); in this case a cyclic dependency between modules is seen as indicating that these should be a single module. In the case where modules do form a DAG they can be arranged as a hierarchy, where the lowest-level modules are independent, depending on no other modules, and higher-level modules depend on lower-level ones. A particular program or library is a top-level module of its own hierarchy, but can in turn be seen as a lower-level module of a higher-level program, library, or system.
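The dependency hierarchy described above can be sketched in Python with the standard-library graphlib module (available from Python 3.9). The module names below are hypothetical; the sketch orders modules so that each one follows the modules it depends on, and reports a cycle when no such order exists.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical modules; each key maps to the set of modules it depends on.
dependencies = {
    "app": {"ui", "storage"},
    "ui": {"utils"},
    "storage": {"utils"},
    "utils": set(),
}


def build_order(deps):
    """Return the modules ordered so that dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Return True if the dependency graph is not a DAG."""
    try:
        build_order(deps)
    except CycleError:
        return True
    return False


order = build_order(dependencies)
cyclic = has_cycle({"a": {"b"}, "b": {"a"}})
```

Here the lowest-level module ("utils") comes first in the order and the top-level module ("app") comes last, while the two mutually dependent modules in the second graph are flagged as a cycle.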

When creating a modular system, instead of creating a monolithic application (where the smallest component is the whole), several smaller modules are written separately so when they are composed together, they construct the executable application program. Typically these are also compiled separately, via separate compilation, and then linked by a linker. A just-in-time compiler may perform some of this construction "on-the-fly" at run time.

These independent functions are commonly classified as either program control functions or specific task functions. Program control functions are designed to work for one program. Specific task functions are prepared so that they can be applied in various programs.

This makes modularly designed systems, if built correctly, far more reusable than a traditional monolithic design, since all (or many) of these modules may then be reused (without change) in other projects. This also facilitates the "breaking down" of projects into several smaller projects. Theoretically, a modularized software project will be more easily assembled by large teams, since no team member creates the whole system, or even needs to know about the system as a whole. Team members can focus just on the assigned smaller task (this, it is claimed, counters the key assumption of The Mythical Man Month, making it actually possible to add more developers to a late software project without making it later still).

History

Modular programming, in the form of subsystems (particularly for I/O) and software libraries, dates to early software systems, where it was used for code reuse. Modular programming per se, with a goal of modularity, developed in the late 1960s and 1970s, as a larger-scale analog of the concept of structured programming (1960s). The term "modular programming" dates at least to the National Symposium on Modular Programming, organized at the Information and Systems Institute in July 1968 by Larry Constantine; other key concepts were information hiding (1972) and separation of concerns (SoC, 1974).

Modules were not included in the original specification for ALGOL 68 (1968), but were included as extensions in early implementations, ALGOL 68-R (1970) and ALGOL 68C (1970), and later formalized. One of the first languages designed from the start for modular programming was the short-lived Modula (1975), by Niklaus Wirth. Another early modular language was Mesa (1970s), by Xerox PARC, and Wirth drew on Mesa as well as the original Modula in its successor, Modula-2 (1978), which influenced later languages, particularly through its successor, Modula-3 (1980s). Modula's use of dot-qualified names, like M.a to refer to object a from module M, coincides with notation to access a field of a record (and similarly for attributes or methods of objects), and is now widespread, seen in C#, Dart, Go, Java, and Python, among others. Modular programming became widespread from the 1980s: the original Pascal language (1970) did not include modules, but later versions, notably UCSD Pascal (1978) and Turbo Pascal (1983), included them in the form of "units", as did the Pascal-influenced Ada (1980). The Extended Pascal ISO 10206:1990 standard kept closer to Modula-2 in its modular support. Standard ML (1984) has one of the most complete module systems, including functors (parameterized modules) to map between modules.

In the 1980s and 1990s, modular programming was overshadowed by and often conflated with object-oriented programming, particularly due to the popularity of C++ and Java. For example, the C family of languages had support for objects and classes in C++ (originally C with Classes, 1980) and Objective-C (1983), only supporting modules 30 years or more later. Java (1995) supports modules in the form of packages, though the primary unit of code organization is a class. However, Python (1991) prominently used both modules and objects from the start, using modules as the primary unit of code organization and "packages" as a larger-scale unit; and Perl 5 (1994) includes support for both modules and objects, with a vast array of modules being available from CPAN (1993).

Modular programming is now widespread, and found in virtually all major languages developed since the 1990s. The relative importance of modules varies between languages, and in class-based object-oriented languages there is still overlap and confusion with classes as a unit of organization and encapsulation, but these are both well-established as distinct concepts.

Search engine

From Wikipedia, the free encyclopedia

The results of a search for the term "lunar eclipse" in a web-based image search engine

A search engine is a software system that is designed to carry out web searches (Internet searches), which means to search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of links to web pages, images, videos, infographics, articles, research papers, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Internet content that is not capable of being searched by a web search engine is generally described as the deep web.

History

Timeline
Year | Engine | Current status
1993 | W3Catalog | Active
1993 | Aliweb | Active
1993 | JumpStation | Inactive
1993 | WWW Worm | Inactive
1994 | WebCrawler | Active
1994 | Go.com | Inactive, redirects to Disney
1994 | Lycos | Active
1994 | Infoseek | Inactive, redirects to Disney
1995 | Yahoo! Search | Active, initially a search function for Yahoo! Directory
1995 | Daum | Active
1995 | Magellan | Inactive
1995 | Excite | Active
1995 | SAPO | Active
1995 | MetaCrawler | Active
1995 | AltaVista | Inactive, acquired by Yahoo! in 2003, since 2013 redirects to Yahoo!
1996 | RankDex | Inactive, incorporated into Baidu in 2000
1996 | Dogpile | Active, Aggregator
1996 | Inktomi | Inactive, acquired by Yahoo!
1996 | HotBot | Active
1996 | Ask Jeeves | Active (rebranded ask.com)
1997 | AOL NetFind | Active (rebranded AOL Search since 1999)
1997 | Northern Light | Inactive
1997 | Yandex | Active
1998 | Google | Active
1998 | Ixquick | Active as Startpage.com
1998 | MSN Search | Active as Bing
1998 | empas | Inactive (merged with NATE)
1999 | AlltheWeb | Inactive (URL redirected to Yahoo!)
1999 | GenieKnows | Active, rebranded Yellowee (redirection to justlocalbusiness.com)
1999 | Naver | Active
1999 | Teoma | Active (© APN, LLC)
2000 | Baidu | Active
2000 | Exalead | Inactive
2000 | Gigablast | Active
2001 | Kartoo | Inactive
2003 | Info.com | Active
2003 | Scroogle | Inactive
2004 | A9.com | Inactive
2004 | Clusty | Active (as Yippy)
2004 | Mojeek | Active
2004 | Sogou | Active
2005 | SearchMe | Inactive
2005 | KidzSearch | Active, Google Search
2006 | Soso | Inactive, merged with Sogou
2006 | Quaero | Inactive
2006 | Search.com | Active
2006 | ChaCha | Inactive
2006 | Ask.com | Active
2006 | Live Search | Active as Bing, rebranded MSN Search
2007 | wikiseek | Inactive
2007 | Sproose | Inactive
2007 | Wikia Search | Inactive
2007 | Blackle.com | Active, Google Search
2008 | Powerset | Inactive (redirects to Bing)
2008 | Picollator | Inactive
2008 | Viewzi | Inactive
2008 | Boogami | Inactive
2008 | LeapFish | Inactive
2008 | Forestle | Inactive (redirects to Ecosia)
2008 | DuckDuckGo | Active
2009 | Bing | Active, rebranded Live Search
2009 | Yebol | Inactive
2009 | Mugurdy | Inactive due to a lack of funding
2009 | Scout (Goby) | Active
2009 | NATE | Active
2009 | Ecosia | Active
2009 | Startpage.com | Active, sister engine of Ixquick
2010 | Blekko | Inactive, sold to IBM
2010 | Cuil | Inactive
2010 | Yandex (English) | Active
2010 | Parsijoo | Active
2011 | YaCy | Active, P2P
2012 | Volunia | Inactive
2013 | Qwant | Active
2014 | Egerin | Active, Kurdish / Sorani
2014 | Swisscows | Active
2015 | Yooz | Active
2015 | Cliqz | Inactive
2016 | Kiddle | Active, Google Search

Pre-1990s

A system for locating published information intended to overcome the ever increasing difficulty of locating information in ever-growing centralized indices of scientific work was described in 1945 by Vannevar Bush, who wrote an article in The Atlantic Monthly titled "As We May Think" in which he envisioned libraries of research with connected annotations not unlike modern hyperlinks. Link analysis would eventually become a crucial component of search engines through algorithms such as Hyper Search and PageRank.

1990s: Birth of search engines

The first internet search engines predate the debut of the Web in December 1990: WHOIS user search dates back to 1982, and the Knowbot Information Service multi-network user search was first implemented in 1989. The first well-documented search engine that searched content files, namely FTP files, was Archie, which debuted on 10 September 1990.

Prior to September 1993, the World Wide Web was entirely indexed by hand. There was a list of webservers edited by Tim Berners-Lee and hosted on the CERN webserver. One snapshot of the list in 1992 remains, but as more and more web servers went online the central list could no longer keep up. On the NCSA site, new servers were announced under the title "What's New!"

The first tool used for searching content (as opposed to users) on the Internet was Archie. The name stands for "archive" without the "v". It was created by Alan Emtage, a computer science student at McGill University in Montreal, Quebec, Canada. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites since the amount of data was so limited it could be readily searched manually.

The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers. While the name of the search engine "Archie Search Engine" was not a reference to the Archie comic book series, "Veronica" and "Jughead" are characters in the series, thus referencing their predecessor.

In the summer of 1993, no search engine existed for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993.

In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called "Wandex". The purpose of the Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web's second search engine Aliweb appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format.

JumpStation (created in December 1993 by Jonathon Fletcher) used a web robot to find web pages and to build its index, and used a web form as the interface to its query program. It was thus the first WWW resource-discovery tool to combine the three essential features of a web search engine (crawling, indexing, and searching) as described below. Because of the limited resources available on the platform it ran on, its indexing and hence searching were limited to the titles and headings found in the web pages the crawler encountered.

One of the first "all text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it allowed users to search for any word in any webpage, which has become the standard for all major search engines since. It was also the search engine that was widely known by the public. Also in 1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial endeavor.

The first popular search engine on the Web was Yahoo! Search. The first product from Yahoo!, founded by Jerry Yang and David Filo in January 1994, was a Web directory called Yahoo! Directory. In 1995, a search function was added, allowing users to search Yahoo! Directory. It became one of the most popular ways for people to find web pages of interest, but its search function operated on its web directory, rather than on full-text copies of web pages.

Soon after, a number of search engines appeared and vied for popularity. These included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Information seekers could also browse the directory instead of doing a keyword-based search.

In 1996, Robin Li developed the RankDex site-scoring algorithm for search engine results page ranking and received a US patent for the technology. It was the first search engine that used hyperlinks to measure the quality of websites it was indexing, predating the very similar algorithm patent filed by Google two years later in 1998. Larry Page referenced Li's work in some of his U.S. patents for PageRank. Li later used his RankDex technology for the Baidu search engine, which he founded in China and launched in 2000.

In 1996, Netscape was looking to give a single search engine an exclusive deal as the featured search engine on Netscape's web browser. There was so much interest that instead Netscape struck deals with five of the major search engines: for $5 million a year, each search engine would be in rotation on the Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite.

Google adopted the idea of selling search terms in 1998, from a small search engine company named goto.com. This move had a significant effect on the search engine business, which went from struggling to one of the most profitable businesses on the Internet.

Search engines were also known as some of the brightest stars in the Internet investing frenzy that occurred in the late 1990s. Several companies entered the market spectacularly, receiving record gains during their initial public offerings. Some have taken down their public search engine, and are marketing enterprise-only editions, such as Northern Light. Many search engine companies were caught up in the dot-com bubble, a speculation-driven market boom of the late 1990s that peaked in early 2000.

2000s to present: Post dot-com bubble

Around 2000, Google's search engine rose to prominence. The company achieved better results for many searches with an algorithm called PageRank, as was explained in the paper Anatomy of a Search Engine written by Sergey Brin and Larry Page, the later founders of Google. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link there, on the premise that good or desirable pages are linked to more than others. Larry Page's patent for PageRank cites Robin Li's earlier RankDex patent as an influence. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal. In fact, the Google search engine became so popular that spoof engines emerged such as Mystery Seeker.
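The iterative idea behind PageRank can be sketched in Python on a toy link graph. This is a simplified textbook-style version, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the page names are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages given `links`, a dict mapping each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform rank
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" is linked to by three of the four pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up with the highest rank, and a page linked from a highly ranked page inherits part of that rank.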

By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! used Google's search engine until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.

Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early 1999 the site began to display listings from Looksmart, blended with results from Inktomi. For a short time in 1999, MSN Search used results from AltaVista instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot).

Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.

As of 2019, active search engine crawlers include those of Google, Sogou, Baidu, Bing, Gigablast, Mojeek, DuckDuckGo and Yandex.

Approach

A search engine maintains the following processes in near real time:

  1. Web crawling
  2. Indexing
  3. Searching

Web search engines get their information by web crawling from site to site. The "spider" checks for the standard filename robots.txt, addressed to it. The robots.txt file contains directives for search spiders, telling them which pages to crawl and which to skip. After checking for robots.txt and either finding it or not, the spider sends certain information back to be indexed depending on many factors, such as the titles, page content, JavaScript, Cascading Style Sheets (CSS), headings, or its metadata in HTML meta tags. After a certain number of pages crawled, amount of data indexed, or time spent on the website, the spider stops crawling and moves on. "[N]o web crawler may actually crawl the entire reachable web. Due to infinite websites, spider traps, spam, and other exigencies of the real web, crawlers instead apply a crawl policy to determine when the crawling of a site should be deemed sufficient. Some websites are crawled exhaustively, while others are crawled only partially".
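The robots.txt check and a simple crawl policy can be sketched in Python with the standard-library urllib.robotparser module. The robots.txt content, the user agent name "ExampleBot", the in-memory SITE link map and the page budget are all hypothetical; a real crawler would fetch pages over the network rather than read them from a dictionary.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for https://example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical site, held in memory: path -> paths it links to.
SITE = {
    "/": ["/about", "/private/keys", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/1", "/blog/2"],
    "/blog/1": [],
    "/blog/2": [],
    "/private/keys": [],
}


def may_crawl(path, user_agent="ExampleBot"):
    """Ask the parsed robots.txt whether this path may be fetched."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


def crawl(start="/", max_pages=4):
    """Breadth-first crawl that honours robots.txt and stops after a page budget."""
    seen, queue, visited = {start}, deque([start]), []
    while queue and len(visited) < max_pages:
        path = queue.popleft()
        if not may_crawl(path):
            continue  # disallowed by robots.txt
        visited.append(path)
        for link in SITE.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


visited = crawl()
```

With a budget of four pages the sketch visits "/", "/about", "/blog" and "/blog/1", skips the disallowed "/private/keys", and leaves "/blog/2" uncrawled, which mirrors the partial crawling described in the quotation above.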

Indexing means associating words and other definable tokens found on web pages to their domain names and HTML-based fields. The associations are made in a public database, made available for web search queries. A query from a user can be a single word, multiple words or a sentence. The index helps find information relating to the query as quickly as possible. Some of the techniques for indexing and caching are trade secrets, whereas web crawling is a straightforward process of visiting all sites on a systematic basis.
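The association between tokens and pages can be sketched as a small inverted index in Python. The page addresses and texts are hypothetical, and the tokenizer (lowercased runs of letters and digits) is a simplifying assumption; production indexes record many more fields.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the pages containing it and its positions on each page."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, token in enumerate(tokenize(text)):
            index[token].setdefault(url, []).append(position)
    return index


# Hypothetical pages: address -> page text.
pages = {
    "a.example/eclipse": "Lunar eclipse tonight",
    "b.example/moon": "The lunar surface",
}
index = build_index(pages)
lunar_pages = sorted(index["lunar"])
```

Looking up a query word is then a single dictionary access, which is why the index can answer queries without rescanning the pages themselves.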

Between visits by the spider, the cached version of a page (some or all of the content needed to render it) stored in the search engine's working memory is quickly sent to an inquirer. If a visit is overdue, the search engine can just act as a web proxy instead. In this case the page may differ from the search terms indexed. The cached page holds the appearance of the version whose words were previously indexed, so a cached version of a page can be useful to the web site when the actual page has been lost, but this problem is also considered a mild form of linkrot.

High-level architecture of a standard Web crawler

Typically when a user enters a query into a search engine it is a few keywords. The index already has the names of the sites containing the keywords, and these are instantly obtained from the index. The real processing load is in generating the web pages that are the search results list: Every page in the entire list must be weighted according to information in the indexes. Then the top search result item requires the lookup, reconstruction, and markup of the snippets showing the context of the keywords matched. These are only part of the processing each search results web page requires, and further pages (next to the top) require more of this post processing.

Beyond simple keyword lookups, search engines offer their own GUI- or command-driven operators and search parameters to refine the search results. These give users the controls they need for the feedback loop of filtering and weighting while refining the search results, given the initial pages of the first search results. For example, from 2007 the Google.com search engine has allowed one to filter by date by clicking "Show search tools" in the leftmost column of the initial search results page, and then selecting the desired date range. It is also possible to weight by date because each page has a modification time. Most search engines support the use of the boolean operators AND, OR and NOT to help end users refine the search query. Boolean operators are for literal searches that allow the user to refine and extend the terms of the search. The engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords. There is also concept-based searching, where the search involves using statistical analysis on pages containing the words or phrases searched for. In addition, natural language queries allow the user to type a question in the same form one would ask it to a human. An example of such a site is ask.com.
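Boolean operators and proximity search can be sketched in Python as set operations over a positional index. The three documents are hypothetical, and the helper names docs_with and near are illustrative rather than the operators of any particular search engine.

```python
from collections import defaultdict

# Hypothetical documents: id -> text.
DOCS = {
    1: "lunar eclipse tonight",
    2: "total solar eclipse",
    3: "lunar calendar and the solar eclipse",
}

# Positional index: word -> {doc id: [positions of the word]}.
INDEX = defaultdict(dict)
for doc_id, text in DOCS.items():
    for position, word in enumerate(text.split()):
        INDEX[word].setdefault(doc_id, []).append(position)


def docs_with(term):
    """Return the ids of documents containing the exact term."""
    return set(INDEX.get(term, {}))


def near(first, second, max_distance):
    """Return documents where the two terms occur within max_distance words."""
    hits = set()
    for doc_id in docs_with(first) & docs_with(second):
        if any(abs(p - q) <= max_distance
               for p in INDEX[first][doc_id]
               for q in INDEX[second][doc_id]):
            hits.add(doc_id)
    return hits


both = docs_with("lunar") & docs_with("eclipse")      # lunar AND eclipse
either = docs_with("lunar") | docs_with("solar")      # lunar OR solar
without = docs_with("eclipse") - docs_with("solar")   # eclipse AND NOT solar
adjacent = near("lunar", "eclipse", max_distance=1)   # proximity search
```

Documents 1 and 3 both match "lunar AND eclipse", but only document 1 matches the proximity query, because in document 3 the two words are five positions apart.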

The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve. There are two main types of search engine that have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing texts it locates. This second form relies much more heavily on the computer itself to do the bulk of the work.

Most Web search engines are commercial ventures supported by advertising revenue and thus some of them allow advertisers to have their listings ranked higher in search results for a fee. Search engines that do not accept money for their search results make money by running search related ads alongside the regular search engine results. The search engines make money every time someone clicks on one of these ads.

Local search

Local search is the process of optimizing the efforts of local businesses to be found in search results. These businesses focus on making sure that their information is consistent across all searches. This is important because many people decide where to go and what to buy based on their searches.

Market share

As of February 2021, Google is the world's most used search engine, with a market share of 92.04%.

Russia and East Asia

In Russia, Yandex has a market share of 61.9%, compared to Google's 28.3%. In China, Baidu is the most popular search engine. South Korea's homegrown search portal, Naver, is used for 70% of online searches in the country. Yahoo! Japan and Yahoo! Taiwan are the most popular avenues for Internet searches in Japan and Taiwan, respectively. China is one of few countries where Google is not in the top three web search engines for market share. Google was previously a top search engine in China, but had to withdraw after failing to follow China's laws.

Europe

Most countries' markets in Europe are dominated by Google; an exception is the Czech Republic, where Seznam is a strong competitor.

Search engine bias

Although search engines are programmed to rank websites based on some combination of their popularity and relevancy, empirical studies indicate various political, economic, and social biases in the information they provide and the underlying assumptions about the technology. These biases can be a direct result of economic and commercial processes (e.g., companies that advertise with a search engine can also become more popular in its organic search results), and political processes (e.g., the removal of search results to comply with local laws). For example, Google will not surface certain neo-Nazi websites in France and Germany, where Holocaust denial is illegal.

Biases can also be a result of social processes, as search engine algorithms are frequently designed to exclude non-normative viewpoints in favor of more "popular" results. Indexing algorithms of major search engines skew towards coverage of U.S.-based sites, rather than websites from non-U.S. countries.

Google Bombing is one example of an attempt to manipulate search results for political, social or commercial reasons.

Several scholars have studied the cultural changes triggered by search engines, and the representation of certain controversial topics in their results, such as terrorism in Ireland, climate change denial, and conspiracy theories.

Customized results and filter bubbles

Many search engines such as Google and Bing provide customized results based on the user's activity history. This leads to an effect that has been called a filter bubble. The term describes a phenomenon in which websites use algorithms to selectively guess what information a user would like to see, based on information about the user (such as location, past click behaviour and search history). As a result, websites tend to show only information that agrees with the user's past viewpoint. This puts the user in a state of intellectual isolation without contrary information. Prime examples are Google's personalized search results and Facebook's personalized news stream. According to Eli Pariser, who coined the term, users get less exposure to conflicting viewpoints and are isolated intellectually in their own informational bubble. Pariser related an example in which one user searched Google for "BP" and got investment news about British Petroleum while another searcher got information about the Deepwater Horizon oil spill and that the two search results pages were "strikingly different". The bubble effect may have negative implications for civic discourse, according to Pariser. Since this problem has been identified, competing search engines have emerged that seek to avoid this problem by not tracking or "bubbling" users, such as DuckDuckGo. Other scholars do not share Pariser's view, finding the evidence in support of his thesis unconvincing.

Religious search engines

The global growth of the Internet and electronic media in the Arab and Muslim World during the last decade has encouraged Islamic adherents in the Middle East and Asian sub-continent to attempt their own search engines, their own filtered search portals that would enable users to perform safe searches. Going beyond the usual safe search filters, these Islamic web portals categorize websites as either "halal" or "haram", based on interpretation of the "Law of Islam". ImHalal came online in September 2011. Halalgoogling came online in July 2013. These use haram filters on the collections from Google and Bing (and others).

While lack of investment and the slow pace of technology in the Muslim World have hindered progress and thwarted the success of an Islamic search engine targeting Islamic adherents as its main consumers, projects like Muxlim, a Muslim lifestyle site, did receive millions of dollars from investors like Rite Internet Ventures, and it also faltered. Other religion-oriented search engines are Jewogle, the Jewish version of Google, and SeekFind.org, which is Christian. SeekFind filters sites that attack or degrade their faith.

Search engine submission

Web search engine submission is a process in which a webmaster submits a website directly to a search engine. While search engine submission is sometimes presented as a way to promote a website, it generally is not necessary because the major search engines use web crawlers that will eventually find most web sites on the Internet without assistance. Webmasters can either submit one web page at a time, or they can submit the entire site using a sitemap, but it is normally only necessary to submit the home page of a web site as search engines are able to crawl a well-designed website. There are two remaining reasons to submit a web site or web page to a search engine: to add an entirely new web site without waiting for a search engine to discover it, and to have a web site's record updated after a substantial redesign.
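A sitemap of the kind mentioned above is commonly an XML file in the sitemaps.org format, listing one loc entry per page. The following Python sketch generates a minimal one; the function name build_sitemap and the example.com addresses are hypothetical, and optional fields such as the last modification date are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given page addresses."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = address
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages of a site to be submitted.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/about",
])
```

How the resulting file is submitted differs between search engines and is not covered by this sketch.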

Some search engine submission software not only submits websites to multiple search engines, but also adds links to websites from their own pages. This could appear helpful in increasing a website's ranking, because external links are one of the most important factors determining a website's ranking. However, John Mueller of Google has stated that this "can lead to a tremendous number of unnatural links for your site" with a negative impact on site ranking.

API

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/API

In computing, an application programming interface (API) is an interface that defines interactions between multiple software applications or mixed hardware-software intermediaries. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow, etc. It can also provide extension mechanisms so that users can extend existing functionality in various ways and to varying degrees. An API can be entirely custom, specific to a component, or designed based on an industry-standard to ensure interoperability. Through information hiding, APIs enable modular programming, allowing users to use the interface independently of the implementation.

Reference to Web APIs is currently the most common use of the term. There are also APIs for programming languages, software libraries, computer operating systems, and computer hardware. APIs originated in the 1940s, though the term API did not emerge until the 1960s and 1970s.

Purpose

In building applications, an API simplifies programming by abstracting the underlying implementation and only exposing objects or actions the developer needs. While a graphical interface for an email client might provide a user with a button that performs all the steps for fetching and highlighting new emails, an API for file input/output might give the developer a function that copies a file from one location to another without requiring that the developer understand the file system operations occurring behind the scenes.
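As a rough illustration of the file-copy case, the sketch below uses Python's standard shutil.copyfile. The wrapper name copy_report and the file names are assumptions made up for this example; the point is only that the caller names a source and a destination and never deals with file descriptors or buffers.

```python
import shutil
import tempfile
from pathlib import Path


def copy_report(source: Path, destination: Path) -> Path:
    """Copy a file through the standard library's file-copy API.

    The caller never opens file descriptors or manages buffers;
    shutil.copyfile takes care of those details internally.
    """
    shutil.copyfile(source, destination)
    return destination


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "report.txt"  # hypothetical file name
        dst = Path(tmp) / "backup.txt"  # hypothetical file name
        src.write_text("quarterly numbers\n")
        copy_report(src, dst)
        print(dst.read_text(), end="")
```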

History of the term

A diagram from 1978 proposing the expansion of the idea of the API to become a general programming interface, beyond application programs alone.

The meaning of the term API has expanded over its history. It first described an interface only for end-user-facing programs, known as application programs. This origin is still reflected in the name "application programming interface." Today, the term API is broader, including also utility software and even hardware interfaces.

The idea of the API is much older than the term. British computer scientists Wilkes and Wheeler worked on modular software libraries in the 1940s for the EDSAC computer. Their book The Preparation of Programs for an Electronic Digital Computer contains the first published API specification. Joshua Bloch claims that Wilkes and Wheeler "latently invented" the API, because it is more of a concept that is discovered than invented.

Although the people who coined the term API were implementing software on a Univac 1108, the goal of their API was to make hardware independent programs possible.

The term "application program interface" (without an -ing suffix) is first recorded in a paper called Data structures and techniques for remote computer graphics presented at an AFIPS conference in 1968. The authors of this paper use the term to describe the interaction of an application — a graphics program in this case — with the rest of the computer system. A consistent application interface (consisting of Fortran subroutine calls) was intended to free the programmer from dealing with idiosyncrasies of the graphics display device, and to provide hardware independence if the computer or the display were replaced.

The term was introduced to the field of databases by C. J. Date in a 1974 paper called The Relational and Network Approaches: Comparison of the Application Programming Interface. An API became part of the ANSI/SPARC framework for database management systems. This framework treated the application programming interface separately from other interfaces, such as the query interface. Database professionals in the 1970s observed that these different interfaces could be combined; a sufficiently rich application interface could support the other interfaces as well.

This observation led to APIs that supported all types of programming, not just application programming. By 1990, the API was defined simply as "a set of services available to a programmer for performing certain tasks" by technologist Carl Malamud.

The conception of the API was expanded again with the dawn of web APIs. Roy Fielding's dissertation Architectural Styles and the Design of Network-based Software Architectures at UC Irvine in 2000 outlined Representational state transfer (REST) and described the idea of a "network-based Application Programming Interface" that Fielding contrasted with traditional "library-based" APIs. XML and JSON web APIs saw widespread commercial adoption beginning in 2000 and continuing as of 2021.

The web API is now the most common meaning of the term API. When used in this way, the term API has some overlap in meaning with the terms communication protocol and remote procedure call.

The Semantic Web proposed by Tim Berners-Lee in 2001 included "semantic APIs" that recast the API as an open, distributed data interface rather than a software behavior interface. In practice, proprietary interfaces and agents became more widespread instead.

Usage

Libraries and frameworks

The interface to a software library is one type of API. The API describes and prescribes the "expected behavior" (a specification) while the library is an "actual implementation" of this set of rules.

A single API can have multiple implementations (or none, being abstract) in the form of different libraries that share the same programming interface.
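The split between specification and implementation can be sketched in a few lines. In this minimal example, all class and function names (KeyValueStore, DictStore, ListStore, remember_language) are hypothetical: an abstract class plays the role of the API, and two unrelated classes implement it.

```python
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """The API: states the expected behavior but contains no implementation."""

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        """Store value under key, replacing any earlier value."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the value stored under key, or None if absent."""


class DictStore(KeyValueStore):
    """One implementation, backed by a dictionary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ListStore(KeyValueStore):
    """A second implementation, backed by a list of pairs."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        # Drop any earlier pair for this key, then append the new one.
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def get(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        return None


def remember_language(store: KeyValueStore) -> Optional[str]:
    """Client code written against the API only, not a specific library."""
    store.put("language", "Scala")
    store.put("language", "Java")
    return store.get("language")


if __name__ == "__main__":
    for store in (DictStore(), ListStore()):
        print(type(store).__name__, remember_language(store))
```

Because remember_language depends only on KeyValueStore, either implementation can be swapped in without changing the client.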

The separation of the API from its implementation can allow programs written in one language to use a library written in another. For example, because Scala and Java compile to compatible bytecode, Scala developers can take advantage of any Java API.

API use can vary depending on the type of programming language involved. An API for a procedural language such as Lua could consist primarily of basic routines to execute code, manipulate data or handle errors, while an API for an object-oriented language, such as Java, would provide a specification of classes and their methods.

Language bindings are also APIs. By mapping the features and capabilities of one language to an interface implemented in another language, a language binding allows a library or service written in one language to be used when developing in another language. Tools such as SWIG and F2PY, a Fortran-to-Python interface generator, facilitate the creation of such interfaces.

An API can also be related to a software framework: a framework can be based on several libraries implementing several APIs, but unlike the normal use of an API, the access to the behavior built into the framework is mediated by extending its content with new classes plugged into the framework itself.

Moreover, the overall program flow of control can be out of the control of the caller and in the framework's hands by inversion of control or a similar mechanism.
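The difference between calling a library and plugging into a framework can be sketched as follows. MiniFramework, Plugin and Shouter are hypothetical names invented for this example, not part of any real framework: the user supplies a class, and the framework decides when its method runs.

```python
class Plugin:
    """Extension point defined by the (hypothetical) framework."""

    def handle(self, event: str) -> str:
        raise NotImplementedError


class MiniFramework:
    """Owns the control flow: it decides when plugin code is called."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin: Plugin) -> None:
        self._plugins.append(plugin)

    def run(self, events):
        results = []
        for event in events:
            for plugin in self._plugins:
                # Inversion of control: the framework calls the user's code.
                results.append(plugin.handle(event))
        return results


class Shouter(Plugin):
    """User-written class plugged into the framework."""

    def handle(self, event: str) -> str:
        return event.upper()


if __name__ == "__main__":
    framework = MiniFramework()
    framework.register(Shouter())
    print(framework.run(["start", "stop"]))  # ['START', 'STOP']
```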

Operating systems

An API can specify the interface between an application and the operating system. POSIX, for example, specifies a set of common APIs that aim to enable an application written for one POSIX-conformant operating system to be compiled for another POSIX-conformant operating system.

Linux and Berkeley Software Distribution are examples of operating systems that implement the POSIX APIs.
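A small sketch of what an operating-system API looks like from application code: Python's os module exposes thin wrappers such as os.pipe, os.write, os.read and os.close, which on POSIX-conformant systems correspond to the POSIX functions of the same names. The helper echo_through_pipe is a made-up name for this example, and the code assumes a short message that fits in the pipe buffer.

```python
import os


def echo_through_pipe(message: bytes) -> bytes:
    """Send a short message through an OS pipe and read it back."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, message)
        os.close(write_fd)
        write_fd = -1  # mark as closed so it is not closed twice
        return os.read(read_fd, len(message))
    finally:
        if write_fd != -1:
            os.close(write_fd)
        os.close(read_fd)


if __name__ == "__main__":
    print(echo_through_pipe(b"hello from the OS API"))
```

The same source runs unchanged on any system that provides these calls, which is the portability goal the POSIX APIs aim for.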

Microsoft has shown a strong commitment to a backward-compatible API, particularly within its Windows API (Win32) library, so older applications may run on newer versions of Windows using an executable-specific setting called "Compatibility Mode".

An API differs from an application binary interface (ABI) in that an API is source code based while an ABI is binary based. For instance, POSIX provides APIs while the Linux Standard Base provides an ABI.

Remote APIs

Remote APIs allow developers to manipulate remote resources through protocols, specific standards for communication that allow different technologies to work together, regardless of language or platform. For example, the Java Database Connectivity API allows developers to query many different types of databases with the same set of functions, while the Java remote method invocation API uses the Java Remote Method Protocol to allow invocation of functions that operate remotely, but appear local to the developer.

Therefore, remote APIs are useful in maintaining the object abstraction in object-oriented programming; a method call, executed locally on a proxy object, invokes the corresponding method on the remote object, using the remoting protocol, and acquires the result to be used locally as a return value.

A modification of the proxy object will also result in a corresponding modification of the remote object.
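The proxy idea can be sketched without a network. The example below is an in-process simulation, not Java RMI or any real remoting protocol: Counter, FakeServer and Proxy are hypothetical names, and JSON strings stand in for the messages a real protocol would send over the wire.

```python
import json


class Counter:
    """The 'remote' object; in a real system it would live in another process."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


class FakeServer:
    """Stands in for the remote side: decodes a request and calls the object."""

    def __init__(self, target):
        self._target = target

    def handle(self, request: str) -> str:
        call = json.loads(request)
        method = getattr(self._target, call["method"])
        return json.dumps({"result": method(*call["args"])})


class Proxy:
    """Local stand-in: turns method calls into messages and returns replies."""

    def __init__(self, server):
        self._server = server

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)})
            reply = self._server.handle(request)
            return json.loads(reply)["result"]

        return remote_call


if __name__ == "__main__":
    remote_counter = Counter()
    proxy = Proxy(FakeServer(remote_counter))
    proxy.add(2)
    print(proxy.add(3))          # 5, returned as if the call were local
    print(remote_counter.value)  # 5, the remote object changed as well
```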

Web APIs

Web APIs are the defined interfaces through which interactions happen between an enterprise and the applications that use its assets. A web API also acts as a service-level agreement (SLA) that identifies the functional provider and exposes the service path or URL to its API users. An API approach is an architectural approach that revolves around providing a program interface to a set of services to different applications serving different types of consumers.

When used in the context of web development, an API is typically defined as a set of specifications, such as Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. An example might be a shipping company API that can be added to an eCommerce-focused website to facilitate ordering shipping services and automatically include current shipping rates, without the site developer having to enter the shipper's rate table into a web database.

While "web API" historically has been virtually synonymous with web service, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based web services and service-oriented architecture (SOA) towards more direct representational state transfer (REST) style web resources and resource-oriented architecture (ROA). Part of this trend is related to the Semantic Web movement toward Resource Description Framework (RDF), a concept to promote web-based ontology engineering technologies.

Web APIs allow the combination of multiple APIs into new applications known as mashups. In the social media space, web APIs have allowed web communities to facilitate sharing content and data between communities and applications. In this way, content that is created in one place dynamically can be posted and updated to multiple locations on the web. For example, Twitter's REST API allows developers to access core Twitter data and the Search API provides methods for developers to interact with Twitter Search and trends data.
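To make the request/response shape concrete, here is a minimal sketch of a client for a shipping-rate web API. The endpoint URL, the query parameters and the JSON field names are all assumptions invented for this example; no real shipping service is implied, and the code builds a request and parses a sample response without contacting any server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com/v1/rates"  # hypothetical endpoint


def build_rate_request(origin: str, destination: str, weight_kg: float) -> Request:
    """Build the HTTP GET request a client would send for a rate quote."""
    query = urlencode(
        {"origin": origin, "destination": destination, "weight_kg": weight_kg}
    )
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Accept": "application/json"},
        method="GET",
    )


def parse_rate_response(body: bytes) -> dict:
    """Decode a JSON response body into the fields the shop needs."""
    payload = json.loads(body.decode("utf-8"))
    return {
        "carrier": payload["carrier"],
        "price": float(payload["price"]),
        "currency": payload["currency"],
    }


if __name__ == "__main__":
    request = build_rate_request("10001", "94105", 2.5)
    print(request.get_method(), request.full_url)

    # Sample body in the assumed response format (not real data).
    sample_body = b'{"carrier": "ExampleShip", "price": "12.40", "currency": "USD"}'
    print(parse_rate_response(sample_body))
```

In a real integration the request would be sent with urllib.request.urlopen or a similar HTTP client, and the response body would come from the provider's server.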

Design

The design of an API has significant impact on its usage. The principle of information hiding describes the role of programming interfaces as enabling modular programming by hiding the implementation details of the modules so that users of modules need not understand the complexities inside the modules. Thus, the design of an API attempts to provide only the tools a user would expect. The design of programming interfaces represents an important part of software architecture, the organization of a complex piece of software.

Release policies

APIs are one of the more common ways technology companies integrate. Those that provide and use APIs are considered members of a business ecosystem.

The main policies for releasing an API are:

  • Private: The API is for internal company use only.
  • Partner: Only specific business partners can use the API. For example, vehicle for hire companies such as Uber and Lyft allow approved third-party developers to directly order rides from within their apps. This allows the companies to exercise quality control by curating which apps have access to the API, and provides them with an additional revenue stream.
  • Public: The API is available for use by the public. For example, Microsoft makes the Windows API public, and Apple releases its API Cocoa, so that software can be written for their platforms. Not all public APIs are generally accessible by everybody. For example, Internet service providers like Cloudflare or Voxility use RESTful APIs to allow customers and resellers access to their infrastructure information, DDoS stats, network performance or dashboard controls. Access to such APIs is granted either by “API tokens” or by customer status validations.

Public API implications

An important factor when an API becomes public is its "interface stability". Changes to the API—for example adding new parameters to a function call—could break compatibility with the clients that depend on that API.
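The parameter example can be sketched directly. The functions below are hypothetical: area_v1 is the published version, and the two v2 variants show a breaking and a compatible way of adding a parameter, as seen from a client written against v1.

```python
def area_v1(width, height):
    """Original public function."""
    return width * height


def area_v2_breaking(width, height, scale):
    """Adds a required parameter: existing two-argument calls now fail."""
    return width * height * scale


def area_v2_compatible(width, height, scale=1):
    """Adds the parameter with a default: existing calls keep working."""
    return width * height * scale


def legacy_client(area_function):
    """Client code written against v1; it passes exactly two arguments."""
    return area_function(3, 4)


if __name__ == "__main__":
    print(legacy_client(area_v1))             # 12
    print(legacy_client(area_v2_compatible))  # 12
    try:
        legacy_client(area_v2_breaking)
    except TypeError as error:
        print("broken client:", error)
```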

When parts of a publicly presented API are subject to change and thus not stable, such parts of a particular API should be documented explicitly as "unstable". For example, in the Google Guava library, the parts that are considered unstable, and that might change soon, are marked with the Java annotation @Beta.

A public API can sometimes declare parts of itself as deprecated or rescinded. This usually means that part of the API should be considered a candidate for being removed, or modified in a backward incompatible way. Announcing these changes in advance gives developers time to transition away from parts of the API that will be removed or no longer supported in the future.
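One way a library might signal deprecation to its callers is sketched below using Python's standard warnings module. The decorator deprecated and the functions get_items and fetch_items are hypothetical names written for this example; they are not taken from Guava or any other library mentioned here.

```python
import functools
import warnings


def deprecated(replacement: str):
    """Mark a function as deprecated and point callers at its replacement."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


def fetch_items():
    """Current, supported entry point."""
    return ["a", "b"]


@deprecated("fetch_items")
def get_items():
    """Old entry point, kept so existing clients keep working for now."""
    return fetch_items()


if __name__ == "__main__":
    warnings.simplefilter("always")
    print(get_items())  # still works, but emits a DeprecationWarning
```

The old function keeps working during the transition period, and clients see a warning that names the replacement.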

Client code may contain innovative or opportunistic usages that were not intended by the API designers. In other words, for a library with a significant user base, when an element becomes part of the public API, it may be used in diverse ways.

On February 19, 2020, Akamai published their annual “State of the Internet” report, showcasing the growing trend of cybercriminals targeting public API platforms at financial services worldwide. From December 2017 through November 2019, Akamai witnessed 85.42 billion credential violation attacks. About 20%, or 16.55 billion, were against hostnames defined as API endpoints. Of these, 473.5 million have targeted financial services sector organizations.

Documentation

API documentation describes what services an API offers and how to use those services, aiming to cover everything a client would need to know for practical purposes.

Documentation is crucial for the development and maintenance of applications using the API. API documentation is traditionally found in documentation files but can also be found in social media such as blogs, forums, and Q&A websites.

Traditional documentation files are often presented via a documentation system, such as Javadoc or Pydoc, that has a consistent appearance and structure. However, the types of content included in the documentation differ from API to API.

In the interest of clarity, API documentation may include a description of classes and methods in the API as well as "typical usage scenarios, code snippets, design rationales, performance discussions, and contracts", but implementation details of the API services themselves are usually omitted.

Restrictions and limitations on how the API can be used are also covered by the documentation. For instance, documentation for an API function could note that its parameters cannot be null or that the function itself is not thread safe. Because API documentation tends to be comprehensive, it is a challenge for writers to keep the documentation updated and for users to read it carefully, potentially yielding bugs.
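A short sketch of how such restrictions might appear in practice: the docstring below documents a non-null requirement and a thread-safety caveat for a hypothetical function, append_line, invented for this example. Tools such as Pydoc can render docstrings like this one as reference documentation.

```python
def append_line(log, line):
    """Append ``line`` to ``log`` and return the new number of lines.

    Restrictions:
        * ``line`` must not be None; a ValueError is raised otherwise.
        * Not thread safe: callers that share ``log`` across threads
          must provide their own locking.
    """
    if line is None:
        raise ValueError("line must not be None")
    log.append(line)
    return len(log)


if __name__ == "__main__":
    entries = []
    print(append_line(entries, "started"))  # 1
    help(append_line)  # shows the documented restrictions
```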

API documentation can be enriched with metadata information like Java annotations. This metadata can be used by the compiler, tools, and by the run-time environment to implement custom behaviors or custom handling.

It is possible to generate API documentation in a data-driven manner. By observing many programs that use a given API, it is possible to infer the typical usages, as well as the required contracts and directives. Then, templates can be used to generate natural language from the mined data.

Dispute over copyright protection for APIs

In 2010, Oracle Corporation sued Google for having distributed a new implementation of Java embedded in the Android operating system. Google had not acquired any permission to reproduce the Java API, although permission had been given to the similar OpenJDK project. Judge William Alsup ruled in the Oracle v. Google case that APIs cannot be copyrighted in the U.S. and that a victory for Oracle would have widely expanded copyright protection to a "functional set of symbols" and allowed the copyrighting of simple software commands:

To accept Oracle's claim would be to allow anyone to copyright one version of code to carry out a system of commands and thereby bar all others from writing its different versions to carry out all or part of the same commands.

In 2014, however, Alsup's ruling was overturned on appeal to the Court of Appeals for the Federal Circuit, though the question of whether such use of APIs constitutes fair use was left unresolved. 

In 2016, following a two-week trial, a jury determined that Google's reimplementation of the Java API constituted fair use, but Oracle vowed to appeal the decision. Oracle won on its appeal, with the Court of Appeals for the Federal Circuit ruling that Google's use of the APIs did not qualify for fair use. In 2019, Google appealed to the Supreme Court of the United States over both the copyrightability and fair use rulings, and the Supreme Court granted review. Due to the COVID-19 pandemic, the oral hearings in the case were delayed until October 2020.

Occupy movement

From Wikipedia, the free encyclopedia (Redirected from Oc...