Friday, November 4, 2022

Spreadsheet

From Wikipedia, the free encyclopedia
 
Example of a spreadsheet holding data about a group of audio tracks.

A spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cells of a table. Each cell may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. The term spreadsheet may also refer to one such electronic document.

Spreadsheet users can adjust any stored value and observe the effects on calculated values. This makes the spreadsheet useful for "what-if" analysis since many cases can be rapidly investigated without manual recalculation. Modern spreadsheet software can have multiple interacting sheets and can display data either as text and numerals or in graphical form.

Besides performing basic arithmetic and mathematical functions, modern spreadsheets provide built-in functions for common financial accountancy and statistical operations. Such calculations as net present value or standard deviation can be applied to tabular data with a pre-programmed function in a formula. Spreadsheet programs also provide conditional expressions, functions to convert between text and numbers, and functions that operate on strings of text.

Spreadsheets have replaced paper-based systems throughout the business world. Although they were first developed for accounting or bookkeeping tasks, they now are used extensively in any context where tabular lists are built, sorted, and shared.

Basics

LANPAR, available in 1969, was the first electronic spreadsheet on mainframe and time sharing computers. LANPAR was an acronym: LANguage for Programming Arrays at Random. VisiCalc (1979) was the first electronic spreadsheet on a microcomputer, and it helped turn the Apple II computer into a popular and widely used system. Lotus 1-2-3 was the leading spreadsheet when DOS was the dominant operating system. Microsoft Excel now has the largest market share on the Windows and Macintosh platforms. A spreadsheet program is a standard feature of an office productivity suite; since the advent of web apps, office suites now also exist in web app form.

A spreadsheet consists of a table of cells arranged into rows and columns and referred to by their X and Y locations. X locations, the columns, are normally represented by letters, "A," "B," "C," etc., while rows are normally represented by numbers, 1, 2, 3, etc. A single cell can be referred to by addressing its row and column, e.g., "C10". This electronic concept of cell references was first introduced in LANPAR (LANguage for Programming Arrays at Random), co-invented by Rene Pardo and Remy Landau; a variant of it, used in VisiCalc, became known as "A1 notation". Additionally, spreadsheets have the concept of a range, a group of cells, normally contiguous. For instance, one can refer to the first ten cells in the first column with the range "A1:A10". LANPAR innovated forward referencing/natural order calculation, which did not re-appear until Lotus 1-2-3 and Microsoft's MultiPlan Version 2.

In modern spreadsheet applications, several spreadsheets, often known as worksheets or simply sheets, are gathered together to form a workbook. A workbook is physically represented by a file containing all the data for the book, the sheets, and the cells within the sheets. Worksheets are normally represented by tabs that flip between pages, each one containing one of the sheets, although Numbers changes this model significantly. Cells in a multi-sheet book add the sheet name to their reference, for instance, "Sheet 1!C10". Some systems extend this syntax to allow cell references to different workbooks.

Users interact with sheets primarily through the cells. A given cell can hold data, entered by simply typing it in, or a formula, which is normally created by preceding the text with an equals sign. Data might include the string of text hello world, the number 5 or the date 16-Dec-91. A formula would begin with the equals sign, =5*3, but this would normally be invisible because the display shows the result of the calculation, 15 in this case, not the formula itself. This may lead to confusion in some cases.

The key feature of spreadsheets is the ability for a formula to refer to the contents of other cells, which may, in turn, be the result of a formula. To make such a formula, one replaces a number with a cell reference. For instance, the formula =5*C10 would produce the result of multiplying the value in cell C10 by the number 5. If C10 holds the value 3 the result will be 15. But C10 might also hold its own formula referring to other cells, and so on.

The ability to chain formulas together is what gives a spreadsheet its power. Many problems can be broken down into a series of individual mathematical steps, and these can be assigned to individual formulas in cells. Some of these formulas can apply to ranges as well, like the SUM function that adds up all the numbers within a range.
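
To make the chaining concrete, here is a minimal Python sketch—not part of the original article—of a toy sheet in which a cell holds either a number or a small function that pulls values from other cells; the cell names and formulas are invented for the example.

# Toy "sheet": a cell holds either a plain value or a zero-argument
# formula (a function) that reads other cells through get().
sheet = {}

def get(ref):
    """Return a cell's value, evaluating its formula if necessary."""
    cell = sheet[ref]
    return cell() if callable(cell) else cell

sheet["C10"] = 3                                           # plain value
sheet["B1"] = lambda: 5 * get("C10")                       # like =5*C10
sheet["B2"] = lambda: get("B1") + 7                        # formula referring to another formula
sheet["B4"] = lambda: sum(get(r) for r in ("B1", "B2"))    # like =SUM(B1:B2)

print(get("B4"))   # (5*3) + (5*3 + 7) = 37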

Spreadsheets share many principles and traits of databases, but spreadsheets and databases are not the same things. A spreadsheet is essentially just one table, whereas a database is a collection of many tables with machine-readable semantic relationships. While it is true that a workbook that contains three sheets is indeed a file containing multiple tables that can interact with each other, it lacks the relational structure of a database. Spreadsheets and databases are interoperable—sheets can be imported into databases to become tables within them, and database queries can be exported into spreadsheets for further analysis.

A spreadsheet program is one of the main components of an office productivity suite, which usually also contains a word processor, a presentation program, and a database management system. Programs within a suite use similar commands for similar functions. Usually, sharing data between the components is easier than with a non-integrated collection of functionally equivalent programs. This was particularly an advantage at a time when many personal computer systems used text-mode displays and commands instead of a graphical user interface.

History

Paper spreadsheets

The word "spreadsheet" came from "spread" in its sense of a newspaper or magazine item (text or graphics) that covers two facing pages, extending across the centerfold and treating the two pages as one large page. The compound word 'spread-sheet' came to mean the format used to present book-keeping ledgers—with columns for categories of expenditures across the top, invoices listed down the left margin, and the amount of each payment in the cell where its row and column intersect—which were, traditionally, a "spread" across facing pages of a bound ledger (book for keeping accounting records) or on oversized sheets of paper (termed 'analysis paper') ruled into rows and columns in that format and approximately twice as wide as ordinary paper.

Early implementations

Batch spreadsheet report generator BSRG

A batch "spreadsheet" is indistinguishable from a batch compiler with added input data, producing an output report, i.e., a 4GL or conventional, non-interactive, batch computer program. However, this concept of an electronic spreadsheet was outlined in the 1961 paper "Budgeting Models and System Simulation" by Richard Mattessich. The subsequent work by Mattessich (1964a, Chpt. 9, Accounting and Analytical Methods) and its companion volume, Mattessich (1964b, Simulation of the Firm through a Budget Computer Program) applied computerized spreadsheets to accounting and budgeting systems (on mainframe computers programmed in FORTRAN IV). These batch spreadsheets dealt primarily with the addition or subtraction of entire columns or rows (of input variables), rather than individual cells.

In 1962, this concept of the spreadsheet, called BCL for Business Computer Language, was implemented on an IBM 1130 and in 1963 was ported to an IBM 7040 by R. Brian Walsh at Marquette University, Wisconsin. This program was written in Fortran. Primitive timesharing was available on those machines. In 1968 BCL was ported by Walsh to the IBM 360/67 timesharing machine at Washington State University. It was used to assist in the teaching of finance to business students. Students were able to take information prepared by the professor, manipulate it, and present results and ratios, etc. In 1964, a book entitled Business Computer Language was written by Kimball, Stoffells and Walsh, and both the book and program were copyrighted in 1966; years later that copyright was renewed.

Applied Data Resources had a FORTRAN preprocessor called Empires.

In the late 1960s, Xerox used BCL to develop a more sophisticated version for their timesharing system.

LANPAR spreadsheet compiler

A key invention in the development of electronic spreadsheets was made by Rene K. Pardo and Remy Landau, who filed in 1970 U.S. Patent 4,398,249 on a spreadsheet automatic natural order calculation algorithm. While the patent was initially rejected by the patent office as being a purely mathematical invention, following 12 years of appeals, Pardo and Landau won a landmark court case at the Court of Customs and Patent Appeals (CCPA), the predecessor of the Federal Circuit, overturning the Patent Office in 1983 — establishing that "something does not cease to become patentable merely because the point of novelty is in an algorithm." However, in 1995 the United States Court of Appeals for the Federal Circuit ruled the patent unenforceable.

The actual software was called LANPAR — LANguage for Programming Arrays at Random. This was conceived and entirely developed in the summer of 1969, following Pardo and Landau's recent graduation from Harvard University. Co-inventor Rene Pardo recalls that he felt that one manager at Bell Canada should not have to depend on programmers to program and modify budgeting forms, and he thought of letting users type out forms in any order and having an electronic computer calculate results in the right order ("Forward Referencing/Natural Order Calculation"). Pardo and Landau developed and implemented the software in 1969.

LANPAR was used by Bell Canada, AT&T, and the 18 operating telephone companies nationwide for their local and national budgeting operations. LANPAR was also used by General Motors. Its uniqueness was Pardo's co-invention incorporating forward referencing/natural order calculation (one of the first "non-procedural" computer languages) as opposed to left-to-right, top to bottom sequence for calculating the results in each cell that was used by VisiCalc, SuperCalc, and the first version of MultiPlan. Without forward referencing/natural order calculation, the user had to refresh the spreadsheet until the values in all cells remained unchanged. Once the cell values stayed constant, the user was assured that there were no remaining forward references within the spreadsheet.

Autoplan/Autotab spreadsheet programming language

In 1968, three former employees from the General Electric computer company headquartered in Phoenix, Arizona set out to start their own software development house. A. Leroy Ellison, Harry N. Cantrell, and Russell E. Edwards found themselves doing a large number of calculations when making tables for the business plans that they were presenting to venture capitalists. They decided to save themselves a lot of effort and wrote a computer program that produced their tables for them. This program, originally conceived as a simple utility for their personal use, would turn out to be the first software product offered by the company that would become known as Capex Corporation. "AutoPlan" ran on GE's Time-sharing service; afterward, a version that ran on IBM mainframes was introduced under the name AutoTab. (National CSS offered a similar product, CSSTAB, which had a moderate timesharing user base by the early 1970s. A major application was opinion research tabulation.)

AutoPlan/AutoTab was not a WYSIWYG interactive spreadsheet program; it was a simple scripting language for spreadsheets. The user defined the names and labels for the rows and columns, then the formulas that defined each row or column. In 1975, Autotab-II was advertised as extending the original to a maximum of "1,500 rows and columns, combined in any proportion the user requires..."

GE Information Services, which operated the time-sharing service, also launched its own spreadsheet system, Financial Analysis Language (FAL), circa 1974. It was later supplemented by an additional spreadsheet language, TABOL, which was developed by an independent author, Oliver Vellacott in the UK. Both FAL and TABOL were integrated with GEIS's database system, DMS.

IBM Financial Planning and Control System

The IBM Financial Planning and Control System was developed in 1976, by Brian Ingham at IBM Canada. It was implemented by IBM in at least 30 countries. It ran on an IBM mainframe and was among the first applications for financial planning developed with APL that completely hid the programming language from the end-user. Through IBM's VM operating system, it was among the first programs to auto-update each copy of the application as new versions were released. Users could specify simple mathematical relationships between rows and between columns. Compared to any contemporary alternatives, it could support very large spreadsheets. It loaded actual financial planning data drawn from the legacy batch system into each user's spreadsheet monthly. It was designed to optimize the power of APL through object kernels, increasing program efficiency by as much as 50 fold over traditional programming approaches.

APLDOT modeling language

An example of an early "industrial weight" spreadsheet was APLDOT, developed in 1976 at the United States Railway Association on an IBM 360/91, running at The Johns Hopkins University Applied Physics Laboratory in Laurel, MD. The application was used successfully for many years in developing such applications as financial and costing models for the US Congress and for Conrail. APLDOT was dubbed a "spreadsheet" because financial analysts and strategic planners used it to solve the same problems they addressed with paper spreadsheet pads.

VisiCalc

VisiCalc running on an Apple II

Because Dan Bricklin and Bob Frankston implemented VisiCalc on the Apple II in 1979 and the IBM PC in 1981, the spreadsheet concept became widely known in the early 1980s. VisiCalc was the first spreadsheet that combined all essential features of modern spreadsheet applications (except for forward referencing/natural order recalculation), such as a WYSIWYG interactive user interface, automatic recalculation, status and formula lines, range copying with relative and absolute references, and formula building by selecting referenced cells. Unaware of LANPAR at the time, PC World magazine called VisiCalc the first electronic spreadsheet.

Bricklin has spoken of watching his university professor create a table of calculation results on a blackboard. When the professor found an error, he had to tediously erase and rewrite several sequential entries in the table, triggering Bricklin to think that he could replicate the process on a computer, using the blackboard as the model to view results of underlying formulas. His idea became VisiCalc, the first application that turned the personal computer from a hobby for computer enthusiasts into a business tool.

VisiCalc went on to become the first "killer application", an application that was so compelling, people would buy a particular computer just to use it. VisiCalc was in no small part responsible for the Apple II's success. The program was later ported to a number of other early computers, notably CP/M machines, the Atari 8-bit family and various Commodore platforms. Nevertheless, VisiCalc remains best known as an Apple II program.

SuperCalc

SuperCalc was a spreadsheet application published by Sorcim in 1980, and originally bundled (along with WordStar) as part of the CP/M software package included with the Osborne 1 portable computer. It quickly became the de facto standard spreadsheet for CP/M and was ported to MS-DOS in 1982.

Lotus 1-2-3 and other MS-DOS spreadsheets

The acceptance of the IBM PC following its introduction in August 1981 began slowly, because most of the programs available for it were translations from other computer models. Things changed dramatically with the introduction of Lotus 1-2-3 in November 1982, and its release for sale in January 1983. Since it was written especially for the IBM PC, it performed well and became the killer app for this PC. Lotus 1-2-3 drove sales of the PC due to the improvements in speed and graphics compared to VisiCalc on the Apple II.

Lotus 1-2-3, along with its competitor Borland Quattro, soon displaced VisiCalc. Lotus 1-2-3 was released on January 26, 1983, started outselling then-most-popular VisiCalc the very same year, and for several years was the leading spreadsheet for DOS.

Microsoft Excel

Microsoft released the first version of Excel for the Macintosh on September 30, 1985, and then ported it to Windows, with the first version being numbered 2.05 (to synchronize with the Macintosh version 2.2) and released in November 1987. The Windows 3.x platforms of the early 1990s made it possible for Excel to take market share from Lotus. By the time Lotus responded with usable Windows products, Microsoft had begun to assemble their Office suite. By 1995, Excel was the market leader, edging out Lotus 1-2-3, and in 2013, IBM discontinued Lotus 1-2-3 altogether.

Web-based spreadsheets

Notable current web-based spreadsheet software:

Mainframe spreadsheets

  • The Works Records System at ICI developed in 1974 on IBM 370/145
  • ExecuCalc, from Parallax Systems, Inc.: Released in late 1982, ExecuCalc was the first mainframe "visi-clone" which duplicated the features of VisiCalc on IBM mainframes with 3270 display terminals. Over 150 copies were licensed (35 to Fortune 500 companies). DP managers were attracted to compatibility and avoiding then-expensive PC purchases (see 1983 Computerworld magazine front page article and advertisement.)

Other spreadsheets

Notable current spreadsheet software:

Discontinued spreadsheet software:

Other products

Several companies have attempted to break into the spreadsheet market with programs based on very different paradigms. Lotus introduced what is likely the most successful example, Lotus Improv, which saw some commercial success, notably in the financial world where its powerful data mining capabilities remain well respected to this day.

Spreadsheet 2000 attempted to dramatically simplify formula construction, but was generally not successful.

Concepts

The main concepts are those of a grid of cells, called a sheet, with either raw data, called values, or formulas in the cells. Formulas say how to mechanically compute new values from existing values. Values are general numbers, but can also be pure text, dates, months, etc. Extensions of these concepts include logical spreadsheets. Various tools for programming sheets, visualizing data, remotely connecting sheets, displaying cells' dependencies, etc. are commonly provided.

Cells

A "cell" can be thought of as a box for holding data. A single cell is usually referenced by its column and row (C2 would represent the cell containing the value 30 in the example table below). Usually rows, representing the dependent variables, are referenced in decimal notation starting from 1, while columns representing the independent variables use 26-adic bijective numeration using the letters A-Z as numerals. Its physical size can usually be tailored to its content by dragging its height or width at box intersections (or for entire columns or rows by dragging the column- or row-headers).

My Spreadsheet

        A          B       C      D
01      Sales      100000  30000  70000
02      Purchases  25490   30     200
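
Purely as an illustration (not part of the article), the bijective base-26 column numbering described above can be sketched in Python:

# Bijective base-26 column labels: A=1, ..., Z=26, AA=27, AB=28, ...
def col_to_num(letters):
    """Convert a column label such as "A" or "AA" to its 1-based number."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a 1-based column number back to its letter label."""
    letters = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

print(col_to_num("C"), col_to_num("AA"), num_to_col(28))   # 3 27 AB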

An array of cells is called a sheet or worksheet. It is analogous to an array of variables in a conventional computer program (although certain unchanging values, once entered, could be considered, by the same analogy, constants). In most implementations, many worksheets may be located within a single spreadsheet. A worksheet is simply a subset of the spreadsheet divided for the sake of clarity. Functionally, the spreadsheet operates as a whole and all cells operate as global variables within the spreadsheet (each cell's value being readable by every other cell, but writable only by its own formula or by the user).

A cell may contain a value or a formula, or it may simply be left empty. By convention, formulas usually begin with an = sign.

Values

A value can be entered from the computer keyboard by directly typing into the cell itself. Alternatively, a value can be based on a formula (see below), which might perform a calculation, display the current date or time, or retrieve external data such as a stock quote or a database value.

The Spreadsheet Value Rule

Computer scientist Alan Kay used the term value rule to summarize a spreadsheet's operation: a cell's value relies solely on the formula the user has typed into the cell. The formula may rely on the value of other cells, but those cells are likewise restricted to user-entered data or formulas. There are no 'side effects' to calculating a formula: the only output is to display the calculated result inside its occupying cell. There is no natural mechanism for permanently modifying the contents of a cell unless the user manually modifies the cell's contents. In the context of programming languages, this yields a limited form of first-order functional programming.

Automatic recalculation

A standard of spreadsheets since the 1980s, this optional feature eliminates the need to manually request the spreadsheet program to recalculate values (nowadays typically the default option unless specifically 'switched off' for large spreadsheets, usually to improve performance). Some earlier spreadsheets required a manual request to recalculate since the recalculation of large or complex spreadsheets often reduced data entry speed. Many modern spreadsheets still retain this option.

Recalculation generally requires that there are no circular dependencies in a spreadsheet. A dependency graph is a graph that has a vertex for each object to be updated, and an edge connecting two objects whenever one of them needs to be updated earlier than the other. Dependency graphs without circular dependencies form directed acyclic graphs, representations of partial orderings (in this case, across a spreadsheet) that can be relied upon to give a definite result.
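
To make this concrete, here is a minimal Python sketch (not from the article) that recalculates a toy set of cells in dependency order and refuses to proceed when a circular dependency is present; the cells and formulas are invented for the example.

# Each cell lists the cells it depends on and a function that computes its
# value from already-computed values; a cycle raises an error.
formulas = {
    "A1": ([], lambda v: 10),
    "A2": ([], lambda v: 20),
    "B1": (["A1", "A2"], lambda v: v["A1"] + v["A2"]),   # like =A1+A2
    "B2": (["B1"], lambda v: 2 * v["B1"]),               # like =2*B1
}

def recalculate(formulas):
    values, done, in_progress = {}, set(), set()
    def visit(cell):
        if cell in done:
            return
        if cell in in_progress:
            raise ValueError("circular dependency involving " + cell)
        in_progress.add(cell)
        dependencies, compute = formulas[cell]
        for dependency in dependencies:
            visit(dependency)            # compute dependencies first
        values[cell] = compute(values)
        in_progress.discard(cell)
        done.add(cell)
    for cell in formulas:
        visit(cell)
    return values

print(recalculate(formulas))   # {'A1': 10, 'A2': 20, 'B1': 30, 'B2': 60}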

Real-time update

This feature refers to updating a cell's contents periodically with a value from an external source—such as a cell in a "remote" spreadsheet. For shared, Web-based spreadsheets, it applies to "immediately" updating cells another user has updated. All dependent cells must be updated also.

Locked cell

Once entered, selected cells (or the entire spreadsheet) can optionally be "locked" to prevent accidental overwriting. Typically this would apply to cells containing formulas but might apply to cells containing "constants" such as a kilogram/pounds conversion factor (2.20462262 to eight decimal places). Even though individual cells are marked as locked, the spreadsheet data are not protected until the feature is activated in the file preferences.

Data format

A cell or range can optionally be defined to specify how the value is displayed. The default display format is usually set by its initial content if not specifically previously set, so that for example "31/12/2007" or "31 Dec 2007" would default to the cell format of date. Similarly adding a % sign after a numeric value would tag the cell as a percentage cell format. The cell contents are not changed by this format, only the displayed value.

Some cell formats such as "numeric" or "currency" can also specify the number of decimal places.

This can allow invalid operations (such as doing multiplication on a cell containing a date), resulting in illogical results without an appropriate warning.

Cell formatting

Depending on the capability of the spreadsheet application, each cell (like its counterpart the "style" in a word processor) can be separately formatted using the attributes of either the content (point size, color, bold or italic) or the cell (border thickness, background shading, color). To aid the readability of a spreadsheet, cell formatting may be conditionally applied to data; for example, a negative number may be displayed in red.

A cell's formatting does not typically affect its content and depending on how cells are referenced or copied to other worksheets or applications, the formatting may not be carried with the content.

Named cells

Use of named column variables x & y in Microsoft Excel. Formula for y = x² resembles Fortran, and Name Manager shows the definitions of x & y.

In most implementations, a cell, or group of cells in a column or row, can be "named" enabling the user to refer to those cells by a name rather than by a grid reference. Names must be unique within the spreadsheet, but when using multiple sheets in a spreadsheet file, an identically named cell range on each sheet can be used if it is distinguished by adding the sheet name. One reason for this usage is for creating or running macros that repeat a command across many sheets. Another reason is that formulas with named variables are readily checked against the algebra they are intended to implement (they resemble Fortran expressions). The use of named variables and named functions also makes the spreadsheet structure more transparent.

Cell reference

In place of a named cell, an alternative approach is to use a cell (or grid) reference. Most cell references indicate another cell in the same spreadsheet, but a cell reference can also refer to a cell in a different sheet within the same spreadsheet, or (depending on the implementation) to a cell in another spreadsheet entirely, or a value from a remote application.

A typical cell reference in "A1" style consists of one or two case-insensitive letters to identify the column (if there are up to 256 columns: A–Z and AA–IV) followed by a row number (e.g., in the range 1–65536). Either part can be relative (it changes when the formula it is in is moved or copied), or absolute (indicated with $ in front of the part concerned of the cell reference). The alternative "R1C1" reference style consists of the letter R, the row number, the letter C, and the column number; relative row or column numbers are indicated by enclosing the number in square brackets. Most current spreadsheets use the A1 style, some providing the R1C1 style as a compatibility option.
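
As a rough illustration (not part of the article), splitting an A1-style reference into its parts, including the optional $ absolute markers, might look like the following Python sketch; the helper name and its exact behavior are assumptions made for the example.

import re

# Split "$B$3", "b3", etc. into (column letters, row number,
# column-absolute?, row-absolute?).
A1_PATTERN = re.compile(r"^(\$?)([A-Za-z]{1,3})(\$?)(\d+)$")

def parse_a1(reference):
    match = A1_PATTERN.match(reference)
    if not match:
        raise ValueError("not an A1-style reference: " + reference)
    col_dollar, column, row_dollar, row = match.groups()
    return column.upper(), int(row), bool(col_dollar), bool(row_dollar)

print(parse_a1("$A1"))   # ('A', 1, True, False): column absolute, row relative
print(parse_a1("b$3"))   # ('B', 3, False, True): column relative, row absolute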

When the computer calculates a formula in one cell to update that cell's displayed value, any cell references in the formula cause the computer to fetch the values of the referenced cells.

A cell on the same "sheet" is usually addressed as:

=A1

A cell on a different sheet of the same spreadsheet is usually addressed as:

=SHEET2!A1             (that is, the first cell in sheet 2 of the same spreadsheet).

Some spreadsheet implementations, such as Excel, allow cell references to another spreadsheet (not the currently open and active file) on the same computer or a local network. A reference may also point to a cell in another open and active spreadsheet on the same computer or network that is defined as shareable. These references contain the complete filename, such as:

='C:\Documents and Settings\Username\My spreadsheets\[main sheet]Sheet1'!A1

In a spreadsheet, references to cells automatically update when new rows or columns are inserted or deleted. Care must be taken, however, when adding a row immediately before a set of column totals to ensure that the totals reflect the values of the additional rows—which they often do not.

A circular reference occurs when the formula in one cell refers—directly, or indirectly through a chain of cell references—to another cell that refers back to the first cell. Many common errors cause circular references. However, some valid techniques use circular references. These techniques, after many spreadsheet recalculations, (usually) converge on the correct values for those cells.
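
Illustrative only (not from the article): spreadsheets that permit circular references typically resolve them by iterating until the values stop changing or an iteration limit is reached. The Python sketch below uses an invented example, a bonus defined as 10% of profit after the bonus itself is deducted.

# Iterative recalculation of a circular formula:
# bonus = 0.10 * (gross - bonus), starting from an initial guess of 0.
def iterate_bonus(gross=1000.0, max_iterations=100, tolerance=1e-9):
    bonus = 0.0
    for _ in range(max_iterations):
        new_bonus = 0.10 * (gross - bonus)
        if abs(new_bonus - bonus) < tolerance:
            break
        bonus = new_bonus
    return bonus

print(round(iterate_bonus(), 4))   # about 90.9091, the fixed point of the formula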

Cell ranges

Likewise, instead of using a named range of cells, a range reference can be used. A reference to a range of cells is typically of the form (A1:A6), which specifies all the cells in the range A1 through to A6. A formula such as "=SUM(A1:A6)" would add all the cells specified and put the result in the cell containing the formula itself.

Sheets

In the earliest spreadsheets, cells were a simple two-dimensional grid. Over time, the model has expanded to include a third dimension, and in some cases a series of named grids, called sheets. The most advanced examples allow inversion and rotation operations which can slice and project the data set in various ways.

Formulas

Animation of a simple spreadsheet that multiplies values in the left column by 2, then sums the calculated values from the right column to the bottom-most cell. In this example, only the values in the A column are entered (10, 20, 30), and the remainder of cells are formulas. Formulas in the B column multiply values from the A column using relative references, and the formula in B4 uses the SUM() function to find the sum of values in the B1:B3 range.

A formula identifies the calculation needed to place the result in the cell it is contained within. A cell containing a formula, therefore, has two display components: the formula itself and the resulting value. The formula is normally only shown when the cell is selected by "clicking" the mouse over a particular cell; otherwise, the cell displays the result of the calculation.

A formula assigns values to a cell or range of cells, and typically has the format:

=expression

where the expression consists of values (such as 2 or 9.14), references to other cells (such as A1 for a single cell or B1:B3 for a range), arithmetic and relational operators (such as +, *, and >=), and functions (such as SUM() or TAN()).

When a cell contains a formula, it often contains references to other cells. Such a cell reference is a type of variable. Its value is the value of the referenced cell or some derivation of it. If that cell in turn references other cells, the value depends on the values of those. References can be relative (e.g., A1, or B1:B3), absolute (e.g., $A$1, or $B$1:$B$3) or mixed row– or column-wise absolute/relative (e.g., $A1 is column-wise absolute and A$1 is row-wise absolute).

The available options for valid formulas depend on the particular spreadsheet implementation but, in general, most arithmetic operations and quite complex nested conditional operations can be performed by most of today's commercial spreadsheets. Modern implementations also offer functions to access custom-built functions, remote data, and applications.

A formula may contain a condition (or nested conditions)—with or without an actual calculation—and is sometimes used purely to identify and highlight errors. In the example below, it is assumed the sum of a column of percentages (A1 through A6) is tested for validity and an explicit message put into the adjacent right-hand cell.

=IF(SUM(A1:A6) > 100, "More than 100%", SUM(A1:A6))

Further examples:

=IF(AND(A1<>"",B1<>""),A1/B1,"") means that if cells A1 and B1 are both non-empty, then divide A1 by B1 and display the result; otherwise display nothing.
=IF(AND(A1<>"",B1<>""),IF(B1<>0,A1/B1,"Division by zero"),"") means that if cells A1 and B1 are not empty and B1 is not zero, then divide A1 by B1; if B1 is zero, display "Division by zero"; and display nothing if either A1 or B1 is empty.
=IF(OR(A1<>"",B1<>""),"Either A1 or B1 show text","") means to display the message if either A1 or B1 is not empty.

Conditional statements are best built up step by step, composing the expression and then testing and refining it by trial and error.

A spreadsheet does not have to contain any formulas at all, in which case it could be considered merely a collection of data arranged in rows and columns (a database) like a calendar, timetable, or simple list. Because of its ease of use, formatting, and hyperlinking capabilities, many spreadsheets are used solely for this purpose.

Functions

Use of user-defined function sq(x) in Microsoft Excel.

Spreadsheets usually contain several supplied functions, such as arithmetic operations (for example, summations, averages, and so forth), trigonometric functions, statistical functions, and so forth. In addition there is often a provision for user-defined functions. In Microsoft Excel, these functions are defined using Visual Basic for Applications in the supplied Visual Basic editor, and such functions are automatically accessible on the worksheet. Also, programs can be written that pull information from the worksheet, perform some calculations, and report the results back to the worksheet. In the figure, the name sq is user-assigned, and the function sq is introduced using the Visual Basic editor supplied with Excel. Name Manager displays the spreadsheet definitions of named variables x & y.

Subroutines

Subroutine in Microsoft Excel writes values calculated using x into y.

Functions themselves cannot write into the worksheet but simply return their evaluation. However, in Microsoft Excel, subroutines can write values or text found within the subroutine directly to the spreadsheet. The figure shows the Visual Basic code for a subroutine that reads each member of the named column variable x, calculates its square, and writes this value into the corresponding element of named column variable y. The y column contains no formula because its values are calculated in the subroutine, not on the spreadsheet, and simply are written in.

Remote spreadsheet

Whenever a reference is made to a cell or group of cells that are not located within the current physical spreadsheet file, it is considered as accessing a "remote" spreadsheet. The contents of the referenced cell may be accessed either on the first reference with a manual update or more recently in the case of web-based spreadsheets, as a near real-time value with a specified automatic refresh interval.

Charts

Graph made using Microsoft Excel

Many spreadsheet applications permit charts and graphs (e.g., histograms, pie charts) to be generated from specified groups of cells that are dynamically re-built as cell contents change. The generated graphic component can either be embedded within the current sheet or added as a separate object. To create an Excel histogram, a formula based on the REPT function can be used.

Multi-dimensional spreadsheets

In the late 1980s and early 1990s, first Javelin Software and then Lotus Improv appeared. Unlike models in a conventional spreadsheet, they utilized models built on objects called variables, not on data in cells of a report. These multi-dimensional spreadsheets enabled viewing data and algorithms in various self-documenting ways, including simultaneous multiple synchronized views. For example, users of Javelin could move through the connections between variables on a diagram while seeing the logical roots and branches of each variable. This is an example of what is perhaps the primary contribution of the earlier Javelin—the concept of traceability of a user's logic or model structure through its twelve views. A complex model can be dissected and understood by others who had no role in its creation.

In these programs, a time series, or any variable, was an object in itself, not a collection of cells that happen to appear in a row or column. Variables could have many attributes, including complete awareness of their connections to all other variables, data references, and text and image notes. Calculations were performed on these objects, as opposed to a range of cells, so adding two time series automatically aligns them in calendar time, or in a user-defined time frame. Data were independent of worksheets—variables, and therefore data, could not be destroyed by deleting a row, column, or entire worksheet. For instance, January's costs are subtracted from January's revenues, regardless of where or whether either appears in a worksheet. This permits actions later used in pivot tables, except that flexible manipulation of report tables was but one of many capabilities supported by variables. Moreover, if costs were entered by week and revenues by month, the program could allocate or interpolate as appropriate. This object design enabled variables and whole models to reference each other with user-defined variable names and to perform multidimensional analysis and massive, but easily editable, consolidations.

Trapeze, a spreadsheet on the Mac, went further and explicitly supported not just table columns, but also matrix operators.

Logical spreadsheets

Spreadsheets that have a formula language based upon logical expressions, rather than arithmetic expressions are known as logical spreadsheets. Such spreadsheets can be used to reason deductively about their cell values.

Programming issues

Just as the early programming languages were designed to generate spreadsheet printouts, programming techniques themselves have evolved to process tables (also known as spreadsheets or matrices) of data more efficiently in the computer itself.

End-user development

Spreadsheets are a popular end-user development tool. EUD denotes activities or techniques in which people who are not professional developers create automated behavior and complex data objects without significant knowledge of a programming language. Many people find it easier to perform calculations in spreadsheets than by writing the equivalent sequential program. This is due to several traits of spreadsheets.

  • They use spatial relationships to define program relationships. Humans have highly developed intuitions about spaces, and of dependencies between items. Sequential programming usually requires typing line after line of text, which must be read slowly and carefully to be understood and changed.
  • They are forgiving, allowing partial results and functions to work. One or more parts of a program can work correctly, even if other parts are unfinished or broken. This makes writing and debugging programs easier, and faster. Sequential programming usually needs every program line and character to be correct for a program to run. One error usually stops the whole program and prevents any result. Though this user-friendliness is a benefit of spreadsheet development, it often comes with an increased risk of errors.
  • Modern spreadsheets allow for secondary notation. The program can be annotated with colors, typefaces, lines, etc. to provide visual cues about the meaning of elements in the program.
  • Extensions that allow users to create new functions can provide the capabilities of a functional language.
  • Extensions that allow users to build and apply models from the domain of machine learning.
  • Spreadsheets are versatile. With their boolean logic and graphics capabilities, even electronic circuit design is possible.
  • Spreadsheets can store relational data and spreadsheet formulas can express all queries of SQL. There exists a query translator, which automatically generates the spreadsheet implementation from the SQL code.

Spreadsheet programs

A "spreadsheet program" is designed to perform general computation tasks using spatial relationships rather than time as the primary organizing principle.

It is often convenient to think of a spreadsheet as a mathematical graph, where the nodes are spreadsheet cells, and the edges are references to other cells specified in formulas. This is often called the dependency graph of the spreadsheet. References between cells can take advantage of spatial concepts such as relative position and absolute position, as well as named locations, to make the spreadsheet formulas easier to understand and manage.

Spreadsheets usually attempt to automatically update cells when the cells they depend on change. The earliest spreadsheets used simple tactics like evaluating cells in a particular order, but modern spreadsheets calculate following a minimal recomputation order from the dependency graph. Later spreadsheets also include a limited ability to propagate values in reverse, altering source values so that a particular answer is reached in a certain cell. Since spreadsheet cell formulas are not generally invertible, though, this technique is of somewhat limited value.
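
As a hedged illustration (not part of the article), reverse propagation is often exposed as a "goal seek" tool; for a formula that is monotonic over a known interval it can be sketched as a simple bisection search, as in the following Python example with an invented formula and bounds.

# Find x so that formula(x) is (approximately) equal to target, assuming
# formula is increasing on the interval [lo, hi].
def goal_seek(formula, target, lo, hi, tolerance=1e-9, max_iterations=200):
    for _ in range(max_iterations):
        mid = (lo + hi) / 2
        value = formula(mid)
        if abs(value - target) < tolerance:
            return mid
        if value < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Which input makes "=5*x + 2" equal 17?  (answer: 3)
print(goal_seek(lambda x: 5 * x + 2, 17, lo=0, hi=100))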

Many of the concepts common to sequential programming models have analogs in the spreadsheet world. For example, the sequential model of the indexed loop is usually represented as a table of cells, with similar formulas (normally differing only in which cells they reference).

Spreadsheets have evolved to use scripting programming languages like VBA as a tool for extensibility beyond what the spreadsheet language makes easy.

Shortcomings

While spreadsheets represented a major step forward in quantitative modeling, they have deficiencies. Their shortcomings include the perceived unfriendliness of alpha-numeric cell addresses.

  • Research by ClusterSeven has shown huge discrepancies in the way financial institutions and corporate entities understand, manage and police their often vast estates of spreadsheets and unstructured financial data (including comma-separated values (CSV) files and Microsoft Access databases). One study in early 2011 of nearly 1,500 people in the UK found that 57% of spreadsheet users have never received formal training on the spreadsheet package they use. 72% said that no internal department checks their spreadsheets for accuracy. Only 13% said that Internal Audit reviews their spreadsheets, while a mere 1% receive checks from their risk department.
  • Spreadsheets can have reliability problems. Research studies estimate that around 1% of all formulas in operational spreadsheets are in error.
Despite the high error risks often associated with spreadsheet authorship and use, specific steps can be taken to significantly enhance control and reliability by structurally reducing the likelihood of error occurrence at their source.
  • The practical expressiveness of spreadsheets can be limited unless their modern features are used. Several factors contribute to this limitation. Implementing a complex model on a cell-at-a-time basis requires tedious attention to detail. Authors have difficulty remembering the meanings of hundreds or thousands of cell addresses that appear in formulas.
These drawbacks are mitigated by the use of named variables for cell designations, and employing variables in formulas rather than cell locations and cell-by-cell manipulations. Graphs can be used to show instantly how results are changed by changes in parameter values. The spreadsheet can be made invisible except for a transparent user interface that requests pertinent input from the user, displays results requested by the user, creates reports, and has built-in error traps to prompt correct input.
  • Similarly, formulas expressed in terms of cell addresses are hard to keep straight and hard to audit. Research shows that spreadsheet auditors who check numerical results and cell formulas find no more errors than auditors who only check numerical results. That is another reason to use named variables and formulas employing named variables.
Specifically, spreadsheets typically contain many copies of the same formula. When the formula is modified, the user has to change every cell containing that formula. In contrast, most computer languages allow a formula to appear only once in the code and achieve repetition using loops: making them much easier to implement and audit.
  • The alteration of a dimension demands major surgery. When rows (or columns) are added to or deleted from a table, one has to adjust the size of many downstream tables that depend on the table being changed. In the process, it is often necessary to move other cells around to make room for the new columns or rows and to adjust graph data sources. In large spreadsheets, this can be extremely time-consuming.
  • Adding or removing a dimension is so difficult, one generally has to start over. The spreadsheet as a paradigm forces one to decide on dimensionality right at the beginning of one's spreadsheet creation, even though it is often most natural to make these choices after one's spreadsheet model has matured. The desire to add and remove dimensions also arises in parametric and sensitivity analyses.
  • Collaboration in authoring spreadsheet formulas can be difficult when such collaboration occurs at the level of cells and cell addresses.

Other problems associated with spreadsheets include:

  • Some sources advocate the use of specialized software instead of spreadsheets for some applications (budgeting, statistics)
  • Many spreadsheet software products, such as Microsoft Excel (versions prior to 2007) and OpenOffice.org Calc (versions prior to 2008), have a capacity limit of 65,536 rows by 256 columns (2^16 and 2^8 respectively). This can present a problem for people using very large datasets, and may result in data loss. Despite the age of these limits, a recent example is the loss of COVID-19 positive cases in the British statistics for September and October 2020.
  • Lack of auditing and revision control. This makes it difficult to determine who changed what and when. This can cause problems with regulatory compliance. Lack of revision control greatly increases the risk of errors due to the inability to track, isolate and test changes made to a document.
  • Lack of security. Spreadsheets lack controls on who can see and modify particular data. This, combined with the lack of auditing above, can make it easy for someone to commit fraud.
  • Because they are loosely structured, it is easy for someone to introduce an error, either accidentally or intentionally, by entering information in the wrong place or expressing dependencies among cells (such as in a formula) incorrectly.
  • The result of a formula (for example, "=A1*B1") applies only to a single cell (that is, the cell the formula is located in—in this case perhaps C1), even though it can "extract" data from many other cells, and even real-time dates and actual times. This means that to cause a similar calculation on an array of cells, an almost identical formula (but residing in its own "output" cell) must be repeated for each row of the "input" array. This differs from a "formula" in a conventional computer program, which typically makes one calculation that it applies to all the input in turn. With current spreadsheets, this forced repetition of near-identical formulas can have detrimental consequences from a quality assurance standpoint and is often the cause of many spreadsheet errors. Some spreadsheets have array formulas to address this issue (see the sketch after this list).
  • Trying to manage the sheer volume of spreadsheets that may exist in an organization without proper security, audit trails, the unintentional introduction of errors, and other items listed above can become overwhelming.
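
For contrast (not part of the article), the repetition issue can be seen by comparing the per-row spreadsheet formulas with the single statement a conventional language would use; the column values below are invented.

# In a spreadsheet, C1 holds =A1*B1, C2 holds =A2*B2, and so on: one
# near-identical formula per row. A conventional program states the
# calculation once and applies it to every row.
column_a = [10, 20, 30]
column_b = [3, 4, 5]

column_c = [a * b for a, b in zip(column_a, column_b)]   # one "formula" for all rows

print(column_c)   # [30, 80, 150]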

While there are built-in and third-party tools for desktop spreadsheet applications that address some of these shortcomings, awareness and use of these is generally low. A good example of this is that 55% of capital market professionals "don't know" how their spreadsheets are audited; only 6% invest in a third-party solution.

Spreadsheet risk

Spreadsheet risk is the risk associated with deriving a materially incorrect value from a spreadsheet application that will be utilized in making a related (usually numerically based) decision. Examples include the valuation of an asset, the determination of financial accounts, the calculation of medicinal doses, or the size of a load-bearing beam for structural engineering. The risk may arise from inputting erroneous or fraudulent data values, from mistakes (or incorrect changes) within the logic of the spreadsheet or the omission of relevant updates (e.g., out of date exchange rates). Some single-instance errors have exceeded US$1 billion. Because spreadsheet risk is principally linked to the actions (or inaction) of individuals it is defined as a sub-category of operational risk.

Despite this, research carried out by ClusterSeven revealed that around half (48%) of c-level executives and senior managers at firms reporting annual revenues over £50m said there were either no usage controls at all or poorly applied manual processes over the use of spreadsheets at the firms.

In 2013 Thomas Herndon, a graduate student of economics at the University of Massachusetts Amherst found major coding flaws in the spreadsheet used by the economists Carmen Reinhart and Kenneth Rogoff in Growth in a Time of Debt, a very influential 2010 journal article. The Reinhart and Rogoff article was widely used as justification to drive 2010–2013 European austerity programs.

Clinical metagenomic sequencing

From Wikipedia, the free encyclopedia

Clinical metagenomic next-generation sequencing (mNGS) is the comprehensive analysis of microbial and host genetic material (DNA or RNA) in clinical samples from patients. It uses the techniques of metagenomics to identify and characterize the genomes of bacteria, fungi, parasites, and viruses directly from clinical specimens, without the need for prior knowledge of a specific pathogen. The capacity to detect all the potential pathogens in a sample makes metagenomic next-generation sequencing a potent tool in the diagnosis of infectious disease, especially when other more directed assays, such as PCR, fail. Its limitations include clinical utility, laboratory validity, sensitivity and specificity, cost, and regulatory considerations.

Outside of clinical medicine, similar work is done to identify genetic material in environmental samples, such as ponds or soil.

Laboratory workflow

A typical mNGS workflow consists of the following steps:

  • Sample acquisition: the most commonly used samples for metagenomic sequencing are blood, stool, cerebrospinal fluid (CSF), urine, or nasopharyngeal swabs. Among these, blood and CSF are the cleanest, having less background noise, while the others are expected to contain a large number of commensal and/or opportunistic organisms and thus have more background noise. Samples should be collected with great care, as specimens can be contaminated during handling; for example, surgical biopsies may be contaminated while the tissue is handled, and CSF specimens obtained by lumbar puncture may be contaminated during the procedure.
  • RNA/DNA extraction: the DNA and RNA of the sample are extracted using an extraction kit. If there is a strong prior suspicion about the pathogen's genome composition, and because pathogen nucleic acid in noisier samples is overwhelmed by the RNA/DNA of other organisms, selecting an extraction kit for only RNA or only DNA can be a more specific and convenient approach. Some commercially available kits are, for example, the RNeasy PowerSoil Total RNA Kit (Qiagen), the RNeasy Mini Kit (Qiagen), the MagMAX Viral Isolation Kit (ABI), and the Viral RNA Mini Kit (Qiagen).
  • Optimization strategies for library preparation: because of high levels of background noise in metagenomic sequencing, several target enrichment procedures have been developed that aim to increase the probability of capturing pathogen-derived transcripts and/or genomes. Generally there are two main approaches that can be used to increase the amount of pathogen signal in a sample: negative selection and positive enrichment.
  1. Negative selection (background depletion or subtraction) targets and eliminates the host and microbiome genomic background, while aiming to preserve the nucleic acid derived from the pathogens of interest. Degradation of genomic background can be performed through broad-spectrum digestion with nucleases, such as DNase I for DNA background, or by removing abundant RNA species (rRNA, mtRNA, globin mRNA) using sequence-specific RNA depletion kits. CRISPR-Cas9-based approaches can also be used, for example to target and deplete human mitochondrial RNA. Generally, however, subtraction approaches lead to a certain degree of loss of the targeted pathogen genome, as poor recovery may occur during the cleanup.
  2. Positive enrichment is used to increase pathogen signal rather than reducing background noise. This is commonly done through hybridization-based target capture by probes, which are used to pull out nucleic acid of interest for downstream amplification and sequencing. Panviral probes have been shown to successfully identify diverse types of pathogens in different clinical fluid and respiratory samples, and have been used for sequencing and characterization of novel viruses. However, the probe approach includes extra hybridization and cleanup steps, requiring higher sample input, increasing the risk of losing the target, and increasing the cost and hands-on time.
  • High-throughput sequencing: all the nucleic acid fragments of the library are sequenced. The sequencing platform is chosen depending on factors such as the laboratory's research objectives, personal experience, and skill levels. So far, the Illumina MiSeq system has proven to be the most commonly used platform for infectious disease research, pathogen surveillance, and pathogen discovery in research and public health. The instrument is compact enough to fit on a laboratory bench, has a fast runtime compared to other similar platforms, and has a strong user support community. However, with further improvements of this technology and with additional error reduction and software stabilization, the MinION may be an excellent addition to the arsenal of current sequencing technologies for routine surveillance, especially in smaller laboratories with limited resources. For instance, the MinION was successfully used in the ZiBRA project for real-time Zika virus surveillance of mosquitoes and humans in Brazil, and in Guinea to perform real-time surveillance during the ongoing Ebola outbreak. In general, laboratories with limited resources use the Illumina MiSeq, iSeq, Ion Torrent PGM, or Oxford Nanopore MinION, while those with substantial resources prefer the Illumina NextSeq, NovaSeq, PacBio Sequel, or Oxford Nanopore PromethION. Moreover, for pathogen sequencing the use of controls is of fundamental importance for ensuring mNGS assay quality and stability over time; PhiX is used as a sequencing control, while the other controls include a positive control, an additional internal control (e.g., spiked DNA or another known pathogen), and a negative control (usually a water sample).
  • Bioinformatic analysis: whereas the sequencing itself has been made widely accessible and more user friendly, the data analysis and interpretation that follows still requires specialized bioinformatics expertise and appropriate computational resources. The raw data from a sequencing platform are usually cleaned, trimmed, and filtered to remove low-quality and duplicate reads (a minimal read-filtering sketch follows this list). Removal of the host genome/transcriptome reads is performed to decrease background noise (e.g., host and environmental reads) and increase the frequency of pathogen reads; this step also decreases downstream analysis time. Further background noise removal is achieved by mapping sample reads to the reads from the negative control, to eliminate contaminating reads such as those associated with the reagents or the sample storage medium. The remaining reads are usually assembled de novo to produce long stretches of sequences called contigs. Taxonomic identification of the resulting contigs is performed by matching them to the genomes and sequences in nucleotide or protein databases; for this, various versions of BLAST are most commonly used. Advanced characterization of bacterial organisms can also be performed, provided the necessary depth and breadth of coverage are obtained for genetic characterization. Gene calling can be performed in a variety of ways, including RAST or using NCBI services at the time of full genome submission. The results of multiple annotation tools can be compared for accuracy and completeness and, if necessary, merged using BEACON. For characterization of antibiotic resistance genes, the Resistance Gene Identifier from the Comprehensive Antibiotic Resistance Database (CARD) is commonly used. To characterize virulence factor genes, ShortBRED offers analyses with a customized database from the Virulence Factor Database.
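
The read-cleaning step above is normally done with dedicated quality-control tools; purely as an illustration, a minimal Python sketch of mean-quality filtering of FASTQ reads might look like the following, where the file names and the quality threshold are arbitrary assumptions.

# Keep only FASTQ reads whose mean Phred quality (Phred+33 encoding)
# reaches a threshold; assumes standard 4-line FASTQ records.
def mean_phred(quality_line):
    scores = [ord(ch) - 33 for ch in quality_line]
    return sum(scores) / len(scores) if scores else 0.0

def filter_fastq(in_path, out_path, min_mean_quality=20):
    kept = total = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break                      # end of file
            total += 1
            if mean_phred(record[3].rstrip("\n")) >= min_mean_quality:
                fout.write("".join(record))
                kept += 1
    return kept, total

kept, total = filter_fastq("reads.fastq", "reads.filtered.fastq")
print("kept", kept, "of", total, "reads")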

Applications

Infectious diseases diagnosis

One way to diagnose infectious diseases is to detect part of the pathogen's genome by metagenomic next-generation sequencing (mNGS), which can be targeted or untargeted.

Targeted

In the targeted approach, specific pathogens or genomic regions of interest are enriched or amplified before sequencing. Because of that, the sensitivity for detecting the targeted microorganisms usually increases, but at the cost of limiting the range of identifiable pathogens.

Untargeted

The untargeted analysis is a metagenomic "shotgun" approach: all the DNA and/or RNA in the sample is sequenced. The resulting mNGS reads can be assembled into partial or complete genomes. These genome sequences make it possible to monitor hospital outbreaks, facilitating infection control and public health surveillance, and they can also be used for subtyping (identifying a specific genetic variant of a microorganism).

Untargeted mNGS is the most promising approach for analysing clinical samples and providing a comprehensive diagnosis of infections. Various groups have validated mNGS under the Clinical Laboratory Improvement Amendments (CLIA) for conditions such as meningitis or encephalitis, sepsis and pneumonia. This method can be very helpful in settings where no exact infectious etiology is suspected. For example, in patients with suspected pneumonia, identification of the underlying infectious etiology, as in COVID-19, has important clinical and public health implications.

The traditional method consists of formulating a differential diagnosis on the basis of the patient's history, clinical presentation, imaging findings and laboratory testing. Metagenomic next-generation sequencing (NGS) offers a different route to diagnosis: it is a promising method because a comprehensive spectrum of potential causes (viral, bacterial, fungal and parasitic) can be identified in a single assay.

Below are some examples of the metagenomic sequencing application in infectious diseases diagnosis.

Examples

Diagnosis of meningitis and encephalitis

The traditional approach to the diagnosis of infectious diseases has been challenged in some cases: neuroinflammatory diseases, the lack of diagnostic tests for rare pathogens, and the limited availability and volume of Central Nervous System (CNS) samples, because of the requirement for invasive procedures. Owing to these problems, some assays suggest a different way of diagnosis, namely metagenomic next-generation sequencing (NGS); this is a promising diagnostic method because a comprehensive spectrum of potential causes (bacterial, viral, fungal and parasitic) can be identified in a single study. In short, NGS can identify a broad range of pathogens in a single test.

Some studies have evaluated the clinical usefulness of metagenomic NGS for diagnosing neurologic infections, in parallel with conventional microbiologic testing. The highest diagnostic yield resulted from a combination of metagenomic NGS of CSF and conventional testing, including serologic testing and testing of sample types other than CSF.

Moreover, findings from several studies have shown that neurologic infections remain undiagnosed in a proportion of patients despite conventional testing, and they demonstrate the potential usefulness of clinical metagenomic NGS testing in these patients.

The results of metagenomic NGS can also be valuable even when concordant with the results of conventional testing, not only providing reassurance that the conventionally obtained diagnosis is correct but also potentially detecting or ruling out coinfections, especially in immunocompromised patients.

Metagenomics is fundamentally a direct-detection method and relies on the presence of nucleic acid from the causative pathogen in the CSF sample.

Study of antimicrobial resistance

Antimicrobial resistance is a health problem that needs to be resolved.

Nowadays, antimicrobial susceptibility testing (AST) is used to detect resistance in different microbes, but several studies have shown that bacterial resistance is encoded in the genome and is transferred by horizontal gene transfer (HGT), so sequencing methods are being developed to ease the identification and characterization of these genomes and metagenomes. At present, the following methods exist to detect antimicrobial resistance:

  • AST: an advantage of this method is that it provides information that can guide patient treatment. It also has disadvantages: it is only applicable to culturable bacteria and it requires trained personnel.
  • Sequencing methods: advantages of these methods over AST are that they are rapid and sensitive, they work on both bacteria that grow on artificial media and those that do not, and they permit comparative studies across several organisms. One type of sequencing method can be used in preference to another depending on the type of sample: for a genomic sample, assembly-based methods are used; for a metagenomic sample, read-based methods are preferable (a toy illustration of the read-based idea follows this list).
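
To make the read-based idea concrete, the sketch below counts how many sequencing reads share an exact k-mer with any sequence in a small resistance-gene set, without any assembly step. It is only an illustration of the principle; real tools such as the CARD Resistance Gene Identifier use curated databases and proper alignment, and the gene names and sequences here are invented.

    # Toy illustration of a read-based resistance-gene screen: count how many
    # reads share an exact k-mer with any gene in a small reference set.
    # All sequences and gene names below are made up for illustration.
    K = 21  # k-mer length

    resistance_genes = {
        "hypothetical_tet_gene": "ATGGCTAAAGGTACCGTTGACCTGAATATCGGTTGCCCGACCGAAATC",
        "hypothetical_bla_gene": "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCA",
    }

    reads = [
        "TTCCGTGTCGCCCTTATTCCCTTTTTTGCG",   # overlaps the "bla" gene
        "ACGTACGTACGTACGTACGTACGTACGTAC",   # random background read
    ]

    def kmers(seq, k=K):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    # Index every gene by its k-mers.
    index = {}
    for name, seq in resistance_genes.items():
        for km in kmers(seq):
            index.setdefault(km, set()).add(name)

    # Screen the reads and tally k-mer hits per gene.
    hits = {}
    for read in reads:
        for km in kmers(read):
            for gene in index.get(km, ()):
                hits[gene] = hits.get(gene, 0) + 1

    print(hits)  # {'hypothetical_bla_gene': 10}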

It has been noted that metagenomic sequencing methods have provided better results than genomic ones, as they produce fewer false negatives. Within metagenomic sequencing, functional metagenomics is a powerful approach for characterizing resistomes: a metagenomic library is generated by cloning the total community DNA extracted from a sample into an expression vector, and this library is assayed for antimicrobial resistance by plating on selective media that are lethal to the wild-type host. The selected inserts from the surviving recombinant, antimicrobial-resistant host cells are then sequenced, and the resulting sequences are assembled and annotated (PARFuMS).

Functional metagenomics has enabled the discovery of several new antimicrobial resistance mechanisms and their related genes; one such example is the recently discovered tetracycline resistance mechanism mediated by tetracycline destructases.

In conclusion, it is important to consider not only the antimicrobial resistance gene sequence and mechanism but also the genomic context, host bacterial species and geographic location (the metagenome).

Pandemic preparedness

Potentially dangerous pathogens such as ebolaviruses and coronaviruses, as well as the closest genetic relatives of unknown pathogens, could be identified immediately, prompting further follow-up. Metagenomic sequencing is anticipated to play a role in future pandemic preparedness and could serve as the earliest surveillance system available to detect outbreaks of unknown etiology and respond in a timely manner.

Clinical microbiome analyses

The use of mNGS to characterize the microbiome has made possible the development of bacterial probiotics to be administered as pills, for example as a treatment for Clostridium difficile-associated diseases.

Human host response analyses

Studying gene expression makes it possible to characterize many infections, for example those due to Staphylococcus aureus, as well as Lyme disease, candidiasis, tuberculosis and influenza. This approach can also be used for cancer classification.

RNA-seq analysis has many other purposes and applications, such as identifying novel or under-appreciated host–microbial interactions directly from clinical samples, making indirect diagnoses on the basis of a pathogen-specific human host response, and discriminating infectious from noninfectious causes of acute illness.

Applications in oncology

Challenges

Clinical utility

Most of the metagenomics outcome data generated so far consist of case reports, which belies the increasing interest in diagnostic metagenomics.

Accordingly, there is an overall lack of penetration of this approach into the clinical microbiology laboratory, as making a diagnosis with metagenomics is still basically only useful in the context of case reports rather than for true day-to-day diagnostic purposes.

Recent cost-effectiveness modelling of metagenomics in the diagnosis of fever of unknown origin concluded that, even after limiting the cost of diagnostic metagenomics to $100–1,000 per test, it would require 2.5–4 times the diagnostic yield of computed tomography of the abdomen/pelvis to be cost neutral, and cautioned against a ‘widespread rush’ to deploy metagenomic testing.
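
The cost-neutrality condition in that kind of modelling reduces to simple cost-per-diagnosis arithmetic, illustrated below with hypothetical dollar figures and yields (not the values from the cited study).

    # Cost-per-diagnosis arithmetic behind the "cost neutral" comparison.
    # All dollar figures and yields here are hypothetical illustrations.
    ct_cost_per_test = 400.0      # assumed cost of CT abdomen/pelvis
    ct_yield = 0.10               # assumed fraction of tests yielding a diagnosis
    mngs_cost_per_test = 1000.0   # metagenomics capped at $1,000 per test

    # For metagenomics to be cost neutral, its cost per diagnosis must match CT's:
    #   mngs_cost / mngs_yield == ct_cost / ct_yield
    required_mngs_yield = mngs_cost_per_test * ct_yield / ct_cost_per_test
    print(required_mngs_yield / ct_yield)   # 2.5x the CT yield in this example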

Furthermore, in the case of the discovery of potential novel infectious agents, usually only positive results are published even though the vast majority of sequenced cases are negative, resulting in very biased information. Moreover, most of the metagenomics-based discovery work that preceded the current diagnostics-based work rarely even mentioned the known agents detected while screening unsolved cases for completely novel causes.

Laboratory validity

To date, most published testing has been run in an unvalidated, unreportable manner. The ‘standard microbiological testing’ that samples are subjected to prior to metagenomics is variable and has not included reverse transcription-polymerase chain reaction (RT-PCR) testing for common respiratory viruses or, routinely, 16S/ITS PCR testing.

Given the relative costs of validating and performing metagenomics versus 16S/ITS PCR testing, the latter is considered the easier and more efficient option. A potential exception to 16S/ITS testing is blood, given the huge amount of 16S sequence present, which makes clean cutoffs for diagnostic purposes problematic.

Furthermore, almost all of the organisms detected by metagenomics for which there is an associated treatment, and which would therefore be truly actionable, are also detectable by 16S/ITS testing (or 16S/ITS-NGS). This calls into question the utility of metagenomics in many diagnostic cases.

One of the main requirements for laboratory validity is the inclusion of reference standards and controls when performing mNGS assays; they are needed to ensure the quality and stability of the technique over time.

Sense and sensitivity

In clinical microbiology labs, quantitation of microbial burden is considered a routine function, as burden is associated with the severity and progression of disease. Achieving good quantitation requires a technique with high sensitivity.
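
One simple way such burden is often reported from sequencing data is as reads per million (RPM) of total sequenced reads; the sketch below is a generic illustration with invented numbers rather than a validated clinical metric.

    # Reads-per-million (RPM): a simple normalization used to report how abundant
    # a taxon's reads are relative to the total sequencing depth.
    # The counts below are invented for illustration.
    def rpm(taxon_reads, total_reads):
        return taxon_reads * 1_000_000 / total_reads

    total_reads = 20_000_000     # total reads sequenced for the sample
    pathogen_reads = 150         # reads classified to the suspected pathogen

    print(rpm(pathogen_reads, total_reads))   # 7.5 RPM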

Whereas interfering substances are a common problem in clinical chemistry and PCR diagnostics, the degree of interference from host nucleic acids (for example, in tissue biopsies) or nonpathogen nucleic acids (for example, in stool) in metagenomics is a new twist. In addition, because of the size of the human genome relative to microbial genomes, interference can occur even at low levels of contaminating material.

Another sensitivity challenge for clinical metagenomics is the diagnosis of coinfections in which high-titer pathogens are present: these can bias the results because they disproportionately soak up reads and make it difficult to detect the less abundant pathogens.

In addition to issues with interfering substances, and especially in diagnostics, accurate quantitation and sensitivity are essential, since errors in the results ultimately affect a third party, the patient. For this reason, practitioners currently have to be keenly aware of the index-swapping issues associated with Illumina sequencing, which can lead to trace numbers of reads being assigned to incorrectly barcoded samples.

Since metagenomics has typically been used on patients for whom every other test to date has been negative, questions surrounding analytical sensitivity have been less germane. But because ruling out infectious causes is one of the more important roles for clinical metagenomics, it is essential to be able to sequence deeply enough to achieve adequate sensitivity; one way to do this could be to develop novel library preparation techniques.
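
How deep is "deep enough" can be estimated with a simple Poisson argument: if a fraction p of the reads in a library derive from the pathogen, the probability of observing at least one such read among N sequenced reads is roughly 1 − e^(−Np). The numbers below are illustrative assumptions.

    # How sequencing depth limits sensitivity: probability of observing at least
    # one pathogen-derived read when a fraction p of the library is pathogen.
    # Both numbers are illustrative assumptions.
    import math

    p = 1e-7            # assumed pathogen fraction of all reads (1 in 10 million)
    for n_reads in (1_000_000, 10_000_000, 100_000_000):
        p_detect = 1 - math.exp(-n_reads * p)   # Poisson approximation
        print(f"{n_reads:>11,d} reads -> P(>=1 pathogen read) = {p_detect:.2f}")
    # 1,000,000 reads -> 0.10; 10,000,000 -> 0.63; 100,000,000 -> 1.00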

Cost considerations

The Illumina monopoly on high-quality next-generation sequencing reagents and the need for accurate, deep sequencing in metagenomics mean that the sequencing reagents alone cost more than FDA-approved syndromic testing panels. Additional direct costs of metagenomics, such as extraction, library preparation and computational analysis, also have to be considered.

In general, metagenomic sequencing is most useful and cost efficient for pathogen discovery when at least one of the following criteria is met:

  1. the identification of the organism is not sufficient (one desires to go beyond discovery to produce data for genomic characterization),
  2. a coinfection is suspected,
  3. other simpler assays are ineffective or will take an inordinate amount of time,
  4. environmental samples are being screened for previously undescribed or divergent pathogens.

Atomic battery

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Atomic_battery

An atomic battery, nuclear battery, radioisotope battery or radioisotope generator is a device which uses energy from the decay of a radioactive isotope to generate electricity. Like nuclear reactors, they generate electricity from nuclear energy, but differ in that they do not use a chain reaction. Although commonly called batteries, they are technically not electrochemical and cannot be charged or recharged. They are very costly, but have an extremely long life and high energy density, and so they are typically used as power sources for equipment that must operate unattended for long periods of time, such as spacecraft, pacemakers, underwater systems and automated scientific stations in remote parts of the world.

Nuclear battery technology began in 1913, when Henry Moseley first demonstrated a current generated by charged particle radiation. The field received considerable in-depth research attention for applications requiring long-life power sources for space needs during the 1950s and 1960s. In 1954 RCA researched a small atomic battery for small radio receivers and hearing aids. Since RCA's initial research and development in the early 1950s, many types and methods have been designed to extract electrical energy from nuclear sources. The scientific principles are well known, but modern nano-scale technology and new wide-bandgap semiconductors have created new devices and interesting material properties not previously available.

Nuclear batteries can be classified by energy conversion technology into two main groups: thermal converters and non-thermal converters. The thermal types convert some of the heat generated by the nuclear decay into electricity. The most notable example is the radioisotope thermoelectric generator (RTG), often used in spacecraft. The non-thermal converters extract energy directly from the emitted radiation, before it is degraded into heat. They are easier to miniaturize and do not require a thermal gradient to operate, so they are suitable for use in small-scale applications. The most notable example is the betavoltaic cell.

Atomic batteries usually have an efficiency of 0.1–5%. High-efficiency betavoltaic devices can reach 6–8% efficiency.

Thermal conversion

Thermionic conversion

A thermionic converter consists of a hot electrode, which thermionically emits electrons over a space-charge barrier to a cooler electrode, producing a useful power output. Caesium vapor is used to optimize the electrode work functions and provide an ion supply (by surface ionization) to neutralize the electron space charge.

Thermoelectric conversion

A radioisotope-powered cardiac pacemaker developed by the Atomic Energy Commission, planned to stimulate the pulsing action of a malfunctioning heart. Circa 1967.

A radioisotope thermoelectric generator (RTG) uses thermocouples. Each thermocouple is formed from two wires of different metals (or other materials). A temperature gradient along the length of each wire produces a voltage gradient from one end of the wire to the other; but the different materials produce different voltages per degree of temperature difference. By connecting the wires at one end, heating that end but cooling the other end, a usable, but small (millivolts), voltage is generated between the unconnected wire ends. In practice, many are connected in series (or in parallel) to generate a larger voltage (or current) from the same heat source, as heat flows from the hot ends to the cold ends. Metal thermocouples have low thermal-to-electrical efficiency. However, the carrier density and charge can be adjusted in semiconductor materials such as bismuth telluride and silicon germanium to achieve much higher conversion efficiencies.
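
The series-stacking arithmetic can be illustrated with the usual thermoelectric relation V ≈ n·S·ΔT, where S is the Seebeck coefficient of a couple. The values in the sketch are representative order-of-magnitude assumptions, not the parameters of any particular RTG.

    # Open-circuit voltage of a stack of thermocouples: V ≈ n * S * dT.
    # S is a representative order-of-magnitude Seebeck coefficient for a
    # semiconductor couple; all numbers are illustrative assumptions.
    n_couples = 500        # thermocouples connected in series
    seebeck = 200e-6       # V/K, representative value for a semiconductor couple
    delta_t = 300.0        # K, hot side minus cold side

    voltage = n_couples * seebeck * delta_t
    print(f"{voltage:.1f} V")   # 30.0 V from millivolt-scale individual couples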

Thermophotovoltaic conversion

Thermophotovoltaic (TPV) cells work by the same principles as a photovoltaic cell, except that they convert infrared light (rather than visible light) emitted by a hot surface, into electricity. Thermophotovoltaic cells have an efficiency slightly higher than thermoelectric couples and can be overlaid on thermoelectric couples, potentially doubling efficiency. The University of Houston TPV Radioisotope Power Conversion Technology development effort is aiming at combining thermophotovoltaic cells concurrently with thermocouples to provide a 3- to 4-fold improvement in system efficiency over current thermoelectric radioisotope generators.

Stirling generators

A Stirling radioisotope generator is a Stirling engine driven by the temperature difference produced by a radioisotope. A more efficient version, the advanced Stirling radioisotope generator, was under development by NASA, but was cancelled in 2013 due to large-scale cost overruns.

Non-thermal conversion

Non-thermal converters extract energy from emitted radiation before it is degraded into heat. Unlike thermoelectric and thermionic converters, their output does not depend on a temperature difference. Non-thermal generators can be classified by the type of particle used and by the mechanism by which their energy is converted.

Electrostatic conversion

Energy can be extracted from emitted charged particles when their charge builds up in a conductor, thus creating an electrostatic potential. Without a dissipation mode, the voltage can increase up to the energy of the radiated particles, which may range from several kilovolts (for beta radiation) up to megavolts (for alpha radiation). The built-up electrostatic energy can be turned into usable electricity in one of the following ways.

Direct-charging generator

A direct-charging generator consists of a capacitor charged by the current of charged particles from a radioactive layer deposited on one of the electrodes. Spacing can be either vacuum or dielectric. Negatively charged beta particles or positively charged alpha particles, positrons or fission fragments may be utilized. Although this form of nuclear-electric generator dates back to 1913, few applications have been found in the past for the extremely low currents and inconveniently high voltages provided by direct-charging generators. Oscillator/transformer systems are employed to reduce the voltages, then rectifiers are used to transform the AC power back to direct current.
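
The magnitude of those currents and voltages can be estimated directly from the source activity: each decay delivers roughly one elementary charge (two for alphas), so the charging current is about the activity times that charge, and an isolated capacitor's voltage ramps as dV/dt = I/C. The source strength and capacitance below are assumed values for illustration.

    # Direct-charging generator: tiny current, very high voltage.
    # Illustrative numbers; a beta source delivering one elementary charge per
    # decay is assumed.
    E_CHARGE = 1.602e-19      # C, elementary charge

    activity = 3.7e9          # Bq (1 curie) of a beta emitter, assumed value
    capacitance = 10e-12      # F, small vacuum-gap capacitor, assumed value

    current = activity * E_CHARGE          # ~0.6 nA of charging current
    ramp_rate = current / capacitance      # dV/dt = I / C

    print(f"current = {current:.2e} A")
    print(f"dV/dt   = {ramp_rate:.2e} V/s")   # ~59 V per second at this capacitance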

English physicist H. G. J. Moseley constructed the first of these. Moseley's apparatus consisted of a glass globe silvered on the inside with a radium emitter mounted on the tip of a wire at the center. The charged particles from the radium created a flow of electricity as they moved quickly from the radium to the inside surface of the sphere. As late as 1945 the Moseley model guided other efforts to build experimental batteries generating electricity from the emissions of radioactive elements.

Electromechanical conversion

Electromechanical atomic batteries use the buildup of charge between two plates to pull one bendable plate toward the other until the two plates touch and discharge, equalizing the electrostatic buildup, after which the plate springs back. The mechanical motion produced can be used to generate electricity through the flexing of a piezoelectric material or through a linear generator. Milliwatts of power are produced in pulses, depending on the charge rate, in some cases multiple times per second (35 Hz).
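
The average output of such a device can be approximated as the energy stored just before each discharge, ½CV², multiplied by the pulse frequency. The capacitance and contact voltage below are assumed for illustration; only the 35 Hz figure comes from the text.

    # Electromechanical conversion: average power from repeated capacitor discharges.
    # Average power ~= (1/2) * C * V^2 * pulse frequency. Numbers are illustrative.
    capacitance = 50e-12      # F, plate capacitance just before contact (assumed)
    voltage = 300.0           # V, voltage reached when the plates touch (assumed)
    frequency = 35.0          # Hz, pulse rate quoted in the text

    energy_per_pulse = 0.5 * capacitance * voltage ** 2
    average_power = energy_per_pulse * frequency
    # ~79 microwatts average with these values; the instantaneous power during
    # each brief discharge pulse is far higher than this average.
    print(f"{average_power * 1e6:.1f} microwatts average")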

Radiovoltaic conversion

A radiovoltaic (RV) device converts the energy of ionizing radiation directly into electricity using a semiconductor junction, similar to the conversion of photons into electricity in a photovoltaic cell. Depending on the type of radiation targeted, these devices are called alphavoltaic (AV, αV), betavoltaic (BV, βV) and/or gammavoltaic (GV, γV). Betavoltaics have traditionally received the most attention since (low-energy) beta emitters cause the least amount of radiative damage, thus allowing a longer operating life and less shielding. Interest in alphavoltaic and (more recently) gammavoltaic devices is driven by their potential higher efficiency.

Alphavoltaic conversion

Alphavoltaic devices use a semiconductor junction to produce electrical energy from energetic alpha particles.

Betavoltaic conversion

Betavoltaic devices use a semiconductor junction to produce electrical energy from energetic beta particles (electrons). A commonly used source is the hydrogen isotope tritium.

Betavoltaic devices are particularly well-suited to low-power electrical applications where long life of the energy source is needed, such as implantable medical devices or military and space applications.

Gammavoltaic conversion

Gammavoltaic devices use a semiconductor junction to produce electrical energy from energetic gamma particles (high-energy photons). They have only been considered in the 2010s but were proposed as early as 1981.

A gammavoltaic effect has been reported in perovskite solar cells. Another patented design involves scattering of the gamma particle until its energy has decreased enough to be absorbed in a conventional photovoltaic cell. Gammavoltaic designs using diamond and Schottky diodes are also being investigated.

Radiophotovoltaic (optoelectric) conversion

In a radiophotovoltaic (RPV) device the energy conversion is indirect: the emitted particles are first converted into light using a radioluminescent material (a scintillator or phosphor), and the light is then converted into electricity using a photovoltaic cell. Depending on the type of particle targeted, the conversion type can be more precisely specified as alphaphotovoltaic (APV or α-PV), betaphotovoltaic (BPV or β-PV) or gammaphotovoltaic (GPV or γ-PV).

Radiophotovoltaic conversion can be combined with radiovoltaic conversion to increase the conversion efficiency.

Pacemakers

Medtronic and Alcatel developed a plutonium-powered pacemaker, the Numec NU-5, powered by a 2.5 Ci slug of plutonium-238, first implanted in a human patient in 1970. The 139 Numec NU-5 nuclear pacemakers implanted in the 1970s are expected to never need replacing, an advantage over non-nuclear pacemakers, which require surgical replacement of their batteries every 5 to 10 years. The plutonium "batteries" are expected to produce enough power to drive the circuit for longer than the 88-year half-life of the plutonium.
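
The longevity claim follows from simple exponential decay: the power of the plutonium-238 source falls as P(t) = P0 · (1/2)^(t/T½). A quick check using the roughly 88-year half-life quoted above:

    # Fraction of the initial plutonium-238 power remaining after t years,
    # using the ~88-year half-life quoted in the text.
    HALF_LIFE_YEARS = 88.0

    def fraction_remaining(t_years):
        return 0.5 ** (t_years / HALF_LIFE_YEARS)

    for years in (10, 25, 50, 88):
        print(f"after {years:3d} years: {fraction_remaining(years):.0%} of initial power")
    # after 10 years: 92%, after 25 years: 82%, after 50 years: 67%, after 88 years: 50%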

Radioisotopes used

Atomic batteries use radioisotopes that produce low-energy beta particles or sometimes alpha particles of varying energies. Low-energy beta particles are needed to prevent the production of high-energy, penetrating Bremsstrahlung radiation that would require heavy shielding. Radioisotopes such as tritium, nickel-63, promethium-147, and technetium-99 have been tested, and plutonium-238, curium-242, curium-244 and strontium-90 have been used. Besides the nuclear properties of the isotope used, there are also the issues of chemical properties and availability: a product deliberately produced via neutron irradiation or in a particle accelerator is more difficult to obtain than a fission product easily extracted from spent nuclear fuel.

Plutonium-238 must be deliberately produced via neutron irradiation of neptunium-237, but it can be easily converted into a stable plutonium oxide ceramic. Strontium-90 is easily extracted from spent nuclear fuel but must be converted into the perovskite form strontium titanate to reduce its chemical mobility, which cuts its power density in half. Caesium-137, another high-yield nuclear fission product, is rarely used in atomic batteries because it is difficult to convert into chemically inert substances. Another undesirable property of Cs-137 extracted from spent nuclear fuel is that it is contaminated with other isotopes of caesium, which reduce the power density further.
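
A back-of-the-envelope relation ties an isotope's specific thermal power to its half-life, molar mass and energy per decay: P/m = (ln 2 / T½) · (N_A / M) · E_decay. The sketch below applies it to plutonium-238 using commonly quoted approximate values (half-life ≈ 87.7 years, ≈ 5.6 MeV released per alpha decay); treat the result as an estimate.

    # Specific thermal power of a radioisotope from first principles:
    #   P/m = (ln2 / half_life) * (N_A / molar_mass) * energy_per_decay
    # Values for Pu-238 are commonly quoted approximations.
    import math

    N_A = 6.022e23            # atoms per mole
    EV_TO_J = 1.602e-19       # joules per electronvolt
    YEAR = 3.156e7            # seconds per year

    half_life = 87.7 * YEAR               # s, Pu-238 half-life (approx.)
    molar_mass = 238.0                    # g/mol
    energy_per_decay = 5.6e6 * EV_TO_J    # J, alpha decay energy of Pu-238 (approx.)

    decay_constant = math.log(2) / half_life     # 1/s
    atoms_per_gram = N_A / molar_mass            # atoms/g
    specific_power = decay_constant * atoms_per_gram * energy_per_decay

    print(f"{specific_power:.2f} W/g")   # ~0.57 W/g, close to published Pu-238 values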

Micro-batteries

In the field of microelectromechanical systems (MEMS), nuclear engineers at the University of Wisconsin, Madison have explored the possibility of producing minuscule batteries that exploit radioactive nuclei of substances such as polonium or curium to produce electrical energy. As an example of an integrated, self-powered application, the researchers have created an oscillating cantilever beam that is capable of consistent, periodic oscillations over very long time periods without the need for refueling. Ongoing work demonstrates that this cantilever is capable of radio frequency transmission, allowing MEMS devices to communicate with one another wirelessly.

These micro-batteries are very light and deliver enough energy to serve as a power supply for MEMS devices and, potentially, for nanodevices.

The radiation energy released is transformed into electric energy, which is restricted to the area of the device that contains the processor and the micro-battery that supplies it with energy.

Algorithmic information theory

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Algorithmic_information_theory ...