Sunday, August 20, 2023

Linear least squares

From Wikipedia, the free encyclopedia

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

Main formulations

The three main linear least squares formulations are:

  • Ordinary least squares (OLS) is the most common estimator. OLS estimates are commonly used to analyze both experimental and observational data.
    The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter vector β:
        $\hat{\boldsymbol\beta} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y},$
    where $\mathbf{y}$ is a vector whose ith element is the ith observation of the dependent variable, and $\mathbf{X}$ is a matrix whose ij element is the ith observation of the jth independent variable. The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors:
        $\operatorname{E}[\mathbf{x}_i \varepsilon_i] = 0,$
    where $\mathbf{x}_i$ is the transpose of row i of the matrix $\mathbf{X}$. It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that E[εi² | xi] does not depend on i. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate z that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the error term, and hence to an inconsistent estimator of β. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large.
  • Weighted least squares (WLS) is used when heteroscedasticity is present in the error terms of the model.
  • Generalized least squares (GLS) is an extension of the OLS method that allows efficient estimation of β when heteroscedasticity, correlations, or both are present among the error terms of the model, as long as the form of heteroscedasticity and correlation is known independently of the data. To handle heteroscedasticity when the error terms are uncorrelated with each other, GLS minimizes a weighted analogue to the sum of squared residuals from OLS regression, where the weight for the ith case is inversely proportional to var(εi). This special case of GLS is called "weighted least squares". The GLS solution to an estimation problem is
        $\hat{\boldsymbol\beta} = (\mathbf{X}^\mathsf{T}\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\boldsymbol\Omega^{-1}\mathbf{y},$
    where Ω is the covariance matrix of the errors. GLS can be viewed as applying a linear transformation to the data so that the assumptions of OLS are met for the transformed data. For GLS to be applied, the covariance structure of the errors must be known up to a multiplicative constant.
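As a rough illustration of the weighted case, the sketch below fits a straight line by weighted least squares in plain Python by solving the 2×2 weighted normal equations. With all weights equal it reduces to OLS. The function name `wls_line`, the sample data, and the weights are hypothetical and chosen only for illustration; each weight stands in for a value proportional to 1/var(εi).

```python
def wls_line(xs, ys, ws):
    """Fit y = b0 + b1*x by weighted least squares.

    Minimizes sum(w_i * (y_i - b0 - b1*x_i)**2), i.e. GLS with a
    diagonal error covariance where w_i is proportional to 1/var(eps_i).
    """
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    # Determinant of the weighted normal-equations matrix.
    det = sw * swxx - swx * swx
    if det == 0:
        raise ValueError("x values with nonzero weight must not all coincide")
    b1 = (sw * swxy - swx * swy) / det   # slope
    b0 = (swy - b1 * swx) / sw           # intercept
    return b0, b1


# Hypothetical data: the last observation is assumed to be noisier,
# so it receives a smaller weight.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.2]
ols_fit = wls_line(xs, ys, [1.0, 1.0, 1.0, 1.0])   # unit weights: OLS
wls_fit = wls_line(xs, ys, [1.0, 1.0, 1.0, 0.1])   # down-weight last point
print(ols_fit, wls_fit)
```

Down-weighting an observation reduces its influence on the fitted line; the general GLS case with correlated errors needs the full matrix Ω rather than a list of weights.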

Alternative formulations

Other formulations include:

  • Iteratively reweighted least squares (IRLS) is used when heteroscedasticity, or correlations, or both are present among the error terms of the model, but where little is known about the covariance structure of the errors independently of the data. In the first iteration, OLS, or GLS with a provisional covariance structure is carried out, and the residuals are obtained from the fit. Based on the residuals, an improved estimate of the covariance structure of the errors can usually be obtained. A subsequent GLS iteration is then performed using this estimate of the error structure to define the weights. The process can be iterated to convergence, but in many cases, only one iteration is sufficient to achieve an efficient estimate of β.
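  • Instrumental variables regression (IV) can be performed when the regressors are correlated with the errors. In this case, we need the existence of some auxiliary instrumental variables zi such that E[ziεi] = 0. If Z is the matrix of instruments, then the estimator can be given in closed form as
        $\hat{\boldsymbol\beta} = (\mathbf{X}^\mathsf{T}\mathbf{Z}(\mathbf{Z}^\mathsf{T}\mathbf{Z})^{-1}\mathbf{Z}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{Z}(\mathbf{Z}^\mathsf{T}\mathbf{Z})^{-1}\mathbf{Z}^\mathsf{T}\mathbf{y}.$
    Optimal instruments regression is an extension of classical IV regression to the situation where E[εi | zi] = 0.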
  • Total least squares (TLS) is an approach to least squares estimation of the linear regression model that treats the covariates and response variable in a more geometrically symmetric manner than OLS. It is one approach to handling the "errors in variables" problem, and is also sometimes used even when the covariates are assumed to be error-free.
  • Linear Template Fit (LTF) combines a linear regression with (generalized) least squares in order to determine the best estimator. The Linear Template Fit addresses the frequent situation in which the residuals cannot be expressed analytically or are too time-consuming to be evaluated repeatedly, as is often the case in iterative minimization algorithms. In the Linear Template Fit, the residuals are estimated from the random variables and from a linear approximation of the underlying true model, while the true model needs to be provided for at least $n+1$ (where $n$ is the number of estimators) distinct reference values β. The true distribution is then approximated by a linear regression, and the best estimators are obtained in closed form as
        $\hat{\boldsymbol\beta} = ((\mathbf{Y}\tilde{\mathbf{M}})^\mathsf{T}\boldsymbol\Omega^{-1}\mathbf{Y}\tilde{\mathbf{M}})^{-1}(\mathbf{Y}\tilde{\mathbf{M}})^\mathsf{T}\boldsymbol\Omega^{-1}(\mathbf{d} - \mathbf{Y}\bar{\mathbf{m}}),$
    where $\mathbf{Y}$ denotes the template matrix with the values of the known or previously determined model for any of the reference values β, $\mathbf{d}$ are the random variables (e.g. a measurement), and the matrix $\tilde{\mathbf{M}}$ and the vector $\bar{\mathbf{m}}$ are calculated from the values of β. The LTF can also be expressed for log-normally distributed random variables. A generalization of the LTF is the Quadratic Template Fit, which assumes a second-order regression of the model, requires predictions for at least $n^2+2n$ distinct values β, and finds the best estimator using Newton's method.
  • Percentage least squares focuses on reducing percentage errors, which is useful in the field of forecasting or time series analysis. It is also useful in situations where the dependent variable has a wide range without constant variance, as here the larger residuals at the upper end of the range would dominate if OLS were used. When the percentage or relative error is normally distributed, least squares percentage regression provides maximum likelihood estimates. Percentage regression is linked to a multiplicative error model, whereas OLS is linked to models containing an additive error term.

Objective function

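In OLS (i.e., assuming unweighted observations), the optimal value of the objective function is found by substituting the optimal expression for the coefficient vector:

    $S = \mathbf{y}^\mathsf{T}(\mathbf{I} - \mathbf{H})^\mathsf{T}(\mathbf{I} - \mathbf{H})\mathbf{y} = \mathbf{y}^\mathsf{T}(\mathbf{I} - \mathbf{H})\mathbf{y},$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}$, the latter equality holding since $(\mathbf{I} - \mathbf{H})$ is symmetric and idempotent. It can be shown from this that under an appropriate assignment of weights the expected value of S is m − n. If instead unit weights are assumed, the expected value of S is $(m - n)\sigma^2$, where $\sigma^2$ is the variance of each observation.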

If it is assumed that the residuals belong to a normal distribution, the objective function, being a sum of weighted squared residuals, will belong to a chi-squared ($\chi^2$) distribution with m − n degrees of freedom. Some illustrative percentile values of $\chi^2$ are given in the following table.

m − n    χ² (50%)    χ² (95%)    χ² (99%)
10       9.34        18.3        23.2
25       24.3        37.7        44.3
100      99.3        124         136

These values can be used for a statistical criterion as to the goodness of fit. When unit weights are used, the numbers should be divided by the variance of an observation.

For WLS, the ordinary objective function above is replaced by a weighted sum of squared residuals.

Discussion

In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model. The resulting fitted model can be used to summarize the data, to predict unobserved values from the same system, and to understand the mechanisms that may underlie the system.

Mathematically, linear least squares is the problem of approximately solving an overdetermined system of linear equations A x = b, where b is not an element of the column space of the matrix A. The approximate solution is realized as an exact solution to A x = b', where b' is the projection of b onto the column space of A. The best approximation is then that which minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called linear least squares since the assumed function is linear in the parameters to be estimated. Linear least squares problems are convex and have a closed-form solution that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in special degenerate situations. In contrast, non-linear least squares problems generally must be solved by an iterative procedure, and the problems can be non-convex with multiple optima for the objective function. If prior distributions are available, then even an underdetermined system can be solved using the Bayesian MMSE estimator.
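The projection statement can be made concrete with a short derivation. This is a sketch using the symbols A, x, b and b' from the paragraph above, under the added assumption that A has full column rank so that AᵀA is invertible:

```latex
\begin{aligned}
S(x) &= \lVert b - A x \rVert^2 = (b - A x)^\mathsf{T}(b - A x), \\
\nabla_x S &= -2 A^\mathsf{T}(b - A x) = 0
  \;\Longrightarrow\; A^\mathsf{T} A\, x = A^\mathsf{T} b, \\
\hat{x} &= (A^\mathsf{T} A)^{-1} A^\mathsf{T} b, \qquad
b' = A \hat{x} = A (A^\mathsf{T} A)^{-1} A^\mathsf{T} b .
\end{aligned}
```

The middle line says that the residual b − A x̂ is orthogonal to every column of A, which is exactly the condition for b' to be the orthogonal projection of b onto the column space.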

In statistics, linear least squares problems correspond to a particularly important type of statistical model called linear regression which arises as a particular form of regression analysis. One basic form of such a model is an ordinary least squares model. The present article concentrates on the mathematical aspects of linear least squares problems, with discussion of the formulation and interpretation of statistical regression models and statistical inferences related to these being dealt with in the articles just mentioned. See outline of regression analysis for an outline of the topic.

Properties

If the experimental errors, ε, are uncorrelated, have a mean of zero and a constant variance, σ², the Gauss–Markov theorem states that the least-squares estimator, $\hat{\boldsymbol\beta}$, has the minimum variance of all unbiased estimators that are linear combinations of the observations. In this sense it is the best, or optimal, estimator of the parameters. Note particularly that this property is independent of the statistical distribution function of the errors. In other words, the distribution function of the errors need not be a normal distribution. However, for some probability distributions, there is no guarantee that the least-squares solution is even possible given the observations; still, in such cases it is the best estimator that is both linear and unbiased.

For example, it is easy to show that the arithmetic mean of a set of measurements of a quantity is the least-squares estimator of the value of that quantity. If the conditions of the Gauss–Markov theorem apply, the arithmetic mean is optimal, whatever the distribution of errors of the measurements might be.
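A quick numerical check of this claim, as a sketch in plain Python: fitting a constant c to the measurements by minimizing the sum of squared deviations gives the mean, because setting the derivative −2 Σ(vi − c) to zero yields c = (Σ vi)/m. The measurement values and function names below are hypothetical.

```python
def sum_of_squares(values, c):
    """Objective S(c) = sum of squared deviations from the constant c."""
    return sum((v - c) ** 2 for v in values)


def least_squares_constant(values):
    """Minimizer of S(c): dS/dc = -2 * sum(v - c) = 0 gives the mean."""
    return sum(values) / len(values)


measurements = [9.8, 10.1, 10.0, 9.7, 10.4]   # hypothetical repeated readings
c_hat = least_squares_constant(measurements)
best = sum_of_squares(measurements, c_hat)

# Any nearby constant gives a strictly larger sum of squares.
for delta in (-0.1, -0.01, 0.01, 0.1):
    assert sum_of_squares(measurements, c_hat + delta) > best

print(c_hat, best)
```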

However, in the case that the experimental errors do belong to a normal distribution, the least-squares estimator is also a maximum likelihood estimator.

These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid.

Limitations

An assumption underlying the treatment given above is that the independent variable, x, is free of error. In practice, the errors on the measurements of the independent variable are usually much smaller than the errors on the dependent variable and can therefore be ignored. When this is not the case, total least squares or more generally errors-in-variables models, or rigorous least squares, should be used. This can be done by adjusting the weighting scheme to take into account errors on both the dependent and independent variables and then following the standard procedure.

In some cases the (weighted) normal equations matrix $\mathbf{X}^\mathsf{T}\mathbf{X}$ is ill-conditioned. When fitting polynomials, the design matrix $\mathbf{X}$ is a Vandermonde matrix. Vandermonde matrices become increasingly ill-conditioned as the order of the matrix increases. In these cases, the least squares estimate amplifies the measurement noise and may be grossly inaccurate. Various regularization techniques can be applied in such cases, the most common of which is called ridge regression. If further information about the parameters is known, for example, a range of possible values of $\hat{\boldsymbol\beta}$, then various techniques can be used to increase the stability of the solution. For example, see constrained least squares.
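For reference, ridge regression in its common form can be written as below. This is a sketch: the penalty weight λ > 0 is a tuning parameter that the text does not specify, and X, y and β are used as in the OLS formulation.

```latex
\hat{\boldsymbol\beta}_{\mathrm{ridge}}
  = \arg\min_{\boldsymbol\beta}\;
    \lVert \mathbf{y} - \mathbf{X}\boldsymbol\beta \rVert^2
    + \lambda \lVert \boldsymbol\beta \rVert^2
  = (\mathbf{X}^\mathsf{T}\mathbf{X} + \lambda \mathbf{I})^{-1}
    \mathbf{X}^\mathsf{T}\mathbf{y}
```

Adding λI raises every eigenvalue of XᵀX by λ, which improves the conditioning of the system at the cost of introducing bias into the estimate.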

Another drawback of the least squares estimator is the fact that the norm of the residuals, $\lVert \mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta} \rVert$, is minimized, whereas in some cases one is truly interested in obtaining small error in the parameter $\hat{\boldsymbol\beta}$, e.g., a small value of $\lVert \boldsymbol\beta - \hat{\boldsymbol\beta} \rVert$. However, since the true parameter $\boldsymbol\beta$ is necessarily unknown, this quantity cannot be directly minimized. If a prior probability on $\boldsymbol\beta$ is known, then a Bayes estimator can be used to minimize the mean squared error, $\operatorname{E}\{\lVert \boldsymbol\beta - \hat{\boldsymbol\beta} \rVert^2\}$. The least squares method is often applied when no prior is known. Surprisingly, when several parameters are being estimated jointly, better estimators can be constructed, an effect known as Stein's phenomenon. For example, if the measurement error is Gaussian, several estimators are known which dominate, or outperform, the least squares technique; the best known of these is the James–Stein estimator. This is an example of more general shrinkage estimators that have been applied to regression problems.

Applications

Uses in data fitting

The primary application of linear least squares is in data fitting. Given a set of m data points $y_1, y_2, \dots, y_m$ consisting of experimentally measured values taken at m values $x_1, x_2, \dots, x_m$ of an independent variable ($x_i$ may be scalar or vector quantities), and given a model function $y = f(x, \boldsymbol\beta)$ with $\boldsymbol\beta = (\beta_1, \beta_2, \dots, \beta_n)$, it is desired to find the parameters $\beta_j$ such that the model function "best" fits the data. In linear least squares, linearity is meant to be with respect to the parameters $\beta_j$, so

    $f(x, \boldsymbol\beta) = \sum_{j=1}^{n} \beta_j \varphi_j(x).$

Here, the functions $\varphi_j$ may be nonlinear with respect to the variable x.

Ideally, the model function fits the data exactly, so

    $y_i = f(x_i, \boldsymbol\beta)$

for all $i = 1, 2, \dots, m$. This is usually not possible in practice, as there are more data points than there are parameters to be determined. The approach chosen then is to find the minimal possible value of the sum of squares of the residuals
    $r_i(\boldsymbol\beta) = y_i - f(x_i, \boldsymbol\beta), \quad (i = 1, 2, \dots, m),$
so to minimize the function

    $S(\boldsymbol\beta) = \sum_{i=1}^{m} r_i^2(\boldsymbol\beta).$

After substituting for $r_i$ and then for $f$, this minimization problem becomes the quadratic minimization problem above with

    $X_{ij} = \varphi_j(x_i),$

and the best fit can be found by solving the normal equations.
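The recipe above (build the design matrix from the basis functions, form the normal equations, solve them) can be sketched in plain Python as follows. The helper names `solve_linear_system` and `fit_linear_model`, the choice of basis, and the noise-free sample data are all assumptions made for illustration. Solving the normal equations directly is the simplest route; for ill-conditioned problems an orthogonal decomposition is usually preferred.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular or ill-conditioned")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_linear_model(xs, ys, basis):
    """Least-squares coefficients for f(x) = sum_j beta_j * basis[j](x)."""
    design = [[phi(x) for phi in basis] for x in xs]   # X_ij = phi_j(x_i)
    n = len(basis)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(n)]
           for j in range(n)]
    xty = [sum(row[j] * y for row, y in zip(design, ys)) for j in range(n)]
    return solve_linear_system(xtx, xty)


# The basis functions are nonlinear in x, but the model is linear in beta.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 + 3.0 * x * x for x in xs]   # noise-free data from 2 + 3x^2
beta = fit_linear_model(xs, ys, basis)
print(beta)
```

With noise-free data generated from the model itself, the fit recovers the generating coefficients up to rounding error.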

Example

A plot of the data points (in red), the least squares line of best fit (in blue), and the residuals (in green)

A hypothetical researcher conducts an experiment and obtains four $(x, y)$ data points: $(1, 6)$, $(2, 5)$, $(3, 7)$, and $(4, 10)$ (shown in red in the diagram on the right). Because of exploratory data analysis or prior knowledge of the subject matter, the researcher suspects that the $y$-values depend on the $x$-values systematically. The $x$-values are assumed to be exact, but the $y$-values contain some uncertainty or "noise", because of the phenomenon being studied, imperfections in the measurements, etc.

Fitting a line

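One of the simplest possible relationships between $x$ and $y$ is a line $y = \beta_1 + \beta_2 x$. The intercept $\beta_1$ and the slope $\beta_2$ are initially unknown. The researcher would like to find values of $\beta_1$ and $\beta_2$ that cause the line to pass through the four data points. In other words, the researcher would like to solve the system of linear equations

    $\begin{aligned} \beta_1 + 1\beta_2 &= 6, \\ \beta_1 + 2\beta_2 &= 5, \\ \beta_1 + 3\beta_2 &= 7, \\ \beta_1 + 4\beta_2 &= 10. \end{aligned}$

With four equations in two unknowns, this system is overdetermined. There is no exact solution. To consider approximate solutions, one introduces residuals $r_1$, $r_2$, $r_3$, $r_4$ into the equations:
    $\begin{aligned} \beta_1 + 1\beta_2 + r_1 &= 6, \\ \beta_1 + 2\beta_2 + r_2 &= 5, \\ \beta_1 + 3\beta_2 + r_3 &= 7, \\ \beta_1 + 4\beta_2 + r_4 &= 10. \end{aligned}$
The $i$th residual $r_i$ is the misfit between the $i$th observation $y_i$ and the $i$th prediction $\beta_1 + \beta_2 x_i$:
    $\begin{aligned} r_1 &= 6 - (\beta_1 + 1\beta_2), \\ r_2 &= 5 - (\beta_1 + 2\beta_2), \\ r_3 &= 7 - (\beta_1 + 3\beta_2), \\ r_4 &= 10 - (\beta_1 + 4\beta_2). \end{aligned}$
Among all approximate solutions, the researcher would like to find the one that is "best" in some sense.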

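In least squares, one focuses on the sum $S$ of the squared residuals:

    $\begin{aligned} S(\beta_1, \beta_2) &= r_1^2 + r_2^2 + r_3^2 + r_4^2 \\ &= [6 - (\beta_1 + 1\beta_2)]^2 + [5 - (\beta_1 + 2\beta_2)]^2 + [7 - (\beta_1 + 3\beta_2)]^2 + [10 - (\beta_1 + 4\beta_2)]^2 \\ &= 4\beta_1^2 + 30\beta_2^2 + 20\beta_1\beta_2 - 56\beta_1 - 154\beta_2 + 210. \end{aligned}$

The best solution is defined to be the one that minimizes $S$ with respect to $\beta_1$ and $\beta_2$. The minimum can be calculated by setting the partial derivatives of $S$ to zero:
    $\begin{aligned} 0 &= \frac{\partial S}{\partial \beta_1} = 8\beta_1 + 20\beta_2 - 56, \\ 0 &= \frac{\partial S}{\partial \beta_2} = 20\beta_1 + 60\beta_2 - 154. \end{aligned}$
These normal equations constitute a system of two linear equations in two unknowns. The solution is $\beta_1 = 3.5$ and $\beta_2 = 1.4$, and the best-fit line is therefore $y = 3.5 + 1.4x$. The residuals are $1.1$, $-1.3$, $-0.7$, and $0.9$ (see the diagram on the right). The minimum value of the sum of squared residuals is

    $S(3.5, 1.4) = 1.1^2 + (-1.3)^2 + (-0.7)^2 + 0.9^2 = 4.2.$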

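This calculation can be expressed in matrix notation as follows. The original system of equations is $\mathbf{y} = \mathbf{X}\boldsymbol\beta$, where

    $\mathbf{y} = \begin{bmatrix} 6 \\ 5 \\ 7 \\ 10 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad \boldsymbol\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}.$

Intuitively,
    $\mathbf{y} = \mathbf{X}\boldsymbol\beta \;\Rightarrow\; \mathbf{X}^\mathsf{T}\mathbf{y} = \mathbf{X}^\mathsf{T}\mathbf{X}\boldsymbol\beta \;\Rightarrow\; \boldsymbol\beta = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y} = \begin{bmatrix} 3.5 \\ 1.4 \end{bmatrix}.$
More rigorously, if $\mathbf{X}^\mathsf{T}\mathbf{X}$ is invertible, then the matrix $\mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}$ represents orthogonal projection onto the column space of $\mathbf{X}$. Therefore, among all vectors of the form $\mathbf{X}\boldsymbol\beta$, the one closest to $\mathbf{y}$ is $\mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}$. Setting
    $\mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y} = \mathbf{X}\boldsymbol\beta,$
it is evident that $\boldsymbol\beta = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}$ is a solution.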
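The line fit can be reproduced numerically with a few lines of plain Python. This is a minimal sketch that takes the four data points to be (1, 6), (2, 5), (3, 7) and (4, 10) and solves the 2×2 normal equations by Cramer's rule; the variable names are illustrative only.

```python
# Example data points (assumed here): (1, 6), (2, 5), (3, 7), (4, 10).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [6.0, 5.0, 7.0, 10.0]
m = len(xs)

# Entries of X^T X and X^T y for the design matrix with rows [1, x_i].
sx = sum(xs)
sy = sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the normal equations
#   m*beta1  + sx*beta2  = sy
#   sx*beta1 + sxx*beta2 = sxy
# by Cramer's rule.
det = m * sxx - sx * sx
beta1 = (sxx * sy - sx * sxy) / det   # intercept
beta2 = (m * sxy - sx * sy) / det     # slope

residuals = [y - (beta1 + beta2 * x) for x, y in zip(xs, ys)]
S = sum(r * r for r in residuals)
print(beta1, beta2, residuals, S)
```

Up to floating-point rounding, this gives an intercept of 3.5, a slope of 1.4 and a minimum sum of squared residuals of 4.2. The residuals also sum to zero and are orthogonal to the x-values, which is what the normal equations require.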

Fitting a parabola

The result of fitting a quadratic function (in blue) through a set of data points (in red). In linear least squares the function need not be linear in the argument but only in the parameters that are determined to give the best fit.

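Suppose that the hypothetical researcher wishes to fit a parabola of the form $y = \beta_1 x^2$. Importantly, this model is still linear in the unknown parameters (now just $\beta_1$), so linear least squares still applies. The system of equations incorporating residuals is

    $\begin{aligned} 6 &= \beta_1 (1)^2 + r_1, \\ 5 &= \beta_1 (2)^2 + r_2, \\ 7 &= \beta_1 (3)^2 + r_3, \\ 10 &= \beta_1 (4)^2 + r_4. \end{aligned}$

The sum of squared residuals is

    $S(\beta_1) = (6 - \beta_1)^2 + (5 - 4\beta_1)^2 + (7 - 9\beta_1)^2 + (10 - 16\beta_1)^2.$

There is just one partial derivative to set to 0:
    $0 = \frac{\partial S}{\partial \beta_1} = 708\beta_1 - 498.$
The solution is $\beta_1 = 0.703$, and the fit model is $y = 0.703x^2$.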

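In matrix notation, the equations without residuals are again $\mathbf{y} = \mathbf{X}\boldsymbol\beta$, where now

    $\mathbf{y} = \begin{bmatrix} 6 \\ 5 \\ 7 \\ 10 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 \\ 4 \\ 9 \\ 16 \end{bmatrix}, \quad \boldsymbol\beta = \begin{bmatrix} \beta_1 \end{bmatrix}.$

By the same logic as above, the solution is

    $\boldsymbol\beta = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y} = \begin{bmatrix} 0.703 \end{bmatrix}.$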
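For a one-parameter model the normal equations collapse to a single division. The sketch below, again taking the data points to be (1, 6), (2, 5), (3, 7) and (4, 10), computes the coefficient of the parabola through the origin; the variable names are illustrative.

```python
# Example data points (assumed here): (1, 6), (2, 5), (3, 7), (4, 10).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [6.0, 5.0, 7.0, 10.0]

# Single-column design matrix: X_i = x_i^2.
design = [x * x for x in xs]

# Normal equation (X^T X) beta1 = X^T y reduces to a scalar division.
xtx = sum(v * v for v in design)                 # sum of x_i^4
xty = sum(v * y for v, y in zip(design, ys))     # sum of x_i^2 * y_i
beta1 = xty / xtx

residuals = [y - beta1 * v for v, y in zip(design, ys)]
S = sum(r * r for r in residuals)
print(round(beta1, 3), S)
```

Here the model has a single free parameter and no intercept, so the residuals need not sum to zero; they are only required to be orthogonal to the column of squared x-values.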

Fitting other curves and surfaces

More generally, one can have $n$ regressors $x_j$, and a linear model

    $y = \beta_0 + \sum_{j=1}^{n} \beta_j x_j.$

GNU

Debian GNU/Hurd with Xfce4 and web browser Midori
Developer: Community
Written in: Various (notably C and assembly language)
OS family: Unix-like
Working state: Current
Source model: Free software
Latest preview: 0.401 (1 April 2011)
Marketing target: Personal computers, mobile devices, embedded devices, servers, mainframes, supercomputers
Platforms: IA-32 (with Hurd kernel only) and Alpha, ARC, ARM, AVR32, Blackfin, C6x, ETRAX CRIS, FR-V, H8/300, Hexagon, Itanium, M32R, m68k, META, MicroBlaze, MIPS, MN103, OpenRISC, PA-RISC, PowerPC, s390, S+core, SuperH, SPARC, TILE64, Unicore32, x86, Xtensa (with Linux-libre kernel only)
Kernel type: Microkernel (GNU Hurd) or Monolithic kernel (GNU Linux-libre, fork of Linux)
Userland: GNU
License: GNU GPL, GNU LGPL, GNU AGPL, GNU FDL, GNU FSDG
Official website: gnu.org

GNU is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operating systems popularly known as Linux. Most of GNU is licensed under the GNU Project's own General Public License (GPL).

Richard Stallman, founder of the GNU project

GNU is also the project within which the free software concept originated. Richard Stallman, the founder of the project, views GNU as a "technical means to a social end". Relatedly, Lawrence Lessig states in his introduction to the second edition of Stallman's book Free Software, Free Society that in it Stallman has written about "the social aspects of software and how Free Software can create community and social justice".

Name

GNU is a recursive acronym for "GNU's Not Unix!", chosen because GNU's design is Unix-like, but differs from Unix by being free software and containing no Unix code. Stallman chose the name by using various plays on words, including the song The Gnu.

History

Development of the GNU operating system was initiated by Richard Stallman while he worked at the MIT Artificial Intelligence Laboratory. It was called the GNU Project, and was publicly announced by Stallman on September 27, 1983, on the net.unix-wizards and net.usoft newsgroups. Software development began on January 5, 1984, when Stallman quit his job at the Lab so that the Lab could not claim ownership or interfere with distributing GNU components as free software.

The goal was to bring a completely free software operating system into existence. Stallman wanted computer users to be free to study the source code of the software they use, share software with other people, modify the behavior of software, and publish their modified versions of the software. This philosophy was published as the GNU Manifesto in March 1985.

Richard Stallman's experience with the Incompatible Timesharing System (ITS), an early operating system written in assembly language that became obsolete due to discontinuation of PDP-10, the computer architecture for which ITS was written, led to a decision that a portable system was necessary. It was thus decided that the development would be started using C and Lisp as system programming languages, and that GNU would be compatible with Unix. At the time, Unix was already a popular proprietary operating system. The design of Unix was modular, so it could be reimplemented piece by piece.

Much of the needed software had to be written from scratch, but existing compatible third-party free software components were also used such as the TeX typesetting system, the X Window System, and the Mach microkernel that forms the basis of the GNU Mach core of GNU Hurd (the official kernel of GNU). With the exception of the aforementioned third-party components, most of GNU has been written by volunteers; some in their spare time, some paid by companies, educational institutions, and other non-profit organizations. In October 1985, Stallman set up the Free Software Foundation (FSF). In the late 1980s and 1990s, the FSF hired software developers to write the software needed for GNU.

As GNU gained prominence, interested businesses began contributing to development or selling GNU software and technical support. The most prominent and successful of these was Cygnus Solutions, now part of Red Hat.

Components

The system's basic components include the GNU Compiler Collection (GCC), the GNU C library (glibc), and GNU Core Utilities (coreutils), but also the GNU Debugger (GDB), GNU Binary Utilities (binutils), and the GNU Bash shell. GNU developers have contributed to Linux ports of GNU applications and utilities, which are now also widely used on other operating systems such as BSD variants, Solaris and macOS.

Many GNU programs have been ported to other operating systems, including proprietary platforms such as Microsoft Windows and macOS. GNU programs have been shown to be more reliable than their proprietary Unix counterparts.

As of January 2022, there are a total of 459 GNU packages (including decommissioned, 383 excluding) hosted on the official GNU development site.

GNU as an operating system

In its original meaning, and one still common in hardware engineering, the operating system is a basic set of functions to control the hardware and manage things like task scheduling and system calls. In modern terminology used by software developers, the collection of these functions is usually referred to as a kernel, while an 'operating system' is expected to have a more extensive set of programmes. The GNU project maintains two kernels itself, allowing the creation of pure GNU operating systems, but the GNU toolchain is also used with non-GNU kernels. Due to the two different definitions of the term 'operating system', there is an ongoing debate concerning the naming of distributions of GNU packages with a non-GNU kernel. (See below.)

With kernels maintained by GNU and FSF

Parabola GNU/Linux-libre, an example of an FSF approved distribution that uses a rolling release model

GNU Hurd

The original kernel of the GNU Project is the GNU Hurd microkernel, which was the original focus of the Free Software Foundation (FSF).

With the April 30, 2015 release of the Debian GNU/Hurd 2015 distro, GNU now provides all required components to assemble an operating system that users can install and use on a computer.

However, the Hurd kernel is not yet considered production-ready but rather a base for further development and non-critical application usage.

Linux-libre

As of 2012, a fork of the Linux kernel became officially part of the GNU Project in the form of Linux-libre, a variant of Linux with all proprietary components removed. The GNU Project has endorsed Linux-libre distributions, such as gNewSense, Trisquel and Parabola GNU/Linux-libre.

With non-GNU kernels

gNewSense, an example of an FSF approved distribution

Because of the development status of Hurd, GNU is usually paired with other kernels such as Linux or FreeBSD. Whether the combination of GNU libraries with external kernels is a GNU operating system with a kernel (e.g. GNU with Linux), because the GNU collection renders the kernel into a usable operating system as understood in modern software development, or whether the kernel is an operating system unto itself with a GNU layer on top (i.e. Linux with GNU), because the kernel can operate a machine without GNU, is a matter of ongoing debate. The FSF maintains that an operating system built using the Linux kernel and GNU tools and utilities should be considered a variant of GNU, and promotes the term GNU/Linux for such systems (leading to the GNU/Linux naming controversy). This view is not exclusive to the FSF. Notably, Debian, one of the biggest and oldest Linux distributions, refers to itself as Debian GNU/Linux.

Copyright, GNU licenses, and stewardship

The GNU Project recommends that contributors assign the copyright for GNU packages to the Free Software Foundation, though the Free Software Foundation considers it acceptable to release small changes to an existing project to the public domain. However, this is not required; package maintainers may retain copyright to the GNU packages they maintain, though since only the copyright holder may enforce the license used (such as the GNU GPL), the copyright holder in this case enforces it rather than the Free Software Foundation.

For the development of needed software, Stallman wrote a license called the GNU General Public License (first called Emacs General Public License), with the goal of guaranteeing users the freedom to share and change free software. Stallman wrote this license after his experience with James Gosling and a company called UniPress, over a controversy around software code use in the GNU Emacs program. For most of the 1980s, each GNU package had its own license: the Emacs General Public License, the GCC General Public License, etc. In 1989, the FSF published a single license that it could use for all its software, and which could be used by non-GNU projects: the GNU General Public License (GPL).

This license is now used by most of GNU software, as well as a large number of free software programs that are not part of the GNU Project; it also historically has been the most commonly used free software license (though recently challenged by the MIT license). It gives all recipients of a program the right to run, copy, modify and distribute it, while forbidding them from imposing further restrictions on any copies they distribute. This idea is often referred to as copyleft.

In 1991, the GNU Lesser General Public License (LGPL), then known as the Library General Public License, was written for the GNU C Library to allow it to be linked with proprietary software. 1991 also saw the release of version 2 of the GNU GPL. The GNU Free Documentation License (FDL), for documentation, followed in 2000. The GPL and LGPL were revised to version 3 in 2007, adding clauses to protect users against hardware restrictions that prevent users from running modified software on their own devices.

Besides GNU's packages, the GNU Project's licenses can be and are used by many unrelated projects, such as the Linux kernel, which is often used with GNU software. A majority of free software, such as the X Window System, is licensed under permissive free software licenses.

The original GNU logo, drawn by Etienne Suvasa
Anniversary logo

The logo for GNU is a gnu head. Originally drawn by Etienne Suvasa, a bolder and simpler version designed by Aurelio Heckert is now preferred. It appears in GNU software and in printed and electronic documentation for the GNU Project, and is also used in Free Software Foundation materials.

There was also a modified version of the official logo. It was created by the Free Software Foundation in September 2013 in order to commemorate the 30th anniversary of the GNU Project.

Free software movement

https://en.wikipedia.org/wiki/Free_software_movement

The free software movement is a social movement with the goal of obtaining and guaranteeing certain freedoms for software users, namely the freedoms to run, study, modify, and share copies of software. Software which meets these requirements, The Four Essential Freedoms of Free Software, is termed free software.

Although drawing on traditions and philosophies among members of the 1970s hacker culture and academia, Richard Stallman formally founded the movement in 1983 by launching the GNU Project. Stallman later established the Free Software Foundation in 1985 to support the movement.

Philosophy

The philosophy of the Free Software Movement is based on promoting collaboration between programmers and computer users. This process necessitates the rejection of proprietary software and the promotion of free software. Stallman notes that this action would not hinder the progression of technology, as he states, "Wasteful duplication of system programming effort will be avoided. This effort can go instead into advancing the state of the art."

Members of the Free Software Movement believe that all software users should have the freedoms listed in The Free Software Definition. Members hold the belief that it is immoral to prohibit or prevent people from exercising these freedoms, and that they are required in creating a community where software users can help each other and have control over their technology. Regarding proprietary software, some believe that it is not strictly immoral, citing increased profitability in the business models available for proprietary software, along with technical features and convenience.

The Free Software Foundation believes all software needs free documentation, as programmers should have the ability to update manuals to reflect modifications made to the software. Within the movement, the FLOSS Manuals foundation specializes in providing such documentation.

Actions

GNU and Tux mascots around free software supporters at FISL 16

Writing and spreading free software

The core work of the free software movement is focused on software development. Its members also reject proprietary software, refusing to install software that does not give them the freedoms of free software. According to Stallman, "The only thing in the software field that is worse than an unauthorised copy of a proprietary program, is an authorised copy of the proprietary program because this does the same harm to its whole community of users, and in addition, usually the developer, the perpetrator of this evil, profits from it."

Building awareness

Some supporters of the free software movement take up public speaking, or host a stall at software-related conferences to raise awareness of software freedom. This is seen as important since people who receive free software, but who are not aware that it is free software, will later accept a non-free replacement or will add software that is not free software.

Legislation and government

A lot of lobbying work has been done against software patents and expansions of copyright law. Other lobbying focuses directly on the use of free software by government agencies and government-funded projects.

Asia

China

In June 1997, the Society for Study, Application, and Development of Free Software was established under the China Software Industry Association in Beijing. Through this organization, the website freesoft.cei.gov.cn was developed, though the website is currently inaccessible from IP addresses located in the United States. The use of the open-source software Linux in China has moved beyond government and educational institutions and has extended to other organizations such as financial institutions, telecommunications, and public security. Several Chinese researchers and scholars have claimed that the existence of FOSS in China has been important in challenging the presence of Microsoft; Guangnan Ni, a member of the Chinese Academy of Engineering, stated that "The monopoly of (Microsoft Windows) is even more powerful in China than other places in the world". Yi Zhou, a professor of mathematics at Fudan University, has also alleged that "Government procurement of FLOSS for a number of years in China has compelled Microsoft to cut its prices of Office software substantially".

India

The Government of India issued its Policy on Adoption of Open Source Software for Government of India in 2015 to drive uptake within the government. With the vision of transforming India into a Software Product Nation, the National Policy on Software Products-2019 was approved by the Government.

Pakistan

Free and Open Source Software (FOSS) is crucial for countries such as Pakistan which is set up by Union of Information Technology. In Pakistan, the Pakistan Software Export Board (PSEB) helps create and advocate FOSS usage in various government departments, in addition to curbing the illegal copying of software (software piracy). Promoting the adoption of FOSS is essential; however, it faces problems from anti-competitive proprietary software practices, including bribery and corruption in government departments. Pakistan is working on introducing open-source solutions into the curricula of schools and colleges, because of the distinctive political, democratic, and social aspects of FOSS in information and communication technology.

North America

United States

In the United States, there have been efforts to pass legislation at the state level encouraging the use of free software by state government agencies.

On January 11, 2022, two bills were presented on the floor of the New Hampshire legislature. The first bill, "HB 1273", was introduced by Democratic New Hampshire representative Eric Gallager; it prioritized "replacing proprietary software used by state agencies with free software." Gallager stated that, to an extent, the proposed legislation would help distinguish "free software" from "open-source software" and would bring both under state regulation. The second bill, "HB 1581", was proposed by Grafton Republican representative Lex Berezhny. The bill would have restored a requirement forcing "state agencies to use proprietary software" and, as Berezhny put it, "when it is the most effective solution." He also said that the requirement had been in effect between 2012 and 2018. According to the Concord Monitor, the state of New Hampshire already had a "thriving open source software community" with a view of "live free or die", but the community had difficulty getting that notion across to the state.

South America

Peru

Congressmen Edgar David Villanueva and Jacques Rodrich Ackerman have been instrumental in introducing free software in Peru, with bill 1609 on "Free Software in Public Administration". The bill drew the attention of Microsoft Peru, whose general manager wrote a letter to Villanueva. His response received worldwide attention and is seen as a classic piece of argumentation favouring the use of free software in governments.

Uruguay

Uruguay has enacted a law requiring that the state give priority to free software. The law also requires that information be exchanged in open formats.

Venezuela

The Government of Venezuela implemented a free software law in January 2006. Decree No. 3,390 mandated all government agencies to migrate to free software over a two-year period.

Europe

Publiccode.eu is a campaign demanding legislation requiring that publicly financed software developed for the public sector be made publicly available under a Free and Open Source Software licence. Its argument is that if it is public money, it should be public code as well.

France

The French Gendarmerie and the French National Assembly utilize the open source operating system Linux.

United Kingdom

Gov.uk keeps a list of "key components, tools and services that have gone into the construction of GOV.UK".

Events

Free software events held around the world connect people, increasing the visibility of free software projects and fostering collaboration.

Economics

The free software movement has been extensively analyzed using economic methodologies, including perspectives from heterodox economics. Of particular interest to economists is the willingness of programmers in the free software movement to work without financial compensation, often producing higher-quality work than proprietary programmers.

In his 1998 article "The High-Tech Gift Economy", Richard Barbrook suggested that the then-nascent free software movement represented a return to the gift economy building on hobbyism and the absence of economic scarcity on the internet.

Gabriella Coleman has emphasized the importance of accreditation, respect, and honour within the free software community as a form of compensation for contributions to projects, over and against financial motivations.

The Swedish Marxian economist Johan Söderberg has argued that the free software movement represents a complete alternative to capitalism that may be expanded to create a post-work society. He argues that the combination of a manipulation of intellectual property law and private property to make goods available to the public and a thorough blend between labor and fun make the free software movement a communist economy.

Subgroups and schisms

Since the movement's inception, there has been ongoing contention among the many FLOSS organizations within it (FSF, OSI, Debian, Mozilla Foundation, Apache Foundation, etc.), with the main conflicts centered on the organizations' need for compromise and pragmatism rather than adherence to founding values and philosophies.

Open source

The Open Source Initiative (OSI) was founded in February 1998 by Eric Raymond and Bruce Perens to promote the term "open-source software" as an alternative term for free software. The OSI aimed to address the perceived shortcomings and ambiguity of the term "free software", and to shift the focus of free software from a social and ethical issue to an emphasis on open source as a superior model for software development. The latter became the view of Eric Raymond and Linus Torvalds, while Bruce Perens argued that open source was meant to popularize free software under a new brand and called for a return to basic ethical principles.

Some free software advocates use the terms "Free and Open-Source Software" (FOSS) or "Free/Libre and Open-Source Software" (FLOSS) as a form of inclusive compromise, which brings free and open-source software advocates together to work on projects cohesively. Some users believe this is an ideal solution for promoting both the user's freedom with the software and the pragmatic efficiency of an open-source development model. This view is reinforced by the fact that the majority of OSI-approved licenses and self-avowed open-source programs are also compatible with the free software formalisms, and vice versa.

While free and open source software are often linked together, they offer two separate ideas and values. Richard Stallman has referred to open source as "a non-movement", as it "does not campaign for anything".

"Open source" addresses software being open as a practical question rather than an ethical dilemma – non-free software is not the best solution but nonetheless a solution. The free software movement views free software as a moral imperative: that proprietary software should be rejected, and that only free software should be developed and taught in order to make computing technology beneficial to the general public.

Although the movements have differing values and goals, collaborations between the Free Software Movement and Open Source Initiative have taken place when it comes to practical projects. By 2005, Richard Glass considered the differences to be a "serious fracture" but "vitally important to those on both sides of the fracture" and "of little importance to anyone else studying the movement from a software engineering perspective" since they have had "little effect on the field".

Criticism and controversy

Principle compromises

Eric Raymond criticises the speed at which the free software movement is progressing, suggesting that temporary compromises should be made for long-term gains. Raymond argues that this could raise awareness of the software and thus increase the free software movement's influence on relevant standards and legislation.

Richard Stallman, on the other hand, sees the current level of compromise as a greater cause for worry.

Programmer income

Stallman said that this is where people get the misconception of "free": there is nothing wrong with programmers requesting payment for a proposed project, or charging for copies of free software. Restricting and controlling the user's decisions on use is the actual violation of freedom. Stallman maintains that in some cases a monetary incentive is not necessary for motivation, since the pleasure of expressing creativity is a reward in itself. Conversely, Stallman admits that it is not easy to raise money for free software projects.

"Viral" copyleft licensing

The free software movement champions copyleft licensing schema (often pejoratively called "viral licenses"). In its strongest form, copyleft mandates that any works derived from copyleft-licensed software must also carry a copyleft license, so the license spreads from work to work like a computer virus might spread from machine to machine. Stallman has previously stated his opposition to describing the GNU GPL as "viral". These licensing terms can only be enforced through asserting copyrights.

Critics of copyleft licensing challenge the idea that restricting modifications is in line with the free software movement's emphasis on various "freedoms", especially when alternatives like MIT, BSD, and Apache licenses are more permissive. Proponents enjoy the assurance that copylefted work cannot usually be incorporated into non-free software projects. They emphasize that copyleft licenses may not attach for all uses and that in any case, developers can simply choose not to use copyleft-licensed software.

License proliferation and compatibility

FLOSS license proliferation is a serious concern in the FLOSS domain because it increases the complexity of license compatibility considerations, which limits and complicates source code reuse between FLOSS projects. The OSI and the FSF maintain their own lists of dozens of existing and acceptable FLOSS licenses. Most agree that the creation of new licenses should be minimized and that those created should be made compatible with the major existing FLOSS licenses. For this reason, there was strong controversy around the update of the GNU GPLv2 to the GNU GPLv3 in 2007, as the updated license is not compatible with the previous version. Several projects (mostly of the open source faction, like the Linux kernel) decided not to adopt the GPLv3, while almost all of the GNU project's packages adopted it.
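
The compatibility problem described above can be illustrated with a minimal Python sketch. The table `CAN_BE_USED_IN` and the helpers `can_reuse` and `blocked_dependencies` are hypothetical names introduced here, and the table is a deliberately small simplification assumed for illustration (it follows the FSF's published positions for these four licenses, identified by their SPDX names, and ignores "or later" variants and linking exceptions).

```python
# Simplified, illustrative table (an assumption of this sketch, not a
# complete or authoritative list): for each source license, the set of
# project licenses that code under it can be combined into.
CAN_BE_USED_IN = {
    "MIT": {"MIT", "Apache-2.0", "GPL-2.0-only", "GPL-3.0-only"},
    "Apache-2.0": {"Apache-2.0", "GPL-3.0-only"},
    "GPL-2.0-only": {"GPL-2.0-only"},
    "GPL-3.0-only": {"GPL-3.0-only"},
}


def can_reuse(source_license: str, target_license: str) -> bool:
    """Return True if code under source_license may go into a project
    released under target_license, according to the table above."""
    return target_license in CAN_BE_USED_IN.get(source_license, set())


def blocked_dependencies(target_license: str, dependency_licenses: list[str]) -> list[str]:
    """List the dependency licenses that cannot be combined into the target."""
    return [lic for lic in dependency_licenses if not can_reuse(lic, target_license)]


if __name__ == "__main__":
    # A GPLv3 project can take MIT and Apache-2.0 code, but not GPLv2-only code.
    print(blocked_dependencies("GPL-3.0-only", ["MIT", "GPL-2.0-only", "Apache-2.0"]))
```

Every license added to such a table adds a row and a column of pairwise questions, which is one way to see why proliferation complicates code reuse between projects.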

Tubulin

From Wikipedia, the free encyclopedia
 
KIF1A head-microtubule complex structure in ATP form

Tubulin in molecular biology can refer either to the tubulin protein superfamily of globular proteins, or one of the member proteins of that superfamily. α- and β-tubulins polymerize into microtubules, a major component of the eukaryotic cytoskeleton. Microtubules function in many essential cellular processes, including mitosis. Tubulin-binding drugs kill cancerous cells by inhibiting microtubule dynamics, which are required for DNA segregation and therefore cell division.

In eukaryotes, there are six members of the tubulin superfamily, although not all are present in all species. Both α and β tubulins have a mass of around 50 kDa and are thus in a similar range compared to actin (with a mass of ~42 kDa). In contrast, tubulin polymers (microtubules) tend to be much bigger than actin filaments due to their cylindrical nature.

Tubulin was long thought to be specific to eukaryotes. More recently, however, several prokaryotic proteins have been shown to be related to tubulin.

Characterization

Tubulin is characterized by the evolutionarily conserved Tubulin/FtsZ family GTPase protein domain.

This GTPase protein domain is found in all eukaryotic tubulin chains, as well as the bacterial protein TubZ, the archaeal protein CetZ, and the FtsZ protein family widespread in bacteria and archaea.

Function

Microtubules

Tubulin and microtubule metrics infographic

α- and β-tubulin polymerize into dynamic microtubules. In eukaryotes, microtubules are one of the major components of the cytoskeleton, and function in many processes, including structural support, intracellular transport, and DNA segregation.

Comparison of the architectures of a 5-protofilament bacterial microtubule (left; BtubA in dark blue; BtubB in light-blue) and a 13-protofilament eukaryotic microtubule (right; α-tubulin in white; β-tubulin in black). Seams and start-helices are indicated in green and red, respectively.

Microtubules are assembled from dimers of α- and β-tubulin. These subunits are slightly acidic, with an isoelectric point between 5.2 and 5.8. Each has a molecular weight of approximately 50 kDa.

To form microtubules, the dimers of α- and β-tubulin bind to GTP and assemble onto the (+) ends of microtubules while in the GTP-bound state. The β-tubulin subunit is exposed on the plus end of the microtubule, while the α-tubulin subunit is exposed on the minus end. After the dimer is incorporated into the microtubule, the molecule of GTP bound to the β-tubulin subunit eventually hydrolyzes into GDP through inter-dimer contacts along the microtubule protofilament. The GTP molecule bound to the α-tubulin subunit is not hydrolyzed during the whole process. Whether the β-tubulin member of the tubulin dimer is bound to GTP or GDP influences the stability of the dimer in the microtubule. Dimers bound to GTP tend to assemble into microtubules, while dimers bound to GDP tend to fall apart; thus, this GTP cycle is essential for the dynamic instability of the microtubule.

Bacterial microtubules

Homologs of α- and β-tubulin have been identified in the Prosthecobacter genus of bacteria. They are designated BtubA and BtubB to identify them as bacterial tubulins. Both exhibit homology to both α- and β-tubulin. While structurally highly similar to eukaryotic tubulins, they have several unique features, including chaperone-free folding and weak dimerization. Cryogenic electron microscopy showed that BtubA/B forms microtubules in vivo, and suggested that these microtubules comprise only five protofilaments, in contrast to eukaryotic microtubules, which usually contain 13. Subsequent in vitro studies have shown that BtubA/B forms four-stranded 'mini-microtubules'.

DNA segregation

Cell division

Prokaryotic division

FtsZ is found in nearly all Bacteria and Archaea, where it functions in cell division, localizing to a ring in the middle of the dividing cell and recruiting other components of the divisome, the group of proteins that together constrict the cell envelope to pinch off the cell, yielding two daughter cells. FtsZ can polymerize into tubes, sheets, and rings in vitro, and forms dynamic filaments in vivo.

TubZ functions in segregating low copy-number plasmids during bacterial cell division. The protein forms a structure unusual for a tubulin homolog; two helical filaments wrap around one another. This may reflect an optimal structure for this role since the unrelated plasmid-partitioning protein ParM exhibits a similar structure.

Cell shape

CetZ functions in cell shape changes in pleomorphic Haloarchaea. In Haloferax volcanii, CetZ forms dynamic cytoskeletal structures required for differentiation from a plate-shaped cell form into a rod-shaped form that exhibits swimming motility.

Types

Eukaryotic

The tubulin superfamily contains six families (alpha-(α), beta-(β), gamma-(γ), delta-(δ), epsilon-(ε), and zeta-(ζ) tubulins).

α-Tubulin

Human α-tubulin subtypes include:

β-Tubulin

β-tubulin in Tetrahymena sp.

All drugs that are known to bind to human tubulin bind to β-tubulin. These include paclitaxel, colchicine, and the vinca alkaloids, each of which has a distinct binding site on β-tubulin.

In addition, several anti-worm drugs preferentially target the colchicine site of β-tubulin in worms rather than in higher eukaryotes. While mebendazole still retains some binding affinity to human and Drosophila β-tubulin, albendazole almost exclusively binds to the β-tubulin of worms and other lower eukaryotes.

Class III β-tubulin is a microtubule element expressed exclusively in neurons, and is a popular identifier specific for neurons in nervous tissue. It binds colchicine much more slowly than other isotypes of β-tubulin.

β1-tubulin, sometimes called class VI β-tubulin, is the most divergent at the amino acid sequence level. It is expressed exclusively in megakaryocytes and platelets in humans and appears to play an important role in the formation of platelets. When class VI β-tubulin is expressed in mammalian cells, it disrupts the microtubule network, causes microtubule fragments to form, and can ultimately produce marginal-band-like structures of the kind present in megakaryocytes and platelets.

Katanin is a protein complex that severs microtubules at β-tubulin subunits, and is necessary for rapid microtubule transport in neurons and in higher plants.

Human β-tubulins subtypes include:

γ-Tubulin

γ-Tubulin ring complex (γ-TuRC)

γ-Tubulin, another member of the tubulin family, is important in the nucleation and polar orientation of microtubules. It is found primarily in centrosomes and spindle pole bodies, since these are the areas of most abundant microtubule nucleation. In these organelles, several γ-tubulin and other protein molecules are found in complexes known as γ-tubulin ring complexes (γ-TuRCs), which chemically mimic the (+) end of a microtubule and thus allow microtubules to bind. γ-Tubulin has also been isolated as a dimer and as part of a γ-tubulin small complex (γ-TuSC), intermediate in size between the dimer and the γ-TuRC. Nucleation by γ-tubulin is the best understood mechanism of microtubule nucleation, but mutation and RNAi studies that inhibited its correct expression have indicated that certain cells may be able to adapt to its absence. Besides forming a γ-TuRC to nucleate and organize microtubules, γ-tubulin can polymerize into filaments that assemble into bundles and meshworks.

Human γ-tubulin subtypes include:

Members of the γ-tubulin ring complex:

δ and ε-Tubulin

Delta (δ) and epsilon (ε) tubulin have been found to localize at centrioles and may play a role in centriole structure and function, though neither is as well-studied as the α- and β- forms.

Human δ- and ε-tubulin genes include:

ζ-Tubulin

Zeta-tubulin (IPR004058) is present in many eukaryotes, but missing from others, including placental mammals. It has been shown to be associated with the basal foot structure of centrioles in multiciliated epithelial cells.

Prokaryotic

BtubA/B

BtubA (Q8GCC5) and BtubB (Q8GCC1) are found in some bacterial species in the Verrucomicrobiota genus Prosthecobacter. Their evolutionary relationship to eukaryotic tubulins is unclear, although they may have descended from a eukaryotic lineage by lateral gene transfer. Compared to other bacterial homologs, they are much more similar to eukaryotic tubulins. In an assembled structure, BtubB acts like α-tubulin and BtubA acts like β-tubulin.

FtsZ

Many bacterial and euryarchaeotal cells use FtsZ to divide via binary fission. All chloroplasts and some mitochondria, both organelles derived from endosymbiosis of bacteria, also use FtsZ. It was the first prokaryotic cytoskeletal protein identified.

TubZ

TubZ (Q8KNP3; pBt156) was identified in Bacillus thuringiensis as essential for plasmid maintenance. It binds to a DNA-binding protein called TubR (Q8KNP2; pBt157) to pull the plasmid around.

CetZ

CetZ (D4GVD7) is found in the euryarchaeal clades of Methanomicrobia and Halobacteria, where it functions in cell shape differentiation.

Phage tubulins

Phages of the genus Phikzlikevirus, as well as a Serratia phage PCH45, use a shell protein (Q8SDA8) to build a nucleus-like structure called the phage nucleus. This structure encloses DNA as well as replication and transcription machinery. It protects phage DNA from host defenses like restriction enzymes and type I CRISPR-Cas systems. A spindle-forming tubulin, variously named PhuZ (B3FK34) and gp187, centers the nucleus in the cell.

Odinarchaeota tubulin

Asgard archaea tubulin from hydrothermal-living Odinarchaeota (OdinTubulin) was identified as a genuine tubulin. OdinTubulin forms protomers and protofilaments most similar to eukaryotic microtubules, yet assembles into ring systems more similar to FtsZ, indicating that OdinTubulin may represent an evolutionary intermediate between FtsZ and microtubule-forming tubulins.

Pharmacology

Tubulins are targets for anticancer drugs such as vinblastine, vincristine, and paclitaxel. The anti-worm drugs mebendazole and albendazole, as well as the anti-gout agent colchicine, bind to tubulin and inhibit microtubule formation. While the former ultimately lead to cell death in worms, the latter arrests neutrophil motility and decreases inflammation in humans. The anti-fungal drug griseofulvin targets microtubule formation and has applications in cancer treatment.

Post-translational modifications

When incorporated into microtubules, tubulin accumulates a number of post-translational modifications, many of which are unique to these proteins. These modifications include detyrosination, acetylation, polyglutamylation, polyglycylation, phosphorylation, ubiquitination, sumoylation, and palmitoylation. Tubulin is also prone to oxidative modification and aggregation during, for example, acute cellular injury.

Microtubule acetylation is currently the subject of many scientific investigations, especially acetylation by α-tubulin N-acetyltransferase (ATAT1), which is being shown to play an important role in many biological and molecular functions and is therefore also associated with many human diseases, especially neurological diseases.

Introduction to entropy

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Introduct...