Bio-inspired computing, short for biologically inspired computing, is a field of study which seeks to solve computer science problems using models of biology. It relates to connectionism, social behavior, and emergence. Within computer science, bio-inspired computing relates to artificial intelligence and machine learning. Bio-inspired computing is a major subset of natural computation.
History
Early Ideas
The ideas behind biological computing trace back to 1936 and the first description of an abstract computer, now known as a Turing machine. Turing first described the abstract construct using a biological specimen: he imagined a mathematician with three important attributes. The mathematician always has a pencil with an eraser, an unlimited supply of paper, and a working set of eyes. The eyes allow the mathematician to see and perceive any symbols written on the paper, while the pencil allows him to write and erase any symbols he wants. Lastly, the unlimited paper allows him to store anything he wants in memory. Using these ideas, Turing was able to describe an abstraction of the modern digital computer. However, Turing noted that anything that can perform these functions can be considered such a machine, and he even said that electricity should not be required to describe digital computation and machine thinking in general.
Neural Networks
First described in 1943 by Warren McCulloch and Walter Pitts, neural networks are a prevalent example of biological systems inspiring the creation of computer algorithms. McCulloch and Pitts first showed mathematically that a system of simple neurons can produce simple logical operations such as logical conjunction, disjunction, and negation. They further showed that a system of neural networks can be used to carry out any calculation that requires finite memory. Around 1970, research on neural networks slowed down, and many consider a 1969 book by Marvin Minsky and Seymour Papert to be the main cause. Their book showed that the neural network models of the time were only able to model systems based on Boolean functions that are true only after a certain threshold value. Such functions are also known as threshold functions. The book also showed that a large number of systems cannot be represented in this way, meaning that they cannot be modeled by such neural networks. Another book, by David Rumelhart and James McClelland in 1986, brought neural networks back into the spotlight by demonstrating the back-propagation algorithm, which allowed the development of multi-layered neural networks that did not adhere to those limits.
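The threshold units discussed above can be illustrated with a minimal sketch. The weights and thresholds below are illustrative choices, not values taken from the 1943 paper, and the helper names (`threshold_unit`, `single_unit_can_compute_xor`) are hypothetical. The sketch builds conjunction, disjunction, and negation from single units, and uses a small brute-force search to suggest why exclusive-or needs more than one layer.

```python
from itertools import product

def threshold_unit(weights, threshold):
    """Return a neuron that outputs 1 when the weighted input sum reaches the threshold."""
    def fire(*inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return fire

# Illustrative parameter choices (assumptions, not from the original paper).
AND = threshold_unit((1, 1), 2)   # fires only when both inputs are 1
OR = threshold_unit((1, 1), 1)    # fires when at least one input is 1
NOT = threshold_unit((-1,), 0)    # fires only when the input is 0

def XOR(a, b):
    # XOR is not a threshold function, so it is composed from two layers of units.
    return AND(OR(a, b), NOT(AND(a, b)))

def single_unit_can_compute_xor(values=range(-3, 4)):
    """Brute-force search over small integer weights and thresholds for a single unit."""
    target = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    for w1, w2, t in product(values, repeat=3):
        unit = threshold_unit((w1, w2), t)
        if all(unit(a, b) == out for (a, b), out in target.items()):
            return True
    return False
```

The search covers only a small grid of integer parameters, so it is a demonstration rather than a proof; the general impossibility result is the one attributed above to Minsky and Papert.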
Ant Colonies
Douglas Hofstadter in 1979 described the idea of a biological system capable of performing intelligent calculations even though the individuals comprising the system might not be intelligent. More specifically, he gave the example of an ant colony that can carry out intelligent tasks together even though each individual ant cannot, exhibiting something called "emergent behavior."
Azimi et al. in 2009 showed that what they described as the "ant colony" algorithm, a clustering algorithm, is able to output the number of clusters and produce highly competitive final clusters comparable to those of other traditional algorithms. Lastly, Hölldobler and Wilson in 2009 concluded using historical data that ants have evolved to function as a single "superorganism" colony. This was a very important result, since it suggested that group selection evolutionary algorithms coupled with algorithms similar to the "ant colony" can potentially be used to develop more powerful algorithms.
Areas of research
Some areas of study in biologically inspired computing, and their biological counterparts:
Bio-inspired computing can be used to train a virtual insect. Equipped with six simple rules, the insect is trained to navigate unknown terrain in search of food:
turn right for target-and-obstacle left;
turn left for target-and-obstacle right;
turn left for target-left-obstacle-right;
turn right for target-right-obstacle-left;
turn left for target-left without obstacle;
turn right for target-right without obstacle.
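The six rules can be written down as a small lookup table. The sketch below is one possible reading of the rule wording, under the assumption that "target-and-obstacle left" means both the target and the obstacle are sensed on the left; the names `RULES` and `choose_turn` are hypothetical. It encodes only the target behaviour, not the spiking neural network that is trained to reproduce it.

```python
# Keys are (target_side, obstacle_side); None means no obstacle is sensed.
# This mapping is an interpretation of the six rules, not a published specification.
RULES = {
    ("left", "left"): "right",    # target and obstacle both on the left
    ("right", "right"): "left",   # target and obstacle both on the right
    ("left", "right"): "left",    # target left, obstacle right
    ("right", "left"): "right",   # target right, obstacle left
    ("left", None): "left",       # target left, no obstacle
    ("right", None): "right",     # target right, no obstacle
}

def choose_turn(target_side, obstacle_side=None):
    """Return the turn direction the rules prescribe for one sensor reading."""
    return RULES[(target_side, obstacle_side)]
```

Under this reading, the insect steers toward the target unless an obstacle lies on the same side, in which case it steers away.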
The virtual insect controlled by the trained spiking neural network can find food after training in any unknown terrain. After several generations of rule application, some forms of complex behaviour usually emerge. Complexity gets built upon complexity until the result is something markedly complex, and quite often completely counterintuitive given what the original rules would be expected to produce (see complex systems). For this reason, when modeling the neural network, it is necessary to accurately model an in vivo network by live collection of "noise" coefficients that can be used to refine statistical inference and extrapolation as system complexity increases.
Natural evolution is a good analogy to this method: the rules of evolution (selection, recombination/reproduction, mutation, and more recently transposition) are in principle simple rules, yet over millions of years they have produced remarkably complex organisms. A similar technique is used in genetic algorithms.
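A minimal sketch of a genetic algorithm built from selection, recombination, and mutation is shown below. The bit-counting objective, the tournament size, the mutation rate, and all function names are assumptions chosen for illustration; transposition is omitted, and the best individual is carried over unchanged each generation (elitism), which is a common addition rather than part of the description above.

```python
import random

def fitness(genome):
    """Toy objective (an assumption for illustration): count the 1 bits."""
    return sum(genome)

def select(population, rng, k=3):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)

def recombine(a, b, rng):
    """One-point crossover between two parents of equal length."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def evolve(length=32, pop_size=40, generations=60, seed=0):
    """Return the best fitness seen in each generation, starting with the initial one."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        elite = max(population, key=fitness)  # keep the best individual unchanged
        children = [
            mutate(recombine(select(population, rng), select(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
        history.append(max(map(fitness, population)))
    return history
```

Because the elite is preserved, the best fitness recorded in `history` never decreases from one generation to the next.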
Brain-inspired computing
Brain-inspired computing refers to computational models and methods that are mainly based on the mechanisms of the brain, rather than completely imitating the brain. The goal is to enable machines to realize the various cognitive abilities and coordination mechanisms of human beings in a brain-inspired manner, and ultimately to reach or exceed human-level intelligence.
Research
Artificial intelligence researchers are now aware of the benefits of learning from the brain's information processing mechanisms, and progress in brain science and neuroscience provides the necessary basis for doing so. Brain and neuroscience researchers are also trying to apply their understanding of brain information processing to a wider range of scientific fields. The development of the discipline benefits from the push of information technology and smart technology, and in turn brain science and neuroscience will inspire the next generation of information technology.
The influence of brain science on brain-inspired computing
Advances in brain science and neuroscience, especially with the help of new technologies and new equipment, allow researchers to obtain multi-scale, multi-type biological evidence about the brain through different experimental methods, and to try to reveal the structural and functional basis of biological intelligence from different aspects. From microscopic neurons, synaptic working mechanisms, and their characteristics, to the mesoscopic network connection model, to the links between macroscopic brain regions and their synergistic characteristics, the multi-scale structure and functional mechanisms of brains derived from these experimental and mechanistic studies will provide important inspiration for building future brain-inspired computing models.
Brain-inspired chip
Broadly speaking, a brain-inspired chip is a chip designed with reference to the structure of human brain neurons and the cognitive mode of the human brain. The "neuromorphic chip" is a brain-inspired chip whose structure is designed with reference to the human brain neuron model and its tissue structure, and it represents a major direction of brain-inspired chip research. Along with the rise and development of "brain plans" in various countries, a large number of research results on neuromorphic chips have emerged, which have received extensive international attention and are well known in academia and industry. Examples include the EU-backed SpiNNaker and BrainScaleS, Stanford's Neurogrid, IBM's TrueNorth, and Qualcomm's Zeroth.
TrueNorth is a brain-inspired chip that IBM has been developing for nearly 10 years. The US DARPA program has been funding IBM to develop pulsed neural network chips for intelligent processing since 2008. In 2011, IBM first developed two cognitive silicon prototypes by simulating brain structures that could learn and process information like the brain. Each neuron of a brain-inspired chip is cross-connected with massive parallelism. In 2014, IBM released a second-generation brain-inspired chip called "TrueNorth." Compared with the first-generation brain-inspired chips, the performance of the TrueNorth chip increased dramatically: the number of neurons increased from 256 to 1 million, and the number of programmable synapses increased from 262,144 to 256 million, while operating at a total power consumption of 70 mW and a power consumption of 20 mW per square centimeter. At the same time, the core volume of TrueNorth is only 1/15 that of the first generation of brain chips. IBM has since developed a prototype of a neuron computer that uses 16 TrueNorth chips and has real-time video processing capabilities. The specifications and performance of the TrueNorth chip caused a great stir in the academic world when it was released.
In 2012, the Institute of Computing Technology of the Chinese Academy of Sciences (CAS) and the French institute Inria collaborated to develop "Cambrian", the first chip in the world to support a deep neural network processor architecture. The work has won best paper awards at ASPLOS and MICRO, leading international conferences in the field of computer architecture, and its design method and performance have been recognized internationally. The chip is an outstanding representative of this research direction in brain-inspired chips.
Unclear brain mechanism cognition
The human brain is a product of evolution. Although its structure and information processing mechanisms are constantly optimized, compromises in the evolutionary process are inevitable. The cranial nervous system is a multi-scale structure. There are still several important open problems in the mechanism of information processing at each scale, such as the fine connection structure at the scale of neurons and the mechanism of brain-scale feedback. Therefore, even a comprehensive simulation with only 1/1000 of the neurons and synapses of the human brain is still very difficult to study at the current level of scientific research. Recent advances in brain simulation have linked individual variability in human cognitive processing speed and fluid intelligence to the balance of excitation and inhibition in structural brain networks, functional connectivity, winner-take-all decision-making, and attractor working memory.
Unclear brain-inspired computational models and algorithms
In future research on cognitive brain computing models, it is necessary to model the brain's information processing system based on the results of multi-scale analysis of brain neural system data, construct a brain-inspired multi-scale neural network computing model, and simulate, at multiple scales, the brain's multi-modal intelligent behavioral abilities such as perception, self-learning and memory, and choice. Machine learning algorithms are not flexible and require large amounts of high-quality, manually labeled sample data. Training models requires a lot of computational overhead. Brain-inspired artificial intelligence still lacks advanced cognitive ability and inferential learning ability.
Constrained computational architecture and capabilities
Most existing brain-inspired chips are still based on the von Neumann architecture, and most are still manufactured with traditional semiconductor materials. The neural chip borrows only the most basic unit of brain information processing. The most basic mechanisms of a brain-inspired computer system, such as the fusion of storage and computation, the pulse discharge mechanism, and the connection mechanism between neurons, as well as the mechanisms linking information processing units at different scales, have not been integrated into the study of brain-inspired computing architecture. An important international trend is now to develop neural computing components such as brain memristors, memory containers, and sensory sensors based on new materials such as nanomaterials, thus supporting the construction of more complex brain-inspired computing architectures. The development of brain-inspired computers and large-scale brain computing systems based on brain-inspired chips also requires a corresponding software environment to support their wide application.
Arcade system boards have used specialized graphics circuits since the 1970s. In early video game hardware, RAM for frame buffers was expensive, so video chips composited data together as the display was being scanned out on the monitor.
In 1984, Hitachi released the ARTC HD63484, the first major CMOS graphics processor for personal computers. The ARTC could display up to 4K resolution when in monochrome mode. It was used in a number of graphics cards and terminals during the late 1980s. In 1985, the Amiga was released with a custom graphics chip including a blitter for bitmap manipulation, line drawing, and area fill. It also included a coprocessor
with its own simple instruction set, that was capable of manipulating
graphics hardware registers in sync with the video beam (e.g. for
per-scanline palette switches, sprite multiplexing, and hardware
windowing), or driving the blitter. In 1986, Texas Instruments released the TMS34010, the first fully programmable graphics processor. It could run general-purpose code but also had a graphics-oriented
instruction set. During 1990–1992, this chip became the basis of the Texas Instruments Graphics Architecture ("TIGA") Windows accelerator cards.
In 1987, the IBM 8514 graphics system was released. It was one of the first video cards for IBM PC compatibles that implemented fixed-function 2D primitives in electronic hardware. Sharp's X68000, released in 1987, used a custom graphics chipset with a 65,536 color palette and hardware support for sprites, scrolling, and multiple playfields. It served as a development machine for Capcom's CP System arcade board. Fujitsu's FM Towns computer, released in 1989, had support for a 16,777,216 color palette. In 1988, the first dedicated polygonal 3D graphics boards were introduced in arcades with the Namco System 21 and Taito Air System.
In 1991, S3 Graphics introduced the S3 86C911, which its designers named after the Porsche 911 as an indication of the performance increase it promised. The 86C911 spawned a variety of imitators: by 1995, all major PC graphics chip makers had added 2D acceleration support to their chips. Fixed-function Windows accelerators
surpassed expensive general-purpose graphics coprocessors in Windows
performance, and such coprocessors faded from the PC market.
In the early- and mid-1990s, real-time 3D graphics became increasingly common in arcade, computer, and console games, which led to increasing public demand for hardware-accelerated 3D graphics. Early examples of mass-market 3D graphics hardware can be found in arcade system boards such as the Sega Model 1, Namco System 22, and Sega Model 2, and the fifth-generation video game consoles such as the Saturn, PlayStation, and Nintendo 64. Arcade systems such as the Sega Model 2 and SGI Onyx-based Namco Magic Edge Hornet Simulator in 1993 were capable of hardware T&L (transform, clipping, and lighting) years before it appeared in consumer graphics cards. Another early example is the Super FX chip, a RISC-based on-cartridge graphics chip used in some SNES games, notably Doom and Star Fox. Some systems used DSPs to accelerate transformations. Fujitsu, which worked on the Sega Model 2 arcade system, began working on integrating T&L into a single LSI solution for use in home computers in 1995; the Fujitsu Pinolite, the first 3D geometry processor for personal computers, was released in 1997. The first hardware T&L GPU on home video game consoles was the Nintendo 64's Reality Coprocessor, released in 1996. In 1997, Mitsubishi released the 3Dpro/2MP, a GPU capable of transformation and lighting, for workstations and Windows NT desktops; ATi used it for its FireGL 4000 graphics card, released in 1997.
The term "GPU" was coined by Sony in reference to the 32-bit Sony GPU (designed by Toshiba) in the PlayStation video game console, released in 1994.
In the PC world, notable failed attempts at low-cost 3D graphics chips included the S3 ViRGE, ATI Rage, and Matrox Mystique. These chips were essentially previous-generation 2D accelerators with 3D features bolted on. Many were pin-compatible with the earlier-generation chips for ease of implementation and minimal cost. Initially, 3D graphics were possible only with discrete boards dedicated to accelerating 3D functions (and lacking 2D graphical user interface (GUI) acceleration entirely) such as the PowerVR and the 3dfx Voodoo. However, as manufacturing technology continued to progress, video, 2D GUI acceleration, and 3D functionality were all integrated into one chip. Rendition's Vérité chipsets were among the first to do this well. In 1997, Rendition collaborated with Hercules and Fujitsu on a "Thriller Conspiracy" project which combined a Fujitsu FXG-1 Pinolite geometry processor with a Vérité V2200 core to create a graphics card with a full T&L engine years before Nvidia's GeForce 256. This card, designed to reduce the load placed upon the system's CPU, never made it to market. The Nvidia RIVA 128 was one of the first consumer-facing GPUs to integrate a 3D processing unit and a 2D processing unit on a single chip.
OpenGL
was introduced in the early 1990s by Silicon Graphics as a professional
graphics API, with proprietary hardware support for 3D rasterization.
In 1994, Microsoft acquired Softimage, the dominant CGI movie production tool used for early CGI movie hits like Jurassic Park, Terminator 2 and Titanic.
With that deal came a strategic relationship with SGI and a commercial
license of their OpenGL libraries, enabling Microsoft to port the API to
the Windows NT OS but not to the upcoming release of Windows 95.
Although it was little known at the time, SGI had contracted with
Microsoft to transition from Unix to the forthcoming Windows NT OS;
the deal which was signed in 1995 was not announced publicly until
1998. In the intervening period, Microsoft worked closely with SGI to
port OpenGL to Windows NT. In that era, OpenGL had no standard driver
model for competing hardware accelerators to compete on the basis of
support for higher level 3D texturing and lighting functionality. In
1994 Microsoft announced DirectX 1.0 and support for gaming in the
forthcoming Windows 95 consumer OS. In 1995 Microsoft announced the acquisition of UK based Rendermorphics Ltd
and the Direct3D driver model for the acceleration of consumer 3D
graphics. The Direct3D driver model shipped with DirectX 2.0 in 1996. It
included standards and specifications for 3D chip makers to compete to
support 3D texture, lighting and Z-buffering. ATI, which was later to be
acquired by AMD, began development on the first Direct3D GPUs. Nvidia
quickly pivoted from a failed deal with Sega
in 1996 to aggressively embracing support for Direct3D. In this era
Microsoft merged their internal Direct3D and OpenGL teams and worked
closely with SGI to unify driver standards for both industrial and
consumer 3D graphics hardware accelerators. Microsoft ran annual events
for 3D chip makers called "Meltdowns" to test their 3D hardware and
drivers to work both with Direct3D and OpenGL. It was during this period
of strong Microsoft influence over 3D standards that 3D accelerator
cards moved beyond being simple rasterizers
to become more powerful general purpose processors as support for
hardware accelerated texture mapping, lighting, Z-buffering and compute
created the modern GPU. During this period the same Microsoft team
responsible for Direct3D and OpenGL driver standardization introduced
their own Microsoft 3D chip design called Talisman. Details of this era are documented extensively in the books "Game of X" v.1 and v.2 by Russel Demaria, "Renegades of the Empire" by Mike Drummond, "Opening the Xbox" by Dean Takahashi and "Masters of Doom" by David Kushner. The Nvidia GeForce 256 (also known as NV10) was the first consumer-level card with hardware-accelerated T&L. While the OpenGL API provided software support for texture mapping and lighting, the first 3D hardware acceleration for these features arrived with the first Direct3D-accelerated consumer GPUs.
2000s
NVIDIA
released the GeForce 256, marketed as the world's first GPU,
integrating transform and lighting engines for advanced 3D graphics
rendering. Nvidia was first to produce a chip capable of programmable shading: the GeForce 3.
Each pixel could now be processed by a short program that could include
additional image textures as inputs, and each geometric vertex could
likewise be processed by a short program before it was projected onto
the screen. Used in the Xbox console, this chip competed with the one in the PlayStation 2,
which used a custom vector unit for hardware-accelerated vertex
processing (commonly referred to as VU0/VU1). The earliest incarnations
of shader execution engines used in Xbox were not general-purpose and
could not execute arbitrary pixel code. Vertices and pixels were
processed by different units, which had their own resources, with pixel
shaders having tighter constraints (because they execute at higher
frequencies than vertices). Pixel shading engines were more akin to a
highly customizable function block and did not "run" a program. Many of
these disparities between vertex and pixel shading were not addressed
until the Unified Shader Model.
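The idea of running a short program per vertex and per pixel can be sketched in plain Python. This is a conceptual illustration only: the function names, the integer texture format, and the scale-and-translate transform are assumptions, and real hardware runs such programs in parallel in a dedicated shading language rather than in serial loops.

```python
def vertex_program(vertex, scale, offset):
    """Runs once per vertex: a simple scale-and-translate into screen space."""
    x, y = vertex
    return (x * scale + offset[0], y * scale + offset[1])

def pixel_program(x, y, base, light):
    """Runs once per pixel: combine two texture inputs (hypothetical 2D lists of 0..255)."""
    texel = base[y % len(base)][x % len(base[0])]     # wrap-around texture lookup
    shade = light[y % len(light)][x % len(light[0])]  # second texture used as a light map
    return texel * shade // 255

def render(width, height, base, light):
    """Invoke the pixel program for every pixel of a width x height framebuffer."""
    return [[pixel_program(x, y, base, light) for x in range(width)]
            for y in range(height)]
```

For example, `render(2, 2, [[255, 0], [0, 255]], [[255]])` leaves the base texture unchanged, while a light map of `[[0]]` turns every pixel black.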
In October 2002, with the introduction of the ATI Radeon 9700 (also known as R300), the world's first Direct3D 9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating point
math, and were quickly becoming as flexible as CPUs, yet orders of
magnitude faster for image-array operations. Pixel shading is often used
for bump mapping, which adds texture to make an object look shiny, dull, rough, or even round or extruded.
With the introduction of the Nvidia GeForce 8 series and new generic stream processing units, GPUs became more generalized computing devices. Parallel GPUs are making computational inroads against the CPU, and a subfield of research, dubbed GPU computing or GPGPU for general purpose computing on GPU, has found applications in fields as diverse as machine learning, oil exploration, scientific image processing, linear algebra, statistics, 3D reconstruction, and stock options pricing. GPGPU
was the precursor to what is now called a compute shader (e.g. CUDA,
OpenCL, DirectCompute) and actually abused the hardware to a degree by
treating the data passed to algorithms as texture maps and executing
algorithms by drawing a triangle or quad with an appropriate pixel
shader. This entails some overhead, since units like the scan converter are involved where they are not needed (nor are triangle manipulations even a concern, except to invoke the pixel shader).
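The early GPGPU approach described above, in which data is passed as texture maps and the algorithm is executed by drawing a quad with a pixel shader, can be mimicked in a short Python sketch. All names here (`to_texture`, `draw_fullscreen_quad`, `add_shader`, `gpgpu_add`) are hypothetical stand-ins, and the nested loops only imitate what the rasterizer would do in parallel.

```python
def to_texture(values, width):
    """Pack a flat list into rows of `width` texels, padding the last row with zeros."""
    padded = values + [0] * (-len(values) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]

def draw_fullscreen_quad(shader, *textures):
    """Stand-in for rasterizing a quad: call the shader once per output texel."""
    height, width = len(textures[0]), len(textures[0][0])
    return [[shader(x, y, *textures) for x in range(width)] for y in range(height)]

def add_shader(x, y, tex_a, tex_b):
    """The 'pixel shader': element-wise addition of two input textures."""
    return tex_a[y][x] + tex_b[y][x]

def gpgpu_add(a, b, width=4):
    """Add two equal-length, non-empty lists by routing them through the texture path."""
    out = draw_fullscreen_quad(add_shader, to_texture(a, width), to_texture(b, width))
    flat = [value for row in out for value in row]
    return flat[:len(a)]  # drop the padding texels
```

The packing, padding, and read-back steps correspond to the kind of overhead the text mentions: none of them is part of the actual computation.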
Nvidia's CUDA platform, first introduced in 2007, was the earliest widely adopted programming model for GPU computing. OpenCL is an open standard defined by the Khronos Group that allows for the development of code for both GPUs and CPUs with an emphasis on portability. OpenCL solutions are supported by Intel, AMD, Nvidia, and ARM, and
according to a report in 2011 by Evans Data, OpenCL had become the
second most popular HPC tool.
2010s
In 2010, Nvidia partnered with Audi to power their cars' dashboards, using the Tegra GPU to provide increased functionality to cars' navigation and entertainment systems. Advances in GPU technology in cars helped advance self-driving technology. AMD's Radeon HD 6000 series cards were released in 2010, and in 2011 AMD released its 6000M Series discrete GPUs for mobile devices. The Kepler line of graphics cards by Nvidia was released in 2012 and was used in Nvidia's 600 and 700 series cards. A feature of this GPU microarchitecture was GPU Boost, a technology that adjusts the clock speed of a video card, increasing or decreasing it according to its power draw. The Kepler microarchitecture was manufactured on the 28 nm process.
The PS4 and Xbox One were released in 2013; they both use GPUs based on AMD's Radeon HD 7850 and 7790. Nvidia's Kepler line of GPUs was followed by the Maxwell line, manufactured on the same process. Nvidia's 28 nm chips were manufactured by TSMC
in Taiwan using the 28 nm process. Compared to the 40 nm technology
from the past, this manufacturing process allowed a 20 percent boost in
performance while drawing less power. Virtual reality headsets have high system requirements; manufacturers recommended the GTX 970 and the R9 290X or better at the time of their release. Cards based on the Pascal microarchitecture were released in 2016. The GeForce 10 series of cards belongs to this generation. They are made using the 16 nm manufacturing process, which improves upon previous microarchitectures. Nvidia released one non-consumer card under the new Volta
architecture, the Titan V. Changes from the Titan XP, Pascal's high-end
card, include an increase in the number of CUDA cores, the addition of
tensor cores, and HBM2.
Tensor cores are designed for deep learning, while high-bandwidth
memory is on-die, stacked, lower-clocked memory that offers an extremely
wide memory bus. To emphasize that the Titan V is not a gaming card,
Nvidia removed the "GeForce GTX" suffix it adds to consumer gaming
cards.
In 2018, Nvidia launched the RTX 20 series GPUs that added
ray-tracing cores to GPUs, improving their performance on lighting
effects. Polaris 11 and Polaris 10
GPUs from AMD are fabricated by a 14 nm process. Their release resulted
in a substantial increase in the performance per watt of AMD video
cards. AMD also released the Vega GPU series for the high end market as a
competitor to Nvidia's high end Pascal cards, also featuring HBM2 like
the Titan V.
In 2019, AMD released the successor to their Graphics Core Next (GCN) microarchitecture/instruction set. Dubbed RDNA, the first product featuring it was the Radeon RX 5000 series of video cards. The company announced that the successor to the RDNA microarchitecture would be incremental (a "refresh"). AMD unveiled the Radeon RX 6000 series, its RDNA 2 graphics cards with support for hardware-accelerated ray tracing. The product series, launched in late 2020, consisted of the RX 6800, RX 6800 XT, and RX 6900 XT. The RX 6700 XT, which is based on Navi 22, was launched in early 2021.
The PlayStation 5 and Xbox Series X and Series S were released in 2020; they both use GPUs based on the RDNA 2 microarchitecture with incremental improvements and different GPU configurations in each system's implementation.
Intel first entered the GPU market
in the late 1990s, but produced lackluster 3D accelerators compared to
the competition at the time. Rather than attempting to compete with the
high-end manufacturers Nvidia and ATI/AMD, they began integrating Intel Graphics Technology GPUs into motherboard chipsets, beginning with the Intel 810 for the Pentium III, and later into CPUs. They began with the Intel Atom 'Pineview' laptop processor in 2009, continuing in 2010 with desktop processors in the first generation of the Intel Core
line and with contemporary Pentiums and Celerons. This resulted in a
large nominal market share, as the majority of computers with an Intel
CPU also featured this embedded graphics processor. These generally
lagged behind discrete processors in performance. Intel re-entered the
discrete GPU market in 2022 with its Arc series, which competed with the then-current GeForce 30 series and Radeon 6000 series cards at competitive prices.
In the 2020s, GPUs have been increasingly used for calculations involving embarrassingly parallel problems, such as training of neural networks on enormous datasets that are needed for large language models. Specialized processing cores on some modern workstation GPUs are dedicated to deep learning, since they offer significant FLOPS performance increases by using 4×4 matrix multiply-accumulate operations, resulting in hardware performance up to 128 TFLOPS in some applications. These tensor cores are expected to appear in consumer cards, as well.
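The 4×4 matrix operation performed by tensor cores is commonly described as a multiply-accumulate, D = A × B + C. The sketch below assumes that formulation and uses plain Python lists; the function name is hypothetical, and real tensor cores work on reduced-precision inputs in hardware rather than on Python numbers.

```python
def matmul_accumulate(a, b, c):
    """Return D = A x B + C for square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]
```

For 4×4 inputs, each of the 16 outputs needs 4 multiplications and 4 additions, so one such operation amounts to 64 multiply-adds; performing many of them per clock cycle is what produces the large FLOPS figures quoted for these cores.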
GPU companies
Many companies have produced GPUs under a number of brand names. In 2009, Intel, Nvidia, and AMD/ATI were the market share leaders, with 49.4%, 27.8%, and 20.6% market share respectively. In addition, Matrox produces GPUs. Chinese companies such as Jingjia Micro have also produced GPUs for the domestic market although in terms of worldwide sales, they still lag behind market leaders.
Modern GPUs have traditionally used most of their transistors to do calculations related to 3D computer graphics. In addition to the 3D hardware, today's GPUs include basic 2D acceleration and framebuffer
capabilities (usually with a VGA compatibility mode). Newer cards such
as AMD/ATI HD5000–HD7000 lack dedicated 2D acceleration; it is emulated
by 3D hardware. GPUs were initially used to accelerate the
memory-intensive work of texture mapping and rendering polygons. Later, dedicated hardware was added to accelerate geometric calculations such as the rotation and translation of vertices into different coordinate systems. Recent developments in GPUs include support for programmable shaders which can manipulate vertices and textures with many of the same operations that are supported by CPUs, oversampling and interpolation techniques to reduce aliasing, and very high-precision color spaces.
Several
factors of GPU construction affect the performance of the card for
real-time rendering, such as the size of the connector pathways in the semiconductor device fabrication, the clock signal frequency, and the number and size of various on-chip memory caches.
Performance is also affected by the number of streaming multiprocessors
(SM) for Nvidia GPUs, or compute units (CU) for AMD GPUs, or Xe cores
for Intel discrete GPUs, which describe the number of on-silicon
processor core units within the GPU chip that perform the core
calculations, typically working in parallel with other SM/CUs on the
GPU. GPU performance is typically measured in floating point operations
per second (FLOPS);
GPUs in the 2010s and 2020s typically deliver performance measured in
teraflops (TFLOPS). This is an estimated performance measure, as other
factors can affect the actual display rate.
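A theoretical peak figure of this kind is usually obtained by multiplying the number of arithmetic units by the clock frequency and by the number of floating point operations each unit can complete per cycle. The sketch below assumes the common convention of counting one fused multiply-add as two operations; the function name, parameters, and the example figures are hypothetical and do not describe any particular product.

```python
def peak_tflops(cores, clock_ghz, flops_per_core_per_cycle=2):
    """Theoretical peak in TFLOPS.

    `cores` is the total number of arithmetic lanes on the chip (an assumed
    parameter), and the default of 2 FLOPs per cycle assumes one fused
    multiply-add per lane per clock.
    """
    return cores * clock_ghz * flops_per_core_per_cycle / 1000.0

# Hypothetical GPU: 2048 lanes at 1.5 GHz.
example = peak_tflops(2048, 1.5)
```

With these assumed figures the estimate is 2048 × 1.5 × 2 / 1000 = 6.144 TFLOPS; sustained performance is normally lower because memory bandwidth and occupancy limit how often every lane is busy.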
GPU accelerated video decoding and encoding
The ATI HD5470 GPU features UVD 2.1, which enables it to decode AVC and VC-1 video formats.
Most GPUs made since 1995 support the YUV color space and hardware overlays, important for digital video playback, and many GPUs made since 2000 also support MPEG primitives such as motion compensation and iDCT. This hardware-accelerated video decoding, in which portions of the video decoding process and video post-processing
are offloaded to the GPU hardware, is commonly referred to as "GPU
accelerated video decoding", "GPU assisted video decoding", "GPU
hardware accelerated video decoding", or "GPU hardware assisted video
decoding".
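One of the steps such hardware takes over is converting decoded YUV samples to RGB for display. The sketch below assumes 8-bit full-range samples and the BT.601 coefficients used by JFIF-style YCbCr; other video standards use different coefficients and value ranges, and the function names are hypothetical.

```python
def clamp(value):
    """Round to the nearest integer and limit the result to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))

def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range BT.601 YCbCr sample to an (R, G, B) tuple."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)
```

A sample with both chroma values at 128 carries no colour information, so it maps to a grey level equal to its luma value. A GPU applies the same arithmetic to every pixel of a frame, which is why the conversion fits well on parallel hardware.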
In the 1970s, the term "GPU" originally stood for graphics processor unit
and described a programmable processing unit working independently from
the CPU that was responsible for graphics manipulation and output. In 1994, Sony used the term (now standing for graphics processing unit) in reference to the PlayStation console's Toshiba-designed Sony GPU. The term was popularized by Nvidia in 1999, which marketed the GeForce 256 as "the world's first GPU". It was presented as a "single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines". Rival ATI Technologies coined the term "visual processing unit" or VPU with the release of the Radeon 9700 in 2002. In 2023, the AMD Alveo MA35D featured dual VPUs, each built on a 5 nm process.
In personal computers, there are two main forms of GPUs. Each has many synonyms:
Dedicated graphics processing units use RAM
that is dedicated to the GPU rather than relying on the computer’s main
system memory. This RAM is usually specially selected for the expected
serial workload of the graphics card (see GDDR). Sometimes systems with dedicated discrete GPUs were called "DIS" systems as opposed to "UMA" systems (see next section).
Dedicated GPUs are not necessarily removable, nor do they
necessarily interface with the motherboard in a standard fashion. The
term "dedicated" refers to the fact that graphics cards have RAM that is dedicated to the card's use, not to the fact that most
dedicated GPUs are removable. Dedicated GPUs for portable computers are
most commonly interfaced through a non-standard and often proprietary
slot due to size and weight constraints. Such ports may still be
considered PCIe or AGP in terms of their logical host interface, even if
they are not physically interchangeable with their counterparts.
Graphics cards with dedicated GPUs typically interface with the motherboard by means of an expansion slot such as PCI Express (PCIe) or Accelerated Graphics Port
(AGP). They can usually be replaced or upgraded with relative ease,
assuming the motherboard is capable of supporting the upgrade. A few
graphics cards still use Peripheral Component Interconnect (PCI) slots, but their bandwidth is so limited that they are generally used only when a PCIe or AGP slot is not available.
Technologies such as Scan-Line Interleave by 3dfx, SLI and NVLink by Nvidia and CrossFire
by AMD allow multiple GPUs to draw images simultaneously for a single
screen, increasing the processing power available for graphics. These
technologies, however, are increasingly uncommon; most games do not
fully use multiple GPUs, as most users cannot afford them. Multiple GPUs are still used on supercomputers (like in Summit), on workstations to accelerate video (processing multiple videos at once) and 3D rendering, for VFX, GPGPU workloads and for simulations, and in AI to expedite training, as is the case with Nvidia's lineup of
DGX workstations and servers, Tesla GPUs, and Intel's Ponte Vecchio
GPUs.
Integrated graphics processing unit
The position of an integrated GPU in a northbridge/southbridge system layout
An ASRock motherboard with integrated graphics, which has HDMI, VGA and DVI-out ports
Integrated graphics processing units (IGPU), integrated graphics, shared graphics solutions, integrated graphics processors (IGP), or unified memory architectures
(UMA) use a portion of a computer's system RAM rather than dedicated
graphics memory. IGPs can be integrated onto a motherboard as part of
its northbridge chipset, or on the same die (integrated circuit) with the CPU (like AMD APU or Intel HD Graphics). On certain motherboards, AMD's IGPs can use dedicated sideport memory: a separate fixed block of
high performance memory that is dedicated for use by the GPU. As of
early 2007, computers with integrated graphics accounted for about 90% of all PC shipments. They are less costly to implement than dedicated graphics processing,
but tend to be less capable. Historically, integrated processing was
considered unfit for 3D games or graphically intensive programs but
could run less intensive programs such as Adobe Flash. Examples of such
IGPs would be offerings from SiS and VIA circa 2004. However, modern integrated graphics processors such as AMD Accelerated Processing Unit and Intel Graphics Technology (HD, UHD, Iris, Iris Pro, Iris Plus, and Xe-LP) can handle 2D graphics or low-stress 3D graphics.
Since GPU computations are memory-intensive, integrated
processing may compete with the CPU for relatively slow system RAM, as
it has minimal or no dedicated video memory. IGPs use system memory with
bandwidth up to a current maximum of 128 GB/s, whereas a discrete
graphics card may have a bandwidth of more than 1000 GB/s between its VRAM and GPU core. This memory bus bandwidth can limit the performance of the GPU, though multi-channel memory can mitigate this deficiency. Older integrated graphics chipsets lacked hardware transform and lighting, but newer ones include it.
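To make the bandwidth gap concrete, the following minimal sketch computes a lower bound on the time needed to stream a buffer once at each of the two bandwidth figures quoted above. The function name and the 1 GB workload are assumptions chosen for illustration. The calculation ignores caches, latency, and contention with the CPU.

```python
def min_transfer_time_ms(bytes_moved: float, bandwidth_gb_s: float) -> float:
    """Lower bound (in ms) on streaming bytes_moved at bandwidth_gb_s.

    Bandwidth is in GB/s with 1 GB = 1e9 bytes. Latency and cache
    effects are ignored, so real transfers take longer.
    """
    return bytes_moved / (bandwidth_gb_s * 1e9) * 1e3


workload_bytes = 1e9  # hypothetical: 1 GB of data touched per pass

igp_ms = min_transfer_time_ms(workload_bytes, 128)        # shared system RAM
discrete_ms = min_transfer_time_ms(workload_bytes, 1000)  # dedicated VRAM

print(f"integrated: {igp_ms:.2f} ms, discrete: {discrete_ms:.2f} ms")
```

Under these assumptions the integrated GPU needs about 7.8 ms per pass and the discrete card about 1 ms, so a purely memory-bound workload would run close to eight times slower on shared system RAM.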
On systems with a "Unified Memory Architecture" (UMA), including modern AMD processors with integrated graphics, modern Intel processors with integrated graphics, Apple processors, the PS5 and Xbox Series (among others), the CPU cores
and the GPU block share the same pool of RAM and memory address space.
This allows the system to allocate memory dynamically between the CPU
cores and the GPU block based on memory needs, without a large
static split of the RAM. Thanks to zero-copy transfers, it also removes
the need to copy data over a bus between physically separate RAM pools,
or between separate address spaces on a single physical pool of RAM,
allowing more efficient transfer of data.
Hybrid graphics processing
Hybrid GPUs compete with integrated graphics in the low-end desktop and
notebook markets. The most common implementations of this are ATI's HyperMemory and Nvidia's TurboCache.
Hybrid graphics cards are somewhat more expensive than integrated
graphics, but much less expensive than dedicated graphics cards. They
share memory with the system and have a small dedicated memory cache, to
make up for the high latency
of the system RAM. Technologies within PCI Express make this possible.
While these solutions are sometimes advertised as having as much as
768 MB of RAM, this refers to how much can be shared with the system
memory.
Stream processing and general purpose GPUs (GPGPU)
It is common to use a general purpose graphics processing unit (GPGPU) as a modified form of stream processor (or a vector processor), running compute kernels.
This turns the massive computational power of a modern graphics
accelerator's shader pipeline into general-purpose computing power. In
certain applications requiring massive vector operations, this can yield
several orders of magnitude higher performance than a conventional CPU.
The two largest discrete (see "Dedicated graphics processing unit" above) GPU designers, AMD and Nvidia, are pursuing this approach with an array of applications. Both Nvidia and AMD teamed with Stanford University to create a GPU-based client for the Folding@home
distributed computing project for protein folding calculations. In
certain circumstances, the GPU calculates forty times faster than the
CPUs traditionally used by such applications.
GPGPUs can be used for many types of embarrassingly parallel tasks including ray tracing. They are generally suited to high-throughput computations that exhibit data-parallelism to exploit the wide vector width SIMD architecture of the GPU.
GPU-based high performance computers play a significant role in
large-scale modelling. Three of the ten most powerful supercomputers in
the world take advantage of GPU acceleration.
GPUs support API extensions to the C programming language such as OpenCL and OpenMP. Furthermore, each GPU vendor introduced its own API that works only with its own cards: AMD APP SDK from AMD, and CUDA from Nvidia. These allow functions called compute kernels
to run on the GPU's stream processors. This makes it possible for C
programs to take advantage of a GPU's ability to operate on large
buffers in parallel, while still using the CPU when appropriate. CUDA
was the first API to allow CPU-based applications to directly access the
resources of a GPU for more general purpose computing without the
limitations of using a graphics API.
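The compute-kernel model described above can be sketched in plain Python. This is a conceptual illustration only, not OpenCL or CUDA code: `saxpy_kernel` and `launch` are hypothetical names, and the sequential loop stands in for the GPU hardware, which would run one kernel instance per index in parallel.

```python
from typing import Callable, List


def saxpy_kernel(i: int, a: float, x: List[float], y: List[float],
                 out: List[float]) -> None:
    """Kernel body: each instance handles exactly one element, index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel: Callable[..., None], n: int, *args) -> None:
    """Stand-in for a GPU launch: invoke the kernel once per index.

    A real GPU would execute these n instances in parallel across its
    stream processors; here they simply run one after another.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0]
```

Because each kernel instance touches only its own element, the instances are independent, which is the property that lets a GPU process large buffers in parallel.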
Since 2005 there has been interest in using the performance offered by GPUs for evolutionary computation in general, and for accelerating the fitness evaluation in genetic programming in particular. Most approaches compile linear or tree programs
on the host PC and transfer the executable to the GPU to be run.
Typically a performance advantage is only obtained by running the single
active program simultaneously on many example problems in parallel,
using the GPU's SIMD architecture. However, substantial acceleration can also be obtained by not compiling
the programs, and instead transferring them to the GPU, to be
interpreted there. Acceleration can then be obtained by interpreting multiple
programs simultaneously, running multiple example problems
simultaneously, or a combination of both. A modern GPU can simultaneously
interpret hundreds of thousands of very small programs.
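The interpreted approach can be illustrated with a small CPU-side sketch. It assumes a hypothetical postfix encoding of programs over a single variable `x` and a sum-of-absolute-errors fitness; none of these choices comes from a specific genetic programming system. Every (program, example) pair is independent of the others, and that independence is what a GPU implementation would exploit by evaluating the pairs in parallel.

```python
from typing import List, Sequence, Tuple, Union

Token = Union[str, float]


def interpret(program: Sequence[Token], x: float) -> float:
    """Evaluate a postfix program on one input value using a stack."""
    stack: List[float] = []
    for tok in program:
        if tok == "x":
            stack.append(x)
        elif tok in ("+", "-", "*"):
            b, a = stack.pop(), stack.pop()
            if tok == "+":
                stack.append(a + b)
            elif tok == "-":
                stack.append(a - b)
            else:
                stack.append(a * b)
        else:
            stack.append(float(tok))  # numeric constant
    return stack.pop()


def fitness_table(programs: Sequence[Sequence[Token]],
                  cases: Sequence[Tuple[float, float]]) -> List[float]:
    """Sum of absolute errors of each program over all (x, target) cases."""
    return [sum(abs(interpret(p, x) - t) for x, t in cases) for p in programs]


# Hypothetical population and example problems (target: x*x + 1).
programs = [["x", "x", "*"], ["x", 2.0, "*"], ["x", "x", "*", 1.0, "+"]]
cases = [(0.0, 1.0), (1.0, 2.0), (2.0, 5.0), (3.0, 10.0)]
print(fitness_table(programs, cases))  # [4.0, 6.0, 0.0]
```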
External GPU (eGPU)
An external GPU is a graphics processor located outside of the housing of
the computer, similar to a large external hard drive. External graphics
processors are sometimes used with laptop computers. Laptops might have a
substantial amount of RAM and a sufficiently powerful central
processing unit (CPU), but often lack a powerful graphics processor, and
instead have a less powerful but more energy-efficient on-board
graphics chip. On-board graphics chips are often not powerful enough for
playing video games, or for other graphically intensive tasks, such as
editing video or 3D animation/rendering.
Therefore, it is desirable to attach a GPU to some external bus of a notebook. PCI Express is the only bus used for this purpose. The port may be, for example, an ExpressCard or mPCIe port (PCIe ×1, up to 5 or 2.5 Gbit/s respectively), a Thunderbolt 1, 2, or 3 port (PCIe ×4, up to 10, 20, or 40 Gbit/s respectively), a USB4 port with Thunderbolt compatibility, or an OCuLink port. Those ports are only available on certain notebook systems.[96] eGPU enclosures include their own power supply (PSU), because powerful GPUs can consume hundreds of watts.[97]
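For comparison with the memory bandwidth figures given earlier in GB/s, the link rates listed above can be converted from Gbit/s to GB/s. The sketch below performs only the raw unit conversion (8 bits per byte). It ignores line-encoding and protocol overhead, so usable throughput is lower than the values it prints. The dictionary and function names are illustrative.

```python
# Peak link rates in Gbit/s, as quoted in the paragraph above.
LINK_RATES_GBIT = {
    "mPCIe (PCIe x1)": 2.5,
    "ExpressCard (PCIe x1)": 5.0,
    "Thunderbolt 1": 10.0,
    "Thunderbolt 2": 20.0,
    "Thunderbolt 3": 40.0,
}


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Raw conversion from Gbit/s to GB/s, ignoring all overhead."""
    return gbit_per_s / 8.0


for name, rate in LINK_RATES_GBIT.items():
    print(f"{name}: {gbit_to_gbyte_per_s(rate):.4g} GB/s raw")
```

Even the fastest link listed, at 5 GB/s raw, is far below the bandwidth between a discrete GPU and its own VRAM.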
Graphics processing units (GPUs) have continued to increase in energy usage, while CPU designers have
focused on improving performance per watt. High-performance GPUs may
draw a large amount of power, so intelligent techniques are
required to manage GPU power consumption. Measures like 3DMark2006 score per watt can help identify more efficient GPUs.[98] However, that may not adequately incorporate efficiency in typical use, where much time is spent doing less demanding tasks.[99]
With modern GPUs, energy usage is an important constraint on the
maximum computational capabilities that can be achieved. GPU designs are
usually highly scalable, allowing the manufacturer to put multiple
chips on the same video card, or to use multiple video cards that work
in parallel. Peak performance of any system is essentially limited by
the amount of power it can draw and the amount of heat it can dissipate.
Consequently, performance per watt of a GPU design translates directly
into peak performance of a system that uses that design.
Since GPUs may also be used for some general purpose computation, sometimes their performance is measured in terms also applied to CPUs, such as FLOPS per watt.
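The relationship between performance per watt and peak performance can be written as a simple product: under a fixed power budget, achievable peak throughput is efficiency multiplied by available power. The sketch below assumes perfect scaling across chips or cards, which is an idealization. The function name, the two designs, and the 300 W budget are hypothetical.

```python
def peak_gflops(gflops_per_watt: float, power_budget_w: float) -> float:
    """Peak throughput reachable within a power budget.

    Assumes the design scales perfectly until the power (and therefore
    heat) limit is reached, which real systems only approximate.
    """
    return gflops_per_watt * power_budget_w


budget_w = 300.0  # hypothetical power and cooling limit of the system

design_a = peak_gflops(50.0, budget_w)  # hypothetical design at 50 GFLOPS/W
design_b = peak_gflops(80.0, budget_w)  # hypothetical design at 80 GFLOPS/W

print(f"A: {design_a:.0f} GFLOPS, B: {design_b:.0f} GFLOPS")
```

With the same budget, the more efficient design delivers proportionally more peak throughput.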
Sales
In 2013, 438.3 million GPUs were shipped globally and the forecast for 2014
was 414.2 million. However, by the third quarter of 2022, shipments of
PC GPUs totaled around 75.5 million units, down 19% year-over-year.