A Medley of Potpourri

Tuesday, February 11, 2020

Embedded system

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Embedded_system

An embedded system on a plug-in card with processor, memory, power supply, and external interfaces

An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electrical system. It is embedded as part of a complete device often including electrical or electronic hardware and mechanical parts. Because an embedded system typically controls physical operations of the machine that it is embedded within, it often has real-time computing constraints. Embedded systems control many devices in common use today. Ninety-eight percent of all microprocessors manufactured are used in embedded systems.

Modern embedded systems are often based on microcontrollers (i.e. microprocessors with integrated memory and peripheral interfaces), but ordinary microprocessors (using external chips for memory and peripheral interface circuits) are also common, especially in more complex systems. In either case, the processor(s) used may be types ranging from general purpose to those specialized in certain class of computations, or even custom designed for the application at hand. A common standard class of dedicated processors is the digital signal processor (DSP). Since the embedded system is dedicated to specific tasks, design engineers can optimize it to reduce the size and cost of the product and increase the reliability and performance. Some embedded systems are mass-produced, benefiting from economies of scale.

Embedded systems range from portable devices such as digital watches and MP3 players, to large stationary installations like traffic light controllers, programmable logic controllers, and large complex systems like hybrid vehicles, medical imaging systems, and avionics. Complexity varies from low, with a single microcontroller chip, to very high with multiple units, peripherals and networks mounted inside a large equipment rack.

History

Background

The origins of the microprocessor and the microcontroller can be traced back to the MOS integrated circuit, which is an integrated circuit chip fabricated from MOSFETs (metal-oxide-semiconductor field-effect transistors) and was developed in the early 1960s. By 1964, MOS chips had reached higher transistor density and lower manufacturing costs than bipolar chips. MOS chips further increased in complexity at a rate predicted by Moore's law, leading to large-scale integration (LSI) with hundreds of transistors on a single MOS chip by the late 1960s. The application of MOS LSI chips to computing was the basis for the first microprocessors, as engineers began recognizing that a complete computer processor system could be contained on several MOS LSI chips.

The first multi-chip microprocessors, the Four-Phase Systems AL1 in 1969 and the Garrett AiResearch MP944 in 1970, were developed with multiple MOS LSI chips. The first single-chip microprocessor was the Intel 4004, released in 1971. It was developed by Federico Faggin, using his silicon-gate MOS technology, along with Intel engineers Marcian Hoff and Stan Mazor, and Busicom engineer Masatoshi Shima.

Development

One of the very first recognizably modern embedded systems was the Apollo Guidance Computer, developed ca. 1965 by Charles Stark Draper at the MIT Instrumentation Laboratory. At the project's inception, the Apollo guidance computer was considered the riskiest item in the Apollo project as it employed the then newly developed monolithic integrated circuits to reduce the size and weight. An early mass-produced embedded system was the Autonetics D-17 guidance computer for the Minuteman missile, released in 1961. When the Minuteman II went into production in 1966, the D-17 was replaced with a new computer that was the first high-volume use of integrated circuits.

Since these early applications in the 1960s, embedded systems have come down in price and there has been a dramatic rise in processing power and functionality. An early microprocessor for example, the Intel 4004 (released in 1971), was designed for calculators and other small systems but still required external memory and support chips. In 1978 National Engineering Manufacturers Association released a "standard" for programmable microcontrollers, including almost any computer-based controllers, such as single board computers, numerical, and event-based controllers.

As the cost of microprocessors and microcontrollers fell it became feasible to replace expensive knob-based analog components such as potentiometers and variable capacitors with up/down buttons or knobs read out by a microprocessor even in consumer products. By the early 1980s, memory, input and output system components had been integrated into the same chip as the processor forming a microcontroller. Microcontrollers find applications where a general-purpose computer would be too costly.

A comparatively low-cost microcontroller may be programmed to fulfill the same role as a large number of separate components. Although in this context an embedded system is usually more complex than a traditional solution, most of the complexity is contained within the microcontroller itself. Very few additional components may be needed and most of the design effort is in the software. Software prototype and test can be quicker compared with the design and construction of a new circuit not using an embedded processor.

Applications

Embedded Computer Sub-Assembly for Accupoll Electronic Voting Machine

Embedded systems are commonly found in consumer, industrial, automotive, home appliances, medical, commercial and military applications.

Telecommunications systems employ numerous embedded systems from telephone switches for the network to cell phones at the end user. Computer networking uses dedicated routers and network bridges to route data.

Consumer electronics include MP3 players, mobile phones, video game consoles, digital cameras, GPS receivers, and printers. Household appliances, such as microwave ovens, washing machines and dishwashers, include embedded systems to provide flexibility, efficiency and features. Advanced HVAC systems use networked thermostats to more accurately and efficiently control temperature that can change by time of day and season. Home automation uses wired- and wireless-networking that can be used to control lights, climate, security, audio/visual, surveillance, etc., all of which use embedded devices for sensing and controlling.

Transportation systems from flight to automobiles increasingly use embedded systems. New airplanes contain advanced avionics such as inertial guidance systems and GPS receivers that also have considerable safety requirements. Various electric motors — brushless DC motors, induction motors and DC motors — use electric/electronic motor controllers. Automobiles, electric vehicles, and hybrid vehicles increasingly use embedded systems to maximize efficiency and reduce pollution. Other automotive safety systems include anti-lock braking system (ABS), Electronic Stability Control (ESC/ESP), traction control (TCS) and automatic four-wheel drive.

Medical equipment uses embedded systems for vital signs monitoring, electronic stethoscopes for amplifying sounds, and various medical imaging (PET, SPECT, CT, and MRI) for non-invasive internal inspections. Embedded systems within medical equipment are often powered by industrial computers.

Embedded systems are used in transportation, fire safety, safety and security, medical applications and life critical systems, as these systems can be isolated from hacking and thus, be more reliable, unless connected to wired or wireless networks via on-chip 3G cellular or other methods for IoT monitoring and control purposes. For fire safety, the systems can be designed to have greater ability to handle higher temperatures and continue to operate. In dealing with security, the embedded systems can be self-sufficient and be able to deal with cut electrical and communication systems.

A new class of miniature wireless devices called motes are networked wireless sensors. Wireless sensor networking, WSN, makes use of miniaturization made possible by advanced IC design to couple full wireless subsystems to sophisticated sensors, enabling people and companies to measure a myriad of things in the physical world and act on this information through IT monitoring and control systems. These motes are completely self-contained, and will typically run off a battery source for years before the batteries need to be changed or charged.

Embedded Wi-Fi modules provide a simple means of wirelessly enabling any device that communicates via a serial port.

Characteristics

Embedded systems are designed to do some specific task, rather than be a general-purpose computer for multiple tasks. Some also have real-time performance constraints that must be met, for reasons such as safety and usability; others may have low or no performance requirements, allowing the system hardware to be simplified to reduce costs.

Embedded systems are not always standalone devices. Many embedded systems consist of small parts within a larger device that serves a more general purpose. For example, the Gibson Robot Guitar features an embedded system for tuning the strings, but the overall purpose of the Robot Guitar is, of course, to play music. Similarly, an embedded system in an automobile provides a specific function as a subsystem of the car itself.

e-con Systems eSOM270 & eSOM300 Computer on Modules

The program instructions written for embedded systems are referred to as firmware, and are stored in read-only memory or flash memory chips. They run with limited computer hardware resources: little memory, small or non-existent keyboard or screen.

User interface

Embedded system text user interface using MicroVGA

Embedded systems range from no user interface at all, in systems dedicated only to one task, to complex graphical user interfaces that resemble modern computer desktop operating systems. Simple embedded devices use buttons, LEDs, graphic or character LCDs (HD44780 LCD for example) with a simple menu system.

More sophisticated devices that use a graphical screen with touch sensing or screen-edge buttons provide flexibility while minimizing space used: the meaning of the buttons can change with the screen, and selection involves the natural behavior of pointing at what is desired. Handheld systems often have a screen with a "joystick button" for a pointing device.

Some systems provide user interface remotely with the help of a serial (e.g. RS-232, USB, I²C, etc.) or network (e.g. Ethernet) connection. This approach gives several advantages: extends the capabilities of embedded system, avoids the cost of a display, simplifies BSP and allows one to build a rich user interface on the PC. A good example of this is the combination of an embedded web server running on an embedded device (such as an IP camera) or a network router. The user interface is displayed in a web browser on a PC connected to the device, therefore needing no software to be installed.

Processors in embedded systems

Examples of properties of typical embedded computers when compared with general-purpose counterparts are low power consumption, small size, rugged operating ranges, and low per-unit cost. This comes at the price of limited processing resources, which make them significantly more difficult to program and to interact with. However, by building intelligence mechanisms on top of the hardware, taking advantage of possible existing sensors and the existence of a network of embedded units, one can both optimally manage available resources at the unit and network levels as well as provide augmented functions, well beyond those available. For example, intelligent techniques can be designed to manage power consumption of embedded systems.

Embedded processors can be broken into two broad categories. Ordinary microprocessors (μP) use separate integrated circuits for memory and peripherals. Microcontrollers (μC) have on-chip peripherals, thus reducing power consumption, size and cost. In contrast to the personal computer market, many different basic CPU architectures are used since software is custom-developed for an application and is not a commodity product installed by the end user. Both Von Neumann as well as various degrees of Harvard architectures are used. RISC as well as non-RISC processors are found. Word lengths vary from 4-bit to 64-bits and beyond, although the most typical remain 8/16-bit. Most architectures come in a large number of different variants and shapes, many of which are also manufactured by several different companies.

Numerous microcontrollers have been developed for embedded systems use. General-purpose microprocessors are also used in embedded systems, but generally, require more support circuitry than microcontrollers.

Ready-made computer boards

PC/104 and PC/104+ are examples of standards for ready-made computer boards intended for small, low-volume embedded and ruggedized systems, mostly x86-based. These are often physically small compared to a standard PC, although still quite large compared to most simple (8/16-bit) embedded systems. They often use DOS, Linux, NetBSD, or an embedded real-time operating system such as MicroC/OS-II, QNX or VxWorks. Sometimes these boards use non-x86 processors.

In certain applications, where small size or power efficiency are not primary concerns, the components used may be compatible with those used in general purpose x86 personal computers. Boards such as the VIA EPIA range help to bridge the gap by being PC-compatible but highly integrated, physically smaller or have other attributes making them attractive to embedded engineers. The advantage of this approach is that low-cost commodity components may be used along with the same software development tools used for general software development. Systems built in this way are still regarded as embedded since they are integrated into larger devices and fulfill a single role. Examples of devices that may adopt this approach are ATMs and arcade machines, which contain code specific to the application.

However, most ready-made embedded systems boards are not PC-centered and do not use the ISA or PCI buses. When a system-on-a-chip processor is involved, there may be little benefit to having a standardized bus connecting discrete components, and the environment for both hardware and software tools may be very different.

One common design style uses a small system module, perhaps the size of a business card, holding high density BGA chips such as an ARM-based system-on-a-chip processor and peripherals, external flash memory for storage, and DRAM for runtime memory. The module vendor will usually provide boot software and make sure there is a selection of operating systems, usually including Linux and some real time choices. These modules can be manufactured in high volume, by organizations familiar with their specialized testing issues, and combined with much lower volume custom mainboards with application-specific external peripherals.

Implementation of embedded systems has advanced so that they can easily be implemented with already-made boards that are based on worldwide accepted platforms. These platforms include, but are not limited to, Arduino and Raspberry Pi.

ASIC and FPGA solutions

A common array for very-high-volume embedded systems is the system on a chip (SoC) that contains a complete system consisting of multiple processors, multipliers, caches and interfaces on a single chip. SoCs can be implemented as an application-specific integrated circuit (ASIC) or using a field-programmable gate array (FPGA).

Peripherals

A close-up of the SMSC LAN91C110 (SMSC 91x) chip, an embedded Ethernet chip

Embedded systems talk with the outside world via peripherals, such as:

Serial Communication Interfaces (SCI): RS-232, RS-422, RS-485, etc.
Synchronous Serial Communication Interface: I2C, SPI, SSC and ESSI (Enhanced Synchronous Serial Interface)
Universal Serial Bus (USB)
Multi Media Cards (SD cards, Compact Flash, etc.)
Networks: Ethernet, LonWorks, etc.
Fieldbuses: CAN-Bus, LIN-Bus, PROFIBUS, etc.
Timers: PLL(s), Capture/Compare and Time Processing Units
Discrete IO: aka General Purpose Input/Output (GPIO)
Analog to Digital/Digital to Analog (ADC/DAC)
Debugging: JTAG, ISP, BDM Port, BITP, and DB9 ports.

Tools

As with other software, embedded system designers use compilers, assemblers, and debuggers to develop embedded system software. However, they may also use some more specific tools:

In circuit debuggers or emulators (see next section).
Utilities to add a checksum or CRC to a program, so the embedded system can check if the program is valid.
For systems using digital signal processing, developers may use a math workbench to simulate the mathematics.
System level modeling and simulation tools help designers to construct simulation models of a system with hardware components such as processors, memories, DMA, interfaces, buses and software behavior flow as a state diagram or flow diagram using configurable library blocks. Simulation is conducted to select right components by performing power vs. performance trade-off, reliability analysis and bottleneck analysis. Typical reports that helps designer to make architecture decisions includes application latency, device throughput, device utilization, power consumption of the full system as well as device-level power consumption.
A model-based development tool creates and simulate graphical data flow and UML state chart diagrams of components like digital filters, motor controllers, communication protocol decoding and multi-rate tasks.
Custom compilers and linkers may be used to optimize specialized hardware.
An embedded system may have its own special language or design tool, or add enhancements to an existing language such as Forth or Basic.
Another alternative is to add a real-time operating system or embedded operating system
Modeling and code generating tools often based on state machines

Software tools can come from several sources:

Software companies that specialize in the embedded market
Ported from the GNU software development tools
Sometimes, development tools for a personal computer can be used if the embedded processor is a close relative to a common PC processor

As the complexity of embedded systems grows, higher level tools and operating systems are migrating into machinery where it makes sense. For example, cellphones, personal digital assistants and other consumer computers often need significant software that is purchased or provided by a person other than the manufacturer of the electronics. In these systems, an open programming environment such as Linux, NetBSD, OSGi or Embedded Java is required so that the third-party software provider can sell to a large market.

Embedded systems are commonly found in consumer, cooking, industrial, automotive, medical applications. Some examples of embedded systems are MP3 players, mobile phones, video game consoles, digital cameras, DVD players, and GPS. Household appliances, such as microwave ovens, washing machines and dishwashers, include embedded systems to provide flexibility and efficiency.

Debugging

Embedded debugging may be performed at different levels, depending on the facilities available. The different metrics that characterize the different forms of embedded debugging are: does it slow down the main application, how close is the debugged system or application to the actual system or application, how expressive are the triggers that can be set for debugging (e.g., inspecting the memory when a particular program counter value is reached), and what can be inspected in the debugging process (such as, only memory, or memory and registers, etc.).

From simplest to most sophisticated they can be roughly grouped into the following areas:

Interactive resident debugging, using the simple shell provided by the embedded operating system (e.g. Forth and Basic)
External debugging using logging or serial port output to trace operation using either a monitor in flash or using a debug server like the Remedy Debugger that even works for heterogeneous multicore systems.
An in-circuit debugger (ICD), a hardware device that connects to the microprocessor via a JTAG or Nexus interface. This allows the operation of the microprocessor to be controlled externally, but is typically restricted to specific debugging capabilities in the processor.
An in-circuit emulator (ICE) replaces the microprocessor with a simulated equivalent, providing full control over all aspects of the microprocessor.
A complete emulator provides a simulation of all aspects of the hardware, allowing all of it to be controlled and modified, and allowing debugging on a normal PC. The downsides are expense and slow operation, in some cases up to 100 times slower than the final system.
For SoC designs, the typical approach is to verify and debug the design on an FPGA prototype board. Tools such as Certus are used to insert probes in the FPGA RTL that make signals available for observation. This is used to debug hardware, firmware and software interactions across multiple FPGA with capabilities similar to a logic analyzer.
Software-only debuggers have the benefit that they do not need any hardware modification but have to carefully control what they record in order to conserve time and storage space.

Unless restricted to external debugging, the programmer can typically load and run software through the tools, view the code running in the processor, and start or stop its operation. The view of the code may be as HLL source-code, assembly code or mixture of both.

Because an embedded system is often composed of a wide variety of elements, the debugging strategy may vary. For instance, debugging a software- (and microprocessor-) centric embedded system is different from debugging an embedded system where most of the processing is performed by peripherals (DSP, FPGA, and co-processor). An increasing number of embedded systems today use more than one single processor core. A common problem with multi-core development is the proper synchronization of software execution. In this case, the embedded system design may wish to check the data traffic on the buses between the processor cores, which requires very low-level debugging, at signal/bus level, with a logic analyzer, for instance.

Tracing

Real-time operating systems (RTOS) often supports tracing of operating system events. A graphical view is presented by a host PC tool, based on a recording of the system behavior. The trace recording can be performed in software, by the RTOS, or by special tracing hardware. RTOS tracing allows developers to understand timing and performance issues of the software system and gives a good understanding of the high-level system behaviors.

Reliability

Embedded systems often reside in machines that are expected to run continuously for years without errors, and in some cases recover by themselves if an error occurs. Therefore, the software is usually developed and tested more carefully than that for personal computers, and unreliable mechanical moving parts such as disk drives, switches or buttons are avoided.

Specific reliability issues may include:

The system cannot safely be shut down for repair, or it is too inaccessible to repair. Examples include space systems, undersea cables, navigational beacons, bore-hole systems, and automobiles.
The system must be kept running for safety reasons. "Limp modes" are less tolerable. Often backups are selected by an operator. Examples include aircraft navigation, reactor control systems, safety-critical chemical factory controls, train signals.
The system will lose large amounts of money when shut down: Telephone switches, factory controls, bridge and elevator controls, funds transfer and market making, automated sales and service.

A variety of techniques are used, sometimes in combination, to recover from errors—both software bugs such as memory leaks, and also soft errors in the hardware:

watchdog timer that resets the computer unless the software periodically notifies the watchdog subsystems with redundant spares that can be switched over to software "limp modes" that provide partial function
Designing with a Trusted Computing Base (TCB) architecture ensures a highly secure & reliable system environment
A hypervisor designed for embedded systems, is able to provide secure encapsulation for any subsystem component, so that a compromised software component cannot interfere with other subsystems, or privileged-level system software. This encapsulation keeps faults from propagating from one subsystem to another, thereby improving reliability. This may also allow a subsystem to be automatically shut down and restarted on fault detection.
Immunity Aware Programming

High vs. low volume

For high volume systems such as portable music players or mobile phones, minimizing cost is usually the primary design consideration. Engineers typically select hardware that is just “good enough” to implement the necessary functions.

For low-volume or prototype embedded systems, general purpose computers may be adapted by limiting the programs or by replacing the operating system with a real-time operating system.

Embedded software architectures

There are several different types of software architecture in common use.

Simple control loop

In this design, the software simply has a loop. The loop calls subroutines, each of which manages a part of the hardware or software. Hence it is called a simple control loop or control loop.

Interrupt-controlled system

Some embedded systems are predominantly controlled by interrupts. This means that tasks performed by the system are triggered by different kinds of events; an interrupt could be generated, for example, by a timer in a predefined frequency, or by a serial port controller receiving a byte.

These kinds of systems are used if event handlers need low latency, and the event handlers are short and simple. Usually, these kinds of systems run a simple task in a main loop also, but this task is not very sensitive to unexpected delays.

Sometimes the interrupt handler will add longer tasks to a queue structure. Later, after the interrupt handler has finished, these tasks are executed by the main loop. This method brings the system close to a multitasking kernel with discrete processes.

Cooperative multitasking

A nonpreemptive multitasking system is very similar to the simple control loop scheme, except that the loop is hidden in an API. The programmer defines a series of tasks, and each task gets its own environment to “run” in. When a task is idle, it calls an idle routine, usually called “pause”, “wait”, “yield”, “nop” (stands for no operation), etc.

The advantages and disadvantages are similar to that of the control loop, except that adding new software is easier, by simply writing a new task, or adding to the queue.

Preemptive multitasking or multi-threading

In this type of system, a low-level piece of code switches between tasks or threads based on a timer (connected to an interrupt). This is the level at which the system is generally considered to have an "operating system" kernel. Depending on how much functionality is required, it introduces more or less of the complexities of managing multiple tasks running conceptually in parallel.

As any code can potentially damage the data of another task (except in larger systems using an MMU) programs must be carefully designed and tested, and access to shared data must be controlled by some synchronization strategy, such as message queues, semaphores or a non-blocking synchronization scheme.

Because of these complexities, it is common for organizations to use a real-time operating system (RTOS), allowing the application programmers to concentrate on device functionality rather than operating system services, at least for large systems; smaller systems often cannot afford the overhead associated with a generic real-time system, due to limitations regarding memory size, performance, or battery life. The choice that an RTOS is required brings in its own issues, however, as the selection must be done prior to starting to the application development process. This timing forces developers to choose the embedded operating system for their device based upon current requirements and so restricts future options to a large extent. The restriction of future options becomes more of an issue as product life decreases. Additionally the level of complexity is continuously growing as devices are required to manage variables such as serial, USB, TCP/IP, Bluetooth, Wireless LAN, trunk radio, multiple channels, data and voice, enhanced graphics, multiple states, multiple threads, numerous wait states and so on. These trends are leading to the uptake of embedded middleware in addition to a real-time operating system.

Microkernels and exokernels

A microkernel is a logical step up from a real-time OS. The usual arrangement is that the operating system kernel allocates memory and switches the CPU to different threads of execution. User mode processes implement major functions such as file systems, network interfaces, etc.

In general, microkernels succeed when the task switching and intertask communication is fast and fail when they are slow.

Exokernels communicate efficiently by normal subroutine calls. The hardware and all the software in the system are available to and extensible by application programmers.

Monolithic kernels

In this case, a relatively large kernel with sophisticated capabilities is adapted to suit an embedded environment. This gives programmers an environment similar to a desktop operating system like Linux or Microsoft Windows, and is therefore very productive for development; on the downside, it requires considerably more hardware resources, is often more expensive, and, because of the complexity of these kernels, can be less predictable and reliable.

Common examples of embedded monolithic kernels are embedded Linux, VXWorks and Windows CE.

Despite the increased cost in hardware, this type of embedded system is increasing in popularity, especially on the more powerful embedded devices such as wireless routers and GPS navigation systems. Here are some of the reasons:

Ports to common embedded chip sets are available.
They permit re-use of publicly available code for device drivers, web servers, firewalls, and other code.
Development systems can start out with broad feature-sets, and then the distribution can be configured to exclude unneeded functionality, and save the expense of the memory that it would consume.
Many engineers believe that running application code in user mode is more reliable and easier to debug, thus making the development process easier and the code more portable.^{[citation needed]}
Features requiring faster response than can be guaranteed can often be placed in hardware.

Additional software components

In addition to the core operating system, many embedded systems have additional upper-layer software components. These components consist of networking protocol stacks like CAN, TCP/IP, FTP, HTTP, and HTTPS, and also included storage capabilities like FAT and flash memory management systems. If the embedded device has audio and video capabilities, then the appropriate drivers and codecs will be present in the system. In the case of the monolithic kernels, many of these software layers are included. In the RTOS category, the availability of the additional software components depends upon the commercial offering.

Domain-specific architectures

In the automotive sector, AUTOSAR is a standard architecture for embedded software.

Multi-core processor

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Multi-core_processor

Diagram of a generic dual-core processor with CPU-local level-1 caches and a shared, on-die level-2 cache.

An Intel Core 2 Duo E6750 dual-core processor.

An AMD Athlon X2 6400+ dual-core processor.

A multi-core processor is a computer processor integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions, as if the computer had several processors. The instructions are ordinary CPU instructions (such as add, move data, and branch) but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP) or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core. A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared-memory inter-core communication methods. Common network topologies to interconnect cores include bus, ring, two-dimensional mesh, and crossbar. Homogeneous multi-core systems include only identical cores; heterogeneous multi-core systems have cores that are not identical (e.g. big.LITTLE have heterogeneous cores that share the same instruction set, while AMD Accelerated Processing Units have cores that don't even share the same instruction set). Just as with single-processor systems, cores in multi-core systems may implement architectures such as VLIW, superscalar, vector, or multithreading.

Multi-core processors are widely used across many application domains, including general-purpose, embedded, network, digital signal processing (DSP), and graphics (GPU).

The improvement in performance gained by the use of a multi-core processor depends very much on the software algorithms used and their implementation. In particular, possible gains are limited by the fraction of the software that can run in parallel simultaneously on multiple cores; this effect is described by Amdahl's law. In the best case, so-called embarrassingly parallel problems may realize speedup factors near the number of cores, or even more if the problem is split up enough to fit within each core's cache(s), avoiding use of much slower main-system memory. Most applications, however, are not accelerated so much unless programmers invest a prohibitive amount of effort in re-factoring the whole problem. The parallelization of software is a significant ongoing topic of research. Cointegration of multiprocessor applications provides flexibility in network architecture design. Adaptability within parallel models is an additional feature of systems utilizing these protocols.

Terminology

The terms multi-core and dual-core most commonly refer to some sort of central processing unit (CPU), but are sometimes also applied to digital signal processors (DSP) and system on a chip (SoC). The terms are generally used only to refer to multi-core microprocessors that are manufactured on the same integrated circuit die; separate microprocessor dies in the same package are generally referred to by another name, such as multi-chip module. This article uses the terms "multi-core" and "dual-core" for CPUs manufactured on the same integrated circuit, unless otherwise noted.

In contrast to multi-core systems, the term multi-CPU refers to multiple physically separate processing-units (which often contain special circuitry to facilitate communication between each other).

The terms many-core and massively multi-core are sometimes used to describe multi-core architectures with an especially high number of cores (tens to thousands).

Some systems use many soft microprocessor cores placed on a single FPGA. Each "core" can be considered a "semiconductor intellectual property core" as well as a CPU core.

Development

While manufacturing technology improves, reducing the size of individual gates, physical limits of semiconductor-based microelectronics have become a major design concern. These physical limitations can cause significant heat dissipation and data synchronization problems. Various other methods are used to improve CPU performance. Some instruction-level parallelism (ILP) methods such as superscalar pipelining are suitable for many applications, but are inefficient for others that contain difficult-to-predict code. Many applications are better suited to thread-level parallelism (TLP) methods, and multiple independent CPUs are commonly used to increase a system's overall TLP. A combination of increased available space (due to refined manufacturing processes) and the demand for increased TLP led to the development of multi-core CPUs.

Commercial incentives

Several business motives drive the development of multi-core architectures. For decades, it was possible to improve performance of a CPU by shrinking the area of the integrated circuit (IC), which reduced the cost per device on the IC. Alternatively, for the same circuit area, more transistors could be used in the design, which increased functionality, especially for complex instruction set computing (CISC) architectures. Clock rates also increased by orders of magnitude in the decades of the late 20th century, from several megahertz in the 1980s to several gigahertz in the early 2000s.

As the rate of clock speed improvements slowed, increased use of parallel computing in the form of multi-core processors has been pursued to improve overall processing performance. Multiple cores were used on the same CPU chip, which could then lead to better sales of CPU chips with two or more cores. For example, Intel has produced a 48-core processor for research in cloud computing; each core has an x86 architecture.

Technical factors

Since computer manufacturers have long implemented symmetric multiprocessing (SMP) designs using discrete CPUs, the issues regarding implementing multi-core processor architecture and supporting it with software are well known.

Additionally:

Using a proven processing-core design without architectural changes reduces design risk significantly.
For general-purpose processors, much of the motivation for multi-core processors comes from greatly diminished gains in processor performance from increasing the operating frequency. This is due to three primary factors:
1. The memory wall; the increasing gap between processor and memory speeds. This, in effect, pushes for cache sizes to be larger in order to mask the latency of memory. This helps only to the extent that memory bandwidth is not the bottleneck in performance.
2. The ILP wall; the increasing difficulty of finding enough parallelism in a single instruction stream to keep a high-performance single-core processor busy.
3. The power wall; the trend of consuming exponentially increasing power (and thus also generating exponentially increasing heat) with each factorial increase of operating frequency. This increase can be mitigated by "shrinking" the processor by using smaller traces for the same logic. The power wall poses manufacturing, system design and deployment problems that have not been justified in the face of the diminished gains in performance due to the memory wall and ILP wall.

In order to continue delivering regular performance improvements for general-purpose processors, manufacturers such as Intel and AMD have turned to multi-core designs, sacrificing lower manufacturing-costs for higher performance in some applications and systems. Multi-core architectures are being developed, but so are the alternatives. An especially strong contender for established markets is the further integration of peripheral functions into the chip.

Advantages

The proximity of multiple CPU cores on the same die allows the cache coherency circuitry to operate at a much higher clock rate than what is possible if the signals have to travel off-chip. Combining equivalent CPUs on a single die significantly improves the performance of cache snoop (alternative: Bus snooping) operations. Put simply, this means that signals between different CPUs travel shorter distances, and therefore those signals degrade less. These higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often.

Assuming that the die can physically fit into the package, multi-core CPU designs require much less printed circuit board (PCB) space than do multi-chip SMP designs. Also, a dual-core processor uses slightly less power than two coupled single-core processors, principally because of the decreased power required to drive signals external to the chip. Furthermore, the cores share some circuitry, like the L2 cache and the interface to the front-side bus (FSB). In terms of competing technologies for the available silicon die area, multi-core design can make use of proven CPU core library designs and produce a product with lower risk of design error than devising a new wider-core design. Also, adding more cache suffers from diminishing returns.

Multi-core chips also allow higher performance at lower energy. This can be a big factor in mobile devices that operate on batteries. Since each core in a multi-core CPU is generally more energy-efficient, the chip becomes more efficient than having a single large monolithic core. This allows higher performance with less energy. A challenge in this, however, is the additional overhead of writing parallel code.

Disadvantages

Maximizing the usage of the computing resources provided by multi-core processors requires adjustments both to the operating system (OS) support and to existing application software. Also, the ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.

Integration of a multi-core chip can lower the chip production yields. They are also more difficult to manage thermally than lower-density single-core designs. Intel has partially countered this first problem by creating its quad-core designs by combining two dual-core ones on a single die with a unified cache, hence any two working dual-core dies can be used, as opposed to producing four cores on a single die and requiring all four to work to produce a quad-core CPU. From an architectural point of view, ultimately, single CPU designs may make better use of the silicon surface area than multiprocessing cores, so a development commitment to this architecture may carry the risk of obsolescence. Finally, raw processing power is not the only constraint on system performance. Two processing cores sharing the same system bus and memory bandwidth limits the real-world performance advantage. In a 2009 report, Dr Jun Ni showed that if a single core is close to being memory-bandwidth limited, then going to dual-core might give 30% to 70% improvement; if memory bandwidth is not a problem, then a 90% improvement can be expected; however, Amdahl's law makes this claim dubious. It would be possible for an application that used two CPUs to end up running faster on a single-core one if communication between the CPUs was the limiting factor, which would count as more than 100% improvement.

Hardware

Trends

The trend in processor development has been towards an ever-increasing number of cores, as processors with hundreds or even thousands of cores become theoretically possible. In addition, multi-core chips mixed with simultaneous multithreading, memory-on-chip, and special-purpose "heterogeneous" (or asymmetric) cores promise further performance and efficiency gains, especially in processing multimedia, recognition and networking applications. For example, a big.LITTLE core includes a high-performance core (called 'big') and a low-power core (called 'LITTLE'). There is also a trend towards improving energy-efficiency by focusing on performance-per-watt with advanced fine-grain or ultra fine-grain power management and dynamic voltage and frequency scaling (i.e. laptop computers and portable media players).

Chips designed from the outset for a large number of cores (rather than having evolved from single core designs) are sometimes referred to as manycore designs, emphasising qualitative differences.

Architecture

The composition and balance of the cores in multi-core architecture show great variety. Some architectures use one core design repeated consistently ("homogeneous"), while others use a mixture of different cores, each optimized for a different, "heterogeneous" role.

How multiple cores are implemented and integrated significantly affects both the developer's programming skills and the consumer's expectations of apps and interactivity versus the device.^[14] A device advertised as being octa-core will only have independent cores if advertised as True Octa-core, or similar styling, as opposed to being merely two sets of quad-cores each with fixed clock speeds.

The article "CPU designers debate multi-core future" by Rick Merritt, EE Times 2008, includes these comments:

Chuck Moore [...] suggested computers should be like cellphones, using a variety of specialty cores to run modular software scheduled by a high-level applications programming interface.
[...] Atsushi Hasegawa, a senior chief engineer at Renesas, generally agreed. He suggested the cellphone's use of many specialty cores working in concert is a good model for future multi-core designs.
[...] Anant Agarwal, founder and chief executive of startup Tilera, took the opposing view. He said multi-core chips need to be homogeneous collections of general-purpose cores to keep the software model simple.

Software effects

An outdated version of an anti-virus application may create a new thread for a scan process, while its GUI thread waits for commands from the user (e.g. cancel the scan). In such cases, a multi-core architecture is of little benefit for the application itself due to the single thread doing all the heavy lifting and the inability to balance the work evenly across multiple cores. Programming truly multithreaded code often requires complex co-ordination of threads and can easily introduce subtle and difficult-to-find bugs due to the interweaving of processing on data shared between threads (see thread-safety). Consequently, such code is much more difficult to debug than single-threaded code when it breaks. There has been a perceived lack of motivation for writing consumer-level threaded applications because of the relative rarity of consumer-level demand for maximum use of computer hardware. Also, serial tasks like decoding the entropy encoding algorithms used in video codecs are impossible to parallelize because each result generated is used to help create the next result of the entropy decoding algorithm.

Given the increasing emphasis on multi-core chip design, stemming from the grave thermal and power consumption problems posed by any further significant increase in processor clock speeds, the extent to which software can be multithreaded to take advantage of these new chips is likely to be the single greatest constraint on computer performance in the future. If developers are unable to design software to fully exploit the resources provided by multiple cores, then they will ultimately reach an insurmountable performance ceiling.

The telecommunications market had been one of the first that needed a new design of parallel datapath packet processing because there was a very quick adoption of these multiple-core processors for the datapath and the control plane. These MPUs are going to replace the traditional Network Processors that were based on proprietary microcode or picocode.

Parallel programming techniques can benefit from multiple cores directly. Some existing parallel programming models such as Cilk Plus, OpenMP, OpenHMPP, FastFlow, Skandium, MPI, and Erlang can be used on multi-core platforms. Intel introduced a new abstraction for C++ parallelism called TBB. Other research efforts include the Codeplay Sieve System, Cray's Chapel, Sun's Fortress, and IBM's X10.

Multi-core processing has also affected the ability of modern computational software development. Developers programming in newer languages might find that their modern languages do not support multi-core functionality. This then requires the use of numerical libraries to access code written in languages like C and Fortran, which perform math computations faster than newer languages like C#. Intel's MKL and AMD's ACML are written in these native languages and take advantage of multi-core processing. Balancing the application workload across processors can be problematic, especially if they have different performance characteristics. There are different conceptual models to deal with the problem, for example using a coordination language and program building blocks (programming libraries or higher-order functions). Each block can have a different native implementation for each processor type. Users simply program using these abstractions and an intelligent compiler chooses the best implementation based on the context.

Managing concurrency acquires a central role in developing parallel applications. The basic steps in designing parallel applications are:

Partitioning: The partitioning stage of a design is intended to expose opportunities for parallel execution. Hence, the focus is on defining a large number of small tasks in order to yield what is termed a fine-grained decomposition of a problem.

Communication: The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently. The computation to be performed in one task will typically require data associated with another task. Data must then be transferred between tasks so as to allow computation to proceed. This information flow is specified in the communication phase of a design.

Agglomeration: In the third stage, development moves from the abstract toward the concrete. Developers revisit decisions made in the partitioning and communication phases with a view to obtaining an algorithm that will execute efficiently on some class of parallel computer. In particular, developers consider whether it is useful to combine, or agglomerate, tasks identified by the partitioning phase, so as to provide a smaller number of tasks, each of greater size. They also determine whether it is worthwhile to replicate data and computation.

Mapping: In the fourth and final stage of the design of parallel algorithms, the developers specify where each task is to execute. This mapping problem does not arise on uniprocessors or on shared-memory computers that provide automatic task scheduling.

On the other hand, on the server side, multi-core processors are ideal because they allow many users to connect to a site simultaneously and have independent threads of execution. This allows for Web servers and application servers that have much better throughput.

Licensing

Vendors may license some software "per processor". This can give rise to ambiguity, because a "processor" may consist either of a single core or of a combination of cores.

Initially, for some of its enterprise software, Microsoft continued to use a per-socket licensing system. However, for some software such as BizTalk Server 2013, SQL Server 2014, and Windows Server 2016, Microsoft has shifted to per-core licensing.
Oracle Corporation counts an AMD X2 or an Intel dual-core CPU as a single processor but uses other metrics for other types, especially for processors with more than two cores.

Embedded applications

An embedded system on a plug-in card with processor, memory, power supply, and external interfaces

Embedded computing operates in an area of processor technology distinct from that of "mainstream" PCs. The same technological drives towards multi-core apply here too. Indeed, in many cases the application is a "natural" fit for multi-core technologies, if the task can easily be partitioned between the different processors.

In addition, embedded software is typically developed for a specific hardware release, making issues of software portability, legacy code or supporting independent developers less critical than is the case for PC or enterprise computing. As a result, it is easier for developers to adopt new technologies and as a result there is a greater variety of multi-core processing architectures and suppliers.

Network processors

As of 2010, multi-core network processors have become mainstream, with companies such as Freescale Semiconductor, Cavium Networks, Wintegra and Broadcom all manufacturing products with eight processors. For the system developer, a key challenge is how to exploit all the cores in these devices to achieve maximum networking performance at the system level, despite the performance limitations inherent in a symmetric multiprocessing (SMP) operating system. Companies such as 6WIND provide portable packet processing software designed so that the networking data plane runs in a fast path environment outside the operating system of the network device.

Digital signal processing

In digital signal processing the same trend applies: Texas Instruments has the three-core TMS320C6488 and four-core TMS320C5441, Freescale the four-core MSC8144 and six-core MSC8156 (and both have stated they are working on eight-core successors). Newer entries include the Storm-1 family from Stream Processors, Inc with 40 and 80 general purpose ALUs per chip, all programmable in C as a SIMD engine and Picochip with three-hundred processors on a single die, focused on communication applications.

Heterogeneous systems

In heterogeneous computing, where a system uses more than one kind of processor or cores, multi-core solutions are becoming more common: Xilinx Zynq UltraScale+ MPSoC has Quad-core ARM Cortex-A53 and Dual-core ARM Cortex-R5. Software solutions such as OpenAMP are being used to help with inter processor communication.

Mobile devices may use the ARM big.LITTLE architecture.

Symmetric multiprocessing

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Symmetric_multiprocessing

Diagram of a symmetric multiprocessing system

Symmetric multiprocessing (SMP) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally, reserving none for special purposes. Most multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors.

Professor John D. Kubiatowicz considers traditionally SMP systems to contain processors without caches. Culler and Pal-Singh in their 1998 book "Parallel Computer Architecture: A Hardware/Software Approach" mention: "The term SMP is widely used but causes a bit of confusion. [...] The more precise description of what is intended by SMP is a shared memory multiprocessor where the cost of accessing a memory location is the same for all processors; that is, it has uniform access costs when the access actually is to memory. If the location is cached, the access will be faster, but cache access times and memory access times are the same on all processors."

SMP systems are tightly coupled multiprocessor systems with a pool of homogeneous processors running independently of each other. Each processor, executing different programs and working on different sets of data, has the capability of sharing common resources (memory, I/O device, interrupt system and so on) that are connected using a system bus or a crossbar.

Design

SMP systems have centralized shared memory called main memory (MM) operating under a single operating system with two or more homogeneous processors. Usually each processor has an associated private high-speed memory known as cache memory (or cache) to speed up the main memory data access and to reduce the system bus traffic.

Processors may be interconnected using buses, crossbar switches or on-chip mesh networks. The bottleneck in the scalability of SMP using buses or crossbar switches is the bandwidth and power consumption of the interconnect among the various processors, the memory, and the disk arrays. Mesh architectures avoid these bottlenecks, and provide nearly linear scalability to much higher processor counts at the sacrifice of programmability:

Serious programming challenges remain with this kind of architecture because it requires two distinct modes of programming; one for the CPUs themselves and one for the interconnect between the CPUs. A single programming language would have to be able to not only partition the workload, but also comprehend the memory locality, which is severe in a mesh-based architecture.

SMP systems allow any processor to work on any task no matter where the data for that task is located in memory, provided that each task in the system is not in execution on two or more processors at the same time. With proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.

History

The earliest production system with multiple identical processors was the Burroughs B5000, which was functional around 1961. However at run-time this was asymmetric, with one processor restricted to application programs while the other processor mainly handled the operating system and hardware interrupts. The Burroughs D825 first implemented SMP in 1962.

IBM offered dual-processor computer systems based on its System/360 Model 65 and the closely related Model 67 and 67-2. The operating systems that ran on these machines were OS/360 M65MP and TSS/360. Other software developed at universities, notably the Michigan Terminal System (MTS), used both CPUs. Both processors could access data channels and initiate I/O. In OS/360 M65MP, peripherals could generally be attached to either processor since the operating system kernel ran on both processors (though with a "big lock" around the I/O handler). The MTS supervisor (UMMPS) has the ability to run on both CPUs of the IBM System/360 model 67-2. Supervisor locks were small and used to protect individual common data structures that might be accessed simultaneously from either CPU.

Other mainframes that supported SMP included the UNIVAC 1108 II, released in 1965, which supported up to three CPUs, and the GE-635 and GE-645, although GECOS on multiprocessor GE-635 systems ran in a master-slave asymmetric fashion, unlike Multics on multiprocessor GE-645 systems, which ran in a symmetric fashion.

Starting with its version 7.0 (1972), Digital Equipment Corporation's operating system TOPS-10 implemented the SMP feature, the earliest system running SMP was the DECSystem 1077 dual KI10 processor system. Later KL10 system could aggregate up to 8 CPUs in a SMP manner. In contrast, DECs first multi-processor VAX system, the VAX-11/782, was asymmetric, but later VAX multiprocessor systems were SMP.

Early commercial Unix SMP implementations included the Sequent Computer Systems Balance 8000 (released in 1984) and Balance 21000 (released in 1986). Both models were based on 10 MHz National Semiconductor NS32032 processors, each with a small write-through cache connected to a common memory to form a shared memory system. Another early commercial Unix SMP implementation was the NUMA based Honeywell Information Systems Italy XPS-100 designed by Dan Gielan of VAST Corporation in 1985. Its design supported up to 14 processors, but due to electrical limitations, the largest marketed version was a dual processor system. The operating system was derived and ported by VAST Corporation from AT&T 3B20 Unix SysVr3 code used internally within AT&T.

Earlier non-commercial multiprocessing UNIX ports existed, including a port named MUNIX created at the Naval Postgraduate School by 1975.

Uses

Time-sharing and server systems can often use SMP without changes to applications, as they may have multiple processes running in parallel, and a system with more than one process running can run different processes on different processors.

On personal computers, SMP is less useful for applications that have not been modified. If the system rarely runs more than one process at a time, SMP is useful only for applications that have been modified for multithreaded (multitasked) processing. Custom-programmed software can be written or modified to use multiple threads, so that it can make use of multiple processors.

Multithreaded programs can also be used in time-sharing and server systems that support multithreading, allowing them to make more use of multiple processors.

Advantages/Disadvantages

In current SMP systems, all of the processors are tightly coupled inside the same box with a bus or switch; on earlier SMP systems, a single CPU took an entire cabinet. Some of the components that are shared are global memory, disks, and I/O devices. Only one copy of an OS runs on all the processors, and the OS must be designed to take advantage of this architecture. Some of the basic advantages involves cost-effective ways to increase throughput. To solve different problems and tasks, SMP applies multiple processors to that one problem, known as parallel programming.

However, there are a few limits on the scalability of SMP due to cache coherence and shared objects.

Programming

Uniprocessor and SMP systems require different programming methods to achieve maximum performance. Programs running on SMP systems may experience an increase in performance even when they have been written for uniprocessor systems. This is because hardware interrupts usually suspends program execution while the kernel that handles them can execute on an idle processor instead. The effect in most applications (e.g. games) is not so much a performance increase as the appearance that the program is running much more smoothly. Some applications, particularly building software and some distributed computing projects, run faster by a factor of (nearly) the number of additional processors. (Compilers by themselves are single threaded, but, when building a software project with multiple compilation units, if each compilation unit is handled independently, this creates an embarrassingly parallel situation across the entire multi-compilation-unit project, allowing near linear scaling of compilation time. Distributed computing projects are inherently parallel by design.)

Systems programmers must build support for SMP into the operating system, otherwise, the additional processors remain idle and the system functions as a uniprocessor system.

SMP systems can also lead to more complexity regarding instruction sets. A homogeneous processor system typically requires extra registers for "special instructions" such as SIMD (MMX, SSE, etc.), while a heterogeneous system can implement different types of hardware for different instructions/uses.

Performance

When more than one program executes at the same time, an SMP system has considerably better performance than a uni-processor, because different programs can run on different CPUs simultaneously. Similarly, Asymmetric multiprocessing (AMP) usually allows only one processor to run a program or task at a time. For example, AMP can be used in assigning specific tasks to CPU based to priority and importance of task completion. AMP was created well before SMP in terms of handling multiple CPUs, which explains the lack of performance based on the example provided.

In cases where an SMP environment processes many jobs, administrators often experience a loss of hardware efficiency. Software programs have been developed to schedule jobs and other functions of the computer so that the processor utilization reaches its maximum potential. Good software packages can achieve this maximum potential by scheduling each CPU separately, as well as being able to integrate multiple SMP machines and clusters.

Access to RAM is serialized; this and cache coherency issues causes performance to lag slightly behind the number of additional processors in the system.

Alternatives

Diagram of a typical SMP system. Three processors are connected to the same memory module through a system bus or crossbar switch

SMP uses a single shared system bus that represents one of the earliest styles of multiprocessor machine architectures, typically used for building smaller computers with up to 8 processors.

Larger computer systems might use newer architectures such as NUMA (Non-Uniform Memory Access), which dedicates different memory banks to different processors. In a NUMA architecture, processors may access local memory quickly and remote memory more slowly. This can dramatically improve memory throughput as long as the data are localized to specific processes (and thus processors). On the downside, NUMA makes the cost of moving data from one processor to another, as in workload balancing, more expensive. The benefits of NUMA are limited to particular workloads, notably on servers where the data are often associated strongly with certain tasks or users.

Finally, there is computer clustered multiprocessing (such as Beowulf), in which not all memory is available to all processors. Clustering techniques are used fairly extensively to build very large supercomputers.

Variable SMP

Variable Symmetric Multiprocessing (vSMP) is a specific mobile use case technology initiated by NVIDIA. This technology includes an extra fifth core in a quad-core device, called the Companion core, built specifically for executing tasks at a lower frequency during mobile active standby mode, video playback, and music playback.

Project Kal-El (Tegra 3), patented by NVIDIA, was the first SoC (System on Chip) to implement this new vSMP technology. This technology not only reduces mobile power consumption during active standby state, but also maximizes quad core performance during active usage for intensive mobile applications. Overall this technology addresses the need for increase in battery life performance during active and standby usage by reducing the power consumption in mobile processors.

Unlike current SMP architectures, the vSMP Companion core is OS transparent meaning that the operating system and the running applications are totally unaware of this extra core but are still able to take advantage of it. Some of the advantages of the vSMP architecture includes cache coherency, OS efficiency, and power optimization. The advantages for this architecture are explained below:

Cache Coherency: There are no consequences for synchronizing caches between cores running at different frequencies since vSMP does not allow the Companion core and the main cores to run simultaneously.
OS Efficiency: It is inefficient when multiple CPU cores are run at different asynchronous frequencies because this could lead to possible scheduling issues. With vSMP, the active CPU cores will run at similar frequencies to optimize OS scheduling.
Power Optimization: In asynchronous clocking based architecture, each core is on a different power plane to handle voltage adjustments for different operating frequencies. The result of this could impact performance. vSMP technology is able to dynamically enable and disable certain cores for active and standby usage, reducing overall power consumption.

These advantages lead the vSMP architecture to considerably benefit over other architectures using asynchronous clocking technologies.

Asymmetric multiprocessing

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Asymmetry

An asymmetric multiprocessing (AMP) system is a multiprocessor computer system where not all of the multiple interconnected central processing units (CPUs) are treated equally. For example, a system might allow (either at the hardware or operating system level) only one CPU to execute operating system code or might allow only one CPU to perform I/O operations. Other AMP systems might allow any CPU to execute operating system code and perform I/O operations, so that they were symmetric with regard to processor roles, but attached some or all peripherals to particular CPUs, so that they were asymmetric with respect to the peripheral attachment.

Asymmetric multiprocessing was the only method for handling multiple CPUs before symmetric multiprocessing (SMP) was available. It has also been used to provide less expensive options on systems where SMP was available. Additionally, AMP is used in applications that are dedicated, such as embedded systems, when individual processors can be dedicated to specific tasks at design time.

Asymmetric multiprocessing

Background and history

For the room-size computers of the 1960s and 1970s, a cost-effective way to increase compute power was to add a second CPU. Since these computers were already close to the fastest available (near the peak of the price:performance ratio), two standard-speed CPUs were much less expensive than a CPU that ran twice as fast. Also, adding a second CPU was less expensive than a second complete computer, which would need its own peripherals, thus requiring much more floor space and an increased operations staff.

Notable early AMP offerings by computer manufacturers were the Burroughs B5000, the DECsystem-1055, and the IBM System/360 model 65MP. There were also dual-CPU machines built at universities.

The problem with adding a second CPU to a computer system was that the operating system had been developed for single-CPU systems, and extending it to handle multiple CPUs efficiently and reliably took a long time. To fill this gap, operating systems intended for single CPUs were initially extended to provide minimal support for a second CPU. In this minimal support, the operating system ran on the “boot” processor, with the other only allowed to run user programs. In the case of the Burroughs B5000, the second processor's hardware was not capable of running "control state" code.

Other systems allowed the operating system to run on all processors, but either attached all the peripherals to one processor or attached particular peripherals to particular processors.

Burroughs B5000 and B5500

An option on the Burroughs B5000 was “Processor B”. This second processor, unlike “Processor A” had no connection to the peripherals, though the two processors shared main memory, and Processor B could not run in Control State. The operating system ran only on Processor A. When there was a user job to be executed, it might be run on Processor B, but when that job tried to access the operating system the processor halted and signaled Processor A. The requested operating system service was then run on Processor A.

On the B5500, either Processor A or Processor B could be designated as Processor 1 by a switch on the engineer's panel, with the other processor being Processor 2; both processors shared main memory and had hardware access to the I/O processors hence the peripherals, but only Processor 1 could respond to peripheral interrupts. When a job on Processor 2 required an operating system service it would be rescheduled on Processor 1, which was responsible for both initiating I/O processor activity and responding to interrupts indicating completion. In practice, this meant that while user jobs could run on either Processor 1 or Processor 2 and could access intrinsic library routines that didn't require kernel support, the operating system would schedule them on the latter whenever possible.

CDC 6500 and 6700

Control Data Corporation offered two configurations of its CDC 6000 series that featured two central processors. The CDC 6500 was a CDC 6400 with two central processors. The CDC 6700 was a CDC 6600 with the CDC 6400 central processor added to it.

These systems were organized quite differently from the other multiprocessors in this article. The operating system ran on the peripheral processors, while the user's application ran on the CPUs. Thus, the terms ASMP and SMP do not properly apply to these multiprocessors.

DECsystem-10

Digital Equipment Corporation (DEC) offered a dual-processor version of its DECsystem-1050 which used two KA10 processors; all peripherals were attached to one processor, the primary processor, and the primary processor ran the operating system code. This offering was extended to the KL-10 and KS-10 processors in the PDP-10 line; in those systems, the boot CPU is designated the "policy CPU", which runs the command interpreter, swaps jobs in and out of memory, and performs a few other functions; other operating system functions, and I/O, can be performed by any of the processors, and if the policy processor fails, another processor takes over as the policy processor.

PDP-11/74

Digital Equipment Corporation developed, but never released, a multiprocessor PDP-11, the PDP-11/74, running a multiprocessor version of RSX-11M. In that system, either processor could run operating system code, and could perform I/O, but not all peripherals were accessible to all processors; most peripherals were attached to one or the other of the CPUs, so that a processor to which a peripheral wasn't attached would, when it needed to perform an I/O operation on that peripheral, request the processor to which the peripheral was attached to perform the operation.

VAX-11/782

DEC's first multi-processor VAX system, the VAX-11/782, was an asymmetric dual-processor system; only the first processor had access to the I/O devices.

IBM System/370 model 168

Two options were available for the IBM System/370 Model 168 for attaching a second processor. One was the IBM 3062 Attached Processing Unit, in which the second processor had no access to the channels, and was therefore similar to the B5000's Processor B or the second processor on a VAX-11/782. The other option offered a complete second CPU, and was thus more like the System/360 model 65MP.

Search This Blog

Tuesday, February 11, 2020

Embedded system

History

Background

Development

Applications

Characteristics

User interface

Processors in embedded systems

Ready-made computer boards

ASIC and FPGA solutions

Peripherals

Tools

Debugging

Tracing

Reliability

High vs. low volume

Embedded software architectures

Simple control loop

Interrupt-controlled system

Cooperative multitasking

Preemptive multitasking or multi-threading

Microkernels and exokernels

Monolithic kernels

Additional software components

Domain-specific architectures

Multi-core processor

Terminology

Development

Commercial incentives

Technical factors

Advantages

Disadvantages

Hardware

Trends

Architecture

Software effects

Licensing

Embedded applications

Network processors

Digital signal processing

Heterogeneous systems

Symmetric multiprocessing

Design

History

Uses

Advantages/Disadvantages

Programming

Performance

Alternatives

Variable SMP

Asymmetric multiprocessing

Background and history

Burroughs B5000 and B5500

CDC 6500 and 6700

DECsystem-10

PDP-11/74

VAX-11/782

IBM System/370 model 168

Introduction to entropy