Monday, November 27, 2023

Instrumentation

From Wikipedia, the free encyclopedia

Instrumentation is a collective term for measuring instruments, used for indicating, measuring and recording physical quantities. It is also a field of study concerned with the art and science of making measurement instruments, involving the related areas of metrology, automation, and control theory. The term has its origins in the art and science of scientific instrument-making.

Instrumentation can refer to devices as simple as direct-reading thermometers, or as complex as multi-sensor components of industrial control systems. Today, instruments can be found in laboratories, refineries, factories and vehicles, as well as in everyday household use (e.g., smoke detectors and thermostats).

Measurement parameters

Control valve

Instrumentation is used to measure many parameters (physical values), including:

History

A local instrumentation panel on a steam turbine

The history of instrumentation can be divided into several phases.

Pre-industrial

Elements of industrial instrumentation have long histories. Scales for comparing weights and simple pointers to indicate position are ancient technologies. Some of the earliest measurements were of time. One of the oldest water clocks was found in the tomb of the ancient Egyptian pharaoh Amenhotep I, buried around 1500 BCE. Improvements were incorporated in the clocks. By 270 BCE they had the rudiments of an automatic control system device.

In 1663 Christopher Wren presented the Royal Society with a design for a "weather clock". A drawing shows meteorological sensors moving pens over paper driven by clockwork. Such devices did not become standard in meteorology for two centuries. The concept has remained virtually unchanged as evidenced by pneumatic chart recorders, where a pressurized bellows displaces a pen. Integrating sensors, displays, recorders and controls was uncommon until the industrial revolution, limited by both need and practicality.

Early industrial

The evolution of analogue control loop signalling from the pneumatic era to the electronic era

Early systems used direct process connections to local control panels for control and indication, which from the early 1930s saw the introduction of pneumatic transmitters and automatic 3-term (PID) controllers.
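The "3-term" action of such a controller can be illustrated in software. The following is a minimal sketch of a discrete-time PID update in the textbook parallel form; the struct name, the gains and the fixed time step are assumptions chosen for illustration, and a pneumatic or commercial controller would differ in many details (output limits, anti-windup, derivative filtering).

```cpp
// Minimal textbook PID sketch (hypothetical names; not a vendor algorithm).
struct Pid {
    double kp, ki, kd;   // proportional, integral and derivative gains
    double integral;     // running integral of the error
    double prev_error;   // error from the previous step, for the derivative

    Pid(double p, double i, double d)
        : kp(p), ki(i), kd(d), integral(0.0), prev_error(0.0) {}

    // One controller step. dt is the time since the last step, in seconds.
    // Returns the controller output (for example a valve demand).
    double update(double setpoint, double measurement, double dt) {
        const double error = setpoint - measurement;
        integral += error * dt;
        const double derivative = (error - prev_error) / dt;
        prev_error = error;
        return kp * error + ki * integral + kd * derivative;
    }
};
```

With gains of 2.0, 0.5 and 0.0, a setpoint of 10 and a measurement of 8 over a one-second step, the proportional term contributes 4 and the integral term 1, giving an output of 5.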

The ranges of pneumatic transmitters were defined by the need to control valves and actuators in the field. Typically a signal ranged from 3 to 15 psi (20 to 100 kPa or 0.2 to 1.0 kg/cm2) as a standard, with 6 to 30 psi occasionally being used for larger valves. Transistor electronics enabled wiring to replace pipes, initially with a range of 20 to 100 mA at up to 90 V for loop-powered devices, reducing to 4 to 20 mA at 12 to 24 V in more modern systems. A transmitter is a device that produces an output signal, often in the form of a 4–20 mA electrical current signal, although many other options using voltage, frequency, pressure, or Ethernet are possible. The transistor was commercialized by the mid-1950s.
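As an illustration of how a 4–20 mA signal is interpreted, the sketch below converts between loop current and a value in engineering units. It assumes the transmitter is linear over its calibrated range, with 4 mA representing the bottom of the range and 20 mA the top; the function names and the example range are hypothetical.

```cpp
#include <cassert>

// Convert a loop current (mA) to engineering units, assuming a linear
// transmitter where 4 mA maps to range_lo and 20 mA maps to range_hi.
double current_to_value(double milliamps, double range_lo, double range_hi) {
    assert(range_hi > range_lo);
    const double fraction = (milliamps - 4.0) / 16.0;  // 0.0 at 4 mA, 1.0 at 20 mA
    return range_lo + fraction * (range_hi - range_lo);
}

// Inverse mapping: engineering units back to loop current (mA).
double value_to_current(double value, double range_lo, double range_hi) {
    assert(range_hi > range_lo);
    return 4.0 + 16.0 * (value - range_lo) / (range_hi - range_lo);
}
```

For a transmitter ranged 0 to 100 units, a reading of 12 mA sits halfway along the 16 mA span and so corresponds to 50 units. The same proportional arithmetic applies to a 3 to 15 psi pneumatic signal, with a 12 psi span in place of 16 mA.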

Instruments attached to a control system provided signals used to operate solenoids, valves, regulators, circuit breakers, relays and other devices. Such devices could control a desired output variable, and provide either remote monitoring or automated control capabilities.

Each instrument company introduced its own standard instrumentation signal, causing confusion until the 4–20 mA range was used as the standard electronic instrument signal for transmitters and valves. This signal was eventually standardized as ANSI/ISA S50, "Compatibility of Analog Signals for Electronic Industrial Process Instruments", in the 1970s. The transformation of instrumentation from mechanical pneumatic transmitters, controllers, and valves to electronic instruments reduced maintenance costs as electronic instruments were more dependable than mechanical instruments. Their greater accuracy also increased efficiency and production. Pneumatics enjoyed some advantages, being favored in corrosive and explosive atmospheres.

Automatic process control

Example of a single industrial control loop, showing continuously modulated control of process flow

In the early years of process control, process indicators and control elements such as valves were monitored by an operator who walked around the unit adjusting the valves to obtain the desired temperatures, pressures, and flows. As technology evolved, pneumatic controllers were invented and mounted in the field, where they monitored the process and controlled the valves. This reduced the amount of time process operators needed to spend monitoring the process. In later years the controllers themselves were moved to a central room: signals were sent into the control room to monitor the process, and output signals were sent to the final control element, such as a valve, to adjust the process as needed. These controllers and indicators were mounted on a wall called a control board. The operators stood in front of this board, walking back and forth to monitor the process indicators. This again reduced the number of process operators needed and the time they spent walking around the units. The most common pneumatic signal level used during these years was 3–15 psig.

Large integrated computer-based systems

Pneumatic "three term" PID controller, widely used before electronics became reliable, cheaper, and safe to use in hazardous areas (Siemens Telepneu example)
A pre-DCS/SCADA era central control room. Whilst the controls are centralised in one place, they are still discrete and not integrated into one system.
A DCS control room where plant information and controls are displayed on computer graphics screens. The operators are seated and can view and control any part of the process from their screens, whilst retaining a plant overview.

Process control of large industrial plants has evolved through many stages. Initially, control would be from panels local to the process plant. However this required a large manpower resource to attend to these dispersed panels, and there was no overall view of the process. The next logical development was the transmission of all plant measurements to a permanently-staffed central control room. Effectively this was the centralisation of all the localised panels, with the advantages of lower manning levels and easier overview of the process. Often the controllers were behind the control room panels, and all automatic and manual control outputs were transmitted back to plant.

However, whilst providing a central control focus, this arrangement was inflexible as each control loop had its own controller hardware, and continual operator movement within the control room was required to view different parts of the process. With the coming of electronic processors and graphic displays it became possible to replace these discrete controllers with computer-based algorithms, hosted on a network of input/output racks with their own control processors. These could be distributed around the plant, and communicate with the graphic display in the control room or rooms. The distributed control concept was born.

The introduction of DCSs and SCADA allowed easy interconnection and re-configuration of plant controls such as cascaded loops and interlocks, and easy interfacing with other production computer systems. It enabled sophisticated alarm handling, introduced automatic event logging, removed the need for physical records such as chart recorders, allowed the control racks to be networked and thereby located locally to plant to reduce cabling runs, and provided high level overviews of plant status and production levels.

Application

In some cases the sensor is a very minor element of the mechanism. Digital cameras and wristwatches might technically meet the loose definition of instrumentation because they record and/or display sensed information. Under most circumstances neither would be called instrumentation, but when used to measure the elapsed time of a race and to document the winner at the finish line, both would be called instrumentation.

Household

A very simple example of an instrumentation system is a mechanical thermostat, used to control a household furnace and thus to control room temperature. A typical unit senses temperature with a bi-metallic strip. It displays temperature by a needle on the free end of the strip. It activates the furnace by a mercury switch. As the switch is rotated by the strip, the mercury makes physical (and thus electrical) contact between electrodes.
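The on/off behaviour of such a thermostat can be sketched in a few lines. The model below assumes a small deadband (hysteresis) around the setpoint so that the furnace does not switch rapidly on and off; the struct name and the band value are illustrative assumptions, not a description of any particular device.

```cpp
// On/off thermostat sketch with a deadband (hypothetical model).
struct Thermostat {
    double setpoint;    // desired room temperature
    double band;        // half-width of the deadband around the setpoint
    bool furnace_on;    // current state of the furnace switch

    Thermostat(double sp, double b) : setpoint(sp), band(b), furnace_on(false) {}

    // Feed in the sensed temperature; returns whether the furnace should run.
    bool update(double temperature) {
        if (temperature < setpoint - band) {
            furnace_on = true;       // too cold: close the switch
        } else if (temperature > setpoint + band) {
            furnace_on = false;      // too warm: open the switch
        }                            // inside the band: keep the previous state
        return furnace_on;
    }
};
```

With a setpoint of 20 and a band of 0.5, the furnace turns on at 19.0, stays on at 20.2 because that reading is inside the band, and turns off once the temperature reaches 20.6.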

Another example of an instrumentation system is a home security system. Such a system consists of sensors (motion detection, switches to detect door openings), simple algorithms to detect intrusion, local control (arm/disarm) and remote monitoring of the system so that the police can be summoned. Communication is an inherent part of the design.

Kitchen appliances use sensors for control.

  • A refrigerator maintains a constant temperature by actuating the cooling system when the temperature becomes too high.
  • An automatic ice machine makes ice until a limit switch is thrown.
  • Pop-up bread toasters allow the time to be set.
  • Non-electronic gas ovens will regulate the temperature with a thermostat controlling the flow of gas to the gas burner. These may feature a sensor bulb sited within the main chamber of the oven. In addition, there may be a safety cut-off flame supervision device: after ignition, the burner's control knob must be held for a short time in order for a sensor to become hot, and permit the flow of gas to the burner. If the safety sensor becomes cold, this may indicate the flame on the burner has become extinguished, and to prevent a continuous leak of gas the flow is stopped.
  • Electric ovens use a temperature sensor and will turn on heating elements when the temperature is too low. More advanced ovens will actuate fans in response to temperature sensors, to distribute heat or to cool.
  • A common toilet refills the water tank until a float closes the valve. The float is acting as a water level sensor.

Automotive

Modern automobiles have complex instrumentation. In addition to displays of engine rotational speed and vehicle linear speed, there are also displays of battery voltage and current, fluid levels, fluid temperatures, distance traveled and feedback from various controls (turn signals, parking brake, headlights, transmission position). Cautions may be displayed for special problems (fuel low, check engine, tire pressure low, door ajar, seat belt unfastened). Problems are recorded so they can be reported to diagnostic equipment. Navigation systems can provide voice commands to reach a destination. Automotive instrumentation must be cheap and reliable over long periods in harsh environments. There may be independent airbag systems which contain sensors, logic and actuators. Anti-skid braking systems use sensors to control the brakes, while cruise control affects throttle position. A wide variety of services can be provided via communication links, such as the OnStar system. Autonomous cars (with exotic instrumentation) have been demonstrated.

Aircraft

Early aircraft had a few sensors. "Steam gauges" converted air pressures into needle deflections that could be interpreted as altitude and airspeed. A magnetic compass provided a sense of direction. The displays to the pilot were as critical as the measurements.

A modern aircraft has a far more sophisticated suite of sensors and displays, which are embedded into avionics systems. The aircraft may contain inertial navigation systems, global positioning systems, weather radar, autopilots, and aircraft stabilization systems. Redundant sensors are used for reliability. A subset of the information may be transferred to a crash recorder to aid mishap investigations. Modern pilot displays include computer displays and head-up displays.

Air traffic control radar is a distributed instrumentation system. The ground portion transmits an electromagnetic pulse and receives an echo (at least). Aircraft carry transponders that transmit codes on reception of the pulse. The system displays aircraft map location, an identifier and optionally altitude. The map location is based on sensed antenna direction and sensed time delay. The other information is embedded in the transponder transmission.
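The way a map location follows from antenna direction and time delay can be sketched as below. This is a simplified model under stated assumptions: the pulse travels at the speed of light in vacuum, the echo returns with no transponder turnaround delay, azimuth is measured clockwise from north, and slant range is treated as ground range. All names are hypothetical.

```cpp
#include <cmath>

const double kSpeedOfLight = 299792458.0;  // metres per second

// The pulse travels to the target and back, so the range is half the path.
double range_from_delay(double round_trip_seconds) {
    return kSpeedOfLight * round_trip_seconds / 2.0;
}

struct MapPosition {
    double east_m;   // metres east of the antenna
    double north_m;  // metres north of the antenna
};

// Combine sensed antenna azimuth (radians, clockwise from north, an assumed
// convention) with the sensed delay to place the target on a flat map.
MapPosition locate(double azimuth_rad, double round_trip_seconds) {
    const double r = range_from_delay(round_trip_seconds);
    return MapPosition{r * std::sin(azimuth_rad), r * std::cos(azimuth_rad)};
}
```

A round-trip delay of one millisecond corresponds to a range of roughly 150 km.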

Laboratory instrumentation

Among the possible uses of the term is a collection of laboratory test equipment controlled by a computer through an IEEE-488 bus (also known as GPIB for General Purpose Interface Bus or HP-IB for Hewlett-Packard Interface Bus). Laboratory equipment is available to measure many electrical and chemical quantities. Such a collection of equipment might be used to automate the testing of drinking water for pollutants.

Instrumentation engineering

The instrumentation part of a piping and instrumentation diagram will be developed by an instrumentation engineer.

Instrumentation engineering is the engineering specialization focused on the principle and operation of measuring instruments that are used in the design and configuration of automated systems in areas such as electrical and pneumatic domains, and the control of quantities being measured. Instrumentation engineers typically work for industries with automated processes, such as chemical or manufacturing plants, with the goal of improving system productivity, reliability, safety, optimization and stability. To control the parameters in a process or in a particular system, devices such as microprocessors, microcontrollers or PLCs are used.

Instrumentation engineering is loosely defined because the required tasks are very domain dependent. An expert in the biomedical instrumentation of laboratory rats has very different concerns than the expert in rocket instrumentation. Common concerns of both are the selection of appropriate sensors based on size, weight, cost, reliability, accuracy, longevity, environmental robustness and frequency response. Some sensors are literally fired in artillery shells. Others sense thermonuclear explosions until destroyed. Invariably sensor data must be recorded, transmitted or displayed. Recording rates and capacities vary enormously. Transmission can be trivial or can be clandestine, encrypted and low-power in the presence of jamming. Displays can be trivially simple or can require consultation with human factors experts. Control system design varies from trivial to a separate specialty.

Instrumentation engineers are responsible for integrating the sensors with the recorders, transmitters, displays or control systems, and producing the piping and instrumentation diagram for the process. They may design or specify installation, wiring and signal conditioning. They may be responsible for commissioning, calibration, testing and maintenance of the system.

In a research environment it is common for subject matter experts to have substantial instrumentation system expertise. An astronomer knows the structure of the universe and a great deal about telescopes – optics, pointing and cameras (or other sensing elements). That often includes the hard-won knowledge of the operational procedures that provide the best results. For example, an astronomer is often knowledgeable of techniques to minimize temperature gradients that cause air turbulence within the telescope.

Instrumentation technologists, technicians and mechanics specialize in troubleshooting, repairing and maintaining instruments and instrumentation systems.

Typical industrial transmitter signal types

  • HART – Data signalling, often overlaid on a current loop

Impact of modern development

Ralph Müller (1940) stated, "That the history of physical science is largely the history of instruments and their intelligent use is well known. The broad generalizations and theories which have arisen from time to time have stood or fallen on the basis of accurate measurement, and in several instances new instruments have had to be devised for the purpose. There is little evidence to show that the mind of modern man is superior to that of the ancients. His tools are incomparably better."

Davis Baird has argued that the major change associated with Floris Cohen's identification of a "fourth big scientific revolution" after World War II is the development of scientific instrumentation, not only in chemistry but across the sciences. In chemistry, the introduction of new instrumentation in the 1940s was "nothing less than a scientific and technological revolution" in which classical wet-and-dry methods of structural organic chemistry were discarded, and new areas of research opened up.

As early as 1954, W. A. Wildhack discussed both the productive and destructive potential inherent in process control. The ability to make precise, verifiable and reproducible measurements of the natural world, at levels that were not previously observable, using scientific instrumentation, has "provided a different texture of the world". This instrumentation revolution fundamentally changes human abilities to monitor and respond, as is illustrated in the examples of DDT monitoring and the use of UV spectrophotometry and gas chromatography to monitor water pollutants.

Strange loop

From Wikipedia, the free encyclopedia

A strange loop is a cyclic structure that goes through several levels in a hierarchical system. It arises when, by moving only upwards or downwards through the system, one finds oneself back where one started. Strange loops may involve self-reference and paradox. The concept of a strange loop was proposed and extensively discussed by Douglas Hofstadter in Gödel, Escher, Bach, and is further elaborated in Hofstadter's book I Am a Strange Loop, published in 2007.

A tangled hierarchy is a hierarchical consciousness system in which a strange loop appears.

Definitions

A strange loop is a hierarchy of levels, each of which is linked to at least one other by some type of relationship. A strange loop hierarchy is "tangled" (Hofstadter refers to this as a "heterarchy"), in that there is no well defined highest or lowest level; moving through the levels, one eventually returns to the starting point, i.e., the original level. Examples of strange loops that Hofstadter offers include: many of the works of M. C. Escher, the Canon 5. a 2 from J.S. Bach's Musical Offering, the information flow network between DNA and enzymes through protein synthesis and DNA replication, and self-referential Gödelian statements in formal systems.

In I Am a Strange Loop, Hofstadter defines strange loops as follows:

And yet when I say "strange loop", I have something else in mind — a less concrete, more elusive notion. What I mean by "strange loop" is — here goes a first stab, anyway — not a physical circuit but an abstract loop in which, in the series of stages that constitute the cycling-around, there is a shift from one level of abstraction (or structure) to another, which feels like an upwards movement in an hierarchy, and yet somehow the successive "upward" shifts turn out to give rise to a closed cycle. That is, despite one's sense of departing ever further from one's origin, one winds up, to one's shock, exactly where one had started out. In short, a strange loop is a paradoxical level-crossing feedback loop. (pp. 101–102)

In cognitive science

According to Hofstadter, strange loops take form in human consciousness as the complexity of active symbols in the brain inevitably leads to the same kind of self-reference which Gödel proved was inherent in any complex logical or arithmetical system in his incompleteness theorem. Gödel showed that mathematics and logic contain strange loops: propositions that not only refer to mathematical and logical truths, but also to the symbol systems expressing those truths. This leads to the sort of paradoxes seen in statements such as "This statement is false," wherein the sentence's basis of truth is found in referring to itself and its assertion, causing a logical paradox.

Hofstadter argues that the psychological self arises out of a similar kind of paradox. We are not born with an "I" – the ego emerges only gradually as experience shapes our dense web of active symbols into a tapestry rich and complex enough to begin twisting back upon itself. According to this view the psychological "I" is a narrative fiction, something created only from intake of symbolic data and its own ability to create stories about itself from that data. The consequence is that a perspective (a mind) is a culmination of a unique pattern of symbolic activity in our nervous systems, which suggests that the pattern of symbolic activity that makes identity, that constitutes subjectivity, can be replicated within the brains of others, and perhaps even in artificial brains.

Strangeness

The "strangeness" of a strange loop comes from our way of perceiving, because we categorize our input in a small number of "symbols" (by which Hofstadter means groups of neurons standing for one thing in the outside world). So the difference between the video-feedback loop and our strange loops, our "I"s, is that while the former converts light to the same pattern on a screen, the latter categorizes a pattern and outputs its essence, so that as we get closer and closer to our essence, we get further down our strange loop.

Downward causality

Hofstadter thinks our minds appear to us to determine the world by way of "downward causality", which refers to a situation where a cause-and-effect relationship in a system gets flipped upside-down. Hofstadter says this happens in the proof of Gödel's incompleteness theorem:

Merely from knowing the formula's meaning, one can infer its truth or falsity without any effort to derive it in the old-fashioned way, which requires one to trudge methodically "upwards" from the axioms. This is not just peculiar; it is astonishing. Normally, one cannot merely look at what a mathematical conjecture says and simply appeal to the content of that statement on its own to deduce whether the statement is true or false. (pp. 169–170)

Hofstadter claims a similar "flipping around of causality" appears to happen in minds possessing self-consciousness. The mind perceives itself as the cause of certain feelings ("I" am the source of my desires), while according to popular scientific models, feelings and desires are strictly caused by the interactions of neurons.

The parallels between downward causation in formal systems and downward causation in brains are explored by Theodor Nenu (2022), together with other aspects of Hofstadter's metaphysics of mind. Nenu also questions the correctness of the above quote by focusing on the sentence which "says about itself" that it is provable (also known as a Henkin-sentence, named after logician Leon Henkin). It turns out that under suitable metamathematical choices (where the Hilbert-Bernays provability conditions do not obtain), one can construct formally undecidable (or even formally refutable) Henkin-sentences for the arithmetical system under investigation. This system might very well be Hofstadter's Typographical Number Theory used in Gödel, Escher, Bach or the more familiar Peano Arithmetic or some other sufficiently rich formal arithmetic. Thus, there are examples of sentences "which say about themselves that they are provable", but they don't exhibit the sort of downward causal powers described in the displayed quote.

Examples

Hofstadter points to Bach's Canon per Tonos, M. C. Escher's drawings Waterfall, Drawing Hands, Ascending and Descending, and the liar paradox as examples that illustrate the idea of strange loops, which is expressed fully in the proof of Gödel's incompleteness theorem.

The "chicken or the egg" paradox is perhaps the best-known strange loop problem.

The "ouroboros", which depicts a dragon eating its own tail, is perhaps one of the most ancient and universal symbolic representations of the reflexive loop concept.

A Shepard tone is another illustrative example of a strange loop. Named after Roger Shepard, it is a sound consisting of a superposition of tones separated by octaves. When played with the base pitch of the tone moving upwards or downwards, it is referred to as the Shepard scale. This creates the auditory illusion of a tone that continually ascends or descends in pitch, yet which ultimately seems to get no higher or lower. In a similar way a sound with seemingly ever increasing tempo can be constructed, as was demonstrated by Jean-Claude Risset.

Visual illusions depicting strange loops include the Penrose stairs and the Barberpole illusion.

A quine in software programming is a program that produces a new version of itself without any input from the outside. A similar concept is metamorphic code.

Efron's dice are four dice that are intransitive under gambler's preference. I.e., the dice are ordered A > B > C > D > A, where x > y means "a gambler prefers x to y".
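The cycle can be checked by brute force. The sketch below uses the face values commonly given for Efron's dice (they are not listed in the text above) and counts, out of the 36 equally likely pairings, how often one die shows a higher number than another. The function name is an illustrative choice.

```cpp
#include <array>

using Die = std::array<int, 6>;

// Face values commonly given for Efron's dice (assumed here).
const Die A = {4, 4, 4, 4, 0, 0};
const Die B = {3, 3, 3, 3, 3, 3};
const Die C = {6, 6, 2, 2, 2, 2};
const Die D = {5, 5, 5, 1, 1, 1};

// Number of the 36 face pairings in which x shows a higher value than y.
int wins(const Die& x, const Die& y) {
    int count = 0;
    for (int a : x) {
        for (int b : y) {
            if (a > b) {
                ++count;
            }
        }
    }
    return count;
}
```

Each die beats the next in 24 of 36 pairings, a probability of 2/3, and D beats A by the same margin, which closes the loop.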

Individual preferences are always transitive, excluding preferences when given explicit rules such as in Efron's dice or rock-paper-scissors; however, aggregate preferences of a group may be intransitive. This can result in a Condorcet paradox wherein following a path from one candidate across a series of majority preferences may return back to the original candidate, leaving no clear preference by the group. In this case, some candidate beats an opponent, who in turn beats another opponent, and so forth, until a candidate is reached who beats the original candidate.
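A three-voter example makes the cycle concrete. In the sketch below each ballot is a string listing candidates from most to least preferred; the ballots and the helper name are hypothetical, chosen so that every individual ranking is transitive while the majority preference is not.

```cpp
#include <string>
#include <vector>

// True if a strict majority of ballots rank candidate x above candidate y.
// Each ballot lists candidates from most preferred to least preferred.
bool majority_prefers(const std::vector<std::string>& ballots, char x, char y) {
    int for_x = 0;
    for (const std::string& ballot : ballots) {
        if (ballot.find(x) < ballot.find(y)) {
            ++for_x;  // x appears earlier, so this voter prefers x
        }
    }
    return 2 * for_x > static_cast<int>(ballots.size());
}
```

With the ballots "ABC", "BCA" and "CAB", a majority prefers A to B, a majority prefers B to C, and a majority prefers C to A, so following majority preferences leads back to the starting candidate.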

The liar paradox and Russell's paradox also involve strange loops, as does René Magritte's painting The Treachery of Images.

The mathematical phenomenon of polysemy has been observed to be a strange loop. At the denotational level, the term refers to situations where a single entity can be seen to mean more than one mathematical object. See Tanenbaum (1999).

The Stonecutter is an old Japanese fairy tale with a story that explains social and natural hierarchies as a strange loop.

Name mangling

From Wikipedia, the free encyclopedia

In compiler construction, name mangling (also called name decoration) is a technique used to solve various problems caused by the need to resolve unique names for programming entities in many modern programming languages.

It provides a way of encoding additional information in the name of a function, structure, class or another data type in order to pass more semantic information from the compiler to the linker.

The need for name mangling arises where the language allows different entities to be named with the same identifier as long as they occupy a different namespace (typically defined by a module, class, or explicit namespace directive) or have different signatures (such as in function overloading). It is required in these use cases because each signature might require different, specialized calling convention in the machine code.

Any object code produced by compilers is usually linked with other pieces of object code (produced by the same or another compiler) by a type of program called a linker. The linker needs a great deal of information on each program entity. For example, to correctly link a function it needs its name, the number of arguments and their types, and so on.

The simple programming languages of the 1970s, like C, only distinguished subroutines by their name, ignoring other information including parameter and return types. Later programming languages, like C++, defined stricter requirements for routines to be considered "equal", such as the parameter types, return type, and calling convention of a function. These requirements enable method overloading and detection of some bugs (such as using different definitions of a function when compiling different source files). These stricter requirements needed to work with existing tools and conventions; therefore, additional requirements were encoded in the name of the symbol, since that was the only information the traditional linker had about a symbol.

Another use of name mangling is for detecting additional non-signature-related changes, such as function purity, or whether a function can potentially throw an exception or trigger garbage collection. An example of a language doing this is D. This serves as a simplified form of error checking. For example, functions int f(); and int g(int) pure; could be compiled into one object file, but then their signatures changed to float f(); int g(int); and used to compile other source calling them. At link time the linker will detect that there is no function g(int) without the pure attribute and return an error. Similarly, the linker will detect that the return type of f is different, and return an error. Otherwise, incompatible calling conventions would be used, and most likely produce the wrong result or crash the program.

Mangling doesn't usually capture every detail of the calling process. For example, it doesn't fully prevent errors like changes of data members of a struct or class. For example, struct S {}; void f(S) {} could be compiled into one object file, then the definition for S changed to be struct S { int x; }; and used in the compilation of a call to f(S()). In such cases, the compiler will usually use a different calling convention, but in both cases f will mangle to the same name, so the linker will not detect this problem, and the result will usually be a crash or data or memory corruption at runtime.

Examples

C

Although name mangling is not generally required or used by languages that do not support function overloading, like C and classic Pascal, they use it in some cases to provide additional information about a function. For example, compilers targeted at Microsoft Windows platforms support a variety of calling conventions, which determine the manner in which parameters are sent to subroutines and results are returned. Because the different calling conventions are incompatible with one another, compilers mangle symbols with codes detailing which convention should be used to call the specific routine.

The mangling scheme for Windows was established by Microsoft and has been informally followed by other compilers including Digital Mars, Borland, and GNU GCC when compiling code for the Windows platforms. The scheme even applies to other languages, such as Pascal, D, Delphi, Fortran, and C#. This allows subroutines written in those languages to call, or be called by, existing Windows libraries using a calling convention different from their default.

When compiling the following C examples:

int _cdecl    f (int x) { return 0; }
int _stdcall  g (int y) { return 0; }
int _fastcall h (int z) { return 0; }

32-bit compilers emit, respectively:

_f
_g@4
@h@4

In the stdcall and fastcall mangling schemes, the function is encoded as _name@X and @name@X respectively, where X is the number of bytes, in decimal, of the argument(s) in the parameter list (including those passed in registers, for fastcall). In the case of cdecl, the function name is merely prefixed by an underscore.
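The three rules can be written out as a small function. This is a sketch that only reproduces the 32-bit decoration pattern described above for plain C functions; the enum and function names are hypothetical, and it does not model how a real compiler computes the argument byte count.

```cpp
#include <string>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Build the 32-bit decorated name of a C function.
// arg_bytes is the total size of the arguments in bytes (ignored for cdecl).
std::string decorate32(const std::string& name, CallConv cc, unsigned arg_bytes) {
    switch (cc) {
        case CallConv::Cdecl:
            return "_" + name;
        case CallConv::Stdcall:
            return "_" + name + "@" + std::to_string(arg_bytes);
        case CallConv::Fastcall:
            return "@" + name + "@" + std::to_string(arg_bytes);
    }
    return name;  // not reached for valid enum values
}
```

Applied to the functions f, g and h above, each taking one 4-byte int, this yields _f, _g@4 and @h@4.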

The 64-bit convention on Windows (Microsoft C) has no leading underscore. This difference may in some rare cases lead to unresolved externals when porting such code to 64 bits. For example, Fortran code can use 'alias' to link against a C method by name as follows:

SUBROUTINE f()
!DEC$ ATTRIBUTES C, ALIAS:'_f' :: f
END SUBROUTINE

This will compile and link fine under 32 bits, but generate an unresolved external _f under 64 bits. One workaround for this is not to use 'alias' at all (in which case the method names typically need to be capitalized in C and Fortran). Another is to use the BIND option:

SUBROUTINE f() BIND(C,NAME="f")
END SUBROUTINE

In C, most compilers also mangle static functions and variables (and in C++ functions and variables declared static or put in the anonymous namespace) in translation units using the same mangling rules as for their non-static versions. If functions with the same name (and parameters for C++) are also defined and used in different translation units, they will also mangle to the same name, potentially leading to a clash. However, they will not be equivalent if they are called in their respective translation units. Compilers are usually free to emit arbitrary mangling for these functions, because it is illegal to access them from other translation units directly, so they never need to be linked across different object files. To prevent linking conflicts, compilers will use standard mangling, but will use so-called 'local' symbols. When linking many such translation units there might be multiple definitions of a function with the same name, but the resulting code will only call one or another depending on which translation unit it came from. This is usually done using the relocation mechanism.

C++

C++ compilers are the most widespread users of name mangling. The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers that produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.

The C++ language does not define a standard decoration scheme, so each compiler uses its own. C++ also has complex language features, such as classes, templates, namespaces, and operator overloading, that alter the meaning of specific symbols based on context or usage. Meta-data about these features can be disambiguated by mangling (decorating) the name of a symbol. Because the name-mangling systems for such features are not standardized across compilers, few linkers can link object code that was produced by different compilers.

Simple example

A single C++ translation unit might define two functions named f():

int  f () { return 1; }
int  f (int)  { return 0; }
void g () { int i = f(), j = f(0); }

These are distinct functions, with no relation to each other apart from the name. The C++ compiler will therefore encode the type information in the symbol name, the result being something resembling:

int  __f_v () { return 1; }
int  __f_i (int)  { return 0; } 
void __g_v () { int i = __f_v(), j = __f_i(0); }

Even though its name is unique, g() is still mangled: name mangling applies to all C++ symbols (except for those in an extern "C"{} block).

Complex example

The mangled symbols in this example, in the comments below the respective identifier name, are those produced by the GNU GCC 3.x compilers, according to the IA-64 (Itanium) ABI:

namespace wikipedia 
{
   class article 
   {
   public:
      std::string format ();  // = _ZN9wikipedia7article6formatEv

      bool print_to (std::ostream&);  // = _ZN9wikipedia7article8print_toERSo

      class wikilink 
      {
      public:
         wikilink (std::string const& name);  // = _ZN9wikipedia7article8wikilinkC1ERKSs
      };
   };
}

All mangled symbols begin with _Z (note that an identifier beginning with an underscore followed by a capital letter is a reserved identifier in C, so conflict with user identifiers is avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of <length, id> pairs (the length being the length of the next identifier), and finally E. For example, wikipedia::article::format becomes:

_ZN9wikipedia7article6formatE

For functions, this is then followed by the type information for the parameters; as format() takes no parameters, the parameter list is encoded simply as v; hence:

_ZN9wikipedia7article6formatEv

For print_to, the standard type std::ostream (which is a typedef for std::basic_ostream<char, std::char_traits<char> >) is used, which has the special alias So; a reference to this type is therefore RSo, with the complete name for the function being:

_ZN9wikipedia7article8print_toERSo
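The nested-name rule described above can be sketched as follows. This is a minimal illustration that covers only the _ZN ... E wrapper with length-prefixed identifiers; the helper name mangle_nested is hypothetical, and the parameter encoding (such as v or RSo) is passed in ready-made rather than derived from C++ types.

```python
def mangle_nested(components, params):
    """Build an Itanium-style symbol for a nested name.

    components: identifiers from outermost to innermost, e.g.
                ["wikipedia", "article", "format"]
    params:     already-encoded parameter types, e.g. "v" or "RSo"
    """
    # Each identifier is emitted as <length><id>
    body = "".join(f"{len(c)}{c}" for c in components)
    # _Z prefix, N ... E around the nested name, then the parameter types
    return f"_ZN{body}E{params}"


print(mangle_nested(["wikipedia", "article", "format"], "v"))
# _ZN9wikipedia7article6formatEv
print(mangle_nested(["wikipedia", "article", "print_to"], "RSo"))
# _ZN9wikipedia7article8print_toERSo
```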

How different compilers mangle the same functions

There isn't a standardized scheme by which even trivial C++ identifiers are mangled, and consequently different compilers (or even different versions of the same compiler, or the same compiler on different platforms) mangle public symbols in radically different (and thus totally incompatible) ways. Consider how different C++ compilers mangle the same functions:

Compiler | void h(int) | void h(int, char) | void h(void)
Intel C++ 8.0 for Linux | _Z1hi | _Z1hic | _Z1hv
HP aC++ A.05.55 IA-64 | _Z1hi | _Z1hic | _Z1hv
IAR EWARM C++ | _Z1hi | _Z1hic | _Z1hv
GCC 3.x and higher | _Z1hi | _Z1hic | _Z1hv
Clang 1.x and higher | _Z1hi | _Z1hic | _Z1hv
GCC 2.9.x | h__Fi | h__Fic | h__Fv
HP aC++ A.03.45 PA-RISC | h__Fi | h__Fic | h__Fv
Microsoft Visual C++ v6-v10 | ?h@@YAXH@Z | ?h@@YAXHD@Z | ?h@@YAXXZ
Digital Mars C++ | ?h@@YAXH@Z | ?h@@YAXHD@Z | ?h@@YAXXZ
Borland C++ v3.1 | @h$qi | @h$qizc | @h$qv
OpenVMS C++ v6.5 (ARM mode) | H__XI | H__XIC | H__XV
OpenVMS C++ v6.5 (ANSI mode) |  | CXX$__7H__FIC26CDH77 | CXX$__7H__FV2CB06E8
OpenVMS C++ X7.1 IA-64 | CXX$_Z1HI2DSQ26A | CXX$_Z1HIC2NP3LI4 | CXX$_Z1HV0BCA19V
SunPro CC | __1cBh6Fi_v_ | __1cBh6Fic_v_ | __1cBh6F_v_
Tru64 C++ v6.5 (ARM mode) | h__Xi | h__Xic | h__Xv
Tru64 C++ v6.5 (ANSI mode) | __7h__Fi | __7h__Fic | __7h__Fv
Watcom C++ 10.6 | W?h$n(i)v | W?h$n(ia)v | W?h$n()v

Notes:

  • The Compaq C++ compiler on OpenVMS VAX and Alpha (but not IA-64) and Tru64 has two name mangling schemes. The original, pre-standard scheme is known as the ARM model, and is based on the name mangling described in the C++ Annotated Reference Manual (ARM). With the advent of new features in standard C++, particularly templates, the ARM scheme became more and more unsuitable — it could not encode certain function types, or produced identically mangled names for different functions. It was therefore replaced by the newer "ANSI" model, which supported all ANSI template features, but was not backward compatible.
  • On IA-64, a standard Application Binary Interface (ABI) exists (see external links), which defines (among other things) a standard name-mangling scheme, and which is used by all the IA-64 compilers. GNU GCC 3.x, in addition, has adopted the name mangling scheme defined in this standard for use on other, non-Intel platforms.
  • Visual Studio and the Windows SDK include the program undname, which prints the C-style function prototype for a given mangled name.
  • On Microsoft Windows, the Intel compiler and Clang use the Visual C++ name mangling for compatibility.
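To make the contrast between schemes concrete, the following sketch reproduces two rows of the table for the function h. It is deliberately limited to free functions taking int, char or nothing; the helper names and the TYPE_CODES table are hypothetical, and real manglers handle far more cases.

```python
# One-letter codes for the three builtin types that appear in the table
TYPE_CODES = {"int": "i", "char": "c", "void": "v"}


def encode_params(params):
    """Encode a parameter list; an empty list is treated as (void)."""
    if not params:
        params = ["void"]
    return "".join(TYPE_CODES[p] for p in params)


def mangle_itanium(name, params):
    # GCC 3.x and higher: _Z, then <length><name>, then the parameter codes
    return f"_Z{len(name)}{name}{encode_params(params)}"


def mangle_gcc2(name, params):
    # GCC 2.9.x: name, then __F, then the parameter codes
    return f"{name}__F{encode_params(params)}"


for params in (["int"], ["int", "char"], []):
    print(mangle_itanium("h", params), mangle_gcc2("h", params))
# _Z1hi h__Fi
# _Z1hic h__Fic
# _Z1hv h__Fv
```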

Handling of C symbols when linking from C++

The job of the common C++ idiom:

#ifdef __cplusplus 
extern "C" {
#endif
    /* ... */
#ifdef __cplusplus
}
#endif

is to ensure that the symbols within are "unmangled" – that the compiler emits a binary file with their names undecorated, as a C compiler would do. As C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.

For example, the standard strings library, <string.h>, usually contains something resembling:

#ifdef __cplusplus
extern "C" {
#endif

void *memset (void *, int, size_t);
char *strcat (char *, const char *);
int   strcmp (const char *, const char *);
char *strcpy (char *, const char *);

#ifdef __cplusplus
}
#endif

Thus, code such as:

if (strcmp(argv[1], "-x") == 0) 
    strcpy(a, argv[2]);
else 
    memset (a, 0, sizeof(a));

uses the correct, unmangled strcmp and memset. If the extern "C" had not been used, the (SunPro) C++ compiler would produce code equivalent to:

if (__1cGstrcmp6Fpkc1_i_(argv[1], "-x") == 0) 
    __1cGstrcpy6Fpcpkc_0_(a, argv[2]);
else 
    __1cGmemset6FpviI_0_ (a, 0, sizeof(a));

Since those symbols do not exist in the C runtime library (e.g. libc), link errors would result.

Standardized name mangling in C++

It would seem that standardized name mangling in the C++ language would lead to greater interoperability between compiler implementations. However, such a standardization by itself would not suffice to guarantee C++ compiler interoperability and it might even create a false impression that interoperability is possible and safe when it isn't. Name mangling is only one of several application binary interface (ABI) details that need to be decided and observed by a C++ implementation. Other ABI aspects like exception handling, virtual table layout, structure, and stack frame padding also cause differing C++ implementations to be incompatible. Further, requiring a particular form of mangling would cause issues for systems where implementation limits (e.g., length of symbols) dictate a particular mangling scheme. A standardized requirement for name mangling would also prevent an implementation where mangling was not required at all — for example, a linker that understood the C++ language.

The C++ standard therefore does not attempt to standardize name mangling. On the contrary, the Annotated C++ Reference Manual (also known as ARM, ISBN 0-201-51459-1, section 7.2.1c) actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI are incompatible.

Nevertheless, as detailed in the section above, on some platforms the full C++ ABI has been standardized, including name mangling.

Real-world effects of C++ name mangling

Because C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter. Different compilers (or different versions of the same compiler, in many cases) produce such binaries under different name decoration schemes, meaning that symbols are frequently unresolved if the compilers used to create the library and the program using it employed different schemes. For example, if the Boost C++ Libraries were to be installed on a system with multiple C++ compilers (e.g., GNU GCC and the OS vendor's compiler), they would have to be compiled multiple times (once for GCC and once for the vendor compiler).

It is good for safety purposes that compilers producing incompatible object codes (codes based on different ABIs, regarding e.g., classes and exceptions) use different name mangling schemes. This guarantees that these incompatibilities are detected at the linking phase, not when executing the software (which could lead to obscure bugs and serious stability issues).

For this reason, name decoration is an important aspect of any C++-related ABI.

There are instances, particularly in large, complex code bases, where it can be difficult or impractical to map the mangled name emitted within a linker error message back to the particular corresponding token/variable-name in the source. This problem can make identifying the relevant source file(s) very difficult for build or test engineers even if only one compiler and linker are in use. Demanglers (including those within the linker error reporting mechanisms) sometimes help but the mangling mechanism itself may discard critical disambiguating information.

Demangle via c++filt

$ c++filt -n _ZNK3MapI10StringName3RefI8GDScriptE10ComparatorIS0_E16DefaultAllocatorE3hasERKS0_
Map<StringName, Ref<GDScript>, Comparator<StringName>, DefaultAllocator>::has(StringName const&) const

Demangle via builtin GCC ABI

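#include <stdio.h>
#include <stdlib.h>
#include <cxxabi.h>

int main() {
	const char *mangled_name = "_ZNK3MapI10StringName3RefI8GDScriptE10ComparatorIS0_E16DefaultAllocatorE3hasERKS0_";
	int status = -1;
	// On success, status is set to 0 and a malloc()-allocated string is returned.
	char *demangled_name = abi::__cxa_demangle(mangled_name, NULL, NULL, &status);
	if (status != 0 || demangled_name == NULL) {
		fprintf(stderr, "Demangling failed (status %d)\n", status);
		return 1;
	}
	printf("Demangled: %s\n", demangled_name);
	free(demangled_name);  // the caller owns the returned buffer
	return 0;
}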

Output:

Demangled: Map<StringName, Ref<GDScript>, Comparator<StringName>, DefaultAllocator>::has(StringName const&) const

Java

In Java, the signature of a method or a class contains its name and the types of its method arguments and return value, where applicable. The format of signatures is documented, as the language, compiler, and .class file format were all designed together (and had object-orientation and universal interoperability in mind from the start).

Creating unique names for inner and anonymous classes

The scope of inner classes is confined to their parent class, so the compiler must produce a "qualified" public name for the inner class, to avoid conflict where other classes with the same name (inner or not) exist in the same namespace. Similarly, anonymous classes must have "fake" public names generated for them (as the concept of anonymous classes only exists in the compiler, not the runtime). So, compiling the following Java program

public class foo {
    class bar {
        public int x;
    }

    public void zark () {
        Object f = new Object () {
            public String toString() {
                return "hello";
            }
        };
    }
}

will produce three .class files:

  • foo.class, containing the main (outer) class foo
  • foo$bar.class, containing the named inner class foo.bar
  • foo$1.class, containing the anonymous inner class (local to method foo.zark)

All of these class names are valid (as $ symbols are permitted in the JVM specification) and these names are "safe" for the compiler to generate, as the Java language definition advises not to use $ symbols in normal Java class definitions.
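The file names listed above follow a simple pattern, which can be sketched as below. The helper names are hypothetical, and the sketch assumes the compiler numbers anonymous classes from 1 within their outermost class, as in the example.

```python
def inner_class_file(outer: str, inner: str) -> str:
    """File name for a named inner class: Outer$Inner.class."""
    return f"{outer}${inner}.class"


def anonymous_class_file(outer: str, index: int) -> str:
    """File name for the index-th anonymous class inside Outer."""
    return f"{outer}${index}.class"


print("foo.class")                       # the outer class itself
print(inner_class_file("foo", "bar"))    # foo$bar.class
print(anonymous_class_file("foo", 1))    # foo$1.class
```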

Name resolution in Java is further complicated at runtime, as fully qualified class names are unique only inside a specific classloader instance. Classloaders are ordered hierarchically and each Thread in the JVM has a so-called context class loader, so in cases where two different classloader instances contain classes with the same name, the system first tries to load the class using the root (or system) classloader and then goes down the hierarchy to the context class loader.

Java Native Interface

Java's native method support allows Java language programs to call out to programs written in another language (generally either C or C++). There are two name-resolution concerns here, neither of which is implemented in a particularly standard manner:

  • JVM to native name translation - this seems to be more stable, since Oracle makes its scheme public.
  • Normal C++ name mangling - see above.
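As a rough illustration of the first point, the sketch below builds the short native function name that the JVM looks up for a native method. It reflects the escaping rules of the JNI specification as understood here (not detailed in this article): the prefix Java_, the fully qualified class name and the method name joined by underscores, with _1, _2 and _3 standing for _, ; and [, and _0xxxx for other non-alphanumeric characters. The helper names are hypothetical. Overloaded native methods, which append a mangled argument signature, and characters outside the Basic Multilingual Plane are not handled.

```python
def jni_escape(text: str) -> str:
    """Escape a class or method name for use in a JNI function name."""
    out = []
    for ch in text:
        if ch in "./":
            out.append("_")        # package separators become underscores
        elif ch == "_":
            out.append("_1")
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"_0{ord(ch):04x}")   # e.g. '$' becomes _00024
    return "".join(out)


def jni_short_name(class_name: str, method: str) -> str:
    """Short (non-overloaded) native symbol name for class_name.method."""
    return f"Java_{jni_escape(class_name)}_{jni_escape(method)}"


print(jni_short_name("pkg.Cls", "f"))          # Java_pkg_Cls_f
print(jni_short_name("pkg.Cls", "my_method"))  # Java_pkg_Cls_my_1method
print(jni_short_name("foo$bar", "run"))        # Java_foo_00024bar_run
```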

Python

In Python, mangling is used for class attributes that are not meant to be used by subclasses. Such attributes are designated by giving them a name with two or more leading underscores and no more than one trailing underscore. For example, __thing will be mangled, as will ___thing and __thing_, but __thing__ and __thing___ will not. Python's runtime does not restrict access to such attributes; the mangling only prevents name collisions if a derived class defines an attribute with the same name.

On encountering name-mangled attributes, Python transforms these names by prepending a single underscore and the name of the enclosing class, for example:

>>> class Test:
...     def __mangled_name(self):
...         pass
...     def normal_name(self):
...         pass
>>> t = Test()
>>> [attr for attr in dir(t) if "name" in attr]
['_Test__mangled_name', 'normal_name']
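The rule can also be written out as a plain function, which makes the edge cases easier to see. This is a sketch of the behaviour described above, not the interpreter's own code; the function name mangle is hypothetical. It additionally assumes that leading underscores of the class name are dropped and that a class name made up only of underscores disables mangling.

```python
def mangle(class_name: str, attr: str) -> str:
    """Return the name Python would store for attr inside class class_name."""
    # Needs two or more leading underscores and at most one trailing underscore
    if not attr.startswith("__") or attr.endswith("__"):
        return attr
    stripped = class_name.lstrip("_")
    if not stripped:
        # Class name consists only of underscores: no mangling
        return attr
    return f"_{stripped}{attr}"


for name in ("__thing", "___thing", "__thing_", "__thing__", "__thing___"):
    print(name, "->", mangle("Test", name))
# __thing -> _Test__thing
# ___thing -> _Test___thing
# __thing_ -> _Test__thing_
# __thing__ -> __thing__
# __thing___ -> __thing___
```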

Pascal

Borland's Turbo Pascal / Delphi range

To avoid name mangling in Pascal, use:

exports
  myFunc name 'myFunc',
  myProc name 'myProc';

Free Pascal

Free Pascal supports function and operator overloading, thus it also uses name mangling to support these features. On the other hand, Free Pascal is capable of calling symbols defined in external modules created with another language and exporting its own symbols to be called by another language. For further information, consult Chapter 6.2 and 7.1 of Free Pascal Programmer's Guide.

Fortran

Name mangling is also necessary in Fortran compilers, originally because the language is case insensitive. Further mangling requirements were imposed later in the evolution of the language because of the addition of modules and other features in the Fortran 90 standard. The case mangling, especially, is a common issue that must be dealt with in order to call Fortran libraries, such as LAPACK, from other languages, such as C.

Because of the case insensitivity, the name of a subroutine or function FOO must be converted to a standardized case and format by the compiler so that it will be linked in the same way regardless of case. Different compilers have implemented this in various ways, and no standardization has occurred. The AIX and HP-UX Fortran compilers convert all identifiers to lower case foo, while the Cray and Unicos Fortran compilers converted identifiers to all upper case FOO. The GNU g77 compiler converts identifiers to lower case plus an underscore foo_, except that identifiers already containing an underscore FOO_BAR have two underscores appended foo_bar__, following a convention established by f2c. Many other compilers, including SGI's IRIX compilers, GNU Fortran, and Intel's Fortran compiler (except on Microsoft Windows), convert all identifiers to lower case plus an underscore (foo_ and foo_bar_, respectively). On Microsoft Windows, the Intel Fortran compiler defaults to uppercase without an underscore.
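The conventions listed above can be summarised in a short sketch. The convention labels ("lower", "upper", "underscore", "f2c") are hypothetical names chosen for this illustration, and each maps onto the compilers named in the paragraph only as far as the text describes them.

```python
def external_name(name: str, convention: str) -> str:
    """Return the linker-level name of a Fortran external procedure."""
    lower = name.lower()
    if convention == "lower":        # AIX and HP-UX compilers
        return lower
    if convention == "upper":        # Cray, Unicos; Intel Fortran on Windows
        return name.upper()
    if convention == "underscore":   # GNU Fortran, SGI IRIX, Intel elsewhere
        return lower + "_"
    if convention == "f2c":          # g77: two underscores if the name has one
        return lower + ("__" if "_" in lower else "_")
    raise ValueError(f"unknown convention: {convention}")


for conv in ("lower", "upper", "underscore", "f2c"):
    print(conv, external_name("FOO", conv), external_name("FOO_BAR", conv))
# lower foo foo_bar
# upper FOO FOO_BAR
# underscore foo_ foo_bar_
# f2c foo_ foo_bar__
```

When calling such a routine from C, the C declaration has to use whichever of these names the Fortran compiler in use actually emits.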

Identifiers in Fortran 90 modules must be further mangled, because the same procedure name may occur in different modules. Since the Fortran 2003 Standard requires that module procedure names not conflict with other external symbols, compilers tend to use the module name and the procedure name, with a distinct marker in between. For example:

module m 
contains
   integer function five()
      five = 5
   end function five
end module m

In this module, the name of the function will be mangled as __m_MOD_five (e.g., GNU Fortran), m_MP_five_ (e.g., Intel's ifort), m.five_ (e.g., Oracle's sun95), etc. Since Fortran does not allow overloading the name of a procedure, but uses generic interface blocks and generic type-bound procedures instead, the mangled names do not need to incorporate clues about the arguments.
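The three module-procedure patterns quoted above differ only in the marker placed between the module name and the procedure name. A minimal sketch follows; the function name and the compiler labels are hypothetical, and lower-casing the identifiers is an assumption based on the examples.

```python
def module_procedure_symbol(module: str, procedure: str, compiler: str) -> str:
    """Return the symbol for a module procedure under a given compiler's scheme."""
    m, p = module.lower(), procedure.lower()
    if compiler == "gfortran":
        return f"__{m}_MOD_{p}"
    if compiler == "ifort":
        return f"{m}_MP_{p}_"
    if compiler == "sun95":
        return f"{m}.{p}_"
    raise ValueError(f"unknown compiler: {compiler}")


for compiler in ("gfortran", "ifort", "sun95"):
    print(module_procedure_symbol("m", "five", compiler))
# __m_MOD_five
# m_MP_five_
# m.five_
```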

The Fortran 2003 BIND option overrides any name mangling done by the compiler, as shown above.

Rust

Function names are mangled by default in Rust. However, this can be disabled by the #[no_mangle] function attribute. This attribute can be used to export functions to C, C++, or Objective-C. Additionally, along with the #[start] function attribute or the #[no_main] crate attribute, it allows the user to define a C-style entry point for the program.

Rust has used many versions of symbol mangling schemes that can be selected at compile time with an -Z symbol-mangling-version option. The following manglers are defined:

  • legacy: A C++-style mangling based on the Itanium IA-64 C++ ABI. Symbols begin with _ZN, and filename hashes are used for disambiguation. Used since Rust 1.9.
  • v0: An improved version of the legacy scheme, with changes for Rust. Symbols begin with _R. Polymorphism can be encoded. Functions don't have return types encoded (Rust does not have overloading). Unicode names use modified punycode. Compression (backreference) uses byte-based addressing. Used since Rust 1.37.

Examples are provided in the Rust symbol-names tests.

Objective-C

Essentially two forms of method exist in Objective-C, the class ("static") method, and the instance method. A method declaration in Objective-C is of the following form:

+ (return-type) name0:parameter0 name1:parameter1 ...
- (return-type) name0:parameter0 name1:parameter1 ...

Class methods are signified by +, instance methods use -. A typical class method declaration may then look like:

+ (id) initWithX: (int) number andY: (int) number;
+ (id) new;

With instance methods looking like this:

- (id) value;
- (id) setValue: (id) new_value;

Each of these method declarations has a specific internal representation. When compiled, each method is named according to the following scheme for class methods:

_c_Class_name0_name1_ ...

and this for instance methods:

_i_Class_name0_name1_ ...

The colons in the Objective-C syntax are translated to underscores. So, the Objective-C class method + (id) initWithX: (int) number andY: (int) number;, if belonging to the Point class would translate as _c_Point_initWithX_andY_, and the instance method (belonging to the same class) - (id) value; would translate to _i_Point_value.
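The naming scheme just described can be sketched as a small function. The helper name objc_symbol is hypothetical; it takes the selector in its usual colon form (for example initWithX:andY:) and applies only the transformation stated above.

```python
def objc_symbol(class_name: str, selector: str, is_class_method: bool) -> str:
    """Return the internal name of an Objective-C method.

    selector uses colon notation, e.g. "initWithX:andY:" or "value".
    """
    kind = "c" if is_class_method else "i"   # c = class method, i = instance method
    # Colons in the selector are translated to underscores
    return f"_{kind}_{class_name}_{selector.replace(':', '_')}"


print(objc_symbol("Point", "initWithX:andY:", True))   # _c_Point_initWithX_andY_
print(objc_symbol("Point", "value", False))            # _i_Point_value
```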

Each of the methods of a class is labeled in this way. However, looking up a method that a class may respond to would be tedious if all methods were represented in this fashion. Instead, each method is assigned a unique symbol (such as an integer). Such a symbol is known as a selector. In Objective-C, one can manage selectors directly; they have a specific type in Objective-C, SEL.

During compilation, a table is built that maps the textual representation, such as _i_Point_value, to selectors (which are given a type SEL). Managing selectors is more efficient than manipulating the textual representation of a method. Note that a selector only matches a method's name, not the class it belongs to: different classes can have different implementations of a method with the same name. Because of this, implementations of a method are given a specific identifier too; these are known as implementation pointers, and are also given a type, IMP.

Message sends are encoded by the compiler as calls to the id objc_msgSend (id receiver, SEL selector, ...) function, or one of its cousins, where receiver is the receiver of the message, and selector determines the method to call. Each class has its own table that maps selectors to their implementations; the implementation pointer specifies where in memory the actual implementation of the method resides. There are separate tables for class and instance methods. Apart from being stored in the SEL to IMP lookup tables, the functions are essentially anonymous.

The SEL value for a selector does not vary between classes. This enables polymorphism.

The Objective-C runtime maintains information about the argument and return types of methods. However, this information is not part of the name of the method, and can vary from class to class.

Since Objective-C does not support namespaces, there is no need for the mangling of class names (that do appear as symbols in generated binaries).

Swift

Swift keeps metadata about functions (and more) in the mangled symbols referring to them. This metadata includes the function's name, attributes, module name, parameter types, return type, and more. For example:

The mangled name for a method func calculate(x: Int) -> Int of a MyClass class in module test is _TFC4test7MyClass9calculatefS0_FT1xSi_Si, for 2014 Swift. The components and their meanings are as follows:

  • _T: The prefix for all Swift symbols. Everything will start with this.
  • F: Non-curried function.
  • C: Function of a class, i.e. a method
  • 4test: Module name, prefixed with its length.
  • 7MyClass: Name of class the function belongs to, prefixed with its length.
  • 9calculate: Function name, prefixed with its length.
  • f: The function attribute. In this case ‘f’, which means a normal function.
  • S0: Designates the type of the first parameter (namely the class instance) as the first in the type stack (here MyClass is not nested and thus has index 0).
  • _FT: This begins the type list for the parameter tuple of the function.
  • 1x: External name of first parameter of the function.
  • Si: Indicates builtin Swift type Swift.Int for the first parameter.
  • _Si: The return type: again Swift.Int.
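Putting the components back together shows how the symbol is assembled. The sketch below handles only the exact shape of this example (a method on a non-nested class that takes one labelled Swift.Int parameter and returns Swift.Int) under the 2014 scheme; the function names are hypothetical.

```python
def length_prefixed(identifier: str) -> str:
    """Encode an identifier as <length><identifier>, e.g. 4test."""
    return f"{len(identifier)}{identifier}"


def mangle_swift_2014_int_method(module: str, cls: str, method: str, label: str) -> str:
    """Mangle func method(label: Int) -> Int of class cls in module (2014 Swift)."""
    parts = [
        "_T",                     # prefix for all Swift symbols
        "F",                      # non-curried function
        "C",                      # function of a class, i.e. a method
        length_prefixed(module),  # module name
        length_prefixed(cls),     # class name
        length_prefixed(method),  # function name
        "f",                      # function attribute: normal function
        "S0",                     # first parameter: the class instance
        "_FT",                    # start of the parameter tuple type list
        length_prefixed(label),   # external name of the first parameter
        "Si",                     # Swift.Int parameter
        "_Si",                    # Swift.Int return type
    ]
    return "".join(parts)


print(mangle_swift_2014_int_method("test", "MyClass", "calculate", "x"))
# _TFC4test7MyClass9calculatefS0_FT1xSi_Si
```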

Mangling for versions since Swift 4.0 is documented officially. It retains some similarity to Itanium.

Cellular automaton

From Wikipedia, the free encyclopedia https://en.wikipedi...