A decompiler is a computer program that translates an executable file into high-level source code. It therefore does the opposite of a typical compiler, which translates a high-level language into a low-level language. While disassemblers translate an executable into assembly language, decompilers go a step further and translate the code into a higher-level language such as C or Java, which requires more sophisticated techniques. Decompilers are usually unable to perfectly reconstruct the original source code, and frequently produce output that is difficult to read. Nonetheless, they remain an important tool in the reverse engineering of computer software.
Introduction
The term decompiler is most commonly applied to a program which translates executable programs (the output from a compiler) into source code in a (relatively) high-level language which, when compiled, will produce an executable whose behavior is the same as that of the original executable program. By comparison, a disassembler translates an executable program into assembly language (and an assembler could be used for assembling it back into an executable program).
Decompilation is the act of using a decompiler, although the term
can also refer to the output of a decompiler. It can be used for the
recovery of lost source code, and is also useful in some cases for computer security, interoperability and error correction.
The success of decompilation depends on the amount of information
present in the code being decompiled and the sophistication of the
analysis performed on it. The bytecode formats used by many virtual
machines (such as the Java Virtual Machine or the .NET Framework Common Language Runtime) often include extensive metadata and high-level features that make decompilation quite feasible. The presence of debug data (debug symbols) may make it possible to reproduce the original names of variables and structures and even the line numbers. Machine language without such metadata or debug data is much harder to decompile.
Some compilers and post-compilation tools produce obfuscated code
(that is, they attempt to produce output that is very difficult to
decompile, or that decompiles to confusing output). This is done to make
it more difficult to reverse engineer the executable.
While decompilers are normally used to (re-)create source code
from binary executables, there are also decompilers to turn specific
binary data files into human-readable and editable sources.
The success level achieved by decompilers can be impacted by various factors. These include the abstraction level of the source language: if the object code contains explicit class structure information, it aids the decompilation process. Descriptive information, especially naming details, also accelerates the decompiler's work. Moreover, less optimized code is quicker to decompile, since optimization can cause greater deviation from the original code.
Design
Decompilers
can be thought of as composed of a series of phases, each of which contributes specific aspects of the overall decompilation process.
Loader
The first decompilation phase loads and parses the input machine code or intermediate language program's binary file
format. It should be able to discover basic facts about the input
program, such as the architecture (Pentium, PowerPC, etc.) and the entry
point. In many cases, it should be able to find the equivalent of the main function of a C program, which is the start of the user-written code. This excludes the runtime initialization code, which should not be decompiled if possible. If available, the symbol tables and debug data are also loaded. The front end may be able to identify the libraries used even if they are linked with the code; this will provide library interfaces. If it can determine the compiler or compilers used, it may provide useful information in identifying code idioms.
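As a rough sketch of what a loader front end does, the following Python fragment (assuming an ELF input file; the field offsets follow the published ELF header layout, and little-endian byte order is assumed) reports the architecture and entry point:
import struct

# Minimal loader probe for ELF binaries: report architecture and entry point.
# Header offsets follow the ELF specification; little-endian layout is assumed.
MACHINES = {3: "x86", 20: "PowerPC", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def probe_elf(path):
    with open(path, "rb") as f:
        header = f.read(64)                      # an ELF64 header is 64 bytes
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    machine = struct.unpack_from("<H", header, 18)[0]                      # e_machine
    entry = struct.unpack_from("<Q" if is_64bit else "<I", header, 24)[0]  # e_entry
    return MACHINES.get(machine, hex(machine)), entry

# Example: probe_elf("/bin/ls") might return ("x86-64", 0x6b10)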
Disassembly
The next logical phase is the disassembly of machine code instructions into a machine independent intermediate representation (IR). For example, the Pentium machine instruction
mov eax, [ebx+0x04]
might be translated to the IR
eax := m[ebx+4];
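A minimal sketch of this translation step, covering only the single addressing mode shown above and using a hypothetical string-based IR, might look like this in Python:
import re

# Translate a tiny subset of x86 syntax, e.g. "mov eax,[ebx+0x04]",
# into a register-transfer IR string such as "eax := m[ebx+4];".
LOAD = re.compile(r"mov\s+(\w+)\s*,\s*\[(\w+)\+(0x[0-9a-f]+|\d+)\]", re.I)

def to_ir(instruction):
    m = LOAD.match(instruction.strip())
    if not m:
        raise NotImplementedError(f"unhandled instruction: {instruction}")
    dst, base, disp = m.groups()
    return f"{dst} := m[{base}+{int(disp, 0)}];"

print(to_ir("mov eax,[ebx+0x04]"))   # eax := m[ebx+4];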
Idioms
Idiomatic
machine code sequences are sequences of code whose combined semantics
are not immediately apparent from the instructions' individual
semantics. Either as part of the disassembly phase, or as part of later
analyses, these idiomatic sequences need to be translated into known
equivalent IR. For example, the x86 assembly code:
cdq eax             ; edx is set to the sign-extension of eax
xor eax, edx
sub eax, edx
could be translated to
eax := abs(eax);
Some idiomatic sequences are machine independent; some involve only one instruction. For example, xor eax, eax clears the eax register (sets it to zero). This can be implemented with a machine-independent simplification rule, such as a xor a → a := 0.
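Such a rule could be expressed roughly as follows; the tuple-based IR used here is purely illustrative:
# Toy machine-independent simplification: xor r, r  =>  r := 0
# Instructions are modelled as (operation, destination, operands) tuples.
def simplify(instr):
    op, dst, srcs = instr
    if op == "xor" and len(srcs) == 2 and srcs[0] == srcs[1]:
        return ("assign", dst, [0])            # a xor a is always zero
    if op == "sub" and len(srcs) == 2 and srcs[0] == srcs[1]:
        return ("assign", dst, [0])            # a - a is also zero
    return instr

print(simplify(("xor", "eax", ["eax", "eax"])))   # ('assign', 'eax', [0])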
In general, it is best to delay detection of idiomatic sequences
if possible, to later stages that are less affected by instruction
ordering. For example, the instruction scheduling phase of a compiler
may insert other instructions into an idiomatic sequence, or change the
ordering of instructions in the sequence. A pattern matching process in
the disassembly phase would probably not recognize the altered pattern.
Later phases group instruction expressions into more complex
expressions, and modify them into a canonical (standardized) form,
making it more likely that even the altered idiom will match a higher
level pattern later in the decompilation.
Program analysis
Various program analyses can be applied to the IR. In particular, expression propagation combines the semantics of several instructions into more complex expressions. For example,
mov eax, [ebx+0x04]
add eax, [ebx+0x08]
sub [ebx+0x0C], eax
could result in the following IR after expression propagation:
m[ebx+12] := m[ebx+12] - (m[ebx+4] + m[ebx+8]);
The resulting expression is more like high level language, and has also eliminated the use of the machine register eax. Later analyses may eliminate the ebx register.
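A toy version of expression propagation over that example might look like the following Python sketch; a real decompiler would operate on expression trees and check that each substitution is safe:
# Expression propagation: substitute the defining expression of a register
# into later uses, then drop the now-dead temporary register definition.
ir = [
    ("eax",        "m[ebx+4]"),               # eax := m[ebx+4]
    ("eax",        "eax + m[ebx+8]"),         # eax := eax + m[ebx+8]
    ("m[ebx+12]",  "m[ebx+12] - eax"),        # m[ebx+12] := m[ebx+12] - eax
]

defs = {}
result = []
for dst, expr in ir:
    for reg, value in defs.items():
        expr = expr.replace(reg, f"({value})")   # substitute known definitions
    if dst.startswith("m["):                     # memory writes are kept
        result.append(f"{dst} := {expr};")
    else:                                        # register definitions are propagated
        defs[dst] = expr

print(result)   # ['m[ebx+12] := m[ebx+12] - ((m[ebx+4]) + m[ebx+8]);']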
Data flow analysis
The places where register contents are defined and used must be traced using data flow analysis.
The same analysis can be applied to locations that are used for
temporaries and local data. A different name can then be formed for each
such connected set of value definitions and uses. It is possible that
the same local variable location was used for more than one variable in
different parts of the original program. Even worse, it is possible for the data flow analysis to identify a path whereby a value may flow between two such uses even though it would never actually happen or matter in reality. This may in bad cases lead to the need to define a location as a union of types. The decompiler may allow the user to explicitly break such unnatural dependencies, which will lead to clearer code. This of course means a variable is potentially used without being initialized, and so indicates a problem in the original program.
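The core idea can be sketched in a few lines for straight-line code; a real analysis would work over a control-flow graph and handle partial definitions:
# Split one physical register into separate variables by grouping
# each definition with the uses it reaches (straight-line code only).
code = [
    ("def", "eax"),   # eax := ...      -> becomes variable eax_1
    ("use", "eax"),
    ("def", "eax"),   # unrelated reuse -> becomes variable eax_2
    ("use", "eax"),
]

counters, current, renamed = {}, {}, []
for kind, reg in code:
    if kind == "def":
        counters[reg] = counters.get(reg, 0) + 1
        current[reg] = f"{reg}_{counters[reg]}"
    renamed.append((kind, current[reg]))

print(renamed)
# [('def', 'eax_1'), ('use', 'eax_1'), ('def', 'eax_2'), ('use', 'eax_2')]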
Type analysis
A
good machine code decompiler will perform type analysis. Here, the way registers or memory locations are used results in constraints on the possible type of the location. For example, an and instruction implies that its operand is an integer; programs do not use such an operation on floating point values (except in special library code) or on pointers. An add
instruction results in three constraints, since the operands may be
both integer, or one integer and one pointer (with integer and pointer
results respectively; the third constraint comes from the ordering of
the two operands when the types are different).
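A sketch of constraint collection, using a deliberately tiny type vocabulary of just int and ptr, could look like this:
# Collect the type constraints implied by individual IR operations.
def constraints(op, a, b, dst):
    if op == "and":
        # Bitwise operations are not applied to pointers or floats
        return [f"{a}:int", f"{b}:int", f"{dst}:int"]
    if op == "add":
        # Disjunction of the three admissible typings described in the text
        return [f"({a}:int and {b}:int and {dst}:int) or "
                f"({a}:ptr and {b}:int and {dst}:ptr) or "
                f"({a}:int and {b}:ptr and {dst}:ptr)"]
    return []

print(constraints("add", "eax", "ebx", "eax"))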
Various high level expressions can be recognized which trigger
recognition of structures or arrays. However, it is difficult to
distinguish many of the possibilities, because of the freedom that
machine code or even some high level languages such as C allow with
casts and pointer arithmetic.
The example from the previous section could result in the following high-level code:
struct T1 *ebx;
struct T1 {
    int v0004;
    int v0008;
    int v000C;
};
ebx->v000C -= ebx->v0004 + ebx->v0008;
Structuring
The penultimate decompilation phase involves structuring of the IR into higher-level constructs such as while loops and if/then/else conditional statements. For example, a loop built from a conditional backward branch in the machine code can be recognized and rewritten as a while or do/while statement.
Unstructured code is more difficult to translate into structured code
than already structured code. Solutions include replicating some code,
or adding boolean variables.
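As an illustrative sketch only (not the structuring algorithm of any particular decompiler), a conditional backward jump in a linear IR can be rewritten as a do/while statement:
# Recognize the simplest loop shape:
#   label L:  ...body...  if <cond> goto L
# and rewrite it as a post-tested loop:  do { ...body... } while (<cond>);
def structure(block):
    label, *body, jump = block
    assert label.startswith("label") and jump.startswith("if")
    cond = jump[len("if"):].split("goto")[0].strip()
    return ["do {"] + ["    " + stmt for stmt in body] + ["} while (" + cond + ");"]

block = ["label L1:", "eax := eax + m[ebx];", "ebx := m[ebx+4];", "if ebx != 0 goto L1"]
print("\n".join(structure(block)))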
Code generation
The
final phase is the generation of the high level code in the back end of
the decompiler. Just as a compiler may have several back ends for
generating machine code for different architectures, a decompiler may
have several back ends for generating high level code in different high
level languages.
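A back end can be as simple as a walk over the final IR expression trees; the sketch below uses a made-up tuple representation and emits C-like text, and a back end for a different target language would only change the formatting rules:
# Tiny code-generation back end: walk an IR expression tree and emit C-like text.
# Expressions are ("binop", op, left, right), ("mem", addr) or plain variable names.
def emit(expr):
    if isinstance(expr, str):
        return expr
    kind = expr[0]
    if kind == "mem":
        return f"*({emit(expr[1])})"
    if kind == "binop":
        _, op, left, right = expr
        return f"({emit(left)} {op} {emit(right)})"
    raise ValueError(kind)

ir = ("binop", "-", ("mem", ("binop", "+", "ebx", "12")),
                    ("binop", "+", ("mem", ("binop", "+", "ebx", "4")),
                                   ("mem", ("binop", "+", "ebx", "8"))))
print(emit(ir))   # (*((ebx + 12)) - (*((ebx + 4)) + *((ebx + 8))))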
Just before code generation, it may be desirable to allow an interactive editing of the IR, perhaps using some form of graphical user interface.
This would allow the user to enter comments, and non-generic variable
and function names. However, these are almost as easily entered in a
post decompilation edit. The user may want to change structural aspects,
such as converting a while loop to a for loop. These are less readily modified with a simple text editor, although source code refactoring
tools may assist with this process. The user may need to enter
information that failed to be identified during the type analysis phase,
e.g. modifying a memory expression to an array or structure expression.
Finally, incorrect IR may need to be corrected, or changes made to
cause the output code to be more readable.
Other techniques
Decompilers using neural networks have been developed. Such a decompiler may be trained by machine learning to improve its accuracy over time.
Legality
The majority of computer programs are covered by copyright
laws. Although the precise scope of what is covered by copyright
differs from region to region, copyright law generally provides the
author (the programmer(s) or employer) with a collection of exclusive
rights to the program. These rights include the right to make copies, including copies made into the computer’s RAM (unless creating such a copy is essential for using the program).
Since the decompilation process involves making multiple such copies, it
is generally prohibited without the authorization of the copyright
holder. However, because decompilation is often a necessary step in
achieving software interoperability, copyright laws in both the United States and Europe permit decompilation to a limited extent.
In the United States, the copyright fair use defence has been successfully invoked in decompilation cases. For example, in Sega v. Accolade,
the court held that Accolade could lawfully engage in decompilation in
order to circumvent the software locking mechanism used by Sega's game
consoles. Additionally, the Digital Millennium Copyright Act (PUBLIC LAW 105–304) has proper exemptions for both Security Testing and Evaluation in §1201(i), and Reverse Engineering in §1201(f).
In Europe, the 1991 Software Directive
explicitly provides for a right to decompile in order to achieve
interoperability. The result of a heated debate between, on the one
side, software protectionists, and, on the other, academics as well as
independent software developers, Article 6 permits decompilation only if
a number of conditions are met:
First, a person or entity must have a licence to use the program to be decompiled.
Second, decompilation must be necessary to achieve interoperability
with the target program or other programs. Interoperability information
should therefore not be readily available, such as through manuals or API
documentation. This is an important limitation. The necessity must be
proven by the decompiler. The purpose of this important limitation is
primarily to provide an incentive for developers to document and
disclose their products' interoperability information.
Third, the decompilation process must, if possible, be confined to
the parts of the target program relevant to interoperability. Since one
of the purposes of decompilation is to gain an understanding of the
program structure, this third limitation may be difficult to meet.
Again, the burden of proof is on the decompiler.
In addition, Article 6 prescribes that the information obtained
through decompilation may not be used for other purposes and that it may
not be given to others.
Overall, the decompilation right provided by Article 6 codifies
what is claimed to be common practice in the software industry. Few
European lawsuits are known to have emerged from the decompilation
right. This could be interpreted as meaning one of three things:
1) the decompilation right is not used frequently and the decompilation right may therefore have been unnecessary,
2) the decompilation right functions well and provides sufficient legal certainty not to give rise to legal disputes, or
3) illegal decompilation goes largely undetected.
In a report of 2000 regarding implementation of the Software Directive by the European member states, the European Commission seemed to support the second interpretation.
Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software
accomplishes a task with very little (if any) insight into exactly how
it does so. Depending on the system under consideration and the
technologies employed, the knowledge gained during reverse engineering
can help with repurposing obsolete objects, doing security analysis, or
learning how something works.
Although the process is specific to the object on which it is
being performed, all reverse engineering processes consist of three
basic steps: information extraction, modeling, and review. Information
extraction is the practice of gathering all relevant information for
performing the operation. Modeling is the practice of combining the
gathered information into an abstract model, which can be used as a
guide for designing the new object or system. Review is the testing of
the model to ensure the validity of the chosen abstraction. Reverse engineering is applicable in the fields of computer engineering, mechanical engineering, design, electronic engineering, software engineering, chemical engineering, and systems biology.
Overview
There
are many reasons for performing reverse engineering in various fields.
Reverse engineering has its origins in the analysis of hardware for
commercial or military advantage.
However, the reverse engineering process may not always be concerned
with creating a copy or changing the artifact in some way. It may be
used as part of an analysis to deduce
design features from products with little or no additional knowledge
about the procedures involved in their original production.
In some cases, the goal of the reverse engineering process can simply be a redocumentation of legacy systems. Even when the reverse-engineered product is that of a competitor, the goal may not be to copy it but to perform competitor analysis. Reverse engineering may also be used to create interoperable products; despite some narrowly tailored United States and European Union legislation, the legality of using specific reverse engineering techniques for that purpose has been hotly contested in courts worldwide for more than two decades.
Software reverse engineering can help to improve the understanding of the underlying source code for the maintenance and improvement of the software; relevant information can be extracted to inform decisions for software development, and graphical representations of the code can provide alternate views of the source code, which can help to detect and fix a software bug or vulnerability. Frequently, as software develops, its design information and improvements are lost over time, but that lost information can usually be recovered with reverse engineering. The
process can also help to cut down the time required to understand the
source code, thus reducing the overall cost of the software development.
Reverse engineering can also help to detect and eliminate malicious code written into the software, with better code detectors. Reverse engineering of source code can be used to find alternate uses of the source code, such as detecting unauthorized replication of the source code where it was not intended to be used, or revealing how a competitor's
product was built. That process is commonly used for "cracking" software and media to remove their copy protection, or to create a possibly-improved copy or even a knockoff, which is usually the goal of a competitor or a hacker.
Interfacing. Reverse engineering can be used when a system is required to interface with another system and it must be established how the two systems would negotiate. Such requirements typically exist for interoperability.
Military or commercial espionage.
Learning about an enemy's or competitor's latest research by stealing
or capturing a prototype and dismantling it may result in the
development of a similar product or a better countermeasure against it.
Obsolescence. Integrated circuits
are often designed on proprietary systems and built on production
lines, which become obsolete in only a few years. When systems using
those parts can no longer be maintained since the parts are no longer
made, the only way to incorporate the functionality into new technology
is to reverse-engineer the existing chip and then to redesign it using newer tools, with the understanding gained serving as a guide. Another problem originating from obsolescence that can be solved by reverse engineering is the need to support (maintenance and supply for continuous operation) existing legacy devices that are no longer supported by their original equipment manufacturer. The problem is particularly critical in military operations.
Product security analysis. That examines how a product works by determining the specifications of its components, estimating costs, and identifying potential patent infringement.
Also part of product security analysis is acquiring sensitive data by
disassembling and analyzing the design of a system component. Another intent may be to remove copy protection or to circumvent access restrictions.
Competitive technical intelligence. That is to understand what one's competitor is actually doing, rather than what it says that it is doing.
Saving money. Finding out what a piece of electronics can do may spare a user from purchasing a separate product.
Repurposing. Obsolete objects are then reused in a different-but-useful manner.
Design. Production and design companies have applied reverse engineering to practical, craft-based manufacturing processes. The companies can work on "historical" manufacturing collections through 3D scanning, 3D re-modeling and re-design. In 2013, the Italian manufacturers Baldi and Savio Firmino, together with the University of Florence, optimized their innovation, design, and production processes.
Common uses
Machines
As computer-aided design
(CAD) has become more popular, reverse engineering has become a viable
method to create a 3D virtual model of an existing physical part for use
in 3D CAD, CAM, CAE, or other software.
The reverse-engineering process involves measuring an object and then
reconstructing it as a 3D model. The physical object can be measured
using 3D scanning technologies like CMMs, laser scanners, structured light digitizers, or industrial CT scanning (computed tomography). The measured data alone, usually represented as a point cloud, lacks topological information and design intent. The former may be recovered by converting the point cloud to a triangular-faced mesh.
Reverse engineering aims to go beyond producing such a mesh and to
recover the design intent in terms of simple analytical surfaces where
appropriate (planes, cylinders, etc.) as well as possibly NURBS surfaces to produce a boundary-representation
CAD model. Recovery of such a model allows a design to be modified to
meet new requirements, a manufacturing plan to be generated, etc.
Hybrid modeling is a commonly used term when NURBS and parametric modeling are implemented together. Using a combination of geometric and freeform surfaces can provide a powerful method of 3D modeling.
Areas of freeform data can be combined with exact geometric surfaces to
create a hybrid model. A typical example of this would be the reverse
engineering of a cylinder head, which includes freeform cast features,
such as water jackets and high-tolerance machined areas.
Reverse engineering is also used by businesses to bring existing
physical geometry into digital product development environments, to make
a digital 3D record of their own products, or to assess competitors'
products. It is used to analyze how a product works, what it does, what
components it has; estimate costs; identify potential patent infringement; etc.
Value engineering,
a related activity that is also used by businesses, involves
deconstructing and analyzing products. However, the objective is to find
opportunities for cost-cutting.
Reverse engineering of printed circuit boards
involves recreating fabrication data for a particular circuit board.
This is done primarily to identify a design, and learn the functional
and structural characteristics of a design. It also allows for the
discovery of the design principles behind a product, especially if this
design information is not easily available.
Outdated PCBs are often subject to reverse engineering, especially when they perform highly critical functions such as powering machinery or other electronic components. Reverse engineering these old parts can allow the reconstruction of the PCB if it performs some crucial task, as well as finding alternatives that provide the same function or upgrading the old PCB.
Reverse engineering of PCBs largely follows the same series of steps. First, images are created by drawing, scanning, or taking photographs of the PCB. Then, these images are imported into suitable reverse engineering software in order to create a rudimentary design for the new PCB. The quality of the images necessary for suitable reverse engineering is proportional to the complexity of the PCB itself. More complicated PCBs require well-lit photos on dark backgrounds, while fairly simple PCBs can be recreated with just basic dimensioning. Each layer of the PCB is carefully recreated in the software with the intent of producing a final design as close as possible to the initial one. Then, the schematics for the circuit are generated using an appropriate tool.
Software
In 1990, the Institute of Electrical and Electronics Engineers
(IEEE) defined (software) reverse engineering (SRE) as "the process of
analyzing a
subject system to identify the system's components and their
interrelationships and to create representations of the system in
another form or at a higher
level of abstraction" in which the "subject system" is the end product
of software development. Reverse engineering is a process of examination
only, and the software system under consideration is not modified,
which would otherwise be re-engineering
or restructuring. Reverse engineering can be performed from any stage
of the product cycle, not necessarily from the functional end product.
There are two components in reverse engineering: redocumentation and design recovery. Redocumentation is the creation of a new representation of the computer code so that it is easier to understand.
Meanwhile, design recovery is the use of deduction or reasoning from
general knowledge or personal experience of the product to understand
the product's functionality fully. It can also be seen as "going backwards through the development cycle".
In this model, the output of the implementation phase (in source code
form) is reverse-engineered back to the analysis phase, in an inversion
of the traditional waterfall model. Another term for this technique is program comprehension.
The Working Conference on Reverse Engineering (WCRE) has been held
yearly to explore and expand the techniques of reverse engineering. Computer-aided software engineering (CASE) and automated code generation have contributed greatly in the field of reverse engineering.
Software anti-tamper technology like obfuscation
is used to deter both reverse engineering and re-engineering of
proprietary software and software-powered systems. In practice, two main
types of reverse engineering emerge. In the first case, source code is
already available for the software, but higher-level aspects of the
program, which are perhaps poorly documented or documented but no longer
valid, are discovered. In the second case, there is no source code
available for the software, and any efforts towards discovering one
possible source code for the software are regarded as reverse
engineering. The second usage of the term is more familiar to most
people. Reverse engineering of software can make use of the clean room design technique to avoid copyright infringement.
On a related note, black box testing in software engineering has a lot in common with reverse engineering. The tester usually has the API but has the goal of finding bugs and undocumented features by bashing the product from outside.
Other purposes of reverse engineering include security auditing, removal of copy protection ("cracking"), circumvention of access restrictions often present in consumer electronics, customization of embedded systems
(such as engine management systems), in-house repairs or retrofits,
enabling of additional features on low-cost "crippled" hardware (such as
some graphics card chip-sets), or even mere satisfaction of curiosity.
Binary software
Binary reverse engineering is performed if source code for the software is unavailable. This process is sometimes termed reverse code engineering, or RCE. For example, decompilation of binaries for the Java platform can be accomplished by using Jad. One famous case of reverse engineering was the first non-IBM implementation of the PC BIOS, which launched the historic IBM PC compatible industry that has been the overwhelmingly dominant computer hardware platform for many years. Reverse engineering of software is protected in the US by the fair use exception in copyright law. The Samba software, which allows systems that do not run Microsoft Windows systems to share files with systems that run it, is a classic example of software reverse engineering
since the Samba project had to reverse-engineer unpublished information
about how Windows file sharing worked so that non-Windows computers
could emulate it. The Wine project does the same thing for the Windows API, and OpenOffice.org is one party doing that for the Microsoft Office file formats. The ReactOS
project is even more ambitious in its goals by striving to provide
binary (ABI and API) compatibility with the current Windows operating
systems of the NT branch, which allows software and drivers written for
Windows to run on a clean-room reverse-engineered free software (GPL) counterpart. WindowsSCOPE
allows for reverse-engineering the full contents of a Windows system's
live memory including a binary-level, graphical reverse engineering of
all running processes.
Another classic, if not well-known, example is that in 1987 Bell Laboratories reverse-engineered the Mac OS System 4.1, originally running on the Apple Macintosh SE, so that they could run it on RISC machines of their own.
Binary software techniques
Reverse engineering of software can be accomplished by various methods.
The three main groups of software reverse engineering are
Analysis through observation of information exchange, most prevalent in protocol reverse engineering, which involves using bus analyzers and packet sniffers, such as for accessing a computer bus or computer network
connection and revealing the traffic data thereon. Bus or network
behavior can then be analyzed to produce a standalone implementation
that mimics that behavior. That is especially useful for reverse
engineering device drivers. Sometimes, reverse engineering on embedded systems is greatly assisted by tools deliberately introduced by the manufacturer, such as JTAG ports or other debugging means. In Microsoft Windows, low-level debuggers such as SoftICE are popular.
Disassembly using a disassembler, meaning the raw machine language of the program is read and understood in its own terms, only with the aid of machine-language mnemonics. It works on any computer program but can take quite some time, especially for those who are not used to machine code. The Interactive Disassembler is a particularly popular tool; a short disassembly sketch follows this list.
Decompilation using a decompiler,
a process that tries, with varying results, to recreate the source code
in some high-level language for a program only available in machine
code or bytecode.
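For the disassembly route above, a minimal sketch using the Python bindings of the Capstone disassembly library (assuming Capstone is installed; the byte string is a small hand-assembled fragment) could look like this:
from capstone import Cs, CS_ARCH_X86, CS_MODE_32

# Two hand-assembled 32-bit x86 instructions:
#   8B 43 04   mov eax, dword ptr [ebx + 4]
#   C3         ret
code = b"\x8b\x43\x04\xc3"

md = Cs(CS_ARCH_X86, CS_MODE_32)
for insn in md.disasm(code, 0x1000):          # 0x1000 is an assumed load address
    print(f"0x{insn.address:x}:\t{insn.mnemonic}\t{insn.op_str}")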
Software classification
Software
classification is the process of identifying similarities between
different software binaries (such as two different versions of the same
binary) used to detect code relations between software samples. The task
was traditionally done manually for several reasons (such as patch
analysis for vulnerability detection and copyright infringement), but it can now be done somewhat automatically for large numbers of samples.
This method is being used mostly for long and thorough reverse
engineering tasks (complete analysis of a complex algorithm or big piece
of software). In general, statistical classification
is considered to be a hard problem, which is also true for software classification, and so there are few solutions/tools that handle this task well.
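One simple automated approach, shown here only as a stand-in for real classification tooling, compares the sets of byte n-grams two binaries contain and scores their similarity with the Jaccard index:
# Crude binary-similarity score: Jaccard index over byte 4-grams.
def ngrams(data, n=4):
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def similarity(path_a, path_b):
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = ngrams(fa.read()), ngrams(fb.read())
    return len(a & b) / len(a | b) if a | b else 0.0

# Example (hypothetical file names): similarity("app-v1.0.bin", "app-v1.1.bin")
# close to 1.0 suggests the two samples share most of their code.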
Source code
A number of UML tools refer to the process of importing and analysing source code to generate UML diagrams as "reverse engineering". See List of UML tools.
Although UML is one approach to providing "reverse engineering", more recent advances in international standards activities have resulted
in the development of the Knowledge Discovery Metamodel
(KDM). The standard delivers an ontology for the intermediate (or
abstracted) representation of programming language constructs and their
interrelationships. An Object Management Group standard (on its way to becoming an ISO standard as well),
KDM has started to take hold in industry with the development of tools
and analysis environments that can deliver the extraction and analysis
of source, binary, and byte code. For source code analysis, KDM's
granular standards' architecture enables the extraction of software
system flows (data, control, and call maps), architectures, and business
layer knowledge (rules, terms, and process). The standard enables the
use of a common data format (XMI) enabling the correlation of the
various layers of system knowledge for either detailed analysis (such as
root cause, impact) or derived analysis (such as business process
extraction). Although efforts to represent language constructs can be
never-ending because of the number of languages, the continuous
evolution of software languages, and the development of new languages,
the standard does allow for the use of extensions to support the broad
language set as well as evolution. KDM is compatible with UML, BPMN, RDF, and other standards, enabling migration into other environments and thus leveraging system knowledge for efforts such as software system transformation and enterprise business layer analysis.
Protocols
Protocols are sets of rules that describe message formats and how messages are exchanged: the protocol state machine.
Accordingly, the problem of protocol reverse-engineering can be
partitioned into two subproblems: message format and state-machine
reverse-engineering.
The message formats have traditionally been reverse-engineered by
a tedious manual process, which involved analysis of how protocol
implementations process messages, but recent research proposed a number
of automatic solutions. Typically, the automatic approaches either group observed messages into clusters by using various clustering analyses, or they emulate the protocol implementation while tracing the message processing.
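A toy version of the clustering idea, grouping captured messages by a crude format signature (leading byte plus a coarse length bucket), might look like the sketch below; real systems use much richer features and proper clustering algorithms:
from collections import defaultdict

# Group observed protocol messages by a naive format signature.
def signature(msg: bytes):
    return (msg[:1], len(msg) // 16)          # leading byte + coarse length bucket

def cluster(messages):
    clusters = defaultdict(list)
    for msg in messages:
        clusters[signature(msg)].append(msg)
    return clusters

captured = [b"\x01LOGIN alice", b"\x01LOGIN bob", b"\x02DATA....."]
for sig, msgs in cluster(captured).items():
    print(sig, len(msgs), "message(s)")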
There has been less work on reverse-engineering of state-machines
of protocols. In general, the protocol state-machines can be learned either through a process of offline learning, which passively observes communication and attempts to build the most general state-machine accepting all observed sequences of messages, or through online learning,
which allows interactive generation of probing sequences of messages
and listening to responses to those probing sequences. In general,
offline learning of small state-machines is known to be NP-complete, but online learning can be done in polynomial time. An automatic offline approach has been demonstrated by Comparetti et al. and an online approach by Cho et al.
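The offline approach can be illustrated by building a prefix-tree acceptor from observed sequences of message types, which a practical learner would then generalize by merging compatible states; the message names here are invented:
# Build a prefix-tree acceptor from observed sequences of message types.
# A practical offline learner would subsequently merge compatible states.
def build_pta(sessions):
    states = {(): 0}                 # map prefix -> state id, root is state 0
    transitions = {}                 # (state, message type) -> next state
    for session in sessions:
        prefix = ()
        for msg in session:
            nxt = prefix + (msg,)
            if nxt not in states:
                states[nxt] = len(states)
            transitions[(states[prefix], msg)] = states[nxt]
            prefix = nxt
    return transitions

observed = [["LOGIN", "DATA", "LOGOUT"], ["LOGIN", "LOGOUT"]]
print(build_pta(observed))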
Other components of typical protocols, like encryption and hash
functions, can be reverse-engineered automatically as well. Typically,
the automatic approaches trace the execution of protocol implementations
and try to detect buffers in memory holding unencrypted packets.
Integrated circuits/smart cards
Reverse engineering is an invasive and destructive form of analyzing a smart card. The attacker uses chemicals to etch away layer after layer of the smart card and takes pictures with a scanning electron microscope
(SEM). That technique can reveal the complete hardware and software
part of the smart card. The major problem for the attacker is to bring
everything into the right order to find out how everything works. The
makers of the card try to hide keys and operations by mixing up memory
positions, such as by bus scrambling.
In some cases, it is even possible to attach a probe to measure
voltages while the smart card is still operational. The makers of the
card employ sensors to detect and prevent that attack.
That attack is not very common because it requires both a large
investment in effort and special equipment that is generally available
only to large chip manufacturers. Furthermore, the payoff from this
attack is low since other security techniques are often used such as
shadow accounts. It is still uncertain whether attacks against
chip-and-PIN cards to replicate encryption data and then to crack PINs
would provide a cost-effective attack on multifactor authentication.
Full reverse engineering proceeds in several major steps.
The first step after images have been taken with a SEM is
stitching the images together, which is necessary because each layer
cannot be captured by a single shot. A SEM needs to sweep across the
area of the circuit and take several hundred images to cover the entire
layer. Image stitching takes as input several hundred pictures and
outputs a single properly-overlapped picture of the complete layer.
Next, the stitched layers need to be aligned because the sample,
after etching, cannot be put into the exact same position relative to
the SEM each time. Therefore, the stitched versions will not overlap in
the correct fashion, as on the real circuit. Usually, three
corresponding points are selected, and a transformation applied on the
basis of that.
To extract the circuit structure, the aligned, stitched images
need to be segmented, which highlights the important circuitry and
separates it from the uninteresting background and insulating materials.
Finally, the wires can be traced from one layer to the next, and
the netlist of the circuit, which contains all of the circuit's
information, can be reconstructed.
Military applications
Reverse engineering is often used by people to copy other nations' technologies, devices, or information that have been obtained by regular troops in the field or by intelligence operations. It was often used during the Second World War and the Cold War. Here are well-known examples from the Second World War and later:
Jerry can: British and American forces in WW2 noticed that the Germans had gasoline cans with an excellent design. They reverse-engineered copies of those cans, which became popularly known as "Jerry cans".
Panzerschreck: The Germans captured an American bazooka during the Second World War and reverse engineered it to create the larger Panzerschreck.
Tupolev Tu-4: In 1944, three American B-29 bombers on missions over Japan were forced to land in the Soviet Union.
The Soviets, who did not have a similar strategic bomber, decided to
copy the B-29. Within three years, they had developed the Tu-4, a
nearly-perfect copy.
SCR-584 radar: copied by the Soviet Union after the Second World War, it is known in a few modified versions: СЦР-584, Бинокль-Д.
V-2
rocket: Technical documents for the V-2 and related technologies were
captured by the Western Allies at the end of the war. The Americans
focused their reverse engineering efforts via Operation Paperclip, which led to the development of the PGM-11 Redstone rocket.
The Soviets used captured German engineers to reproduce technical
documents and plans and worked from captured hardware to make their
clone of the rocket, the R-1. Thus began the postwar Soviet rocket program, which led to the R-7 and the beginning of the space race.
K-13/R-3S missile (NATO reporting name AA-2 Atoll), a Soviet reverse-engineered copy of the AIM-9 Sidewinder, was made possible after a Taiwanese (ROCAF) AIM-9B hit a Chinese PLA MiG-17 without exploding in September 1958.
The missile became lodged within the airframe, and the pilot returned
to base with what Soviet scientists would describe as a university
course in missile development.
Toophan missile: In May 1975, negotiations between Iran and Hughes Missile Systems on co-production of the BGM-71 TOW and Maverick missiles stalled over disagreements in the pricing structure, the subsequent 1979 revolution
ending all plans for such co-production. Iran was later successful in
reverse-engineering the missile and now produces its own copy, the
Toophan.
China has reverse-engineered many examples of Western and Russian hardware, from fighter aircraft to missiles and HMMWV cars, such as the MiG-15, 17, 19, and 21 (which became the J-2, 5, 6, and 7) and the Su-33 (which became the J-15).
During the Second World War, Polish and British cryptographers studied captured German "Enigma" message encryption machines for weaknesses. Their operation was then simulated on electromechanical devices, "bombes",
which tried all the possible scrambler settings of the "Enigma" machines and thereby helped break the coded messages that had been sent by the Germans.
Also during the Second World War, British scientists analyzed and defeated a series of increasingly-sophisticated radio navigation systems used by the Luftwaffe
to perform guided bombing missions at night. The British
countermeasures to the system were so effective that in some cases,
German aircraft were led by signals to land at RAF bases since they believed that they had returned to German territory.
Gene networks
Reverse engineering concepts have been applied to biology as well, specifically to the task of understanding the structure and function of gene regulatory networks.
They regulate almost every aspect of biological behavior and allow
cells to carry out physiological processes and responses to
perturbations. Understanding the structure and the dynamic behavior of
gene networks is therefore one of the paramount challenges of systems
biology, with immediate practical repercussions in several applications
that are beyond basic research.
There are several methods for reverse engineering gene regulatory
networks by using molecular biology and data science methods. They have
been generally divided into six classes:
Coexpression methods are based on the notion that if two genes exhibit a similar expression profile, they may be related, although no causation can be simply inferred from coexpression (a minimal sketch of this idea follows this list).
Sequence motif methods analyze gene promoters to find specific transcription factor binding domains. If a transcription factor is predicted to bind a promoter of a specific gene, a regulatory connection can be hypothesized.
Orthology methods transfer gene network knowledge from one species to another.
Literature methods implement text mining and manual research to identify putative or experimentally-proven gene network connections.
Transcriptional complexes methods leverage information on
protein-protein interactions between transcription factors, thus
extending the concept of gene networks to include transcriptional
regulatory complexes.
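As a minimal illustration of the coexpression idea referenced above (the expression profiles and the 0.9 threshold are invented for the example), an edge is hypothesized whenever two profiles are strongly correlated:
# Hypothesize coexpression links between genes whose expression profiles
# are strongly correlated across samples (illustrative data and threshold).
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

profiles = {"geneA": [1.0, 2.1, 3.2, 4.1],
            "geneB": [0.9, 2.0, 3.1, 4.3],
            "geneC": [4.0, 1.0, 3.5, 0.5]}

genes = list(profiles)
edges = [(g1, g2) for i, g1 in enumerate(genes) for g2 in genes[i + 1:]
         if abs(pearson(profiles[g1], profiles[g2])) > 0.9]
print(edges)    # geneA and geneB are likely coexpressed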
Often, gene network reliability is tested by genetic perturbation
experiments followed by dynamic modelling, based on the principle that
removing one network node has predictable effects on the functioning of
the remaining nodes of the network.
Applications of the reverse engineering of gene networks range from understanding mechanisms of plant physiology to the highlighting of new targets for anticancer therapy.
Overlap with patent law
Reverse
engineering applies primarily to gaining understanding of a process or
artifact in which the manner of its construction, use, or internal
processes has not been made clear by its creator.
Patented
items do not of themselves have to be reverse-engineered to be studied,
for the essence of a patent is that inventors provide a detailed public
disclosure themselves, and in return receive legal protection of the invention
that is involved. However, an item produced under one or more patents
could also include other technology that is not patented and not
disclosed. Indeed, one common motivation of reverse engineering is to
determine whether a competitor's product infringes a patent or a copyright.
Legality
United States
In the United States, even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful if it has been legitimately obtained.
Reverse engineering of computer software often falls under both contract law, as a breach of contract, and any other relevant laws. That is because most end-user license agreements specifically prohibit it, and US courts have ruled that if such terms are present, they override the copyright law that expressly permits it (see Bowers v. Baystate Technologies). According to Section 103(f) of the Digital Millennium Copyright Act (17 U.S.C. § 1201 (f)),
a person in legal possession of a program may reverse-engineer and
circumvent its protection if that is necessary to achieve
"interoperability", a term that broadly covers other devices and
programs that can interact with it, make use of it, and to use and
transfer data to and from it in useful ways. A limited exemption exists
that allows the knowledge thus gained to be shared and used for
interoperability purposes.
European Union
EU Directive 2009/24 on the legal protection of computer programs, which superseded an earlier (1991) directive, governs reverse engineering in the European Union.
Software development is the process used to create software. Programming and maintaining the source code is the central step of this process, but it also includes conceiving the project, evaluating its feasibility, analyzing the business requirements, software design, testing, and release. Software engineering, in addition to development, also includes project management, employee management, and other overhead functions.
Software development may be sequential, in which each step is complete
before the next begins, but iterative development methods where multiple
steps can be executed at once and earlier steps can be revisited have
also been devised to improve flexibility, efficiency, and scheduling.
Each of the available methodologies is best suited to specific kinds of projects, based on various technical, organizational, project, and team considerations.
The simplest methodology is the "code and fix", typically used
by a single programmer working on a small project. After briefly
considering the purpose of the program, the programmer codes it and runs
it to see if it works. When they are done, the product is released.
This methodology is useful for prototypes but cannot be used for more
elaborate programs.
In the top-down waterfall model, feasibility, analysis, design, development, quality assurance,
and implementation occur sequentially in that order. This model
requires one step to be complete before the next begins, causing delays,
and makes it impossible to revise previous steps if necessary.
With iterative
processes these steps are interleaved with each other for improved
flexibility, efficiency, and more realistic scheduling. Instead of
completing the project all at once, one might go through most of the
steps with one component at a time. Iterative development also lets
developers prioritize the most important features, enabling lower
priority ones to be dropped later on if necessary. Agile
is one popular method, originally intended for small or medium sized
projects, that focuses on giving developers more control over the
features that they work on to reduce the risk of time or cost overruns. Derivatives of agile include extreme programming and Scrum. Open-source software development
typically uses agile methodology with concurrent design, coding, and
testing, due to reliance on a distributed network of volunteer
contributors.
Beyond agile, some companies integrate information technology (IT) operations with software development, which is called DevOps or DevSecOps including computer security. DevOps includes continuous development, testing, integration of new code in the version control system, deployment of the new code, and sometimes delivery of the code to clients. The purpose of this integration is to deliver IT services more quickly and efficiently.
Another focus in many programming methodologies is the idea of trying to catch issues such as security vulnerabilities and bugs as early as possible (shift-left testing) to reduce the cost of tracking and fixing them.
In 2009, it was estimated that 32 percent of software projects
were delivered on time and budget, and with the full functionality. An
additional 44 percent were delivered, but missing at least one of these
features. The remaining 24 percent were cancelled prior to release.
The sources of ideas for software products are plentiful. These ideas can come from market research including the demographics
of potential new customers, existing customers, sales prospects who
rejected the product, other internal software development staff, or a
creative third party. Ideas for software products are usually first
evaluated by marketing
personnel for economic feasibility, fit with existing channels of
distribution, possible effects on existing product lines, required features,
and fit with the company's marketing objectives. In the marketing evaluation phase, the cost and time assumptions are evaluated. The feasibility analysis estimates the project's return on investment, its development cost and timeframe. Based on this analysis, the company can make a business decision to invest in further development.
After deciding to develop the software, the company is focused on
delivering the product at or below the estimated cost and time, and with
a high standard of quality (i.e., lack of bugs) and the desired
functionality. Nevertheless, most software projects run late and
sometimes compromises are made in features or quality to meet a
deadline.
Analysis
Software analysis begins with a requirements analysis to capture the business needs of the software.
Challenges for the identification of needs are that current or
potential users may have different and incompatible needs, may not
understand their own needs, and change their needs during the process of
software development.
Ultimately, the result of analysis is a detailed specification for the
product that developers can work from. Software analysts often decompose the project into smaller objects, components that can be reused for increased cost-effectiveness, efficiency, and reliability. Decomposing the project may enable a multi-threaded implementation that runs significantly faster on multiprocessor computers.
During the analysis and design phases of software development, structured analysis is often used to break down the customer's requirements into pieces that can be implemented by software programmers. The underlying logic of the program may be represented in data-flow diagrams, data dictionaries, pseudocode, state transition diagrams, and/or entity relationship diagrams. If the project incorporates a piece of legacy software that has not been modeled, this software may be modeled to help ensure it is correctly incorporated with the newer software.
Design involves choices about the implementation of the software, such as which programming languages
and database software to use, or how the hardware and network
communications will be organized. Design may be iterative with users
consulted about their needs in a process of trial and error. Design often involves people with expertise in aspects such as database design, screen architecture, and the performance of servers and other hardware. Designers often attempt to find patterns in the software's functionality to spin off distinct modules that can be reused with object-oriented programming. An example of this is the model–view–controller, an interface between a graphical user interface and the backend.
The central feature of software development is creating and
understanding the software that implements the desired functionality.
There are various strategies for writing the code. Cohesive software
has various components that are independent from each other.
Coupling is the interrelation of different software components, which
is viewed as undesirable because it increases the difficulty of maintenance.
Often, software programmers do not follow industry best practices, resulting in code that is inefficient, difficult to understand, or lacking documentation on its functionality. These standards are especially likely to break down in the presence of deadlines. As a result, testing, debugging, and revising the code become much more difficult. Code refactoring, for example adding more comments to the code, is a solution to improve the understandability of the code.
Testing is the process of ensuring that the code executes correctly and without errors. Debugging
is performed by each software developer on their own code to confirm
that the code does what it is intended to. In particular, it is crucial
that the software executes on all inputs, even if the result is
incorrect. Code reviews
by other developers are often used to scrutinize new code added to the
project, and according to some estimates dramatically reduce the number
of bugs persisting after testing is complete. Once the code has been submitted, quality assurance—a separate department of non-programmers for most large companies—tests the accuracy of the entire software product. Acceptance tests derived from the original software requirements are a popular tool for this.
Quality testing also often includes stress and load checking (whether
the software is robust to heavy levels of input or usage), integration testing (to ensure that the software is adequately integrated with other software), and compatibility testing (measuring the software's performance across different operating systems or browsers). When tests are written before the code, this is called test-driven development.
Production is the phase in which software is deployed to the end user. During production, the developer may create technical support resources for users
or a process for fixing bugs and errors that were not caught earlier.
There might also be a return to earlier development phases if user needs
changed or were misunderstood.
Workers
Software
development is performed by software developers, usually working on a
team. Efficient communication between team members is essential to success. This is more easily achieved if the team is small, used to working together, and located near each other. Communication also helps identify problems at an earlier stage of development and avoid duplicated effort. Many development projects avoid the risk of losing essential knowledge held by only one employee by ensuring that multiple workers are familiar with each component.
Software development involves professionals from various fields, not
just software programmers but also individuals specialized in testing,
documentation writing, graphic design, user support, marketing, and fundraising. Although workers for proprietary software are paid, most contributors to open-source software are volunteers. Alternately, they may be paid by companies whose business model does not involve selling the software, but something else—such as services and modifications to open source software.
Models and tools
Computer-aided software engineering
Computer-aided software engineering (CASE) is the use of tools for the partial automation of software development.
CASE enables designers to sketch out the logic of a program, whether
one to be written, or an already existing one to help integrate it with
new code or reverse engineer it (for example, to change the programming language).
Documentation comes in two forms that are usually kept separate—that
intended for software developers, and that made available to the end
user to help them use the software. Most developer documentation is in the form of code comments for each file, class, and method that cover the application programming interface (API)—how the piece of software can be accessed by another—and often implementation details. This documentation is helpful for new developers to understand the project when they begin working on it. In agile development, the documentation is often written at the same time as the code. User documentation is more frequently written by technical writers.
Accurate estimation is crucial at the feasibility stage and in
delivering the product on time and within budget. The process of
generating estimations is often delegated by the project manager.
Because the effort estimation is directly related to the size of the complete application, it is strongly influenced by the addition of features in the requirements—the more requirements, the higher the development
cost. Aspects not related to functionality, such as the experience of
the software developers and code reusability, are also essential to
consider in estimation. As of 2019,
most of the tools for estimating the amount of time and resources for
software development were designed for conventional applications and are
not applicable to web applications or mobile applications.
Version control is a popular way of managing changes made to the
software. Whenever a new version is checked in, the software saves a backup
of all modified files. If multiple programmers are working on the
software simultaneously, it manages the merging of their code changes.
The software highlights cases where there is a conflict between two sets
of changes and allows programmers to fix the conflict.
The purpose of viewpoints and views is to enable human engineers to comprehend very complex systems and to organize the elements of the problem around domains of expertise. In the engineering
of physically intensive systems, viewpoints often correspond to
capabilities and responsibilities within the engineering organization.
Intellectual property
Intellectual property can be an issue when developers integrate open-source code or libraries into a proprietary product, because most open-source licenses
used for software require that modifications be released under the same
license. As an alternative, developers may choose a proprietary
alternative or write their own software module.
Smart manufacturing is a broad category of manufacturing that employs computer-integrated manufacturing,
high levels of adaptability and rapid design changes, digital
information technology, and more flexible technical workforce training. Other goals sometimes include fast changes in production levels based on demand, optimization of the supply chain, efficient production, and recyclability. In this concept, a smart factory has interoperable systems, multi-scale dynamic modelling and simulation, intelligent automation, strong cyber security, and networked sensors.
The broad definition of smart manufacturing covers many different
technologies. Some of the key technologies in the smart manufacturing
movement include big data processing capabilities, industrial
connectivity devices and services, and advanced robotics.
Big data processing
Smart manufacturing utilizes big data analytics to refine complicated processes and manage supply chains. Big data analytics refers to a method for gathering and understanding large data sets in terms of what are known as the three V's: velocity, variety, and volume. Velocity informs the frequency of data acquisition,
which can be concurrent with the application of previous data. Variety
describes the different types of data that may be handled. Volume
represents the amount of data.
Big data analytics allows an enterprise to use smart manufacturing to
predict demand and the need for design changes rather than reacting to
orders placed.
Some products have embedded sensors, which produce large amounts
of data that can be used to understand consumer behavior and improve
future versions of the product.
Advanced robotics
Advanced industrial robots,
also known as smart machines, operate autonomously and can communicate
directly with manufacturing systems. In some advanced manufacturing
contexts, they can work with humans for co-assembly tasks.
By evaluating sensory input and distinguishing between different
product configurations, these machines are able to solve problems and
make decisions independent of people. These robots are able to complete
work beyond what they were initially programmed to do and have
artificial intelligence that allows them to learn from experience.
These machines have the flexibility to be reconfigured and re-purposed.
This gives them the ability to respond rapidly to design changes and
innovation, which is a competitive advantage over more traditional
manufacturing processes.
An area of concern surrounding advanced robotics is the safety and
well-being of the human workers who interact with robotic systems.
Traditionally, measures have been taken to segregate robots from the
human workforce, but advances in robotic cognitive ability have opened
up opportunities, such as cobots, for robots to work collaboratively with people.
Cloud computing allows large amounts of data storage or computational power to be rapidly applied to manufacturing, and allows a large amount of data on machine performance and output quality to be collected. This can improve machine configuration, predictive
maintenance, and fault analysis. Better predictions can facilitate
better strategies for ordering raw materials or scheduling production
runs.
As of 2019, 3D printing
is mainly used in rapid prototyping, design iteration, and small-scale
production. Improvements in speed, quality, and materials could make it
useful in mass production and mass customization.
However, 3D printing has developed so much in recent years that it is no longer used just as a prototyping technology. The 3D printing sector is moving beyond prototyping and is becoming increasingly widespread in supply chains. The industries where digital manufacturing with 3D printing is most common are automotive, industrial, and medical. In the auto industry, 3D printing is used not only for
prototyping but also for the full production of final parts and
products. 3D printing has also been used by suppliers and digital
manufacturers coming together to help fight COVID-19.
3D printing allows for more successful prototyping; companies save time and money because significant volumes of parts can be produced in a short period. There is great potential for 3D printing to revolutionise supply chains, and hence more companies are using it. The main challenge that 3D printing faces is the change of people's mindset. Moreover, some workers will need to learn a new set of skills to manage 3D printing technology.
Eliminating workplace inefficiencies and hazards
Smart manufacturing also encompasses surveying workplace inefficiencies and assisting in worker safety. Efficiency optimization is a huge focus for adopters of "smart" systems, which is done through data research and intelligent learning automation. For instance, operators can be given personal access cards with inbuilt Wi-Fi and Bluetooth, which can connect to the machines and a cloud platform to determine which operator is working on which machine in real time. An intelligent, interconnected 'smart' system can be established to set a performance target, determine if the target is obtainable, and identify inefficiencies through failed or delayed performance targets. In general, automation may alleviate inefficiencies due to human error, and evolving AI tends to eliminate the inefficiencies of its predecessors.
As robots take on more of the physical tasks of manufacturing,
workers no longer need to be present and are exposed to fewer hazards.
Impact of Industry 4.0
Industry 4.0
is a project in the high-tech strategy of the German government that
promotes the computerization of traditional industries such as
manufacturing. The goal is the intelligent factory (Smart Factory) that
is characterized by adaptability, resource efficiency,
and ergonomics, as well as the integration of customers and business
partners in business and value processes. Its technological foundation
consists of cyber-physical systems and the Internet of Things.
This kind of "intelligent manufacturing" makes a great use of:
Wireless connections, both during product assembly and long-distance interactions with them;
Last generation sensors, distributed along the supply chain and the same products (Internet of things)
Elaboration of a great amount of data to control all phases of construction, distribution and usage of a good.
European Roadmap "Factories of the Future" and German one "Industrie 4.0″ illustrate several of the action lines to undertake and the related benefits. Some examples are:
Advanced manufacturing processes and rapid prototyping will make it possible for each customer to order a one-of-a-kind product without significant cost increase.
Collaborative Virtual Factory (VF) platforms will drastically reduce the cost and time associated with new product design and engineering of the production process, by exploiting complete simulation and virtual testing throughout the Product Lifecycle.
Advanced Human-Machine Interaction (HMI) and augmented reality (AR) devices will help increase safety in production plants and reduce the physical demands on workers (whose average age is increasing).
Machine learning will be fundamental to optimizing production processes, reducing both lead times and energy consumption.
The Ministry of Economy, Trade and Industry in South Korea announced on 10 March 2016 that it had aided the construction of smart factories in 1,240 small and medium enterprises,
which it said resulted in an average 27.6% decrease in defective
products, 7.1% faster production of prototypes, and 29.2% lower cost.