The open-design movement involves the development of physical
products, machines and systems through use of publicly shared design
information. This includes the making of both free and open-source software (FOSS) as well as open-source hardware.
The process is generally facilitated by the Internet and often
performed without monetary compensation. The goals and philosophy of the
movement are identical to those of the open-source movement, but are implemented for the development of physical products rather than software. Open design is a form of co-creation, where the final product is designed by the users rather than by an external stakeholder such as a private company.
Origin
Sharing of manufacturing information can be traced back to the 18th and 19th centuries. Aggressive patenting put an end to that period of extensive knowledge sharing.
More recently, principles of open design have been related to the free and open-source software movements. In 1997 Eric S. Raymond, Tim O'Reilly and Larry Augustin established "open source" as an alternative expression to "free software", and Bruce Perens published The Open Source Definition.
In late 1998, Dr. Sepehr Kiani (a PhD in mechanical engineering from
MIT) realized that designers could benefit from open source policies,
and in early 1999 he convinced Dr. Ryan Vallance and Dr. Samir Nayfeh of
the potential benefits of open design in machine design applications. Together they established the Open Design Foundation (ODF) as a non-profit corporation, and set out to develop an Open Design Definition.
The idea of open design was taken up, either simultaneously or
subsequently, by several other groups and individuals. The principles of
open design are closely similar to those of open-source hardware design, which emerged in March 1998 when Reinoud Lamberts of the Delft University of Technology proposed on his "Open Design Circuits" website the creation of a hardware design community in the spirit of free software.
Ronen Kadushin coined the title "Open Design" in his 2004
Master's thesis, and the term was later formalized in the 2010 Open
Design Manifesto.
Current directions
The open-design movement currently unites two trends. On one hand, people apply their skills and time to projects for the common good, perhaps where funding or commercial interest is lacking, for example for developing countries or to help spread ecological or cheaper technologies. On the other
hand, open design may provide a framework for developing advanced
projects and technologies that might be beyond the resource of any
single company or country and involve people who, without the copyleft
mechanism, might not collaborate otherwise. There is now also a third trend, in which these two approaches come together, using high-tech open-source tools (e.g. 3D printing) to deliver customized local solutions for sustainable development.
Open design holds great potential for driving future innovation, as recent research has shown that stakeholder users working together produce more innovative designs than designers consulting users through more traditional means.
The open-design movement may arguably organize production by
prioritising socio-ecological well-being over corporate profits,
over-production and excess consumption.
Open machine design as compared to open-source software
The
open-design movement is currently fairly nascent but holds great
potential for the future. In some respects design and engineering are
even more suited to open collaborative development than the increasingly
common open-source software projects, because with 3D models and
photographs the concept can often be understood visually. It is not even necessary for project members to speak the same language in order to collaborate usefully.
However, there are certain barriers for open design to overcome when compared to software development, where mature and widely used tools are available and the duplication and distribution of code cost next to nothing. Creating, testing and modifying physical designs is not quite so straightforward because of the effort, time and cost required to create the physical artefact, although access to emerging flexible computer-controlled manufacturing techniques can significantly reduce the complexity and effort of construction (see the tools mentioned in the fab lab article).
Organizations
In 2012, open design was considered a fledgling movement consisting of several unrelated or loosely related initiatives. Many of these organizations are single, funded projects, while a few organizations focus on an area needing development. In some cases (e.g. Thingiverse for 3D-printable designs or Appropedia for open-source appropriate technology), organizations are making an effort to create a centralized open-source design repository, as this enables innovation. Notable organizations include:
AguaClara, an open-source engineering group at Cornell University publishing a design tool and CAD designs for water treatment plants
Arduino, an open-source electronics hardware platform, community and company
Software rot (bit rot, code rot, software erosion, software decay, or software entropy) is the deterioration of software quality or performance over time that leads to it becoming faulty, unusable, or needing upgrade.
Since software cannot physically decay, the term is hyperbole. The process is due to either changes in the source code or to the environment in which the software operates.
The Jargon File, a compendium of hacker lore, defines "bit rot" as a jocular explanation for the degradation of a software program
over time even if "nothing has changed"; the idea behind this is almost
as if the bits that make up the program were subject to radioactive
decay.
Causes
Several
factors are responsible for software rot, including changes to the
environment in which the software operates, degradation of compatibility
between parts of the software itself, and the emergence of bugs in unused or rarely used code.
Environment change
When changes occur in the program's environment, particularly changes which the designer of the program did not anticipate, the software may no longer operate as originally intended. For example, many early computer game designers used the CPU clock speed as a timer in their games. However, newer CPUs had faster clocks, so the gameplay speed increased accordingly, making the games less usable over time.
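As a minimal, hypothetical sketch (the routine and the constant below are illustrative, not taken from any particular game), such a timing loop might look like this in C; the delay depends entirely on how fast the CPU runs the empty loop, so the same binary runs faster on newer hardware:
#include <stdio.h>

/* Calibrated for one specific CPU; a faster CPU finishes the loop sooner,
   so every "tick" of game time shrinks and gameplay speeds up. */
#define ITERATIONS_PER_TICK 500000L

static void delay_one_tick(void)
{
    volatile long i;   /* volatile keeps the busy-wait from being optimized away */
    for (i = 0; i < ITERATIONS_PER_TICK; i++) {
        /* busy-wait: elapsed time depends on CPU clock speed */
    }
}

int main(void)
{
    for (int frame = 0; frame < 5; frame++) {
        printf("frame %d\n", frame);  /* stand-in for updating and drawing a game frame */
        delay_one_tick();
    }
    return 0;
}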
Onceability
Some changes in the environment are caused not by the program's designers but by its users. Initially, a user could bring the system into working order and have it run flawlessly for a certain amount of time. But when the system stops working correctly, or the users want to access the configuration controls, they cannot repeat that initial step because of the changed context and unavailable information (a lost password, missing instructions, or simply a hard-to-manage user interface that was first configured by trial and error). Information architect Jonas Söderström has named this concept Onceability, and defines it as "the quality in a technical system that prevents a user from restoring the system, once it has failed".
Unused code
Infrequently
used portions of code, such as document filters or interfaces designed
to be used by other programs, may contain bugs that go unnoticed. With
changes in user requirements and other external factors, this code may
be executed later, thereby exposing the bugs and making the software
appear less functional.
Normal maintenance of software and systems may also cause software rot. In particular, when a program contains multiple parts which function at arm's length from one another, failing to consider how changes to one part affect the others may introduce bugs.
In some cases, this may take the form of libraries that the
software uses being changed in a way which adversely affects the
software. If the old version of a library that previously worked with
the software can no longer be used due to conflicts with other software
or security flaws that were found in the old version, there may no
longer be a viable version of a needed library for the program to use.
Online connectivity
Modern
commercial software often connects to an online server for license
verification and accessing information. If the online service powering
the software is shut down, it may stop working.
Since the late 2010s most websites use secure HTTPS connections. However, this relies on root certificates, which have expiration dates. After the certificates expire, the device loses connectivity to most websites unless the certificates are continuously updated.
Another issue is that in March 2021 the old TLS 1.0 and TLS 1.1 protocols were deprecated. This means that operating systems, browsers and other online software that do not support at least TLS 1.2
cannot connect to most websites, even to download patches or update the
browser, if these are available. This is occasionally called the "TLS
apocalypse".
Products that cannot connect to most websites include PowerMacs,
old Unix boxes and Microsoft Windows versions older than Server
2008/Windows 7.
The Internet Explorer 8 browser in Server 2008/Windows 7 does support
TLS 1.2 but it is disabled by default.
Classification
Software rot is usually classified as being either 'dormant rot' or 'active rot'.
Dormant rot
Software
that is not currently being used gradually becomes unusable as the
remainder of the application changes. Changes in user requirements and
the software environment also contribute to the deterioration.
Active rot
Software
that is being continuously modified may lose its integrity over time if
proper mitigating processes are not consistently applied. However, much
software requires continuous changes to meet new requirements and
correct bugs, and re-engineering software each time a change is made is
rarely practical. This creates what is essentially an evolution
process for the program, causing it to depart from the original
engineered design. As a consequence of this and a changing environment,
assumptions made by the original designers may be invalidated, thereby
introducing bugs.
In practice, adding new features may be prioritized over updating documentation;
without documentation, however, it is possible for specific knowledge
pertaining to parts of the program to be lost. To some extent, this can
be mitigated by following best current practices for coding conventions.
Active software rot slows once an application is near the end of
its commercial life and further development ceases. Users often learn to
work around any remaining software bugs, and the behaviour of the software becomes consistent as nothing is changing.
Examples
AI program example
Many seminal programs from the early days of AI research have suffered from irreparable software rot. For example, the original SHRDLU
program (an early natural language understanding program) cannot be run
on any modern-day computer or computer simulator, as it was developed during the days when LISP and PLANNER were still in the development stage, and thus uses non-standard macros and software libraries which no longer exist.
Forked online forum example
Suppose an administrator creates a forum using open source
forum software, and then heavily modifies it by adding new features and
options. This process requires extensive modifications to existing
code and deviation from the original functionality of that software.
From here, there are several ways software rot can affect the system:
The administrator can accidentally make changes which conflict
with each other or the original software, causing the forum to behave
unexpectedly or break down altogether. This leaves them in a very bad
position: as they have deviated so greatly from the original code,
technical support and assistance in reviving the forum will be difficult
to obtain.
A security hole may be discovered in the original forum source code,
requiring a security patch. However, because the administrator has
modified the code so extensively, the patch may not be directly
applicable to their code, requiring the administrator to effectively
rewrite the update.
The administrator who made the modifications could vacate their
position, leaving the new administrator with a convoluted and heavily
modified forum that lacks full documentation. Without fully
understanding the modifications, it is difficult for the new
administrator to make changes without introducing conflicts and bugs.
Furthermore, documentation of the original system may no longer be
available, or worse yet, misleading due to subtle differences in
functional requirements.
Wiki example
Suppose a webmaster installs the latest version of MediaWiki,
the software that powers wikis such as Wikipedia, then never applies
any updates. Over time, the web host is likely to update their versions
of the programming language (such as PHP) and the database (such as MariaDB)
without consulting the webmaster. After a long enough time, this will
eventually break complex websites that have not been updated, because
the latest versions of PHP and MariaDB will have breaking changes as
they hard deprecate certain built-in functions, breaking backwards compatibility and causing fatal errors. Other problems that can arise with un-updated website software include security vulnerabilities and spam.
Refactoring
Refactoring is a means of addressing the problem of software rot. It is described
as the process of rewriting existing code to improve its structure
without affecting its external behaviour. This includes removing dead code
and rewriting sections that have been modified extensively and no
longer work efficiently. Care must be taken not to change the software's
external behaviour, as this could introduce incompatibilities and
thereby itself contribute to software rot. Design principles to consider when refactoring include maintaining the hierarchical structure of the code and using abstraction to simplify and generalize code structures.
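A minimal, hypothetical sketch of such a refactoring in C (the function names, the 20% tax rule and the dead variable are all illustrative): dead code is removed and a duplicated rule is extracted into a helper, while the function's externally visible behaviour stays the same.
/* Before: a dead variable and an inlined business rule. */
double invoice_total_before(const double *prices, int n)
{
    double total = 0.0;
    int unused_flag = 0;            /* dead code: assigned but never read */
    for (int i = 0; i < n; i++)
        total += prices[i] * 1.2;   /* price plus 20% tax, written inline */
    return total;
}

/* After: dead code removed and the tax rule extracted into a helper.
   Inputs and outputs are unchanged, so external behaviour is preserved. */
static double with_tax(double price)
{
    return price * 1.2;
}

double invoice_total_after(const double *prices, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += with_tax(prices[i]);
    return total;
}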
A decompiler is a computer program that translates an executable file into high-level source code. It therefore does the opposite of a typical compiler, which translates a high-level language into a low-level language. While disassemblers translate an executable into assembly language, decompilers go a step further and translate the code into a higher-level language such as C or Java,
requiring more sophisticated techniques. Decompilers are usually unable
to perfectly reconstruct the original source code, thus will frequently
produce obfuscated code. Nonetheless, they remain an important tool in the reverse engineering of computer software.
Introduction
The term decompiler is most commonly applied to a program which translates executable programs (the output from a compiler) into source code in a (relatively) high level language
which, when compiled, will produce an executable whose behavior is the
same as the original executable program. By comparison, a disassembler translates an executable program into assembly language (and an assembler could be used for assembling it back into an executable program).
Decompilation is the act of using a decompiler, although the term
can also refer to the output of a decompiler. It can be used for the
recovery of lost source code, and is also useful in some cases for computer security, interoperability and error correction.
The success of decompilation depends on the amount of information
present in the code being decompiled and the sophistication of the
analysis performed on it. The bytecode formats used by many virtual
machines (such as the Java Virtual Machine or the .NET Framework's Common Language Runtime) often include extensive metadata and high-level features that make decompilation quite feasible. The application of debug data, i.e. debug symbols, may make it possible to reproduce the original names of variables and structures and even the line numbers. Machine language without such metadata or debug data is much harder to decompile.
Some compilers and post-compilation tools produce obfuscated code
(that is, they attempt to produce output that is very difficult to
decompile, or that decompiles to confusing output). This is done to make
it more difficult to reverse engineer the executable.
While decompilers are normally used to (re-)create source code
from binary executables, there are also decompilers to turn specific
binary data files into human-readable and editable sources.
The success level achieved by decompilers can be impacted by various factors. These include the abstraction level of the source language: if the object code contains explicit class structure information, it aids the decompilation process. Descriptive information, especially naming details, also accelerates the decompiler's work. Moreover, less optimized code is quicker to decompile, since optimization causes greater deviation from the original code.
Design
Decompilers
can be thought of as composed of a series of phases each of which
contributes specific aspects of the overall decompilation process.
Loader
The first decompilation phase loads and parses the input machine code or intermediate language program's binary file
format. It should be able to discover basic facts about the input
program, such as the architecture (Pentium, PowerPC, etc.) and the entry
point. In many cases, it should be able to find the equivalent of the main function of a C program, which is the start of the user written
code. This excludes the runtime initialization code, which should not
be decompiled if possible. If available, the symbol tables and debug data are also loaded. The front end may be able to identify the libraries used even if they are linked with the code; this will provide library interfaces. If it can determine the compiler or compilers used, it may provide useful information for identifying code idioms.
Disassembly
The next logical phase is the disassembly of machine code instructions into a machine independent intermediate representation (IR). For example, the Pentium machine instruction
mov eax, [ebx+0x04]
might be translated to the IR
eax := m[ebx+4];
Idioms
Idiomatic
machine code sequences are sequences of code whose combined semantics
are not immediately apparent from the instructions' individual
semantics. Either as part of the disassembly phase, or as part of later
analyses, these idiomatic sequences need to be translated into known
equivalent IR. For example, the x86 assembly code:
cdq eax       ; edx is set to the sign-extension of eax
xor eax, edx
sub eax, edx
could be translated to
eax := abs(eax);
Some idiomatic sequences are machine independent; some involve only one instruction. For example, xor eax, eax clears the eax register (sets it to zero). This can be implemented with a machine independent simplification rule, such as a = 0.
In general, it is best to delay detection of idiomatic sequences
if possible, to later stages that are less affected by instruction
ordering. For example, the instruction scheduling phase of a compiler
may insert other instructions into an idiomatic sequence, or change the
ordering of instructions in the sequence. A pattern matching process in
the disassembly phase would probably not recognize the altered pattern.
Later phases group instruction expressions into more complex
expressions, and modify them into a canonical (standardized) form,
making it more likely that even the altered idiom will match a higher
level pattern later in the decompilation.
Program analysis
Various program analyses can be applied to the IR. In particular, expression propagation combines the semantics of several instructions into more complex expressions. For example,
mov eax, [ebx+0x04]
add eax, [ebx+0x08]
sub [ebx+0x0C], eax
could result in the following IR after expression propagation:
m[ebx+12] := m[ebx+12] - (m[ebx+4] + m[ebx+8]);
The resulting expression is more like high level language, and has also eliminated the use of the machine register eax. Later analyses may eliminate the ebx register.
Data flow analysis
The places where register contents are defined and used must be traced using data flow analysis.
The same analysis can be applied to locations that are used for
temporaries and local data. A different name can then be formed for each
such connected set of value definitions and uses. It is possible that
the same local variable location was used for more than one variable in
different parts of the original program. Even worse it is possible for
the data flow analysis to identify a path whereby a value may flow
between two such uses even though it would never actually happen or
matter in reality. This may in bad cases lead to needing to define a
location as a union of types. The decompiler may allow the user to
explicitly break such unnatural dependencies which will lead to clearer
code. This of course means a variable is potentially used without being
initialized and so indicates a problem in the original program.
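As a rough, hypothetical illustration of the outcome (not the output of any particular decompiler), suppose the compiler reused the eax register for two unrelated values; grouping each connected set of definitions and uses gives each group its own name in the decompiled code:
/* The comments show the register-level view; the C code shows the two
   separately named locals a decompiler derives after data flow analysis. */
int example(int a, int b)
{
    /* eax := a; eax := eax + 1   -- first connected set of defs and uses */
    int local_1 = a + 1;
    /* eax := b; eax := eax * 2   -- second, unrelated set, same register */
    int local_2 = b * 2;
    return local_1 + local_2;
}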
Type analysis
A
good machine code decompiler will perform type analysis. Here, the way
registers or memory locations are used results in constraints on the
possible type of the location. For example, an and instruction implies that the operand is an integer; programs do not use such an operation on floating point values (except in special library code) or on pointers. An add
instruction results in three constraints, since the operands may be
both integer, or one integer and one pointer (with integer and pointer
results respectively; the third constraint comes from the ordering of
the two operands when the types are different).
Various high level expressions can be recognized which trigger
recognition of structures or arrays. However, it is difficult to
distinguish many of the possibilities, because of the freedom that
machine code or even some high level languages such as C allow with
casts and pointer arithmetic.
The example from the previous section could result in the following high level code:
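One plausible reconstruction is sketched below (the structure and field names are illustrative, assuming type analysis has inferred a structure from the ebx offsets 4, 8 and 12):
struct T1 {
    int v0004;
    int v0008;
    int v000C;
};

void update(struct T1 *ebx)
{
    /* equivalent of: m[ebx+12] := m[ebx+12] - (m[ebx+4] + m[ebx+8]); */
    ebx->v000C -= ebx->v0004 + ebx->v0008;
}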
Structuring
The penultimate decompilation phase involves structuring of the IR into higher level constructs such as while loops and if/then/else conditional statements.
Unstructured code is more difficult to translate into structured code
than already structured code. Solutions include replicating some code,
or adding boolean variables.
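As a hedged illustration (hypothetical C rather than real machine code), the first, goto-based function is the kind of control flow a decompiler recovers initially; structuring turns it into the equivalent while loop:
/* Unstructured control flow, close to what the IR initially expresses. */
int sum_to(int n)
{
    int total = 0;
    int i = 0;
L1: if (i >= n) goto L2;
    total += i;
    i++;
    goto L1;
L2: return total;
}

/* The same logic after structuring into a while loop. */
int sum_to_structured(int n)
{
    int total = 0;
    int i = 0;
    while (i < n) {
        total += i;
        i++;
    }
    return total;
}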
Code generation
The
final phase is the generation of the high level code in the back end of
the decompiler. Just as a compiler may have several back ends for
generating machine code for different architectures, a decompiler may
have several back ends for generating high level code in different high
level languages.
Just before code generation, it may be desirable to allow an interactive editing of the IR, perhaps using some form of graphical user interface.
This would allow the user to enter comments, and non-generic variable
and function names. However, these are almost as easily entered in a
post decompilation edit. The user may want to change structural aspects,
such as converting a while loop to a for loop. These are less readily modified with a simple text editor, although source code refactoring
tools may assist with this process. The user may need to enter
information that failed to be identified during the type analysis phase,
e.g. modifying a memory expression to an array or structure expression.
Finally, incorrect IR may need to be corrected, or changes made to
cause the output code to be more readable.
Other techniques
Decompilers using neural networks have been developed. Such a decompiler may be trained by machine learning to improve its accuracy over time.
Legality
The majority of computer programs are covered by copyright
laws. Although the precise scope of what is covered by copyright
differs from region to region, copyright law generally provides the
author (the programmer(s) or employer) with a collection of exclusive
rights to the program. These rights include the right to make copies, including copies made into the computer’s RAM (unless creating such a copy is essential for using the program).
Since the decompilation process involves making multiple such copies, it
is generally prohibited without the authorization of the copyright
holder. However, because decompilation is often a necessary step in
achieving software interoperability, copyright laws in both the United States and Europe permit decompilation to a limited extent.
In the United States, the copyright fair use defence has been successfully invoked in decompilation cases. For example, in Sega v. Accolade,
the court held that Accolade could lawfully engage in decompilation in
order to circumvent the software locking mechanism used by Sega's game
consoles. Additionally, the Digital Millennium Copyright Act (PUBLIC LAW 105–304) has proper exemptions for both Security Testing and Evaluation in §1201(i), and Reverse Engineering in §1201(f).
In Europe, the 1991 Software Directive
explicitly provides for a right to decompile in order to achieve
interoperability. The result of a heated debate between, on the one
side, software protectionists, and, on the other, academics as well as
independent software developers, Article 6 permits decompilation only if
a number of conditions are met:
First, a person or entity must have a licence to use the program to be decompiled.
Second, decompilation must be necessary to achieve interoperability
with the target program or other programs. Interoperability information
should therefore not be readily available, such as through manuals or API
documentation. This is an important limitation. The necessity must be
proven by the decompiler. The purpose of this important limitation is
primarily to provide an incentive for developers to document and
disclose their products' interoperability information.
Third, the decompilation process must, if possible, be confined to
the parts of the target program relevant to interoperability. Since one
of the purposes of decompilation is to gain an understanding of the
program structure, this third limitation may be difficult to meet.
Again, the burden of proof is on the decompiler.
In addition, Article 6 prescribes that the information obtained
through decompilation may not be used for other purposes and that it may
not be given to others.
Overall, the decompilation right provided by Article 6 codifies
what is claimed to be common practice in the software industry. Few
European lawsuits are known to have emerged from the decompilation
right. This could be interpreted as meaning one of three things:
1) the decompilation right is not used frequently and the decompilation right may therefore have been unnecessary,
2) the decompilation right functions well and provides sufficient legal certainty not to give rise to legal disputes, or
3) illegal decompilation goes largely undetected.
In a 2000 report on the implementation of the Software Directive by the European member states, the European Commission seemed to support the second interpretation.
Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software
accomplishes a task with very little (if any) insight into exactly how
it does so. Depending on the system under consideration and the
technologies employed, the knowledge gained during reverse engineering
can help with repurposing obsolete objects, doing security analysis, or
learning how something works.
Although the process is specific to the object on which it is
being performed, all reverse engineering processes consist of three
basic steps: information extraction, modeling, and review. Information
extraction is the practice of gathering all relevant information for
performing the operation. Modeling is the practice of combining the
gathered information into an abstract model, which can be used as a
guide for designing the new object or system. Review is the testing of
the model to ensure the validity of the chosen abstraction. Reverse engineering is applicable in the fields of computer engineering, mechanical engineering, design, electronic engineering, software engineering, chemical engineering, and systems biology.
Overview
There
are many reasons for performing reverse engineering in various fields.
Reverse engineering has its origins in the analysis of hardware for
commercial or military advantage.
However, the reverse engineering process may not always be concerned
with creating a copy or changing the artifact in some way. It may be
used as part of an analysis to deduce
design features from products with little or no additional knowledge
about the procedures involved in their original production.
In some cases, the goal of the reverse engineering process can simply be a redocumentation of legacy systems. Even when the reverse-engineered product is that of a competitor, the goal may not be to copy it but to perform competitor analysis. Reverse engineering may also be used to create interoperable products
and, despite some narrowly-tailored United States and European Union
legislation, the legality of using specific reverse engineering
techniques for that purpose has been hotly contested in courts worldwide
for more than two decades.
Software reverse engineering can help to improve the understanding of the underlying source code for the maintenance and improvement of the software; relevant information can be extracted in order to make decisions for software development, and graphical representations of the code can provide alternate views of the source code, which can help to detect and fix a software bug or vulnerability. Frequently, as software develops, its design information and improvements are lost over time, but that lost information can usually be recovered with reverse engineering. The process can also help to cut down the time required to understand the source code, thus reducing the overall cost of the software development.
Reverse engineering can also help to detect and eliminate malicious code written into the software, with the help of better code detectors. Reversing source code can be used to find alternate uses of it, such as detecting unauthorized replication of the source code where it was not intended to be used, or revealing how a competitor's product was built. That process is commonly used for "cracking" software and media to remove their copy protection, or to create a possibly-improved copy or even a knockoff, which is usually the goal of a competitor or a hacker.
Interfacing. Reverse engineering can be used when a system is required to interface to another system and the way both systems would negotiate must be established. Such requirements typically exist for interoperability.
Military or commercial espionage.
Learning about an enemy's or competitor's latest research by stealing
or capturing a prototype and dismantling it may result in the
development of a similar product or a better countermeasure against it.
Obsolescence. Integrated circuits
are often designed on proprietary systems and built on production
lines, which become obsolete in only a few years. When systems using
those parts can no longer be maintained since the parts are no longer
made, the only way to incorporate the functionality into new technology
is to reverse-engineer the existing chip and then to redesign
it using newer tools by using the understanding gained as a guide.
Another obsolescence-related problem that can be solved by reverse
engineering is the need to support (maintenance and supply for
continuous operation) existing legacy devices that are no longer
supported by their original equipment manufacturer. The problem is particularly critical in military operations.
Product security analysis. That examines how a product works by determining the specifications of its components, estimating costs, and identifying potential patent infringement.
Also part of product security analysis is acquiring sensitive data by
disassembling and analyzing the design of a system component. Another intent may be to remove copy protection or to circumvent access restrictions.
Competitive technical intelligence. That is to understand what one's competitor is actually doing, rather than what it says that it is doing.
Saving money. Finding out what a piece of electronics can do may spare a user from purchasing a separate product.
Repurposing. Obsolete objects are then reused in a different-but-useful manner.
Design. Production and design companies have applied reverse engineering to the practical craft-based manufacturing process. The companies can work on "historical" manufacturing collections through 3D scanning, 3D re-modeling and re-design. In 2013 the Italian manufacturers Baldi and Savio Firmino, together with the University of Florence, optimized their innovation, design, and production processes.
Common uses
Machines
As computer-aided design
(CAD) has become more popular, reverse engineering has become a viable
method to create a 3D virtual model of an existing physical part for use
in 3D CAD, CAM, CAE, or other software.
The reverse-engineering process involves measuring an object and then
reconstructing it as a 3D model. The physical object can be measured
using 3D scanning technologies like CMMs, laser scanners, structured light digitizers, or industrial CT scanning (computed tomography). The measured data alone, usually represented as a point cloud, lacks topological information and design intent. The former may be recovered by converting the point cloud to a triangular-faced mesh.
Reverse engineering aims to go beyond producing such a mesh and to
recover the design intent in terms of simple analytical surfaces where
appropriate (planes, cylinders, etc.) as well as possibly NURBS surfaces to produce a boundary-representation
CAD model. Recovery of such a model allows a design to be modified to
meet new requirements, a manufacturing plan to be generated, etc.
Hybrid modeling is a commonly used term when NURBS and parametric modeling are implemented together. Using a combination of geometric and freeform surfaces can provide a powerful method of 3D modeling.
Areas of freeform data can be combined with exact geometric surfaces to
create a hybrid model. A typical example of this would be the reverse
engineering of a cylinder head, which includes freeform cast features,
such as water jackets and high-tolerance machined areas.
Reverse engineering is also used by businesses to bring existing
physical geometry into digital product development environments, to make
a digital 3D record of their own products, or to assess competitors'
products. It is used to analyze how a product works, what it does, what
components it has; estimate costs; identify potential patent infringement; etc.
Value engineering,
a related activity that is also used by businesses, involves
deconstructing and analyzing products. However, the objective is to find
opportunities for cost-cutting.
Reverse engineering of printed circuit boards
involves recreating fabrication data for a particular circuit board.
This is done primarily to identify a design, and learn the functional
and structural characteristics of a design. It also allows for the
discovery of the design principles behind a product, especially if this
design information is not easily available.
Outdated PCBs are often subject to reverse engineering,
especially when they perform highly critical functions such as powering
machinery, or other electronic components. Reverse engineering these old
parts can allow the reconstruction of the PCB if it performs some crucial task, as well as finding alternatives that provide the same function or upgrading the old PCB.
Reverse engineering of PCBs largely follows the same series of steps.
First, images are created by drawing, scanning, or taking photographs
of the PCB. Then, these images are ported to suitable reverse
engineering software in order to create a rudimentary design for the new
PCB. The quality of these images that is necessary for suitable reverse
engineering is proportional to the complexity of the PCB itself. More
complicated PCBs require well-lit photos on dark backgrounds, while fairly simple PCBs can be recreated with just basic dimensioning. Each layer of the PCB is carefully recreated in the software with the intent of producing a final design as close as possible to the initial one. Then, the
schematics for the circuit are finally generated using an appropriate
tool.
Software
In 1990, the Institute of Electrical and Electronics Engineers
(IEEE) defined (software) reverse engineering (SRE) as "the process of
analyzing a
subject system to identify the system's components and their
interrelationships and to create representations of the system in
another form or at a higher
level of abstraction" in which the "subject system" is the end product
of software development. Reverse engineering is a process of examination
only, and the software system under consideration is not modified,
which would otherwise be re-engineering
or restructuring. Reverse engineering can be performed from any stage
of the product cycle, not necessarily from the functional end product.
There are two components in reverse engineering: redocumentation
and design recovery. Redocumentation is the creation of a new representation of the computer code so that it is easier to understand.
Meanwhile, design recovery is the use of deduction or reasoning from
general knowledge or personal experience of the product to understand
the product's functionality fully. It can also be seen as "going backwards through the development cycle".
In this model, the output of the implementation phase (in source code
form) is reverse-engineered back to the analysis phase, in an inversion
of the traditional waterfall model. Another term for this technique is program comprehension.
The Working Conference on Reverse Engineering (WCRE) has been held
yearly to explore and expand the techniques of reverse engineering. Computer-aided software engineering (CASE) and automated code generation have contributed greatly in the field of reverse engineering.
Software anti-tamper technology like obfuscation
is used to deter both reverse engineering and re-engineering of
proprietary software and software-powered systems. In practice, two main
types of reverse engineering emerge. In the first case, source code is
already available for the software, but higher-level aspects of the
program, which are perhaps poorly documented or documented but no longer
valid, are discovered. In the second case, there is no source code
available for the software, and any efforts towards discovering one
possible source code for the software are regarded as reverse
engineering. The second usage of the term is more familiar to most
people. Reverse engineering of software can make use of the clean room design technique to avoid copyright infringement.
On a related note, black box testing in software engineering has a lot in common with reverse engineering. The tester usually has the API but aims to find bugs and undocumented features by bashing the product from outside.
Other purposes of reverse engineering include security auditing, removal of copy protection ("cracking"), circumvention of access restrictions often present in consumer electronics, customization of embedded systems
(such as engine management systems), in-house repairs or retrofits,
enabling of additional features on low-cost "crippled" hardware (such as
some graphics card chip-sets), or even mere satisfaction of curiosity.
Binary software
Binary reverse engineering is performed if source code for the software is unavailable. This process is sometimes termed reverse code engineering, or RCE. For example, decompilation of binaries for the Java platform can be accomplished by using Jad. One famous case of reverse engineering was the first non-IBM implementation of the PC BIOS, which launched the historic IBM PC compatible industry that has been the overwhelmingly dominant computer hardware platform for many years. Reverse engineering of software is protected in the US by the fair use exception in copyright law. The Samba software, which allows systems that do not run Microsoft Windows to share files with systems that do, is a classic example of software reverse engineering
since the Samba project had to reverse-engineer unpublished information
about how Windows file sharing worked so that non-Windows computers
could emulate it. The Wine project does the same thing for the Windows API, and OpenOffice.org is one party doing that for the Microsoft Office file formats. The ReactOS
project is even more ambitious in its goals by striving to provide
binary (ABI and API) compatibility with the current Windows operating
systems of the NT branch, which allows software and drivers written for
Windows to run on a clean-room reverse-engineered free software (GPL) counterpart. WindowsSCOPE
allows for reverse-engineering the full contents of a Windows system's
live memory including a binary-level, graphical reverse engineering of
all running processes.
Another classic, if less well-known, example is that in 1987 Bell Laboratories reverse-engineered Mac OS System 4.1, originally running on the Apple Macintosh SE, so that they could run it on RISC machines of their own.
Binary software techniques
Reverse engineering of software can be accomplished by various methods.
The three main groups of software reverse engineering are
Analysis through observation of information exchange, most prevalent in protocol reverse engineering, which involves using bus analyzers and packet sniffers, such as for accessing a computer bus or computer network
connection and revealing the traffic data thereon. Bus or network
behavior can then be analyzed to produce a standalone implementation
that mimics that behavior. That is especially useful for reverse
engineering device drivers. Sometimes, reverse engineering on embedded systems is greatly assisted by tools deliberately introduced by the manufacturer, such as JTAG ports or other debugging means. In Microsoft Windows, low-level debuggers such as SoftICE are popular.
Disassembly using a disassembler, meaning the raw machine language of the program is read and understood in its own terms, only with the aid of machine-language mnemonics. It works on any computer program but can take quite some time, especially for those who are not used to machine code. The Interactive Disassembler is a particularly popular tool. A minimal example of this approach is sketched after this list.
Decompilation using a decompiler,
a process that tries, with varying results, to recreate the source code
in some high-level language for a program only available in machine
code or bytecode.
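A minimal sketch of the disassembly approach above, limited to a handful of genuine one-byte x86 opcodes (0x90 nop, 0xC3 ret, 0xCC int3, 0x50 through 0x57 push of a 32-bit register); a real disassembler must also decode multi-byte instructions, prefixes and operands:
#include <stdio.h>

static const char *reg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

int main(void)
{
    /* sample bytes: push ebp, nop, push ebx, int3, ret */
    unsigned char code[] = {0x55, 0x90, 0x53, 0xCC, 0xC3};
    for (size_t i = 0; i < sizeof code; i++) {
        unsigned char b = code[i];
        if (b == 0x90)
            printf("%02X  nop\n", b);
        else if (b == 0xC3)
            printf("%02X  ret\n", b);
        else if (b == 0xCC)
            printf("%02X  int3\n", b);
        else if (b >= 0x50 && b <= 0x57)
            printf("%02X  push %s\n", b, reg32[b - 0x50]);
        else
            printf("%02X  db 0x%02X  ; unrecognized byte\n", b, b);
    }
    return 0;
}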
Software classification
Software
classification is the process of identifying similarities between
different software binaries (such as two different versions of the same
binary) in order to detect code relations between software samples. The task was traditionally done manually for several purposes (such as patch analysis for vulnerability detection and copyright infringement), but it can now be done somewhat automatically for large numbers of samples.
This method is being used mostly for long and thorough reverse
engineering tasks (complete analysis of a complex algorithm or big piece
of software). In general, statistical classification is considered to be a hard problem, which is also true for software classification, and so there are few solutions and tools that handle this task well.
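A minimal sketch of one way such similarity could be estimated (an illustrative approach, not how any particular tool works), assuming each binary has already been reduced to a set of numeric function fingerprints; the Jaccard index of the two sets then serves as a crude similarity score:
#include <stddef.h>
#include <stdio.h>

static int contains(const unsigned long *set, size_t n, unsigned long v)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == v)
            return 1;
    return 0;
}

/* Jaccard index: |A intersect B| / |A union B|, assuming no duplicates. */
static double similarity(const unsigned long *a, size_t na,
                         const unsigned long *b, size_t nb)
{
    size_t common = 0;
    for (size_t i = 0; i < na; i++)
        if (contains(b, nb, a[i]))
            common++;
    return (double)common / (double)(na + nb - common);
}

int main(void)
{
    /* hypothetical fingerprints of functions in two versions of a binary */
    unsigned long v1[] = {0x1111, 0x2222, 0x3333, 0x4444};
    unsigned long v2[] = {0x1111, 0x2222, 0x3333, 0x9999};
    printf("similarity = %.2f\n", similarity(v1, 4, v2, 4));
    return 0;
}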
Source code
A number of UML tools refer to the process of importing and analysing source code to generate UML diagrams as "reverse engineering". See List of UML tools.
Although UML is one approach to providing "reverse engineering", more recent advances in international standards activities have resulted
in the development of the Knowledge Discovery Metamodel
(KDM). The standard delivers an ontology for the intermediate (or
abstracted) representation of programming language constructs and their
interrelationships. An Object Management Group standard (on its way to becoming an ISO standard as well),
KDM has started to take hold in industry with the development of tools
and analysis environments that can deliver the extraction and analysis
of source, binary, and byte code. For source code analysis, KDM's
granular standards' architecture enables the extraction of software
system flows (data, control, and call maps), architectures, and business
layer knowledge (rules, terms, and process). The standard enables the
use of a common data format (XMI) enabling the correlation of the
various layers of system knowledge for either detailed analysis (such as
root cause, impact) or derived analysis (such as business process
extraction). Although efforts to represent language constructs can be
never-ending because of the number of languages, the continuous
evolution of software languages, and the development of new languages,
the standard does allow for the use of extensions to support the broad
language set as well as evolution. KDM is compatible with UML, BPMN, RDF, and other standards, enabling migration into other environments and thus leveraging system knowledge for efforts such as software system transformation and enterprise business layer analysis.
Protocols
Protocols are sets of rules that describe message formats and how messages are exchanged: the protocol state machine.
Accordingly, the problem of protocol reverse-engineering can be
partitioned into two subproblems: message format and state-machine
reverse-engineering.
The message formats have traditionally been reverse-engineered by
a tedious manual process, which involved analysis of how protocol
implementations process messages, but recent research proposed a number
of automatic solutions. Typically, the automatic approaches group observed messages into clusters by using various clustering analyses, or they emulate the protocol implementation tracing the message processing.
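A minimal, hypothetical sketch of one simple message-format inference step: several captured messages of equal length are compared byte by byte, and each offset is marked as constant (a likely header or magic field) or variable (likely payload or counter data). Real approaches use clustering and sequence alignment rather than this naive comparison.
#include <stdio.h>

#define NUM_MSGS 3
#define MSG_LEN  8

int main(void)
{
    /* hypothetical captured messages of one protocol */
    unsigned char msgs[NUM_MSGS][MSG_LEN] = {
        {0xAA, 0x01, 0x00, 0x05, 0x10, 0x20, 0x30, 0x55},
        {0xAA, 0x01, 0x00, 0x07, 0x11, 0x22, 0x33, 0x55},
        {0xAA, 0x01, 0x00, 0x02, 0x99, 0x88, 0x77, 0x55},
    };

    for (int off = 0; off < MSG_LEN; off++) {
        int constant = 1;
        for (int m = 1; m < NUM_MSGS; m++)
            if (msgs[m][off] != msgs[0][off])
                constant = 0;
        printf("offset %d: %s\n", off,
               constant ? "constant (header/magic?)" : "variable (data?)");
    }
    return 0;
}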
There has been less work on reverse-engineering of state-machines
of protocols. In general, the protocol state-machines can be learned
either through a process of offline learning,
which passively observes communication and attempts to build the most
general state-machine accepting all observed sequences of messages, or online learning,
which allows interactive generation of probing sequences of messages
and listening to responses to those probing sequences. In general,
offline learning of small state-machines is known to be NP-complete, but online learning can be done in polynomial time. An automatic offline approach has been demonstrated by Comparetti et al. and an online approach by Cho et al.
Other components of typical protocols, like encryption and hash
functions, can be reverse-engineered automatically as well. Typically,
the automatic approaches trace the execution of protocol implementations
and try to detect buffers in memory holding unencrypted packets.
Integrated circuits/smart cards
Reverse engineering is an invasive and destructive form of analyzing a smart card. The attacker uses chemicals to etch away layer after layer of the smart card and takes pictures with a scanning electron microscope
(SEM). That technique can reveal the complete hardware and software
part of the smart card. The major problem for the attacker is to bring
everything into the right order to find out how everything works. The
makers of the card try to hide keys and operations by mixing up memory
positions, such as by bus scrambling.
In some cases, it is even possible to attach a probe to measure
voltages while the smart card is still operational. The makers of the
card employ sensors to detect and prevent that attack.
That attack is not very common because it requires both a large
investment in effort and special equipment that is generally available
only to large chip manufacturers. Furthermore, the payoff from this
attack is low since other security techniques are often used such as
shadow accounts. It is still uncertain whether attacks against
chip-and-PIN cards to replicate encryption data and then to crack PINs
would provide a cost-effective attack on multifactor authentication.
Full reverse engineering proceeds in several major steps.
The first step after images have been taken with a SEM is
stitching the images together, which is necessary because each layer
cannot be captured by a single shot. A SEM needs to sweep across the
area of the circuit and take several hundred images to cover the entire
layer. Image stitching takes as input several hundred pictures and
outputs a single properly-overlapped picture of the complete layer.
Next, the stitched layers need to be aligned because the sample,
after etching, cannot be put into the exact same position relative to
the SEM each time. Therefore, the stitched versions will not overlap in
the correct fashion, as on the real circuit. Usually, three corresponding points are selected, and a transformation is applied on the basis of them.
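A minimal sketch of that alignment step (the point coordinates below are made up): from three corresponding points in two layer images, the six coefficients of a 2D affine transformation can be solved for directly, here using Cramer's rule.
#include <stdio.h>

/* 3x3 determinant */
static double det3(double a, double b, double c,
                   double d, double e, double f,
                   double g, double h, double i)
{
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
}

/* Solve the affine map  u = A*x + B*y + C,  v = D*x + E*y + F
   from three point correspondences (x[i], y[i]) -> (u[i], v[i]). */
static int affine_from_3_points(const double x[3], const double y[3],
                                const double u[3], const double v[3],
                                double c[6])
{
    double d = det3(x[0], y[0], 1, x[1], y[1], 1, x[2], y[2], 1);
    if (d == 0.0)
        return -1; /* the three points are collinear */
    c[0] = det3(u[0], y[0], 1, u[1], y[1], 1, u[2], y[2], 1) / d;          /* A */
    c[1] = det3(x[0], u[0], 1, x[1], u[1], 1, x[2], u[2], 1) / d;          /* B */
    c[2] = det3(x[0], y[0], u[0], x[1], y[1], u[1], x[2], y[2], u[2]) / d; /* C */
    c[3] = det3(v[0], y[0], 1, v[1], y[1], 1, v[2], y[2], 1) / d;          /* D */
    c[4] = det3(x[0], v[0], 1, x[1], v[1], 1, x[2], v[2], 1) / d;          /* E */
    c[5] = det3(x[0], y[0], v[0], x[1], y[1], v[1], x[2], y[2], v[2]) / d; /* F */
    return 0;
}

int main(void)
{
    /* hypothetical corresponding points picked in two stitched layer images */
    double x[3] = {10, 200, 40}, y[3] = {20, 30, 180};
    double u[3] = {12, 203, 41}, v[3] = {25, 36, 186};
    double c[6];
    if (affine_from_3_points(x, y, u, v, c) == 0)
        printf("u = %.3f*x + %.3f*y + %.3f\nv = %.3f*x + %.3f*y + %.3f\n",
               c[0], c[1], c[2], c[3], c[4], c[5]);
    return 0;
}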
To extract the circuit structure, the aligned, stitched images
need to be segmented, which highlights the important circuitry and
separates it from the uninteresting background and insulating materials.
Finally, the wires can be traced from one layer to the next, and
the netlist of the circuit, which contains all of the circuit's
information, can be reconstructed.
Military applications
Reverse engineering is often used by people to copy other nations'
technologies, devices, or information that have been obtained by regular troops in the field or by intelligence operations. It was often used during the Second World War and the Cold War. Here are well-known examples from the Second World War and later:
Jerry can: British and American forces in WW2
noticed that the Germans had gasoline cans with an excellent design.
They reverse-engineered copies of those cans, which became popularly known as "Jerry cans".
Panzerschreck: The Germans captured an American bazooka during the Second World War and reverse engineered it to create the larger Panzerschreck.
Tupolev Tu-4: In 1944, three American B-29 bombers on missions over Japan were forced to land in the Soviet Union.
The Soviets, who did not have a similar strategic bomber, decided to
copy the B-29. Within three years, they had developed the Tu-4, a
nearly-perfect copy.
SCR-584 radar: copied by the Soviet Union after the Second World War, it is known in a few modified versions: СЦР-584 and Бинокль-Д.
V-2
rocket: Technical documents for the V-2 and related technologies were
captured by the Western Allies at the end of the war. The Americans
focused their reverse engineering efforts via Operation Paperclip, which led to the development of the PGM-11 Redstone rocket.
The Soviets used captured German engineers to reproduce technical
documents and plans and worked from captured hardware to make their
clone of the rocket, the R-1. Thus began the postwar Soviet rocket program, which led to the R-7 and the beginning of the space race.
K-13/R-3S missile (NATO reporting name AA-2 Atoll), a Soviet reverse-engineered copy of the AIM-9 Sidewinder, was made possible after a Taiwanese (ROCAF) AIM-9B hit a Chinese PLA MiG-17 without exploding in September 1958.
The missile became lodged within the airframe, and the pilot returned
to base with what Soviet scientists would describe as a university
course in missile development.
Toophan missile: In May 1975, negotiations between Iran and Hughes Missile Systems on co-production of the BGM-71 TOW and Maverick missiles stalled over disagreements in the pricing structure, the subsequent 1979 revolution
ending all plans for such co-production. Iran was later successful in
reverse-engineering the missile and now produces its own copy, the
Toophan.
China has reverse-engineered many examples of Western and Russian hardware, from fighter aircraft to missiles and HMMWV cars, such as the MiG-15, 17, 19 and 21 (which became the J-2, 5, 6 and 7) and the Su-33 (which became the J-15).
During the Second World War, Polish and British cryptographers studied captured German "Enigma" message encryption machines for weaknesses. Their operation was then simulated on electromechanical devices, "bombes", which tried all the possible scrambler settings of the "Enigma" machines and thereby helped in breaking the coded messages that had been sent by the Germans.
Also during the Second World War, British scientists analyzed and defeated a series of increasingly-sophisticated radio navigation systems used by the Luftwaffe
to perform guided bombing missions at night. The British
countermeasures to the system were so effective that in some cases,
German aircraft were led by signals to land at RAF bases since they believed that they had returned to German territory.
Gene networks
Reverse engineering concepts have been applied to biology as well, specifically to the task of understanding the structure and function of gene regulatory networks.
They regulate almost every aspect of biological behavior and allow
cells to carry out physiological processes and responses to
perturbations. Understanding the structure and the dynamic behavior of
gene networks is therefore one of the paramount challenges of systems
biology, with immediate practical repercussions in several applications
that are beyond basic research.
There are several methods for reverse engineering gene regulatory
networks by using molecular biology and data science methods. They have
been generally divided into six classes:
Coexpression methods are based on the notion that if two genes exhibit a similar expression profile, they may be related, although no causation can simply be inferred from coexpression (a minimal correlation sketch follows this list).
Sequence motif methods analyze gene promoters to find specific transcription factor binding domains. If a transcription factor is predicted to bind a promoter of a specific gene, a regulatory connection can be hypothesized.
Orthology methods transfer gene network knowledge from one species to another.
Literature methods implement text mining and manual research to identify putative or experimentally-proven gene network connections.
Transcriptional complexes methods leverage information on
protein-protein interactions between transcription factors, thus
extending the concept of gene networks to include transcriptional
regulatory complexes.
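As a minimal sketch of the coexpression idea referenced in the list above (the gene profiles are hypothetical), the Pearson correlation between two genes' expression measurements across the same samples can be computed; a high absolute correlation suggests, but does not prove, a regulatory relationship.
#include <math.h>
#include <stdio.h>

/* Pearson correlation coefficient of two expression profiles of length n. */
static double pearson(const double *x, const double *y, int n)
{
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    double cov = sxy - sx * sy / n;
    double vx  = sxx - sx * sx / n;
    double vy  = syy - sy * sy / n;
    return cov / sqrt(vx * vy);
}

int main(void)
{
    /* hypothetical expression profiles of two genes across 5 samples */
    double gene_a[] = {1.0, 2.1, 3.0, 4.2, 5.1};
    double gene_b[] = {0.9, 2.0, 3.2, 4.0, 5.3};
    printf("correlation = %.3f\n", pearson(gene_a, gene_b, 5));
    return 0;
}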
Often, gene network reliability is tested by genetic perturbation
experiments followed by dynamic modelling, based on the principle that
removing one network node has predictable effects on the functioning of
the remaining nodes of the network.
Applications of the reverse engineering of gene networks range from understanding mechanisms of plant physiology to the highlighting of new targets for anticancer therapy.
Overlap with patent law
Reverse
engineering applies primarily to gaining understanding of a process or
artifact in which the manner of its construction, use, or internal
processes has not been made clear by its creator.
Patented
items do not of themselves have to be reverse-engineered to be studied,
for the essence of a patent is that inventors provide a detailed public
disclosure themselves, and in return receive legal protection of the invention
that is involved. However, an item produced under one or more patents
could also include other technology that is not patented and not
disclosed. Indeed, one common motivation of reverse engineering is to
determine whether a competitor's product contains patent infringement or copyright infringement.
Legality
United States
In the United States, even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful if it has been legitimately obtained.
Reverse engineering of computer software often falls under contract law as a breach of contract, as well as under any other relevant laws. That is because most end-user license agreements specifically prohibit it, and US courts have ruled that if such terms are present, they override the copyright law that expressly permits it (see Bowers v. Baystate Technologies). According to Section 103(f) of the Digital Millennium Copyright Act (17 U.S.C. § 1201(f)),
a person in legal possession of a program may reverse-engineer and
circumvent its protection if that is necessary to achieve
"interoperability", a term that broadly covers other devices and
programs that can interact with it, make use of it, and to use and
transfer data to and from it in useful ways. A limited exemption exists
that allows the knowledge thus gained to be shared and used for
interoperability purposes.
European Union
EU Directive 2009/24 on the legal protection of computer programs, which superseded an earlier (1991) directive, governs reverse engineering in the European Union.