
Wednesday, May 22, 2024

Decompiler

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Decompiler

A decompiler is a computer program that translates an executable file into high-level source code. It therefore performs the opposite task of a typical compiler, which translates a high-level language into a low-level language. While disassemblers translate an executable into assembly language, decompilers go a step further and translate the code into a higher-level language such as C or Java, which requires more sophisticated techniques. Decompilers are usually unable to perfectly reconstruct the original source code, and the code they produce is frequently difficult to read. Nonetheless, they remain an important tool in the reverse engineering of computer software.

Introduction

The term decompiler is most commonly applied to a program which translates executable programs (the output from a compiler) into source code in a (relatively) high level language which, when compiled, will produce an executable whose behavior is the same as the original executable program. By comparison, a disassembler translates an executable program into assembly language (and an assembler could be used for assembling it back into an executable program).

Decompilation is the act of using a decompiler, although the term can also refer to the output of a decompiler. It can be used for the recovery of lost source code, and is also useful in some cases for computer security, interoperability and error correction. The success of decompilation depends on the amount of information present in the code being decompiled and the sophistication of the analysis performed on it. The bytecode formats used by many virtual machines (such as the Java Virtual Machine or the .NET Framework Common Language Runtime) often include extensive metadata and high-level features that make decompilation quite feasible. Debug data (debug symbols), if present, may make it possible to reproduce the original names of variables and structures and even the line numbers. Machine language without such metadata or debug data is much harder to decompile.

Some compilers and post-compilation tools produce obfuscated code (that is, they attempt to produce output that is very difficult to decompile, or that decompiles to confusing output). This is done to make it more difficult to reverse engineer the executable.

While decompilers are normally used to (re-)create source code from binary executables, there are also decompilers to turn specific binary data files into human-readable and editable sources.

The success level achieved by decompilers can be impacted by various factors. These include the abstraction level of the source language: if the object code contains explicit class structure information, this aids the decompilation process. Descriptive information, especially naming details, also accelerates the decompiler's work. Moreover, less optimized code is quicker to decompile, since optimization can cause greater deviation from the original code.

Design

Decompilers can be thought of as composed of a series of phases, each of which contributes specific aspects of the overall decompilation process.

Loader

The first decompilation phase loads and parses the input machine code or intermediate language program's binary file format. It should be able to discover basic facts about the input program, such as the architecture (Pentium, PowerPC, etc.) and the entry point. In many cases, it should be able to find the equivalent of the main function of a C program, which is the start of the user-written code. This excludes the runtime initialization code, which should not be decompiled if possible. If available, the symbol tables and debug data are also loaded. The front end may be able to identify the libraries used even if they are linked with the code; this will provide library interfaces. If it can determine the compiler or compilers used, it may provide useful information for identifying code idioms.

Disassembly

The next logical phase is the disassembly of machine code instructions into a machine independent intermediate representation (IR). For example, the Pentium machine instruction

mov    eax, [ebx+0x04]

might be translated to the IR

eax  := m[ebx+4];
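
Conceptually, this phase is a table- or pattern-driven translation from instruction encodings to IR statements. The following minimal Python sketch lifts the instruction above into IR; the IRAssign class and the single addressing mode handled are illustrative assumptions, not any particular decompiler's design.

import re

# Hypothetical IR node for "dst := src;" (illustrative only).
class IRAssign:
    def __init__(self, dst, src):
        self.dst, self.src = dst, src
    def __repr__(self):
        return f"{self.dst} := {self.src};"

def lift(instruction):
    """Translate one 'mov reg, [reg+imm]' instruction into IR (sketch only)."""
    m = re.match(r"mov\s+(\w+),\s*\[(\w+)\+0x([0-9a-fA-F]+)\]", instruction)
    if not m:
        raise ValueError(f"unsupported instruction: {instruction}")
    dst, base, offset = m.group(1), m.group(2), int(m.group(3), 16)
    return IRAssign(dst, f"m[{base}+{offset}]")

print(lift("mov    eax, [ebx+0x04]"))   # eax := m[ebx+4];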

Idioms

Idiomatic machine code sequences are sequences of code whose combined semantics are not immediately apparent from the instructions' individual semantics. Either as part of the disassembly phase, or as part of later analyses, these idiomatic sequences need to be translated into known equivalent IR. For example, the x86 assembly code:

    cdq    eax             ; edx is set to the sign-extension of eax
    xor    eax, edx
    sub    eax, edx

could be translated to

eax  := abs(eax);

Some idiomatic sequences are machine independent; some involve only one instruction. For example, xor eax, eax clears the eax register (sets it to zero). This can be implemented with a machine independent simplification rule, such as a xor a = 0.

In general, it is best to delay detection of idiomatic sequences if possible, to later stages that are less affected by instruction ordering. For example, the instruction scheduling phase of a compiler may insert other instructions into an idiomatic sequence, or change the ordering of instructions in the sequence. A pattern matching process in the disassembly phase would probably not recognize the altered pattern. Later phases group instruction expressions into more complex expressions, and modify them into a canonical (standardized) form, making it more likely that even the altered idiom will match a higher level pattern later in the decompilation.

It is particularly important to recognize the compiler idioms for subroutine calls, exception handling, and switch statements. Some languages also have extensive support for strings or long integers.
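
To make the idea concrete, the following Python sketch rewrites the cdq/xor/sub sequence shown above into a single abs() expression by simple peephole matching over an IR listing; the IR strings and the rule format are illustrative assumptions rather than the representation of any real decompiler.

# Sketch: peephole idiom recognition over a list of IR statements.

IDIOM_RULES = [
    # (pattern of consecutive statements, replacement)
    (["edx := sign_extend(eax)",          # cdq
      "eax := eax ^ edx",                 # xor eax, edx
      "eax := eax - edx"],                # sub eax, edx
     ["eax := abs(eax)"]),
    (["eax := eax ^ eax"],                # xor eax, eax
     ["eax := 0"]),
]

def rewrite_idioms(ir):
    out, i = [], 0
    while i < len(ir):
        for pattern, replacement in IDIOM_RULES:
            if ir[i:i + len(pattern)] == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            out.append(ir[i])
            i += 1
    return out

print(rewrite_idioms(["edx := sign_extend(eax)",
                      "eax := eax ^ edx",
                      "eax := eax - edx"]))   # ['eax := abs(eax)']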

Program analysis

Various program analyses can be applied to the IR. In particular, expression propagation combines the semantics of several instructions into more complex expressions. For example,

    mov   eax,[ebx+0x04]
    add   eax,[ebx+0x08]
    sub   [ebx+0x0C],eax

could result in the following IR after expression propagation:

m[ebx+12]  := m[ebx+12] - (m[ebx+4] + m[ebx+8]);

The resulting expression is more like high level language, and has also eliminated the use of the machine register eax. Later analyses may eliminate the ebx register.
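
A toy version of expression propagation can be written as repeated forward substitution of a register's defining expression into its single later use, followed by removal of the now-dead definition. The statement representation below is a simplifying assumption that ignores aliasing and control flow.

# Sketch: forward-substitute single-use register definitions.
# Statements are (destination, expression-string) pairs.

REGISTERS = {"eax", "ebx", "ecx", "edx"}

def propagate(stmts):
    """Repeatedly substitute a register definition into its single later use."""
    stmts = list(stmts)
    changed = True
    while changed:
        changed = False
        for i, (dst, expr) in enumerate(stmts):
            uses = [j for j in range(i + 1, len(stmts)) if dst in stmts[j][1]]
            if dst in REGISTERS and len(uses) == 1:
                j = uses[0]
                stmts[j] = (stmts[j][0], stmts[j][1].replace(dst, f"({expr})"))
                del stmts[i]
                changed = True
                break
    return stmts

ir = [("eax", "m[ebx+4]"),                 # mov  eax,[ebx+0x04]
      ("eax", "eax + m[ebx+8]"),           # add  eax,[ebx+0x08]
      ("m[ebx+12]", "m[ebx+12] - eax")]    # sub  [ebx+0x0C],eax
print(propagate(ir))   # reproduces the propagated expression above, up to extra parentheses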

Data flow analysis

The places where register contents are defined and used must be traced using data flow analysis. The same analysis can be applied to locations that are used for temporaries and local data. A different name can then be formed for each such connected set of value definitions and uses. It is possible that the same local variable location was used for more than one variable in different parts of the original program. Even worse it is possible for the data flow analysis to identify a path whereby a value may flow between two such uses even though it would never actually happen or matter in reality. This may in bad cases lead to needing to define a location as a union of types. The decompiler may allow the user to explicitly break such unnatural dependencies which will lead to clearer code. This of course means a variable is potentially used without being initialized and so indicates a problem in the original program.
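
As a minimal illustration of the renaming step, the sketch below splits one stack slot into separate variables, one per def-use chain, for straight-line code; a real decompiler performs this over the control-flow graph using reaching-definitions analysis, and the event encoding here is an invented simplification.

# Sketch: split one storage location into separate variables, one per
# def-use "web", on straight-line code.

def split_webs(stmts, loc):
    """stmts: list of ('def', name) or ('use', name) events in program order."""
    renamed, version = [], 0
    for kind, name in stmts:
        if name == loc and kind == "def":
            version += 1                  # a new definition starts a new web
        renamed.append((kind, f"{name}_{version}" if name == loc else name))
    return renamed

events = [("def", "local_8"), ("use", "local_8"),   # first variable
          ("def", "local_8"), ("use", "local_8")]   # same slot reused: second variable
print(split_webs(events, "local_8"))
# [('def', 'local_8_1'), ('use', 'local_8_1'), ('def', 'local_8_2'), ('use', 'local_8_2')]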

Type analysis

A good machine code decompiler will perform type analysis. Here, the way registers or memory locations are used results in constraints on the possible type of the location. For example, an and instruction implies that the operand is an integer; programs do not use such an operation on floating point values (except in special library code) or on pointers. An add instruction results in three constraints, since the operands may be both integer, or one integer and one pointer (with integer and pointer results respectively; the third constraint comes from the ordering of the two operands when the types are different).
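
A simple way to picture such constraint collection is as the intersection of candidate type sets, as in the sketch below; the location names, the three-element candidate type set, and the constraints attached to each instruction are illustrative assumptions.

# Sketch: collect type constraints from instruction operands and intersect
# the candidate type sets.

ALL_TYPES = {"int", "pointer", "float"}

def constrain(types, loc, allowed):
    """Restrict the candidate types of a location and report contradictions."""
    types.setdefault(loc, set(ALL_TYPES))
    types[loc] &= set(allowed)
    if not types[loc]:
        raise ValueError(f"conflicting type constraints for {loc}")

types = {}
constrain(types, "eax", {"int"})             # 'and eax, 0xff'  -> integer operand
constrain(types, "ebx", {"int", "pointer"})  # 'add eax, ebx'   -> int or pointer
constrain(types, "ebx", {"pointer"})         # 'mov ecx, [ebx]' -> used as an address
print(types)    # {'eax': {'int'}, 'ebx': {'pointer'}}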

Various high level expressions can be recognized which trigger recognition of structures or arrays. However, it is difficult to distinguish many of the possibilities, because of the freedom that machine code or even some high level languages such as C allow with casts and pointer arithmetic.

The example from the previous section could result in the following high level code:

struct T1 *ebx;
struct T1 {
    int v0004;
    int v0008;
    int v000C;
};
ebx->v000C -= ebx->v0004 + ebx->v0008;

Structuring

The penultimate decompilation phase involves structuring of the IR into higher level constructs such as while loops and if/then/else conditional statements. For example, the machine code

    xor eax, eax
l0002:
    or  ebx, ebx
    jge l0003
    add eax,[ebx]
    mov ebx,[ebx+0x4]
    jmp l0002
l0003:
    mov [0x10040000],eax

could be translated into:

eax = 0;
while (ebx < 0) {
    eax += ebx->v0000;
    ebx = ebx->v0004;
}
v10040000 = eax;

Unstructured code is more difficult to translate into structured code than already structured code. Solutions include replicating some code, or adding boolean variables.
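
As a sketch of how the loop above might be recognized, the following Python code finds the back edge in a hand-encoded control-flow graph of the example; the block names, the statement strings, and the single-loop assumption are illustrative, whereas production structuring algorithms rely on dominator trees and interval or pattern-based analyses.

# Sketch: locate the loop in the control-flow graph of the example above.

cfg = {
    "entry": {"code": ["eax = 0"],                  "succ": ["l0002"]},
    "l0002": {"code": ["if (ebx >= 0) goto l0003"], "succ": ["body", "l0003"]},
    "body":  {"code": ["eax += ebx->v0000",
                       "ebx = ebx->v0004"],         "succ": ["l0002"]},
    "l0003": {"code": ["v10040000 = eax"],          "succ": []},
}

def find_back_edge(cfg, entry="entry"):
    """Return (tail, header) of the first back edge found by depth-first search."""
    seen, path = set(), []

    def dfs(node):
        seen.add(node)
        path.append(node)
        for succ in cfg[node]["succ"]:
            if succ in path:                 # edge to an ancestor: a loop header
                return (node, succ)
            if succ not in seen:
                found = dfs(succ)
                if found:
                    return found
        path.pop()
        return None

    return dfs(entry)

tail, header = find_back_edge(cfg)
print(tail, header)   # body l0002
# The header's exit test 'ebx >= 0', negated, becomes the condition of the
# recovered loop: while (ebx < 0) { ... }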

Code generation

The final phase is the generation of the high level code in the back end of the decompiler. Just as a compiler may have several back ends for generating machine code for different architectures, a decompiler may have several back ends for generating high level code in different high level languages.

Just before code generation, it may be desirable to allow an interactive editing of the IR, perhaps using some form of graphical user interface. This would allow the user to enter comments, and non-generic variable and function names. However, these are almost as easily entered in a post decompilation edit. The user may want to change structural aspects, such as converting a while loop to a for loop. These are less readily modified with a simple text editor, although source code refactoring tools may assist with this process. The user may need to enter information that failed to be identified during the type analysis phase, e.g. modifying a memory expression to an array or structure expression. Finally, incorrect IR may need to be corrected, or changes made to cause the output code to be more readable.

Other techniques

Decompilers using neural networks have been developed. Such a decompiler may be trained by machine learning to improve its accuracy over time.

Legality

The majority of computer programs are covered by copyright laws. Although the precise scope of what is covered by copyright differs from region to region, copyright law generally provides the author (the programmer(s) or employer) with a collection of exclusive rights to the program. These rights include the right to make copies, including copies made into the computer’s RAM (unless creating such a copy is essential for using the program). Since the decompilation process involves making multiple such copies, it is generally prohibited without the authorization of the copyright holder. However, because decompilation is often a necessary step in achieving software interoperability, copyright laws in both the United States and Europe permit decompilation to a limited extent.

In the United States, the copyright fair use defence has been successfully invoked in decompilation cases. For example, in Sega v. Accolade, the court held that Accolade could lawfully engage in decompilation in order to circumvent the software locking mechanism used by Sega's game consoles. Additionally, the Digital Millennium Copyright Act (Public Law 105–304) contains exemptions for security testing and evaluation in §1201(j) and for reverse engineering in §1201(f).

In Europe, the 1991 Software Directive explicitly provides for a right to decompile in order to achieve interoperability. The result of a heated debate between, on the one side, software protectionists, and, on the other, academics as well as independent software developers, Article 6 permits decompilation only if a number of conditions are met:

  • First, a person or entity must have a licence to use the program to be decompiled.
  • Second, decompilation must be necessary to achieve interoperability with the target program or other programs. Interoperability information should therefore not be readily available, such as through manuals or API documentation. This is an important limitation. The necessity must be proven by the decompiler. The purpose of this important limitation is primarily to provide an incentive for developers to document and disclose their products' interoperability information.
  • Third, the decompilation process must, if possible, be confined to the parts of the target program relevant to interoperability. Since one of the purposes of decompilation is to gain an understanding of the program structure, this third limitation may be difficult to meet. Again, the burden of proof is on the decompiler.

In addition, Article 6 prescribes that the information obtained through decompilation may not be used for other purposes and that it may not be given to others.

Overall, the decompilation right provided by Article 6 codifies what is claimed to be common practice in the software industry. Few European lawsuits are known to have emerged from the decompilation right. This could be interpreted as meaning one of three things:

  1. the decompilation right is not used frequently and the decompilation right may therefore have been unnecessary,
  2. the decompilation right functions well and provides sufficient legal certainty not to give rise to legal disputes, or
  3. illegal decompilation goes largely undetected.

In a 2000 report on the implementation of the Software Directive by the European member states, the European Commission seemed to support the second interpretation.

Reverse engineering

From Wikipedia, the free encyclopedia

Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accomplishes a task with very little (if any) insight into exactly how it does so. Depending on the system under consideration and the technologies employed, the knowledge gained during reverse engineering can help with repurposing obsolete objects, doing security analysis, or learning how something works.

Although the process is specific to the object on which it is being performed, all reverse engineering processes consist of three basic steps: information extraction, modeling, and review. Information extraction is the practice of gathering all relevant information for performing the operation. Modeling is the practice of combining the gathered information into an abstract model, which can be used as a guide for designing the new object or system. Review is the testing of the model to ensure the validity of the chosen abstract. Reverse engineering is applicable in the fields of computer engineering, mechanical engineering, design, electronic engineering, software engineering, chemical engineering, and systems biology.

Overview

There are many reasons for performing reverse engineering in various fields. Reverse engineering has its origins in the analysis of hardware for commercial or military advantage. However, the reverse engineering process may not always be concerned with creating a copy or changing the artifact in some way. It may be used as part of an analysis to deduce design features from products with little or no additional knowledge about the procedures involved in their original production.

In some cases, the goal of the reverse engineering process can simply be a redocumentation of legacy systems. Even when the reverse-engineered product is that of a competitor, the goal may not be to copy it but to perform competitor analysis. Reverse engineering may also be used to create interoperable products and despite some narrowly-tailored United States and European Union legislation, the legality of using specific reverse engineering techniques for that purpose has been hotly contested in courts worldwide for more than two decades.

Software reverse engineering can help to improve the understanding of the underlying source code for the maintenance and improvement of the software; relevant information can be extracted to make decisions for software development; and graphical representations of the code can provide alternate views of the source code, which can help to detect and fix a software bug or vulnerability. As software develops, its design information and improvements are often lost over time, but that lost information can usually be recovered with reverse engineering. The process can also help to cut down the time required to understand the source code, thus reducing the overall cost of the software development. Reverse engineering can also help to detect and eliminate malicious code written into the software with better code detectors. Reverse-engineering source code can be used to find alternate uses of the code, such as detecting its unauthorized replication where it was not intended to be used, or revealing how a competitor's product was built. That process is commonly used for "cracking" software and media to remove their copy protection, or to create a possibly improved copy or even a knockoff, which is usually the goal of a competitor or a hacker.

Malware developers often use reverse engineering techniques to find vulnerabilities in an operating system in order to build a computer virus that can exploit them. Reverse engineering is also used in cryptanalysis to find vulnerabilities in substitution ciphers, symmetric-key algorithms, or public-key cryptography.

There are other uses to reverse engineering:

  • Interfacing. Reverse engineering can be used when a system is required to interface with another system and it must be established how the two systems will negotiate. Such requirements typically exist for interoperability.
  • Military or commercial espionage. Learning about an enemy's or competitor's latest research by stealing or capturing a prototype and dismantling it may result in the development of a similar product or a better countermeasure against it.
  • Obsolescence. Integrated circuits are often designed on proprietary systems and built on production lines, which become obsolete in only a few years. When systems using those parts can no longer be maintained because the parts are no longer made, the only way to incorporate the functionality into new technology is to reverse-engineer the existing chip and then to redesign it using newer tools, using the understanding gained as a guide. Another obsolescence-related problem that can be solved by reverse engineering is the need to support (maintain and supply for continuous operation) existing legacy devices that are no longer supported by their original equipment manufacturer. The problem is particularly critical in military operations.
  • Product security analysis. This examines how a product works by determining the specifications of its components and estimating costs, and identifies potential patent infringement. Also part of product security analysis is acquiring sensitive data by disassembling and analyzing the design of a system component. Another intent may be to remove copy protection or to circumvent access restrictions.
  • Competitive technical intelligence. That is to understand what one's competitor is actually doing, rather than what it says that it is doing.
  • Saving money. Finding out what a piece of electronics can do may spare a user from purchasing a separate product.
  • Repurposing. Obsolete objects are then reused in a different-but-useful manner.
  • Design. Production and design companies have applied reverse engineering to practical, craft-based manufacturing processes. The companies can work on "historical" manufacturing collections through 3D scanning, 3D re-modeling and re-design. In 2013 the Italian manufacturers Baldi and Savio Firmino, together with the University of Florence, optimized their innovation, design, and production processes.

Common uses

Machines

As computer-aided design (CAD) has become more popular, reverse engineering has become a viable method to create a 3D virtual model of an existing physical part for use in 3D CAD, CAM, CAE, or other software. The reverse-engineering process involves measuring an object and then reconstructing it as a 3D model. The physical object can be measured using 3D scanning technologies like CMMs, laser scanners, structured light digitizers, or industrial CT scanning (computed tomography). The measured data alone, usually represented as a point cloud, lacks topological information and design intent. The former may be recovered by converting the point cloud to a triangular-faced mesh. Reverse engineering aims to go beyond producing such a mesh and to recover the design intent in terms of simple analytical surfaces where appropriate (planes, cylinders, etc.) as well as possibly NURBS surfaces to produce a boundary-representation CAD model. Recovery of such a model allows a design to be modified to meet new requirements, a manufacturing plan to be generated, etc.
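
As a small illustration of recovering an analytical surface from measured data, the sketch below fits a plane z = ax + by + c to a synthetic point-cloud patch by least squares; real pipelines first segment the cloud and fit many surface types (planes, cylinders, NURBS), so this is only a minimal example.

import numpy as np

# Sketch: recover a plane z = a*x + b*y + c from a noisy point-cloud patch
# by least squares.  The points here are synthetic.

rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 10.0, size=(500, 2))
z = 0.5 * xy[:, 0] - 2.0 * xy[:, 1] + 3.0 + rng.normal(0.0, 0.01, 500)

A = np.column_stack([xy, np.ones(len(xy))])     # [x, y, 1] per point
(a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
print(round(a, 3), round(b, 3), round(c, 3))    # ~0.5, -2.0, 3.0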

Hybrid modeling is a commonly used term when NURBS and parametric modeling are implemented together. Using a combination of geometric and freeform surfaces can provide a powerful method of 3D modeling. Areas of freeform data can be combined with exact geometric surfaces to create a hybrid model. A typical example of this would be the reverse engineering of a cylinder head, which includes freeform cast features, such as water jackets and high-tolerance machined areas.

Reverse engineering is also used by businesses to bring existing physical geometry into digital product development environments, to make a digital 3D record of their own products, or to assess competitors' products. It is used to analyze how a product works, what it does, what components it has; estimate costs; identify potential patent infringement; etc.

Value engineering, a related activity that is also used by businesses, involves deconstructing and analyzing products. However, the objective is to find opportunities for cost-cutting.

Printed circuit boards

Reverse engineering of printed circuit boards involves recreating fabrication data for a particular circuit board. This is done primarily to identify a design and to learn its functional and structural characteristics. It also allows for the discovery of the design principles behind a product, especially if this design information is not easily available.

Outdated PCBs are often subject to reverse engineering, especially when they perform highly critical functions such as powering machinery or other electronic components. Reverse engineering these old parts can allow the reconstruction of the PCB if it performs some crucial task, as well as the discovery of alternatives that provide the same function, or the upgrading of the old PCB.

Reverse engineering of PCBs largely follows the same series of steps. First, images are created by drawing, scanning, or photographing the PCB. Then, these images are imported into suitable reverse engineering software in order to create a rudimentary design for the new PCB. The quality of the images needed for suitable reverse engineering is proportional to the complexity of the PCB itself. More complicated PCBs require well-lit photos on dark backgrounds, while fairly simple PCBs can be recreated with just basic dimensioning. Each layer of the PCB is carefully recreated in the software with the intent of producing a final design as close as possible to the original. Then, the schematics for the circuit are finally generated using an appropriate tool.
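
For the image-preparation step, one possible (and purely illustrative) approach is to stitch overlapping photographs of a board layer with OpenCV's stitching module before importing the result into a PCB reverse-engineering tool; the file names below are placeholders.

import cv2

# Sketch: combine overlapping photographs of one PCB layer into a single
# image using OpenCV's stitcher.  File names are placeholders; a real
# workflow would follow this with trace/pad extraction in an EDA or
# PCB reverse-engineering tool.

images = [cv2.imread(name) for name in ("pcb_top_1.jpg",
                                        "pcb_top_2.jpg",
                                        "pcb_top_3.jpg")]
if any(img is None for img in images):
    raise FileNotFoundError("one or more input photographs could not be read")

stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)   # planar subject, no panorama warp
status, layer = stitcher.stitch(images)
if status != cv2.Stitcher_OK:
    raise RuntimeError(f"stitching failed with status {status}")
cv2.imwrite("pcb_top_stitched.png", layer)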

Software

In 1990, the Institute of Electrical and Electronics Engineers (IEEE) defined (software) reverse engineering (SRE) as "the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction" in which the "subject system" is the end product of software development. Reverse engineering is a process of examination only, and the software system under consideration is not modified, which would otherwise be re-engineering or restructuring. Reverse engineering can be performed from any stage of the product cycle, not necessarily from the functional end product.

There are two components in reverse engineering: redocumentation and design recovery. Redocumentation is the creation of new representation of the computer code so that it is easier to understand. Meanwhile, design recovery is the use of deduction or reasoning from general knowledge or personal experience of the product to understand the product's functionality fully. It can also be seen as "going backwards through the development cycle". In this model, the output of the implementation phase (in source code form) is reverse-engineered back to the analysis phase, in an inversion of the traditional waterfall model. Another term for this technique is program comprehension. The Working Conference on Reverse Engineering (WCRE) has been held yearly to explore and expand the techniques of reverse engineering. Computer-aided software engineering (CASE) and automated code generation have contributed greatly in the field of reverse engineering.

Software anti-tamper technology like obfuscation is used to deter both reverse engineering and re-engineering of proprietary software and software-powered systems. In practice, two main types of reverse engineering emerge. In the first case, source code is already available for the software, but higher-level aspects of the program, which are perhaps poorly documented or documented but no longer valid, are discovered. In the second case, there is no source code available for the software, and any efforts towards discovering one possible source code for the software are regarded as reverse engineering. The second usage of the term is more familiar to most people. Reverse engineering of software can make use of the clean room design technique to avoid copyright infringement.

On a related note, black box testing in software engineering has a lot in common with reverse engineering. The tester usually has the API but has the goal of finding bugs and undocumented features by bashing the product from the outside.

Other purposes of reverse engineering include security auditing, removal of copy protection ("cracking"), circumvention of access restrictions often present in consumer electronics, customization of embedded systems (such as engine management systems), in-house repairs or retrofits, enabling of additional features on low-cost "crippled" hardware (such as some graphics card chip-sets), or even mere satisfaction of curiosity.

Binary software

Binary reverse engineering is performed if source code for a software is unavailable. This process is sometimes termed reverse code engineering, or RCE. For example, decompilation of binaries for the Java platform can be accomplished by using Jad. One famous case of reverse engineering was the first non-IBM implementation of the PC BIOS, which launched the historic IBM PC compatible industry that has been the overwhelmingly-dominant computer hardware platform for many years. Reverse engineering of software is protected in the US by the fair use exception in copyright law. The Samba software, which allows systems that do not run Microsoft Windows systems to share files with systems that run it, is a classic example of software reverse engineering since the Samba project had to reverse-engineer unpublished information about how Windows file sharing worked so that non-Windows computers could emulate it. The Wine project does the same thing for the Windows API, and OpenOffice.org is one party doing that for the Microsoft Office file formats. The ReactOS project is even more ambitious in its goals by striving to provide binary (ABI and API) compatibility with the current Windows operating systems of the NT branch, which allows software and drivers written for Windows to run on a clean-room reverse-engineered free software (GPL) counterpart. WindowsSCOPE allows for reverse-engineering the full contents of a Windows system's live memory including a binary-level, graphical reverse engineering of all running processes.

Another classic, if not well-known, example is that in 1987 Bell Laboratories reverse-engineered the Mac OS System 4.1, originally running on the Apple Macintosh SE, so that it could run it on RISC machines of their own.

Binary software techniques

Reverse engineering of software can be accomplished by various methods. The three main groups of software reverse engineering are

  1. Analysis through observation of information exchange, most prevalent in protocol reverse engineering, which involves using bus analyzers and packet sniffers, such as for accessing a computer bus or computer network connection and revealing the traffic data thereon. Bus or network behavior can then be analyzed to produce a standalone implementation that mimics that behavior. That is especially useful for reverse engineering device drivers. Sometimes, reverse engineering on embedded systems is greatly assisted by tools deliberately introduced by the manufacturer, such as JTAG ports or other debugging means. In Microsoft Windows, low-level debuggers such as SoftICE are popular.
  2. Disassembly using a disassembler, meaning the raw machine language of the program is read and understood in its own terms, only with the aid of machine-language mnemonics. It works on any computer program but can take quite some time, especially for those who are not used to machine code. The Interactive Disassembler is a particularly popular tool.
  3. Decompilation using a decompiler, a process that tries, with varying results, to recreate the source code in some high-level language for a program only available in machine code or bytecode.

Software classification

Software classification is the process of identifying similarities between different software binaries (such as two different versions of the same binary) used to detect code relations between software samples. The task was traditionally done manually for several reasons (such as patch analysis for vulnerability detection and copyright infringement), but it can now be done somewhat automatically for large numbers of samples.

This method is being used mostly for long and thorough reverse engineering tasks (complete analysis of a complex algorithm or big piece of software). In general, statistical classification is considered to be a hard problem, which is also true for software classification, and so there are few solutions/tools that handle this task well.

Source code

A number of UML tools refer to the process of importing and analysing source code to generate UML diagrams as "reverse engineering". See List of UML tools.

Although UML is one approach to providing "reverse engineering", more recent advances in international standards activities have resulted in the development of the Knowledge Discovery Metamodel (KDM). The standard delivers an ontology for the intermediate (or abstracted) representation of programming language constructs and their interrelationships. An Object Management Group standard (on its way to becoming an ISO standard as well), KDM has started to take hold in industry with the development of tools and analysis environments that can deliver the extraction and analysis of source, binary, and byte code. For source code analysis, KDM's granular standards' architecture enables the extraction of software system flows (data, control, and call maps), architectures, and business layer knowledge (rules, terms, and process). The standard enables the use of a common data format (XMI), enabling the correlation of the various layers of system knowledge for either detailed analysis (such as root cause, impact) or derived analysis (such as business process extraction). Although efforts to represent language constructs can be never-ending because of the number of languages, the continuous evolution of software languages, and the development of new languages, the standard does allow for the use of extensions to support the broad language set as well as evolution. KDM is compatible with UML, BPMN, RDF, and other standards, enabling migration into other environments and thus leveraging system knowledge for efforts such as software system transformation and enterprise business layer analysis.

Protocols

Protocols are sets of rules that describe message formats and how messages are exchanged: the protocol state machine. Accordingly, the problem of protocol reverse-engineering can be partitioned into two subproblems: message format and state-machine reverse-engineering.

The message formats have traditionally been reverse-engineered by a tedious manual process, which involved analysis of how protocol implementations process messages, but recent research has proposed a number of automatic solutions. Typically, the automatic approaches either group observed messages into clusters by using various clustering analyses, or they emulate the protocol implementation while tracing the message processing.
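
A toy stand-in for such clustering is to group captured messages greedily by byte-level similarity, as in the sketch below; the captured messages, the similarity measure, and the threshold are all invented for illustration, and real message-format inference is considerably more sophisticated.

from difflib import SequenceMatcher

# Sketch: greedily cluster captured messages by byte-level similarity.

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def cluster(messages, threshold=0.6):
    clusters = []
    for msg in messages:
        for group in clusters:
            if similarity(msg, group[0]) >= threshold:
                group.append(msg)
                break
        else:
            clusters.append([msg])
    return clusters

captured = [b"LOGIN alice 1234", b"LOGIN bob 9999",
            b"GET /status", b"GET /config", b"LOGOUT alice"]
for group in cluster(captured):
    print(group)          # messages with similar byte layouts end up together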

There has been less work on reverse-engineering of state-machines of protocols. In general, the protocol state-machines can be learned either through a process of offline learning, which passively observes communication and attempts to build the most general state-machine accepting all observed sequences of messages, or through online learning, which allows interactive generation of probing sequences of messages and listening to responses to those probing sequences. In general, offline learning of small state-machines is known to be NP-complete, while online learning can be done in polynomial time. An automatic offline approach has been demonstrated by Comparetti et al. and an online approach by Cho et al.

Other components of typical protocols, like encryption and hash functions, can be reverse-engineered automatically as well. Typically, the automatic approaches trace the execution of protocol implementations and try to detect buffers in memory holding unencrypted packets.

Integrated circuits/smart cards

Reverse engineering is an invasive and destructive form of analyzing a smart card. The attacker uses chemicals to etch away layer after layer of the smart card and takes pictures with a scanning electron microscope (SEM). That technique can reveal the complete hardware and software part of the smart card. The major problem for the attacker is to bring everything into the right order to find out how everything works. The makers of the card try to hide keys and operations by mixing up memory positions, such as by bus scrambling.

In some cases, it is even possible to attach a probe to measure voltages while the smart card is still operational. The makers of the card employ sensors to detect and prevent that attack. That attack is not very common because it requires both a large investment in effort and special equipment that is generally available only to large chip manufacturers. Furthermore, the payoff from this attack is low since other security techniques are often used such as shadow accounts. It is still uncertain whether attacks against chip-and-PIN cards to replicate encryption data and then to crack PINs would provide a cost-effective attack on multifactor authentication.

Full reverse engineering proceeds in several major steps.

The first step after images have been taken with a SEM is stitching the images together, which is necessary because each layer cannot be captured by a single shot. A SEM needs to sweep across the area of the circuit and take several hundred images to cover the entire layer. Image stitching takes as input several hundred pictures and outputs a single properly-overlapped picture of the complete layer.

Next, the stitched layers need to be aligned because the sample, after etching, cannot be put into the exact same position relative to the SEM each time. Therefore, the stitched versions will not overlap in the correct fashion, as on the real circuit. Usually, three corresponding points are selected, and a transformation applied on the basis of that.
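
The three-point alignment step can be expressed as solving for an affine transform that maps the chosen points of one stitched layer onto the corresponding points of the next; the coordinates in the sketch below are invented examples.

import numpy as np

# Sketch: derive the affine transform that aligns one stitched layer with the
# next from three corresponding points, then apply it to other coordinates.

src = np.array([[120.0,  80.0], [940.0, 110.0], [500.0, 770.0]])   # layer N
dst = np.array([[131.5,  69.0], [951.0, 102.5], [513.0, 760.0]])   # layer N+1

# Solve [x y 1] @ M = [x' y'] for the 3x2 matrix M (exact for 3 point pairs).
A = np.column_stack([src, np.ones(3)])
M = np.linalg.solve(A, dst)

def warp(points):
    return np.column_stack([points, np.ones(len(points))]) @ M

print(warp(np.array([[300.0, 400.0]])))   # that point's position on layer N+1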

To extract the circuit structure, the aligned, stitched images need to be segmented, which highlights the important circuitry and separates it from the uninteresting background and insulating materials.

Finally, the wires can be traced from one layer to the next, and the netlist of the circuit, which contains all of the circuit's information, can be reconstructed.

Military applications

Reverse engineering is often used to copy other nations' technologies, devices, or information that have been obtained by regular troops in the field or by intelligence operations. It was often used during the Second World War and the Cold War. Here are well-known examples from the Second World War and later:

  • Jerry can: British and American forces in WW2 noticed that the Germans had gasoline cans with an excellent design. They reverse-engineered copies of those cans, which became popularly known as "Jerry cans".
  • Panzerschreck: The Germans captured an American bazooka during the Second World War and reverse engineered it to create the larger Panzerschreck.
  • Tupolev Tu-4: In 1944, three American B-29 bombers on missions over Japan were forced to land in the Soviet Union. The Soviets, who did not have a similar strategic bomber, decided to copy the B-29. Within three years, they had developed the Tu-4, a nearly-perfect copy.
  • SCR-584 radar: copied by the Soviet Union after the Second World War, it is known for a few modifications - СЦР-584, Бинокль-Д.
  • V-2 rocket: Technical documents for the V-2 and related technologies were captured by the Western Allies at the end of the war. The Americans focused their reverse engineering efforts via Operation Paperclip, which led to the development of the PGM-11 Redstone rocket. The Soviets used captured German engineers to reproduce technical documents and plans and worked from captured hardware to make their clone of the rocket, the R-1. Thus began the postwar Soviet rocket program, which led to the R-7 and the beginning of the space race.
  • K-13/R-3S missile (NATO reporting name AA-2 Atoll), a Soviet reverse-engineered copy of the AIM-9 Sidewinder, was made possible after a Taiwanese (ROCAF) AIM-9B hit a Chinese PLA MiG-17 without exploding in September 1958. The missile became lodged within the airframe, and the pilot returned to base with what Soviet scientists would describe as a university course in missile development.
  • Toophan missile: In May 1975, negotiations between Iran and Hughes Missile Systems on co-production of the BGM-71 TOW and Maverick missiles stalled over disagreements in the pricing structure, the subsequent 1979 revolution ending all plans for such co-production. Iran was later successful in reverse-engineering the missile and now produces its own copy, the Toophan.
  • China has reverse-engineered many examples of Western and Russian hardware, from fighter aircraft to missiles and HMMWV cars, such as the MiG-15, 17, 19, and 21 (which became the J-2, 5, 6, and 7) and the Su-33 (which became the J-15).
  • During the Second World War, Polish and British cryptographers studied captured German "Enigma" message encryption machines for weaknesses. Their operation was then simulated on electromechanical devices, "bombes", which tried all the possible scrambler settings of the "Enigma" machines that helped the breaking of coded messages that had been sent by the Germans.
  • Also during the Second World War, British scientists analyzed and defeated a series of increasingly-sophisticated radio navigation systems used by the Luftwaffe to perform guided bombing missions at night. The British countermeasures to the system were so effective that in some cases, German aircraft were led by signals to land at RAF bases since they believed that they had returned to German territory.

Gene networks

Reverse engineering concepts have been applied to biology as well, specifically to the task of understanding the structure and function of gene regulatory networks. They regulate almost every aspect of biological behavior and allow cells to carry out physiological processes and responses to perturbations. Understanding the structure and the dynamic behavior of gene networks is therefore one of the paramount challenges of systems biology, with immediate practical repercussions in several applications that are beyond basic research. There are several methods for reverse engineering gene regulatory networks by using molecular biology and data science methods. They have been generally divided into six classes:

The six classes of gene network inference methods are as follows:
  • Coexpression methods are based on the notion that if two genes exhibit a similar expression profile, they may be related, although no causation can be simply inferred from coexpression (a minimal sketch of this idea follows the list).
  • Sequence motif methods analyze gene promoters to find specific transcription factor binding domains. If a transcription factor is predicted to bind a promoter of a specific gene, a regulatory connection can be hypothesized.
  • Chromatin ImmunoPrecipitation (ChIP) methods investigate the genome-wide profile of DNA binding of chosen transcription factors to infer their downstream gene networks.
  • Orthology methods transfer gene network knowledge from one species to another.
  • Literature methods implement text mining and manual research to identify putative or experimentally-proven gene network connections.
  • Transcriptional complexes methods leverage information on protein-protein interactions between transcription factors, thus extending the concept of gene networks to include transcriptional regulatory complexes.
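
As a minimal sketch of the coexpression idea referenced in the first bullet above, the following code builds a toy coexpression network by thresholding pairwise correlations of expression profiles; the expression matrix and threshold are invented for illustration, and real methods use many samples and careful statistics.

import numpy as np

# Sketch: infer a toy coexpression network by thresholding the pairwise
# Pearson correlation of gene expression profiles.

genes = ["geneA", "geneB", "geneC", "geneD"]
expression = np.array([          # rows: genes, columns: samples/conditions
    [1.0, 2.1, 3.0, 4.2, 5.1],
    [0.9, 2.0, 3.1, 4.0, 5.0],   # tracks geneA closely
    [5.0, 4.1, 3.2, 2.0, 1.1],   # anti-correlated with geneA
    [2.0, 2.0, 2.1, 1.9, 2.0],   # roughly flat
])

corr = np.corrcoef(expression)
threshold = 0.9
edges = [(genes[i], genes[j], round(corr[i, j], 2))
         for i in range(len(genes)) for j in range(i + 1, len(genes))
         if abs(corr[i, j]) >= threshold]
print(edges)   # gene pairs whose profiles are strongly (anti-)correlated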

Often, gene network reliability is tested by genetic perturbation experiments followed by dynamic modelling, based on the principle that removing one network node has predictable effects on the functioning of the remaining nodes of the network. Applications of the reverse engineering of gene networks range from understanding mechanisms of plant physiology to the highlighting of new targets for anticancer therapy.

Overlap with patent law

Reverse engineering applies primarily to gaining understanding of a process or artifact in which the manner of its construction, use, or internal processes has not been made clear by its creator.

Patented items do not of themselves have to be reverse-engineered to be studied, for the essence of a patent is that inventors provide a detailed public disclosure themselves, and in return receive legal protection of the invention that is involved. However, an item produced under one or more patents could also include other technology that is not patented and not disclosed. Indeed, one common motivation of reverse engineering is to determine whether a competitor's product contains patent infringement or copyright infringement.

Legality

United States

In the United States, even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful if it has been legitimately obtained.

Reverse engineering of computer software often falls under both contract law as a breach of contract as well as any other relevant laws. That is because most end-user license agreements specifically prohibit it, and US courts have ruled that if such terms are present, they override the copyright law that expressly permits it (see Bowers v. Baystate Technologies). According to Section 103(f) of the Digital Millennium Copyright Act (17 U.S.C. § 1201 (f)), a person in legal possession of a program may reverse-engineer and circumvent its protection if that is necessary to achieve "interoperability", a term that broadly covers other devices and programs that can interact with it, make use of it, and use and transfer data to and from it in useful ways. A limited exemption exists that allows the knowledge thus gained to be shared and used for interoperability purposes.

European Union

EU Directive 2009/24 on the legal protection of computer programs, which superseded an earlier (1991) directive, governs reverse engineering in the European Union.

Software development

From Wikipedia, the free encyclopedia

Software development is the process used to create software. Programming and maintaining the source code is the central step of this process, but it also includes conceiving the project, evaluating its feasibility, analyzing business requirements, designing the software, testing it, and releasing it. Software engineering, in addition to development, also includes project management, employee management, and other overhead functions. Software development may be sequential, in which each step is complete before the next begins, but iterative development methods where multiple steps can be executed at once and earlier steps can be revisited have also been devised to improve flexibility, efficiency, and scheduling.

Software development involves professionals from various fields, not just software programmers but also individuals specialized in testing, documentation writing, graphic design, user support, marketing, and fundraising. A number of tools and models are commonly used in software development, such as integrated development environment (IDE), version control, computer-aided software engineering, and software documentation.

Methodologies

Flowchart of the evolutionary prototyping model, an iterative development model

Each of the available methodologies is best suited to specific kinds of projects, based on various technical, organizational, project, and team considerations.

  • The simplest methodology is the "code and fix", typically used by a single programmer working on a small project. After briefly considering the purpose of the program, the programmer codes it and runs it to see if it works. When they are done, the product is released. This methodology is useful for prototypes but cannot be used for more elaborate programs.
  • In the top-down waterfall model, feasibility, analysis, design, development, quality assurance, and implementation occur sequentially in that order. This model requires one step to be complete before the next begins, which causes delays and makes it impossible to revise previous steps if necessary.
  • With iterative processes these steps are interleaved with each other for improved flexibility, efficiency, and more realistic scheduling. Instead of completing the project all at once, one might go through most of the steps with one component at a time. Iterative development also lets developers prioritize the most important features, enabling lower priority ones to be dropped later on if necessary. Agile is one popular method, originally intended for small or medium sized projects, that focuses on giving developers more control over the features that they work on to reduce the risk of time or cost overruns. Derivatives of agile include extreme programming and Scrum. Open-source software development typically uses agile methodology with concurrent design, coding, and testing, due to reliance on a distributed network of volunteer contributors.
  • Beyond agile, some companies integrate information technology (IT) operations with software development, which is called DevOps (or DevSecOps when computer security is included). DevOps includes continuous development, testing, integration of new code in the version control system, deployment of the new code, and sometimes delivery of the code to clients. The purpose of this integration is to deliver IT services more quickly and efficiently.

Another focus in many programming methodologies is the idea of trying to catch issues such as security vulnerabilities and bugs as early as possible (shift-left testing) to reduce the cost of tracking and fixing them.

In 2009, it was estimated that 32 percent of software projects were delivered on time and budget, and with the full functionality. An additional 44 percent were delivered, but missing at least one of these features. The remaining 24 percent were cancelled prior to release.

Steps

Software development life cycle refers to the systematic process of developing applications.

Feasibility

The sources of ideas for software products are plentiful. These ideas can come from market research, including the demographics of potential new customers, existing customers, sales prospects who rejected the product, other internal software development staff, or a creative third party. Ideas for software products are usually first evaluated by marketing personnel for economic feasibility, fit with existing channels of distribution, possible effects on existing product lines, required features, and fit with the company's marketing objectives. In the marketing evaluation phase, the cost and time assumptions are evaluated. The feasibility analysis estimates the project's return on investment, its development cost and timeframe. Based on this analysis, the company can make a business decision to invest in further development. After deciding to develop the software, the company is focused on delivering the product at or below the estimated cost and time, and with a high standard of quality (i.e., lack of bugs) and the desired functionality. Nevertheless, most software projects run late, and sometimes compromises are made in features or quality to meet a deadline.

Analysis

Software analysis begins with a requirements analysis to capture the business needs of the software. Challenges for the identification of needs are that current or potential users may have different and incompatible needs, may not understand their own needs, and change their needs during the process of software development. Ultimately, the result of analysis is a detailed specification for the product that developers can work from. Software analysts often decompose the project into smaller objects, components that can be reused for increased cost-effectiveness, efficiency, and reliability. Decomposing the project may enable a multi-threaded implementation that runs significantly faster on multiprocessor computers.

During the analysis and design phases of software development, structured analysis is often used to break down the customer's requirements into pieces that can be implemented by software programmers. The underlying logic of the program may be represented in data-flow diagrams, data dictionaries, pseudocode, state transition diagrams, and/or entity relationship diagrams. If the project incorporates a piece of legacy software that has not been modeled, this software may be modeled to help ensure it is correctly incorporated with the newer software.

Design

Design involves choices about the implementation of the software, such as which programming languages and database software to use, or how the hardware and network communications will be organized. Design may be iterative with users consulted about their needs in a process of trial and error. Design often involves people expert in aspects such as database design, screen architecture, and the performance of servers and other hardware. Designers often attempt to find patterns in the software's functionality to spin off distinct modules that can be reused with object-oriented programming. An example of this is the model–view–controller, an interface between a graphical user interface and the backend.
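
As a minimal illustration of the model–view–controller separation mentioned above, the sketch below splits a trivial task list into the three roles; the class and method names are illustrative and not tied to any particular framework.

# Sketch: a minimal model-view-controller split.

class TaskModel:                      # holds application data, no UI knowledge
    def __init__(self):
        self.tasks = []
    def add(self, title):
        self.tasks.append(title)

class TaskView:                       # renders data, no business logic
    def render(self, tasks):
        for i, title in enumerate(tasks, 1):
            print(f"{i}. {title}")

class TaskController:                 # mediates between user input and the model
    def __init__(self, model, view):
        self.model, self.view = model, view
    def add_task(self, title):
        self.model.add(title)
        self.view.render(self.model.tasks)

controller = TaskController(TaskModel(), TaskView())
controller.add_task("write requirements")
controller.add_task("design database schema")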

Programming

The central feature of software development is creating and understanding the software that implements the desired functionality. There are various strategies for writing the code. Cohesive software has various components that are independent from each other. Coupling is the interrelation of different software components, which is viewed as undesirable because it increases the difficulty of maintenance. Often, software programmers do not follow industry best practices, resulting in code that is inefficient, difficult to understand, or lacking documentation on its functionality. These standards are especially likely to break down in the presence of deadlines. As a result, testing, debugging, and revising the code become much more difficult. Code refactoring, for example adding more comments to the code, is a solution to improve the understandability of code.

Testing

Testing is the process of ensuring that the code executes correctly and without errors. Debugging is performed by each software developer on their own code to confirm that the code does what it is intended to. In particular, it is crucial that the software executes on all inputs, even if the result is incorrect. Code reviews by other developers are often used to scrutinize new code added to the project, and according to some estimates they dramatically reduce the number of bugs persisting after testing is complete. Once the code has been submitted, quality assurance—a separate department of non-programmers for most large companies—tests the accuracy of the entire software product. Acceptance tests derived from the original software requirements are a popular tool for this. Quality testing also often includes stress and load checking (whether the software is robust to heavy levels of input or usage), integration testing (to ensure that the software is adequately integrated with other software), and compatibility testing (measuring the software's performance across different operating systems or browsers). When tests are written before the code, this is called test-driven development.
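
As a small example of test-driven development, the sketch below shows a unit test written with Python's standard unittest module alongside the hypothetical function it drives; in practice the test is written first and fails until the function is implemented to satisfy it.

import unittest

# Sketch of test-driven development.  apply_discount() and its business rule
# are hypothetical examples.

def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_invalid_percentage_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

if __name__ == "__main__":
    unittest.main()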

Production

Production is the phase in which software is deployed to the end user. During production, the developer may create technical support resources for users or a process for fixing bugs and errors that were not caught earlier. There might also be a return to earlier development phases if user needs changed or were misunderstood.

Workers

Software development is performed by software developers, usually working on a team. Efficient communications between team members is essential to success. This is more easily achieved if the team is small, used to working together, and located near each other. Communications also help identify problems at an earlier state of development and avoid duplicated effort. Many development projects avoid the risk of losing essential knowledge held by only one employee by ensuring that multiple workers are familiar with each component. Although workers for proprietary software are paid, most contributors to open-source software are volunteers. Alternately, they may be paid by companies whose business model does not involve selling the software, but something else—such as services and modifications to open source software.

Models and tools

Computer-aided software engineering

Computer-aided software engineering (CASE) is the use of tools for the partial automation of software development. CASE enables designers to sketch out the logic of a program, whether one to be written or an already existing one, to help integrate it with new code or to reverse engineer it (for example, to change the programming language).

Documentation

Documentation comes in two forms that are usually kept separate—that intended for software developers, and that made available to the end user to help them use the software. Most developer documentation is in the form of code comments for each file, class, and method that cover the application programming interface (API)—how the piece of software can be accessed by another—and often implementation details. This documentation is helpful for new developers to understand the project when they begin working on it. In agile development, the documentation is often written at the same time as the code. User documentation is more frequently written by technical writers.

Effort estimation

Accurate estimation is crucial at the feasibility stage and in delivering the product on time and within budget. The process of generating estimations is often delegated by the project manager. Because the effort estimation is directly related to the size of the complete application, it is strongly influenced by addition of features in the requirements—the more requirements, the higher the development cost. Aspects not related to functionality, such as the experience of the software developers and code reusability, are also essential to consider in estimation. As of 2019, most of the tools for estimating the amount of time and resources for software development were designed for conventional applications and are not applicable to web applications or mobile applications.

Integrated development environment

Anjuta, a C and C++ IDE for the GNOME environment

An integrated development environment (IDE) supports software development with enhanced features compared to a simple text editor. IDEs often include automated compiling, syntax highlighting of errors, debugging assistance, integration with version control, and semi-automation of tests.

Version control

Version control is a popular way of managing changes made to the software. Whenever a new version is checked in, the software saves a backup of all modified files. If multiple programmers are working on the software simultaneously, it manages the merging of their code changes. The software highlights cases where there is a conflict between two sets of changes and allows programmers to fix the conflict.

View model

The TEAF Matrix of Views and Perspectives

A view model is a framework that provides the viewpoints on the system and its environment, to be used in the software development process. It is a graphical representation of the underlying semantics of a view.

The purpose of viewpoints and views is to enable human engineers to comprehend very complex systems and to organize the elements of the problem around domains of expertise. In the engineering of physically intensive systems, viewpoints often correspond to capabilities and responsibilities within the engineering organization.

Intellectual property

Intellectual property can be an issue when developers integrate open-source code or libraries into a proprietary product, because most open-source licenses used for software require that modifications be released under the same license. As an alternative, developers may choose a proprietary alternative or write their own software module.

Absolute monarchy

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Absolute_monarchy     ...