Search This Blog

Saturday, September 21, 2024

Reverse engineering

From Wikipedia, the free encyclopedia

Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accomplishes a task with very little (if any) insight into exactly how it does so. Depending on the system under consideration and the technologies employed, the knowledge gained during reverse engineering can help with repurposing obsolete objects, doing security analysis, or learning how something works.

Although the process is specific to the object on which it is being performed, all reverse engineering processes consist of three basic steps: information extraction, modeling, and review. Information extraction is the practice of gathering all relevant information for performing the operation. Modeling is the practice of combining the gathered information into an abstract model, which can be used as a guide for designing the new object or system. Review is the testing of the model to ensure the validity of the chosen abstract. Reverse engineering is applicable in the fields of computer engineering, mechanical engineering, design, electronic engineering, software engineering, chemical engineering, and systems biology.

Overview

There are many reasons for performing reverse engineering in various fields. Reverse engineering has its origins in the analysis of hardware for commercial or military advantage. However, the reverse engineering process may not always be concerned with creating a copy or changing the artifact in some way. It may be used as part of an analysis to deduce design features from products with little or no additional knowledge about the procedures involved in their original production.

In some cases, the goal of the reverse engineering process can simply be a redocumentation of legacy systems. Even when the reverse-engineered product is that of a competitor, the goal may not be to copy it but to perform competitor analysis. Reverse engineering may also be used to create interoperable products and despite some narrowly-tailored United States and European Union legislation, the legality of using specific reverse engineering techniques for that purpose has been hotly contested in courts worldwide for more than two decades.

Software reverse engineering can help to improve the understanding of the underlying source code for the maintenance and improvement of the software, relevant information can be extracted to make a decision for software development and graphical representations of the code can provide alternate views regarding the source code, which can help to detect and fix a software bug or vulnerability. Frequently, as some software develops, its design information and improvements are often lost over time, but that lost information can usually be recovered with reverse engineering. The process can also help to cut down the time required to understand the source code, thus reducing the overall cost of the software development. Reverse engineering can also help to detect and to eliminate a malicious code written to the software with better code detectors. Reversing a source code can be used to find alternate uses of the source code, such as detecting the unauthorized replication of the source code where it was not intended to be used, or revealing how a competitor's product was built. That process is commonly used for "cracking" software and media to remove their copy protection or to create a possibly-improved copy or even a knockoff, which is usually the goal of a competitor or a hacker.

Malware developers often use reverse engineering techniques to find vulnerabilities in an operating system to build a computer virus that can exploit the system vulnerabilities. Reverse engineering is also being used in cryptanalysis to find vulnerabilities in substitution cipher, symmetric-key algorithm or public-key cryptography.

There are other uses to reverse engineering:

  • Interfacing. Reverse engineering can be used when a system is required to interface to another system and how both systems would negotiate is to be established. Such requirements typically exist for interoperability.
  • Military or commercial espionage. Learning about an enemy's or competitor's latest research by stealing or capturing a prototype and dismantling it may result in the development of a similar product or a better countermeasure against it.
  • Obsolescence. Integrated circuits are often designed on proprietary systems and built on production lines, which become obsolete in only a few years. When systems using those parts can no longer be maintained since the parts are no longer made, the only way to incorporate the functionality into new technology is to reverse-engineer the existing chip and then to redesign it using newer tools by using the understanding gained as a guide. Another obsolescence originated problem that can be solved by reverse engineering is the need to support (maintenance and supply for continuous operation) existing legacy devices that are no longer supported by their original equipment manufacturer. The problem is particularly critical in military operations.
  • Product security analysis. That examines how a product works by determining the specifications of its components and estimate costs and identifies potential patent infringement. Also part of product security analysis is acquiring sensitive data by disassembling and analyzing the design of a system component. Another intent may be to remove copy protection or to circumvent access restrictions.
  • Competitive technical intelligence. That is to understand what one's competitor is actually doing, rather than what it says that it is doing.
  • Saving money. Finding out what a piece of electronics can do may spare a user from purchasing a separate product.
  • Repurposing. Obsolete objects are then reused in a different-but-useful manner.
  • Design. Production and design companies applied Reverse Engineering to practical craft-based manufacturing process. The companies can work on "historical" manufacturing collections through 3D scanning, 3D re-modeling and re-design. In 2013 Italian manufactures Baldi and Savio Firmino together with University of Florence optimized their innovation, design, and production processes.

Common uses

Machines

As computer-aided design (CAD) has become more popular, reverse engineering has become a viable method to create a 3D virtual model of an existing physical part for use in 3D CAD, CAM, CAE, or other software. The reverse-engineering process involves measuring an object and then reconstructing it as a 3D model. The physical object can be measured using 3D scanning technologies like CMMs, laser scanners, structured light digitizers, or industrial CT scanning (computed tomography). The measured data alone, usually represented as a point cloud, lacks topological information and design intent. The former may be recovered by converting the point cloud to a triangular-faced mesh. Reverse engineering aims to go beyond producing such a mesh and to recover the design intent in terms of simple analytical surfaces where appropriate (planes, cylinders, etc.) as well as possibly NURBS surfaces to produce a boundary-representation CAD model. Recovery of such a model allows a design to be modified to meet new requirements, a manufacturing plan to be generated, etc.

Hybrid modeling is a commonly used term when NURBS and parametric modeling are implemented together. Using a combination of geometric and freeform surfaces can provide a powerful method of 3D modeling. Areas of freeform data can be combined with exact geometric surfaces to create a hybrid model. A typical example of this would be the reverse engineering of a cylinder head, which includes freeform cast features, such as water jackets and high-tolerance machined areas.

Reverse engineering is also used by businesses to bring existing physical geometry into digital product development environments, to make a digital 3D record of their own products, or to assess competitors' products. It is used to analyze how a product works, what it does, what components it has; estimate costs; identify potential patent infringement; etc.

Value engineering, a related activity that is also used by businesses, involves deconstructing and analyzing products. However, the objective is to find opportunities for cost-cutting.

Printed circuit boards

Reverse engineering of printed circuit boards involves recreating fabrication data for a particular circuit board. This is done primarily to identify a design, and learn the functional and structural characteristics of a design. It also allows for the discovery of the design principles behind a product, especially if this design information is not easily available.

Outdated PCBs are often subject to reverse engineering, especially when they perform highly critical functions such as powering machinery, or other electronic components. Reverse engineering these old parts can allow the reconstruction of the PCB if it performs some crucial task, as well as finding alternatives which provide the same function, or in upgrading the old PCB. 

Reverse engineering PCBs largely follow the same series of steps. First, images are created by drawing, scanning, or taking photographs of the PCB. Then, these images are ported to suitable reverse engineering software in order to create a rudimentary design for the new PCB. The quality of these images that is necessary for suitable reverse engineering is proportional to the complexity of the PCB itself. More complicated PCBs require well lighted photos on dark backgrounds, while fairly simple PCBs can be recreated simply with just basic dimensioning. Each layer of the PCB is carefully recreated in the software with the intent of producing a final design as close to the initial. Then, the schematics for the circuit are finally generated using an appropriate tool.

Software

In 1990, the Institute of Electrical and Electronics Engineers (IEEE) defined (software) reverse engineering (SRE) as "the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction" in which the "subject system" is the end product of software development. Reverse engineering is a process of examination only, and the software system under consideration is not modified, which would otherwise be re-engineering or restructuring. Reverse engineering can be performed from any stage of the product cycle, not necessarily from the functional end product.

There are two components in reverse engineering: redocumentation and design recovery. Redocumentation is the creation of new representation of the computer code so that it is easier to understand. Meanwhile, design recovery is the use of deduction or reasoning from general knowledge or personal experience of the product to understand the product's functionality fully. It can also be seen as "going backwards through the development cycle". In this model, the output of the implementation phase (in source code form) is reverse-engineered back to the analysis phase, in an inversion of the traditional waterfall model. Another term for this technique is program comprehension. The Working Conference on Reverse Engineering (WCRE) has been held yearly to explore and expand the techniques of reverse engineering. Computer-aided software engineering (CASE) and automated code generation have contributed greatly in the field of reverse engineering.

Software anti-tamper technology like obfuscation is used to deter both reverse engineering and re-engineering of proprietary software and software-powered systems. In practice, two main types of reverse engineering emerge. In the first case, source code is already available for the software, but higher-level aspects of the program, which are perhaps poorly documented or documented but no longer valid, are discovered. In the second case, there is no source code available for the software, and any efforts towards discovering one possible source code for the software are regarded as reverse engineering. The second usage of the term is more familiar to most people. Reverse engineering of software can make use of the clean room design technique to avoid copyright infringement.

On a related note, black box testing in software engineering has a lot in common with reverse engineering. The tester usually has the API but has the goals to find bugs and undocumented features by bashing the product from outside.

Other purposes of reverse engineering include security auditing, removal of copy protection ("cracking"), circumvention of access restrictions often present in consumer electronics, customization of embedded systems (such as engine management systems), in-house repairs or retrofits, enabling of additional features on low-cost "crippled" hardware (such as some graphics card chip-sets), or even mere satisfaction of curiosity.

Binary software

Binary reverse engineering is performed if source code for a software is unavailable. This process is sometimes termed reverse code engineering, or RCE. For example, decompilation of binaries for the Java platform can be accomplished by using Jad. One famous case of reverse engineering was the first non-IBM implementation of the PC BIOS, which launched the historic IBM PC compatible industry that has been the overwhelmingly-dominant computer hardware platform for many years. Reverse engineering of software is protected in the US by the fair use exception in copyright law. The Samba software, which allows systems that do not run Microsoft Windows systems to share files with systems that run it, is a classic example of software reverse engineering since the Samba project had to reverse-engineer unpublished information about how Windows file sharing worked so that non-Windows computers could emulate it. The Wine project does the same thing for the Windows API, and OpenOffice.org is one party doing that for the Microsoft Office file formats. The ReactOS project is even more ambitious in its goals by striving to provide binary (ABI and API) compatibility with the current Windows operating systems of the NT branch, which allows software and drivers written for Windows to run on a clean-room reverse-engineered free software (GPL) counterpart. WindowsSCOPE allows for reverse-engineering the full contents of a Windows system's live memory including a binary-level, graphical reverse engineering of all running processes.

Another classic, if not well-known, example is that in 1987 Bell Laboratories reverse-engineered the Mac OS System 4.1, originally running on the Apple Macintosh SE, so that it could run it on RISC machines of their own.

Binary software techniques

Reverse engineering of software can be accomplished by various methods. The three main groups of software reverse engineering are

  1. Analysis through observation of information exchange, most prevalent in protocol reverse engineering, which involves using bus analyzers and packet sniffers, such as for accessing a computer bus or computer network connection and revealing the traffic data thereon. Bus or network behavior can then be analyzed to produce a standalone implementation that mimics that behavior. That is especially useful for reverse engineering device drivers. Sometimes, reverse engineering on embedded systems is greatly assisted by tools deliberately introduced by the manufacturer, such as JTAG ports or other debugging means. In Microsoft Windows, low-level debuggers such as SoftICE are popular.
  2. Disassembly using a disassembler, meaning the raw machine language of the program is read and understood in its own terms, only with the aid of machine-language mnemonics. It works on any computer program but can take quite some time, especially for those who are not used to machine code. The Interactive Disassembler is a particularly popular tool.
  3. Decompilation using a decompiler, a process that tries, with varying results, to recreate the source code in some high-level language for a program only available in machine code or bytecode.

Software classification

Software classification is the process of identifying similarities between different software binaries (such as two different versions of the same binary) used to detect code relations between software samples. The task was traditionally done manually for several reasons (such as patch analysis for vulnerability detection and copyright infringement), but it can now be done somewhat automatically for large numbers of samples.

This method is being used mostly for long and thorough reverse engineering tasks (complete analysis of a complex algorithm or big piece of software). In general, statistical classification is considered to be a hard problem, which is also true for software classification, and so few solutions/tools that handle this task well.

Source code

A number of UML tools refer to the process of importing and analysing source code to generate UML diagrams as "reverse engineering". See List of UML tools.

Although UML is one approach in providing "reverse engineering" more recent advances in international standards activities have resulted in the development of the Knowledge Discovery Metamodel (KDM). The standard delivers an ontology for the intermediate (or abstracted) representation of programming language constructs and their interrelationships. An Object Management Group standard (on its way to becoming an ISO standard as well), KDM has started to take hold in industry with the development of tools and analysis environments that can deliver the extraction and analysis of source, binary, and byte code. For source code analysis, KDM's granular standards' architecture enables the extraction of software system flows (data, control, and call maps), architectures, and business layer knowledge (rules, terms, and process). The standard enables the use of a common data format (XMI) enabling the correlation of the various layers of system knowledge for either detailed analysis (such as root cause, impact) or derived analysis (such as business process extraction). Although efforts to represent language constructs can be never-ending because of the number of languages, the continuous evolution of software languages, and the development of new languages, the standard does allow for the use of extensions to support the broad language set as well as evolution. KDM is compatible with UML, BPMN, RDF, and other standards enabling migration into other environments and thus leverage system knowledge for efforts such as software system transformation and enterprise business layer analysis.

Protocols

Protocols are sets of rules that describe message formats and how messages are exchanged: the protocol state machine. Accordingly, the problem of protocol reverse-engineering can be partitioned into two subproblems: message format and state-machine reverse-engineering.

The message formats have traditionally been reverse-engineered by a tedious manual process, which involved analysis of how protocol implementations process messages, but recent research proposed a number of automatic solutions. Typically, the automatic approaches group observe messages into clusters by using various clustering analyses, or they emulate the protocol implementation tracing the message processing.

There has been less work on reverse-engineering of state-machines of protocols. In general, the protocol state-machines can be learned either through a process of offline learning, which passively observes communication and attempts to build the most general state-machine accepting all observed sequences of messages, and online learning, which allows interactive generation of probing sequences of messages and listening to responses to those probing sequences. In general, offline learning of small state-machines is known to be NP-complete, but online learning can be done in polynomial time. An automatic offline approach has been demonstrated by Comparetti et al. and an online approach by Cho et al.

Other components of typical protocols, like encryption and hash functions, can be reverse-engineered automatically as well. Typically, the automatic approaches trace the execution of protocol implementations and try to detect buffers in memory holding unencrypted packets.

Integrated circuits/smart cards

Reverse engineering is an invasive and destructive form of analyzing a smart card. The attacker uses chemicals to etch away layer after layer of the smart card and takes pictures with a scanning electron microscope (SEM). That technique can reveal the complete hardware and software part of the smart card. The major problem for the attacker is to bring everything into the right order to find out how everything works. The makers of the card try to hide keys and operations by mixing up memory positions, such as by bus scrambling.

In some cases, it is even possible to attach a probe to measure voltages while the smart card is still operational. The makers of the card employ sensors to detect and prevent that attack. That attack is not very common because it requires both a large investment in effort and special equipment that is generally available only to large chip manufacturers. Furthermore, the payoff from this attack is low since other security techniques are often used such as shadow accounts. It is still uncertain whether attacks against chip-and-PIN cards to replicate encryption data and then to crack PINs would provide a cost-effective attack on multifactor authentication.

Full reverse engineering proceeds in several major steps.

The first step after images have been taken with a SEM is stitching the images together, which is necessary because each layer cannot be captured by a single shot. A SEM needs to sweep across the area of the circuit and take several hundred images to cover the entire layer. Image stitching takes as input several hundred pictures and outputs a single properly-overlapped picture of the complete layer.

Next, the stitched layers need to be aligned because the sample, after etching, cannot be put into the exact same position relative to the SEM each time. Therefore, the stitched versions will not overlap in the correct fashion, as on the real circuit. Usually, three corresponding points are selected, and a transformation applied on the basis of that.

To extract the circuit structure, the aligned, stitched images need to be segmented, which highlights the important circuitry and separates it from the uninteresting background and insulating materials.

Finally, the wires can be traced from one layer to the next, and the netlist of the circuit, which contains all of the circuit's information, can be reconstructed.

Military applications

Reverse engineering is often used by people to copy other nations' technologies, devices, or information that have been obtained by regular troops in the fields or by intelligence operations. It was often used during the Second World War and the Cold War. Here are well-known examples from the Second World War and later:

  • Jerry can: British and American forces in WW2 noticed that the Germans had gasoline cans with an excellent design. They reverse-engineered copies of those cans, which cans were popularly known as "Jerry cans".
  • Nakajima G5N: In 1939, the U.S. Douglas Aircraft Company sold its DC-4E airliner prototype to Imperial Japanese Airways, which was secretly acting as a front for the Imperial Japanese Navy, which wanted a long-range strategic bomber but had been hindered by the Japanese aircraft industry's inexperience with heavy long-range aircraft. The DC-4E was transferred to the Nakajima Aircraft Company and dismantled for study; as a cover story, the Japanese press reported that it had crashed in Tokyo Bay. The wings, engines, and landing gear of the G5N were copied directly from the DC-4E.
  • Panzerschreck: The Germans captured an American bazooka during the Second World War and reverse engineered it to create the larger Panzerschreck.
  • Tupolev Tu-4: In 1944, three American B-29 bombers on missions over Japan were forced to land in the Soviet Union. The Soviets, who did not have a similar strategic bomber, decided to copy the B-29. Within three years, they had developed the Tu-4, a nearly-perfect copy.
  • SCR-584 radar: copied by the Soviet Union after the Second World War, it is known for a few modifications - СЦР-584, Бинокль-Д.
  • V-2 rocket: Technical documents for the V-2 and related technologies were captured by the Western Allies at the end of the war. The Americans focused their reverse engineering efforts via Operation Paperclip, which led to the development of the PGM-11 Redstone rocket. The Soviets used captured German engineers to reproduce technical documents and plans and worked from captured hardware to make their clone of the rocket, the R-1. Thus began the postwar Soviet rocket program, which led to the R-7 and the beginning of the space race.
  • K-13/R-3S missile (NATO reporting name AA-2 Atoll), a Soviet reverse-engineered copy of the AIM-9 Sidewinder, was made possible after a Taiwanese (ROCAF) AIM-9B hit a Chinese PLA MiG-17 without exploding in September 1958. The missile became lodged within the airframe, and the pilot returned to base with what Soviet scientists would describe as a university course in missile development.
  • Toophan missile: In May 1975, negotiations between Iran and Hughes Missile Systems on co-production of the BGM-71 TOW and Maverick missiles stalled over disagreements in the pricing structure, the subsequent 1979 revolution ending all plans for such co-production. Iran was later successful in reverse-engineering the missile and now produces its own copy, the Toophan.
  • China has reversed engineered many examples of Western and Russian hardware, from fighter aircraft to missiles and HMMWV cars, such as the MiG-15,17,19,21 (which became the J-2,5,6,7) and the Su-33 (which became the J-15).
  • During the Second World War, Polish and British cryptographers studied captured German "Enigma" message encryption machines for weaknesses. Their operation was then simulated on electromechanical devices, "bombes", which tried all the possible scrambler settings of the "Enigma" machines that helped the breaking of coded messages that had been sent by the Germans.
  • Also during the Second World War, British scientists analyzed and defeated a series of increasingly-sophisticated radio navigation systems used by the Luftwaffe to perform guided bombing missions at night. The British countermeasures to the system were so effective that in some cases, German aircraft were led by signals to land at RAF bases since they believed that they had returned to German territory.

Gene networks

Reverse engineering concepts have been applied to biology as well, specifically to the task of understanding the structure and function of gene regulatory networks. They regulate almost every aspect of biological behavior and allow cells to carry out physiological processes and responses to perturbations. Understanding the structure and the dynamic behavior of gene networks is therefore one of the paramount challenges of systems biology, with immediate practical repercussions in several applications that are beyond basic research. There are several methods for reverse engineering gene regulatory networks by using molecular biology and data science methods. They have been generally divided into six classes:

The six classes of gene network inference methods, according to
  • Coexpression methods are based on the notion that if two genes exhibit a similar expression profile, they may be related although no causation can be simply inferred from coexpression.
  • Sequence motif methods analyze gene promoters to find specific transcription factor binding domains. If a transcription factor is predicted to bind a promoter of a specific gene, a regulatory connection can be hypothesized.
  • Chromatin ImmunoPrecipitation (ChIP) methods investigate the genome-wide profile of DNA binding of chosen transcription factors to infer their downstream gene networks.
  • Orthology methods transfer gene network knowledge from one species to another.
  • Literature methods implement text mining and manual research to identify putative or experimentally-proven gene network connections.
  • Transcriptional complexes methods leverage information on protein-protein interactions between transcription factors, thus extending the concept of gene networks to include transcriptional regulatory complexes.

Often, gene network reliability is tested by genetic perturbation experiments followed by dynamic modelling, based on the principle that removing one network node has predictable effects on the functioning of the remaining nodes of the network. Applications of the reverse engineering of gene networks range from understanding mechanisms of plant physiology to the highlighting of new targets for anticancer therapy.

Overlap with patent law

Reverse engineering applies primarily to gaining understanding of a process or artifact in which the manner of its construction, use, or internal processes has not been made clear by its creator.

Patented items do not of themselves have to be reverse-engineered to be studied, for the essence of a patent is that inventors provide a detailed public disclosure themselves, and in return receive legal protection of the invention that is involved. However, an item produced under one or more patents could also include other technology that is not patented and not disclosed. Indeed, one common motivation of reverse engineering is to determine whether a competitor's product contains patent infringement or copyright infringement.

Legality

United States

In the United States, even if an artifact or process is protected by trade secrets, reverse-engineering the artifact or process is often lawful if it has been legitimately obtained.

Reverse engineering of computer software often falls under both contract law as a breach of contract as well as any other relevant laws. That is because most end-user license agreements specifically prohibit it, and US courts have ruled that if such terms are present, they override the copyright law that expressly permits it (see Bowers v. Baystate Technologies). According to Section 103(f) of the Digital Millennium Copyright Act (17 U.S.C. § 1201 (f)), a person in legal possession of a program may reverse-engineer and circumvent its protection if that is necessary to achieve "interoperability", a term that broadly covers other devices and programs that can interact with it, make use of it, and to use and transfer data to and from it in useful ways. A limited exemption exists that allows the knowledge thus gained to be shared and used for interoperability purposes.

European Union

EU Directive 2009/24 on the legal protection of computer programs, which superseded an earlier (1991) directive, governs reverse engineering in the European Union.

Software cracking

From Wikipedia, the free encyclopedia
Software crack illustration

Software cracking (known as "breaking" mostly in the 1980s) is an act of removing copy protection from a software. Copy protection can be removed by applying a specific crack. A crack can mean any tool that enables breaking software protection, a stolen product key, or guessed password. Cracking software generally involves circumventing licensing and usage restrictions on commercial software by illegal methods. These methods can include modifying code directly through disassembling and bit editing, sharing stolen product keys, or developing software to generate activation keys. Examples of cracks are: applying a patch or by creating reverse-engineered serial number generators known as keygens, thus bypassing software registration and payments or converting a trial/demo version of the software into fully-functioning software without paying for it. Software cracking contributes to the rise of online piracy where pirated software is distributed to end-users through filesharing sites like BitTorrent, One click hosting (OCH), or via Usenet downloads, or by downloading bundles of the original software with cracks or keygens.

Some of these tools are called keygen, patch, loader, or no-disc crack. A keygen is a handmade product serial number generator that often offers the ability to generate working serial numbers in your own name. A patch is a small computer program that modifies the machine code of another program. This has the advantage for a cracker to not include a large executable in a release when only a few bytes are changed. A loader modifies the startup flow of a program and does not remove the protection but circumvents it. A well-known example of a loader is a trainer used to cheat in games. Fairlight pointed out in one of their .nfo files that these type of cracks are not allowed for warez scene game releases. A nukewar has shown that the protection may not kick in at any point for it to be a valid crack.

Software cracking is closely related to reverse engineering because the process of attacking a copy protection technology, is similar to the process of reverse engineering. The distribution of cracked copies is illegal in most countries. There have been lawsuits over cracking software. It might be legal to use cracked software in certain circumstances. Educational resources for reverse engineering and software cracking are, however, legal and available in the form of Crackme programs.

History

Software are inherently expensive to produce but cheap to duplicate and distribute. Therefore, software producers generally tried to implement some form of copy protection before releasing it to the market. In 1984, Laind Huntsman, the head of software development for Formaster, a software protection company, commented that "no protection system has remained uncracked by enterprising programmers for more than a few months". In 2001, Dan S. Wallach, a professor from Rice University, argued that "those determined to bypass copy-protection have always found ways to do so – and always will".

Most of the early software crackers were computer hobbyists who often formed groups that competed against each other in the cracking and spreading of software. Breaking a new copy protection scheme as quickly as possible was often regarded as an opportunity to demonstrate one's technical superiority rather than a possibility of money-making. Software crackers usually did not benefit materially from their actions and their motivation was the challenge itself of removing the protection. Some low skilled hobbyists would take already cracked software and edit various unencrypted strings of text in it to change messages a game would tell a game player, often something considered vulgar. Uploading the altered copies on file sharing networks provided a source of laughs for adult users. The cracker groups of the 1980s started to advertise themselves and their skills by attaching animated screens known as crack intros in the software programs they cracked and released. Once the technical competition had expanded from the challenges of cracking to the challenges of creating visually stunning intros, the foundations for a new subculture known as demoscene were established. Demoscene started to separate itself from the illegal "warez scene" during the 1990s and is now regarded as a completely different subculture. Many software crackers have later grown into extremely capable software reverse engineers; the deep knowledge of assembly required in order to crack protections enables them to reverse engineer drivers in order to port them from binary-only drivers for Windows to drivers with source code for Linux and other free operating systems. Also because music and game intro was such an integral part of gaming the music format and graphics became very popular when hardware became affordable for the home user.

With the rise of the Internet, software crackers developed secretive online organizations. In the latter half of the nineties, one of the most respected sources of information about "software protection reversing" was Fravia's website.

In 2017, a group of software crackers started a project to preserve Apple II software by removing the copy protection.

+HCU

The High Cracking University (+HCU) was founded by Old Red Cracker (+ORC), considered a genius of reverse engineering and a legendary figure in Reverse Code Engineering (RCE), to advance research into RCE. He had also taught and authored many papers on the subject, and his texts are considered classics in the field and are mandatory reading for students of RCE.

The addition of the "+" sign in front of the nickname of a reverser signified membership in the +HCU. Amongst the students of +HCU were the top of the elite Windows reversers worldwide. +HCU published a new reverse engineering problem annually and a small number of respondents with the best replies qualified for an undergraduate position at the university.

+Fravia was a professor at +HCU. Fravia's website was known as "+Fravia's Pages of Reverse Engineering" and he used it to challenge programmers as well as the wider society to "reverse engineer" the "brainwashing of a corrupt and rampant materialism". In its heyday, his website received millions of visitors per year and its influence was "widespread". On his site, +Fravia also maintained a database of the tutorials generated by +HCU students for posterity.

Nowadays most of the graduates of +HCU have migrated to Linux and few have remained as Windows reversers. The information at the university has been rediscovered by a new generation of researchers and practitioners of RCE who have started new research projects in the field.

Methods

The most common software crack is the modification of an application's binary to cause or prevent a specific key branch in the program's execution. This is accomplished by reverse engineering the compiled program code using a debugger such as SoftICE, OllyDbg, GDB, or MacsBug until the software cracker reaches the subroutine that contains the primary method of protecting the software (or by disassembling an executable file with a program such as IDA). The binary is then modified using the debugger or a hex editor such as HIEW or monitor in a manner that replaces a prior branching opcode with its complement or a NOP opcode so the key branch will either always execute a specific subroutine or skip over it. Almost all common software cracks are a variation of this type. A region of code that must not be entered is often called a "bad boy" while one that should be followed is a "good boy".

Proprietary software developers are constantly developing techniques such as code obfuscation, encryption, and self-modifying code to make binary modification increasingly difficult. Even with these measures being taken, developers struggle to combat software cracking. This is because it is very common for a professional to publicly release a simple cracked EXE or Retrium Installer for public download, eliminating the need for inexperienced users to crack the software themselves.

A specific example of this technique is a crack that removes the expiration period from a time-limited trial of an application. These cracks are usually programs that alter the program executable and sometimes the .dll or .so linked to the application and the process of altering the original binary files is called patching. Similar cracks are available for software that requires a hardware dongle. A company can also break the copy protection of programs that they have legally purchased but that are licensed to particular hardware, so that there is no risk of downtime due to hardware failure (and, of course, no need to restrict oneself to running the software on bought hardware only).

Another method is the use of special software such as CloneCD to scan for the use of a commercial copy protection application. After discovering the software used to protect the application, another tool may be used to remove the copy protection from the software on the CD or DVD. This may enable another program such as Alcohol 120%, CloneDVD, Game Jackal, or Daemon Tools to copy the protected software to a user's hard disk. Popular commercial copy protection applications which may be scanned for include SafeDisc and StarForce.

In other cases, it might be possible to decompile a program in order to get access to the original source code or code on a level higher than machine code. This is often possible with scripting languages and languages utilizing JIT compilation. An example is cracking (or debugging) on the .NET platform where one might consider manipulating CIL to achieve one's needs. Java's bytecode also works in a similar fashion in which there is an intermediate language before the program is compiled to run on the platform dependent machine code.

Advanced reverse engineering for protections such as SecuROM, SafeDisc, StarForce, or Denuvo requires a cracker, or many crackers to spend much more time studying the protection, eventually finding every flaw within the protection code, and then coding their own tools to "unwrap" the protection automatically from executable (.EXE) and library (.DLL) files.

There are a number of sites on the Internet that let users download cracks produced by warez groups for popular games and applications (although at the danger of acquiring malicious software that is sometimes distributed via such sites). Although these cracks are used by legal buyers of software, they can also be used by people who have downloaded or otherwise obtained unauthorized copies (often through P2P networks).

Software piracy

Software cracking led to the distribution of pirated software around the world (software piracy). It was estimated that the United States lost US$2.3 billion in business application software in 1996. Software piracy rates were especially prevalent in African, Asian, East European, and Latin American countries. In certain countries such as Indonesia, Pakistan, Kuwait, China, and El Salvador, 90% of the software used was pirated.

Disk image

From Wikipedia, the free encyclopedia

A disk image is a snapshot of a storage device's structure and data typically stored in one or more computer files on another storage device.

Traditionally, disk images were bit-by-bit copies of every sector on a hard disk often created for digital forensic purposes, but it is now common to only copy allocated data to reduce storage space. Compression and deduplication are commonly used to reduce the size of the image file set.

Disk imaging is done for a variety of purposes including digital forensics, cloud computing, system administration, as part of a backup strategy, and legacy emulation as part of a digital preservation strategy. Disk images can be made in a variety of formats depending on the purpose. Virtual disk images (such as VHD and VMDK) are intended to be used for cloud computing, ISO images are intended to emulate optical media and raw disk images are used for forensic purposes. Proprietary formats are typically used by disk imaging software.

Despite the benefits of disk imaging the storage costs can be high, management can be difficult and they can be time consuming to create.

Background

Disk images were originally (in the late 1960s) used for backup and disk cloning of mainframe disk media. Early ones were as small as 5 megabytes and as large as 330 megabytes, and the copy medium was magnetic tape, which ran as large as 200 megabytes per reel. Disk images became much more popular when floppy disk media became popular, where replication or storage of an exact structure was necessary and efficient, especially in the case of copy protected floppy disks.

Disk image creation is called disk imaging and is often time consuming, even with a fast computer, because the entire disk must be copied. Typically, disk imaging requires a third party disk imaging program or backup software. The software required varies according to the type of disk image that needs to be created. For example, RawWrite and WinImage create floppy disk image files for MS-DOS and Microsoft Windows. In Unix or similar systems the dd program can be used to create raw disk images. Apple Disk Copy can be used on Classic Mac OS and macOS systems to create and write disk image files.

Authoring software for CDs/DVDs such as Nero Burning ROM can generate and load disk images for optical media. A virtual disk writer or virtual burner is a computer program that emulates an actual disc authoring device such as a CD writer or DVD writer. Instead of writing data to an actual disc, it creates a virtual disk image. A virtual burner, by definition, appears as a disc drive in the system with writing capabilities (as opposed to conventional disc authoring programs that can create virtual disk images), thus allowing software that can burn discs to create virtual discs.

Uses

Digital forensics

Forensic imaging is the process of creating a bit-by-bit copy of the data on the drive, including files, metadata, volume information, filesystems and their structure. Often, these images are also hashed to verify their integrity and that they have not been altered since being created. Unlike disk imaging for other purposes, digital forensic applications take a bit-by-bit copy to ensure forensic soundness. The purposes of imaging the disk is to not only discover evidence preserved in digital information but also to examine the drive to gather clues of how the crime was committed.

Virtualization

Creating a virtual disk image of optical media or a hard disk drive is typically done to make the content available to one or more virtual machines. Virtual machines emulate a CD/DVD drive by reading an ISO image. This can also be faster than reading from the physical optical medium. Further, there are less issues with wear and tear. A hard disk drive or solid-state drive in a virtual machine is implemented as a disk image (i.e. either the VHD format used by Microsoft's Hyper-V, the VDI format used by Oracle Corporation's VirtualBox, the VMDK format used for VMware virtual machines, or the QCOW format used by QEMU). Virtual hard disk images tend to be stored as either a collection of files (where each one is typically 2GB in size), or as a single file. Virtual machines treat the image set as a physical drive.

Rapid deployment of systems

Educational institutions and businesses can often need to buy or replace computer systems in large numbers. Disk imaging is commonly used to rapidly deploy the same configuration across workstations. Disk imaging software is used to create an image of a completely-configured system (such an image is sometimes called a golden image). This image is then written to a computer's hard disk (which is sometimes described as restoring an image).

Network-based image deployment

Image restoration can be done using network-based image deployment. This method uses a PXE server to boot an operating system over a computer network that contains the necessary components to image or restore storage media in a computer. This is usually used in conjunction with a DHCP server to automate the configuration of network parameters including IP addresses. Multicasting, broadcasting or unicasting tend to be used to restore an image to many computers simultaneously. These approaches do not work well if one or more computers experience packet loss. As a result, some imaging solutions use the BitTorrent protocol to overcome this problem.

Network-based image deployment reduces the need to maintain and update individual systems manually. Imaging is also easier than automated setup methods because an administrator does not need to have knowledge of the prior configuration to copy it.

Backup strategy

A disk image contains all files and data (i.e., file attributes and the file fragmentation state). For this reason, it is also used for backing up optical media (CDs and DVDs, etc.), and allows the exact and efficient recovery after experimenting with modifications to a system or virtual machine. Typically, disk imaging can be used to quickly restore an entire system to an operational state after a disaster.

Digital preservation

Libraries and museums are typically required to archive and digitally preserve information without altering it in any manner. Emulators frequently use disk images to emulate floppy disks that have been preserved. This is usually simpler to program than accessing a real floppy drive (particularly if the disks are in a format not supported by the host operating system), and allows a large library of software to be managed. Emulation also allows existing disk images to be put into a usable form even though the data contained in the image is no longer readable without emulation.

Limitations

Disk imaging is time consuming, the space requirements are high and reading from them can be slower than reading from the disk directly because of a performance overhead.

Other limitations can be the lack of access to software required to read the contents of the image. For example, prior to Windows 8, third party software was required to mount disk images. When imaging multiple computers with only minor differences, much data is duplicated unnecessarily, wasting space.

Speed and failure

Disk imaging can be slow, especially for older storage devices. A typical 4.7 GB DVD can take an average of 18 minutes to duplicate. Floppy disks read and write much slower than hard disks. Therefore, despite their small size, it can take several minutes to copy a single disk. In some cases, disk imaging can fail due to bad sectors or physical wear and tear on the source device. Unix utilities (such as dd) are not designed to cope with failures, causing the disk image creation process to fail. When data recovery is the end goal, it is instead recommended to use more specialised tools (such as ddrescue).

Data recovery

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Data_recovery

In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way.  The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).

Logical failures occur when the hard drive devices are functional but the user or automated-OS cannot retrieve or access data stored on them. Logical failures can occur due to corruption of the engineering chip, lost partitions, firmware failure, or failures during formatting/re-installation.

Data recovery can be a very simple or technical challenge. This is why there are specific software companies specialized in this field.

About

The most common data recovery scenarios involve an operating system failure, malfunction of a storage device, logical failure of storage devices, accidental damage or deletion, etc. (typically, on a single-drive, single-partition, single-OS system), in which case the ultimate goal is simply to copy all important files from the damaged media to another new drive. This can be accomplished using a Live CD, or DVD by booting directly from a ROM or a USB drive instead of the corrupted drive in question. Many Live CDs or DVDs provide a means to mount the system drive and backup drives or removable media, and to move the files from the system drive to the backup media with a file manager or optical disc authoring software. Such cases can often be mitigated by disk partitioning and consistently storing valuable data files (or copies of them) on a different partition from the replaceable OS system files.

Another scenario involves a drive-level failure, such as a compromised file system or drive partition, or a hard disk drive failure. In any of these cases, the data is not easily read from the media devices. Depending on the situation, solutions involve repairing the logical file system, partition table, or master boot record, or updating the firmware or drive recovery techniques ranging from software-based recovery of corrupted data, to hardware- and software-based recovery of damaged service areas (also known as the hard disk drive's "firmware"), to hardware replacement on a physically damaged drive which allows for the extraction of data to a new drive. If a drive recovery is necessary, the drive itself has typically failed permanently, and the focus is rather on a one-time recovery, salvaging whatever data can be read.

In a third scenario, files have been accidentally "deleted" from a storage medium by the users. Typically, the contents of deleted files are not removed immediately from the physical drive; instead, references to them in the directory structure are removed, and thereafter space the deleted data occupy is made available for later data overwriting. In the mind of end users, deleted files cannot be discoverable through a standard file manager, but the deleted data still technically exists on the physical drive. In the meantime, the original file contents remain, often several disconnected fragments, and may be recoverable if not overwritten by other data files.

The term "data recovery" is also used in the context of forensic applications or espionage, where data which have been encrypted, hidden, or deleted, rather than damaged, are recovered. Sometimes data present in the computer gets encrypted or hidden due to reasons like virus attacks which can only be recovered by some computer forensic experts.

Physical damage

A wide variety of failures can cause physical damage to storage media, which may result from human errors and natural disasters. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer from a multitude of mechanical failures, such as head crashes, PCB failure, and failed motors; tapes can simply break.

Physical damage to a hard drive, even in cases where a head crash has occurred, does not necessarily mean there will be a permanent loss of data. The techniques employed by many professional data recovery companies can typically salvage most, if not all, of the data that had been lost when the failure occurred.

Of course, there are exceptions to this, such as cases where severe damage to the hard drive platters may have occurred. However, if the hard drive can be repaired and a full image or clone created, then the logical file structure can be rebuilt in most instances.

Most physical damage cannot be repaired by end users. For example, opening a hard disk drive in a normal environment can allow airborne dust to settle on the platter and become caught between the platter and the read/write head. During normal operation, read/write heads float 3 to 6 nanometers above the platter surface, and the average dust particles found in a normal environment are typically around 30,000 nanometers in diameter. When these dust particles get caught between the read/write heads and the platter, they can cause new head crashes that further damage the platter and thus compromise the recovery process. Furthermore, end users generally do not have the hardware or technical expertise required to make these repairs. Consequently, data recovery companies are often employed to salvage important data with the more reputable ones using class 100 dust- and static-free cleanrooms.

Recovery techniques

Recovering data from physically damaged hardware can involve multiple techniques. Some damage can be repaired by replacing parts in the hard disk. This alone may make the disk usable, but there may still be logical damage. A specialized disk-imaging procedure is used to recover every readable bit from the surface. Once this image is acquired and saved on a reliable medium, the image can be safely analyzed for logical damage and will possibly allow much of the original file system to be reconstructed.

Hardware repair

Media that has suffered a catastrophic electronic failure requires data recovery in order to salvage its contents.

A common misconception is that a damaged printed circuit board (PCB) may be simply replaced during recovery procedures by an identical PCB from a healthy drive. While this may work in rare circumstances on hard disk drives manufactured before 2003, it will not work on newer drives. Electronics boards of modern drives usually contain drive-specific adaptation data (generally a map of bad sectors and tuning parameters) and other information required to properly access data on the drive. Replacement boards often need this information to effectively recover all of the data. The replacement board may need to be reprogrammed. Some manufacturers (Seagate, for example) store this information on a serial EEPROM chip, which can be removed and transferred to the replacement board.

Each hard disk drive has what is called a system area or service area; this portion of the drive, which is not directly accessible to the end user, usually contains drive's firmware and adaptive data that helps the drive operate within normal parameters. One function of the system area is to log defective sectors within the drive; essentially telling the drive where it can and cannot write data.

The sector lists are also stored on various chips attached to the PCB, and they are unique to each hard disk drive. If the data on the PCB do not match what is stored on the platter, then the drive will not calibrate properly. In most cases the drive heads will click because they are unable to find the data matching what is stored on the PCB.

Logical damage

Result of a failed data recovery from a hard disk drive.

The term "logical damage" refers to situations in which the error is not a problem in the hardware and requires software-level solutions.

Corrupt partitions and file systems, media errors

In some cases, data on a hard disk drive can be unreadable due to damage to the partition table or file system, or to (intermittent) media errors. In the majority of these cases, at least a portion of the original data can be recovered by repairing the damaged partition table or file system using specialized data recovery software such as Testdisk; software like ddrescue can image media despite intermittent errors, and image raw data when there is partition table or file system damage. This type of data recovery can be performed by people without expertise in drive hardware as it requires no special physical equipment or access to platters.

Sometimes data can be recovered using relatively simple methods and tools; more serious cases can require expert intervention, particularly if parts of files are irrecoverable. Data carving is the recovery of parts of damaged files using knowledge of their structure.

Overwritten data

After data has been physically overwritten on a hard disk drive, it is generally assumed that the previous data are no longer possible to recover. In 1996, Peter Gutmann, a computer scientist, presented a paper that suggested overwritten data could be recovered through the use of magnetic force microscopy. In 2001, he presented another paper on a similar topic. To guard against this type of data recovery, Gutmann and Colin Plumb designed a method of irreversibly scrubbing data, known as the Gutmann method and used by several disk-scrubbing software packages.

Substantial criticism has followed, primarily dealing with the lack of any concrete examples of significant amounts of overwritten data being recovered. Gutmann's article contains a number of errors and inaccuracies, particularly regarding information about how data is encoded and processed on hard drives. Although Gutmann's theory may be correct, there is no practical evidence that overwritten data can be recovered, while research has shown to support that overwritten data cannot be recovered.

Solid-state drives (SSD) overwrite data differently from hard disk drives (HDD) which makes at least some of their data easier to recover. Most SSDs use flash memory to store data in pages and blocks, referenced by logical block addresses (LBA) which are managed by the flash translation layer (FTL). When the FTL modifies a sector it writes the new data to another location and updates the map so the new data appear at the target LBA. This leaves the pre-modification data in place, with possibly many generations, and recoverable by data recovery software.

Lost, deleted, and formatted data

Sometimes, data present in the physical drives (Internal/External Hard disk, Pen Drive, etc.) gets lost, deleted and formatted due to circumstances like virus attack, accidental deletion or accidental use of SHIFT+DELETE. In these cases, data recovery software is used to recover/restore the data files.

Logical bad sector

In the list of logical failures of hard disks, a logical bad sector is the most common fault leading data not to be readable. Sometimes it is possible to sidestep error detection even in software, and perhaps with repeated reading and statistical analysis recover at least some of the underlying stored data. Sometimes prior knowledge of the data stored and the error detection and correction codes can be used to recover even erroneous data. However, if the underlying physical drive is degraded badly enough, at least the hardware surrounding the data must be replaced, or it might even be necessary to apply laboratory techniques to the physical recording medium. Each of the approaches is progressively more expensive, and as such progressively more rarely sought.

Eventually, if the final, physical storage medium has indeed been disturbed badly enough, recovery will not be possible using any means; the information has irreversibly been lost.

Remote data recovery

Recovery experts do not always need to have physical access to the damaged hardware. When the lost data can be recovered by software techniques, they can often perform the recovery using remote access software over the Internet, LAN or other connection to the physical location of the damaged media. The process is essentially no different from what the end user could perform by themselves.

Remote recovery requires a stable connection with an adequate bandwidth. However, it is not applicable where access to the hardware is required, as in cases of physical damage.

Four phases of data recovery

Usually, there are four phases when it comes to successful data recovery, though that can vary depending on the type of data corruption and recovery required.

Phase 1
Repair the hard disk drive
The hard drive is repaired in order to get it running in some form, or at least in a state suitable for reading the data from it. For example, if heads are bad they need to be changed; if the PCB is faulty then it needs to be fixed or replaced; if the spindle motor is bad the platters and heads should be moved to a new drive.
Phase 2
Image the drive to a new drive or a disk image file
When a hard disk drive fails, the importance of getting the data off the drive is the top priority. The longer a faulty drive is used, the more likely further data loss is to occur. Creating an image of the drive will ensure that there is a secondary copy of the data on another device, on which it is safe to perform testing and recovery procedures without harming the source.
Phase 3
Logical recovery of files, partition, MBR and filesystem structures
After the drive has been cloned to a new drive, it is suitable to attempt the retrieval of lost data. If the drive has failed logically, there are a number of reasons for that. Using the clone it may be possible to repair the partition table or master boot record (MBR) in order to read the file system's data structure and retrieve stored data.
Phase 4
Repair damaged files that were retrieved
Data damage can be caused when, for example, a file is written to a sector on the drive that has been damaged. This is the most common cause in a failing drive, meaning that data needs to be reconstructed to become readable. Corrupted documents can be recovered by several software methods or by manually reconstructing the document using a hex editor.

Restore disk

The Windows operating system can be reinstalled on a computer that is already licensed for it. The reinstallation can be done by downloading the operating system or by using a "restore disk" provided by the computer manufacturer. Eric Lundgren was fined and sentenced to U.S. federal prison in April 2018 for producing 28,000 restore disks and intending to distribute them for about 25 cents each as a convenience to computer repair shops.

List of data recovery software

Bootable

Data recovery cannot always be done on a running system. As a result, a boot disk, live CD, live USB, or any other type of live distro contains a minimal operating system.

Consistency checkers

File recovery

Forensics

Imaging tools

  • Clonezilla: a free disk cloning, disk imaging, data recovery, and deployment boot disk
  • dd: common byte-to-byte cloning tool found on Unix-like systems
  • ddrescue: an open-source tool similar to dd but with the ability to skip over and subsequently retry bad blocks on failing storage devices
  • Team Win Recovery Project: a free and open-source recovery system for Android devices

Galileo Galilei

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Galileo_Galilei   Galileo Galilei P...