Wednesday, May 27, 2020

Rich web application

From Wikipedia, the free encyclopedia
 
A rich web application (originally called a rich Internet application (RIA) or installable Internet application) is a web application that has many of the characteristics of desktop application software. The concept is closely related to a single-page application, and may allow the user interactive features such as drag and drop, background menus, WYSIWYG editing, etc. HTML5 is a current standard for delivering rich web applications, supported by all major browsers.

Older (pre-HTML5) techniques

Adobe Flash

Adobe Flash manipulates vector and raster graphics to provide animation of text, drawings, and still images. It supports bidirectional streaming of audio and video, and it can capture user input via mouse, keyboard, microphone, and camera. Flash contains an object-oriented language called ActionScript and supports automation via the JavaScript Flash language (JSFL). Flash content may be displayed on various computer systems and devices, using Adobe Flash Player, which is available free of charge for common web browsers, some mobile phones and a few other electronic devices (using Flash Lite).

Adobe Flex

Apache Flex, formerly Adobe Flex, is a software development kit (SDK) for the development and deployment of cross-platform RIAs based on the Adobe Flash platform. Initially developed by Macromedia and then acquired by Adobe Systems, Flex was donated by Adobe to the Apache Software Foundation in 2011.

Java applet

Java applets were used to create interactive visualizations and to present video, three-dimensional objects and other media. Java applets were appropriate for complex visualizations that required significant programming effort in a high-level language or communication between the applet and the originating server.

JavaFX

JavaFX is a software platform for creating and delivering RIAs that can run across a wide variety of connected devices. The current release (JavaFX 12, March 11, 2019) enables building applications for desktop, browser and mobile phones, and comes with 3D support. TV set-top boxes, gaming consoles, Blu-ray players and other platforms are planned. JavaFX runs as a plug-in Java applet or via Java Web Start.

Microsoft Silverlight

Silverlight was proposed by Microsoft as another proprietary alternative. The technology was never widely adopted and, for instance, lacked support on many mobile devices. Notable applications included video streaming for events such as the 2008 Summer Olympics in Beijing, the 2010 Winter Olympics in Vancouver, and the 2008 conventions of both major political parties in the United States. Silverlight was also used by Netflix for its instant video streaming service. Silverlight is no longer under active development and is not supported in Microsoft Edge, Microsoft's most recent browser.

Other techniques

RIAs could use XForms to enhance their functionality. XML and XSLT, along with some XHTML, CSS and JavaScript, can also be used to generate richer client-side UI components such as data tables that can be re-sorted locally on the client without a round trip to the server. Both Mozilla and Internet Explorer support this.

History

The terms "rich client" and "rich Internet application" were introduced in a white paper of March 2002 by Macromedia (now Adobe), though the concept had existed for a number of years earlier under names including: "Remote Scripting" by Microsoft in April 1999, "X Internet" by Forrester Research in October 2000, Rich Web application.

Security issues in older standards

RIAs present indexing challenges to Web search engines, but Adobe Flash content is now at least partially indexable.

Security can improve over that of application software (for example through the use of sandboxes and automatic updates), but the extensions themselves remain subject to vulnerabilities, and their access is often much greater than that of native Web applications. For security purposes, most RIAs run their client portions within a special isolated area of the client desktop called a sandbox. The sandbox limits the application's visibility of, and access to, the client's file system and operating system, mediated by the application server on the other side of the connection. This approach allows the client system to handle local activities, such as reformatting, thereby lowering the amount and frequency of client-server traffic, especially versus implementations built around so-called thin clients.

HTML5 trend

In November 2011, a number of announcements demonstrated a decline in demand for plug-in-based rich Internet application architectures in favor of HTML5 alternatives. Adobe announced that Flash would no longer be produced for mobile or TV (refocusing its efforts on Adobe AIR). Pundits questioned its continued relevance even on the desktop, describing the move as "the beginning of the end". Research In Motion (RIM) announced that it would continue to develop Flash for the PlayBook, a decision questioned by some commentators. Rumors had Microsoft abandoning Silverlight after version 5. Together, these announcements had some proclaiming "the end of the line for browser plug-ins".

Java virtual machine

From Wikipedia, the free encyclopedia

Java virtual machine
Designer: Sun Microsystems
Bits: 32-bit
Introduced: 1994
Type: Stack and register–register
Endianness: Big
Registers (general purpose): per-method operand stack (up to 65,535 operands) plus per-method local variables (up to 65,535)

Overview of a Java virtual machine (JVM) architecture based on The Java Virtual Machine Specification Java SE 7 Edition

A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.

The JVM reference implementation is developed by the OpenJDK project as open source code and includes a JIT compiler called HotSpot. The commercially supported Java releases available from Oracle Corporation are based on the OpenJDK runtime. Eclipse OpenJ9 is another open source JVM for OpenJDK.

JVM specification

The Java virtual machine is an abstract (virtual) computer defined by a specification. The garbage-collection algorithm used and any internal optimization of the Java virtual machine instructions (their translation into machine code) are not specified. The main reason for this omission is to not unnecessarily constrain implementers. Any Java application can be run only inside some concrete implementation of the abstract specification of the Java virtual machine.

Starting with Java Platform, Standard Edition (J2SE) 5.0, changes to the JVM specification have been developed under the Java Community Process as JSR 924. As of 2006, changes to the specification to support changes proposed to the class file format (JSR 202) were being done as a maintenance release of JSR 924. The specification for the JVM was published as the blue book. The preface states:
We intend that this specification should sufficiently document the Java Virtual Machine to make possible compatible clean-room implementations. Oracle provides tests that verify the proper operation of implementations of the Java Virtual Machine.
One of Oracle's JVMs is named HotSpot; the other, inherited from BEA Systems, is JRockit. Clean-room Java implementations include Kaffe, OpenJ9 and Skelmir's CEE-J. Oracle owns the Java trademark and may allow its use to certify implementation suites as fully compatible with Oracle's specification.

Class loader

One of the organizational units of JVM byte code is a class. A class loader implementation must be able to recognize and load anything that conforms to the Java class file format. Any implementation is free to recognize other binary forms besides class files, but it must recognize class files. 

The class loader performs three basic activities in this strict order:
  1. Loading: finds and imports the binary data for a type
  2. Linking: performs verification, preparation, and (optionally) resolution
    • Verification: ensures the correctness of the imported type
    • Preparation: allocates memory for class variables and initializes the memory to default values
    • Resolution: transforms symbolic references from the type into direct references.
  3. Initialization: invokes Java code that initializes class variables to their proper starting values.
In general, there are two types of class loader: the bootstrap class loader and user-defined class loaders.

Every Java virtual machine implementation must have a bootstrap class loader, capable of loading trusted classes. The Java virtual machine specification doesn't specify how a class loader should locate classes.
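
To make the division of labor concrete, below is a minimal sketch of a user-defined class loader in Java. The DiskClassLoader name and the directory layout are invented for this example; a production loader would handle delegation and error cases more carefully.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // A user-defined class loader: it locates the class bytes itself, then hands
    // linking and initialization back to the JVM via defineClass.
    class DiskClassLoader extends ClassLoader {
        private final Path root;

        DiskClassLoader(Path root) {
            super(DiskClassLoader.class.getClassLoader()); // parent, for delegation
            this.root = root;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            try {
                // Loading: find and import the binary data for the type.
                byte[] bytes = Files.readAllBytes(
                        root.resolve(name.replace('.', '/') + ".class"));
                // defineClass triggers verification and preparation (linking).
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

Calling loadClass on such a loader first delegates to its parent (and ultimately the bootstrap class loader), falling back to findClass only for classes the parents cannot locate.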

Virtual machine architecture

The JVM operates on primitive values (integers and floating-point numbers) and references. The JVM is fundamentally a 32-bit machine. The 64-bit long and double types are supported natively, but consume two units of storage in a frame's local variables or operand stack, since each unit is 32 bits. The boolean, byte, short, and char types are all sign-extended (except char, which is zero-extended) and operated on as 32-bit integers, the same as int; the smaller types have only a few type-specific instructions for loading, storing, and type conversion. boolean is operated on as 8-bit byte values, with 0 representing false and 1 representing true.

(Although boolean has been treated as a type since The Java Virtual Machine Specification, Second Edition clarified this issue, in compiled and executed code there is little difference between a boolean and a byte except for name mangling in method signatures and the type of boolean arrays. booleans in method signatures are mangled as Z while bytes are mangled as B. Boolean arrays carry the type boolean[] but use 8 bits per element, and the JVM has no built-in capability to pack booleans into a bit array, so except for the type they perform and behave the same as byte arrays. In all other uses, the boolean type is effectively unknown to the JVM, as all instructions that operate on booleans also operate on bytes.)

The JVM has a garbage-collected heap for storing objects and arrays. Code, constants, and other class data are stored in the "method area". The method area is logically part of the heap, but implementations may treat the method area separately from the heap, and for example might not garbage collect it. Each JVM thread also has its own call stack (called a "Java Virtual Machine stack" for clarity), which stores frames. A new frame is created each time a method is called, and the frame is destroyed when that method exits.

Each frame provides an "operand stack" and an array of "local variables". The operand stack is used for operands to computations and for receiving the return value of a called method, while local variables serve the same purpose as registers and are also used to pass method arguments. Thus, the JVM is both a stack machine and a register machine.
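
As a sketch of how this maps onto source code, consider the following trivial Java method; the bytecode named in the comments is what javac typically emits for it.

    class Adder {
        static int add(int a, int b) { // a and b occupy local variables 0 and 1
            return a + b;              // iload_0 and iload_1 push the two locals onto
                                       // the operand stack; iadd pops both and pushes
                                       // their sum; ireturn passes it to the caller
        }
    }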

Bytecode instructions

The JVM has instructions for the following groups of tasks:
  • Load and store
  • Arithmetic
  • Type conversion
  • Object creation and manipulation
  • Operand stack management (e.g. push / pop)
  • Control transfer (e.g. branching)
  • Method invocation and return
  • Throwing exceptions
  • Monitor-based concurrency
The aim is binary compatibility. Each particular host operating system needs its own implementation of the JVM and runtime. These JVMs interpret the bytecode semantically in the same way, but the actual implementation may differ. More complex than merely emulating bytecode is implementing the Java core API compatibly and efficiently by mapping it onto each host operating system.

These instructions operate on a set of common abstracted data types rather than the native data types of any specific instruction set architecture.

JVM languages

A JVM language is any language with functionality that can be expressed in terms of a valid class file which can be hosted by the Java Virtual Machine. A class file contains Java Virtual Machine instructions (Java byte code) and a symbol table, as well as other ancillary information. The class file format is the hardware- and operating system-independent binary format used to represent compiled classes and interfaces.

There are several JVM languages, both old languages ported to the JVM and completely new languages. JRuby and Jython are perhaps the most well-known ports of existing languages, i.e. Ruby and Python respectively. Of the new languages that have been created from scratch to compile to Java bytecode, Clojure, Apache Groovy, Scala and Kotlin may be the most popular ones. A notable feature of the JVM languages is that they are compatible with each other, so that, for example, Scala libraries can be used with Java programs and vice versa.

The Java 7 JVM implements JSR 292: Supporting Dynamically Typed Languages on the Java Platform, a new feature which supports dynamically typed languages in the JVM. This feature was developed within the Da Vinci Machine project, whose mission is to extend the JVM so that it supports languages other than Java.
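
A small illustration of the java.lang.invoke API that accompanied JSR 292 is sketched below; the class name and example strings are invented. A method is looked up at run time as a MethodHandle and invoked through the same machinery that backs the invokedynamic instruction.

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    public class InvokeDemo {
        public static void main(String[] args) throws Throwable {
            // Resolve String.length() dynamically instead of at compile time.
            MethodHandle length = MethodHandles.lookup().findVirtual(
                    String.class, "length", MethodType.methodType(int.class));
            // The receiver is passed as the first argument of invokeExact.
            int n = (int) length.invokeExact("bytecode");
            System.out.println(n); // prints 8
        }
    }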

Bytecode verifier

A basic philosophy of Java is that it is inherently safe from the standpoint that no user program can crash the host machine or otherwise interfere inappropriately with other operations on the host machine, and that it is possible to protect certain methods and data structures belonging to trusted code from access or corruption by untrusted code executing within the same JVM. Furthermore, common programmer errors that often led to data corruption or unpredictable behavior such as accessing off the end of an array or using an uninitialized pointer are not allowed to occur. Several features of Java combine to provide this safety, including the class model, the garbage-collected heap, and the verifier. 

The JVM verifies all bytecode before it is executed. This verification consists primarily of three types of checks:
  • Branches are always to valid locations
  • Data is always initialized and references are always type-safe
  • Access to private or package private data and methods is rigidly controlled
The first two of these checks take place primarily during the verification step that occurs when a class is loaded and made eligible for use. The third is primarily performed dynamically, when data items or methods of a class are first accessed by another class.

The verifier permits only some bytecode sequences in valid programs, e.g. a jump (branch) instruction can only target an instruction within the same method. Furthermore, the verifier ensures that any given instruction operates on a fixed stack location, allowing the JIT compiler to transform stack accesses into fixed register accesses. Because of this, the JVM's stack architecture does not imply a speed penalty for emulation on register-based architectures when a JIT compiler is used. Given code verification, it makes no difference to a JIT compiler whether it gets named imaginary registers or imaginary stack positions that must be allocated to the target architecture's registers. In fact, code verification makes the JVM different from a classic stack architecture, whose efficient emulation with a JIT compiler is more complicated and typically carried out by a slower interpreter.

The original specification for the bytecode verifier used natural language that was incomplete or incorrect in some respects. A number of attempts have been made to specify the JVM as a formal system. Doing so allows the security of current JVM implementations to be analyzed more thoroughly, and potential security exploits to be prevented. It would also make it possible to optimize the JVM by skipping unnecessary safety checks, if the application being run is proven to be safe.

Secure execution of remote code

A virtual machine architecture allows very fine-grained control over the actions that code within the machine is permitted to take. It assumes the code is "semantically" correct, that is, it successfully passed the (formal) bytecode verifier process, materialized by a tool, possibly off-board the virtual machine. This is designed to allow safe execution of untrusted code from remote sources, a model used by Java applets and other secure code downloads. Once bytecode-verified, the downloaded code runs in a restricted "sandbox", which is designed to protect the user from misbehaving or malicious code. As an addition to the bytecode verification process, publishers can purchase a certificate with which to digitally sign applets as safe, giving them permission to ask the user to let the applet break out of the sandbox to access the local file system or clipboard, execute external software, or use the network.

Formal proofs of bytecode verifiers have been produced by the Java Card industry (Formal Development of an Embedded Verifier for Java Card Byte Code).

Bytecode interpreter and just-in-time compiler

For each hardware architecture a different Java bytecode interpreter is needed. When a computer has a Java bytecode interpreter, it can run any Java bytecode program, and the same program can be run on any computer that has such an interpreter.

When Java bytecode is executed by an interpreter, the execution will always be slower than the execution of the same program compiled into native machine language. This problem is mitigated by just-in-time (JIT) compilers. A JIT compiler may translate Java bytecode into native machine language while executing the program. The translated parts of the program can then be executed much more quickly than they could be interpreted. This technique is applied to the parts of a program that are frequently executed. In this way a JIT compiler can significantly speed up the overall execution time.

There is no necessary connection between the Java programming language and Java bytecode. A program written in Java can be compiled directly into the machine language of a real computer and programs written in other languages than Java can be compiled into Java bytecode.

Java bytecode is intended to be platform-independent and secure. Some JVM implementations do not include an interpreter, but consist only of a just-in-time compiler.

JVM in the web browser

At the start of the Java platform's lifetime, the JVM was marketed as a web technology for creating Rich Internet Applications. As of 2018, most web browsers and operating systems bundling web browsers do not ship with a Java plug-in, nor do they permit side-loading any non-Flash plug-in. The Java browser plugin was deprecated in JDK 9.

The NPAPI Java browser plug-in was designed to allow the JVM to execute so-called Java applets embedded into HTML pages. For browsers with the plug-in installed, the applet is allowed to draw into a rectangular region on the page assigned to it. Because the plug-in includes a JVM, Java applets are not restricted to the Java programming language; any language targeting the JVM may run in the plug-in. A restricted set of APIs allows applets access to the user's microphone or 3D acceleration, although applets cannot modify the page outside their rectangular region. Adobe Flash Player, the main competing technology, works in the same way in this respect.

As of June 2015 according to W3Techs, Java applet and Silverlight use had fallen to 0.1% each for all web sites, while Flash had fallen to 10.8%.

JavaScript JVMs and interpreters

As of May 2016, JavaPoly allows users to import unmodified Java libraries, and invoke them directly from JavaScript. JavaPoly allows websites to use unmodified Java libraries, even if the user does not have Java installed on their computer.

Compilation to JavaScript

With the continuing improvements in JavaScript execution speed, combined with the increased use of mobile devices whose web browsers do not implement support for plugins, there are efforts to target those users through compilation to JavaScript. It is possible to either compile the source code or JVM bytecode to JavaScript.

Compiling the JVM bytecode, which is universal across JVM languages, allows building upon the language's existing compiler to bytecode. The main JVM bytecode to JavaScript compilers are TeaVM, the compiler contained in Dragome Web SDK, Bck2Brwsr, and j2js-compiler.

Leading compilers from JVM languages to JavaScript include the Java-to-JavaScript compiler contained in Google Web Toolkit, ClojureScript (Clojure), GrooScript (Apache Groovy), Scala.js (Scala) and others.

Java Runtime Environment

The Java Runtime Environment (JRE) released by Oracle is a freely available software distribution containing a stand-alone JVM (HotSpot), the Java standard library (Java Class Library), a configuration tool, and—until its discontinuation in JDK 9—a browser plug-in. It is the most common Java environment installed on personal computers in the laptop and desktop form factor. Mobile phones, including feature phones and early smartphones, that ship with a JVM are most likely to include a JVM meant to run applications targeting the Micro Edition of the Java platform. Meanwhile, most modern smartphones, tablet computers, and other handheld PCs that run Java apps are most likely to do so through support of the Android operating system, which includes an open source virtual machine incompatible with the JVM specification. (Instead, Google's Android development tools take Java programs as input and output Dalvik bytecode, which is the native input format for the virtual machine on Android devices.)

Performance

The JVM specification gives a lot of leeway to implementors regarding the implementation details. Since Java 1.3, the JRE from Oracle has contained a JVM called HotSpot, which has been designed as a high-performance JVM.

To speed up code execution, HotSpot relies on just-in-time compilation. To speed up object allocation and garbage collection, HotSpot uses a generational heap.

Generational heap

The Java virtual machine heap is the area of memory used by the JVM for dynamic memory allocation.

In HotSpot the heap is divided into generations:
  • The young generation stores short-lived objects that are created and immediately garbage collected. It includes (two) survivor spaces, which hold the objects that survived the first and subsequent garbage collections.
  • Objects that persist longer are moved to the old generation (also called the tenured generation).
The permanent generation (or permgen) was used for class definitions and associated metadata prior to Java 8. The permanent generation was not part of the heap; it was removed in Java 8.

Originally there was no permanent generation, and objects and classes were stored together in the same area. But as class unloading occurs much more rarely than objects are collected, moving class structures to a specific area allowed significant performance improvements.
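
As an illustrative (not prescriptive) example, HotSpot exposes this generational layout through command-line flags; MyApp is a placeholder class name:

    java -Xms256m -Xmx1g -XX:NewRatio=2 -XX:SurvivorRatio=8 MyApp

Here -Xms and -Xmx bound the heap as a whole, -XX:NewRatio sets the size ratio of the old generation to the young generation, and -XX:SurvivorRatio sets the ratio of the Eden space to each survivor space.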

Security

Oracle's JRE is installed on a large number of computers, and end users with an out-of-date version of the JRE are therefore vulnerable to many known attacks. This led to the widely shared belief that Java is inherently insecure. Since Java 1.7, Oracle's JRE for Windows has included automatic update functionality.

Before the discontinuation of the Java browser plug-in, any web page might potentially have run a Java applet, which provided an easily accessible attack surface to malicious web sites. In 2013 Kaspersky Lab reported that the Java plug-in was the method of choice for computer criminals. Java exploits are included in many exploit packs that hackers deploy onto hacked web sites. Java applets were removed in Java 11, released on September 25, 2018.

Just-in-time compilation

From Wikipedia, the free encyclopedia

In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilation) is a way of executing computer code that involves compilation during execution of a program – at run time – rather than before execution. Most often, this consists of translating source code or, more commonly, bytecode to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies the parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

JIT compilation is a combination of the two traditional approaches to translation to machine code – ahead-of-time compilation (AOT) and interpretation – and combines some advantages and drawbacks of both. Roughly, JIT compilation combines the speed of compiled code with the flexibility of interpretation, with the overhead of an interpreter and the additional overhead of compiling (not just interpreting). JIT compilation is a form of dynamic compilation, and allows adaptive optimization such as dynamic recompilation and microarchitecture-specific speedups. Interpretation and JIT compilation are particularly suited for dynamic programming languages, as the runtime system can handle late-bound data types and enforce security guarantees.

Applications

JIT compilation can be applied to some programs, or can be used for certain capacities, particularly dynamic capacities such as regular expressions. For example, a text editor may compile a regular expression provided at runtime to machine code to allow faster matching – this cannot be done ahead of time, as the pattern is only provided at runtime. Several modern runtime environments rely on JIT compilation for high-speed code execution, including most implementations of Java, together with Microsoft's .NET Framework. Similarly, many regular-expression libraries feature JIT compilation of regular expressions, either to bytecode or to machine code. JIT compilation is also used in some emulators, in order to translate machine code from one CPU architecture to another.
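
Java's java.util.regex does not go all the way to machine code, but it illustrates the shape of the idea: a pattern that only becomes known at run time is compiled once into an optimized internal form and then reused across many match attempts. The pattern and input below are invented for the example.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexDemo {
        public static void main(String[] args) {
            // Compile once, at run time, when the pattern first becomes known...
            Pattern wordDigits = Pattern.compile("[a-z]+\\d+");
            // ...then reuse the compiled form for fast repeated matching.
            Matcher m = wordDigits.matcher("build42");
            System.out.println(m.matches()); // prints true
        }
    }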

A common implementation of JIT compilation is to first have AOT compilation to bytecode (virtual machine code), known as bytecode compilation, and then have JIT compilation to machine code (dynamic compilation), rather than interpretation of the bytecode. This improves the runtime performance compared to interpretation, at the cost of lag due to compilation. JIT compilers translate continuously, as with interpreters, but caching of compiled code minimizes lag on future execution of the same code during a given run. Since only part of the program is compiled, there is significantly less lag than if the entire program were compiled prior to execution.

Overview

In a bytecode-compiled system, source code is translated to an intermediate representation known as bytecode. Bytecode is not the machine code for any particular computer, and may be portable among computer architectures. The bytecode may then be interpreted by, or run on a virtual machine. The JIT compiler reads the bytecodes in many sections (or in full, rarely) and compiles them dynamically into machine code so the program can run faster. This can be done per-file, per-function or even on any arbitrary code fragment; the code can be compiled when it is about to be executed (hence the name "just-in-time"), and then cached and reused later without needing to be recompiled.

In contrast, a traditional interpreted virtual machine will simply interpret the bytecode, generally with much lower performance. Some interpreters even interpret source code, without the step of first compiling to bytecode, with even worse performance. Statically compiled code or native code is compiled prior to deployment. A dynamic compilation environment is one in which the compiler can be used during execution. A common goal of using JIT techniques is to reach or surpass the performance of static compilation, while maintaining the advantages of bytecode interpretation:
  • Much of the "heavy lifting" of parsing the original source code and performing basic optimization is often handled at compile time, prior to deployment: compilation from bytecode to machine code is much faster than compiling from source.
  • The deployed bytecode is portable, unlike native code.
  • Since the runtime has control over the compilation, like interpreted bytecode, it can run in a secure sandbox.
  • Compilers from bytecode to machine code are easier to write, because the portable bytecode compiler has already done much of the work.

JIT code generally offers far better performance than interpreters. In addition, it can in some cases offer better performance than static compilation, as many optimizations are only feasible at run-time:
  1. The compilation can be optimized to the targeted CPU and the operating system model where the application runs. For example, JIT can choose SSE2 vector CPU instructions when it detects that the CPU supports them. To obtain this level of optimization specificity with a static compiler, one must either compile a binary for each intended platform/architecture, or else include multiple versions of portions of the code within a single binary.
  2. The system is able to collect statistics about how the program is actually running in the environment it is in, and it can rearrange and recompile for optimum performance. However, some static compilers can also take profile information as input.
  3. The system can do global code optimizations (e.g. inlining of library functions) without losing the advantages of dynamic linking and without the overheads inherent to static compilers and linkers. Specifically, when doing global inline substitutions, a static compilation process may need run-time checks and ensure that a virtual call would occur if the actual class of the object overrides the inlined method, and boundary condition checks on array accesses may need to be processed within loops. With just-in-time compilation in many cases this processing can be moved out of loops, often giving large increases of speed.
  4. Although this is possible with statically compiled garbage collected languages, a bytecode system can more easily rearrange executed code for better cache utilization.
Because a JIT must render and execute a native binary image at runtime, true machine-code JITs require platforms that allow data to be executed at runtime, making such JITs impossible on a Harvard architecture-based machine; the same can be said for certain operating systems and virtual machines as well. However, a special type of "JIT" may potentially target not the physical machine's CPU architecture, but rather an optimized VM bytecode where limitations on raw machine code prevail, especially where that bytecode's VM eventually leverages a JIT to native code.

Startup delay and optimizations

JIT causes a slight to noticeable delay in the initial execution of an application, due to the time taken to load and compile the bytecode. Sometimes this delay is called "startup time delay" or "warm-up time". In general, the more optimization JIT performs, the better the code it will generate, but the initial delay will also increase. A JIT compiler therefore has to make a trade-off between the compilation time and the quality of the code it hopes to generate. Startup time can be increased by IO-bound operations in addition to JIT compilation: for example, the rt.jar class data file for the Java Virtual Machine (JVM) is 40 MB, and the JVM must seek a lot of data in this contextually huge file.

One possible optimization, used by Sun's HotSpot Java Virtual Machine, is to combine interpretation and JIT compilation. The application code is initially interpreted, but the JVM monitors which sequences of bytecode are frequently executed and translates them to machine code for direct execution on the hardware. For bytecode which is executed only a few times, this saves the compilation time and reduces the initial latency; for frequently executed bytecode, JIT compilation is used to run at high speed, after an initial phase of slow interpretation. Additionally, since a program spends most time executing a minority of its code, the reduced compilation time is significant. Finally, during the initial code interpretation, execution statistics can be collected before compilation, which helps to perform better optimization.
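
HotSpot exposes this trade-off directly through command-line flags, sketched below with MyApp as a placeholder class name:

    java -Xint MyApp                   # interpret everything; no JIT warm-up cost
    java -Xcomp MyApp                  # compile each method before its first use
    java -XX:+PrintCompilation MyApp   # default mixed mode; log methods as they are JITed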

The correct tradeoff can vary due to circumstances. For example, Sun's Java Virtual Machine has two major modes—client and server. In client mode, minimal compilation and optimization is performed, to reduce startup time. In server mode, extensive compilation and optimization is performed, to maximize performance once the application is running by sacrificing startup time. Other Java just-in-time compilers have used a runtime measurement of the number of times a method has executed combined with the bytecode size of a method as a heuristic to decide when to compile. Still another uses the number of times executed combined with the detection of loops. In general, it is much harder to accurately predict which methods to optimize in short-running applications than in long-running ones.

Native Image Generator (Ngen) by Microsoft is another approach to reducing the initial delay. Ngen pre-compiles (or "pre-JITs") bytecode in a Common Intermediate Language image into machine-native code. As a result, no runtime compilation is needed. .NET Framework 2.0, shipped with Visual Studio 2005, runs Ngen on all of the Microsoft library DLLs right after installation. Pre-jitting provides a way to improve startup time. However, the quality of code it generates might not be as good as JIT-compiled code, for the same reasons why statically compiled code, without profile-guided optimization, cannot be as good as JIT-compiled code in the extreme case: the lack of profiling data to drive, for instance, inline caching.

There also exist Java implementations that combine an AOT (ahead-of-time) compiler with either a JIT compiler (Excelsior JET) or interpreter (GNU Compiler for Java).

History

The earliest published JIT compiler is generally attributed to work on LISP by John McCarthy in 1960. In his seminal paper Recursive functions of symbolic expressions and their computation by machine, Part I, he mentions functions that are translated during runtime, thereby sparing the need to save the compiler output to punch cards (although this would be more accurately known as a "Compile and go system"). Another early example was by Ken Thompson, who in 1968 gave one of the first applications of regular expressions, here for pattern matching in the text editor QED. For speed, Thompson implemented regular expression matching by JITing to IBM 7094 code on the Compatible Time-Sharing System. An influential technique for deriving compiled code from interpretation was pioneered by Mitchell in 1970, which he implemented for the experimental language LC².

Smalltalk (c. 1983) pioneered new aspects of JIT compilation. For example, translation to machine code was done on demand, and the result was cached for later use. When memory became scarce, the system would delete some of this code and regenerate it when it was needed again. Sun's Self language improved these techniques extensively and was at one point the fastest Smalltalk system in the world, achieving up to half the speed of optimized C but with a fully object-oriented language.

Self was abandoned by Sun, but the research went into the Java language. The term "Just-in-time compilation" was borrowed from the manufacturing term "Just in time" and popularized by Java, with James Gosling using the term from 1993. Currently JITing is used by most implementations of the Java Virtual Machine, as HotSpot builds on, and extensively uses, this research base. 

The HP project Dynamo was an experimental JIT compiler where the 'bytecode' format and the machine code format were the same; the system turned PA-6000 machine code into PA-8000 machine code. Counterintuitively, this resulted in speedups, in some cases of 30%, since doing this permitted optimizations at the machine code level: for example, inlining code for better cache usage, optimizations of calls to dynamic libraries, and many other run-time optimizations which conventional compilers are not able to attempt.

On 30 March 2019 PHP announced that JIT was coming to PHP 8 in 2021.

Security

JIT compilation fundamentally uses executable data, and thus poses security challenges and possible exploits.

Implementation of JIT compilation consists of compiling source code or byte code to machine code and executing it. This is generally done directly in memory: the JIT compiler outputs the machine code directly into memory and immediately executes it, rather than outputting it to disk and then invoking the code as a separate program, as in usual ahead-of-time compilation. In modern architectures this runs into a problem due to executable space protection: arbitrary memory cannot be executed, as otherwise there is a potential security hole. Thus memory must be marked as executable; for security reasons this should be done after the code has been written to memory and the memory marked read-only, as writable/executable memory is a security hole. For instance, Firefox's JIT compiler for JavaScript introduced this protection in a release version with Firefox 46.

JIT spraying is a class of computer security exploits that use JIT compilation for heap spraying – the resulting memory is then executable, which allows an exploit if execution can be moved into the heap.

Runtime system

From Wikipedia, the free encyclopedia

In computer programming, a runtime system, also called runtime environment, primarily implements portions of an execution model. This is not to be confused with the runtime lifecycle phase of a program, during which the runtime system is in operation.

Most programming languages have some form of runtime system that provides an environment in which programs run. This environment may address a number of issues including the management of application memory, how the program accesses variables, mechanisms for passing parameters between procedures, interfacing with the operating system, and otherwise. The compiler makes assumptions depending on the specific runtime system to generate correct code. Typically the runtime system will have some responsibility for setting up and managing the stack and heap, and may include features such as garbage collection, threads or other dynamic features built into the language.

Overview

Every programming language specifies an execution model, and many implement at least part of that model in a runtime system. One possible definition of runtime system behavior is, among others, any behavior not directly attributable to the program itself. This definition includes, as part of the runtime system, things such as putting parameters onto the stack before a function call, the behavior of disk I/O, and parallel execution of related behaviors.

By this definition, essentially every language has a runtime system, including compiled languages, interpreted languages, and embedded domain-specific languages. Even API-invoked stand-alone execution models, such as Pthreads (POSIX threads), have a runtime system that implements the execution model's behavior.

Most scholarly papers on runtime systems focus on the implementation details of parallel runtime systems. A notable example of a parallel runtime system is that of Cilk, a popular parallel programming model. In addition, the proto-runtime toolkit was created to simplify the creation of parallel runtime systems.

In addition to the execution model behavior, a runtime system may also perform support services such as type checking, debugging, or code generation and optimization.

The runtime system is also the gateway by which a running program interacts with the runtime environment, which contains not only state values that are accessible during program execution, but also active entities that can be interacted with during program execution like disk drives and people via keyboards. For example, environment variables are features of many operating systems, and are part of the runtime environment; a running program can access them via the runtime system. Likewise, hardware devices such as a DVD drive are active entities that a program can interact with via a runtime system.

A unique application of a runtime environment (RTE) is within an operating system (OS) that only allows that RTE to run, meaning from boot until power-down the entire OS is dedicated to only the application(s) running within that RTE. Any other code that tries to run, or any failure in the application(s), breaks the RTE, which breaks the OS, stopping all processing and requiring a reboot. If the boot is from read-only memory, an extremely secure, simple, single-mission system is created.
Examples for such kind of directly bundled runtime systems include:
  • Between 1983 and 1984, Digital Research offered several of their business and education applications for the IBM PC on bootable floppy diskettes bundled with SpeedStart CP/M-86, a reduced version of CP/M-86, as the runtime environment.
  • Some stand-alone versions of Ventura Publisher (1986-1993), Artline (1988-1991), Timeworks Publisher (1988-1991) and ViewMAX (1990-1992) contained special runtime versions of Digital Research's GEM as their runtime environment.
  • In the late 1990s, JP Software's command line processor 4DOS was optionally available in a special runtime version to be linked with BATCOMP pre-compiled and encrypted batch jobs in order to create unmodifiable executables from batch scripts and run them on systems without 4DOS installed.

Examples

As a simple example of a basic runtime system, the runtime system of the C language is a particular set of instructions inserted into the executable image by the compiler. Among other things, these instructions manage the processor stack, create space for local variables, and copy function-call parameters onto the top of the stack. There are often no clear criteria for deciding which language behavior is considered inside the runtime system versus which behavior is part of the source program. For C, the setup of the stack is part of the runtime system, as opposed to part of the semantics of an individual program, because it maintains a global invariant that holds over all executions. This systematic behavior implements the execution model of the language, as opposed to implementing semantics of the particular program text which is directly translated into code that computes results.

One way to observe this separation between the semantics of a particular program and the runtime environment is to compile a program into an object file containing all the functions versus compiling an entire program to an executable binary. The object file will only contain assembly code relevant to those functions, while the executable binary will contain additional code used to implement the runtime environment. The object file, on one hand, may be missing information from the runtime environment that will be resolved by linking. On the other hand, the code in the object file still depends on assumptions in the runtime system; for example, a function may read parameters from a particular register or stack location, depending on the calling convention used by the runtime environment.

Another example is the case of using an application programming interface (API) to interact with a runtime system. The calls to that API look the same as calls to a regular software library, however at some point during the call the execution model changes. The runtime system implements an execution model different from that of the language the library is written in terms of. A person reading the code of a normal library would be able to understand the library's behavior by just knowing the language the library was written in. However, a person reading the code of the API that invokes a runtime system would not be able to understand the behavior of the API call just by knowing the language the call was written in. At some point, via some mechanism, the execution model stops being that of the language the call is written in and switches over to being the execution model implemented by the runtime system. For example, the trap instruction is one method of switching execution models. This difference is what distinguishes an API-invoked execution model, such as Pthreads, from a usual software library. Both Pthreads calls and software library calls are invoked via an API, but Pthreads behavior cannot be understood in terms of the language of the call. Rather, Pthreads calls bring into play an outside execution model, which is implemented by the Pthreads runtime system (this runtime system is often the OS kernel).

As an extreme example, the physical CPU itself can be viewed as an implementation of the runtime system of a specific assembly language. In this view, the execution model is implemented by the physical CPU and memory systems. As an analogy, runtime systems for higher-level languages are themselves implemented using some other languages. This creates a hierarchy of runtime systems, with the CPU itself—or actually its logic at the microcode layer or below—acting as the lowest-level runtime system.

Advanced features

Some compiled or interpreted languages provide an interface that allows application code to interact directly with the runtime system. An example is the Thread class in the Java language. The class allows code (that is animated by one thread) to do things such as start and stop other threads. Normally, core aspects of a language's behavior such as task scheduling and resource management are not accessible in this fashion. 
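
A minimal sketch of that interface (the worker's message is arbitrary):

    public class ThreadDemo {
        public static void main(String[] args) throws InterruptedException {
            // Application code asks the runtime system to start a second thread.
            Thread worker = new Thread(() -> System.out.println("worker running"));
            worker.start(); // scheduling is handled by the runtime/OS, not here
            worker.join();  // block until the runtime reports the thread finished
        }
    }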

Higher-level behaviors implemented by a runtime system may include tasks such as drawing text on the screen or making an Internet connection. It is often the case that operating systems provide these kinds of behaviors as well, and when available, the runtime system is implemented as an abstraction layer that translates the invocation of the runtime system into an invocation of the operating system. This hides the complexity or variations in the services offered by different operating systems. This also implies that the OS kernel can itself be viewed as a runtime system, and that the set of OS calls that invoke OS behaviors may be viewed as interactions with a runtime system. 

In the limit, the runtime system may provide services such as a P-code machine or virtual machine, that hide even the processor's instruction set. This is the approach followed by many interpreted languages such as AWK, and some languages like Java, which are meant to be compiled into some machine-independent intermediate representation code (such as bytecode). This arrangement simplifies the task of language implementation and its adaptation to different machines, and improves efficiency of sophisticated language features such as reflection. It also allows the same program to be executed on any machine without an explicit recompiling step, a feature that has become very important since the proliferation of the World Wide Web. To speed up execution, some runtime systems feature just-in-time compilation to machine code. 

A modern aspect of runtime systems is parallel execution behaviors, such as the behaviors exhibited by mutex constructs in Pthreads and parallel section constructs in OpenMP. A runtime system with such parallel execution behaviors may be modularized according to the proto-runtime approach.

History

Notable early examples of runtime systems are the interpreters for BASIC and Lisp. These environments also included a garbage collector. Forth is an early example of a language designed to be compiled into intermediate representation code; its runtime system was a virtual machine that interpreted that code. Another popular, if theoretical, example is Donald Knuth's MIX computer. 

In C and later languages that supported dynamic memory allocation, the runtime system also included a library that managed the program's memory pool.

In the object-oriented programming languages, the runtime system was often also responsible for dynamic type checking and resolving method references.

Abstract data type

From Wikipedia, the free encyclopedia

In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user.

Formally, an ADT may be defined as a "class of objects whose logical behavior is defined by a set of values and a set of operations"; this is analogous to an algebraic structure in mathematics. What is meant by "behavior" varies by author, with the two main types of formal specifications for behavior being axiomatic (algebraic) specification and an abstract model; these correspond to axiomatic semantics and operational semantics of an abstract machine, respectively. Some authors also include the computational complexity ("cost"), both in terms of time (for computing operations) and space (for representing values). In practice many common data types are not ADTs, as the abstraction is not perfect, and users must be aware of issues like arithmetic overflow that are due to the representation. For example, integers are often stored as fixed-width values (32-bit or 64-bit binary numbers), and thus experience integer overflow if the maximum value is exceeded.

ADTs are a theoretical concept, in computer science, used in the design and analysis of algorithms, data structures, and software systems, and do not correspond to specific features of computer languages—mainstream computer languages do not directly support formally specified ADTs. However, various language features correspond to certain aspects of ADTs, and are easily confused with ADTs proper; these include abstract types, opaque data types, protocols, and design by contract. ADTs were first proposed by Barbara Liskov and Stephen N. Zilles in 1974, as part of the development of the CLU language.

Examples

For example, integers are an ADT, defined as the values ..., −2, −1, 0, 1, 2, ..., and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc., which behave according to familiar mathematics (with care for integer division), independently of how the integers are represented by the computer. Explicitly, "behavior" includes obeying various axioms (associativity and commutativity of addition, etc.) and preconditions on operations (one cannot divide by zero). Typically integers are represented in a data structure as binary numbers, most often as two's complement, though they might be binary-coded decimal or ones' complement; the user is abstracted from the concrete choice of representation and can simply use the data as data types.

An ADT consists not only of operations but also of values of the underlying data and of constraints on the operations. An "interface" typically refers only to the operations, and perhaps some of the constraints on the operations, notably pre-conditions and post-conditions, but not other constraints, such as relations between the operations.

For example, an abstract stack, which is a last-in-first-out structure, could be defined by three operations: push, that inserts a data item onto the stack; pop, that removes a data item from it; and peek or top, that accesses a data item on top of the stack without removal. An abstract queue, which is a first-in-first-out structure, would also have three operations: enqueue, that inserts a data item into the queue; dequeue, that removes the first data item from it; and front, that accesses and serves the first data item in the queue. There would be no way of differentiating these two data types, unless a mathematical constraint is introduced that for a stack specifies that each pop always returns the most recently pushed item that has not been popped yet. When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many data items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.
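
Written as Java interfaces (a sketch, with names chosen for this example), the two sets of operations are structurally identical, which is exactly why a mathematical constraint is needed to tell the types apart:

    // Operations only: the LIFO constraint lives outside these signatures.
    interface AbstractStack<T> {
        void push(T x); // insert a data item
        T pop();        // remove and return the most recently pushed item
        T peek();       // return that item without removing it
    }

    // Structurally the same shape; only the FIFO constraint differs.
    interface AbstractQueue<T> {
        void enqueue(T x); // insert a data item
        T dequeue();       // remove and return the first item
        T front();         // return the first item without removing it
    }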

Introduction

Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many programming languages; or described in a formal specification language. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs. 

The term abstract data type can also be regarded as a generalized approach of a number of algebraic structures, such as lattices, groups, and rings. The notion of abstract data types is related to the concept of data abstraction, important in object-oriented programming and design by contract methodologies for software development.

Defining an abstract data type

An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects. There are no standard conventions for defining them. A broad division may be drawn between "imperative" and "functional" definition styles.

Imperative-style definition

In the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable—meaning that it may be in different states at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times—just like the instructions of a computer, or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms. (See The Art of Computer Programming by Donald Knuth for more details)

Abstract variable

Imperative-style definitions of ADT often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:
  • store(V, x) where x is a value of unspecified nature;
  • fetch(V), that yields a value,
with the constraint that
  • fetch(V) always returns the value x used in the most recent store(V, x) operation on the same variable V.
As in many programming languages, the operation store(V, x) is often written V ← x (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, V ← V + 1 is commonly understood to be a shorthand for store(V, fetch(V) + 1).
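
A direct Java model of this contract might look as follows (the class is invented for this example):

    // A mutable cell with exactly the store/fetch behavior described above.
    final class AbstractVariable<X> {
        private X value;

        void store(X x) { value = x; } // V ← x
        X fetch() { return value; }    // yields the x most recently stored
    }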

In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that

  • if U and V are distinct variables, the sequence { store(U, x); store(V, y) } is equivalent to { store(V, y); store(U, x) }.
More generally, ADT definitions often assume that any operation that changes the state of one ADT instance has no effect on the state of any other instance (including other instances of the same ADT) — unless the ADT axioms imply that the two instances are connected (aliased) in that sense. For example, when extending the definition of abstract variable to include abstract records, the operation that selects a field from a record variable R must yield a variable V that is aliased to that part of R.

The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability.

Note that this definition does not imply anything about the result of evaluating fetch(V) when V is un-initialized, that is, before performing any store operation on V. An algorithm that does so is usually considered invalid, because its effect is not defined. (However, there are some important algorithms whose efficiency strongly depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.)

Instance creation

Some algorithms need to create new instances of some ADT (such as new variables, or new stacks). To describe such algorithms, one usually includes in the ADT definition a create() operation that yields an instance of the ADT, usually with axioms equivalent to
  • the result of create() is distinct from any instance in use by the algorithm.
This axiom may be strengthened to also exclude partial aliasing with other instances. On the other hand, it still allows implementations of create() to yield a previously created instance that has become inaccessible to the program.
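
In code, the distinctness axiom falls out of ordinary object allocation; the sketch below, using a hypothetical placeholder class, simply checks that create() never aliases an instance already in use:

    class Instance:
        """Placeholder for any ADT instance (hypothetical)."""
        pass

    def create():
        # Each call allocates a fresh object, so the result is distinct
        # from (not aliased to) every instance the algorithm already holds.
        return Instance()

    U = create()
    V = create()
    assert U is not V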

Example: abstract stack (imperative)

As another example, an imperative-style definition of an abstract stack could specify that the state of a stack S can be modified only by the operations
  • push(S, x), where x is some value of unspecified nature;
  • pop(S), that yields a value as a result,
with the constraint that
  • For any value x and any abstract variable V, the sequence of operations { push(S, x); V ← pop(S) } is equivalent to V ← x.
Since the assignment V ← x, by definition, cannot change the state of S, this condition implies that V ← pop(S) restores S to the state it had before the push(S, x). From this condition and from the properties of abstract variables, it follows, for example, that the sequence
{ push(S, x); push(S, y); U ← pop(S); push(S, z); V ← pop(S); W ← pop(S) }
where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to
{ U ← y; V ← z; W ← x }
Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is,
  • For any values x, y, and any distinct stacks S and T, the sequence { push(S, x); push(T, y) } is equivalent to { push(T, y); push(S, x) }.
An abstract stack definition usually also includes a Boolean-valued function empty(S) and a create() operation that returns a stack instance, with axioms equivalent to
  • create() ≠ S for any prior stack S (a newly created stack is distinct from all previous stacks);
  • empty(create()) (a newly created stack is empty);
  • not empty(push(S, x)) (pushing something into a stack makes it non-empty).
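
A compact sketch of this imperative definition in Python follows; the list-based representation is only one possible implementation, and the final assertions replay the equivalence derived above:

    class Stack:
        """Imperative abstract stack: operations mutate the state of S."""

        def __init__(self):        # create(): a new stack, distinct and empty
            self._items = []

        def push(self, x):         # push(S, x)
            self._items.append(x)

        def pop(self):             # pop(S): yields a value
            return self._items.pop()

        def empty(self):           # empty(S): Boolean-valued
            return not self._items

    S = Stack()
    S.push('x'); S.push('y')
    U = S.pop()                    # U <- pop(S), so U = 'y'
    S.push('z')
    V = S.pop()                    # V <- pop(S), so V = 'z'
    W = S.pop()                    # W <- pop(S), so W = 'x'
    assert (U, V, W) == ('y', 'z', 'x')
    assert S.empty()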

Single-instance style

Sometimes an ADT is defined as if only one instance of it existed during the execution of the algorithm, and all operations were applied to that instance, which is not explicitly notated. For example, the abstract stack above could have been defined with operations push(x) and pop(), that operate on the only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like S in the previous example) to every operation that uses or modifies the implicit instance.

On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For an example, consider augmenting the definition of the abstract stack with an operation compare(S, T) that checks whether the stacks S and T contain the same items in the same order.

Functional-style definition

Another way to define an ADT, closer to the spirit of functional programming, is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modeled as a mathematical function that takes the old state as an argument, and returns the new state as part of the result. Unlike the imperative operations, these functions have no side effects. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states).

In the functional view, in particular, there is no way (or need) to define an "abstract variable" with the semantics of imperative variables (namely, with fetch and store operations). Instead of storing values into variables, one passes them as arguments to functions.

Example: abstract stack (functional)

For example, a complete functional-style definition of an abstract stack could use the three operations:
  • push: takes a stack state and an arbitrary value, returns a stack state;
  • top: takes a stack state, returns a value;
  • pop: takes a stack state, returns a stack state;
with the axioms
  • top(push(s, x)) = x (the value pushed last is on top);
  • pop(push(s, x)) = s (pop undoes the effect of the corresponding push).
In a functional-style definition there is no need for a create operation. Indeed, there is no notion of a "stack instance". The stack states can be thought of as potential states of a single stack structure, and two stack states that contain the same values in the same order are considered to be identical states. This view actually mirrors the behavior of some concrete implementations, such as linked lists with hash consing.

Instead of create(), a functional-style definition of an abstract stack may assume the existence of a special stack state, the empty stack, designated by a special symbol like Λ or "()"; or define a bottom() operation that takes no arguments and returns this special stack state. Note that the axioms imply that
  • push(Λ, x) ≠ Λ.
In a functional-style definition of a stack one does not need an empty predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ. 

Note that these axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined (hence invalid) when s = Λ. On the other hand, the axioms (and the lack of side effects) imply that push(s, x) = push(t, y) if and only if x = y and s = t.
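
The following functional-style sketch represents each stack state as a nested pair, with Λ as the empty tuple; raising an exception for top(Λ) and pop(Λ) is one choice among several, since the axioms leave those cases undefined:

    EMPTY = ()                     # the special state, Λ

    def push(s, x):
        return (x, s)              # returns a new state; s is not modified

    def top(s):
        if s == EMPTY:
            raise ValueError("top is undefined on the empty stack")
        return s[0]                # axiom: top(push(s, x)) = x

    def pop(s):
        if s == EMPTY:
            raise ValueError("pop is undefined on the empty stack")
        return s[1]                # axiom: pop(push(s, x)) = s

    s = push(push(EMPTY, 'a'), 'b')
    assert top(s) == 'b'
    assert pop(s) == push(EMPTY, 'a')        # same values, same order: identical states
    assert s == push(push(EMPTY, 'a'), 'b')  # no side effects on s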

As in some other branches of mathematics, it is also customary to assume that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the abstract stack example above, this rule means that every stack is a finite sequence of values that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be popped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s, x) = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".

Whether to include complexity

Aside from the behavior in terms of axioms, it is also possible to include, in the definition of an ADT operation, its algorithmic complexity. Alexander Stepanov, designer of the C++ Standard Template Library, included complexity guarantees in the STL specification, arguing:
The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.
— Alexander Stepanov
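
One lightweight way to act on this advice is to state the complexity guarantees in the interface documentation itself; the bounds below are illustrative assumptions for a list-backed stack, not requirements imposed by the ADT:

    class BoundedCostStack:
        """Stack whose interface promises complexity, not just behavior.

        push(x): amortized O(1)
        pop():   O(1)
        empty(): O(1)

        A replacement implementation must honor these bounds as well as
        the functional axioms to be truly interchangeable.
        """

        def __init__(self):
            self._items = []

        def push(self, x):
            self._items.append(x)       # amortized O(1) on Python lists

        def pop(self):
            return self._items.pop()    # O(1)

        def empty(self):
            return not self._items      # O(1)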

Advantages of abstract data typing

Encapsulation

Abstraction provides a promise that any implementation of the ADT has certain properties and abilities; knowing these is all that is required to make use of an ADT object. The user does not need any technical knowledge of how the implementation works to use the ADT. In this way, the implementation may be complex but will be encapsulated in a simple interface when it is actually used. 

Localization of change

Code that uses an ADT object will not need to be edited if the implementation of the ADT is changed. Since any changes to the implementation must still comply with the interface, and since code using an ADT object may only refer to properties and abilities specified in the interface, changes may be made to the implementation without requiring any changes in code where the ADT is used.

Flexibility

Different implementations of the ADT, having all the same properties and abilities, are equivalent and may be used somewhat interchangeably in code that uses the ADT. This gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of the ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency.
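
As an illustration, two implementations of the same stack interface can be swapped freely under client code that relies only on that interface; the class and function names here are hypothetical:

    from collections import deque

    class ListStack:
        def __init__(self): self._d = []
        def push(self, x): self._d.append(x)
        def pop(self): return self._d.pop()
        def empty(self): return not self._d

    class DequeStack:
        def __init__(self): self._d = deque()
        def push(self, x): self._d.append(x)
        def pop(self): return self._d.pop()
        def empty(self): return not self._d

    def reverse(items, stack):
        # Client code: uses only push/pop, so either implementation will do.
        for item in items:
            stack.push(item)
        return [stack.pop() for _ in items]

    assert reverse([1, 2, 3], ListStack()) == [3, 2, 1]
    assert reverse([1, 2, 3], DequeStack()) == [3, 2, 1]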

Typical operations

Some operations that are often specified for ADTs (possibly under other names) are
  • compare(s, t), that tests whether two instances' states are equivalent in some sense;
  • hash(s), that computes some standard hash function from the instance's state;
  • print(s) or show(s), that produces a human-readable representation of the instance's state.
In imperative-style ADT definitions, one often finds also
  • create(), that yields a new instance of the ADT;
  • initialize(s), that prepares a newly created instance s for further operations, or resets it to some "initial state";
  • copy(s, t), that puts instance s in a state equivalent to that of t;
  • clone(t), that performs s ← create(), copy(s, t), and returns s;
  • free(s) or destroy(s), that reclaims the memory and other resources used by s.
The free operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by free.
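
In Python, several of these typical operations map onto special methods, as the sketch below suggests; the Counter class is a hypothetical stand-in, and free(s) has no direct analogue because memory is reclaimed by the garbage collector:

    import copy

    class Counter:
        """Toy ADT used only to illustrate the typical operations."""

        def __init__(self):          # create() followed by initialize(s)
            self.n = 0

        def __eq__(self, other):     # compare(s, t)
            return isinstance(other, Counter) and self.n == other.n

        def __hash__(self):          # hash(s)
            return hash(self.n)

        def __repr__(self):          # print(s) / show(s)
            return f"Counter({self.n})"

    t = Counter()
    s = copy.copy(t)                 # clone(t): s <- create(); copy(s, t)
    assert s == t and hash(s) == hash(t)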
