
Thursday, April 23, 2026

Unified Modeling Language

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Unified_Modeling_Language

The Unified Modeling Language (UML) is a general-purpose, object-oriented, visual modeling language that provides a way to visualize the architecture and design of a system, similar to the function of a blueprint. UML defines notation for many types of diagrams which focus on aspects such as behavior, interaction, and structure.

UML is both a formal metamodel and a collection of graphical templates. The metamodel defines the elements in an object-oriented model such as classes and properties. It is essentially the same thing as the metamodel in object-oriented programming (OOP); in OOP, however, the metamodel is primarily used at run time to dynamically inspect and modify an application object model. The UML metamodel provides a mathematical, formal foundation for the graphic views used in the modeling language to describe an emerging system.

UML was created in an attempt to define a standard language for object-oriented programming at the OOPSLA '95 Conference. Originally, Grady Booch and James Rumbaugh merged their models into a unified model. This was followed by Booch's company Rational Software purchasing Ivar Jacobson's Objectory company and merging their model into the UML. At the time Rational and Objectory were two of the dominant players in the small world of independent vendors of object-oriented tools and methods. The Object Management Group (OMG) then took ownership of UML.

The creation of UML was motivated by the desire to standardize the disparate nature of notational systems and approaches to software design at the time. In 1997, UML was adopted as a standard by the Object Management Group (OMG) and has been managed by this organization ever since. In 2005, UML was also published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) as the ISO/IEC 19501 standard. Since then the standard has been periodically revised to cover the latest revision of UML.

Most developers do not use UML per se, but instead produce more informal diagrams, often hand-drawn. These diagrams, however, often include elements from UML.

Use

UML is primarily used for software development (in any industry or domain) but is also used elsewhere, including for business processes, system functions, database schemas, workflow in legal systems, medical electronics, health care systems, and hardware design.

UML is designed for use with many object-oriented software development methods, both those in use today and those current when it was first developed – including OMT, Booch method, Objectory, and especially RUP, which it was originally intended to be used with when work began at Rational Software. Although originally intended for object-oriented design documentation, UML has been used effectively in other contexts such as modeling business processes.

As UML is not inherently linked to a particular programming language, it can be used for modeling a system independent of language. Some UML tools generate source code from a UML model.

Elements

Components in a travel reservation system

UML diagrams support visualizing system aspects like:

In addition to syntactical (notational) elements with well-defined semantics, UML diagrams also allow for free-form comments (notes) that explain aspects such as usage, constraints, and intents.

Sharing

UML models can be exchanged among UML tools via the XML Metadata Interchange (XMI) format.

Cardinality notation

As with database Chen, Bachman, and ISO ER diagrams, class models are specified to use "look-across" cardinalities, even though several authors (Merise, Elmasri & Navathe, amongst others) prefer same-side or "look-here" for roles and both minimum and maximum cardinalities. Recent researchers (Feinerer and Dullea et al.) have shown that the "look-across" technique used by UML and ER diagrams is less effective and less coherent when applied to n-ary relationships of order strictly greater than 2.

Feinerer says: "Problems arise if we operate under the look-across semantics as used for UML associations. Hartmann investigates this situation and shows how and why different transformations fail.", and: "As we will see on the next few pages, the look-across interpretation introduces several difficulties which prevent the extension of simple mechanisms from binary to n-ary associations."

Artifacts

Artifact manifesting components

An artifact is the "specification of a physical piece of information that is used or produced by a software development process, or by deployment and operation of a system" including models, source code, scripts, executables, tables in database systems, development deliverables, design documents, and email messages.

An artifact is the physical entity that is deployed to a node. Other UML elements such as classes and components are first manifested in artifacts, and instances of these artifacts are then deployed. Artifacts can be composed of other artifacts.

Metamodeling

Illustration of the Meta-Object Facility

The OMG developed a metamodeling architecture to define UML, called the Meta-Object Facility (MOF). MOF is designed as a four-layered architecture, as shown in the image at right. It provides a meta-meta model at the top, called the M3 layer. This M3-model is the language used by Meta-Object Facility to build metamodels, called M2-models.

The most prominent example of a Layer 2 Meta-Object Facility model is the UML metamodel, which describes UML itself. These M2-models describe elements of the M1-layer, and thus M1-models. These would be, for example, models written in UML. The last layer is the M0-layer or data layer. It is used to describe runtime instances of the system.
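The layering can be loosely illustrated with Python's own object model. This is only an analogy, not an implementation of MOF: an instance stands in for M0 data, its class for an M1 model element, and the built-in `type` for the metamodel that describes classes. The `Customer` class and the variable names are hypothetical.

```python
class Customer:
    """M1 analogue: a model element describing a kind of runtime object."""

    def __init__(self, name):
        self.name = name


alice = Customer("Alice")             # M0 analogue: a runtime instance

model_element = type(alice)           # the class that describes the instance
metamodel_element = type(Customer)    # the metaclass that describes the class
top_layer = type(metamodel_element)   # `type` describes itself, closing the stack

print(model_element.__name__, metamodel_element.__name__, top_layer.__name__)
```

In this analogy the top of the stack is self-describing, which is why the last two names printed are the same.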

The metamodel can be extended using a mechanism called stereotyping. This has been criticized as being insufficient/untenable by Brian Henderson-Sellers and Cesar Gonzalez-Perez in "Uses and Abuses of the Stereotype Mechanism in UML 1.x and 2.0".

Diagrams

UML 2 defines many types of diagrams – shown as a taxonomy in the image.

Hierarchy of UML 2.2 Diagrams, shown as a class diagram

Structure diagrams

Structure diagrams emphasize the structure of the system – using objects, classifiers, relationships, attributes and operations. They are used to document software architecture.

Behavior diagrams

Behavior diagrams emphasize the behavior of a system by showing collaborations among objects and changes to the internal states of objects. They are used to describe the functionality of a system.

Interaction diagrams

Interaction diagrams, a subset of behavior diagrams, emphasize the flow of control and data between components of a system.

Examples

Adoption

As of 2013, UML had been marketed by OMG for many contexts, but it was aimed primarily at software development, with limited success.

It has been treated, at times, as a design silver bullet, which leads to problems. UML misuse includes overuse (designing every part of the system with it, which is unnecessary) and assuming that novices can design with it.

It is considered a large language, with many constructs. Some people (including Jacobson) feel that UML's size hinders learning and therefore uptake.

Visual Studio removed support for UML in 2016 due to lack of use.

History

Timeline and relationships of object-oriented methods and notation

UML has evolved since the second half of the 1990s and has its roots in the object-oriented programming methods developed in the late 1980s and early 1990s. The image shows a timeline of the history of UML and other object-oriented modeling methods and notation.

Origin

Rational Software hired James Rumbaugh from General Electric in 1994 and after that, the company became the source for two of the most popular object-oriented modeling approaches of the day: Rumbaugh's object-modeling technique (OMT) and Grady Booch's method. They were soon assisted in their efforts by Ivar Jacobson, the creator of the object-oriented software engineering (OOSE) method, who joined them at Rational in 1995.

UML 1.x

UML is originally based on the notations of the Booch method, the object-modeling technique (OMT), and object-oriented software engineering (OOSE), which were integrated into a single language. UML was developed at Rational Software in 1994–1995, with further development led by them through 1996.

Under the technical leadership of Rumbaugh, Jacobson, and Booch, a consortium called the UML Partners was organized in 1996 to complete the Unified Modeling Language (UML) specification and propose it to the Object Management Group (OMG) for standardization. The partnership also contained additional interested parties (for example HP, DEC, IBM, and Microsoft). The UML Partners' UML 1.0 draft was proposed to the OMG in January 1997 by the consortium. During the same month, the UML Partners formed a group, designed to define the exact meaning of language constructs, chaired by Cris Kobryn and administered by Ed Eykholt, to finalize the specification and integrate it with other standardization efforts. The result of this work, UML 1.1, was submitted to the OMG in August 1997 and adopted by the OMG in November 1997.

After the first release, a task force was formed to improve the language, which released several minor revisions, 1.3, 1.4, and 1.5.

The standards it produced (as well as the original standard) have been noted as being ambiguous and inconsistent.

UML 2

The UML 2.0 major revision replaced version 1.5 in 2005. It was developed with an enlarged consortium to improve the language further and to reflect new experience with the usage of its features.

Although UML 2.1 was never released as a formal specification, versions 2.1.1 and 2.1.2 appeared in 2007, followed by UML 2.2 in February 2009. UML 2.3 was formally released in May 2010. UML 2.4.1 was formally released in August 2011. UML 2.5 was released in October 2012 as an "In progress" version and was officially released in June 2015. The formal version 2.5.1 was adopted in December 2017.

There are four parts to the UML 2.x specification:

  • The Superstructure that defines the notation and semantics for diagrams and their model elements
  • The Infrastructure that defines the core metamodel on which the Superstructure is based
  • The Object Constraint Language (OCL) for defining rules for model elements
  • The UML Diagram Interchange that defines how UML 2 diagram layouts are exchanged

Until UML 2.4.1, the latest versions of these standards were:

  • UML Superstructure version 2.4.1
  • UML Infrastructure version 2.4.1
  • OCL version 2.3.1
  • UML Diagram Interchange version 1.0.

Since version 2.5, the UML Specification has been simplified (without Superstructure and Infrastructure), and the latest versions of these standards are now:

  • UML Specification 2.5.1
  • OCL version 2.4

It continues to be updated and improved by the revision task force, who resolve any issues with the language.

Object-oriented programming

From Wikipedia, the free encyclopedia
UML notation for a class. This Button class has variables for data, and functions. Through inheritance, a subclass can be created as a subset of the Button class. Objects are instances of a class.

Object-oriented programming (OOP) is a programming paradigm based on objects – software entities that encapsulate data and function(s). An OOP computer program consists of objects that interact with one another. An OOP language is one that provides object-oriented programming features, but as the set of features that contribute to OOP is contested, classifying a language as OOP – and the degree to which it supports OOP – is debatable. As paradigms are not mutually exclusive, a language can be multi-paradigm (i.e. categorized as more than only OOP).

Notable languages with OOP support include Ada, ActionScript, C++, Common Lisp, C#, Dart, Eiffel, Fortran 2003, Haxe, Java, JavaScript, Kotlin, Logo, MATLAB, Objective-C, Object Pascal, Perl, PHP, Python, R, Raku, Ruby, Scala, SIMSCRIPT, Simula, Smalltalk, Swift, Vala and Visual Basic (.NET).

History

The idea of "objects" in programming began with the artificial intelligence group at Massachusetts Institute of Technology (MIT) in the late 1950s and early 1960s. Here, "object" referred to LISP atoms with identified properties (attributes). Another early example was Sketchpad created by Ivan Sutherland at MIT in 1960–1961. In the glossary of his technical report, Sutherland defined terms like "object" and "instance" (with the class concept covered by "master" or "definition"), albeit specialized to graphical interaction. Later, in 1968, AED-0, MIT's version of the ALGOL programming language, connected data structures ("plexes") and procedures, prefiguring what were later termed "messages", "methods", and "member functions". Topics such as data abstraction and modular programming were common points of discussion at this time.

Meanwhile, in Norway, Simula was developed during the years 1961–1967. Simula introduced essential object-oriented ideas, such as classes, inheritance, and dynamic binding. Simula was used mainly by researchers involved with physical modelling, like the movement of ships and their content through cargo ports. Simula is generally accepted as being the first language with the primary features and framework of an object-oriented language.

I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages (so messaging came at the very beginning – it took a while to see how to do messaging in a programming language efficiently enough to be useful).

— Alan Kay,

Influenced by both MIT and Simula, Alan Kay began developing his own ideas in November 1966. He would go on to create Smalltalk, an influential OOP language. By 1967, Kay was already using the term "object-oriented programming" in conversation. Although sometimes called the "father" of OOP, Kay has said his ideas differ from how OOP is commonly understood, and has implied that the computer science establishment did not adopt his notion. A 1976 MIT memo co-authored by Barbara Liskov lists Simula 67, CLU, and Alphard as object-oriented languages, but does not mention Smalltalk.

In the 1970s, the first version of the Smalltalk programming language was developed at Xerox PARC by Alan Kay, Dan Ingalls and Adele Goldberg. Smalltalk-72 was notable for use of objects at the language level and its graphical development environment. Smalltalk was a fully dynamic system, allowing users to create and modify classes as they worked. Much of the theory of OOP was developed in the context of Smalltalk, for example multiple inheritance.

In the late 1970s and 1980s, OOP rose to prominence. The Flavors object-oriented Lisp was developed starting 1979, introducing multiple inheritance and mixins. In August 1981, Byte Magazine highlighted Smalltalk and OOP, introducing these ideas to a wide audience. LOOPS, the object system for Interlisp-D, was influenced by Smalltalk and Flavors, and a paper about it was published in 1982. In 1986, the first Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) was attended by 1,000 people. This conference marked the start of efforts to consolidate Lisp object systems, eventually resulting in the Common Lisp Object System. In the 1980s, there were a few attempts to design processor architectures that included hardware support for objects in memory, but these were not successful. Examples include the Intel iAPX 432 and the Linn Smart Rekursiv.

In the mid-1980s, new object-oriented languages like Objective-C, C++, and the Eiffel language emerged. Objective-C was developed by Brad Cox, who had used Smalltalk at ITT Inc. Bjarne Stroustrup created C++ based on his experience using Simula for his PhD thesis. Bertrand Meyer produced the first design of the Eiffel language in 1985, which focused on software quality using a design by contract approach.

In the 1990s, OOP became the main way of programming, especially as more languages supported it. These included Visual FoxPro 3.0, C++, and Delphi. OOP became even more popular with the rise of graphical user interfaces, which used objects for buttons, menus and other elements. One well-known example is Apple's Cocoa framework, used on macOS and written in Objective-C. OOP toolkits also enhanced the popularity of event-driven programming.

At ETH Zürich, Niklaus Wirth and his colleagues created new approaches to OOP. Modula-2 (1978) and Oberon (1987) included a distinctive approach to object orientation, classes, and type checking across module boundaries. Inheritance is not obvious in Wirth's design since his nomenclature looks in the opposite direction: it is called type extension, and the viewpoint is from the parent down to the inheritor.

Many programming languages that were initially developed before OOP was popular have been augmented with object-oriented features, including Ada, BASIC, Fortran, Pascal, and COBOL.

Features

The OOP features provided by languages vary. Below are some common features of OOP languages. Comparing OOP with other styles, like relational programming, is difficult because there isn't a clear, agreed-upon definition of OOP.

Encapsulation and information hiding

Information hiding and encapsulation can refer to several related concepts:

  • Cohesion, keeping related fields and methods together. A field (a.k.a. attribute or property) contains information (a.k.a. state) as a variable. A method (a.k.a. function or action) defines behavior via logic code.
  • Decoupling, organizing code so that only certain parts of the data are used by related functions. Decoupling makes it easier to change how an object works on the inside without affecting other parts of the codebase, such as in code refactoring. Objects act as a boundary between their internal workings and external, consuming code.
  • Data hiding, keeping the internal details of an object hidden from outside code. Consuming code can only interact with an object via its public members, due to the language providing access modifiers that control visibility.

Some programming languages, like Java, provide information hiding via visibility keywords (private and public). Some languages like Python don't provide a visibility feature, but developers might follow a convention such as starting a private member name with an underscore. Intermediate levels of access also exist, such as Java's protected keyword (which allows access from the same class and its subclasses, but not objects of a different class), and the internal keyword in C#, Swift, and Kotlin, which restricts access to files within the same module.
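The underscore convention can be sketched as follows. The `Account` class and its members are hypothetical. Python does not enforce the privacy of `_balance`, so the read-only property is a convention-based boundary, not a guarantee.

```python
class Account:
    """Hypothetical class showing convention-based data hiding in Python."""

    def __init__(self, opening_balance):
        # Leading underscore: private by convention, not enforced by the language.
        self._balance = opening_balance

    @property
    def balance(self):
        """Read-only public view of the internal state."""
        return self._balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount


account = Account(100)
account.deposit(50)
print(account.balance)  # 150
```

Because `balance` has no setter, assigning to `account.balance` raises `AttributeError`, so consuming code is steered toward `deposit`.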

Supporters of information hiding and data abstraction say it makes code easier to reuse and intuitively represents real-world situations. However, others argue that OOP does not enhance readability or modularity. Eric S. Raymond has written that OOP languages tend to encourage thickly layered programs that destroy transparency. Raymond compares this unfavourably to the approach taken with Unix and the C language.

SOLID includes the open/closed principle, which says that classes and functions should be "open for extension, but closed for modification". Luca Cardelli has stated that OOP languages have "extremely poor modularity properties with respect to class extension and modification", and tend to be extremely complex. The latter point is reiterated by Joe Armstrong, the principal inventor of Erlang, who is quoted as saying:

The problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.

Leo Brodie says that information hiding can lead to duplicate code, which goes against the don't repeat yourself rule of software development.

Inheritance

Inheritance can be supported via the class or the prototype, which have differences but use similar terms like object and instance.

Class-based

In class-based programming, the most common type of OOP, an object is an instance of a class. The class defines the data (variables) and methods (logic). An object is created via the constructor. Every instance of the class has the same set of variables and methods. Elements may include:

  • Class variable – belongs to the class itself; all objects of the class share one copy
  • Instance variable – belongs to an object; every object has its own version of these variables
  • Member variable – refers to both the class and instance variables of a class
  • Class method – can only use class variables
  • Instance method – belongs to an object; can use both instance and class variables
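As a minimal sketch of the elements listed above (the `Employee` class and its members are hypothetical):

```python
class Employee:
    headcount = 0  # class variable: one copy shared by all instances

    def __init__(self, name):
        self.name = name          # instance variable: one per object
        Employee.headcount += 1

    @classmethod
    def total(cls):
        """Class method: works with class-level state only."""
        return cls.headcount

    def badge(self):
        """Instance method: can read both instance and class variables."""
        return f"{self.name} (1 of {Employee.headcount})"


ann = Employee("Ann")
bob = Employee("Bob")
print(Employee.total())  # 2
print(ann.badge())       # Ann (1 of 2)
```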

Classes may inherit from other classes, creating a hierarchy of classes: a case of a subclass inheriting from a super-class. For example, an Employee class might inherit from a Person class which endows the Employee object with the variables from Person. The subclass may add variables and methods that do not affect the super-class. Most languages also allow the subclass to override super-class methods. Some languages support multiple inheritance, where a class can inherit from more than one class, and other languages similarly support mixins or traits. For example, a mixin called UnicodeConversionMixin might add a method unicode_to_ascii() to both a FileReader and a WebPageScraper class.
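The `Employee`/`Person` and mixin examples from this paragraph might look like the following in Python. The attribute names (`position`, `text`) and the conversion strategy (dropping non-ASCII characters) are assumptions made for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"Person {self.name}"


class Employee(Person):
    def __init__(self, name, position):
        super().__init__(name)        # reuse the super-class initializer
        self.position = position      # variable added by the subclass

    def describe(self):               # overrides Person.describe
        return f"{self.name}, {self.position}"


class UnicodeConversionMixin:
    """Mixin: contributes one method and expects a `text` attribute."""

    def unicode_to_ascii(self):
        # Assumed behavior: silently drop characters outside ASCII.
        return self.text.encode("ascii", errors="ignore").decode("ascii")


class FileReader(UnicodeConversionMixin):
    def __init__(self, text):
        self.text = text


class WebPageScraper(UnicodeConversionMixin):
    def __init__(self, text):
        self.text = text


print(Employee("Ada", "engineer").describe())       # Ada, engineer
print(FileReader("caf\u00e9").unicode_to_ascii())   # caf
```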

An abstract class cannot be directly instantiated as an object. It is only used as a super-class.
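In Python, one way to express this is the standard `abc` module. The `Animal` and `Dog` classes are hypothetical.

```python
from abc import ABC, abstractmethod


class Animal(ABC):
    @abstractmethod
    def sound(self):
        """Subclasses must provide an implementation."""


class Dog(Animal):
    def sound(self):
        return "woof"


try:
    Animal()                      # abstract: instantiation is refused
except TypeError:
    print("Animal cannot be instantiated directly")

print(Dog().sound())  # woof
```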

Other classes are utility classes which contain only class variables and methods and are not meant to be instantiated or subclassed.

Prototype-based

Instead of providing a class concept, in prototype-based programming, an object is linked to another object, called its prototype or parent. In Self, an object may have multiple or no parents, but in the most popular prototype-based language, JavaScript, an object has exactly one prototype link, up to the base object whose prototype is null.

A prototype acts as a model for new objects. For example, if you have an object fruit, you can make two objects apple and orange that share traits of the fruit prototype. Prototype-based languages also allow objects to have their own unique properties, so the apple object might have an attribute sugar_content, while the orange or fruit objects do not.
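Python is class-based, but the fruit example can be emulated with a small helper that delegates failed attribute lookups to a prototype object. `ProtoObject` is a hypothetical name and this is only a sketch of the idea, not how JavaScript or Self implement it.

```python
class ProtoObject:
    """Hypothetical helper emulating prototype links."""

    def __init__(self, prototype=None, **slots):
        self.__dict__.update(slots)
        self.prototype = prototype

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate up the prototype chain.
        prototype = self.__dict__.get("prototype")
        if prototype is None:
            raise AttributeError(name)
        return getattr(prototype, name)


fruit = ProtoObject(edible=True)
apple = ProtoObject(prototype=fruit, sugar_content=10)
orange = ProtoObject(prototype=fruit)

print(apple.edible, apple.sugar_content)   # True 10
print(hasattr(orange, "sugar_content"))    # False
```

Changing `fruit.edible` afterwards is visible through both `apple` and `orange`, because they share the prototype and do not copy it.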

No inheritance

In all OOP languages, via object composition, an object can contain other objects. For example, an Employee object might contain an Address object, along with other information like name and position. Composition is a "has-a" relationship, like "an employee has an address". Some languages, like Go, don't support inheritance. Instead, they encourage "composition over inheritance", where objects are built using smaller parts instead of parent-child relationships. For example, instead of inheriting from class Person, the Employee class could simply contain a Person object. This lets the Employee class control how much of Person it exposes to other parts of the program. Delegation is another language feature that can be used as an alternative to inheritance.
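A minimal sketch of the `Employee` containing a `Person`, with hypothetical fields (`home_phone`, `position`) chosen to show selective exposure:

```python
class Person:
    def __init__(self, name, home_phone):
        self.name = name
        self.home_phone = home_phone


class Employee:
    """Contains a Person instead of inheriting from it."""

    def __init__(self, person, position):
        self._person = person        # "has-a" relationship
        self.position = position

    @property
    def name(self):
        # Expose only the part of Person that callers need.
        return self._person.name


employee = Employee(Person("Grace", "555-0100"), "analyst")
print(employee.name)                      # Grace
print(hasattr(employee, "home_phone"))    # False
```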

Programmers have different opinions on inheritance. Bjarne Stroustrup, author of C++, has stated that it is possible to do OOP without inheritance. Rob Pike has criticized inheritance for creating complex hierarchies instead of simpler solutions.

Inheritance and behavioral subtyping

People often think that if one class inherits from another, it means the subclass "is a" more specific version of the original class. This presumes the program semantics are that objects from the subclass can always replace objects from the original class without problems. This concept is known as behavioral subtyping, more specifically the Liskov substitution principle.

However, this is often not true, especially in programming languages that allow mutable objects, objects that change after they are created. In fact, subtype polymorphism as enforced by the type checker in OOP languages cannot guarantee behavioral subtyping in most if not all contexts. For example, the circle-ellipse problem is notoriously difficult to handle using OOP's concept of inheritance. Behavioral subtyping is undecidable in general, so it cannot be easily implemented by a compiler. Because of this, programmers must carefully design class hierarchies to avoid mistakes that the programming language itself cannot catch.
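The circle-ellipse problem can be sketched with mutable objects. The classes and the `stretch_width` contract are assumptions made for this illustration. The subclass is accepted wherever an `Ellipse` is expected, yet it breaks what client code relies on.

```python
class Ellipse:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def stretch_width(self, factor):
        """Contract assumed here: width scales, height is unchanged."""
        self.width *= factor


class Circle(Ellipse):
    def __init__(self, diameter):
        super().__init__(diameter, diameter)

    def stretch_width(self, factor):
        # Keeps the circle round, but silently breaks the Ellipse contract.
        self.width *= factor
        self.height *= factor


def height_preserved(shape):
    """Client code written against Ellipse."""
    height_before = shape.height
    shape.stretch_width(2)
    return shape.height == height_before


print(height_preserved(Ellipse(2, 1)))  # True
print(height_preserved(Circle(2)))      # False
```

Nothing in the class definitions signals the problem. It only shows up in behavior, which is why it has to be caught by design review or tests.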

Dynamic dispatch

A method may be invoked via dynamic dispatch such that the method is selected at runtime instead of compile time. If the method choice depends on more than one type of object (such as other objects passed as parameters), it's called multiple dispatch. In this context, a method call is also known as message passing, meaning the method name and its inputs are like a message sent to the object for it to act on.

Dynamic dispatch works together with inheritance: if an object doesn't have the requested method, it looks up to its parent class (delegation), and continues up the chain to find a matching method.
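Both ideas can be sketched briefly. The class names are hypothetical, and the multiple-dispatch part is a hand-rolled lookup table that matches exact types only. It is an illustration, not a feature of the Python language.

```python
class Base:
    def greet(self):
        return "hello from Base"


class Child(Base):
    pass                      # no greet here: lookup continues in Base


class Loud(Child):
    def greet(self):
        return "HELLO FROM LOUD"


def call_greet(obj):
    # The method is chosen at run time from the object's actual class.
    return obj.greet()


class Ship:
    pass


class Rock:
    pass


# Multiple dispatch sketch: choose a function from the types of both arguments.
_collide_table = {
    (Ship, Rock): lambda a, b: "ship hits rock",
    (Rock, Rock): lambda a, b: "rocks bounce",
}


def collide(a, b):
    return _collide_table[(type(a), type(b))](a, b)


print(call_greet(Child()))       # hello from Base
print(call_greet(Loud()))        # HELLO FROM LOUD
print(collide(Ship(), Rock()))   # ship hits rock
```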

Polymorphism

Polymorphism in OOP refers to subtyping or subtype polymorphism, where a function can work with a specific interface and thus manipulate entities of different classes in a uniform manner.

For example, imagine a program has two shapes: a circle and a square. Both come from a common class called "Shape." Each shape has its own way of drawing itself. With subtype polymorphism, the program doesn't need to know the type of each shape, and can simply call the "Draw" method for each shape. The programming language runtime will ensure the correct version of the "Draw" method runs for each shape. Because the details of each shape are handled inside their own classes, this makes the code simpler and more organized, enabling strong separation of concerns.
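The shape example might be written as follows. The `draw` methods return strings instead of rendering anything, which is a simplification for this sketch.

```python
class Shape:
    def draw(self):
        raise NotImplementedError("subclasses define how to draw themselves")


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def draw(self):
        return f"circle with radius {self.radius}"


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def draw(self):
        return f"square with side {self.side}"


def draw_all(shapes):
    # The caller never checks the concrete type of each shape.
    return [shape.draw() for shape in shapes]


print(draw_all([Circle(1), Square(2)]))
```

Adding a new shape later requires a new subclass but no change to `draw_all`.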

Open recursion

An object's methods can access the object's data. Many programming languages use a special word, like this or self, to refer to the current object. In languages that support open recursion, a method in an object can call other methods in the same object, including itself, using this special word. This allows a method in one class to call another method defined later in a subclass, a feature known as late binding.
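A small sketch of open recursion, using hypothetical `Report` classes: `render` is written once in the base class, yet its call to `self.body()` reaches a method defined later in a subclass.

```python
class Report:
    def render(self):
        # `self.body()` is resolved on the actual object, so a subclass
        # defined later can supply the implementation.
        return f"[{self.title()}] {self.body()}"

    def title(self):
        return "Report"

    def body(self):
        return "(empty)"


class SalesReport(Report):
    def body(self):
        return "sales up"


print(Report().render())       # [Report] (empty)
print(SalesReport().render())  # [Report] sales up
```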

Design patterns

Design patterns are common solutions to problems in software design. Some design patterns are especially useful for OOP, and design patterns are typically introduced in an OOP context.

Real-world modeling and relationships

Sometimes, objects represent real-world things and processes in digital form. For example, a graphics program may have objects such as circle, square, and menu. An online shopping system might have objects such as shopping cart, customer, and product. Niklaus Wirth said, "This paradigm [OOP] closely reflects the structure of systems in the real world and is therefore well suited to model complex systems with complex behavior".

However, more often, objects represent abstract entities, like an open file or a unit converter. Not everyone agrees that OOP makes it easy to copy the real world exactly or that doing so is even necessary. Bob Martin suggests that because classes are software, their relationships don't match the real-world relationships they represent. Bertrand Meyer argues that a program is not a model of the world but a model of some part of the world; "Reality is a cousin twice removed". Steve Yegge noted that natural languages lack the OOP approach of naming a thing (object) before an action (method), as opposed to functional programming which does the reverse. This can make an OOP solution more complex than one written via procedural programming.

Object patterns

The following are notable software design patterns for OOP objects.

A common anti-pattern is the God object, an object that knows or does too much.

Gang of Four design patterns

Design Patterns: Elements of Reusable Object-Oriented Software is a famous book published in 1994 by four authors: Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. People often call them the "Gang of Four". The book talks about the strengths and weaknesses of OOP and explains 23 common ways to solve programming problems.

These solutions, called "design patterns," are grouped into three types: creational, structural, and behavioral.

Object-orientation and databases

Both OOP and relational database management systems (RDBMSs) are widely used in software today. However, relational databases don't store objects directly, which creates a challenge when using them together. This issue is called object-relational impedance mismatch.

To solve this problem, developers use different methods, but none of them are perfect. One of the most common solutions is object-relational mapping (ORM), which helps connect object-oriented programs to relational databases. Examples of ORM tools include Visual FoxPro, Java Data Objects, and Ruby on Rails ActiveRecord.
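To make the idea concrete, here is a hand-written mapper between an object and a table row, using Python's standard `sqlite3` module with an in-memory database. The `Customer` class, the `customer` table, and `CustomerMapper` are hypothetical. A real ORM would generate this plumbing and handle much more, such as relationships, identity, and schema changes.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Customer:
    id: int
    name: str


class CustomerMapper:
    """Hypothetical mapper translating Customer objects to and from rows."""

    def __init__(self, connection):
        self.connection = connection
        connection.execute(
            "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, customer):
        self.connection.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)", astuple(customer)
        )

    def find(self, customer_id):
        row = self.connection.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


connection = sqlite3.connect(":memory:")
mapper = CustomerMapper(connection)
mapper.save(Customer(1, "Ada"))
print(mapper.find(1))   # Customer(id=1, name='Ada')
```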

Some databases, called object databases, are designed to work with OOP. However, they have not been as popular or successful as relational databases.

Date and Darwen have proposed a theoretical foundation that uses OOP as a kind of customizable type system to support RDBMSs, but it forbids objects containing pointers to other objects.

Responsibility- vs. data-driven design

In responsibility-driven design, classes are built around what they need to do and the information they share, in the form of a contract. This is different from data-driven design, where classes are built based on the data they need to store. According to Wirfs-Brock and Wilkerson, the originators of responsibility-driven design, responsibility-driven design is the better approach.

SOLID and GRASP guidelines

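SOLID is a set of five rules for designing good software, created by Michael Feathers:

  • Single Responsibility Principle: a class should have only one reason to change.
  • Open/Closed Principle: classes and functions should be open for extension, but closed for modification.
  • Liskov Substitution Principle: objects of a subclass should be usable wherever the super-class is expected.
  • Interface Segregation Principle: clients should not be forced to depend on methods they do not use.
  • Dependency Inversion Principle: code should depend on abstractions rather than on concrete implementations.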

GRASP (General Responsibility Assignment Software Patterns) is another set of software design rules, created by Craig Larman, that helps developers assign responsibilities to different parts of a program:

  • Creator Principle: lets classes create the objects they closely use.
  • Information Expert Principle: assigns tasks to classes with the needed information.
  • Low Coupling Principle: reduces class dependencies to improve flexibility and maintainability.
  • High Cohesion Principle: designs classes with a single, focused responsibility.
  • Controller Principle: assigns system operations to separate classes that manage flow and interactions.
  • Polymorphism: allows different classes to be used through a common interface, promoting flexibility and reuse.
  • Pure Fabrication Principle: creates helper classes to improve design, boost cohesion, and reduce coupling.

Formal semantics

Researchers have tried to formally define the semantics of OOP. Inheritance presents difficulties, particularly with the interactions between open recursion and encapsulated state. Researchers have used recursive types and co-algebraic data types to incorporate essential features of OOP. Abadi and Cardelli defined several extensions of System F<: that deal with mutable objects, allowing both subtype polymorphism and parametric polymorphism (generics), and were able to formally model many OOP concepts and constructs. Although far from trivial, static analysis of object-oriented programming languages such as Java is a mature field, with several commercial tools.

Popularity and reception

The TIOBE programming language popularity index graph from 2002 to 2023. In the 2000s the object-oriented Java (orange) and the procedural C (dark blue) competed for the top position.

Many popular programming languages, like C++, Java, and Python, use OOP. In the past, OOP was widely accepted, but recently, some programmers have criticized it and prefer functional programming instead. A study by Potok et al. found no major difference in productivity between OOP and procedural programming.

Some believe that OOP places too much focus on using objects rather than on algorithms and data structures. For example, programmer Rob Pike pointed out that OOP can make programmers think more about type hierarchy than composition. He has called OOP "the Roman numerals of computing". Rich Hickey, creator of Clojure, described OOP as overly simplistic, especially when it comes to representing real-world things that change over time. Alexander Stepanov said that OOP tries to fit everything into a single type, which can be limiting. He argued that sometimes we need multisorted algebras: families of interfaces that span multiple types, such as in generic programming. Stepanov also said that calling everything an "object" doesn't add much understanding.

OOP was created to make code easier to reuse and maintain. However, it was not designed to clearly show the flow of a program's instructions. That was left to the compiler. As computers began using more parallel processing and multiple threads, it became more important to understand and control how instructions flow. This is difficult to do with OOP.

Paul Graham believes big companies like OOP because it helps manage large teams of average programmers. He argues that OOP adds structure, making it harder for one person to make serious mistakes, but at the same time restrains smart programmers. Eric S. Raymond, a Unix programmer and open-source software advocate, argues that OOP is not the best way to write programs.

Richard Feldman says that, while OOP features helped some languages stay organized, their popularity comes from other reasons. Lawrence Krubner argues that OOP doesn't offer special advantages compared to other styles, like functional programming, and can complicate coding. Luca Cardelli says that OOP is slower and takes longer to compile than procedural programming.

Wednesday, April 22, 2026

Context switch

From Wikipedia, the free encyclopedia

In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes to share a single central processing unit (CPU) and is an essential feature of a multiprogramming or multitasking operating system. In a traditional CPU, each process – a program in execution – uses the various CPU registers to store data and hold the current state of the running process. However, in a multitasking operating system, the operating system switches between processes or threads to allow the execution of multiple processes simultaneously. For every switch, the operating system must save the state of the currently running process, followed by loading the next process state, which will run on the CPU. This sequence of operations that stores the state of the running process and loads the following running process is called a context switch.

The precise meaning of the phrase "context switch" varies. In a multitasking context, it refers to the process of storing the system state for one task, so that task can be paused and another task resumed. A context switch can also occur as the result of an interrupt, such as when a task needs to access disk storage, freeing up CPU time for other tasks. Some operating systems also require a context switch to move between user mode and kernel mode tasks. The process of context switching can have a negative impact on system performance.

Cost

Context switches are usually computationally intensive, and much of the design of operating systems is to optimize the use of context switches. Switching from one process to another requires a certain amount of time for doing the administration – saving and loading registers and memory maps, updating various tables and lists, etc. What is actually involved in a context switch depends on the architectures, operating systems, and the number of resources shared (threads that belong to the same process share many resources compared to unrelated non-cooperating processes).

For example, in the Linux kernel, context switching involves loading the corresponding process control block (PCB) stored in the PCB table in the kernel stack to retrieve information about the state of the new process. CPU state information, including the registers, stack pointer, and program counter, as well as memory management information like segmentation tables and page tables (unless the old process shares the memory with the new), is loaded from the PCB for the new process. To avoid incorrect address translation when the previous and current processes use different memory, the translation lookaside buffer (TLB) must be flushed. This hurts performance because the TLB is empty after most context switches, so the first reference to each page after a switch misses in the TLB.

Furthermore, analogous context switching happens between user threads, notably green threads, and is often very lightweight, saving and restoring minimal context. In extreme cases, such as switching between goroutines in Go, a context switch is equivalent to a coroutine yield, which is only marginally more expensive than a subroutine call.
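
The idea of a switch that amounts to a coroutine yield can be sketched with Python generators. This is only an analogy under simplifying assumptions: it is not how Go schedules goroutines, and the names worker and run_round_robin are made up for the example. Each yield hands control back to a tiny user-level scheduler, and the only context saved is the generator's own frame.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: each `yield` hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # the saved "context" is just this generator's frame


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)        # resume the task until its next yield
        except StopIteration:
            continue          # task finished; do not requeue it
        ready.append(task)    # task yielded; put it back in line
        switches += 1
    return switches


log = []
switches = run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)       # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
print(switches)  # 5
```

No registers, page tables, or kernel structures are touched here, which is why this kind of switch costs little more than a function call.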

Switching cases

There are three potential triggers for a context switch:

Multitasking

Most commonly, within some scheduling scheme, one process must be switched out of the CPU so another process can run. This context switch can be triggered by the process making itself unrunnable, such as by waiting for an I/O or synchronization operation to complete. On a pre-emptive multitasking system, the scheduler may also switch out processes that are still runnable. To prevent other processes from being starved of CPU time, pre-emptive schedulers often configure a timer interrupt to fire when a process exceeds its time slice. This interrupt ensures that the scheduler will gain control to perform a context switch.
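
A minimal simulation of time-slice pre-emption, assuming a plain round-robin policy (one of many possible scheduling schemes), is sketched below. The function name round_robin and the burst values are illustrative; the simulated timer interrupt is simply the point where a process has used its whole slice but still has work left.

```python
from collections import deque


def round_robin(bursts, time_slice):
    """Simulate pre-emptive round-robin scheduling.

    bursts maps a process name to the CPU time it still needs.
    Returns the run order as (name, ticks_run) pairs and the number of
    pre-emptions caused by the simulated timer interrupt.
    """
    ready = deque(bursts.items())
    timeline = []
    preemptions = 0
    while ready:
        name, remaining = ready.popleft()
        ran = min(remaining, time_slice)
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            # Slice used up while still runnable: the timer fires and the
            # process goes to the back of the ready queue.
            preemptions += 1
            ready.append((name, remaining))
    return timeline, preemptions


timeline, preemptions = round_robin({"A": 5, "B": 2, "C": 4}, time_slice=3)
print(timeline)     # [('A', 3), ('B', 2), ('C', 3), ('A', 2), ('C', 1)]
print(preemptions)  # 2
```

Process B finishes inside its slice and is never pre-empted, while A and C are each switched out once so the others are not starved.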

Interrupt handling

Modern architectures are interrupt driven. This means that if the CPU requests data from a disk, for example, it does not need to busy-wait until the read is over; it can issue the request (to the I/O device) and continue with some other task. When the read is over, the CPU can be interrupted (by hardware in this case, which sends an interrupt request to the PIC) and presented with the read. For interrupts, a program called an interrupt handler is installed, and it is the interrupt handler that handles the interrupt from the disk.

When an interrupt occurs, the hardware automatically switches a part of the context (at least enough to allow the handler to return to the interrupted code). The handler may save additional context, depending on details of the particular hardware and software designs. Often only a minimal part of the context is changed in order to minimize the amount of time spent handling the interrupt. The kernel does not spawn or schedule a special process to handle interrupts, but instead the handler executes in the (often partial) context established at the beginning of interrupt handling. Once interrupt servicing is complete, the context in effect before the interrupt occurred is restored so that the interrupted process can resume execution in its proper state.

User and kernel mode switching

When the system transitions between user mode and kernel mode, a context switch is not necessary; a mode transition is not by itself a context switch. However, depending on the operating system, a context switch may also take place at this time.

Steps

The state of the currently executing process must be saved so it can be restored when rescheduled for execution.

The process state includes all the registers that the process may be using, especially the program counter, plus any other operating system specific data that may be necessary. This is usually stored in a data structure called a process control block (PCB) or switchframe.

The PCB might be stored on a per-process stack in kernel memory (as opposed to the user-mode call stack), or there may be some specific operating system-defined data structure for this information. A handle to the PCB is added to a queue of processes that are ready to run, often called the ready queue.

Since the operating system has effectively suspended the execution of one process, it can then switch context by choosing a process from the ready queue and restoring its PCB. In doing so, the program counter from the PCB is loaded, and thus execution can continue in the chosen process. Process and thread priority can influence which process is chosen from the ready queue (i.e., it may be a priority queue).
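
The steps above can be sketched as a toy model. The PCB and CPU classes and the context_switch function below are hypothetical simplifications invented for this example; a real kernel saves far more state and does so in architecture-specific code. The ready queue here is a plain FIFO, although, as noted, it could equally be a priority queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PCB:
    """Hypothetical process control block: saved state for one process."""
    pid: int
    program_counter: int = 0
    registers: Dict[str, int] = field(default_factory=dict)


class CPU:
    """Toy CPU holding the state of whichever process is running."""
    def __init__(self):
        self.program_counter = 0
        self.registers = {}


def context_switch(cpu, current, ready_queue):
    """Save `current`, queue it, and restore the next ready process."""
    # Step 1: save the running process's state into its PCB.
    current.program_counter = cpu.program_counter
    current.registers = dict(cpu.registers)
    # Step 2: add a handle to that PCB to the ready queue.
    ready_queue.append(current)
    # Step 3: choose the next process and restore its saved state.
    nxt = ready_queue.popleft()
    cpu.program_counter = nxt.program_counter
    cpu.registers = dict(nxt.registers)
    return nxt


cpu = CPU()
p1 = PCB(pid=1)
p2 = PCB(pid=2, program_counter=100, registers={"r1": 7})
ready = deque([p2])

cpu.program_counter, cpu.registers = 42, {"r1": 3}  # p1 has been running
running = context_switch(cpu, p1, ready)
print(running.pid, cpu.program_counter)  # 2 100
```

After the call, p1's PCB holds the values the CPU had at the moment of the switch, so a later switch back to p1 would resume it from program counter 42.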

Examples

The details vary depending on the architecture and operating system, but these are common scenarios.

No context switch needed

Consider a general arithmetic addition operation A = B + 1. The instruction is stored in the instruction register, and the program counter is incremented. A and B are read from memory and are stored in registers R1 and R2, respectively. In this case, B + 1 is calculated and written to R1 as the final answer. This operation requires only sequential reads and writes and does not wait on any function call, so no context switch or wait takes place in this case.

Context switch caused by interrupt

Suppose a process A is running and a timer interrupt occurs. The user registers of process A (program counter, stack pointer, and status register) are then implicitly saved by the CPU onto the kernel stack of A. Then, the hardware switches to kernel mode and jumps into the interrupt handler so that the operating system can take over. The operating system then calls the switch() routine, which first saves the general-purpose user registers of A onto A's kernel stack, then saves A's current kernel register values into the PCB of A, restores kernel registers from the PCB of process B, and switches context, that is, changes the kernel stack pointer to point to the kernel stack of process B. The operating system then returns from the interrupt. The hardware then loads user registers from B's kernel stack, switches to user mode, and starts running process B from B's program counter.

Performance

Context switching itself has a cost in performance, due to running the task scheduler, TLB flushes, and indirectly due to sharing the CPU cache between multiple tasks. Switching between threads of a single process can be faster than between two separate processes because threads share the same virtual memory maps, so a TLB flush is not necessary.
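
The difference between the two cases can be illustrated with a toy TLB model. This is a sketch under strong assumptions: the TLB is untagged (it cannot tell address spaces apart, so it must be flushed whenever the address space changes), it has unlimited capacity, and the class and function names are invented for the example.

```python
class TLB:
    """Toy untagged TLB: caches page -> frame lookups for one address space."""
    def __init__(self):
        self.entries = {}
        self.misses = 0
        self.address_space = None

    def switch_to(self, address_space):
        # Flush only when the incoming task uses a different address space.
        if address_space != self.address_space:
            self.entries.clear()
            self.address_space = address_space

    def translate(self, page, page_table):
        if page not in self.entries:
            self.misses += 1                       # fall back to the page table
            self.entries[page] = page_table[page]
        return self.entries[page]


def count_misses(schedule, page_tables, pages):
    """Run tasks in `schedule` order; each one touches every page in `pages`."""
    tlb = TLB()
    for space in schedule:
        tlb.switch_to(space)
        for page in pages:
            tlb.translate(page, page_tables[space])
    return tlb.misses


tables = {"P1": {0: 10, 1: 11, 2: 12}, "P2": {0: 20, 1: 21, 2: 22}}
pages = [0, 1, 2]

# Four turns taken by threads of the same process: one warm-up, then hits.
thread_misses = count_misses(["P1", "P1", "P1", "P1"], tables, pages)
# Four turns alternating between two processes: every switch flushes.
process_misses = count_misses(["P1", "P2", "P1", "P2"], tables, pages)
print(thread_misses, process_misses)  # 3 12
```

In this model the thread-only schedule pays for the three cold misses once, whereas the alternating process schedule pays for them again after every switch.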

The time to switch between two separate processes is called the process switching latency. The time to switch between two threads of the same process is called the thread switching latency. The time from when a hardware interrupt is generated to when the interrupt is serviced is called the interrupt latency.

Switching between two processes in a single address space operating system can be faster than switching between two processes in an operating system with private per-process address spaces.

Hardware vs. software

Context switching can be performed primarily by software or hardware. Some processors, like the Intel 80386 and its successors, have hardware support for context switches, by making use of a special data segment designated the task state segment (TSS). A task switch can be explicitly triggered with a CALL or JMP instruction targeted at a TSS descriptor in the global descriptor table. It can occur implicitly when an interrupt or exception is triggered if there is a task gate in the interrupt descriptor table (IDT). When a task switch occurs, the CPU can automatically load the new state from the TSS.

As with other tasks performed in hardware, one would expect this to be rather fast; however, mainstream operating systems, including Windows and Linux, do not use this feature. This is mainly due to two reasons:

  • Hardware context switching does not save all the registers (only general-purpose registers, not floating-point registers — although the TS bit is automatically turned on in the CR0 control register, resulting in a fault when executing floating-point instructions and giving the OS the opportunity to save and restore the floating-point state as needed).
  • Associated performance issues, e.g., software context switching can be selective and store only those registers that need storing, whereas hardware context switching stores nearly all registers whether they are required or not.
