
Saturday, November 24, 2018

MathML

From Wikipedia, the free encyclopedia

MathML
Developed by: World Wide Web Consortium
Type of format: Markup language
Extended from: XML
Standard: W3C MathML

Mathematical Markup Language (MathML) is a mathematical markup language, an application of XML for describing mathematical notations and capturing both their structure and content. It aims at integrating mathematical formulae into World Wide Web pages and other documents. It is part of HTML5 and has been an ISO standard (ISO/IEC 40314) since 2015.

History

MathML 1 was released as a W3C recommendation in April 1998 as the first XML language to be recommended by the W3C. Version 1.01 of the format was released in July 1999 and version 2.0 appeared in February 2001.

In October 2003, the second edition of MathML Version 2.0 was published as the final release by the W3C math working group.

MathML was originally designed before the finalization of XML namespaces. However, it was assigned a namespace immediately after the Namespace Recommendation was completed, and for XML use the elements should be in the namespace with namespace URI http://www.w3.org/1998/Math/MathML. When MathML is used in HTML (as opposed to XML), this namespace is automatically inferred by the HTML parser and need not be specified in the document.
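An XML tool identifies MathML elements by their namespace URI, not by the element name alone. The minimal sketch below uses Python's standard xml.etree.ElementTree, which is an XML parser and therefore infers nothing the way an HTML parser does. The helper name is_mathml_root is hypothetical; it only checks whether the root element is <math> in the MathML namespace.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def is_mathml_root(xml_text: str) -> bool:
    """Return True if the root element is <math> in the MathML namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags in "{namespace-uri}local-name" form.
    return root.tag == f"{{{MATHML_NS}}}math"


with_ns = '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
without_ns = "<math><mi>x</mi></math>"

print(is_mathml_root(with_ns))     # True
print(is_mathml_root(without_ns))  # False: to an XML parser this <math> has no namespace
```

Child elements such as <mi> inherit the default namespace declared on <math>, so a single xmlns attribute on the root is enough.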

MathML version 3

Version 3 of the MathML specification was released as a W3C Recommendation on 20 October 2010. The recommendation "A MathML for CSS Profile" was later released on 7 June 2011; it defines a subset of MathML suitable for CSS formatting. Another subset, Strict Content MathML, provides a subset of content MathML with a uniform structure and is designed to be compatible with OpenMath. Other content elements are defined in terms of a transformation to the strict subset. New content elements include <bind>, which associates bound variables (<bvar>) to expressions, for example a summation index. The new <share> element allows structure sharing.

The development of MathML 3.0 went through a number of stages. In June 2006 the W3C rechartered the MathML Working Group to produce a MathML 3 Recommendation by February 2008, and in November 2008 extended the charter to April 2010. A sixth Working Draft of the MathML 3 revision was published in June 2009. On 10 August 2010 version 3 graduated to become a "Proposed Recommendation" rather than a draft.

The Second Edition of MathML 3.0 was published as a W3C Recommendation on April 10, 2014.[3] The specification was approved as an ISO/IEC international standard 40314:2015 on June 23, 2015.

Presentation and semantics

MathML deals not only with the presentation but also with the meaning of formula components (the latter part of MathML is known as “Content MathML”). Because the meaning of the equation is preserved separately from the presentation, how the content is communicated can be left up to the user. For example, web pages with MathML embedded in them can be viewed as normal web pages with many browsers, but visually impaired users can also have the same MathML read to them through screen readers (e.g. using the MathPlayer plugin for Internet Explorer, Opera 9.50 build 9656+, or the Fire Vox extension for Firefox).

Presentation MathML

Presentation MathML focuses on the display of an equation, and has about 30 elements. The elements' names all begin with m. A Presentation MathML expression is built up out of tokens that are combined using higher-level elements, which control their layout (there are also about 50 attributes, which mainly control fine details).

Token elements generally only contain characters (not other elements). They include:
  • <mi>x</mi> – identifiers;
  • <mo>+</mo> – operators;
  • <mn>2</mn> – numbers;
  • <mtext>non zero</mtext> – text.
Note, however, that these token elements may be used as extension points, allowing markup in host languages. MathML in HTML5 allows most inline HTML markup in mtext, so <mtext><b>non</b> zero</mtext> is conforming, with the HTML markup being used within the MathML to mark up the embedded text (making the first word bold in this example).

These are combined using layout elements, that generally contain only elements. They include:
  • <mrow> – a horizontal row of items;
  • <msup>, <munderover>, and others – superscripts, limits over and under operators like sums, etc.;
  • <mfrac> – fractions;
  • <msqrt> and <mroot> – roots;
  • <mfenced> – surrounding content with fences, such as parentheses.
As usual in HTML and XML, many entities are available for specifying special symbols by name, such as &pi; and &RightArrow;. An interesting feature of MathML is that entities also exist to express normally invisible operators, such as &InvisibleTimes; for implicit multiplication. They are: U+2061 FUNCTION APPLICATION; U+2062 INVISIBLE TIMES; U+2063 INVISIBLE SEPARATOR; and U+2064 INVISIBLE PLUS. The full specification of MathML entities is closely coordinated with the corresponding specifications for use with HTML and XML in general.
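The invisible operators are ordinary Unicode characters, so their names can be checked against the Unicode character database. The Python sketch below (standard library only) prints the four code points listed above, then resolves &pi; and &InvisibleTimes; with html.unescape, which uses the HTML5 named character reference table. It illustrates the shared entity names for these two symbols only; it is not a check of the full MathML entity set.

```python
import html
import unicodedata

# The four invisible operators named in the text.
INVISIBLE_OPERATORS = ["\u2061", "\u2062", "\u2063", "\u2064"]


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in INVISIBLE_OPERATORS:
    print(describe(ch))

# Named references resolve to plain characters (pi and INVISIBLE TIMES).
print([hex(ord(c)) for c in html.unescape("&pi;&InvisibleTimes;")])
```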
For example, the expression $ax^2 + bx + c$ requires two layout elements: one to create the overall horizontal row and one for the superscripted exponent. Including only the layout elements and the (not yet marked up) bare tokens, the structure looks like this:

    <mrow>
      a <msup>x 2</msup>
      + b x + c
    </mrow>

However, the individual tokens also have to be identified as identifiers (mi), operators (mo), or numbers (mn). Adding the token markup, the full form ends up as:

    
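    <mrow>
      <mi>a</mi> <mo>&InvisibleTimes;</mo> <msup><mi>x</mi><mn>2</mn></msup>
      <mo>+</mo><mi>b</mi><mo>&InvisibleTimes;</mo><mi>x</mi>
      <mo>+</mo><mi>c</mi>
    </mrow>
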
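A valid MathML document typically consists of the XML declaration, DOCTYPE declaration, and document element. The document body then contains MathML expressions, which appear in <math> elements as needed in the document. Often, MathML will be embedded in more general documents, such as HTML, DocBook, or other XML schemas. A complete document that consists of just the MathML example above is shown here:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
              "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow>
        <mi>a</mi> <mo>&InvisibleTimes;</mo> <msup><mi>x</mi><mn>2</mn></msup>
        <mo>+</mo><mi>b</mi><mo>&InvisibleTimes;</mo><mi>x</mi>
        <mo>+</mo><mi>c</mi>
      </mrow>
    </math>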
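The distinction between token elements and layout elements can be seen by walking the tree. The sketch below, a minimal example using Python's xml.etree.ElementTree, lists the token elements of the $ax^2 + bx + c$ example in document order. INVISIBLE TIMES is written as a literal U+2062 character here, because ElementTree does not read the MathML DTD and would reject &InvisibleTimes; as an undefined entity. The helper name tokens is hypothetical.

```python
import xml.etree.ElementTree as ET

TOKEN_ELEMENTS = {"mi", "mo", "mn", "mtext"}

# The a x^2 + b x + c example, with U+2062 written as a literal character.
SOURCE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>'
    "<mi>a</mi><mo>\u2062</mo><msup><mi>x</mi><mn>2</mn></msup>"
    "<mo>+</mo><mi>b</mi><mo>\u2062</mo><mi>x</mi>"
    "<mo>+</mo><mi>c</mi>"
    "</mrow></math>"
)


def tokens(xml_text: str):
    """Return (element, text) pairs for every token element, in document order."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():
        local_name = el.tag.split("}", 1)[-1]  # drop the "{namespace}" part
        if local_name in TOKEN_ELEMENTS:
            found.append((local_name, el.text))
    return found


for name, text in tokens(SOURCE):
    print(name, repr(text))
```

The layout elements <math>, <mrow> and <msup> are visited too but skipped, since they only contain other elements.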

Content MathML

Content MathML focuses on the semantics, or meaning, of the expression rather than its layout. Central to Content MathML is the <apply> element, which represents function application. The function being applied is the first child element under <apply>, and its operands or parameters are the remaining child elements. Content MathML uses only a few attributes.

Tokens such as identifiers and numbers are individually marked up, much as for Presentation MathML, but with elements such as ci and cn. Rather than being merely another type of token, operators are represented by specific elements, whose mathematical semantics are known to MathML: times, power, etc. There are over a hundred different elements for different functions and operators.

For example, <apply><sin/><ci>x</ci></apply> represents $\sin(x)$ and <apply><plus/><ci>x</ci><cn>5</cn></apply> represents $x + 5$. The elements representing operators and functions are empty elements, because their operands are the other elements under the containing <apply>.
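The expression $ax^2 + bx + c$ could be represented as:

    <math>
      <apply>
        <plus/>
        <apply>
          <times/>
          <ci>a</ci>
          <apply><power/><ci>x</ci><cn>2</cn></apply>
        </apply>
        <apply><times/><ci>b</ci><ci>x</ci></apply>
        <ci>c</ci>
      </apply>
    </math>
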
Content MathML is nearly isomorphic to expressions in a functional language such as Scheme. <apply>...</apply> amounts to Scheme's (...), and the many operator and function elements amount to Scheme functions. With this trivial literal transformation, plus un-tagging the individual tokens, the example above becomes:

  (plus
    (times a (power x 2))
    (times b x)
    c)

This reflects the long-known close relationship between XML element structures, and LISP or Scheme S-expressions.
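The literal transformation described above is small enough to write out. The following Python sketch (standard-library ElementTree; the function name to_sexpr is hypothetical) handles only the elements used in the example: <apply>, empty operator elements such as <plus/>, and the <ci>/<cn> tokens. It is an illustration, not a general Content MathML converter.

```python
import xml.etree.ElementTree as ET

CONTENT_MATHML = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><plus/>
    <apply><times/>
      <ci>a</ci>
      <apply><power/><ci>x</ci><cn>2</cn></apply>
    </apply>
    <apply><times/><ci>b</ci><ci>x</ci></apply>
    <ci>c</ci>
  </apply>
</math>
""".strip()


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on element names."""
    return tag.split("}", 1)[-1]


def to_sexpr(el):
    """Translate a Content MathML subtree into a Scheme-style S-expression."""
    name = local_name(el.tag)
    if name in ("ci", "cn"):          # un-tag identifiers and numbers
        return el.text.strip()
    if name == "apply":               # <apply>...</apply> becomes (...)
        return "(" + " ".join(to_sexpr(child) for child in el) + ")"
    if len(el) == 0:                  # empty operator element such as <plus/>
        return name
    raise ValueError(f"unsupported element: {name}")


root = ET.fromstring(CONTENT_MATHML)
print(to_sexpr(root[0]))  # (plus (times a (power x 2)) (times b x) c)
```

Because the first child of <apply> is always the operator, the S-expression's head position falls out of document order with no special casing.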

Example and comparison to other formats

The well-known quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

would be marked up using LaTeX syntax like this:

x=\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

in troff/eqn like this:
 
x={-b +- sqrt{b sup 2 - 4ac}} over 2a

in Apache OpenOffice Math and LibreOffice Math like this (all three are valid):
 
x={-b plusminus sqrt {b^2 - 4 ac}} over {2 a}
x={-b +- sqrt {b^2 - 4ac}} over 2a
x={-b ± sqrt {b^2 - 4ac}} over 2a

in AsciiMath like this:
 
x=(-b +- sqrt(b^2 - 4ac))/(2a)

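The above equation could be represented in Presentation MathML as an expression tree made up from layout elements like mfrac or msqrt elements:

    <math mode="display" xmlns="http://www.w3.org/1998/Math/MathML">
      <semantics>
        <mrow><mi>x</mi><mo>=</mo>
          <mfrac>
            <mrow><mrow><mo form="prefix">−</mo><mi>b</mi></mrow><mo>&PlusMinus;</mo>
              <msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mn>4</mn><mo>&InvisibleTimes;</mo><mi>a</mi><mo>&InvisibleTimes;</mo><mi>c</mi></msqrt></mrow>
            <mrow><mn>2</mn><mo>&InvisibleTimes;</mo><mi>a</mi></mrow>
          </mfrac></mrow>
        <annotation encoding="TeX">x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}</annotation>
        <annotation encoding="StarMath 5.0">x={-b plusminus sqrt {b^2 - 4 ac}} over {2 a}</annotation>
      </semantics>
    </math>
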
This example uses the <annotation> element, which can be used to embed a semantic annotation in non-XML format, for example to store the formula in the format used by an equation editor such as StarMath or the markup using LaTeX syntax.
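Software that receives such a formula can pick out whichever annotation format it understands. The Python sketch below assumes a cut-down version of the example (the presentation tree is reduced to a single <mi> for brevity) and uses ElementTree's limited XPath support; the helper name annotation_text is hypothetical. It returns the text of the first <annotation> under <semantics> whose encoding attribute matches, or None.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.w3.org/1998/Math/MathML"}

# Presentation tree elided to <mi>x</mi>; the annotations are as in the text.
DOC = r"""<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <mrow><mi>x</mi></mrow>
    <annotation encoding="TeX">x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}</annotation>
    <annotation encoding="StarMath 5.0">x={-b plusminus sqrt {b^2 - 4 ac}} over {2 a}</annotation>
  </semantics>
</math>"""


def annotation_text(doc, encoding):
    """Return the first matching annotation's text, or None if there is none."""
    root = ET.fromstring(doc)
    for ann in root.iterfind(".//m:semantics/m:annotation", NS):
        if ann.get("encoding") == encoding:
            return ann.text.strip()
    return None


print(annotation_text(DOC, "TeX"))
print(annotation_text(DOC, "StarMath 5.0"))
```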

Although less compact than TeX, the XML structure promises to make MathML widely usable, allows instant display in applications such as Web browsers, and facilitates interpretation of its meaning in mathematical software products. MathML is not intended to be written or edited directly by humans.

Embedding MathML in HTML/XHTML files

MathML, being XML, can be embedded inside other XML files such as XHTML files using XML namespaces. Recent browsers such as Firefox 3+ and Opera 9.6+ (support incomplete) can display Presentation MathML embedded in XHTML.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
  "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Example of MathML embedded in an XHTML file</title>
    <meta name="description" content="Example of MathML embedded in an XHTML file"/>
  </head>
  <body>
    <h1>Example of MathML embedded in an XHTML file</h1>
    <p>
      The area of a circle is
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>π</mi><mo>&InvisibleTimes;</mo><msup><mi>r</mi><mn>2</mn></msup>
      </math>.
    </p>
  </body>
</html>

A rendering of the formula for a circle in MathML+XHTML using Firefox 22 on Mac OS X
Inline MathML is also supported in HTML5 files in the current versions of WebKit (Safari and JavaFX/WebView) and Gecko (Firefox). Unlike in XHTML, there is no need to specify namespaces.


<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example of MathML embedded in an HTML5 file</title>
  </head>
  <body>
    <h1>Example of MathML embedded in an HTML5 file</h1>
    <p>
      The area of a circle is 
      <math>
        <mi>π</mi>
        <mo>&InvisibleTimes;</mo>
        <msup>
          <mi>r</mi>
          <mn>2</mn>
        </msup>
      </math>.
    </p>
  </body>
</html>
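Pages like the one above are often generated rather than written by hand. As a minimal sketch, assuming the output is served as text/html (so the HTML parser supplies the MathML namespace), the Python code below builds the same πr² fragment with ElementTree and splices it into a page without any xmlns attribute. The function name circle_area_fragment and the page title are hypothetical.

```python
import xml.etree.ElementTree as ET


def circle_area_fragment() -> str:
    """Serialize the <math> fragment for pi r^2 from the HTML5 example.

    No xmlns attribute is added, on the assumption that the result is served
    as text/html, where the HTML parser puts <math> in the MathML namespace.
    """
    math = ET.Element("math")
    ET.SubElement(math, "mi").text = "\u03c0"  # GREEK SMALL LETTER PI
    ET.SubElement(math, "mo").text = "\u2062"  # INVISIBLE TIMES
    sup = ET.SubElement(math, "msup")
    ET.SubElement(sup, "mi").text = "r"
    ET.SubElement(sup, "mn").text = "2"
    return ET.tostring(math, encoding="unicode")


page = (
    '<!DOCTYPE html>\n<html lang="en"><head><meta charset="utf-8">'
    "<title>Circle</title></head><body><p>The area of a circle is "
    + circle_area_fragment()
    + ".</p></body></html>"
)
print(page)
```

For an XHTML page served as application/xhtml+xml, the same fragment would instead need the MathML namespace declared.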

Software support

Web browsers

Of the major web browsers, Gecko-based browsers (e.g., Firefox and Camino) have the most complete native support for MathML.

While the WebKit layout engine has a development version of MathML,[10] this feature is available only in Safari 5.1 and later and in Chrome 24, not in later versions of Chrome. Google removed MathML support, claiming that architectural security issues and low usage did not justify the engineering time. As of October 2013, the WebKit/Safari implementation had numerous bugs.

JavaFX/WebView: also based on WebKit, the JavaFX embedded web browser supports MathML starting with JavaFX 8 Update 192 and JavaFX 11. Support is broken in earlier JavaFX 8 updates and in JavaFX 9 and JavaFX 10.

Opera versions 9.5 through 12 support MathML for CSS profile,[17][18] but cannot position diacritical marks properly. Before version 9.5, Opera required User JavaScript or custom stylesheets to emulate MathML support. Opera dropped MathML support in version 14, when it switched to the Chromium 25 engine.

Internet Explorer does not support MathML natively. Support for IE6 through IE9 can be added by installing the MathPlayer plugin. IE10 has some crashing bugs with MathPlayer, and in IE11 Microsoft completely disabled the binary plug-in interface that MathPlayer needs. MathPlayer's license may limit its use or distribution in commercial web pages and software; using or distributing the plugin to display HTML content via the WebBrowser control in commercial software may also be forbidden by this license.

The KHTML-based Konqueror currently does not provide support for MathML.

The quality of rendering of MathML in a browser depends on the installed fonts. The STIX Fonts project has released a comprehensive set of mathematical fonts under an open license. The Cambria Math font supplied with Microsoft Windows had slightly more limited support.

According to a member of the MathJax team, none of the major browser makers paid any of their developers for any MathML-rendering work; whatever support exists is overwhelmingly the result of unpaid volunteer work.

In 2015 the MathML Association was founded to support the adoption of the MathML standard.

Editors

Some editors with native MathML support (including copy and paste of MathML) are MathFlow and MathType from Design Science, MathML Kit for Adobe Creative Suite from SCAND Ltd., MathMagic, Publicon from Wolfram Research, and WIRIS. A full list of MathML editors is maintained by the W3C.

MathML is also supported by major office products such as Apache OpenOffice (via OpenOffice Math), LibreOffice (via LibreOffice Math), Calligra Suite (formerly KOffice), Apple's Pages[30] and MS Office 2007, as well as mathematical software products such as Mathematica, Maple and the Windows version of the Casio ClassPad 300. The W3C browser/editor Amaya is also a WYSIWYG MathML-as-is editor.

Firemath, an add-on for Firefox, provides a WYSIWYG MathML editor.

Most editors will only produce presentation MathML. The MathDox formula editor is an OpenMath editor also providing presentation and content MathML. Formulator MathML Weaver uses WYSIWYG style to edit Presentation, Content and mixed markups of MathML.

Handwriting recognition

Web Equation can convert handwriting to MathML. Windows 7 has a built-in tool called Math Input Panel, which also converts handwriting to MathML. (Unlike the Microsoft Office suite, the Math Input Panel does not use the OMML format, but Office applications can convert/paste from MathML into their preferred internal format.) The underlying technology is also exposed for use in other applications as an ActiveX control called Math Input Control, but ActiveX is now deprecated and is not necessarily supported in newer Microsoft software such as the Microsoft Edge browser.

Conversion

Several utilities for converting to and from MathML are available. The W3C maintains a list of MathML-related software for download.

Web conversion

ASCIIMathML provides a JavaScript library to rewrite a convenient Wiki-like text syntax used inline in web pages into MathML on the fly; it works in Gecko-based browsers, and Internet Explorer with MathPlayer. LaTeXMathML does the same for (a subset of) the standard LaTeX mathematical syntax. ASCIIMathML syntax would also be quite familiar to anyone used to electronic scientific calculators.

MathJax, a JavaScript library for inline rendering of mathematical formulae expressed in LaTeX/AsciiMath/MathML, can also be used to translate LaTeX or AsciiMath into MathML for direct interpretation by the browser.

Equation Server for .NET from soft4science can be used on the server side (ASP.NET) for TeX-Math (a subset of LaTeX math syntax) to MathML conversion. It can also create bitmap images (PNG, JPEG, GIF, etc.) from TeX-Math or MathML input.

jqMath is a JavaScript module that dynamically converts a simple TeX-like syntax to MathML if the browser supports it, else simple HTML and CSS.

LaTeXML is a full reimplementation of the TeX typesetting system, capable of converting LaTeX documents to HTML and ePub, optionally either using MathML or converting mathematical expressions to PNG or SVG images. It is one of the few tools that also provide optional Content MathML output for the converted equations, as well as fine-grained cross-references between the presentation and content representations. LaTeXML also offers optional MathML annotations with the original LaTeX source.

Support of software developers

Support for the MathML format accelerates software application development in such varied areas as computer-aided education (distance learning, electronic textbooks and other classroom materials); automated creation of attractive reports; computer algebra systems; authoring, training, and publishing tools (both web- and desktop-oriented); and many other applications for mathematics, science, business, economics, etc. Several software vendors offer a component edition of their MathML editors, giving software developers an easy way to add mathematics rendering, editing, or processing functionality to their applications. For example, Formulator ActiveX Control from Hermitech Laboratory can be incorporated into an application as a MathML-as-is editor, and Design Science offers a toolkit for building web pages that include interactive math (MathFlow Developers Suite).

Other standards

Another standard, OpenMath, which was designed (largely by the same people who devised Content MathML) more specifically for storing formulae semantically, can also be used to complement MathML. OpenMath data can be embedded in MathML using the <annotation-xml encoding="OpenMath"> element. OpenMath content dictionaries can be used to define the meaning of <csymbol> elements. The following would define $P_1(x)$ to be the first Legendre polynomial:

    <apply>
      <csymbol encoding="OpenMath" definitionURL="http://www.openmath.org/cd/contrib/cd/orthpoly1.xhtml#legendreP">
        <msub><mi>P</mi><mn>1</mn></msub>
      </csymbol>
      <ci>x</ci>
    </apply>

The OMDoc format has been created for markup of larger mathematical structures than formulae, from statements like definitions, theorems, proofs, or examples, to theories and textbooks. Formulae in OMDoc documents can be written either in Content MathML or in OpenMath; for presentation, they are converted to Presentation MathML.

The ISO/IEC standard Office Open XML (OOXML) defines a different XML math syntax, derived from Microsoft Office products. However, it is partially compatible with MathML through relatively simple XSL Transformations.

XHTML


XHTML
Filename extension: .xhtml, .xht, .xml, .html, .htm
Internet media type: application/xhtml+xml
Developed by: World Wide Web Consortium
Initial release: 26 January 2000
Latest release: 5.0 (28 October 2014)
Type of format: Markup language
Extended from: XML, HTML
Standard: W3C HTML5 (Recommendation)
Open format?: Yes

Extensible Hypertext Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used Hypertext Markup Language (HTML), the language in which Web pages are formulated.

While HTML, prior to HTML5, was defined as an application of Standard Generalized Markup Language (SGML), a flexible markup language framework, XHTML is an application of XML, a more restrictive subset of SGML. XHTML documents are well-formed and may therefore be parsed using standard XML parsers, unlike HTML, which requires a lenient HTML-specific parser.
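The practical difference can be seen with two parsers from the Python standard library: xml.etree.ElementTree (a standard XML parser) accepts well-formed XHTML but rejects HTML-style unclosed elements outright, while html.parser tokenizes the same HTML without complaint. Note that html.parser is only a lenient tokenizer, not a full HTML5 tree builder; the class and function names below are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_OK = '<p xmlns="http://www.w3.org/1999/xhtml">line one<br/>line two</p>'
HTML_ONLY = "<p>line one<br>line two"  # <br> and <p> are never closed


def xml_well_formed(text: str) -> bool:
    """Return True if a standard XML parser accepts the markup."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer that records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


print(xml_well_formed(XHTML_OK))   # True
print(xml_well_formed(HTML_ONLY))  # False: unclosed elements are a fatal XML error

collector = TagCollector()
collector.feed(HTML_ONLY)
print(collector.tags)              # ['p', 'br']
```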

XHTML 1.0 became a World Wide Web Consortium (W3C) Recommendation on January 26, 2000. XHTML 1.1 became a W3C Recommendation on May 31, 2001. The standard known as XHTML5 is being developed as an XML adaptation of the HTML5 specification.

Overview

XHTML 1.0 is "a reformulation of the three HTML 4 document types as applications of XML 1.0". The World Wide Web Consortium (W3C) also continues to maintain the HTML 4.01 Recommendation, and the specifications for HTML5 and XHTML5 are being actively developed. In the current XHTML 1.0 Recommendation document, as published and revised to August 2002, the W3C commented that, "The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility."

However, in 2005, the Web Hypertext Application Technology Working Group (WHATWG) formed, independently of the W3C, to work on advancing ordinary HTML not based on XHTML. The WHATWG eventually began working on a standard that supported both XML and non-XML serializations, HTML5, in parallel to W3C standards such as XHTML 2. In 2007, the W3C's HTML working group voted to officially recognize HTML5 and work on it as the next-generation HTML standard. In 2009, the W3C allowed the XHTML 2 Working Group's charter to expire, acknowledging that HTML5 would be the sole next-generation HTML standard, including both XML and non-XML serializations. Of the two serializations, the W3C suggests that most authors use the HTML syntax, rather than the XHTML syntax.

Motivation

XHTML was developed to make HTML more extensible and increase interoperability with other data formats. In addition, browsers were forgiving of errors in HTML, and most websites were displayed despite technical errors in the markup; XHTML introduced stricter error handling. HTML 4 was ostensibly an application of Standard Generalized Markup Language (SGML); however, the specification for SGML was complex, and neither web browsers nor the HTML 4 Recommendation were fully conformant to it. The XML standard, approved in 1998, provided a data format closer in simplicity to HTML 4. By shifting to an XML format, it was hoped HTML would become compatible with common XML tools; servers and proxies would be able to transform content, as necessary, for constrained devices such as mobile phones. By using namespaces, XHTML documents could provide extensibility by including fragments from other XML-based languages such as Scalable Vector Graphics and MathML. Finally, the renewed work would provide an opportunity to divide HTML into reusable components (XHTML Modularization) and clean up untidy parts of the language.

Relationship to HTML

There are various differences between XHTML and HTML. The Document Object Model (DOM) is a tree structure that represents the page internally in applications, and XHTML and HTML are two different ways of representing that in markup (serializations). Both are less expressive than the DOM (for example, "--" may be placed in comments in the DOM, but cannot be represented in a comment in either XHTML or HTML), and generally XHTML's XML syntax is a little more expressive than HTML (for example, arbitrary namespaces are not allowed in HTML). One source of differences is immediate: XHTML uses an XML syntax, while HTML uses a pseudo-SGML syntax (officially SGML for HTML 4 and earlier, though never in practice, and standardised away from SGML in HTML5). In addition, because the DOM contents expressible in each syntax are slightly different, there are some changes in actual behavior between the two models.

First, there are some differences in syntax:
  • Broadly, the XML rules require that all elements be closed, either by a separate closing tag or using self-closing syntax (e.g. <br/>), while HTML syntax permits some elements to be unclosed because either they are always empty or their end can be determined implicitly ("omissibility", e.g. <p>);
  • XML is case-sensitive for element and attribute names, while HTML is not;
  • Some shorthand features in HTML are omitted in XML, such as (1) attribute minimization, where attribute values or their quotes may be omitted (e.g. <option selected> or <option selected=selected>, while in XML this must be expressed as <option selected="selected">); (2) element minimization, which may be used to remove elements entirely (such as <tbody> inferred in a table if not given); and (3) the rarely used SGML syntax for element minimization ("shorttag"), which most browsers do not implement;
  • There are numerous other technical requirements surrounding namespaces and precise parsing of whitespace and certain characters and elements. The exact parsing of HTML in practice was undefined until recently; see the HTML5 specification ([HTML5]) for full details, or the working summary (HTML vs. XHTML).
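Each of the HTML shorthand forms listed above is rejected by an XML parser. The Python sketch below feeds a few one-line snippets to the standard-library xml.etree.ElementTree and reports which ones it accepts; all of them would be accepted by an HTML parser. The snippet labels and the helper name parses_as_xml are hypothetical.

```python
import xml.etree.ElementTree as ET

SNIPPETS = {
    "quoted attribute value": '<option selected="selected">A</option>',
    "minimized attribute": "<option selected>A</option>",
    "unquoted attribute value": "<option selected=selected>A</option>",
    "mismatched case": "<P>text</p>",
}


def parses_as_xml(markup: str) -> bool:
    """Return True if xml.etree.ElementTree accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


results = {label: parses_as_xml(markup) for label, markup in SNIPPETS.items()}
for label, ok in results.items():
    print(f"{label:26} {'accepted' if ok else 'rejected'}")
```

Only the fully quoted form survives: XML requires every attribute to have a quoted value, and <P> and </p> are different element names to a case-sensitive parser.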
Second, in contrast to these minor syntactical differences, there are some behavioral differences, mostly arising from the underlying differences in serialization. For example:
  • Most prominently, behavior on parse errors differs. A fatal parse error in XML (such as an incorrect tag structure) causes document processing to be aborted;
  • Most content requiring namespaces will not work in HTML, except the built-in support for SVG and MathML in the HTML5 parser along with certain magic prefixes such as xlink;
  • JavaScript processing is a little different in XHTML, with minor changes in case sensitivity to some functions, and further precautions to restrict processing to well-formed content. Scripts must not use the document.write() method; it is not available for XHTML. The innerHTML property is available, but will not insert non-well-formed content. On the other hand, it can be used to insert well-formed namespaced content into XHTML;
  • Cascading Style Sheets (CSS) are also applied slightly differently. Due to XHTML's case-sensitivity, all CSS selectors become case-sensitive for XHTML documents. Some CSS properties, such as backgrounds, set on the <body> element in HTML are 'inherited upwards' into the <html> element; this appears not to be the case for XHTML.

Adoption

The similarities between HTML 4.01 and XHTML 1.0 led many web sites and content management systems to adopt the initial W3C XHTML 1.0 Recommendation. To aid authors in the transition, the W3C provided guidance on how to publish XHTML 1.0 documents in an HTML-compatible manner, and serve them to browsers that were not designed for XHTML.

Such "HTML-compatible" content is sent using the HTML media type (text/html) rather than the official Internet media type for XHTML (application/xhtml+xml). When measuring the adoption of XHTML to that of regular HTML, therefore, it is important to distinguish whether it is media type usage or actual document contents that is being compared.

Most web browsers have mature support for all of the possible XHTML media types. The notable exception is Internet Explorer versions 8 and earlier by Microsoft; rather than rendering application/xhtml+xml content, a dialog box invites the user to save the content to disk instead. Both Internet Explorer 7 (released in 2006) and Internet Explorer 8 (released in March 2009) exhibit this behavior. Microsoft developer Chris Wilson explained in 2005 that IE7’s priorities were improved browser security and CSS support, and that proper XHTML support would be difficult to graft onto IE’s compatibility-oriented HTML parser; however, Microsoft added support for true XHTML in IE9.

As long as support is not widespread, most web developers avoid using XHTML that is not HTML-compatible, so advantages of XML such as namespaces, faster parsing and smaller-footprint browsers do not benefit the user.

Criticism

In the early 2000s, some Web developers began to question why Web authors ever made the leap into authoring in XHTML. Others countered that the problems ascribed to the use of XHTML could mostly be attributed to two main sources: the production of invalid XHTML documents by some Web authors and the lack of support for XHTML built into Internet Explorer 6. They went on to describe the benefits of XML-based Web documents (i.e. XHTML) regarding searching, indexing and parsing as well as future-proofing the Web itself.

In October 2006, HTML inventor and W3C chair Tim Berners-Lee, introducing a major W3C effort to develop a new HTML specification, posted in his blog that, "The attempt to get the world to switch to XML … all at once didn't work. The large HTML-generating public did not move … Some large communities did shift and are enjoying the fruits of well-formed systems … The plan is to charter a completely new HTML group." The current HTML5 working draft says "special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability … while at the same time updating the HTML specifications to address issues raised in the past few years." Ian Hickson, editor of the HTML5 specification, who criticised the improper use of XHTML in 2002, is a member of the group developing this specification and is listed as one of the co-editors of the current working draft.

Simon Pieters researched the XML-compliance of mobile browsers and concluded “the claim that XHTML would be needed for mobile devices is simply a myth”.

Versions of XHTML

XHTML 1.0

In earlier times, Wikipedia used the XHTML 1.0 Transitional doctype and syntax, though it was not served as XHTML

December 1998 saw the publication of a W3C Working Draft entitled Reformulating HTML in XML. This introduced Voyager, the codename for a new markup language based on HTML 4, but adhering to the stricter syntax rules of XML. By February 1999 the name of the specification had changed to XHTML 1.0: The Extensible HyperText Markup Language, and in January 2000 it was officially adopted as a W3C Recommendation. There are three formal DTDs for XHTML 1.0, corresponding to the three different versions of HTML 4.01:
  1. XHTML 1.0 Strict is the XML equivalent to strict HTML 4.01, and includes elements and attributes that have not been marked deprecated in the HTML 4.01 specification. As of November 2015, XHTML 1.0 Strict is the document type used for the homepage of the website of the World Wide Web Consortium;
  2. XHTML 1.0 Transitional is the XML equivalent of HTML 4.01 Transitional, and includes the presentational elements (such as center, font and strike) excluded from the strict version;
  3. XHTML 1.0 Frameset is the XML equivalent of HTML 4.01 Frameset, and allows for the definition of frameset documents—a common Web feature in the late 1990s.
The second edition of XHTML 1.0 became a W3C Recommendation in August 2002.

Modularization of XHTML

Modularization provides an abstract collection of components through which XHTML can be subsetted and extended. The feature is intended to help XHTML extend its reach onto emerging platforms, such as mobile devices and Web-enabled televisions. The initial draft of Modularization of XHTML became available in April 1999, and reached Recommendation status in April 2001.

The first modular XHTML variants were XHTML 1.1 and XHTML Basic 1.0.

In October 2008 Modularization of XHTML was superseded by XHTML Modularization 1.1, which adds an XML Schema implementation. It was itself superseded by a second edition in July 2010.

XHTML 1.1: Module-based XHTML

XHTML 1.1 evolved out of the work surrounding the initial Modularization of XHTML specification. The W3C released a first draft in September 1999; Recommendation status was reached in May 2001. The modules combined within XHTML 1.1 effectively recreate XHTML 1.0 Strict, with the addition of ruby annotation elements (ruby, rbc, rtc, rb, rt and rp) to better support East-Asian languages. Other changes include removal of the name attribute from the a and map elements, and (in the first edition of the language) removal of the lang attribute in favour of xml:lang.

Although XHTML 1.1 is largely compatible with XHTML 1.0 and HTML 4, in August 2002 the Working Group issued a formal Note advising that it should not be transmitted with the HTML media type. With limited browser support for the alternate application/xhtml+xml media type, XHTML 1.1 proved unable to gain widespread use. In January 2009 a second edition of the document (XHTML Media Types – Second Edition) was issued, relaxing this restriction and allowing XHTML 1.1 to be served as text/html.

A second edition of XHTML 1.1 was issued on 23 November 2010, which addresses various errata and adds an XML Schema implementation not included in the original specification. (It was first released briefly on 7 May 2009 as a "Proposed Edited Recommendation" before being rescinded on 19 May due to unresolved issues.)

XHTML Basic

Since information appliances may lack the system resources to implement all XHTML abstract modules, the W3C defined a feature-limited XHTML specification called XHTML Basic. It provides a minimal feature subset sufficient for the most common content-authoring. The specification became a W3C recommendation in December 2000.

Of all the versions of XHTML, XHTML Basic 1.0 provides the fewest features. With XHTML 1.1, it is one of the first two implementations of modular XHTML. In addition to the Core Modules (Structure, Text, Hypertext, and List), it implements the following abstract modules: Base, Basic Forms, Basic Tables, Image, Link, Metainformation, Object, Style Sheet, and Target.

XHTML Basic 1.1 replaces the Basic Forms Module with the Forms Module, and adds the Intrinsic Events, Presentation, and Scripting modules. It also supports additional tags and attributes from other modules. This version became a W3C recommendation on 29 July 2008.

The current version of XHTML Basic is 1.1 Second Edition (23 November 2010), in which the language is re-implemented in the W3C's XML Schema language. This version also supports the lang attribute.

XHTML-Print

XHTML-Print, which became a W3C Recommendation in September 2006, is a specialized version of XHTML Basic designed for documents printed from information appliances to low-end printers.

XHTML Mobile Profile

XHTML Mobile Profile (abbreviated XHTML MP or XHTML-MP) is a third-party variant of the W3C's XHTML Basic specification. Like XHTML Basic, XHTML MP was developed for information appliances with limited system resources.
In October 2001, a limited company called the Wireless Application Protocol Forum began adapting XHTML Basic for WAP 2.0, the second major version of the Wireless Application Protocol. The WAP Forum based its DTD on the W3C's Modularization of XHTML, incorporating the same modules the W3C used in XHTML Basic 1.0, except for the Target Module. Starting with this foundation, the WAP Forum replaced the Basic Forms Module with a partial implementation of the Forms Module, added partial support for the Legacy and Presentation modules, and added full support for the Style Attribute Module.

In 2002, the WAP Forum was subsumed into the Open Mobile Alliance (OMA), which continued to develop XHTML Mobile Profile as a component of its OMA Browsing Specification.

XHTML Mobile Profile 1.1

In this version, finalized in 2004, the OMA added partial support for the Scripting Module and for Intrinsic Events. XHTML MP 1.1 is part of v2.1 of the OMA Browsing Specification (1 November 2002).

XHTML Mobile Profile 1.2

This version, finalized 27 February 2007, expands the capabilities of XHTML MP 1.1 with full support for the Forms Module and OMA Text Input Modes. XHTML MP 1.2 is part of v2.3 of the OMA Browsing Specification (13 March 2007).
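The module changes described above, from XHTML Basic 1.0 through XHTML MP 1.2, can be tracked as simple set and dictionary operations. The following Python sketch is illustrative only. Module names are plain strings chosen here, and the helper functions are hypothetical. The text does not qualify the modules XHTML MP inherited unchanged from XHTML Basic 1.0, so they are assumed to be fully supported.

```python
from typing import Dict

# Core Modules plus the additional abstract modules of XHTML Basic 1.0.
CORE_MODULES = frozenset({"Structure", "Text", "Hypertext", "List"})
XHTML_BASIC_1_0 = CORE_MODULES | {
    "Base", "Basic Forms", "Basic Tables", "Image", "Link",
    "Metainformation", "Object", "Style Sheet", "Target",
}


def xhtml_mp_1_0() -> Dict[str, str]:
    """Hypothetical helper: module name -> support level for XHTML MP 1.0."""
    # Same modules as XHTML Basic 1.0 except Target (assumed: fully supported).
    support = {name: "full" for name in XHTML_BASIC_1_0 - {"Target"}}
    # Basic Forms is replaced by a partial implementation of Forms.
    del support["Basic Forms"]
    support["Forms"] = "partial"
    support["Legacy"] = "partial"
    support["Presentation"] = "partial"
    support["Style Attribute"] = "full"
    return support


def xhtml_mp_1_1() -> Dict[str, str]:
    support = xhtml_mp_1_0()
    support["Scripting"] = "partial"         # added in MP 1.1
    support["Intrinsic Events"] = "partial"  # added in MP 1.1
    return support


def xhtml_mp_1_2() -> Dict[str, str]:
    support = xhtml_mp_1_1()
    support["Forms"] = "full"  # MP 1.2 adds full support for the Forms Module
    return support


for name, level in sorted(xhtml_mp_1_2().items()):
    print(f"{name}: {level}")
```

Each version is built from the previous one, which mirrors how the profile grew by incremental additions.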

XHTML Mobile Profile 1.3

XHTML MP 1.3 (finalized on 23 September 2008) uses the XHTML Basic 1.1 document type definition, which includes the Target Module. Events in this version of the specification are updated to DOM Level 3 specifications (i.e., they are platform- and language-neutral).

XHTML 1.2

The XHTML 2 Working Group considered the creation of a new language based on XHTML 1.1. Had XHTML 1.2 been created, it would have included WAI-ARIA and role attributes to better support accessible web applications, and improved Semantic Web support through RDFa. The inputmode attribute from XHTML Basic 1.1, along with the target attribute (for specifying frame targets), might also have been present. However, the XHTML2 WG had not been chartered to develop XHTML 1.2. The W3C announced that it did not intend to recharter the XHTML2 WG and closed the WG in December 2010, so the XHTML 1.2 proposal never came to fruition.

XHTML 2.0

Between August 2002 and July 2006, the W3C released eight Working Drafts of XHTML 2.0, a new version of XHTML able to make a clean break from the past by discarding the requirement of backward compatibility. This lack of compatibility with XHTML 1.x and HTML 4 caused some early controversy in the web developer community. Some parts of the language (such as the role and RDFa attributes) were subsequently split out of the specification and worked on as separate modules, partially to help make the transition from XHTML 1.x to XHTML 2.0 smoother. A ninth draft of XHTML 2.0 was expected to appear in 2009, but on July 2, 2009, the W3C decided to let the XHTML2 Working Group charter expire by that year's end, effectively halting any further development of the draft into a standard. Instead, XHTML 2.0 and its related documents were released as W3C Notes.

New features to have been introduced by XHTML 2.0 included:
  • HTML forms were to be replaced by XForms, an XML-based user input specification allowing forms to be displayed appropriately for different rendering devices;
  • HTML frames were to be replaced by XFrames;
  • The DOM Events were to be replaced by XML Events, which uses the XML Document Object Model;
  • A new list element type, the nl element type, was to be included to specifically designate a list as a navigation list. This would have been useful in creating nested menus, which are currently created by a wide variety of means like nested unordered lists or nested definition lists.
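  • Any element was to be able to act as a hyperlink by carrying an href attribute directly, e.g., an li element containing the text "Articles", similar to XLink. However, XLink itself is not compatible with XHTML due to design differences;
  • Any element was to be able to reference alternative media with the src attribute, e.g., a p element containing the text "London Bridge" with a src attribute pointing to an image was to be equivalent to an object element with that src whose fallback content is the p element "London Bridge";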
  • The alt attribute of the img element was to be removed: alternative text was to be given in the content of the img element, much like the object element, e.g., an img element whose content is the text "HMS Audacious";
  • A single heading element (h) was to be added. The level of each heading was to be determined by the depth of its nesting, which would have allowed headings to nest to any depth rather than being limited to six levels;
  • The remaining presentational elements i, b and tt, still allowed in XHTML 1.x (even Strict), were to be absent from XHTML 2.0. The only somewhat presentational elements remaining were to be sup and sub for superscript and subscript respectively, because they have significant non-presentational uses and are required by certain languages. All other tags were meant to be semantic instead (e.g. strong for strong emphasis), while the user agent would control the presentation of elements via CSS (e.g. rendered as boldface text in most visual browsers, but possibly rendered with changes of tone in a text-to-speech reader, or in a larger italic font according to a user stylesheet);
  • RDF triples were to be expressible through the property and about attributes, to facilitate the conversion from XHTML to RDF/XML.

XHTML5

HTML5 initially grew independently of the W3C, through a loose group of browser manufacturers and other interested parties calling themselves the WHATWG, or Web Hypertext Application Technology Working Group. The key motive of the group was to create a platform for dynamic web applications; they considered XHTML 2.0 to be too document-centric, and not suitable for the creation of internet forum sites or online shops.

HTML5 has both a regular text/html serialization and an XML serialization, which is also known as XHTML5. The language is more compatible with HTML 4 and XHTML 1.x than XHTML 2.0, due to the decision to keep the existing HTML form elements and events model. It adds many new elements not found in XHTML 1.x, however, such as section and aside tags.

Semantic content in XHTML

XHTML+RDFa is an extended version of the XHTML markup language for supporting RDF through a collection of attributes and processing rules in the form of well-formed XML documents. This host language is one of the techniques used to develop Semantic Web content by embedding rich semantic markup.

Valid XHTML documents

An XHTML document that conforms to an XHTML specification is said to be valid. Validity assures consistency in document code, which in turn eases processing, but does not necessarily ensure consistent rendering by browsers. A document can be checked for validity with the W3C Markup Validation Service. In practice, many web development programs provide code validation based on the W3C standards.
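The W3C Markup Validation Service checks a document against its DTD. Python's standard library has no DTD validator, so the sketch below covers only a weaker property: being well-formed XML, which an XHTML document must satisfy before it can be valid at all. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """True if `markup` parses as XML. This does NOT check validity against a DTD."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Line one<br/>Line two</p>"))  # True
print(is_well_formed("<p>Line one<br>Line two</p>"))   # False: <br> is never closed
```

A document that passes this check may still be invalid, for example if it uses an element the DTD does not allow at that position.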

Root element

The root element of an XHTML document must be html, and must contain an xmlns attribute to associate it with the XHTML namespace. The namespace URI for XHTML is http://www.w3.org/1999/xhtml. The example tag below additionally features an xml:lang attribute to identify the document with a natural language:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
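To illustrate the namespace requirement, the following Python sketch parses a document with ElementTree and checks its root element. ElementTree reports namespaced names in "{namespace-uri}local-name" form. The xml: prefix is predefined and always bound to http://www.w3.org/XML/1998/namespace, so no declaration is needed for xml:lang. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"
# The xml: prefix is predefined and always bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def root_language(document: str) -> Optional[str]:
    """Check that the root is XHTML's html element; return its xml:lang, if any."""
    root = ET.fromstring(document)
    # A root without xmlns parses as plain "html", which fails this check.
    if root.tag != "{%s}html" % XHTML_NS:
        raise ValueError("root element is %r, not html in the XHTML namespace" % root.tag)
    return root.get("{%s}lang" % XML_NS)


print(root_language('<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"/>'))  # en
```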

DOCTYPEs

In order to validate an XHTML document, a Document Type Declaration, or DOCTYPE, may be used. A DOCTYPE declares to the browser the Document Type Definition (DTD) to which the document conforms, and it should be placed before the root element. The system identifier part of the DOCTYPE is the quoted URL beginning with http:// in the W3C's published XHTML DOCTYPEs. It need only point to a copy of the DTD to use if the validator cannot locate one based on the public identifier (the other quoted string). It does not need to be the specific URL published by the W3C; in fact, authors are encouraged to use local copies of the DTD files when possible. The public identifier, however, must match the published one character for character.
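Because only the public identifier has to match exactly, a checker can compare it as a plain string and ignore the system identifier. The Python sketch below uses xml.dom.minidom, which records both identifiers from the DOCTYPE without retrieving the external DTD. The set of identifiers covers only the XHTML 1.0 and 1.1 DTDs. The function names and the local path "dtd/xhtml1-strict.dtd" are hypothetical.

```python
from typing import Tuple
from xml.dom import minidom

# Public identifiers of the XHTML 1.0 and 1.1 DTDs.
XHTML_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}


def doctype_ids(document: str) -> Tuple[str, str]:
    """Return (public identifier, system identifier) from the document's DOCTYPE."""
    doctype = minidom.parseString(document).doctype
    if doctype is None:
        raise ValueError("document has no DOCTYPE")
    return doctype.publicId, doctype.systemId


def has_known_public_id(document: str) -> bool:
    public_id, _system_id = doctype_ids(document)  # system identifier may point anywhere
    return public_id in XHTML_PUBLIC_IDS  # exact, character-for-character match


# System identifier pointing at a hypothetical local copy of the DTD.
LOCAL_COPY = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
              '"dtd/xhtml1-strict.dtd">'
              '<html xmlns="http://www.w3.org/1999/xhtml"/>')
print(has_known_public_id(LOCAL_COPY))                               # True
print(has_known_public_id(LOCAL_COPY.replace("Strict", "strict")))   # False
```

Changing a single letter's case in the public identifier is enough to make it unrecognized, while the system identifier is free to point at a local file.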

XML declaration

A character encoding may be specified at the beginning of an XHTML document in the XML declaration when the document is served using the application/xhtml+xml MIME type. (If an XML document lacks an encoding declaration, an XML parser assumes that the encoding is UTF-8 or UTF-16, unless the encoding has already been determined by a higher-level protocol.)
For example:

<?xml version="1.0" encoding="UTF-8"?>

This declaration may be omitted, because it declares the default encoding. However, if the document instead uses XML 1.1 or another character encoding, a declaration is necessary. Internet Explorer prior to version 7 enters quirks mode if it encounters an XML declaration in a document served as text/html.
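The default-encoding rule can be observed with Python's ElementTree, which is built on the expat parser. Bytes without a declaration are read as UTF-8. A document in another encoding parses correctly only when its declaration names that encoding. ISO-8859-1 is used here purely as an example of a non-default encoding.

```python
import xml.etree.ElementTree as ET

# No XML declaration: the parser assumes UTF-8.
utf8_doc = "<p>caf\u00e9</p>".encode("utf-8")
# ISO-8859-1 bytes, with a declaration that says so.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")

print(ET.fromstring(utf8_doc).text)    # café
print(ET.fromstring(latin1_doc).text)  # café

# The same ISO-8859-1 bytes without the declaration are not valid UTF-8.
try:
    ET.fromstring("<p>caf\u00e9</p>".encode("iso-8859-1"))
except ET.ParseError as err:
    print("parse error:", err)
```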

Backward compatibility

XHTML 1.x documents are mostly backward compatible with HTML 4 user agents when the appropriate guidelines are followed. XHTML 1.1 is essentially compatible, although the elements for ruby annotation are not part of the HTML 4 specification and are thus generally ignored by HTML 4 browsers. Later XHTML 1.x modules, such as those for the role attribute, RDFa and WAI-ARIA, degrade gracefully in a similar manner.

XHTML 2.0 is significantly less compatible, although this can be mitigated to some degree through the use of scripting. (This can range from simple one-liners, such as the use of "document.createElement()" to register a new HTML element within Internet Explorer, to complete JavaScript frameworks, such as the FormFaces implementation of XForms.)

Examples

The following are two examples of XHTML 1.0 Strict documents that have the same visual output. The first follows the HTML Compatibility Guidelines of the XHTML Media Types Note, while the second breaks backward compatibility but provides cleaner markup.

Media type recommendations (in RFC 2119 terms) for the examples:

Media type              Example 1   Example 2
application/xhtml+xml   SHOULD      SHOULD
application/xml         MAY         MAY
text/xml                MAY         MAY
text/html               MAY         SHOULD NOT
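A server choosing between these media types might encode the table as data. In the following Python sketch, the function name and the boolean client_accepts_xhtml are assumptions. The boolean stands in for real HTTP Accept-header negotiation, which also involves quality values. The function prefers the SHOULD type and falls back to text/html only where the table permits it.

```python
from typing import Dict, Optional, Tuple

# RFC 2119 keywords from the media type table, keyed by (media type, example number).
RECOMMENDATIONS: Dict[Tuple[str, int], str] = {
    ("application/xhtml+xml", 1): "SHOULD",
    ("application/xhtml+xml", 2): "SHOULD",
    ("application/xml", 1): "MAY",
    ("application/xml", 2): "MAY",
    ("text/xml", 1): "MAY",
    ("text/xml", 2): "MAY",
    ("text/html", 1): "MAY",
    ("text/html", 2): "SHOULD NOT",
}


def choose_media_type(example: int, client_accepts_xhtml: bool) -> Optional[str]:
    """Hypothetical helper: pick a media type for serving Example 1 or 2."""
    if client_accepts_xhtml:
        return "application/xhtml+xml"  # the SHOULD choice for both examples
    if RECOMMENDATIONS[("text/html", example)] != "SHOULD NOT":
        return "text/html"  # permitted only for the HTML-compatible Example 1
    return None  # no acceptable fallback for this client


print(choose_media_type(1, client_accepts_xhtml=False))  # text/html
print(choose_media_type(2, client_accepts_xhtml=False))  # None
```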

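Example 1.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>XHTML 1.0 Strict Example</title>
    <script type="text/javascript">
    //<![CDATA[
    function loadpdf() { document.getElementById("pdf-object").src = "http://www.w3.org/TR/xhtml1/xhtml1.pdf"; }
    //]]>
    </script>
  </head>
  <body onload="loadpdf()">
    <p>This is an example of an <abbr title="Extensible HyperText Markup Language">XHTML</abbr> 1.0 Strict document.
    <img id="validation-icon" src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0 Strict"/>
    <object id="pdf-object" name="pdf-object" type="application/pdf" data="http://www.w3.org/TR/xhtml1/xhtml1.pdf" width="100%" height="500"></object>
    </p>
  </body>
</html>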

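Example 2.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>XHTML 1.0 Strict Example</title>
    <script type="text/javascript">
    <![CDATA[
    function loadpdf() { document.getElementById("pdf-object").src = "http://www.w3.org/TR/xhtml1/xhtml1.pdf"; }
    ]]>
    </script>
  </head>
  <body onload="loadpdf()">
    <p>This is an example of an <abbr title="Extensible HyperText Markup Language">XHTML</abbr> 1.0 Strict document.
    <img id="validation-icon" src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0 Strict"/>
    <object id="pdf-object" type="application/pdf" data="http://www.w3.org/TR/xhtml1/xhtml1.pdf" width="100%" height="500"></object>
    </p>
  </body>
</html>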

Notes:
  1. The "loadpdf" function is actually a workaround for Internet Explorer. It can be replaced by adding <param name="src" value="http://www.w3.org/TR/xhtml1/xhtml1.pdf"/> within the object element;
  2. The img element does not have a name attribute in the XHTML 1.0 Strict DTD. Use id instead.

Cross-compatibility of XHTML and HTML

HTML5 and XHTML5 serializations are largely inter-compatible if adhering to the stricter XHTML5 syntax, but there are some cases in which XHTML will not work as valid HTML5 (e.g., processing instructions are deprecated in HTML, are treated as comments, and close on the first ">", whereas they are fully allowed in XML, are treated as their own type, and close on "?>").
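The processing-instruction difference can be made concrete with a small Python sketch. For the XML side it uses xml.dom.minidom, which keeps a processing instruction intact until "?>". For the HTML side it uses a hypothetical helper, html_bogus_comment. This helper is a simplification of the HTML tokenizer rule that turns "<?" into a comment ending at the first ">", and it ignores everything else the tokenizer does. The markup string is made up for illustration.

```python
from xml.dom import minidom

MARKUP = "<root><?target a>b?></root>"


def html_bogus_comment(markup: str, start: int) -> str:
    """Comment data an HTML tokenizer would produce for the '<?' at `start`.

    Simplified: the comment runs from the '?' up to the first '>'
    (or to the end of input if there is none).
    """
    end = markup.find(">", start)
    if end == -1:
        end = len(markup)
    return markup[start + 1:end]


# XML: a genuine processing instruction that ends only at "?>".
pi = minidom.parseString(MARKUP).documentElement.firstChild
print(pi.target, repr(pi.data))  # target 'a>b'

# HTML: a comment that ends at the first ">", leaving "b?>" as ordinary text.
start = MARKUP.index("<?")
print(repr(html_bogus_comment(MARKUP, start)))  # '?target a'
```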

Reproductive rights

From Wikipedia, the free encyclo...