
Wednesday, October 9, 2019

Data type

From Wikipedia, the free encyclopedia
 
[Image: Python 3: the standard type hierarchy]
 
In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support common data types such as real, integer, and Boolean. A data type constrains the values that an expression, such as a variable or a function, might take. The data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored. Put another way, a data type is the set of values from which an expression may take its value.

Concept

Data types are used within type systems, which offer various ways of defining, implementing and using them. Different type systems ensure varying degrees of type safety.

Almost all programming languages explicitly include the notion of data type, though different languages may use different terminology. Common data types include integers, floating-point numbers, characters, strings, and Booleans.
For example, in the Java programming language, the type int represents the set of 32-bit integers ranging in value from −2,147,483,648 to 2,147,483,647, as well as the operations that can be performed on integers, such as addition, subtraction, and multiplication. A color, on the other hand, might be represented by three bytes denoting the amounts of red, green, and blue, and a string representing the color's name; allowable operations would include addition and subtraction, but not multiplication.

Most programming languages also allow the programmer to define additional data types, usually by combining multiple elements of other types and defining the valid operations of the new data type. For example, a programmer might create a new data type named "complex number" that would include real and imaginary parts. A data type also represents a constraint placed upon the interpretation of data in a type system, describing representation, interpretation and structure of values or objects stored in computer memory. The type system uses data type information to check correctness of computer programs that access or manipulate the data. 
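
To make the "complex number" example concrete, here is a minimal sketch in C of such a programmer-defined type; the names Complex and complex_add are illustrative, not taken from any particular library:

#include <stdio.h>

/* A programmer-defined type built by combining two primitive
   floating-point components, plus one valid operation on it. */
typedef struct {
    double re;  /* real part */
    double im;  /* imaginary part */
} Complex;

Complex complex_add(Complex a, Complex b) {
    Complex sum = { a.re + b.re, a.im + b.im };
    return sum;
}

int main(void) {
    Complex a = { 1.0, 2.0 }, b = { 3.0, -1.0 };
    Complex c = complex_add(a, b);
    printf("%g + %gi\n", c.re, c.im);  /* prints: 4 + 1i */
    return 0;
}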

Most data types in statistics have comparable types in computer programming, and vice versa, as shown in the following table: 

Statistics                           Programming
real-valued (interval scale)         floating-point
real-valued (ratio scale)            floating-point
count data (usually non-negative)    integer
binary data                          Boolean
categorical data                     enumerated type
random vector                        list or array
random matrix                        two-dimensional array
random tree                          tree

Definition

Parnas, Shore & Weiss (1976) identified five definitions of a "type" that were used, sometimes implicitly, in the literature. Types including behavior align more closely with object-oriented models, whereas a structured programming model would tend not to include code; such types are called plain old data structures.

The five types are:
Syntactic
A type is a purely syntactic label associated with a variable when it is declared. Such definitions of "type" do not give any semantic meaning to types.
Representation
A type is defined in terms of its composition of more primitive types—often machine types.
Representation and behaviour
A type is defined as its representation and a set of operators manipulating these representations.
Value space
A type is a set of possible values which a variable can possess. Such definitions make it possible to speak about (disjoint) unions or Cartesian products of types.
Value space and behaviour
A type is a set of values which a variable can possess and a set of functions that one can apply to these values.
The definition in terms of a representation was often done in imperative languages such as ALGOL and Pascal, while the definition in terms of a value space and behaviour was used in higher-level languages such as Simula and CLU.

Classes of data types

Primitive data types

Primitive data types are typically types that are built-in or basic to a language implementation.

Machine data types

All data in computers based on digital electronics is represented as bits (alternatives 0 and 1) at the lowest level. The smallest addressable unit of data is usually a group of bits called a byte (usually an octet, which is 8 bits). The unit processed by machine code instructions is called a word (as of 2011, typically 32 or 64 bits). Most instructions interpret the word as a binary number, such that a 32-bit word can represent unsigned integer values from 0 to 2^32 − 1, or signed integer values from −2^31 to 2^31 − 1. Because of two's complement, the machine language and the machine itself do not need to distinguish between these unsigned and signed data types for the most part.
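
A minimal C sketch of this point, assuming the usual two's-complement platform (the unsigned-to-signed cast is implementation-defined in standard C): the same 32-bit pattern is printed under both interpretations.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* All bits set: 4294967295 when read as unsigned,
       -1 when the same pattern is read as two's-complement signed. */
    uint32_t u = 0xFFFFFFFFu;
    int32_t  s = (int32_t)u;   /* implementation-defined, but -1 on
                                  two's-complement machines */
    printf("unsigned: %u\n", (unsigned)u);
    printf("signed:   %d\n", (int)s);

    /* Addition produces the same bit pattern either way: */
    printf("u + 1 = %u\n", (unsigned)(u + 1u));  /* wraps to 0 */
    printf("s + 1 = %d\n", (int)(s + 1));        /* -1 + 1 = 0 */
    return 0;
}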

Floating point numbers used for floating point arithmetic use a different interpretation of the bits in a word. See https://en.wikipedia.org/wiki/Floating-point_arithmetic for details.

Machine data types need to be exposed or made available in systems or low-level programming languages, allowing fine-grained control over hardware. The C programming language, for instance, supplies integer types of various widths, such as short and long. If a corresponding native type does not exist on the target platform, the compiler will break them down into code using types that do exist. For instance, if a 32-bit integer is requested on a 16-bit platform, the compiler will tacitly treat it as an array of two 16-bit integers.
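
A short C sketch of these varying widths; the printed sizes depend on the target platform, while the <stdint.h> exact-width types are synthesized by the compiler if the hardware lacks them:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* The C standard guarantees only minimum widths for these types;
       actual sizes vary by platform. */
    printf("short: %zu bytes\n", sizeof(short));
    printf("int:   %zu bytes\n", sizeof(int));
    printf("long:  %zu bytes\n", sizeof(long));

    /* Exact-width type: 32 bits even on a 16-bit platform, where the
       compiler emits multi-word arithmetic behind the scenes. */
    int32_t x = 100000;
    printf("int32_t: %zu bytes, x = %ld\n", sizeof x, (long)x);
    return 0;
}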

In higher level programming, machine data types are often hidden or abstracted as an implementation detail that would render code less portable if exposed. For instance, a generic numeric type might be supplied instead of integers of some specific bit-width.

Boolean type

The Boolean type represents the values true and false. Although only two values are possible, they are rarely implemented as a single binary digit, for efficiency reasons. Many programming languages do not have an explicit Boolean type, instead interpreting (for instance) 0 as false and other values as true. In such languages, a Boolean 0 denotes logical false, and any nonzero value, conventionally 1, denotes logical true.
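
A minimal C illustration of this convention; bool is the explicit Boolean type added in C99:

#include <stdio.h>
#include <stdbool.h>   /* C99: bool, true, false */

int main(void) {
    /* Before C99, C had no dedicated Boolean type:
       zero is false, any nonzero value is true. */
    int flag = 5;
    if (flag)
        printf("5 is treated as true\n");

    /* The C99 bool normalizes any nonzero value to 1 and is stored
       in a whole byte, not a single bit. */
    bool b = 5;
    printf("bool from 5: %d\n", (int)b);          /* prints 1 */
    printf("sizeof(bool): %zu byte(s)\n", sizeof(bool));
    return 0;
}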

Numeric types

Such as:
  • The integer data types, or "non-fractional numbers", may be sub-typed according to their ability to contain negative values (e.g. unsigned in C and C++). They may also have a small number of predefined subtypes (such as short and long in C/C++), or allow users to freely define subranges such as 1..12 (e.g. Pascal/Ada).
  • Floating point data types usually represent values as high-precision fractional values (rational numbers, mathematically), but are sometimes misleadingly called reals (evocative of mathematical real numbers). They usually have predefined limits on both their maximum values and their precision. They are typically stored internally in the form a × 2^b (where a and b are integers), but displayed in familiar decimal form; see the sketch after this list.
  • Fixed point data types are convenient for representing monetary values. They are often implemented internally as integers, leading to predefined limits.
  • Bignum or arbitrary precision numeric types lack predefined limits. They are not primitive types, and are used sparingly for efficiency reasons.
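
The sketch referred to in the floating-point item above: a minimal C program using the standard frexp function to recover the a × 2^b form of a stored value (here a is returned as a fraction in [0.5, 1) rather than an integer, which is an equivalent normalization):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* frexp decomposes x into a * 2^b with 0.5 <= a < 1. */
    double x = 6.625;
    int b;
    double a = frexp(x, &b);
    printf("%g = %g * 2^%d\n", x, a, b);  /* 6.625 = 0.828125 * 2^3 */
    return 0;
}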

Composite types

Composite types are derived from more than one primitive type. This can be done in a number of ways; the ways they are combined are called data structures. Composing a primitive type into a compound type generally results in a new type, e.g. array-of-integer is a different type from integer.
  • An array (also called vector, list, or sequence) stores a number of elements and provides random access to individual elements. The elements of an array are typically (but not in all contexts) required to be of the same type. Arrays may be fixed-length or expandable. Indices into an array are typically required to be integers (if not, one may stress this relaxation by speaking about an associative array) from a specific range (if not all indices in that range correspond to elements, it may be a sparse array).
  • Record (also called tuple or struct). Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
  • Union. A union type definition specifies which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast this with a record, which could be defined to contain both a float and an integer; in a union, only one of the permitted types may be stored at a time (see the sketch after this list).
    • A tagged union (also called a variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type for enhanced type safety.
  • A set is an abstract data structure that can store certain values, without any particular order, and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a boolean "in" or "not in".
  • An object contains a number of data fields, like a record, and also a number of subroutines for accessing or modifying them, called methods.
Many others are possible, but they tend to be further variations and compounds of the above. For example, a linked list can store the same data as an array, but provides sequential rather than random access and is built up of records in dynamic memory; though it is arguably a data structure rather than a type per se, it is common and distinct enough that including it in a discussion of composite types can be justified.
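
A minimal C sketch contrasting a record, a union, and a tagged union; the type and field names are illustrative:

#include <stdio.h>

/* Record: holds a float AND a long, stored together. */
struct pair { float f; long i; };

/* Union: holds a float OR a long, one at a time, in shared storage. */
union float_or_long { float f; long i; };

/* Tagged union: an extra field records which member is current. */
struct tagged {
    enum { HOLDS_FLOAT, HOLDS_LONG } tag;
    union { float f; long i; } as;
};

int main(void) {
    struct tagged t = { .tag = HOLDS_LONG, .as.i = 42 };
    if (t.tag == HOLDS_LONG)                 /* check before reading */
        printf("holds long: %ld\n", t.as.i);
    printf("union size %zu vs record size %zu\n",
           sizeof(union float_or_long), sizeof(struct pair));
    return 0;
}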

Enumerations

The enumerated type has distinct values, which can be compared and assigned, but which do not necessarily have any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily. For example, the four suits in a deck of playing cards may be four enumerators named CLUB, DIAMOND, HEART, SPADE, belonging to an enumerated type named suit. If a variable V is declared having suit as its data type, one can assign any of those four values to it. Some implementations allow programmers to assign integer values to the enumeration values, or even treat them as type-equivalent to integers.
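
The playing-card example as a minimal C sketch; C exposes the integer representation, which illustrates the "treat them as type-equivalent to integers" remark:

#include <stdio.h>

/* Four distinct values; the compiler chooses the representation
   (in C, consecutive ints starting at 0 by default). */
enum suit { CLUB, DIAMOND, HEART, SPADE };

int main(void) {
    enum suit v = HEART;      /* assign one of the four values */
    if (v == HEART)           /* compare */
        printf("v holds HEART (stored as %d)\n", (int)v);
    v = (enum suit)(v + 1);   /* C lets enumerators mix with integers */
    printf("after increment: %d (SPADE)\n", (int)v);
    return 0;
}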

String and text types

Such as:
  • Alphanumeric character. A letter of the alphabet, digit, blank space, punctuation mark, etc.
  • Alphanumeric strings, a sequence of characters. They are typically used to represent words and text.
Character and string types can store sequences of characters from a character set such as ASCII. Since most character sets include the digits, it is possible to have a numeric string, such as "1234". However, many languages treat such strings as belonging to a different type from the numeric value 1234.

Character and string types can have different subtypes according to the required character "width". The original 7-bit wide ASCII was found to be limited, and superseded by 8- and 16-bit sets, which can encode a wide variety of non-Latin alphabets (Hebrew, Chinese) and other symbols. Strings may be either stretch-to-fit or of fixed size, even in the same programming language. They may also be subtyped by their maximum size.

Note: strings are not a primitive type in all languages; in C, for instance, they are composed from an array of characters.
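
A minimal C sketch of both points: the string is an array of characters with a terminating '\0' byte, and the string "1234" is a different type from the integer 1234, so converting between them is explicit:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char numeric[] = "1234";   /* 5 bytes: '1' '2' '3' '4' '\0' */
    printf("length %zu, storage %zu bytes\n",
           strlen(numeric), sizeof numeric);

    int value = atoi(numeric); /* explicit string-to-integer conversion */
    printf("as integer plus one: %d\n", value + 1);  /* 1235 */
    return 0;
}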

Other types

Types can be based on, or derived from, the basic types explained above. In some languages, such as C, functions have a type derived from the type of their return value.

Pointers and references

The main non-composite, derived type is the pointer, a data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. It is a primitive kind of reference. (In everyday terms, a page number in a book could be considered a piece of data that refers to another one). Pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" a pointer whose value was never a valid memory address would cause a program to crash. To ameliorate this potential problem, pointers are considered a separate type from the type of data they point to, even if the underlying representation is the same.
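
A minimal C sketch of declaring and dereferencing a pointer; note that int * and double * are distinct types even though both are stored as addresses:

#include <stdio.h>

int main(void) {
    int value = 42;
    int *p = &value;            /* p holds the address of value */

    printf("address: %p\n", (void *)p);
    printf("pointed-to value: %d\n", *p);   /* dereference: 42 */

    *p = 43;                    /* write through the pointer */
    printf("value is now: %d\n", value);
    return 0;
}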

Abstract data types

Any type that does not specify an implementation is an abstract data type. For instance, a stack (which is an abstract type) can be implemented as an array (a contiguous block of memory containing multiple values), or as a linked list (a set of non-contiguous memory blocks linked by pointers). 
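
A sketch of the idea in C, under the assumption of a fixed-size array backing (names are illustrative): callers use only push and pop, so the array could be replaced by a linked list without changing them.

#include <stdio.h>

/* The abstract interface: push and pop.  The representation below
   (a fixed array) is one of several possible implementations. */
#define CAP 16
static int data[CAP];
static int top = 0;

int push(int x) { return top < CAP ? (data[top++] = x, 1) : 0; }
int pop(int *x) { return top > 0 ? (*x = data[--top], 1) : 0; }

int main(void) {
    push(1); push(2); push(3);
    int x;
    while (pop(&x))
        printf("%d\n", x);   /* 3 2 1: last in, first out */
    return 0;
}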

Abstract types can be handled by code that does not know or "care" what underlying types are contained in them. Programming that is agnostic about concrete data types is called generic programming. Arrays and records can also contain underlying types, but are considered concrete because they specify how their contents or elements are laid out in memory.

Examples include stacks, queues, sets, and associative arrays.

Utility types

For convenience, high-level languages may supply ready-made "real world" data types, for instance times, dates, monetary values, and memory, even where the language allows them to be built from primitive types.

Type systems

A type system associates types with computed values. By examining the flow of these values, a type system attempts to prove that no type errors can occur. The type system in question determines what constitutes a type error, but a type system generally seeks to guarantee that operations expecting a certain kind of value are not used with values for which that operation does not make sense. 

A compiler may use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. In many C compilers the float data type, for example, is represented in 32 bits, in accord with the IEEE specification for single-precision floating point numbers. They will thus use floating-point-specific microprocessor operations on those values (floating-point addition, multiplication, etc.).

The depth of type constraints and the manner of their evaluation affect the typing of the language. A programming language may further associate an operation with varying concrete algorithms on each type in the case of type polymorphism. Type theory is the study of type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation, and language design.

Type systems may be variously static or dynamic, strongly or weakly typed, and so forth.

Byte

From Wikipedia, the free encyclopedia

byte
Unit system: units derived from bit
Unit of: digital information, data size
Symbol: B, or o (when referring to exactly 8 bits)

The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures.

The size of the byte has historically been hardware dependent and no definitive standards existed that mandated the size. Sizes from 1 to 48 bits have been used. Six-bit character codes were an often-used implementation in early encoding systems, and computers using six-bit and nine-bit bytes were common in the 1960s. These systems often had memory words of 12, 24, 36, 48, or 60 bits, corresponding to 2, 4, 6, 8, or 10 six-bit bytes. In this era, bit groupings in the instruction stream were often referred to as syllables, before the term byte became common.

The modern de facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two permitting the binary-encoded values 0 through 255 for one byte—2 to the power 8 is 256. The international standard IEC 80000-13 codified this common meaning. Many types of applications use information representable in eight or fewer bits and processor designers optimize for this common usage. The popularity of major commercial computing architectures has aided in the ubiquitous acceptance of the eight-bit size. Modern architectures typically use 32- or 64-bit words, built of four or eight bytes.

The unit symbol for the byte was designated as the upper-case letter B by the International Electrotechnical Commission (IEC) and Institute of Electrical and Electronics Engineers (IEEE) in contrast to the bit, whose IEEE symbol is a lower-case b. Internationally, the unit octet, symbol o, explicitly defines a sequence of eight bits, eliminating the ambiguity of the byte.

History

The term byte was coined by Werner Buchholz in June 1956, during the early design phase for the IBM Stretch computer, which had addressing to the bit and variable field length (VFL) instructions with a byte size encoded in the instruction. It is a deliberate respelling of bite to avoid accidental mutation to bit.

Another origin of byte for bit groups smaller than a computer's word size, and in particular groups of four bits, is on record by Louis G. Dooley, who claimed he coined the term while working with Jules Schwartz and Dick Beeler on an air defense system called SAGE at MIT Lincoln Laboratory in 1956 or 1957, which was jointly developed by Rand, MIT, and IBM. Later on, Schwartz's language JOVIAL actually used the term, but the author recalled vaguely that it was derived from AN/FSQ-31.

Early computers used a variety of four-bit binary-coded decimal (BCD) representations and the six-bit codes for printable graphic patterns common in the U.S. Army (FIELDATA) and Navy. These representations included alphanumeric characters and special graphical symbols. These sets were expanded in 1963 to seven bits of coding, called the American Standard Code for Information Interchange (ASCII); as the Federal Information Processing Standard, it replaced the incompatible teleprinter codes in use by different branches of the U.S. government and universities during the 1960s. ASCII included the distinction of upper- and lowercase alphabets and a set of control characters to facilitate the transmission of written language as well as printing device functions, such as page advance and line feed, and the physical or logical control of data flow over the transmission media. During the early 1960s, while also active in ASCII standardization, IBM simultaneously introduced in its product line of System/360 the eight-bit Extended Binary Coded Decimal Interchange Code (EBCDIC), an expansion of their six-bit binary-coded decimal (BCDIC) representations used in earlier card punches. The prominence of the System/360 led to the ubiquitous adoption of the eight-bit storage size, while in detail the EBCDIC and ASCII encoding schemes are different.

In the early 1960s, AT&T introduced digital telephony on long-distance trunk lines. These used the eight-bit µ-law encoding. This large investment promised to reduce transmission costs for eight-bit data. 

The development of eight-bit microprocessors in the 1970s popularized this storage size. Microprocessors such as the Intel 8008, the direct predecessor of the 8080 and the 8086, used in early personal computers, could also perform a small number of operations on the four-bit pairs in a byte, such as the decimal-add-adjust (DAA) instruction. A four-bit quantity is often called a nibble, also nybble, which is conveniently represented by a single hexadecimal digit. 

The term octet is used to unambiguously specify a size of eight bits. It is used extensively in protocol definitions.

Historically, the term octad or octade was used to denote eight bits as well at least in Western Europe; however, this usage is no longer common. The exact origin of the term is unclear, but it can be found in British, Dutch, and German sources of the 1960s and 1970s, and throughout the documentation of Philips mainframe computers.

Unit symbol

Prefixes for multiples of bits (bit) or bytes (B)

Decimal
Value                SI
1000      10^3       k   kilo
1000^2    10^6       M   mega
1000^3    10^9       G   giga
1000^4    10^12      T   tera
1000^5    10^15      P   peta
1000^6    10^18      E   exa
1000^7    10^21      Z   zetta
1000^8    10^24      Y   yotta

Binary
Value                IEC           JEDEC
1024      2^10       Ki  kibi      K   kilo
1024^2    2^20       Mi  mebi      M   mega
1024^3    2^30       Gi  gibi      G   giga
1024^4    2^40       Ti  tebi
1024^5    2^50       Pi  pebi
1024^6    2^60       Ei  exbi
1024^7    2^70       Zi  zebi
1024^8    2^80       Yi  yobi

The unit symbol for the byte is specified in IEC 80000-13, IEEE 1541 and the Metric Interchange Format as the upper-case character B. In contrast, IEEE 1541 specifies the lower case character b as the symbol for the bit, but IEC 80000-13 and Metric-Interchange-Format specify the symbol as bit, providing disambiguation from B for byte. 

In the International System of Quantities (ISQ), B is the symbol of the bel, a unit of logarithmic power ratios named after Alexander Graham Bell, creating a conflict with the IEC specification. However, little danger of confusion exists, because the bel is a rarely used unit. It is used primarily in its decadic fraction, the decibel (dB), for signal strength and sound pressure level measurements, while a unit for one tenth of a byte, the decibyte, and other fractions, are only used in derived units, such as transmission rates.

The lowercase letter o for octet is defined as the symbol for octet in IEC 80000-13 and is commonly used in languages such as French and Romanian, and is also combined with metric prefixes for multiples, for example ko and Mo.

The usage of the term octad(e) for eight bits is no longer common.

Unit multiples

[Graph: percentage difference between decimal and binary interpretations of the unit prefixes grows with increasing storage size]
Despite standardization efforts, ambiguity still exists in the meanings of the SI (or metric) prefixes used with the unit byte, especially concerning the prefixes kilo (k or K), mega (M), and giga (G). Computer memory has a binary architecture in which multiples are expressed in powers of 2. In some fields of the software and computer hardware industries a binary prefix is used for bytes and bits, while producers of computer storage devices adhere to decimal SI multiples. For example, a computer disk drive capacity of 100 gigabytes is specified when the disk contains 100 billion bytes (93 gibibytes) of storage space.

While the numerical difference between the decimal and binary interpretations is relatively small for the prefixes kilo and mega, it grows to over 20% for the prefix yotta. The linear-log graph illustrates the difference versus storage size up to an exabyte.

Common uses

Many programming languages define a byte data type.

The C and C++ programming languages define byte as an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). The C standard requires that the integral data type unsigned char must hold at least 256 different values, and is represented by at least eight bits (clause 5.2.4.2.1). Various implementations of C and C++ reserve 8, 9, 16, 32, or 36 bits for the storage of a byte. In addition, the C and C++ standards require that there are no "gaps" between two bytes. This means every bit in memory is part of a byte.
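
A minimal C sketch that reports these implementation-defined properties from <limits.h>:

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* CHAR_BIT is the number of bits in this implementation's byte;
       the standard requires at least 8 but permits more. */
    printf("bits per byte: %d\n", CHAR_BIT);
    printf("unsigned char range: 0..%u\n", (unsigned)UCHAR_MAX);
    printf("sizeof(char) is always 1 byte: %zu\n", sizeof(char));
    return 0;
}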

Java's primitive byte data type is always defined as consisting of 8 bits and being a signed data type, holding values from −128 to 127. 

.NET programming languages, such as C#, define both an unsigned byte and a signed sbyte, holding values from 0 to 255, and −128 to 127, respectively. 

In data transmission systems, the byte is defined as a contiguous sequence of bits in a serial data stream representing the smallest distinguished unit of data. A transmission unit might include start bits, stop bits, or parity bits, and thus could vary from 7 to 12 bits to contain a single 7-bit ASCII code.

Binary prefix

From Wikipedia, the free encyclopedia

A binary prefix is a unit prefix for multiples of units in data processing, data transmission, and digital information, notably the bit and the byte, to indicate multiplication by a power of 2.

The computer industry has historically used the units kilobyte, megabyte, and gigabyte, and the corresponding symbols KB, MB, and GB, in at least two slightly different measurement systems. In citations of main memory (RAM) capacity, gigabyte customarily means 1073741824 bytes. As this is a power of 1024, and 1024 is a power of two (2^10), this usage is referred to as a binary measurement.

In most other contexts, the industry uses the multipliers kilo, mega, giga, etc., in a manner consistent with their meaning in the International System of Units (SI), namely as powers of 1000. For example, a 500 gigabyte hard disk holds 500000000000 bytes, and a 1 Gbit/s (gigabit per second) Ethernet connection transfers data at 1000000000 bit/s. In contrast with the binary prefix usage, this use is described as a decimal prefix, as 1000 is a power of 10 (103).

The use of the same unit prefixes with two different meanings has caused confusion. Starting around 1998, the International Electrotechnical Commission (IEC) and several other standards and trade organizations addressed the ambiguity by publishing standards and recommendations for a set of binary prefixes that refer exclusively to powers of 1024. Accordingly, the US National Institute of Standards and Technology (NIST) requires that SI prefixes only be used in the decimal sense: kilobyte and megabyte denote one thousand bytes and one million bytes respectively (consistent with SI), while new terms such as kibibyte, mebibyte and gibibyte, having the symbols KiB, MiB, and GiB, denote 1024 bytes, 1048576 bytes, and 1073741824 bytes, respectively. In 2008, the IEC prefixes were incorporated into the international standard system of units used alongside the International System of Quantities (see ISO/IEC 80000).

History

Main memory

Early computers used one of two addressing methods to access the system memory; binary (base 2) or decimal (base 10). For example, the IBM 701 (1952) used binary and could address 2048 words of 36 bits each, while the IBM 702 (1953) used decimal and could address ten thousand 7-bit words.

By the mid-1960s, binary addressing had become the standard architecture in most computer designs, and main memory sizes were most commonly powers of two. This is the most natural configuration for memory, as all combinations of its address lines map to a valid address, allowing easy aggregation into a larger block of memory with contiguous addresses.

Early computer system documentation would specify the memory size with an exact number such as 4096, 8192, or 16384 words of storage. These are all powers of two, and furthermore are small multiples of 2^10, or 1024. As storage capacities increased, several different methods were developed to abbreviate these quantities.

The method most commonly used today uses prefixes such as kilo, mega, giga, and corresponding symbols K, M, and G, which the computer industry originally adopted from the metric system. The prefixes kilo- and mega-, meaning 1000 and 1000000 respectively, were commonly used in the electronics industry before World War II. Along with giga- or G-, meaning 1000000000, they are now known as SI prefixes after the International System of Units (SI), introduced in 1960 to formalize aspects of the metric system. 

The International System of Units does not define units for digital information but notes that the SI prefixes may be applied outside the contexts where base units or derived units would be used. But as computer main memory in a binary-addressed system is manufactured in sizes that were easily expressed as multiples of 1024, kilobyte, when applied to computer memory, came to be used to mean 1024 bytes instead of 1000. This usage is not consistent with the SI. Compliance with the SI requires that the prefixes take their 1000-based meaning, and that they are not to be used as placeholders for other numbers, like 1024.

The use of K in the binary sense, as in a "32K core" meaning 32 × 1024 words, i.e., 32768 words, can be found as early as 1959. Gene Amdahl's seminal 1964 article on IBM System/360 used "1K" to mean 1024. This style was used by other computer vendors; for example, the CDC 7600 System Description (1968) made extensive use of K as 1024. Thus the first binary prefix was born.

Another style was to truncate the last three digits and append K, essentially using K as a decimal prefix similar to SI, but always truncating to the next lower whole number instead of rounding to the nearest. The exact values 32768 words, 65536 words and 131072 words would then be described as "32K", "65K" and "131K". (If these values had been rounded to nearest they would have become 33K, 66K, and 131K, respectively.) This style was used from about 1965 to 1975.

These two styles (K = 1024 and truncation) were used loosely around the same time, sometimes by the same company. In discussions of binary-addressed memories, the exact size was evident from context. (For memory sizes of "41K" and below, there is no difference between the two styles.) The HP 21MX real-time computer (1974) denoted 196608 (which is 192×1024) as "196K" and 1048576 as "1M",[11] while the HP 3000 business computer (1973) could have "64K", "96K", or "128K" bytes of memory.

The "truncation" method gradually waned. Capitalization of the letter K became the de facto standard for binary notation, although this could not be extended to higher powers, and use of the lowercase k did persist. Nevertheless, the practice of using the SI-inspired "kilo" to indicate 1024 was later extended to "megabyte" meaning 10242 (1048576) bytes, and later "gigabyte" for 10243 (1073741824) bytes. For example, a "512 megabyte" RAM module is 512×10242 bytes (512 × 1048576, or 536870912), rather than 512000000.

The symbols Kbit, Kbyte, Mbit and Mbyte started to be used as "binary units"—"bit" or "byte" with a multiplier that is a power of 1024—in the early 1970s. For a time, memory capacities were often expressed in K, even when M could have been used: The IBM System/370 Model 158 brochure (1972) had the following: "Real storage capacity is available in 512K increments ranging from 512K to 2,048K bytes."

Megabyte was used to describe the 22-bit addressing of the DEC PDP-11/70 (1975), and gigabyte the 30-bit addressing of the DEC VAX-11/780 (1977).

In 1998, the International Electrotechnical Commission (IEC) introduced the binary prefixes kibi, mebi, gibi, ... to mean 1024, 1024^2, 1024^3, etc., so that 1048576 bytes could be referred to unambiguously as 1 mebibyte. The IEC prefixes were defined for use alongside the International System of Quantities (ISQ) in 2009.

Disk drives

The disk drive industry has followed a different pattern. Disk drive capacity is generally specified with unit prefixes with decimal meaning, in accordance with SI practices. Unlike computer main memory, disk architecture or construction does not mandate or make it convenient to use binary multiples. Drives can have any practical number of platters or surfaces, and the count of tracks, as well as the count of sectors per track, may vary greatly between designs.

The first commercially sold disk drive, the IBM 350, had fifty physical disk platters containing a total of 50,000 sectors of 100 characters each, for a total quoted capacity of 5 million characters. It was introduced in September 1956. 

In the 1960s most disk drives used IBM's variable block length format, called Count Key Data (CKD). Any block size could be specified up to the maximum track length. Since the block headers occupied space, the usable capacity of the drive was dependent on the block size. Blocks ("records" in IBM's terminology) of 88, 96, 880 and 960 were often used because they related to the fixed block size of 80- and 96-character punch cards. The drive capacity was usually stated under conditions of full track record blocking. For example, the 100-megabyte 3336 disk pack only achieved that capacity with a full track block size of 13,030 bytes. 

Floppy disks for the IBM PC and compatibles quickly standardized on 512-byte sectors, so two sectors were easily referred to as "1K". The 3.5-inch "360 KB" (single-sided) and "720 KB" (double-sided) formats had 720 and 1440 of these sectors, respectively. When the High Density "1.44 MB" floppies came along, with 2880 of these 512-byte sectors, that terminology represented a hybrid binary-decimal definition of "1 MB" = 2^10 × 10^3 = 1024000 bytes.

In contrast, hard disk drive manufacturers used megabytes or MB, meaning 106 bytes, to characterize their products as early as 1974. By 1977, in its first edition, Disk/Trend, a leading hard disk drive industry marketing consultancy segmented the industry according to MBs (decimal sense) of capacity.

One of the earliest hard disk drives in personal computing history, the Seagate ST-412, was specified as "Formatted: 10.0 Megabytes". The drive has four heads and active surfaces (tracks per cylinder) and 306 cylinders. When formatted with a sector size of 256 bytes and 32 sectors per track, it has a capacity of 10027008 bytes. This drive was one of several types installed in the IBM PC/XT and extensively advertised and reported as a "10 MB" (formatted) hard disk drive. The cylinder count of 306 is not conveniently close to any power of 1024; operating systems and programs using the customary binary prefixes show this as 9.5625 MB. Many later drives in the personal computer market used 17 sectors per track; still later, zone bit recording was introduced, causing the number of sectors per track to vary from the outer track to the inner.

The hard drive industry continues to use decimal prefixes for drive capacity, as well as for transfer rate. For example, a "300 GB" hard drive offers slightly more than 300×10^9, or 300000000000, bytes, not 300 × 2^30 (which would be about 322×10^9). Operating systems such as Microsoft Windows that display hard drive sizes using the customary binary prefix "GB" (as it is used for RAM) would display this as "279.4 GB" (meaning 279.4 × 1024^3 bytes, or 279.4 × 1073741824 B). On the other hand, macOS has since version 10.6 shown hard drive size using decimal prefixes (thus matching the drive makers' packaging). (Previous versions of Mac OS X used binary prefixes.)

However, other usages still occur. Seagate has specified data transfer rates in select manuals of some hard drives in both IEC and decimal units. "Advanced Format" drives using 4096-byte sectors are described as having "4K sectors."

Information transfer and clock rates

Computer clock frequencies are always quoted using SI prefixes in their decimal sense. For example, the internal clock frequency of the original IBM PC was 4.77 MHz, that is 4770000 Hz. Similarly, digital information transfer rates are quoted using decimal prefixes:
  • The ATA-100 disk interface refers to 100000000 bytes per second
  • A "56K" modem refers to 56000 bits per second
  • SATA-2 has a raw bit rate of 3 Gbit/s = 3000000000 bits per second
  • PC2-6400 RAM transfers 6400000000 bytes per second
  • Firewire 800 has a raw rate of 800000000 bits per second
  • In 2011, Seagate specified the sustained transfer rate of some hard disk drive models with both decimal and IEC binary prefixes.

Standardization of dual definitions

By the mid-1970s it was common to see K meaning 1024 and the occasional M meaning 1048576 for words or bytes of main memory (RAM) while K and M were commonly used with their decimal meaning for disk storage. In the 1980s, as capacities of both types of devices increased, the SI prefix G, with SI meaning, was commonly applied to disk storage, while M in its binary meaning, became common for computer memory. In the 1990s, the prefix G, in its binary meaning, became commonly used for computer memory capacity. The first terabyte (SI prefix, 1000000000000 bytes) hard disk drive was introduced in 2007.

The dual usage of the kilo (K), mega (M), and giga (G) prefixes as both powers of 1000 and powers of 1024 has been recorded in standards and dictionaries. For example, the 1986 ANSI/IEEE Std 1084-1986 defined dual uses for kilo and mega.
kilo (K). (1) A prefix indicating 1000. (2) In statements involving size of computer storage, a prefix indicating 2^10, or 1024.
mega (M). (1) A prefix indicating one million. (2) In statements involving size of computer storage, a prefix indicating 2^20, or 1048576.
The binary units Kbyte and Mbyte were formally defined in ANSI/IEEE Std 1212-1991.

Many dictionaries have noted the practice of using traditional prefixes to indicate binary multiples. Oxford online dictionary defines, for example, megabyte as: "Computing: a unit of information equal to one million or (strictly) 1048576 bytes."

The units Kbyte, Mbyte, and Gbyte are found in the trade press and in IEEE journals. Gigabyte was formally defined in IEEE Std 610.10-1994 as either 1000000000 or 2^30 bytes. Kilobyte, Kbyte, and KB are equivalent units and all are defined in the obsolete standard, IEEE 100-2000.

The hardware industry measures system memory (RAM) using the binary meaning while magnetic disk storage uses the SI definition. However, many exceptions exist. Labeling of diskettes uses the megabyte to denote 1024×1000 bytes. In the optical disks market, compact discs use MB to mean 1024^2 bytes while DVDs use GB to mean 1000^3 bytes.

Inconsistent use of units

Deviation between powers of 1024 and powers of 1000

Computer storage has become cheaper per unit and thereby larger, by many orders of magnitude since "K" was first used to mean 1024. Because both the SI and "binary" meanings of kilo, mega, etc., are based on powers of 1000 or 1024 rather than simple multiples, the difference between 1M "binary" and 1M "decimal" is proportionally larger than that between 1K "binary" and 1k "decimal," and so on up the scale. The relative difference between the values in the binary and decimal interpretations increases, when using the SI prefixes as the base, from 2.4% for kilo to nearly 21% for the yotta prefix. 
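
The growth of this deviation can be computed directly: the binary-to-decimal ratio for the n-th prefix is 1024^n / 1000^n = 1.024^n. A minimal C sketch that reproduces the percentages in the table below:

#include <stdio.h>
#include <math.h>

int main(void) {
    const char *prefix[] = { "kilo", "mega", "giga",  "tera",
                             "peta", "exa",  "zetta", "yotta" };
    for (int n = 1; n <= 8; n++) {
        /* ratio of 1024^n to 1000^n */
        double ratio = pow(1.024, n);
        printf("%-5s binary/decimal = %.3f (+%.1f%%)\n",
               prefix[n - 1], ratio, (ratio - 1.0) * 100.0);
    }
    return 0;
}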

[Graph: linear-log plot of the percentage difference between decimal and binary interpretations of the unit prefixes versus storage size]

Prefix   Binary ÷ Decimal   Decimal ÷ Binary
kilo     1.024 (+2.4%)      0.9766 (−2.3%)
mega     1.049 (+4.9%)      0.9537 (−4.6%)
giga     1.074 (+7.4%)      0.9313 (−6.9%)
tera     1.100 (+10.0%)     0.9095 (−9.1%)
peta     1.126 (+12.6%)     0.8882 (−11.2%)
exa      1.153 (+15.3%)     0.8674 (−13.3%)
zetta    1.181 (+18.1%)     0.8470 (−15.3%)
yotta    1.209 (+20.9%)     0.8272 (−17.3%)

Consumer confusion

In the early days of computers (roughly, prior to the advent of personal computers) there was little or no consumer confusion because of the technical sophistication of the buyers and their familiarity with the products. In addition, it was common for computer manufacturers to specify their products with capacities in full precision.

In the personal computing era, one source of consumer confusion is the difference in the way many operating systems display hard drive sizes, compared to the way hard drive manufacturers describe them. Hard drives are specified and sold using "GB" and "TB" in their decimal meaning: one billion and one trillion bytes. Many operating systems and other software, however, display hard drive and file sizes using "MB", "GB" or other SI-looking prefixes in their binary sense, just as they do for displays of RAM capacity. For example, many such systems display a hard drive marketed as "160 GB" as "149.05 GB". The earliest known presentation of hard disk drive capacity by an operating system using "KB" or "MB" in a binary sense is 1984; earlier operating systems generally presented the hard disk drive capacity as an exact number of bytes, with no prefix of any sort, for example, in the output of the MS-DOS or PC DOS CHKDSK command.

Legal disputes

The different interpretations of disk size prefixes have led to three significant class action lawsuits against digital storage manufacturers. One case involved flash memory, and the other two involved hard disk drives. Two of these were settled with the manufacturers admitting no wrongdoing but agreeing to clarify the storage capacity of their products on the consumer packaging. Flash memory and hard disk manufacturers now have disclaimers on their packaging and web sites clarifying the formatted capacity of the devices or defining MB as 1 million bytes and 1 GB as 1 billion bytes.

Willem Vroegh v. Eastman Kodak Company

On 20 February 2004, Willem Vroegh filed a lawsuit against Lexar Media, Dane–Elec Memory, Fuji Photo Film USA, Eastman Kodak Company, Kingston Technology Company, Inc., Memorex Products, Inc.; PNY Technologies Inc., SanDisk Corporation, Verbatim Corporation, and Viking Interworks alleging that their descriptions of the capacity of their flash memory cards were false and misleading. 

Vroegh claimed that a 256 MB Flash Memory Device had only 244 MB of accessible memory. "Plaintiffs allege that Defendants marketed the memory capacity of their products by assuming that one megabyte equals one million bytes and one gigabyte equals one billion bytes." The plaintiffs wanted the defendants to use the traditional values of 1024^2 for megabyte and 1024^3 for gigabyte. The plaintiffs acknowledged that the IEC and IEEE standards define a MB as one million bytes but stated that the industry has largely ignored the IEC standards.

The manufacturers agreed to clarify the flash memory card capacity on the packaging and web sites. The consumers could apply for "a discount of ten percent off a future online purchase from Defendants' Online Stores Flash Memory Device".

Orin Safier v. Western Digital Corporation

On 7 July 2005, an action entitled Orin Safier v. Western Digital Corporation, et al. was filed in the Superior Court for the City and County of San Francisco, Case No. CGC-05-442812. The case was subsequently moved to the Northern District of California, Case No. 05-03353 BZ.

Although Western Digital maintained that their usage of units is consistent with "the indisputably correct industry standard for measuring and describing storage capacity", and that they "cannot be expected to reform the software industry", they agreed to settle in March 2006 with 14 June 2006 as the Final Approval hearing date.

Western Digital offered to compensate customers with a free download of backup and recovery software valued at US$30. They also paid $500,000 in fees and expenses to San Francisco lawyers Adam Gutride and Seth Safier, who filed the suit. The settlement called for Western Digital to add a disclaimer to their later packaging and advertising.

Cho v. Seagate Technology (US) Holdings, Inc.

A lawsuit (Cho v. Seagate Technology (US) Holdings, Inc., San Francisco Superior Court, Case No. CGC-06-453195) was filed against Seagate Technology, alleging that Seagate overrepresented the amount of usable storage by 7% on hard drives sold between March 22, 2001 and September 26, 2007. The case was settled without Seagate admitting wrongdoing, but agreeing to supply those purchasers with free backup software or a 5% refund on the cost of the drives.

Unique binary prefixes

Early suggestions

While early computer scientists typically used k to mean 1000, some recognized the convenience that would result from working with multiples of 1024 and the confusion that resulted from using the same prefixes for two different meanings.

Several proposals for unique binary prefixes were made in 1968. Donald Morrison proposed to use the Greek letter kappa (κ) to denote 1024, κ² to denote 1024^2, and so on. (At the time, memory size was small, and only K was in widespread use.) Wallace Givens responded with a proposal to use bK as an abbreviation for 1024 and bK2 or bK² for 1024^2, though he noted that neither the Greek letter nor the lowercase letter b would be easy to reproduce on computer printers of the day. Bruce Alan Martin of Brookhaven National Laboratory further proposed that the prefixes be abandoned altogether, and that the letter B be used for base-2 exponents, similar to E in decimal scientific notation, to create shorthands like 3B20 for 3×2^20, a convention still used on some calculators to present binary floating-point numbers today.

None of these gained much acceptance, and capitalization of the letter K became the de facto standard for indicating a factor of 1024 instead of 1000, although this could not be extended to higher powers. 

As the discrepancy between the two systems increased in the higher-order powers, more proposals for unique prefixes were made. In 1996, Markus Kuhn proposed a system with di prefixes, like the "dikilobyte" (K₂B or K2B). Donald Knuth, who uses decimal notation like 1 MB = 1000 kB, expressed "astonishment" that the IEC proposal was adopted, calling them "funny-sounding" and opining that proponents were assuming "that standards are automatically adopted just because they are there." Knuth proposed that the powers of 1024 be designated as "large kilobytes" and "large megabytes" (abbreviated KKB and MMB, as "doubling the letter connotes both binary-ness and large-ness"). Double prefixes were already abolished from SI, however, having a multiplicative meaning ("MMB" would be equivalent to "TB"), and this proposed usage never gained any traction.

IEC prefixes

The set of binary prefixes that were eventually adopted, now referred to as the "IEC prefixes", were first proposed by the International Union of Pure and Applied Chemistry's (IUPAC) Interdivisional Committee on Nomenclature and Symbols (IDCNS) in 1995. At that time, it was proposed that the terms kilobyte and megabyte be used only for 103 bytes and 106 bytes, respectively. The new prefixes kibi (kilobinary), mebi (megabinary), gibi (gigabinary) and tebi (terabinary) were also proposed at the time, and the proposed symbols for the prefixes were kb, Mb, Gb and Tb respectively, rather than Ki, Mi, Gi and Ti. The proposal was not accepted at the time.

The Institute of Electrical and Electronics Engineers (IEEE) began to collaborate with the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) to find acceptable names for binary prefixes. IEC proposed kibi, mebi, gibi and tebi, with the symbols Ki, Mi, Gi and Ti respectively, in 1996.

The names for the new prefixes are derived from the original SI prefixes combined with the term binary, but contracted, by taking the first two letters of the SI prefix and "bi" from binary. The first letter of each such prefix is therefore identical to the corresponding SI prefixes, except for "K", which is used interchangeably with "k", whereas in SI, only the lower-case k represents 1000. 

The IEEE decided that their standards would use the prefixes kilo, etc. with their metric definitions, but allowed the binary definitions to be used in an interim period as long as such usage was explicitly pointed out on a case-by-case basis.

Adoption by IEC, NIST and ISO

In January 1999, the IEC published the first international standard (IEC 60027-2 Amendment 2) with the new prefixes, extended up to pebi (Pi) and exbi (Ei).

The IEC 60027-2 Amendment 2 also states that the IEC position is the same as that of BIPM (the body that regulates the SI system); the SI prefixes retain their definitions in powers of 1000 and are never used to mean a power of 1024.

In usage, products and concepts typically described using powers of 1024 would continue to be, but with the new IEC prefixes. For example, a memory module of 536870912 bytes (512 × 1048576) would be referred to as 512 MiB or 512 mebibytes instead of 512 MB or 512 megabytes. Conversely, since hard drives have historically been marketed using the SI convention that "giga" means 1000000000, a "500 GB" hard drive would still be labeled as such. According to these recommendations, operating systems and other software would also use binary and SI prefixes in the same way, so the purchaser of a "500 GB" hard drive would find the operating system reporting either "500 GB" or "466 GiB", while 536870912 bytes of RAM would be displayed as "512 MiB". 

The second edition of the standard, published in 2000, defined them only up to exbi, but in 2005, the third edition added prefixes zebi and yobi, thus matching all SI prefixes with binary counterparts.

The harmonized ISO/IEC 80000-13:2008 standard cancels and replaces subclauses 3.8 and 3.9 of IEC 60027-2:2005 (those defining prefixes for binary multiples). The only significant change is the addition of explicit definitions for some quantities. In 2009, the prefixes kibi-, mebi-, etc. were defined by ISO 80000-1 in their own right, independently of the kibibyte, mebibyte, and so on.

The BIPM standard JCGM 200:2012 "International vocabulary of metrology – Basic and general concepts and associated terms (VIM), 3rd edition" lists the IEC binary prefixes and states "SI prefixes refer strictly to powers of 10, and should not be used for powers of 2. For example, 1 kilobit should not be used to represent 1024 bits (2^10 bits), which is 1 kibibit."

Other standards bodies and organizations

The IEC standard binary prefixes are now supported by other standardization bodies and technical organizations.

The United States National Institute of Standards and Technology (NIST) supports the ISO/IEC standards for "Prefixes for binary multiples" and has a web site documenting them, describing and justifying their use. NIST suggests that in English, the first syllable of the name of the binary-multiple prefix should be pronounced in the same way as the first syllable of the name of the corresponding SI prefix, and that the second syllable should be pronounced as bee.[2] NIST has stated the SI prefixes "refer strictly to powers of 10" and that the binary definitions "should not be used" for them.

The microelectronics industry standards body JEDEC describes the IEC prefixes in its online dictionary. The JEDEC standards for semiconductor memory use the customary prefix symbols K, M, G and T in the binary sense.

On 19 March 2005, the IEEE standard IEEE 1541-2002 ("Prefixes for Binary Multiples") was elevated to a full-use standard by the IEEE Standards Association after a two-year trial period. However, as of April 2008, the IEEE Publications division does not require the use of IEC prefixes in its major magazines such as Spectrum or Computer.

The International Bureau of Weights and Measures (BIPM), which maintains the International System of Units (SI), expressly prohibits the use of SI prefixes to denote binary multiples, and recommends the use of the IEC prefixes as an alternative since units of information are not included in SI.

The Society of Automotive Engineers (SAE) prohibits the use of SI prefixes with anything but a power-of-1000 meaning, but does not recommend or otherwise cite the IEC binary prefixes.

The European Committee for Electrotechnical Standardization (CENELEC) adopted the IEC-recommended binary prefixes via the harmonization document HD 60027-2:2003-03. The European Union (EU) has required the use of the IEC binary prefixes since 2007.

Current practice

Most computer hardware uses SI prefixes to state capacity and define other performance parameters such as data rate. Main and cache memories are notable exceptions. 

Capacities of main memory and cache memory are usually expressed with customary binary prefixes. On the other hand, flash memory, like that found in solid state drives, mostly uses SI prefixes to state capacity.

Some operating systems and other software continue to use the customary binary prefixes in displays of memory, disk storage capacity, and file size, but SI prefixes in other areas such as network communication speeds and processor speeds. 

In the following subsections, unless otherwise noted, examples are first given using the common prefixes used in each case, and then followed by interpretation using other notation where appropriate.

Operating systems

Prior to the release of Macintosh System Software (1984), file sizes were typically reported by the operating system without any prefixes. Today, most operating systems report file sizes with prefixes.
  • The Linux kernel uses binary prefixes when booting up. However, many Unix-like system utilities, such as the ls command, use powers of 1024 indicated as K/M (customary binary prefixes) if called with the "-h" option, or give the exact value in bytes otherwise. The GNU versions will also use powers of 10 indicated with k/M if called with the "--si" option.
  • Microsoft Windows reports file sizes and disk device capacities using the customary binary prefixes or, in a "Properties" dialog, using the exact value in bytes.
  • Since Mac OS X Snow Leopard (version 10.6), Apple's Mac OS X reports sizes using SI decimal prefixes (1 MB = 1000000 bytes).

Software

As of February 2010, most software does not distinguish symbols for binary and decimal prefixes. The IEC binary naming convention has been adopted by a few programs, but it is not used universally.

One of the stated goals of the introduction of the IEC prefixes was "to preserve the SI prefixes as unambiguous decimal multipliers." Programs such as fdisk/cfdisk, parted, and apt-get use SI prefixes with their decimal meaning.
Example of the use of IEC binary prefixes in the Linux operating system displaying traffic volume on a network interface in kibibytes (KiB) and mebibytes (MiB), as obtained with the ifconfig utility:

eth0      Link encap:Ethernet  HWaddr 00:14:A0:B0:7A:42
          inet6 addr: 2001:491:890a:1:214:a5ff:febe:7a42/64 Scope:Global
          inet6 addr: fe80::214:a5ff:febe:7a42/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:254804 errors:0 dropped:0 overruns:0 frame:0
          TX packets:756 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:18613795 (17.7 MiB)  TX bytes:45708 (44.6 KiB)

Software that uses standard SI prefixes for powers of 1000, but not IEC binary prefixes for powers of 1024, includes:
  • Mac OS X v10.6 and later for hard drive and file sizes
Software that supports decimal prefixes for powers of 1000 and binary prefixes for powers of 1024 (but does not follow SI or IEC nomenclature for this) includes:
  • 4DOS (uses lowercase letters as decimal and uppercase letters as binary prefixes)
Software that uses IEC binary prefixes for powers of 1024 alongside standard SI prefixes for powers of 1000 also exists.

Computer hardware

Hardware types that use powers-of-1024 multipliers, such as memory, continue to be marketed with customary binary prefixes.

Computer memory

[Image: RAM modules whose 536870912-byte (512×2^20) capacity is stated as "512 MB" on the label]
 
Measurements of most types of electronic memory such as RAM and ROM are given using customary binary prefixes (kilo, mega, and giga). This includes some flash memory, like EEPROMs. For example, a "512-megabyte" memory module is 512×2^20 bytes (512 × 1048576, or 536870912).

JEDEC Solid State Technology Association, the semiconductor engineering standardization body of the Electronic Industries Alliance (EIA), continues to include the customary binary definitions of kilo, mega and giga in their Terms, Definitions, and Letter Symbols document, and uses those definitions in later memory standards.

Many computer programming tasks reference memory in terms of powers of two because of the inherent binary design of current hardware addressing systems. For example, a 16-bit processor register can reference at most 65,536 items (bytes, words, or other objects); this is conveniently expressed as "64K" items. An operating system might map memory as 4096-byte pages, in which case exactly 8192 pages could be allocated within 33554432 bytes of memory: "8K" (8192) pages of "4 kilobytes" (4096 bytes) each within "32 megabytes" (32 MiB) of memory.
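
The page arithmetic above, as a minimal C sketch:

#include <stdio.h>

int main(void) {
    /* 32 "megabytes" in the binary sense, split into 4096-byte pages. */
    unsigned long memory = 32ul * 1024 * 1024;   /* 33554432 bytes */
    unsigned long page   = 4096;
    printf("%lu bytes / %lu bytes per page = %lu pages\n",
           memory, page, memory / page);          /* 8192, i.e. "8K" */
    return 0;
}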

Hard disk drives

All hard disk drive manufacturers state capacity using SI prefixes.

Flash drives

USB flash drives, flash-based memory cards like CompactFlash or Secure Digital, and flash-based SSDs use SI prefixes; for example, a "256 MB" flash card provides at least 256 million bytes (256000000), not 256×1024×1024 (268435456).[44] The flash memory chips inside these devices contain considerably more than the quoted capacities, but much like a traditional hard drive, some space is reserved for internal functions of the flash drive. These include wear leveling, error correction, sparing, and metadata needed by the device's internal firmware.

Floppy drives

Floppy disks have existed in numerous physical and logical formats, and have been sized inconsistently. In part, this is because the end user capacity of a particular disk is a function of the controller hardware, so that the same disk could be formatted to a variety of capacities. In many cases, the media are marketed without any indication of the end user capacity, as for example, DSDD, meaning double-sided double-density. 

The last widely adopted diskette was the 3½-inch high density. This has a formatted capacity of 1474560 bytes or 1440 KB (1440 × 1024 bytes, using "KB" in the customary binary sense). These are marketed as "HD", or "1.44 MB", or both. This usage creates a third definition of "megabyte" as 1000×1024 bytes.

Most operating systems display the capacity using "MB" in the customary binary sense, resulting in a display of "1.4 MB" (1.40625 MiB). Some users have noticed the missing 0.04 MB and both Apple and Microsoft have support bulletins referring to them as 1.4 MB.

The earlier "1200 KB" (1200×1024 bytes) 5¼-inch diskette sold with the IBM PC AT was marketed as "1.2 MB" (1.171875 MiB). The largest 8-inch diskette formats could contain more than a megabyte, and the capacities of those devices were often irregularly specified in megabytes, also without controversy.

Older and smaller diskette formats were usually identified as an accurate number of (binary) KB, for example the Apple Disk II described as "140KB" had a 140×1024-byte capacity, and the original "360KB" double sided, double density disk drive used on the IBM PC had a 360×1024-byte capacity.
In many cases diskette hardware was marketed based on unformatted capacity, and the overhead required to format sectors on the media would reduce the nominal capacity as well (and this overhead typically varied based on the size of the formatted sectors), leading to more irregularities.

Optical discs

The capacities of most optical disc storage media like DVD, Blu-ray Disc, HD DVD and magneto-optical (MO) are given using SI decimal prefixes. A "4.7 GB" DVD has a nominal capacity of about 4.38 GiB. However, CD capacities are always given using customary binary prefixes. Thus a "700-MB" (or "80-minute") CD has a nominal capacity of about 700 MiB (approx 730 MB).

Tape drives and media

Tape drive and media manufacturers use SI decimal prefixes to identify capacity.

Data transmission and clock rates

Certain units are always used with SI decimal prefixes even in computing contexts. Two examples are hertz (Hz), which is used to measure the clock rates of electronic components, and bit/s, used to measure data transmission speed.
  • A 1-GHz processor receives 1000000000 clock ticks per second.
  • A sound file sampled at 44.1 kHz has 44100 samples per second.
  • A 128 kbit/s MP3 stream consumes 128000 bits (16 kilobytes, 15.6 KiB) per second.
  • A 1 Mbit/s Internet connection can transfer 1000000 bits per second (125000 bytes per second ≈ 122 KiB/s, assuming an 8-bit byte and no overhead)
  • A 1 Gbit/s Ethernet connection can transfer 1000000000 bits per second (125000000 bytes per second ≈ 119 MiB/s, assuming an 8-bit byte and no overhead)
  • A 56k modem transfers 56000 bits per second ≈ 6.8 KiB/s.
Bus clock speeds and therefore bandwidths are both quoted using SI decimal prefixes.
  • PC3200 memory on a double data rate bus, transferring 8 bytes per cycle with a clock speed of 200 MHz (200000000 cycles per second), has a bandwidth of 200000000 × 8 × 2 = 3200000000 B/s = 3.2 GB/s (about 3.0 GiB/s); this computation is sketched after the list.
  • A PCI-X bus at 66 MHz (66000000 cycles per second), 64 bits per transfer, has a bandwidth of 66000000 transfers per second × 64 bits per transfer = 4224000000 bit/s, or 528000000 B/s, usually quoted as 528 MB/s (about 503 MiB/s).
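
The PC3200 computation from the list above, as a minimal C sketch showing how the decimal GB/s and binary GiB/s figures diverge:

#include <stdio.h>

int main(void) {
    /* Double data rate, 8 bytes per transfer, 200 MHz clock. */
    double clock_hz      = 200e6;
    double bytes_per_sec = clock_hz * 8 * 2;   /* 3.2e9 B/s */
    printf("PC3200: %.1f GB/s = %.2f GiB/s\n",
           bytes_per_sec / 1e9,                         /* decimal */
           bytes_per_sec / (1024.0 * 1024.0 * 1024.0)); /* binary  */
    return 0;
}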

Use by industry

IEC prefixes are used by Toshiba, IBM, and HP to advertise or describe some of their products. According to one HP brochure, "[t]o reduce confusion, vendors are pursuing one of two remedies: they are changing SI prefixes to the new binary prefixes, or they are recalculating the numbers as powers of ten." The IBM Data Center also uses IEC prefixes to reduce confusion. The IBM Style Guide reads:
To help avoid inaccuracy (especially with the larger prefixes) and potential ambiguity, the International Electrotechnical Commission (IEC) in 2000 adopted a set of prefixes specifically for binary multipliers (See IEC 60027-2). Their use is now supported by the United States National Institute of Standards and Technology (NIST) and incorporated into ISO 80000. They are also required by EU law and in certain contexts in the US. However, most documentation and products in the industry continue to use SI prefixes when referring to binary multipliers. In product documentation, follow the same standard that is used in the product itself (for example, in the interface or firmware). Whether you choose to use IEC prefixes for powers of 2 and SI prefixes for powers of 10, or use SI prefixes for a dual purpose ... be consistent in your usage and explain to the user your adopted system.
