A Medley of Potpourri

Monday, December 16, 2024

Quicksort

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Quicksort


Animated visualization of the quicksort algorithm. The horizontal lines are pivot values.

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

Quicksort is a divide-and-conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. For this reason, it is sometimes called partition-exchange sort. The sub-arrays are then sorted recursively. This can be done in-place, requiring small additional amounts of memory to perform the sorting.

Quicksort is a comparison sort, meaning that it can sort items of any type for which a "less-than" relation (formally, a total order) is defined. It is a comparison-based sort since elements a and b are only swapped in case their relative order has been obtained in the transitive closure of prior comparison-outcomes. Most implementations of quicksort are not stable, meaning that the relative order of equal sort items is not preserved.

Mathematical analysis of quicksort shows that, on average, the algorithm takes $O (n \log n)$ comparisons to sort n items. In the worst case, it makes $O (n^{2})$ comparisons.

History

The quicksort algorithm was developed in 1959 by Tony Hoare while he was a visiting student at Moscow State University. At that time, Hoare was working on a machine translation project for the National Physical Laboratory. As a part of the translation process, he needed to sort the words in Russian sentences before looking them up in a Russian-English dictionary, which was in alphabetical order on magnetic tape. After recognizing that his first idea, insertion sort, would be slow, he came up with a new idea. He wrote the partition part in Mercury Autocode but had trouble dealing with the list of unsorted segments. On return to England, he was asked to write code for Shellsort. Hoare mentioned to his boss that he knew of a faster algorithm and his boss bet a sixpence that he did not. His boss ultimately accepted that he had lost the bet. Hoare published a paper about his algorithm in The Computer Journal Volume 5, Issue 1, 1962, Pages 10–16. Later, Hoare learned about ALGOL and its ability to do recursion that enabled him to publish an improved version of the algorithm in ALGOL in Communications of the Association for Computing Machinery, the premier computer science journal of the time. The ALGOL code is published in Communications of the ACM (CACM), Volume 4, Issue 7 July 1961, pp 321 Algorithm 63: partition and Algorithm 64: Quicksort.

Quicksort gained widespread adoption, appearing, for example, in Unix as the default library sort subroutine. Hence, it lent its name to the C standard library subroutine qsort and in the reference implementation of Java.

Robert Sedgewick's PhD thesis in 1975 is considered a milestone in the study of Quicksort where he resolved many open problems related to the analysis of various pivot selection schemes including Samplesort, adaptive partitioning by Van Emden as well as derivation of expected number of comparisons and swaps. Jon Bentley and Doug McIlroy in 1993 incorporated various improvements for use in programming libraries, including a technique to deal with equal elements and a pivot scheme known as pseudomedian of nine, where a sample of nine elements is divided into groups of three and then the median of the three medians from three groups is chosen. Bentley described another simpler and compact partitioning scheme in his book Programming Pearls that he attributed to Nico Lomuto. Later Bentley wrote that he used Hoare's version for years but never really understood it but Lomuto's version was simple enough to prove correct. Bentley described Quicksort as the "most beautiful code I had ever written" in the same essay. Lomuto's partition scheme was also popularized by the textbook Introduction to Algorithms although it is inferior to Hoare's scheme because it does three times more swaps on average and degrades to $O (n 2)$ runtime when all elements are equal.McIlroy would further produce an AntiQuicksort (aqsort) function in 1998, which consistently drives even his 1993 variant of Quicksort into quadratic behavior by producing adversarial data on-the-fly.

Algorithm

Full example of quicksort on a random set of numbers. The shaded element is the pivot. It is always chosen as the last element of the partition. However, always choosing the last element in the partition as the pivot in this way results in poor performance ( $O (n 2)$ ) on *already sorted* arrays, or arrays of identical elements. Since sub-arrays of sorted / identical elements crop up a lot towards the end of a sorting procedure on a large set, versions of the quicksort algorithm that choose the pivot as the middle element run much more quickly than the algorithm described in this diagram on large sets of numbers.

Quicksort is a type of divide-and-conquer algorithm for sorting an array, based on a partitioning routine; the details of this partitioning can vary somewhat, so that quicksort is really a family of closely related algorithms. Applied to a range of at least two elements, partitioning produces a division into two consecutive non empty sub-ranges, in such a way that no element of the first sub-range is greater than any element of the second sub-range. After applying this partition, quicksort then recursively sorts the sub-ranges, possibly after excluding from them an element at the point of division that is at this point known to be already in its final location. Due to its recursive nature, quicksort (like the partition routine) has to be formulated so as to be callable for a range within a larger array, even if the ultimate goal is to sort a complete array. The steps for in-place quicksort are:

If the range has fewer than two elements, return immediately as there is nothing to do. Possibly for other very short lengths a special-purpose sorting method is applied and the remainder of these steps skipped.
Otherwise pick a value, called a pivot, that occurs in the range (the precise manner of choosing depends on the partition routine, and can involve randomness).
Partition the range: reorder its elements, while determining a point of division, so that all elements with values less than the pivot come before the division, while all elements with values greater than the pivot come after it; elements that are equal to the pivot can go either way. Since at least one instance of the pivot is present, most partition routines ensure that the value that ends up at the point of division is equal to the pivot, and is now in its final position (but termination of quicksort does not depend on this, as long as sub-ranges strictly smaller than the original are produced).
Recursively apply quicksort to the sub-range up to the point of division and to the sub-range after it, possibly excluding from both ranges the element equal to the pivot at the point of division. (If the partition produces a possibly larger sub-range near the boundary where all elements are known to be equal to the pivot, these can be excluded as well.)

The choice of partition routine (including the pivot selection) and other details not entirely specified above can affect the algorithm's performance, possibly to a great extent for specific input arrays. In discussing the efficiency of quicksort, it is therefore necessary to specify these choices first. Here we mention two specific partition methods.

Lomuto partition scheme

This scheme is attributed to Nico Lomuto and popularized by Bentley in his book Programming Pearls and Cormen et al. in their book Introduction to Algorithms. In most formulations this scheme chooses as the pivot the last element in the array. The algorithm maintains index i as it scans the array using another index j such that the elements at lo through i-1 (inclusive) are less than the pivot, and the elements at i through j (inclusive) are equal to or greater than the pivot. As this scheme is more compact and easy to understand, it is frequently used in introductory material, although it is less efficient than Hoare's original scheme e.g., when all elements are equal. The complexity of Quicksort with this scheme degrades to $O (n 2)$ when the array is already in order, due to the partition being the worst possible one. There have been various variants proposed to boost performance including various ways to select the pivot, deal with equal elements, use other sorting algorithms such as insertion sort for small arrays, and so on. In pseudocode, a quicksort that sorts elements at lo through hi (inclusive) of an array $A$ can be expressed as:

// Sorts (a portion of) an array, divides it into partitions, then sorts those
algorithm quicksort(A, lo, hi) is 
  // Ensure indices are in correct order
  if lo >= hi || lo < 0 then 
    return
    
  // Partition array and get the pivot index
  p := partition(A, lo, hi) 
      
  // Sort the two partitions
  quicksort(A, lo, p - 1) // Left side of pivot
  quicksort(A, p + 1, hi) // Right side of pivot

// Divides array into two partitions
algorithm partition(A, lo, hi) is 
  pivot := A[hi] // Choose the last element as the pivot

  // Temporary pivot index
  i := lo

  for j := lo to hi - 1 do 
    // If the current element is less than or equal to the pivot
    if A[j] <= pivot then 
      // Swap the current element with the element at the temporary pivot index
      swap A[i] with A[j]
      // Move the temporary pivot index forward
      i := i + 1

  // Swap the pivot with the last element
  swap A[i] with A[hi]
  return i // the pivot index

Sorting the entire array is accomplished by quicksort(A, 0, length(A) - 1).

Hoare partition scheme

An animated demonstration of Quicksort using Hoare's partition scheme. The red outlines show the positions of the left and right pointers (`i` and `j` respectively), the black outlines show the positions of the sorted elements, and the filled black square shows the value that is being compared to (`pivot`).

The original partition scheme described by Tony Hoare uses two pointers (indices into the range) that start at both ends of the array being partitioned, then move toward each other, until they detect an inversion: a pair of elements, one greater than the pivot at the first pointer, and one less than the pivot at the second pointer; if at this point the first pointer is still before the second, these elements are in the wrong order relative to each other, and they are then exchanged. After this the pointers are moved inwards, and the search for an inversion is repeated; when eventually the pointers cross (the first points after the second), no exchange is performed; a valid partition is found, with the point of division between the crossed pointers (any entries that might be strictly between the crossed pointers are equal to the pivot and can be excluded from both sub-ranges formed). With this formulation it is possible that one sub-range turns out to be the whole original range, which would prevent the algorithm from advancing. Hoare therefore stipulates that at the end, the sub-range containing the pivot element (which still is at its original position) can be decreased in size by excluding that pivot, after (if necessary) exchanging it with the sub-range element closest to the separation; thus, termination of quicksort is ensured.

With respect to this original description, implementations often make minor but important variations. Notably, the scheme as presented below includes elements equal to the pivot among the candidates for an inversion (so "greater than or equal" and "less than or equal" tests are used instead of "greater than" and "less than" respectively; since the formulation uses do...while rather than repeat...until which is actually reflected by the use of strict comparison operators). While there is no reason to exchange elements equal to the pivot, this change allows tests on the pointers themselves to be omitted, which are otherwise needed to ensure they do not run out of range. Indeed, since at least one instance of the pivot value is present in the range, the first advancement of either pointer cannot pass across this instance if an inclusive test is used; once an exchange is performed, these exchanged elements are now both strictly ahead of the pointer that found them, preventing that pointer from running off. (The latter is true independently of the test used, so it would be possible to use the inclusive test only when looking for the first inversion. However, using an inclusive test throughout also ensures that a division near the middle is found when all elements in the range are equal, which gives an important efficiency gain for sorting arrays with many equal elements.) The risk of producing a non-advancing separation is avoided in a different manner than described by Hoare. Such a separation can only result when no inversions are found, with both pointers advancing to the pivot element at the first iteration (they are then considered to have crossed, and no exchange takes place).

In pseudocode,

// Sorts (a portion of) an array, divides it into partitions, then sorts those
algorithm quicksort(A, lo, hi) is 
  if lo >= 0 && hi >= 0 && lo < hi then
    p := partition(A, lo, hi) 
    quicksort(A, lo, p) // Note: the pivot is now included
    quicksort(A, p + 1, hi) 

// Divides array into two partitions
algorithm partition(A, lo, hi) is 
  // Pivot value
  pivot := A[lo] // Choose the first element as the pivot

  // Left index
  i := lo - 1 

  // Right index
  j := hi + 1

  loop forever 
    // Move the left index to the right at least once and while the element at
    // the left index is less than the pivot
    do i := i + 1 while A[i] < pivot
    
    // Move the right index to the left at least once and while the element at
    // the right index is greater than the pivot
    do j := j - 1 while A[j] > pivot

    // If the indices crossed, return
    if i >= j then return j
    
    // Swap the elements at the left and right indices
    swap A[i] with A[j]

The entire array is sorted by quicksort(A, 0, length(A) - 1).

Hoare's scheme is more efficient than Lomuto's partition scheme because it does three times fewer swaps on average. Also, as mentioned, the implementation given creates a balanced partition even when all values are equal, which Lomuto's scheme does not. Like Lomuto's partition scheme, Hoare's partitioning also would cause Quicksort to degrade to $O (n 2)$ for already sorted input, if the pivot was chosen as the first or the last element. With the middle element as the pivot, however, sorted data results with (almost) no swaps in equally sized partitions leading to best case behavior of Quicksort, i.e. $O (n log(n))$ . Like others, Hoare's partitioning doesn't produce a stable sort. In this scheme, the pivot's final location is not necessarily at the index that is returned, as the pivot and elements equal to the pivot can end up anywhere within the partition after a partition step, and may not be sorted until the base case of a partition with a single element is reached via recursion. Therefore, the next two segments that the main algorithm recurs on are (lo..p) (elements ≤ pivot) and (p+1..hi) (elements ≥ pivot) as opposed to (lo..p-1) and (p+1..hi) as in Lomuto's scheme.

Subsequent recursions (expansion on previous paragraph)

Let's expand a little bit on the next two segments that the main algorithm recurs on. Because we are using strict comparators (>, <) in the "do...while" loops to prevent ourselves from running out of range, there's a chance that the pivot itself gets swapped with other elements in the partition function. Therefore, the index returned in the partition function isn't necessarily where the actual pivot is. Consider the example of [5, 2, 3, 1, 0], following the scheme, after the first partition the array becomes [0, 2, 1, 3, 5], the "index" returned is 2, which is the number 1, when the real pivot, the one we chose to start the partition with was the number 3. With this example, we see how it is necessary to include the returned index of the partition function in our subsequent recursions. As a result, we are presented with the choices of either recursing on (lo..p) and (p+1..hi), or (lo..p-1) and (p..hi). Which of the two options we choose depends on which index (i or j) we return in the partition function when the indices cross, and how we choose our pivot in the partition function (floor v.s. ceiling).

Let's first examine the choice of recursing on (lo..p) and (p+1..hi), with the example of sorting an array where multiple identical elements exist [0, 0]. If index i (the "latter" index) is returned after indices cross in the partition function, the index 1 would be returned after the first partition. The subsequent recursion on (lo..p)would be on (0, 1), which corresponds to the exact same array [0, 0]. A non-advancing separation that causes infinite recursion is produced. It is therefore obvious that when recursing on (lo..p) and (p+1..hi), because the left half of the recursion includes the returned index, it is the partition function's job to exclude the "tail" in non-advancing scenarios. Which is to say, index j (the "former" index when indices cross) should be returned instead of i. Going with a similar logic, when considering the example of an already sorted array [0, 1], the choice of pivot needs to be "floor" to ensure that the pointers stop on the "former" instead of the "latter" (with "ceiling" as the pivot, the index 1 would be returned and included in (lo..p) causing infinite recursion). It is for the exact same reason why choice of the last element as pivot must be avoided.

The choice of recursing on (lo..p-1) and (p..hi) follows the exact same logic as above. Because the right half of the recursion includes the returned index, it is the partition function's job to exclude the "head" in non-advancing scenarios. The index i (the "latter" index after the indices cross) in the partition function needs to be returned, and "ceiling" needs to be chosen as the pivot. The two nuances are clear, again, when considering the examples of sorting an array where multiple identical elements exist ([0, 0]), and an already sorted array [0, 1] respectively. It is noteworthy that with version of recursion, for the same reason, choice of the first element as pivot must be avoided.

Implementation issues

Choice of pivot

In the very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by Sedgewick). This "median-of-three" rule counters the case of sorted (or reverse-sorted) input, and gives a better estimate of the optimal pivot (the true median) than selecting any single element, when no information about the ordering of the input is known.

Median-of-three code snippet for Lomuto partition:

mid := ⌊(lo + hi) / 2⌋
if A[mid] < A[lo]
    swap A[lo] with A[mid]
if A[hi] < A[lo]
    swap A[lo] with A[hi]
if A[mid] < A[hi]
    swap A[mid] with A[hi]
pivot := A[hi]

It puts a median into A[hi] first, then that new value of A[hi] is used for a pivot, as in a basic algorithm presented above.

Specifically, the expected number of comparisons needed to sort $n$ elements (see § Analysis of randomized quicksort) with random pivot selection is $1.386 n log n$ . Median-of-three pivoting brings this down to $C n, 2 \approx 1.188 n log n$ , at the expense of a three-percent increase in the expected number of swaps. An even stronger pivoting rule, for larger arrays, is to pick the ninther, a recursive median-of-three (Mo3), defined as

ninther(a) = median(Mo3(first ⁠ 1 / 3 ⁠ of a), Mo3(middle ⁠ 1 / 3 ⁠ of a), Mo3(final ⁠ 1 / 3 ⁠ of a))

Selecting a pivot element is also complicated by the existence of integer overflow. If the boundary indices of the subarray being sorted are sufficiently large, the naïve expression for the middle index, $(lo + hi)/2$ , will cause overflow and provide an invalid pivot index. This can be overcome by using, for example, $lo + (hi - lo)/2$ to index the middle element, at the cost of more complex arithmetic. Similar issues arise in some other methods of selecting the pivot element.

Repeated elements

With a partitioning algorithm such as the Lomuto partition scheme described above (even one that chooses good pivot values), quicksort exhibits poor performance for inputs that contain many repeated elements. The problem is clearly apparent when all the input elements are equal: at each recursion, the left partition is empty (no input values are less than the pivot), and the right partition has only decreased by one element (the pivot is removed). Consequently, the Lomuto partition scheme takes quadratic time to sort an array of equal values. However, with a partitioning algorithm such as the Hoare partition scheme, repeated elements generally results in better partitioning, and although needless swaps of elements equal to the pivot may occur, the running time generally decreases as the number of repeated elements increases (with memory cache reducing the swap overhead). In the case where all elements are equal, Hoare partition scheme needlessly swaps elements, but the partitioning itself is best case, as noted in the Hoare partition section above.

To solve the Lomuto partition scheme problem (sometimes called the Dutch national flag problem), an alternative linear-time partition routine can be used that separates the values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot. (Bentley and McIlroy call this a "fat partition" and it was already implemented in the qsort of Version 7 Unix.) The values equal to the pivot are already sorted, so only the less-than and greater-than partitions need to be recursively sorted. In pseudocode, the quicksort algorithm becomes:

// Sorts (a portion of) an array, divides it into partitions, then sorts those
algorithm quicksort(A, lo, hi) is
  if lo >= 0 && lo < hi then
    lt, gt := partition(A, lo, hi) // Multiple return values
    quicksort(A, lo, lt - 1)
    quicksort(A, gt + 1, hi)

// Divides array into three partitions
algorithm partition(A, lo, hi) is
  // Pivot value
  pivot := A[(lo + hi) / 2] // Choose the middle element as the pivot (integer division)
  
  // Lesser, equal and greater index
  lt := lo
  eq := lo
  gt := hi
  
  // Iterate and compare all elements with the pivot
  while eq <= gt do
    if A[eq] < pivot then
      // Swap the elements at the equal and lesser indices
      swap A[eq] with A[lt]
      // Increase lesser index
      lt := lt + 1
      // Increase equal index
      eq := eq + 1
    else if A[eq] > pivot then
      // Swap the elements at the equal and greater indices
      swap A[eq] with A[gt]
      // Decrease greater index
      gt := gt - 1
    else // if A[eq] = pivot then
      // Increase equal index
      eq := eq + 1
  
  // Return lesser and greater indices
  return lt, gt

The partition algorithm returns indices to the first ('leftmost') and to the last ('rightmost') item of the middle partition. Every other item of the partition is equal to the pivot and is therefore sorted. Consequently, the items of the partition need not be included in the recursive calls to quicksort.

The best case for the algorithm now occurs when all elements are equal (or are chosen from a small set of $k ≪ n$ elements). In the case of all equal elements, the modified quicksort will perform only two recursive calls on empty subarrays and thus finish in linear time (assuming the partition subroutine takes no longer than linear time).

Optimizations

Other important optimizations, also suggested by Sedgewick and widely used in practice, are:

To make sure at most $O (log n)$ space is used, recur first into the smaller side of the partition, then use a tail call to recur into the other, or update the parameters to no longer include the now sorted smaller side, and iterate to sort the larger side.
When the number of elements is below some threshold (perhaps ten elements), switch to a non-recursive sorting algorithm such as insertion sort that performs fewer swaps, comparisons or other operations on such small arrays. The ideal 'threshold' will vary based on the details of the specific implementation.
An older variant of the previous optimization: when the number of elements is less than the threshold $k$ , simply stop; then after the whole array has been processed, perform insertion sort on it. Stopping the recursion early leaves the array $k$ -sorted, meaning that each element is at most $k$ positions away from its final sorted position. In this case, insertion sort takes $O (kn)$ time to finish the sort, which is linear if $k$ is a constant. Compared to the "many small sorts" optimization, this version may execute fewer instructions, but it makes suboptimal use of the cache memories in modern computers.

Parallelization

Quicksort's divide-and-conquer formulation makes it amenable to parallelization using task parallelism. The partitioning step is accomplished through the use of a parallel prefix sum algorithm to compute an index for each array element in its section of the partitioned array. Given an array of size $n$ , the partitioning step performs $O(n)$ work in $O (log n)$ time and requires $O(n)$ additional scratch space. After the array has been partitioned, the two partitions can be sorted recursively in parallel. Assuming an ideal choice of pivots, parallel quicksort sorts an array of size $n$ in $O(n log n)$ work in $O(log 2 n)$ time using $O(n)$ additional space.

Quicksort has some disadvantages when compared to alternative sorting algorithms, like merge sort, which complicate its efficient parallelization. The depth of quicksort's divide-and-conquer tree directly impacts the algorithm's scalability, and this depth is highly dependent on the algorithm's choice of pivot. Additionally, it is difficult to parallelize the partitioning step efficiently in-place. The use of scratch space simplifies the partitioning step, but increases the algorithm's memory footprint and constant overheads.

Other more sophisticated parallel sorting algorithms can achieve even better time bounds. For example, in 1991 David M W Powers described a parallelized quicksort (and a related radix sort) that can operate in $O (log n)$ time on a CRCW (concurrent read and concurrent write) PRAM (parallel random-access machine) with $n$ processors by performing partitioning implicitly.

Formal analysis

Worst-case analysis

The most unbalanced partition occurs when one of the sublists returned by the partitioning routine is of size $n - 1$ . This may occur if the pivot happens to be the smallest or largest element in the list, or in some implementations (e.g., the Lomuto partition scheme as described above) when all the elements are equal.

If this happens repeatedly in every partition, then each recursive call processes a list of size one less than the previous list. Consequently, we can make $n - 1$ nested calls before we reach a list of size 1. This means that the call tree is a linear chain of $n - 1$ nested calls. The $i$ th call does $O (n - i)$ work to do the partition, and $\sum_{i = 0}^{n} (n - i) = O (n^{2})$ , so in that case quicksort takes $O (n 2)$ time.

Best-case analysis

In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, we can make only $log 2 n$ nested calls before we reach a list of size 1. This means that the depth of the call tree is $log 2 n$ . But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only $O (n)$ time all together (each call has some constant overhead, but since there are only $O (n)$ calls at each level, this is subsumed in the $O (n)$ factor). The result is that the algorithm uses only $O (n log n)$ time.

Average-case analysis

To sort an array of $n$ distinct elements, quicksort takes $O (n log n)$ time in expectation, averaged over all $n!$ permutations of $n$ elements with equal probability. Alternatively, if the algorithm selects the pivot uniformly at random from the input array, the same analysis can be used to bound the expected running time for any input sequence; the expectation is then taken over the random choices made by the algorithm (Cormen et al., Introduction to Algorithms, Section 7.3).

We list here three common proofs to this claim providing different insights into quicksort's workings.

Using percentiles

If each pivot has rank somewhere in the middle 50 percent, that is, between the 25th percentile and the 75th percentile, then it splits the elements with at least 25% and at most 75% on each side. If we could consistently choose such pivots, we would only have to split the list at most $\log_{4 / 3} n$ times before reaching lists of size 1, yielding an $O (n log n)$ algorithm.

When the input is a random permutation, the pivot has a random rank, and so it is not guaranteed to be in the middle 50 percent. However, when we start from a random permutation, in each recursive call the pivot has a random rank in its list, and so it is in the middle 50 percent about half the time. That is good enough. Imagine that a coin is flipped: heads means that the rank of the pivot is in the middle 50 percent, tail means that it isn't. Now imagine that the coin is flipped over and over until it gets $k$ heads. Although this could take a long time, on average only $2 k$ flips are required, and the chance that the coin won't get $k$ heads after $100 k$ flips is highly improbable (this can be made rigorous using Chernoff bounds). By the same argument, Quicksort's recursion will terminate on average at a call depth of only $2 \log_{4 / 3} n$ . But if its average call depth is $O (log n)$ , and each level of the call tree processes at most $n$ elements, the total amount of work done on average is the product, $O (n log n)$ . The algorithm does not have to verify that the pivot is in the middle half—if we hit it any constant fraction of the times, that is enough for the desired complexity.

Using recurrences

An alternative approach is to set up a recurrence relation for the $T (n)$ factor, the time needed to sort a list of size $n$ . In the most unbalanced case, a single quicksort call involves $O (n)$ work plus two recursive calls on lists of size $0$ and $n -1$ , so the recurrence relation is

T (n) = O (n) + T (0) + T (n - 1) = O (n) + T (n - 1) .

This is the same relation as for insertion sort and selection sort, and it solves to worst case $T (n) = O (n 2)$ .

In the most balanced case, a single quicksort call involves $O (n)$ work plus two recursive calls on lists of size $n /2$ , so the recurrence relation is

T (n) = O (n) + 2 T (\frac{n}{2}) .

The master theorem for divide-and-conquer recurrences tells us that $T (n) = O (n log n)$ .

The outline of a formal proof of the $O (n log n)$ expected time complexity follows. Assume that there are no duplicates as duplicates could be handled with linear time pre- and post-processing, or considered cases easier than the analyzed. When the input is a random permutation, the rank of the pivot is uniform random from 0 to $n - 1$ . Then the resulting parts of the partition have sizes $i$ and $n - i - 1$ , and i is uniform random from 0 to $n - 1$ . So, averaging over all possible splits and noting that the number of comparisons for the partition is $n - 1$ , the average number of comparisons over all permutations of the input sequence can be estimated accurately by solving the recurrence relation:

C (n) = n - 1 + \frac{1}{n} \sum_{i = 0}^{n - 1} (C (i) + C (n - i - 1)) = n - 1 + \frac{2}{n} \sum_{i = 0}^{n - 1} C (i)

n C (n) = n (n - 1) + 2 \sum_{i = 0}^{n - 1} C (i)

n C (n) - (n - 1) C (n - 1) = n (n - 1) - (n - 1) (n - 2) + 2 C (n - 1)

n C (n) = (n + 1) C (n - 1) + 2 n - 2

\begin{aligned} \frac{C (n)}{n + 1} & = \frac{C (n - 1)}{n} + \frac{2}{n + 1} - \frac{2}{n (n + 1)} \leq \frac{C (n - 1)}{n} + \frac{2}{n + 1} \\ = \frac{C (n - 2)}{n - 1} + \frac{2}{n} - \frac{2}{(n - 1) n} + \frac{2}{n + 1} \leq \frac{C (n - 2)}{n - 1} + \frac{2}{n} + \frac{2}{n + 1} \\ ⋮ \\ = \frac{C (1)}{2} + \sum_{i = 2}^{n} \frac{2}{i + 1} \leq 2 \sum_{i = 1}^{n - 1} \frac{1}{i} \approx 2 \int_{1}^{n} \frac{1}{x} d x = 2 \ln n \end{aligned}

Solving the recurrence gives $C (n) = 2 n ln n \approx 1.39 n log 2 n$ .

This means that, on average, quicksort performs only about 39% worse than in its best case. In this sense, it is closer to the best case than the worst case. A comparison sort cannot use less than $log 2 (n!)$ comparisons on average to sort $n$ items (as explained in the article Comparison sort) and in case of large $n$ , Stirling's approximation yields $log 2 (n!) \approx n (log 2 n - log 2 e)$ , so quicksort is not much worse than an ideal comparison sort. This fast average runtime is another reason for quicksort's practical dominance over other sorting algorithms.

Using a binary search tree

The following binary search tree (BST) corresponds to each execution of quicksort: the initial pivot is the root node; the pivot of the left half is the root of the left subtree, the pivot of the right half is the root of the right subtree, and so on. The number of comparisons of the execution of quicksort equals the number of comparisons during the construction of the BST by a sequence of insertions. So, the average number of comparisons for randomized quicksort equals the average cost of constructing a BST when the values inserted $(x_{1}, x_{2}, \dots, x_{n})$ form a random permutation.

Consider a BST created by insertion of a sequence $(x_{1}, x_{2}, \dots, x_{n})$ of values forming a random permutation. Let $C$ denote the cost of creation of the BST. We have $C = \sum_{i} \sum_{j < i} c_{i, j}$ , where $c_{i, j}$ is a binary random variable expressing whether during the insertion of $x_{i}$ there was a comparison to $x_{j}$ .

By linearity of expectation, the expected value $E [C]$ of $C$ is $E [C] = \sum_{i} \sum_{j < i} Pr (c_{i, j})$ .

Fix $i$ and $j < i$ . The values $x_{1}, x_{2}, \dots, x_{j}$ , once sorted, define $j +1$ intervals. The core structural observation is that $x_{i}$ is compared to $x_{j}$ in the algorithm if and only if $x_{i}$ falls inside one of the two intervals adjacent to $x_{j}$ .

Observe that since $(x_{1}, x_{2}, \dots, x_{n})$ is a random permutation, $(x_{1}, x_{2}, \dots, x_{j}, x_{i})$ is also a random permutation, so the probability that $x_{i}$ is adjacent to $x_{j}$ is exactly $\frac{2}{j + 1}$ .

We end with a short calculation:

E [C] = \sum_{i} \sum_{j < i} \frac{2}{j + 1} = O (\sum_{i} \log i) = O (n \log n) .

Space complexity

The space used by quicksort depends on the version used.

The in-place version of quicksort has a space complexity of $O (log n)$ , even in the worst case, when it is carefully implemented using the following strategies.

In-place partitioning is used. This unstable partition requires $O (1)$ space.
After partitioning, the partition with the fewest elements is (recursively) sorted first, requiring at most $O (log n)$ space. Then the other partition is sorted using tail recursion or iteration, which doesn't add to the call stack. This idea, as discussed above, was described by R. Sedgewick, and keeps the stack depth bounded by $O (log n)$ .

Quicksort with in-place and unstable partitioning uses only constant additional space before making any recursive call. Quicksort must store a constant amount of information for each nested recursive call. Since the best case makes at most $O (log n)$ nested recursive calls, it uses $O (log n)$ space. However, without Sedgewick's trick to limit the recursive calls, in the worst case quicksort could make $O (n)$ nested recursive calls and need $O (n)$ auxiliary space.

From a bit complexity viewpoint, variables such as lo and hi do not use constant space; it takes $O (log n)$ bits to index into a list of $n$ items. Because there are such variables in every stack frame, quicksort using Sedgewick's trick requires $O ((log n) 2)$ bits of space. This space requirement isn't too terrible, though, since if the list contained distinct elements, it would need at least $O (n log n)$ bits of space.

Another, less common, not-in-place, version of quicksort uses $O (n)$ space for working storage and can implement a stable sort. The working storage allows the input array to be easily partitioned in a stable manner and then copied back to the input array for successive recursive calls. Sedgewick's optimization is still appropriate.

Relation to other algorithms

Quicksort is a space-optimized version of the binary tree sort. Instead of inserting items sequentially into an explicit tree, quicksort organizes them concurrently into a tree that is implied by the recursive calls. The algorithms make exactly the same comparisons, but in a different order. An often desirable property of a sorting algorithm is stability – that is the order of elements that compare equal is not changed, allowing controlling order of multikey tables (e.g. directory or folder listings) in a natural way. This property is hard to maintain for in-place quicksort (that uses only constant additional space for pointers and buffers, and $O (log n)$ additional space for the management of explicit or implicit recursion). For variant quicksorts involving extra memory due to representations using pointers (e.g. lists or trees) or files (effectively lists), it is trivial to maintain stability. The more complex, or disk-bound, data structures tend to increase time cost, in general making increasing use of virtual memory or disk.

The most direct competitor of quicksort is heapsort. Heapsort has the advantages of simplicity, and a worst case run time of $O (n log n)$ , but heapsort's average running time is usually considered slower than in-place quicksort, primarily due to its worse locality of reference.^[27] This result is debatable; some publications indicate the opposite. The main disadvantage of quicksort is the implementation complexity required to avoid bad pivot choices and the resultant $O (n 2)$ performance. Introsort is a variant of quicksort which solves this problem by switching to heapsort when a bad case is detected. Major programming languages, such as C++ (in the GNU and LLVM implementations), use introsort.

Quicksort also competes with merge sort, another $O (n log n)$ sorting algorithm. Merge sort's main advantages are that it is a stable sort and has excellent worst-case performance. The main disadvantage of merge sort is that it is an out-of-place algorithm, so when operating on arrays, efficient implementations require $O (n)$ auxiliary space (vs. $O (log n)$ for quicksort with in-place partitioning and tail recursion, or $O (1)$ for heapsort).

Merge sort works very well on linked lists, requiring only a small, constant amount of auxiliary storage. Although quicksort can be implemented as a stable sort using linked lists, there is no reason to; it will often suffer from poor pivot choices without random access, and is essentially always inferior to merge sort. Merge sort is also the algorithm of choice for external sorting of very large data sets stored on slow-to-access media such as disk storage or network-attached storage.

Bucket sort with two buckets is very similar to quicksort; the pivot in this case is effectively the value in the middle of the value range, which does well on average for uniformly distributed inputs.

Selection-based pivoting

A selection algorithm chooses the $k$ th smallest of a list of numbers; this is an easier problem in general than sorting. One simple but effective selection algorithm works nearly in the same manner as quicksort, and is accordingly known as quickselect. The difference is that instead of making recursive calls on both sublists, it only makes a single tail-recursive call on the sublist that contains the desired element. This change lowers the average complexity to linear or $O (n)$ time, which is optimal for selection, but the selection algorithm is still $O (n 2)$ in the worst case.

A variant of quickselect, the median of medians algorithm, chooses pivots more carefully, ensuring that the pivots are near the middle of the data (between the 30th and 70th percentiles), and thus has guaranteed linear time – $O (n)$ . This same pivot strategy can be used to construct a variant of quicksort (median of medians quicksort) with $O (n log n)$ time. However, the overhead of choosing the pivot is significant, so this is generally not used in practice.

More abstractly, given an $O (n)$ selection algorithm, one can use it to find the ideal pivot (the median) at every step of quicksort and thus produce a sorting algorithm with $O (n log n)$ running time. Practical implementations of this variant are considerably slower on average, but they are of theoretical interest because they show an optimal selection algorithm can yield an optimal sorting algorithm.

Variants

Multi-pivot quicksort

Instead of partitioning into two subarrays using a single pivot, multi-pivot quicksort (also multiquicksort) partitions its input into some $s$ number of subarrays using $s - 1$ pivots. While the dual-pivot case ( $s = 3$ ) was considered by Sedgewick and others already in the mid-1970s, the resulting algorithms were not faster in practice than the "classical" quicksort. A 1999 assessment of a multiquicksort with a variable number of pivots, tuned to make efficient use of processor caches, found it to increase the instruction count by some 20%, but simulation results suggested that it would be more efficient on very large inputs. A version of dual-pivot quicksort developed by Yaroslavskiy in 2009 turned out to be fast enough to warrant implementation in Java 7, as the standard algorithm to sort arrays of primitives (sorting arrays of objects is done using Timsort). The performance benefit of this algorithm was subsequently found to be mostly related to cache performance, and experimental results indicate that the three-pivot variant may perform even better on modern machines.

External quicksort

For disk files, an external sort based on partitioning similar to quicksort is possible. It is slower than external merge sort, but doesn't require extra disk space. 4 buffers are used, 2 for input, 2 for output. Let $N =$ number of records in the file, $B =$ the number of records per buffer, and $M = N / B =$ the number of buffer segments in the file. Data is read (and written) from both ends of the file inwards. Let $X$ represent the segments that start at the beginning of the file and $Y$ represent segments that start at the end of the file. Data is read into the $X$ and $Y$ read buffers. A pivot record is chosen and the records in the $X$ and $Y$ buffers other than the pivot record are copied to the $X$ write buffer in ascending order and $Y$ write buffer in descending order based comparison with the pivot record. Once either $X$ or $Y$ buffer is filled, it is written to the file and the next $X$ or $Y$ buffer is read from the file. The process continues until all segments are read and one write buffer remains. If that buffer is an $X$ write buffer, the pivot record is appended to it and the $X$ buffer written. If that buffer is a $Y$ write buffer, the pivot record is prepended to the $Y$ buffer and the $Y$ buffer written. This constitutes one partition step of the file, and the file is now composed of two subfiles. The start and end positions of each subfile are pushed/popped to a stand-alone stack or the main stack via recursion. To limit stack space to $O (\log_{2} (n))$ , the smaller subfile is processed first. For a stand-alone stack, push the larger subfile parameters onto the stack, iterate on the smaller subfile. For recursion, recurse on the smaller subfile first, then iterate to handle the larger subfile. Once a sub-file is less than or equal to 4 B records, the subfile is sorted in-place via quicksort and written. That subfile is now sorted and in place in the file. The process is continued until all sub-files are sorted and in place. The average number of passes on the file is approximately $\frac{1 + \ln (N + 1)}{(4 B)}$ , but worst case pattern is $N$ passes (equivalent to $O (n^{2})$ for worst case internal sort).

Three-way radix quicksort

This algorithm is a combination of radix sort and quicksort. Pick an element from the array (the pivot) and consider the first character (key) of the string (multikey). Partition the remaining elements into three sets: those whose corresponding character is less than, equal to, and greater than the pivot's character. Recursively sort the "less than" and "greater than" partitions on the same character. Recursively sort the "equal to" partition by the next character (key). Given we sort using bytes or words of length $W$ bits, the best case is $O (KN)$ and the worst case $O (2 K N)$ or at least $O (N 2)$ as for standard quicksort, given for unique keys $N <2 K$ , and $K$ is a hidden constant in all standard comparison sort algorithms including quicksort. This is a kind of three-way quicksort in which the middle partition represents a (trivially) sorted subarray of elements that are exactly equal to the pivot.

Quick radix sort

Also developed by Powers as an $O (K)$ parallel PRAM algorithm. This is again a combination of radix sort and quicksort but the quicksort left/right partition decision is made on successive bits of the key, and is thus $O (KN)$ for $N$ $K$ -bit keys. All comparison sort algorithms implicitly assume the transdichotomous model with $K$ in $Θ (log N)$ , as if $K$ is smaller we can sort in $O (N)$ time using a hash table or integer sorting. If $K ≫ log N$ but elements are unique within $O (log N)$ bits, the remaining bits will not be looked at by either quicksort or quick radix sort. Failing that, all comparison sorting algorithms will also have the same overhead of looking through $O (K)$ relatively useless bits but quick radix sort will avoid the worst case $O (N 2)$ behaviours of standard quicksort and radix quicksort, and will be faster even in the best case of those comparison algorithms under these conditions of $uniqueprefix(K) ≫ log N$ . See Powers for further discussion of the hidden overheads in comparison, radix and parallel sorting.

BlockQuicksort

In any comparison-based sorting algorithm, minimizing the number of comparisons requires maximizing the amount of information gained from each comparison, meaning that the comparison results are unpredictable. This causes frequent branch mispredictions, limiting performance. BlockQuicksort rearranges the computations of quicksort to convert unpredictable branches to data dependencies. When partitioning, the input is divided into moderate-sized blocks (which fit easily into the data cache), and two arrays are filled with the positions of elements to swap. (To avoid conditional branches, the position is unconditionally stored at the end of the array, and the index of the end is incremented if a swap is needed.) A second pass exchanges the elements at the positions indicated in the arrays. Both loops have only one conditional branch, a test for termination, which is usually taken.

The BlockQuicksort technique is incorporated into LLVM's C++ STL implementation, libcxx, providing a 50% improvement on random integer sequences. Pattern-defeating quicksort (pdqsort), a version of introsort, also incorporates this technique.

Partial and incremental quicksort

Several variants of quicksort exist that separate the $k$ smallest or largest elements from the rest of the input.

Generalization

Richard Cole and David C. Kandathil, in 2004, discovered a one-parameter family of sorting algorithms, called partition sorts, which on average (with all input orderings equally likely) perform at most

n \log n + O (n)

comparisons (close to the information theoretic lower bound) and

Θ (n \log n)

operations; at worst they perform

Θ (n \log^{2} n)

comparisons (and also operations); these are in-place, requiring only additional

O (\log n)

space. Practical efficiency and smaller variance in performance were demonstrated against optimised quicksorts (of Sedgewick and Bentley-McIlroy).

Sunday, December 15, 2024

Galactic year

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Galactic_year

Approximate orbit of the Sun (yellow circle) around the Galactic Center

The galactic year, also known as a cosmic year, is the duration of time required for the Sun to orbit once around the center of the Milky Way Galaxy. One galactic year is approximately 225 million Earth years. The Solar System is traveling at an average speed of 230 km/s (828,000 km/h) or 143 mi/s (514,000 mph) within its trajectory around the Galactic Center, a speed at which an object could circumnavigate the Earth's equator in 2 minutes and 54 seconds; that speed corresponds to approximately 1/1300 of the speed of light.

The galactic year provides a conveniently usable unit for depicting cosmic and geological time periods together. By contrast, a "billion-year" scale does not allow for useful discrimination between geologic events, and a "million-year" scale requires some rather large numbers.

Timeline of the universe and Earth's history in galactic years

The following list assumes that 1 galactic year is 225 million years.

Time		Event
Galactic years (gal)	Millions of years (Ma)	Event
Past (years ago)
About 61.32 gal		Big Bang
About 54 gal		Birth of the Milky Way
20.44 gal		Birth of the Sun
17–18 gal	3937 Ma	Oceans appear on Earth
16.889 gal	3800 Ma	Life begins on Earth
15.555 gal	3500 Ma	Prokaryotes appear
12 gal	2700 Ma	Bacteria appear
10 gal	2250 Ma	Eukaryian period first appearance of eukaryotes Stable continents appear
6.8 gal	1530 Ma	Multicellular organisms appear
2.4 gal	540 Ma	Cambrian explosion occurs
2 gal	500 Ma	The first brain structure appears in worms
1.11 gal	250 Ma	Permian–Triassic extinction event
0.2933 gal		Cretaceous–Paleogene extinction event
0.0013 gal		Emergence of anatomically modern humans
Future (years from now)
0.15 gal		Mean time between impacts of asteroidal bodies in the order of magnitude of the K/Pg impactor has elapsed.
1 gal		All the continents on Earth may fuse into a supercontinent. Three potential arrangements of this configuration have been dubbed Amasia, Novopangaea, and Pangaea Proxima.
2–3 gal		Tidal acceleration moves the Moon far enough from Earth that total solar eclipses are no longer possible
4 gal		Carbon dioxide levels fall to the point at which C4 photosynthesis is no longer possible. Multicellular life dies out
15 gal		Surface conditions on Earth are comparable to those on Venus today
22 gal		The Milky Way and Andromeda Galaxy begin to collide
25 gal		Sun ejects a planetary nebula, leaving behind a white dwarf
30 gal		The Milky Way and Andromeda complete their merger into a giant elliptical galaxy called Milkomeda or Milkdromeda
500 gal		The Universe's expansion causes all galaxies beyond the Milky Way's Local Group to disappear beyond the cosmic light horizon, removing them from the observable universe
2000 gal		Local Group of 47 galaxies coalesces into a single large galaxy

Visualization of the orbit of the Sun (yellow dot and white curve) around the Galactic Center (GC) in the last galactic year. The red dots correspond to the positions of the stars studied by the European Southern Observatory in a monitoring program.

Luminiferous aether

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Luminiferous_aether

Luminiferous aether or ether (luminiferous meaning 'light-bearing') was the postulated medium for the propagation of light. It was invoked to explain the ability of the apparently wave-based light to propagate through empty space (a vacuum), something that waves should not be able to do. The assumption of a spatial plenum (space completely filled with matter) of luminiferous aether, rather than a spatial vacuum, provided the theoretical medium that was required by wave theories of light.

The aether hypothesis was the topic of considerable debate throughout its history, as it required the existence of an invisible and infinite material with no interaction with physical objects. As the nature of light was explored, especially in the 19th century, the physical qualities required of an aether became increasingly contradictory. By the late 19th century, the existence of the aether was being questioned, although there was no physical theory to replace it.

The negative outcome of the Michelson–Morley experiment (1887) suggested that the aether did not exist, a finding that was confirmed in subsequent experiments through the 1920s. This led to considerable theoretical work to explain the propagation of light without an aether. A major breakthrough was the special theory of relativity, which could explain why the experiment failed to see aether, but was more broadly interpreted to suggest that it was not needed. The Michelson–Morley experiment, along with the blackbody radiator and photoelectric effect, was a key experiment in the development of modern physics, which includes both relativity and quantum theory, the latter of which explains the particle-like nature of light.

The history of light and aether

Particles vs. waves

In the 17th century, Robert Boyle was a proponent of an aether hypothesis. According to Boyle, the aether consists of subtle particles, one sort of which explains the absence of vacuum and the mechanical interactions between bodies, and the other sort of which explains phenomena such as magnetism (and possibly gravity) that are, otherwise, inexplicable on the basis of purely mechanical interactions of macroscopic bodies, "though in the ether of the ancients there was nothing taken notice of but a diffused and very subtle substance; yet we are at present content to allow that there is always in the air a swarm of streams moving in a determinate course between the north pole and the south".

Christiaan Huygens's Treatise on Light (1690) hypothesized that light is a wave propagating through an aether. He and Isaac Newton could only envision light waves as being longitudinal, propagating like sound and other mechanical waves in fluids. However, longitudinal waves necessarily have only one form for a given propagation direction, rather than two polarizations like a transverse wave. Thus, longitudinal waves can not explain birefringence, in which two polarizations of light are refracted differently by a crystal. In addition, Newton rejected light as waves in a medium because such a medium would have to extend everywhere in space, and would thereby "disturb and retard the Motions of those great Bodies" (the planets and comets) and thus "as it [light's medium] is of no use, and hinders the Operation of Nature, and makes her languish, so there is no evidence for its Existence, and therefore it ought to be rejected".

Isaac Newton contended that light is made up of numerous small particles. This can explain such features as light's ability to travel in straight lines and reflect off surfaces. Newton imagined light particles as non-spherical "corpuscles", with different "sides" that give rise to birefringence. But the particle theory of light can not satisfactorily explain refraction and diffraction. To explain refraction, Newton's Third Book of Opticks (1st ed. 1704, 4th ed. 1730) postulated an "aethereal medium" transmitting vibrations faster than light, by which light, when overtaken, is put into "Fits of easy Reflexion and easy Transmission", which caused refraction and diffraction. Newton believed that these vibrations were related to heat radiation:

Is not the Heat of the warm Room convey'd through the vacuum by the Vibrations of a much subtiler Medium than Air, which after the Air was drawn out remained in the Vacuum? And is not this Medium the same with that Medium by which Light is refracted and reflected, and by whose Vibrations Light communicates Heat to Bodies, and is put into Fits of easy Reflexion and easy Transmission?

In contrast to the modern understanding that heat radiation and light are both electromagnetic radiation, Newton viewed heat and light as two different phenomena. He believed heat vibrations to be excited "when a Ray of Light falls upon the Surface of any pellucid Body". He wrote, "I do not know what this Aether is", but that if it consists of particles then they must be

exceedingly smaller than those of Air, or even than those of Light: The exceeding smallness of its Particles may contribute to the greatness of the force by which those Particles may recede from one another, and thereby make that Medium exceedingly more rare and elastic than Air, and by consequence exceedingly less able to resist the motions of Projectiles, and exceedingly more able to press upon gross Bodies, by endeavoring to expand itself.

Bradley suggests particles

In 1720, James Bradley carried out a series of experiments attempting to measure stellar parallax by taking measurements of stars at different times of the year. As the Earth moves around the Sun, the apparent angle to a given distant spot changes. By measuring those angles the distance to the star can be calculated based on the known orbital circumference of the Earth around the Sun. He failed to detect any parallax, thereby placing a lower limit on the distance to stars.

During these experiments, Bradley also discovered a related effect; the apparent positions of the stars did change over the year, but not as expected. Instead of the apparent angle being maximized when the Earth was at either end of its orbit with respect to the star, the angle was maximized when the Earth was at its fastest sideways velocity with respect to the star. This effect is now known as stellar aberration.

Bradley explained this effect in the context of Newton's corpuscular theory of light, by showing that the aberration angle was given by simple vector addition of the Earth's orbital velocity and the velocity of the corpuscles of light, just as vertically falling raindrops strike a moving object at an angle. Knowing the Earth's velocity and the aberration angle enabled him to estimate the speed of light.

Explaining stellar aberration in the context of an aether-based theory of light was regarded as more problematic. As the aberration relied on relative velocities, and the measured velocity was dependent on the motion of the Earth, the aether had to be remaining stationary with respect to the star as the Earth moved through it. This meant that the Earth could travel through the aether, a physical medium, with no apparent effect – precisely the problem that led Newton to reject a wave model in the first place.

Wave-theory triumphs

A century later, Thomas Young and Augustin-Jean Fresnel revived the wave theory of light when they pointed out that light could be a transverse wave rather than a longitudinal wave; the polarization of a transverse wave (like Newton's "sides" of light) could explain birefringence, and in the wake of a series of experiments on diffraction the particle model of Newton was finally abandoned. Physicists assumed, moreover, that, like mechanical waves, light waves required a medium for propagation, and thus required Huygens's idea of an aether "gas" permeating all space.

However, a transverse wave apparently required the propagating medium to behave as a solid, as opposed to a fluid. The idea of a solid that did not interact with other matter seemed a bit odd, and Augustin-Louis Cauchy suggested that perhaps there was some sort of "dragging", or "entrainment", but this made the aberration measurements difficult to understand. He also suggested that the absence of longitudinal waves suggested that the aether had negative compressibility. George Green pointed out that such a fluid would be unstable. George Gabriel Stokes became a champion of the entrainment interpretation, developing a model in which the aether might, like pine pitch, be dilatant (fluid at slow speeds and rigid at fast speeds). Thus the Earth could move through it fairly freely, but it would be rigid enough to support light.

Electromagnetism

In 1856, Wilhelm Eduard Weber and Rudolf Kohlrausch measured the numerical value of the ratio of the electrostatic unit of charge to the electromagnetic unit of charge. They found that the ratio between the electrostatic unit of charge and the electromagnetic unit of charge is the speed of light c. The following year, Gustav Kirchhoff wrote a paper in which he showed that the speed of a signal along an electric wire was equal to the speed of light. These are the first recorded historical links between the speed of light and electromagnetic phenomena.

James Clerk Maxwell began working on Michael Faraday's lines of force. In his 1861 paper On Physical Lines of Force he modelled these magnetic lines of force using a sea of molecular vortices that he considered to be partly made of aether and partly made of ordinary matter. He derived expressions for the dielectric constant and the magnetic permeability in terms of the transverse elasticity and the density of this elastic medium. He then equated the ratio of the dielectric constant to the magnetic permeability with a suitably adapted version of Weber and Kohlrausch's result of 1856, and he substituted this result into Newton's equation for the speed of sound. On obtaining a value that was close to the speed of light as measured by Hippolyte Fizeau, Maxwell concluded that light consists in undulations of the same medium that is the cause of electric and magnetic phenomena.

Maxwell had, however, expressed some uncertainties surrounding the precise nature of his molecular vortices and so he began to embark on a purely dynamical approach to the problem. He wrote another paper in 1864, entitled "A Dynamical Theory of the Electromagnetic Field", in which the details of the luminiferous medium were less explicit. Although Maxwell did not explicitly mention the sea of molecular vortices, his derivation of Ampère's circuital law was carried over from the 1861 paper and he used a dynamical approach involving rotational motion within the electromagnetic field which he likened to the action of flywheels. Using this approach to justify the electromotive force equation (the precursor of the Lorentz force equation), he derived a wave equation from a set of eight equations which appeared in the paper and which included the electromotive force equation and Ampère's circuital law. Maxwell once again used the experimental results of Weber and Kohlrausch to show that this wave equation represented an electromagnetic wave that propagates at the speed of light, hence supporting the view that light is a form of electromagnetic radiation.

In 1887–1889, Heinrich Hertz experimentally demonstrated the electric magnetic waves are identical to light waves. This unification of electromagnetic wave and optics indicated that there was a single luminiferous aether instead of many different kinds of aether media.

The apparent need for a propagation medium for such Hertzian waves (later called radio waves) can be seen by the fact that they consist of orthogonal electric (E) and magnetic (B or H) waves. The E waves consist of undulating dipolar electric fields, and all such dipoles appeared to require separated and opposite electric charges. Electric charge is an inextricable property of matter, so it appeared that some form of matter was required to provide the alternating current that would seem to have to exist at any point along the propagation path of the wave. Propagation of waves in a true vacuum would imply the existence of electric fields without associated electric charge, or of electric charge without associated matter. Albeit compatible with Maxwell's equations, electromagnetic induction of electric fields could not be demonstrated in vacuum, because all methods of detecting electric fields required electrically charged matter.

In addition, Maxwell's equations required that all electromagnetic waves in vacuum propagate at a fixed speed, c. As this can only occur in one reference frame in Newtonian physics (see Galilean relativity), the aether was hypothesized as the absolute and unique frame of reference in which Maxwell's equations hold. That is, the aether must be "still" universally, otherwise c would vary along with any variations that might occur in its supportive medium. Maxwell himself proposed several mechanical models of aether based on wheels and gears, and George Francis FitzGerald even constructed a working model of one of them. These models had to agree with the fact that the electromagnetic waves are transverse but never longitudinal.

Problems

By this point the mechanical qualities of the aether had become more and more magical: it had to be a fluid in order to fill space, but one that was millions of times more rigid than steel in order to support the high frequencies of light waves. It also had to be massless and without viscosity, otherwise it would visibly affect the orbits of planets. Additionally it appeared it had to be completely transparent, non-dispersive, incompressible, and continuous at a very small scale. Maxwell wrote in Encyclopædia Britannica:

Aethers were invented for the planets to swim in, to constitute electric atmospheres and magnetic effluvia, to convey sensations from one part of our bodies to another, and so on, until all space had been filled three or four times over with aethers. ... The only aether which has survived is that which was invented by Huygens to explain the propagation of light.

By the early 20th century, aether theory was in trouble. A series of increasingly complex experiments had been carried out in the late 19th century to try to detect the motion of the Earth through the aether, and had failed to do so. A range of proposed aether-dragging theories could explain the null result but these were more complex, and tended to use arbitrary-looking coefficients and physical assumptions. Lorentz and FitzGerald offered within the framework of Lorentz ether theory a more elegant solution to how the motion of an absolute aether could be undetectable (length contraction), but if their equations were correct, the new special theory of relativity (1905) could generate the same mathematics without referring to an aether at all. Aether fell to Occam's Razor.

Relative motion between the Earth and aether

Aether drag

The two most important models, which were aimed to describe the relative motion of the Earth and aether, were Augustin-Jean Fresnel's (1818) model of the (nearly) stationary aether including a partial aether drag determined by Fresnel's dragging coefficient, and George Gabriel Stokes' (1844) model of complete aether drag. The latter theory was not considered as correct, since it was not compatible with the aberration of light, and the auxiliary hypotheses developed to explain this problem were not convincing. Also, subsequent experiments as the Sagnac effect (1913) also showed that this model is untenable. However, the most important experiment supporting Fresnel's theory was Fizeau's 1851 experimental confirmation of Fresnel's 1818 prediction that a medium with refractive index n moving with a velocity v would increase the speed of light travelling through the medium in the same direction as v from c/n to:

\frac{c}{n} + (1 - \frac{1}{n^{2}}) v .

That is, movement adds only a fraction of the medium's velocity to the light (predicted by Fresnel in order to make Snell's law work in all frames of reference, consistent with stellar aberration). This was initially interpreted to mean that the medium drags the aether along, with a portion of the medium's velocity, but that understanding became very problematic after Wilhelm Veltmann demonstrated that the index n in Fresnel's formula depended upon the wavelength of light, so that the aether could not be moving at a wavelength-independent speed. This implied that there must be a separate aether for each of the infinitely many frequencies.

Negative aether-drift experiments

The key difficulty with Fresnel's aether hypothesis arose from the juxtaposition of the two well-established theories of Newtonian dynamics and Maxwell's electromagnetism. Under a Galilean transformation the equations of Newtonian dynamics are invariant, whereas those of electromagnetism are not. Basically this means that while physics should remain the same in non-accelerated experiments, light would not follow the same rules because it is travelling in the universal "aether frame". Some effect caused by this difference should be detectable.

A simple example concerns the model on which aether was originally built: sound. The speed of propagation for mechanical waves, the speed of sound, is defined by the mechanical properties of the medium. Sound travels 4.3 times faster in water than in air. This explains why a person hearing an explosion underwater and quickly surfacing can hear it again as the slower travelling sound arrives through the air. Similarly, a traveller on an airliner can still carry on a conversation with another traveller because the sound of words is travelling along with the air inside the aircraft. This effect is basic to all Newtonian dynamics, which says that everything from sound to the trajectory of a thrown baseball should all remain the same in the aircraft flying (at least at a constant speed) as if still sitting on the ground. This is the basis of the Galilean transformation, and the concept of frame of reference.

But the same was not supposed to be true for light, since Maxwell's mathematics demanded a single universal speed for the propagation of light, based, not on local conditions, but on two measured properties, the permittivity and permeability of free space, that were assumed to be the same throughout the universe. If these numbers did change, there should be noticeable effects in the sky; stars in different directions would have different colours, for instance.

Thus at any point there should be one special coordinate system, "at rest relative to the aether". Maxwell noted in the late 1870s that detecting motion relative to this aether should be easy enough—light travelling along with the motion of the Earth would have a different speed than light travelling backward, as they would both be moving against the unmoving aether. Even if the aether had an overall universal flow, changes in position during the day/night cycle, or over the span of seasons, should allow the drift to be detected.

First-order experiments

Although the aether is almost stationary according to Fresnel, his theory predicts a positive outcome of aether drift experiments only to second order in $v / c$ because Fresnel's dragging coefficient would cause a negative outcome of all optical experiments capable of measuring effects to first order in $v / c$ . This was confirmed by the following first-order experiments, all of which gave negative results. The following list is based on the description of Wilhelm Wien (1898), with changes and additional experiments according to the descriptions of Edmund Taylor Whittaker (1910) and Jakob Laub (1910):

The experiment of François Arago (1810), to confirm whether refraction, and thus the aberration of light, is influenced by Earth's motion. Similar experiments were conducted by George Biddell Airy (1871) by means of a telescope filled with water, and Éleuthère Mascart (1872).
The experiment of Fizeau (1860), to find whether the rotation of the polarization plane through glass columns is changed by Earth's motion. He obtained a positive result, but Lorentz could show that the results have been contradictory. DeWitt Bristol Brace (1905) and Strasser (1907) repeated the experiment with improved accuracy, and obtained negative results.
The experiment of Martin Hoek (1868). This experiment is a more precise variation of the Fizeau experiment (1851). Two light rays were sent in opposite directions – one of them traverses a path filled with resting water, the other one follows a path through air. In agreement with Fresnel's dragging coefficient, he obtained a negative result.
The experiment of Wilhelm Klinkerfues (1870) investigated whether an influence of Earth's motion on the absorption line of sodium exists. He obtained a positive result, but this was shown to be an experimental error, because a repetition of the experiment by Haga (1901) gave a negative result.
The experiment of Ketteler (1872), in which two rays of an interferometer were sent in opposite directions through two mutually inclined tubes filled with water. No change of the interference fringes occurred. Later, Mascart (1872) showed that the interference fringes of polarized light in calcite remained uninfluenced as well.
The experiment of Éleuthère Mascart (1872) to find a change of rotation of the polarization plane in quartz. No change of rotation was found when the light rays had the direction of Earth's motion and then the opposite direction. Lord Rayleigh conducted similar experiments with improved accuracy, and obtained a negative result as well.

Besides those optical experiments, also electrodynamic first-order experiments were conducted, which should have led to positive results according to Fresnel. However, Hendrik Antoon Lorentz (1895) modified Fresnel's theory and showed that those experiments can be explained by a stationary aether as well:

The experiment of Wilhelm Röntgen (1888), to find whether a charged capacitor produces magnetic forces due to Earth's motion.
The experiment of Theodor des Coudres (1889), to find whether the inductive effect of two wire rolls upon a third one is influenced by the direction of Earth's motion. Lorentz showed that this effect is cancelled to first order by the electrostatic charge (produced by Earth's motion) upon the conductors.
The experiment of Königsberger (1905). The plates of a capacitor are located in the field of a strong electromagnet. Due to Earth's motion, the plates should have become charged. No such effect was observed.
The experiment of Frederick Thomas Trouton (1902). A capacitor was brought parallel to Earth's motion, and it was assumed that momentum is produced when the capacitor is charged. The negative result can be explained by Lorentz's theory, according to which the electromagnetic momentum compensates the momentum due to Earth's motion. Lorentz could also show, that the sensitivity of the apparatus was much too low to observe such an effect.

Second-order experiments

While the first-order experiments could be explained by a modified stationary aether, more precise second-order experiments were expected to give positive results. However, no such results could be found.

The famous Michelson–Morley experiment compared the source light with itself after being sent in different directions and looked for changes in phase in a manner that could be measured with extremely high accuracy. In this experiment, their goal was to determine the velocity of the Earth through the aether. The publication of their result in 1887, the null result, was the first clear demonstration that something was seriously wrong with the aether hypothesis (Michelson's first experiment in 1881 was not entirely conclusive). In this case the MM experiment yielded a shift of the fringing pattern of about 0.01 of a fringe, corresponding to a small velocity. However, it was incompatible with the expected aether wind effect due to the Earth's (seasonally varying) velocity which would have required a shift of 0.4 of a fringe, and the error was small enough that the value may have indeed been zero. Therefore, the null hypothesis, the hypothesis that there was no aether wind, could not be rejected. More modern experiments have since reduced the possible value to a number very close to zero, about 10⁻¹⁷.

It is obvious from what has gone before that it would be hopeless to attempt to solve the question of the motion of the solar system by observations of optical phenomena at the surface of the earth.
— A. Michelson and E. Morley. "On the Relative Motion of the Earth and the Luminiferous Æther". Philosophical Magazine S. 5. Vol. 24. No. 151. December 1887.

A series of experiments using similar but increasingly sophisticated apparatuses all returned the null result as well. Conceptually different experiments that also attempted to detect the motion of the aether were the Trouton–Noble experiment (1903), whose objective was to detect torsion effects caused by electrostatic fields, and the experiments of Rayleigh and Brace (1902, 1904), to detect double refraction in various media. However, all of them obtained a null result, like Michelson–Morley (MM) previously did.

These "aether-wind" experiments led to a flurry of efforts to "save" aether by assigning to it ever more complex properties, and only a few scientists, like Emil Cohn or Alfred Bucherer, considered the possibility of the abandonment of the aether hypothesis. Of particular interest was the possibility of "aether entrainment" or "aether drag", which would lower the magnitude of the measurement, perhaps enough to explain the results of the Michelson–Morley experiment. However, as noted earlier, aether dragging already had problems of its own, notably aberration. In addition, the interference experiments of Lodge (1893, 1897) and Ludwig Zehnder (1895), aimed to show whether the aether is dragged by various, rotating masses, showed no aether drag. A more precise measurement was made in the Hammar experiment (1935), which ran a complete MM experiment with one of the "legs" placed between two massive lead blocks. If the aether was dragged by mass then this experiment would have been able to detect the drag caused by the lead, but again the null result was achieved. The theory was again modified, this time to suggest that the entrainment only worked for very large masses or those masses with large magnetic fields. This too was shown to be incorrect by the Michelson–Gale–Pearson experiment, which detected the Sagnac effect due to Earth's rotation (see Aether drag hypothesis).

Another completely different attempt to save "absolute" aether was made in the Lorentz–FitzGerald contraction hypothesis, which posited that everything was affected by travel through the aether. In this theory, the reason that the Michelson–Morley experiment "failed" was that the apparatus contracted in length in the direction of travel. That is, the light was being affected in the "natural" manner by its travel through the aether as predicted, but so was the apparatus itself, cancelling out any difference when measured. FitzGerald had inferred this hypothesis from a paper by Oliver Heaviside. Without referral to an aether, this physical interpretation of relativistic effects was shared by Kennedy and Thorndike in 1932 as they concluded that the interferometer's arm contracts and also the frequency of its light source "very nearly" varies in the way required by relativity.

Similarly, the Sagnac effect, observed by G. Sagnac in 1913, was immediately seen to be fully consistent with special relativity. In fact, the Michelson–Gale–Pearson experiment in 1925 was proposed specifically as a test to confirm the relativity theory, although it was also recognized that such tests, which merely measure absolute rotation, are also consistent with non-relativistic theories.

During the 1920s, the experiments pioneered by Michelson were repeated by Dayton Miller, who publicly proclaimed positive results on several occasions, although they were not large enough to be consistent with any known aether theory. However, other researchers were unable to duplicate Miller's claimed results. Over the years the experimental accuracy of such measurements has been raised by many orders of magnitude, and no trace of any violations of Lorentz invariance has been seen. (A later re-analysis of Miller's results concluded that he had underestimated the variations due to temperature.)

Since the Miller experiment and its unclear results there have been many more experimental attempts to detect the aether. Many experimenters have claimed positive results. These results have not gained much attention from mainstream science, since they contradict a large quantity of high-precision measurements, all the results of which were consistent with special relativity.

Lorentz aether theory

Between 1892 and 1904, Hendrik Lorentz developed an electron–aether theory, in which he avoided making assumptions about the aether. In his model the aether is completely motionless, and by that he meant that it could not be set in motion in the neighborhood of ponderable matter. Contrary to earlier electron models, the electromagnetic field of the aether appears as a mediator between the electrons, and changes in this field cannot propagate faster than the speed of light. A fundamental concept of Lorentz's theory in 1895 was the "theorem of corresponding states" for terms of order v/c. This theorem states that an observer moving relative to the aether makes the same observations as a resting observer, after a suitable change of variables. Lorentz noticed that it was necessary to change the space-time variables when changing frames and introduced concepts like physical length contraction (1892) to explain the Michelson–Morley experiment, and the mathematical concept of local time (1895) to explain the aberration of light and the Fizeau experiment. This resulted in the formulation of the so-called Lorentz transformation by Joseph Larmor (1897, 1900) and Lorentz (1899, 1904), whereby (it was noted by Larmor) the complete formulation of local time is accompanied by some sort of time dilation of electrons moving in the aether. As Lorentz later noted (1921, 1928), he considered the time indicated by clocks resting in the aether as "true" time, while local time was seen by him as a heuristic working hypothesis and a mathematical artifice. Therefore, Lorentz's theorem is seen by modern authors as being a mathematical transformation from a "real" system resting in the aether into a "fictitious" system in motion.

The work of Lorentz was mathematically perfected by Henri Poincaré, who formulated on many occasions the Principle of Relativity and tried to harmonize it with electrodynamics. He declared simultaneity only a convenient convention which depends on the speed of light, whereby the constancy of the speed of light would be a useful postulate for making the laws of nature as simple as possible. In 1900 and 1904 he physically interpreted Lorentz's local time as the result of clock synchronization by light signals. In June and July 1905 he declared the relativity principle a general law of nature, including gravitation. He corrected some mistakes of Lorentz and proved the Lorentz covariance of the electromagnetic equations. However, he used the notion of an aether as a perfectly undetectable medium and distinguished between apparent and real time, so most historians of science argue that he failed to invent special relativity.

End of aether

Special relativity

Aether theory was dealt another blow when the Galilean transformation and Newtonian dynamics were both modified by Albert Einstein's special theory of relativity, giving the mathematics of Lorentzian electrodynamics a new, "non-aether" context. Unlike most major shifts in scientific thought, special relativity was adopted by the scientific community remarkably quickly, consistent with Einstein's later comment that the laws of physics described by the Special Theory were "ripe for discovery" in 1905. Max Planck's early advocacy of the special theory, along with the elegant formulation given to it by Hermann Minkowski, contributed much to the rapid acceptance of special relativity among working scientists.

Einstein based his theory on Lorentz's earlier work. Instead of suggesting that the mechanical properties of objects changed with their constant-velocity motion through an undetectable aether, Einstein proposed to deduce the characteristics that any successful theory must possess in order to be consistent with the most basic and firmly established principles, independent of the existence of a hypothetical aether. He found that the Lorentz transformation must transcend its connection with Maxwell's equations, and must represent the fundamental relations between the space and time coordinates of inertial frames of reference. In this way he demonstrated that the laws of physics remained invariant as they had with the Galilean transformation, but that light was now invariant as well.

With the development of the special theory of relativity, the need to account for a single universal frame of reference had disappeared – and acceptance of the 19th-century theory of a luminiferous aether disappeared with it. For Einstein, the Lorentz transformation implied a conceptual change: that the concept of position in space or time was not absolute, but could differ depending on the observer's location and velocity.

Moreover, in another paper published the same month in 1905, Einstein made several observations on a then-thorny problem, the photoelectric effect. In this work he demonstrated that light can be considered as particles that have a "wave-like nature". Particles obviously do not need a medium to travel, and thus, neither did light. This was the first step that would lead to the full development of quantum mechanics, in which the wave-like nature and the particle-like nature of light are both considered as valid descriptions of light. A summary of Einstein's thinking about the aether hypothesis, relativity and light quanta may be found in his 1909 (originally German) lecture "The Development of Our Views on the Composition and Essence of Radiation".

Lorentz on his side continued to use the aether hypothesis. In his lectures of around 1911, he pointed out that what "the theory of relativity has to say ... can be carried out independently of what one thinks of the aether and the time". He commented that "whether there is an aether or not, electromagnetic fields certainly exist, and so also does the energy of the electrical oscillations" so that, "if we do not like the name of 'aether', we must use another word as a peg to hang all these things upon". He concluded that "one cannot deny the bearer of these concepts a certain substantiality".

Nevertheless, in 1920, Einstein gave an address at Leiden University in which he commented "More careful reflection teaches us however, that the special theory of relativity does not compel us to deny ether. We may assume the existence of an ether; only we must give up ascribing a definite state of motion to it, i.e. we must by abstraction take from it the last mechanical characteristic which Lorentz had still left it. We shall see later that this point of view, the conceivability of which I shall at once endeavour to make more intelligible by a somewhat halting comparison, is justified by the results of the general theory of relativity". He concluded his address by saying that "according to the general theory of relativity space is endowed with physical qualities; in this sense, therefore, there exists an ether. According to the general theory of relativity space without ether is unthinkable."

Other models

In later years there have been a few individuals who advocated a neo-Lorentzian approach to physics, which is Lorentzian in the sense of positing an absolute true state of rest that is undetectable and which plays no role in the predictions of the theory. (No violations of Lorentz covariance have ever been detected, despite strenuous efforts.) Hence these theories resemble the 19th century aether theories in name only. For example, the founder of quantum field theory, Paul Dirac, stated in 1951 in an article in Nature, titled "Is there an Aether?" that "we are rather forced to have an aether".However, Dirac never formulated a complete theory, and so his speculations found no acceptance by the scientific community.

Einstein's views on the aether

When Einstein was still a student in the Zurich Polytechnic in 1900, he was very interested in the idea of aether. His initial proposal of research thesis was to do an experiment to measure how fast the Earth was moving through the aether. "The velocity of a wave is proportional to the square root of the elastic forces which cause [its] propagation, and inversely proportional to the mass of the aether moved by these forces."

In 1916, after Einstein completed his foundational work on general relativity, Lorentz wrote a letter to him in which he speculated that within general relativity the aether was re-introduced. In his response Einstein wrote that one can actually speak about a "new aether", but one may not speak of motion in relation to that aether. This was further elaborated by Einstein in some semi-popular articles (1918, 1920, 1924, 1930).

In 1918, Einstein publicly alluded to that new definition for the first time. Then, in the early 1920s, in a lecture which he was invited to give at Lorentz's university in Leiden, Einstein sought to reconcile the theory of relativity with Lorentzian aether. In this lecture Einstein stressed that special relativity took away the last mechanical property of the aether: immobility. However, he continued that special relativity does not necessarily rule out the aether, because the latter can be used to give physical reality to acceleration and rotation. This concept was fully elaborated within general relativity, in which physical properties (which are partially determined by matter) are attributed to space, but no substance or state of motion can be attributed to that "aether" (by which he meant curved space-time).

In another paper of 1924, named "Concerning the Aether", Einstein argued that Newton's absolute space, in which acceleration is absolute, is the "Aether of Mechanics". And within the electromagnetic theory of Maxwell and Lorentz one can speak of the "Aether of Electrodynamics", in which the aether possesses an absolute state of motion. As regards special relativity, also in this theory acceleration is absolute as in Newton's mechanics. However, the difference from the electromagnetic aether of Maxwell and Lorentz lies in the fact that "because it was no longer possible to speak, in any absolute sense, of simultaneous states at different locations in the aether, the aether became, as it were, four-dimensional since there was no objective way of ordering its states by time alone". Now the "aether of special relativity" is still "absolute", because matter is affected by the properties of the aether, but the aether is not affected by the presence of matter. This asymmetry was solved within general relativity. Einstein explained that the "aether of general relativity" is not absolute, because matter is influenced by the aether, just as matter influences the structure of the aether.

The only similarity of this relativistic aether concept with the classical aether models lies in the presence of physical properties in space, which can be identified through geodesics. As historians such as John Stachel argue, Einstein's views on the "new aether" are not in conflict with his abandonment of the aether in 1905. As Einstein himself pointed out, no "substance" and no state of motion can be attributed to that new aether. Einstein's use of the word "aether" found little support in the scientific community, and played no role in the continuing development of modern physics.

USB dead drop

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/USB_dead_drop

A USB dead drop is a USB mass storage device installed in a public space. For example, a USB flash drive might be mounted in an outdoor brick wall and fixed in place with fast concrete. Members of the public are implicitly invited to find files, or leave files, on a dead drop by directly plugging their laptop into the wall-mounted USB stick in order to transfer data. (It is also possible to use smartphones and tablets for this purpose, by utilizing a USB on-the-go cable.) The dead drops can therefore be regarded as an anonymous, offline, peer-to-peer file sharing network. In practice, USB dead drops are more often used for social or artistic reasons, rather than practical ones.

Background and history

The Dead Drops project was conceived by Berlin-based conceptual artist Aram Bartholl, a member of New York's F.A.T. Lab art and technology collective. The first USB dead drop network of five devices was installed by Bartholl in October 2010 in Brooklyn, New York City. The name comes from the dead drop method of communication used in espionage. An unrelated system called "deadSwap", in which participants use an SMS gateway to coordinate passing USB memory sticks on to one another, was begun in Germany in 2009.

Each dead drop is typically installed without any data except two files: deaddrops-manifesto.txt, and a readme.txt file explaining the project. Although typically found in urban areas embedded in concrete or brick, installation of USB dead drops in trees and other organic structures in natural settings have also been observed. Wireless dead drops such as the 2011 PirateBox, where the user connects to a Wi-Fi hotspot with network attached storage rather than physically connecting to a USB device, have also been created.

Comparison to other types of data transfer

Some reasons to use USB dead drops are practical. They permit P2P file sharing without needing any internet or cellular connection, sharing files with another person secretly/anonymously, and they do not track any IP address or similar personally identifying information. Other benefits are more social or artistic in nature: USB dead drops are an opportunity to practice what Telecomix describes as datalove and can be seen as a way to promote off-grid data networks. Motivation for using USB dead drops has been likened to what drives people involved in geocaching, which has existed for longer and is somewhat similar in that often a set of GPS coordinates is used to locate a particular USB dead drop. Specifically, USB dead drops give the user "the thrill of discovery" in seeking out the location of the dead drop and when examining the data it contains. A QR-Code dead drop including the data in the QR code image or pointing to a decentralized storage repository would be an alternative and less risky option compared to a physical USB dead drop as long as users avoid IP address disclosure.

Potential drawbacks

Dead drops are USB-based devices, which must be connected to an upstream computer system, e.g. laptop or smartphone or similar. The act of making such a connection, to a device which is not necessarily trusted, inherently poses certain threats:

Malware: anyone can intentionally or unintentionally infect an attached computer with malware such as a trojan horse, keylogger, or unwanted firmware. This risk can be mitigated by using antivirus software, or by using a throwaway device for the act of data transfer.
Booby trap: a fake dead drop or USB Killer might be rigged to electrically damage any equipment connected to it, and/or constitute a health and safety hazard for users. This risk can be mitigated by using a USB galvanic isolation adapter, which allows data exchange while physically decoupling the two circuits. Wifi-based dead drops are not vulnerable to this threat.
Mugging: because a USB dead drop is normally in a public or quasi-public location, users may be physically attacked when they attempt to use the system, for a variety of reasons including theft of the user's devices.

Drawbacks to system infrastructure

Publicly and privately available USB dead drops give anyone (with physical access) the ability to save and transfer data anonymously and free of charge. These features are an advantages over the internet and the cellular network, which are at best quasi-anonymous and low-cost (there is always some fee associated although in certain scenarios such as government-subsidized or employer-subsidized or public-library-subsidized network access the end user may experience no direct costs). However, offline networks are vulnerable to various types of threats and disadvantages, relative to online ones:

One device at a time: Users cannot plug in to a USB dead drop if someone else is already plugged in
Removal of stored data: anyone with physical access can erase all of the data held within the USB dead drop (via file deletion or disk formatting), or make it unusable by encrypting the data of the whole drive and hiding the key (see also the related topic of ransomware).
Removal of the entire device: thieves can steal the USB drive itself.
Disclosure: anyone can disclose the location of a (formerly) private dead drop, by shadowing people that use it, and publishing coordinates in a public fashion. This impacts the anonymous nature of USB dead drops, since known drops can be filmed or otherwise observed.
Vandalism of the dead drop by physical destruction: anyone with physical access can destroy the dead drop, e.g. with pliers, a hammer, high voltage from a static field, high temperature from a blowtorch, or other physical force. Likelihood of vandalism or extraction is reduced by sealing the USB dead drop in a hole deeper than its length but this requires legitimate users to connect with a USB extender cable. Sometimes the installation of the dead-drop can itself be vandalism of the building; i.e. when a building owner destroys a dead drop placed without permission.
Exposure to the elements: dead drops tend to be exposed to rain and snow, which will presumably reduce their service lifetime.
Demolition or damage during maintenance: certain dead drop locations are limited to the lifespan of public structures. When a dead drop is embedded in a brick wall, the drop can be destroyed when the wall is destroyed. When a drop is embedded in a concrete sidewalk, the drop can be destroyed by sidewalk-related construction and maintenance. Sometimes dead drops are damaged when walls are repainted.

Prevalence

As of 2013, there were approximately 1000 USB dead drops (plus six known wifi-based dead drops). Most known USB dead drops are in the United States and Europe. As of 2016, overall dead drop infrastructure was estimated as being more than 10 terabytes of storage capacity, with the majority still located in the United States and Europe, but with growing numbers installed in the Asia-Pacific region, South America, and Africa.

Search This Blog