A Medley of Potpourri

Tuesday, December 17, 2024

Generative adversarial network

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Generative_adversarial_network

A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative artificial intelligence. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.

Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.

The core idea of a GAN is "indirect" training through the discriminator, another neural network that judges how "realistic" its input seems and that is itself updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.

GANs are similar to mimicry in evolutionary biology, with an evolutionary arms race between the two networks.

Definition

Mathematical

The original GAN is defined as the following game:

Each probability space $(Ω, μ_{ref})$ defines a GAN game.
There are 2 players: generator and discriminator.
The generator's strategy set is $P (Ω)$ , the set of all probability measures $μ_{G}$ on $Ω$ .
The discriminator's strategy set is the set of Markov kernels $μ_{D} : Ω \to P [0, 1]$ , where $P [0, 1]$ is the set of probability measures on $[0, 1]$ .
The GAN game is a zero-sum game, with objective function $L (μ_{G}, μ_{D}) := E_{x \sim μ_{ref}, y \sim μ_{D} (x)} [\ln y] + E_{x \sim μ_{G}, y \sim μ_{D} (x)} [\ln (1 - y)] .$ The generator aims to minimize the objective, and the discriminator aims to maximize the objective.

The generator's task is to approach $μ_{G} \approx μ_{ref}$ , that is, to match its own output distribution as closely as possible to the reference distribution. The discriminator's task is to output a value close to 1 when the input appears to be from the reference distribution, and to output a value close to 0 when the input looks like it came from the generator distribution.
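As a concrete illustration of the objective above, here is a minimal sketch for a finite sample space. It assumes a deterministic discriminator, so each $μ_{D} (x)$ is a point mass at a single value $D (x)$. The function name `gan_objective` and the toy distributions are illustrative and not part of the original formulation.

```python
import math

def gan_objective(mu_ref, mu_g, d):
    """L(mu_G, D) = E_{x~mu_ref}[ln D(x)] + E_{x~mu_G}[ln(1 - D(x))]
    on a finite sample space, for a deterministic discriminator D.

    mu_ref, mu_g: dicts mapping each outcome to its probability.
    d: dict mapping each outcome to a value strictly between 0 and 1.
    """
    real_term = sum(p * math.log(d[x]) for x, p in mu_ref.items() if p > 0)
    fake_term = sum(p * math.log(1.0 - d[x]) for x, p in mu_g.items() if p > 0)
    return real_term + fake_term

# Hypothetical three-point sample space.
mu_ref = {"a": 0.5, "b": 0.3, "c": 0.2}
mu_g = {"a": 0.2, "b": 0.3, "c": 0.5}

# A discriminator that cannot tell the two apart outputs 1/2 everywhere.
d_blind = {x: 0.5 for x in mu_ref}
# A discriminator that leans toward "reference" where mu_ref is larger.
d_informed = {"a": 0.7, "b": 0.5, "c": 0.3}

loss_blind = gan_objective(mu_ref, mu_g, d_blind)
loss_informed = gan_objective(mu_ref, mu_g, d_informed)
```

The uninformative discriminator scores exactly $- 2 \ln 2$ whatever the generator does, while the informed one achieves a higher value, which is what the discriminator wants and the generator tries to prevent.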

In practice

The generative network generates candidates while the discriminative network evaluates them. The contest operates in terms of data distributions. Typically, the generative network learns to map from a latent space to a data distribution of interest, while the discriminative network distinguishes candidates produced by the generator from the true data distribution. The generative network's training objective is to increase the error rate of the discriminative network, that is, to "fool" the discriminator by producing novel candidates that it judges to be part of the true data distribution rather than synthesized.

A known dataset serves as the initial training data for the discriminator. Training involves presenting it with samples from the training dataset until it achieves acceptable accuracy. The generator is trained based on whether it succeeds in fooling the discriminator. Typically, the generator is seeded with randomized input that is sampled from a predefined latent space (e.g. a multivariate normal distribution). Thereafter, candidates synthesized by the generator are evaluated by the discriminator. Independent backpropagation procedures are applied to both networks so that the generator produces better samples, while the discriminator becomes more skilled at flagging synthetic samples.^[9] When used for image generation, the generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.
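The alternating procedure described above can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration: the reference data is N(3, 1), the generator is a single shift parameter `theta`, the discriminator is a logistic function of a linear score, and the gradients are written out by hand instead of using backpropagation through real networks. The learning rates and batch size are arbitrary.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps, batch=64, lr_d=0.05, lr_g=0.05, seed=0):
    """Alternating gradient updates for a 1-D toy GAN.

    Reference data: N(3, 1). Latent: z ~ N(0, 1).
    Generator: G(z) = z + theta (one parameter, a shift).
    Discriminator: D(x) = sigmoid(w * x + c).
    """
    rng = random.Random(seed)
    theta, w, c = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Discriminator step: gradient ascent on
        # mean ln D(x_real) + mean ln(1 - D(x_fake)).
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
        grad_w = grad_c = 0.0
        for x in real:
            g = 1.0 - sigmoid(w * x + c)  # d/ds ln sigmoid(s)
            grad_w += g * x / batch
            grad_c += g / batch
        for x in fake:
            g = -sigmoid(w * x + c)  # d/ds ln(1 - sigmoid(s))
            grad_w += g * x / batch
            grad_c += g / batch
        w += lr_d * grad_w
        c += lr_d * grad_c

        # Generator step: gradient descent on mean ln(1 - D(G(z))).
        fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
        grad_theta = sum(-sigmoid(w * x + c) * w for x in fake) / batch
        theta -= lr_g * grad_theta
    return theta, w, c

theta, w, c = train_toy_gan(steps=1)
```

After the first round the discriminator's slope `w` is positive, because reference samples are larger on average, and the generator's shift `theta` moves in the same direction. Over many rounds this toy system tends to circle around the equilibrium rather than settle, which is consistent with the convergence difficulties discussed under training.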

Relation to other statistical machine learning methods

GANs are implicit generative models, which means that they do not explicitly model the likelihood function nor provide a means for finding the latent variable corresponding to a given sample, unlike alternatives such as flow-based generative models.

Compared to fully visible belief networks such as WaveNet and PixelRNN and autoregressive models in general, GANs can generate one complete sample in one pass, rather than multiple passes through the network.

Compared to Boltzmann machines and linear ICA, there is no restriction on the type of function used by the network.

Since neural networks are universal approximators, GANs are asymptotically consistent. Variational autoencoders might be universal approximators, but it is not proven as of 2017.

Mathematical properties

Measure-theoretic considerations

This section provides some of the mathematical theory behind these methods.

In modern probability theory based on measure theory, a probability space also needs to be equipped with a σ-algebra. As a result, a more rigorous definition of the GAN game would make the following changes:

Each probability space $(Ω, B, μ_{ref})$ defines a GAN game.
The generator's strategy set is $P (Ω, B)$ , the set of all probability measures $μ_{G}$ on the measure-space $(Ω, B)$ .
The discriminator's strategy set is the set of Markov kernels $μ_{D} : (Ω, B) \to P ([0, 1], B ([0, 1]))$ , where $B ([0, 1])$ is the Borel σ-algebra on $[0, 1]$ .

Since issues of measurability never arise in practice, these will not concern us further.

Choice of the strategy set

In the most generic version of the GAN game described above, the strategy set for the discriminator contains all Markov kernels $μ_{D} : Ω \to P [0, 1]$ , and the strategy set for the generator contains arbitrary probability distributions $μ_{G}$ on $Ω$ .

However, as shown below, the optimal discriminator strategy against any $μ_{G}$ is deterministic, so there is no loss of generality in restricting the discriminator's strategies to deterministic functions $D : Ω \to [0, 1]$ . In most applications, $D$ is a deep neural network function.

As for the generator, while $μ_{G}$ could theoretically be any computable probability distribution, in practice, it is usually implemented as a pushforward: $μ_{G} = μ_{Z} \circ G^{- 1}$ . That is, start with a random variable $z \sim μ_{Z}$ , where $μ_{Z}$ is a probability distribution that is easy to compute (such as the uniform distribution, or the Gaussian distribution), then define a function $G : Ω_{Z} \to Ω$ . Then the distribution $μ_{G}$ is the distribution of $G (z)$ .

Consequently, the generator's strategy is usually defined as just $G$ , leaving $z \sim μ_{Z}$ implicit. In this formalism, the GAN game objective is $L (G, D) := E_{x \sim μ_{ref}} [\ln D (x)] + E_{z \sim μ_{Z}} [\ln (1 - D (G (z)))] .$

Generative reparametrization

The GAN architecture has two main components. One is casting optimization into a game, of form $min_{G} max_{D} L (G, D)$ , which is different from the usual kind of optimization, of form $min_{θ} L (θ)$ . The other is the decomposition of $μ_{G}$ into $μ_{Z} \circ G^{- 1}$ , which can be understood as a reparametrization trick.

To see its significance, one must compare GAN with previous methods for learning generative models, which were plagued with "intractable probabilistic computations that arise in maximum likelihood estimation and related strategies".

At the same time, Kingma and Welling and Rezende et al. developed the same idea of reparametrization into a general stochastic backpropagation method. Among its first applications was the variational autoencoder.

Move order and strategic equilibria

In the original paper, as well as most subsequent papers, it is usually assumed that the generator moves first, and the discriminator moves second, thus giving the following minimax game: $min_{μ_{G}} max_{μ_{D}} L (μ_{G}, μ_{D}) := E_{x \sim μ_{ref}, y \sim μ_{D} (x)} [\ln y] + E_{x \sim μ_{G}, y \sim μ_{D} (x)} [\ln (1 - y)] .$

If both the generator's and the discriminator's strategy sets are spanned by a finite number of strategies, then by the minimax theorem, $min_{μ_{G}} max_{μ_{D}} L (μ_{G}, μ_{D}) = max_{μ_{D}} min_{μ_{G}} L (μ_{G}, μ_{D})$ that is, the move order does not matter.

However, since neither strategy set is finitely spanned, the minimax theorem does not apply, and the idea of an "equilibrium" becomes delicate. To wit, there are the following different concepts of equilibrium:

Equilibrium when generator moves first, and discriminator moves second: ${\hat{μ}}_{G} \in \arg min_{μ_{G}} max_{μ_{D}} L (μ_{G}, μ_{D}), {\hat{μ}}_{D} \in \arg max_{μ_{D}} L ({\hat{μ}}_{G}, μ_{D}),$
Equilibrium when discriminator moves first, and generator moves second: ${\hat{μ}}_{D} \in \arg max_{μ_{D}} min_{μ_{G}} L (μ_{G}, μ_{D}), {\hat{μ}}_{G} \in \arg min_{μ_{G}} L (μ_{G}, {\hat{μ}}_{D}),$
Nash equilibrium $({\hat{μ}}_{D}, {\hat{μ}}_{G})$ , which is stable under simultaneous move order: ${\hat{μ}}_{D} \in \arg max_{μ_{D}} L ({\hat{μ}}_{G}, μ_{D}), {\hat{μ}}_{G} \in \arg min_{μ_{G}} L (μ_{G}, {\hat{μ}}_{D})$

For general games, these equilibria do not have to agree, or even to exist. For the original GAN game, these equilibria all exist, and are all equal. However, for more general GAN games, these do not necessarily exist, or agree.

Main theorems for GAN game

The original GAN paper proved the following two theorems:

Theorem (the optimal discriminator computes the Jensen–Shannon divergence) — For any fixed generator strategy $μ_{G}$ , let the optimal reply be $D^{*} = \arg max_{D} L (μ_{G}, D)$ , then

$\begin{aligned} D^{*} (x) & = \frac{d μ_{ref}}{d (μ_{ref} + μ_{G})} \\ L (μ_{G}, D^{*}) & = 2 D_{J S} (μ_{ref}; μ_{G}) - 2 \ln 2 \end{aligned}$

where the derivative is the Radon–Nikodym derivative, and $D_{J S}$ is the Jensen–Shannon divergence.

Proof

By Jensen's inequality,

$E_{x \sim μ_{ref}, y \sim μ_{D} (x)} [\ln y] \leq E_{x \sim μ_{ref}} [\ln E_{y \sim μ_{D} (x)} [y]]$ and similarly for the other term. Therefore, the optimal reply can be deterministic, i.e. $μ_{D} (x) = δ_{D (x)}$ for some function $D : Ω \to [0, 1]$ , in which case

$L (μ_{G}, μ_{D}) := E_{x \sim μ_{ref}} [\ln D (x)] + E_{x \sim μ_{G}} [\ln (1 - D (x))] .$

To define suitable density functions, we define a base measure $μ := μ_{ref} + μ_{G}$ , which allows us to take the Radon–Nikodym derivatives

$ρ_{ref} = \frac{d μ_{ref}}{d μ}, \quad ρ_{G} = \frac{d μ_{G}}{d μ}$ with $ρ_{ref} + ρ_{G} = 1$ .

We then have

$L (μ_{G}, μ_{D}) := \int μ (d x) [ρ_{ref} (x) \ln (D (x)) + ρ_{G} (x) \ln (1 - D (x))] .$

The integrand is just the negative cross-entropy between two Bernoulli random variables with parameters $ρ_{ref} (x)$ and $D (x)$ . We can write this as $- H (ρ_{ref} (x)) - D_{K L} (ρ_{ref} (x) ∥ D (x))$ , where $H$ is the binary entropy function, so

$L (μ_{G}, μ_{D}) = - \int μ (d x) (H (ρ_{ref} (x)) + D_{K L} (ρ_{ref} (x) ∥ D (x))) .$

This means that the optimal strategy for the discriminator is $D (x) = ρ_{ref} (x)$ , with $L (μ_{G}, μ_{D}^{*}) = - \int μ (d x) H (ρ_{ref} (x)) = 2 D_{J S} (μ_{ref}; μ_{G}) - 2 \ln 2$

after routine calculation.

Interpretation: For any fixed generator strategy $μ_{G}$ , the optimal discriminator keeps track of the likelihood ratio between the reference distribution and the generator distribution: $\frac{D (x)}{1 - D (x)} = \frac{d μ_{ref}}{d μ_{G}} (x) = \frac{μ_{ref} (d x)}{μ_{G} (d x)}; D (x) = σ (\ln μ_{ref} (d x) - \ln μ_{G} (d x))$ where $σ$ is the logistic function. In particular, if the prior probability for an image $x$ to come from the reference distribution is equal to $\frac{1}{2}$ , then $D (x)$ is just the posterior probability that $x$ came from the reference distribution: $D (x) = \Pr (x \text{ came from reference distribution} ∣ x) .$
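The theorem can be checked numerically on a finite sample space, where the Radon–Nikodym derivative reduces to a ratio of probabilities. The following sketch uses made-up three-point distributions with strictly positive probabilities (so no logarithm of zero arises); the helper names are illustrative.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence, using natural logarithms."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def objective(mu_ref, mu_g, d):
    """L(mu_G, D) for a deterministic discriminator on a finite space."""
    return sum(
        r * math.log(di) + g * math.log(1 - di)
        for r, g, di in zip(mu_ref, mu_g, d)
    )

mu_ref = [0.5, 0.3, 0.2]
mu_g = [0.2, 0.3, 0.5]

# Optimal reply: the density of mu_ref with respect to mu_ref + mu_G.
d_star = [r / (r + g) for r, g in zip(mu_ref, mu_g)]

value_at_optimum = objective(mu_ref, mu_g, d_star)
predicted = 2 * js_divergence(mu_ref, mu_g) - 2 * math.log(2)

# Any other discriminator should do no better than d_star.
d_other = [0.6, 0.5, 0.4]
value_other = objective(mu_ref, mu_g, d_other)
```

Here `value_at_optimum` agrees with `predicted` up to floating-point error, and `value_other` is smaller, as the theorem requires.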

Theorem (the unique equilibrium point) — For any GAN game, there exists a pair $({\hat{μ}}_{D}, {\hat{μ}}_{G})$ that is both a sequential equilibrium and a Nash equilibrium:

$\begin{aligned} L ({\hat{μ}}_{G}, {\hat{μ}}_{D}) = min_{μ_{G}} max_{μ_{D}} L (μ_{G}, μ_{D}) = & max_{μ_{D}} min_{μ_{G}} L (μ_{G}, μ_{D}) = - 2 \ln 2 \\ {\hat{μ}}_{D} \in \arg max_{μ_{D}} min_{μ_{G}} L (μ_{G}, μ_{D}), & {\hat{μ}}_{G} \in \arg min_{μ_{G}} max_{μ_{D}} L (μ_{G}, μ_{D}) \\ {\hat{μ}}_{D} \in \arg max_{μ_{D}} L ({\hat{μ}}_{G}, μ_{D}), & {\hat{μ}}_{G} \in \arg min_{μ_{G}} L (μ_{G}, {\hat{μ}}_{D}) \\ \forall x \in Ω, {\hat{μ}}_{D} (x) = δ_{\frac{1}{2}}, & {\hat{μ}}_{G} = μ_{ref} \end{aligned}$

That is, the generator perfectly mimics the reference, and the discriminator outputs $\frac{1}{2}$ deterministically on all inputs.

Proof

From the previous theorem,

$\arg min_{μ_{G}} max_{μ_{D}} L (μ_{G}, μ_{D}) = μ_{ref}; min_{μ_{G}} max_{μ_{D}} L (μ_{G}, μ_{D}) = - 2 \ln 2.$

For any fixed discriminator strategy $μ_{D}$ , any $μ_{G}$ concentrated on the set

$\{x ∣ E_{y \sim μ_{D} (x)} [\ln (1 - y)] = inf_{x^{'}} E_{y \sim μ_{D} (x^{'})} [\ln (1 - y)]\}$ is an optimal strategy for the generator. Thus,

$\arg max_{μ_{D}} min_{μ_{G}} L (μ_{G}, μ_{D}) = \arg max_{μ_{D}} E_{x \sim μ_{ref}, y \sim μ_{D} (x)} [\ln y] + inf_{x} E_{y \sim μ_{D} (x)} [\ln (1 - y)] .$

By Jensen's inequality, the discriminator can only improve by adopting the deterministic strategy of always playing $D (x) = E_{y \sim μ_{D} (x)} [y]$ . Therefore,

$\arg max_{μ_{D}} min_{μ_{G}} L (μ_{G}, μ_{D}) = \arg max_{D} E_{x \sim μ_{ref}} [\ln D (x)] + inf_{x} \ln (1 - D (x))$

By Jensen's inequality,

$\begin{aligned} & E_{x \sim μ_{ref}} [\ln D (x)] + inf_{x} \ln (1 - D (x)) \\ \leq & \ln E_{x \sim μ_{ref}} [D (x)] + inf_{x} \ln (1 - D (x)) \\ = & \ln E_{x \sim μ_{ref}} [D (x)] + \ln (1 - sup_{x} D (x)) \\ = & \ln [E_{x \sim μ_{ref}} [D (x)] (1 - sup_{x} D (x))] \leq \ln [sup_{x} D (x) (1 - sup_{x} D (x))] \leq \ln \frac{1}{4}, \end{aligned}$

with equality if $D (x) = \frac{1}{2}$ , so

$\forall x \in Ω, {\hat{μ}}_{D} (x) = δ_{\frac{1}{2}}; max_{μ_{D}} min_{μ_{G}} L (μ_{G}, μ_{D}) = - 2 \ln 2.$

Finally, to check that this is a Nash equilibrium, note that when $μ_{G} = μ_{ref}$ , we have

$L (μ_{G}, μ_{D}) := E_{x \sim μ_{ref}, y \sim μ_{D} (x)} [\ln (y (1 - y))]$ which is always maximized by $y = \frac{1}{2}$ .

When $\forall x \in Ω, μ_{D} (x) = δ_{\frac{1}{2}}$ , any strategy is optimal for the generator.

Training and evaluating GAN

Training

Unstable convergence

While the GAN game has a unique global equilibrium point when both the generator and discriminator have access to their entire strategy sets, the equilibrium is no longer guaranteed when they have a restricted strategy set.

In practice, the generator has access only to measures of form $μ_{Z} \circ G_{θ}^{- 1}$ , where $G_{θ}$ is a function computed by a neural network with parameters $θ$ , and $μ_{Z}$ is an easily sampled distribution, such as the uniform or normal distribution. Similarly, the discriminator has access only to functions of form $D_{ζ}$ , a function computed by a neural network with parameters $ζ$ . These restricted strategy sets take up a vanishingly small proportion of their entire strategy sets.

Further, even if an equilibrium still exists, it can only be found by searching in the high-dimensional space of all possible neural network functions. The standard strategy of using gradient descent to find the equilibrium often does not work for GAN, and often the game "collapses" into one of several failure modes. To improve the convergence stability, some training strategies start with an easier task, such as generating low-resolution images or simple images (one object with uniform background), and gradually increase the difficulty of the task during training. This essentially translates to applying a curriculum learning scheme.

Mode collapse

GANs often suffer from mode collapse where they fail to generalize properly, missing entire modes from the input data. For example, a GAN trained on the MNIST dataset containing many samples of each digit might only generate pictures of digit 0. This was termed "the Helvetica scenario".

One way this can happen is if the generator learns too fast compared to the discriminator. If the discriminator $D$ is held constant, then the optimal generator would only output elements of $\arg max_{x} D (x)$ . So for example, if during GAN training for generating MNIST dataset, for a few epochs, the discriminator somehow prefers the digit 0 slightly more than other digits, the generator may seize the opportunity to generate only digit 0, then be unable to escape the local minimum after the discriminator improves.
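The claim that a frozen discriminator invites collapse can be seen in a small calculation. The sketch below simplifies the sample space to the ten digit classes and uses a made-up discriminator that rates digit 0 slightly higher; since the generator's part of the objective is linear in $μ_{G}$, putting all mass on the top-rated point beats any mixture.

```python
import math

def generator_loss(mu_g, d):
    """E_{x~mu_G}[ln(1 - D(x))]: the part of the objective the generator controls."""
    return sum(p * math.log(1.0 - d[x]) for x, p in mu_g.items() if p > 0)

# Hypothetical frozen discriminator over the ten digit classes; it rates
# digit 0 as slightly more "real" than the others.
d_fixed = {digit: 0.50 for digit in range(10)}
d_fixed[0] = 0.55

uniform = {digit: 0.1 for digit in range(10)}
only_zero = {0: 1.0}
only_seven = {7: 1.0}

loss_uniform = generator_loss(uniform, d_fixed)
loss_only_zero = generator_loss(only_zero, d_fixed)
loss_only_seven = generator_loss(only_seven, d_fixed)
best_digit = max(d_fixed, key=d_fixed.get)
```

The generator that emits only digit 0 obtains the lowest loss, lower than the generator that covers all digits evenly, even though the latter matches the data far better.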

Some researchers perceive the root problem to be a weak discriminative network that fails to notice the pattern of omission, while others assign blame to a bad choice of objective function. Many solutions have been proposed, but it is still an open problem.

Even the state-of-the-art architecture, BigGAN (2019), could not avoid mode collapse. The authors resorted to "allowing collapse to occur at the later stages of training, by which time a model is sufficiently trained to achieve good results".

Two time-scale update rule

The two time-scale update rule (TTUR) was proposed to make GAN convergence more stable by making the learning rate of the generator lower than that of the discriminator. The authors argued that the generator should move slower than the discriminator, so that it does not "drive the discriminator steadily into new regions without capturing its gathered information".

They proved that a general class of games that included the GAN game, when trained under TTUR, "converges under mild assumptions to a stationary local Nash equilibrium".

They also proposed using the Adam stochastic optimization to avoid mode collapse, as well as the Fréchet inception distance for evaluating GAN performances.
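In code, the rule amounts to giving the two players different step sizes. The sketch below uses plain gradient steps in place of Adam to stay self-contained; the function name `ttur_step` and the default learning rates are hypothetical placeholders, not values taken from the TTUR paper.

```python
def ttur_step(theta, zeta, grad_theta, grad_zeta, lr_g=1e-4, lr_d=4e-4):
    """One simultaneous update with separate learning rates.

    theta, grad_theta: generator parameters and the gradient of the
        generator's loss with respect to them (lists of floats).
    zeta, grad_zeta: the same for the discriminator.
    Both players descend their own loss; the generator's step is smaller.
    """
    if lr_g >= lr_d:
        raise ValueError("this sketch expects lr_g < lr_d")
    new_theta = [t - lr_g * g for t, g in zip(theta, grad_theta)]
    new_zeta = [z - lr_d * g for z, g in zip(zeta, grad_zeta)]
    return new_theta, new_zeta

new_theta, new_zeta = ttur_step(
    theta=[0.0, 1.0], zeta=[0.5], grad_theta=[1.0, -1.0], grad_zeta=[2.0]
)
```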

Vanishing gradient

Conversely, if the discriminator learns too fast compared to the generator, then the discriminator could almost perfectly distinguish $μ_{G_{θ}}$ from $μ_{ref}$ . In such a case, the generator $G_{θ}$ could be stuck with a very high loss no matter which direction it changes its $θ$ , meaning that the gradient $\nabla_{θ} L (G_{θ}, D_{ζ})$ would be close to zero. The generator then cannot learn, a case of the vanishing gradient problem.

Intuitively speaking, the discriminator is too good, and since the generator cannot take any small step (only small steps are considered in gradient descent) to improve its payoff, it does not even try.
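One factor in this gradient can be computed in closed form if the discriminator is assumed to end in a logistic unit, $D (x) = σ (s (x))$. The sketch below, under that assumption, compares the derivative of the original generator term $\ln (1 - σ (s))$ with that of the non-saturating term $- \ln σ (s)$ as the discriminator becomes more confident that a sample is synthetic.

```python
import math

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def saturating_grad(s):
    """d/ds of ln(1 - sigmoid(s)), the original generator loss term."""
    return -sigmoid(s)

def non_saturating_grad(s):
    """d/ds of -ln(sigmoid(s)), the non-saturating generator loss term."""
    return -(1.0 - sigmoid(s))

# s is the discriminator's logit on a generated sample; very negative s
# means the discriminator is confident that the sample is synthetic.
logits = [0.0, -5.0, -10.0]
table = [(s, saturating_grad(s), non_saturating_grad(s)) for s in logits]
```

As `s` decreases, the first derivative shrinks toward zero while the second approaches -1, so by the chain rule the original loss passes almost no signal back to $θ$ once the discriminator is confident.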

One important method for solving this problem is the Wasserstein GAN.

Evaluation

GANs are usually evaluated by Inception score (IS), which measures how varied the generator's outputs are (as classified by an image classifier, usually Inception-v3), or Fréchet inception distance (FID), which measures how similar the generator's outputs are to a reference set (as classified by a learned image featurizer, such as Inception-v3 without its final layer). Many papers that propose new GAN architectures for image generation report how their architectures break the state of the art on FID or IS.
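For reference, the Inception score is the exponential of the average KL divergence between the classifier's prediction for each sample and its average prediction over all samples. A minimal sketch follows, with hand-written probability vectors standing in for the outputs of a pretrained classifier, which is not loaded here.

```python
import math

def inception_score(class_probs):
    """exp of the mean KL divergence between each p(y | x) and the marginal p(y).

    class_probs: one list of class probabilities per generated sample,
    standing in for the outputs of a pretrained image classifier.
    """
    n = len(class_probs)
    num_classes = len(class_probs[0])
    marginal = [sum(p[k] for p in class_probs) / n for k in range(num_classes)]
    mean_kl = 0.0
    for p in class_probs:
        mean_kl += sum(
            pk * math.log(pk / mk) for pk, mk in zip(p, marginal) if pk > 0
        ) / n
    return math.exp(mean_kl)

# Confident predictions spread evenly over 3 classes: the best case.
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Every sample classified identically: no variety at all.
collapsed = [[1.0, 0.0, 0.0]] * 3

score_diverse = inception_score(diverse)
score_collapsed = inception_score(collapsed)
```

With three classes the score ranges from 1 (the collapsed case) to 3 (the diverse case).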

Another evaluation method is the Learned Perceptual Image Patch Similarity (LPIPS), which starts with a learned image featurizer $f_{θ} : \text{Image} \to R^{n}$ , and finetunes it by supervised learning on a set of $(x, x^{'}, \text{perceptual difference} (x, x^{'}))$ , where $x$ is an image, $x^{'}$ is a perturbed version of it, and $\text{perceptual difference} (x, x^{'})$ is how much they differ, as reported by human subjects. The model is finetuned so that it can approximate $‖ f_{θ} (x) - f_{θ} (x^{'}) ‖ \approx \text{perceptual difference} (x, x^{'})$ . This finetuned model is then used to define $\text{LPIPS} (x, x^{'}) := ‖ f_{θ} (x) - f_{θ} (x^{'}) ‖$ .

Other evaluation methods are reviewed in the literature.

Variants

There is a veritable zoo of GAN variants. Some of the most prominent are as follows:

Conditional GAN

Conditional GANs are similar to standard GANs except they allow the model to conditionally generate samples based on additional information. For example, if we want to generate a cat face given a dog picture, we could use a conditional GAN.

The generator in a GAN game generates $μ_{G}$ , a probability distribution on the probability space $Ω$ . This leads to the idea of a conditional GAN, where instead of generating one probability distribution on $Ω$ , the generator generates a different probability distribution $μ_{G} (c)$ on $Ω$ , for each given class label $c$ .

For example, for generating images that look like ImageNet, the generator should be able to generate a picture of a cat when given the class label "cat".

In the original paper, the authors noted that GAN can be trivially extended to conditional GAN by providing the labels to both the generator and the discriminator.

Concretely, the conditional GAN game is just the GAN game with class labels provided: $L (μ_{G}, D) := E_{c \sim μ_{C}, x \sim μ_{ref} (c)} [\ln D (x, c)] + E_{c \sim μ_{C}, x \sim μ_{G} (c)} [\ln (1 - D (x, c))]$ where $μ_{C}$ is a probability distribution over classes, $μ_{ref} (c)$ is the probability distribution of real images of class $c$ , and $μ_{G} (c)$ the probability distribution of images generated by the generator when given class label $c$ .
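The conditional objective can be evaluated directly when the class and sample spaces are finite. The sketch below uses a made-up two-class example in which the generator ignores its label; the names and the numbers assigned to the discriminator are illustrative only.

```python
import math

def conditional_gan_objective(mu_c, mu_ref, mu_g, d):
    """L(mu_G, D) with class labels, on finite class and sample spaces.

    mu_c: dict class -> probability.
    mu_ref, mu_g: dict class -> (dict outcome -> probability given that class).
    d: dict (outcome, class) -> value strictly between 0 and 1.
    """
    total = 0.0
    for c, pc in mu_c.items():
        real = sum(p * math.log(d[(x, c)]) for x, p in mu_ref[c].items() if p > 0)
        fake = sum(p * math.log(1.0 - d[(x, c)]) for x, p in mu_g[c].items() if p > 0)
        total += pc * (real + fake)
    return total

mu_c = {"cat": 0.5, "dog": 0.5}
mu_ref = {"cat": {"cat_img": 1.0}, "dog": {"dog_img": 1.0}}
# A generator that ignores the label and always draws a cat.
mu_g = {"cat": {"cat_img": 1.0}, "dog": {"cat_img": 1.0}}

# Discriminator that checks whether the image matches the label.
d = {
    ("cat_img", "cat"): 0.5,  # real and generated cats look alike
    ("dog_img", "dog"): 0.9,  # only real data shows dogs under "dog"
    ("cat_img", "dog"): 0.1,  # a cat labelled "dog" must be generated
}
value = conditional_gan_objective(mu_c, mu_ref, mu_g, d)
baseline = -2 * math.log(2)
```

Because the discriminator sees the label, it can penalize a generator that produces realistic images of the wrong class: `value` exceeds the `baseline` that a perfectly matched generator would enforce.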

In 2017, a conditional GAN learned to generate 1000 image classes of ImageNet.

GANs with alternative architectures

The GAN game is a general framework and can be run with any reasonable parametrization of the generator $G$ and discriminator $D$ . In the original paper, the authors demonstrated it using multilayer perceptron networks and convolutional neural networks. Many alternative architectures have been tried.

Deep convolutional GAN (DCGAN): For both generator and discriminator, uses only deep networks consisting entirely of convolution-deconvolution layers, that is, fully convolutional networks.

Self-attention GAN (SAGAN): Starts with the DCGAN, then adds residually-connected standard self-attention modules to the generator and discriminator.

Variational autoencoder GAN (VAEGAN): Uses a variational autoencoder (VAE) for the generator.

Transformer GAN (TransGAN): Uses the pure transformer architecture for both the generator and discriminator, entirely devoid of convolution-deconvolution layers.

Flow-GAN: Uses a flow-based generative model for the generator, allowing efficient computation of the likelihood function.

GANs with alternative objectives

Many GAN variants are merely obtained by changing the loss functions for the generator and discriminator.

Original GAN:

We recast the original GAN objective into a form more convenient for comparison: $\begin{cases} min_{D} L_{D} (D, μ_{G}) = - E_{x \sim μ_{ref}} [\ln D (x)] - E_{x \sim μ_{G}} [\ln (1 - D (x))] \\ min_{G} L_{G} (D, μ_{G}) = E_{x \sim μ_{G}} [\ln (1 - D (x))] \end{cases}$ Here $L_{D} = - L$ because the discriminator maximizes $L$ , and $L_{G}$ is the part of $L$ that depends on the generator.

Original GAN, non-saturating loss:

This objective for the generator was recommended in the original paper for faster convergence. $L_{G} = - E_{x \sim μ_{G}} [\ln D (x)]$ The effect of using this objective is analyzed in Section 2.2.2 of Arjovsky et al.

Original GAN, maximum likelihood:

$L_{G} = - E_{x \sim μ_{G}} [(\exp \circ σ^{- 1} \circ D) (x)]$ where $σ$ is the logistic function. When the discriminator is optimal, the generator gradient is the same as in maximum likelihood estimation, even though GAN cannot perform maximum likelihood estimation itself.

Hinge loss GAN:

$L_{D} = - E_{x \sim μ_{ref}} [min (0, - 1 + D (x))] - E_{x \sim μ_{G}} [min (0, - 1 - D (x))]$ $L_{G} = - E_{x \sim μ_{G}} [D (x)]$

Least squares GAN:

$L_{D} = E_{x \sim μ_{ref}} [(D (x) - b)^{2}] + E_{x \sim μ_{G}} [(D (x) - a)^{2}]$ $L_{G} = E_{x \sim μ_{G}} [(D (x) - c)^{2}]$ where $a, b, c$ are parameters to be chosen. The authors recommended $a = - 1, b = 1, c = 0$ .
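The hinge and least squares losses translate directly into code once expectations are replaced by batch averages. In this sketch the discriminator scores are made-up numbers and the function names are illustrative; for both losses the discriminator output is an unbounded real score, not a probability.

```python
def hinge_losses(d_real, d_fake):
    """Hinge GAN losses from raw discriminator scores (any real number)."""
    n_r, n_f = len(d_real), len(d_fake)
    loss_d = (
        -sum(min(0.0, -1.0 + s) for s in d_real) / n_r
        - sum(min(0.0, -1.0 - s) for s in d_fake) / n_f
    )
    loss_g = -sum(d_fake) / n_f
    return loss_d, loss_g

def least_squares_losses(d_real, d_fake, a=-1.0, b=1.0, c=0.0):
    """Least squares GAN losses; a, b, c default to the values quoted above."""
    n_r, n_f = len(d_real), len(d_fake)
    loss_d = (
        sum((s - b) ** 2 for s in d_real) / n_r
        + sum((s - a) ** 2 for s in d_fake) / n_f
    )
    loss_g = sum((s - c) ** 2 for s in d_fake) / n_f
    return loss_d, loss_g

# Hypothetical discriminator scores on a small batch.
d_real = [2.0, 0.5]
d_fake = [-2.0, 0.0]

hinge_d, hinge_g = hinge_losses(d_real, d_fake)
ls_d, ls_g = least_squares_losses(d_real, d_fake)
```

In the hinge loss, samples scored beyond the margin (a real sample above 1, a generated sample below -1) contribute nothing to the discriminator's loss, so only the uncertain samples drive its updates.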

Wasserstein GAN (WGAN)

The Wasserstein GAN modifies the GAN game at two points:

The discriminator's strategy set is the set of measurable functions of type $D : Ω \to R$ with bounded Lipschitz norm: $‖ D ‖_{L} \leq K$ , where $K$ is a fixed positive constant.
The objective is $L_{W G A N} (μ_{G}, D) := E_{x \sim μ_{G}} [D (x)] - E_{x \sim μ_{ref}} [D (x)]$

One of its purposes is to solve the problem of mode collapse (see above). The authors claim "In no experiment did we see evidence of mode collapse for the WGAN algorithm".
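The WGAN objective, in the sign convention used above, can be estimated from samples as a difference of two averages. The sketch below restricts the discriminator to linear functions with a clamped slope, which is only one simple way of satisfying the Lipschitz bound in a toy setting and is not presented as the authors' training procedure.

```python
def wgan_objective(fake, real, critic):
    """L_WGAN(mu_G, D) = E_{x~mu_G}[D(x)] - E_{x~mu_ref}[D(x)], estimated from samples."""
    return sum(map(critic, fake)) / len(fake) - sum(map(critic, real)) / len(real)

def linear_critic(w, k=1.0):
    """D(x) = w * x with the slope clamped to [-k, k], so that D is k-Lipschitz."""
    w = max(-k, min(k, w))
    return lambda x: w * x

real = [0.0, 1.0, 2.0]
# Hypothetical generator output: the reference samples shifted by +3.
fake = [x + 3.0 for x in real]

# The slope 5.0 is clamped to the Lipschitz bound K = 1.
value = wgan_objective(fake, real, linear_critic(5.0, k=1.0))
# Flipping the critic's sign gives the discriminator a worse payoff.
value_flipped = wgan_objective(fake, real, linear_critic(-5.0, k=1.0))
```

For this pure shift the objective equals the size of the shift times $K$, so it keeps changing at a constant rate as the generator moves, however far apart the two distributions are.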

GANs with more than two players

Adversarial autoencoder

An adversarial autoencoder (AAE) is more autoencoder than GAN. The idea is to start with a plain autoencoder, but train a discriminator to discriminate the latent vectors from a reference distribution (often the normal distribution).

InfoGAN

In conditional GAN, the generator receives both a noise vector $z$ and a label $c$ , and produces an image $G (z, c)$ . The discriminator receives image-label pairs $(x, c)$ , and computes $D (x, c)$ .

When the training dataset is unlabeled, conditional GAN does not work directly.

The idea of InfoGAN is to decree that every latent vector in the latent space can be decomposed as $(z, c)$ : an incompressible noise part $z$ , and an informative label part $c$ , and encourage the generator to comply with the decree, by encouraging it to maximize $I (c, G (z, c))$ , the mutual information between $c$ and $G (z, c)$ , while making no demands on the mutual information between $z$ and $G (z, c)$ .

Unfortunately, $I (c, G (z, c))$ is intractable in general. The key idea of InfoGAN is Variational Mutual Information Maximization: indirectly maximize it by maximizing a lower bound $\hat{I} (G, Q) = E_{z \sim μ_{Z}, c \sim μ_{C}} [\ln Q (c ∣ G (z, c))]; I (c, G (z, c)) \geq sup_{Q} \hat{I} (G, Q)$ where $Q$ ranges over all Markov kernels of type $Q : Ω_{X} \to P (Ω_{C})$ .

The InfoGAN game is defined as follows:

Three probability spaces define an InfoGAN game:
$(Ω_{X}, μ_{ref})$ , the space of reference images.
$(Ω_{Z}, μ_{Z})$ , the fixed random noise generator.
$(Ω_{C}, μ_{C})$ , the fixed random information generator.

There are 3 players in 2 teams: generator, Q, and discriminator. The generator and Q are on one team, and the discriminator on the other team.
The objective function is $L (G, Q, D) = L_{G A N} (G, D) - λ \hat{I} (G, Q)$ where $L_{G A N} (G, D) = E_{x \sim μ_{ref}} [\ln D (x)] + E_{z \sim μ_{Z}, c \sim μ_{C}} [\ln (1 - D (G (z, c)))]$ is the original GAN game objective, and $\hat{I} (G, Q) = E_{z \sim μ_{Z}, c \sim μ_{C}} [\ln Q (c ∣ G (z, c))]$
Generator-Q team aims to minimize the objective, and discriminator aims to maximize it: $min_{G, Q} max_{D} L (G, Q, D)$
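The quantity $\hat{I} (G, Q)$ can be computed exactly when the noise and code spaces are finite. The following sketch compares a toy generator that embeds the code in its output with one that discards it; all functions and numbers are illustrative assumptions.

```python
import math

def info_lower_bound(mu_z, mu_c, g, q):
    """I_hat(G, Q) = E_{z~mu_Z, c~mu_C}[ln Q(c | G(z, c))] on finite spaces.

    g: function (z, c) -> image.  q: function (c, image) -> probability.
    """
    return sum(
        pz * pc * math.log(q(c, g(z, c)))
        for z, pz in mu_z.items()
        for c, pc in mu_c.items()
    )

mu_z = {0: 0.5, 1: 0.5}
mu_c = {"left": 0.5, "right": 0.5}

def g_uses_code(z, c):
    # Generator that writes the code c into its output.
    return (z, c)

def q_reads_code(c, image):
    # Q recovers the code from the image with high confidence.
    return 0.99 if image[1] == c else 0.01

def g_ignores_code(z, c):
    # Generator whose output does not depend on c.
    return (z, None)

def q_prior(c, image):
    # With no information in the image, Q can only output the prior mu_C.
    return mu_c[c]

bound_informative = info_lower_bound(mu_z, mu_c, g_uses_code, q_reads_code)
bound_uninformative = info_lower_bound(mu_z, mu_c, g_ignores_code, q_prior)
```

The generator that preserves the code reaches a value close to zero, while the one that ignores it is stuck at $\ln \frac{1}{2}$, so subtracting $λ \hat{I} (G, Q)$ from the objective rewards the first kind of generator.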

Bidirectional GAN (BiGAN)

The standard GAN generator is a function of type $G : Ω_{Z} \to Ω_{X}$ , that is, it is a mapping from a latent space $Ω_{Z}$ to the image space $Ω_{X}$ . This can be understood as a "decoding" process, whereby every latent vector $z \in Ω_{Z}$ is a code for an image $x \in Ω_{X}$ , and the generator performs the decoding. This naturally leads to the idea of training another network that performs "encoding", creating an autoencoder out of the encoder-generator pair.

Already in the original paper, the authors noted that "Learned approximate inference can be performed by training an auxiliary network to predict $z$ given $x$ ". The bidirectional GAN architecture performs exactly this.

The BiGAN is defined as follows:

Two probability spaces define a BiGAN game:
$(Ω_{X}, μ_{X})$ , the space of reference images.
$(Ω_{Z}, μ_{Z})$ , the latent space.

There are 3 players in 2 teams: generator, encoder, and discriminator. The generator and encoder are on one team, and the discriminator on the other team.
The generator's strategies are functions $G : Ω_{Z} \to Ω_{X}$ , and the encoder's strategies are functions $E : Ω_{X} \to Ω_{Z}$ . The discriminator's strategies are functions $D : Ω_{X} \times Ω_{Z} \to [0, 1]$ .
The objective function is $L (G, E, D) = E_{x \sim μ_{X}} [\ln D (x, E (x))] + E_{z \sim μ_{Z}} [\ln (1 - D (G (z), z))]$
Generator-encoder team aims to minimize the objective, and discriminator aims to maximize it: $min_{G, E} max_{D} L (G, E, D)$

In the paper, they gave a more abstract definition of the objective as: $L (G, E, D) = E_{(x, z) \sim μ_{E, X}} [\ln D (x, z)] + E_{(x, z) \sim μ_{G, Z}} [\ln (1 - D (x, z))]$ where $μ_{E, X} (d x, d z) = μ_{X} (d x) \cdot δ_{E (x)} (d z)$ is the probability distribution on $Ω_{X} \times Ω_{Z}$ obtained by pushing $μ_{X}$ forward via $x \mapsto (x, E (x))$ , and $μ_{G, Z} (d x, d z) = δ_{G (z)} (d x) \cdot μ_{Z} (d z)$ is the probability distribution on $Ω_{X} \times Ω_{Z}$ obtained by pushing $μ_{Z}$ forward via $z \mapsto (G (z), z)$ .

Applications of bidirectional models include semi-supervised learning, interpretable machine learning, and neural machine translation.

CycleGAN

CycleGAN is an architecture for performing translations between two domains, such as between photos of horses and photos of zebras, or photos of night cities and photos of day cities.

The CycleGAN game is defined as follows:

There are two probability spaces $(Ω_{X}, μ_{X}), (Ω_{Y}, μ_{Y})$ , corresponding to the two domains needed for translating back and forth.
There are 4 players in 2 teams: generators $G_{X} : Ω_{X} \to Ω_{Y}, G_{Y} : Ω_{Y} \to Ω_{X}$ , and discriminators $D_{X} : Ω_{X} \to [0, 1], D_{Y} : Ω_{Y} \to [0, 1]$ .
The objective function is $L (G_{X}, G_{Y}, D_{X}, D_{Y}) = L_{G A N} (G_{X}, D_{Y}) + L_{G A N} (G_{Y}, D_{X}) + λ L_{c y c l e} (G_{X}, G_{Y})$
where $λ$ is a positive adjustable parameter, $L_{G A N}$ is the GAN game objective, and $L_{c y c l e}$ is the cycle consistency loss: $L_{c y c l e} (G_{X}, G_{Y}) = E_{x \sim μ_{X}} ‖ G_{Y} (G_{X} (x)) - x ‖ + E_{y \sim μ_{Y}} ‖ G_{X} (G_{Y} (y)) - y ‖$ Each generator is paired with the discriminator on its output domain, and each cycle term composes the two generators so that the round trip starts and ends in the same domain.
The generators aim to minimize the objective, and the discriminators aim to maximize it: $min_{G_{X}, G_{Y}} max_{D_{X}, D_{Y}} L (G_{X}, G_{Y}, D_{X}, D_{Y})$
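The cycle consistency term can be sketched with scalar stand-ins for images. Given the types $G_{X} : Ω_{X} \to Ω_{Y}$ and $G_{Y} : Ω_{Y} \to Ω_{X}$ , a round trip starting in $Ω_{X}$ applies $G_{X}$ first and $G_{Y}$ second, and the reverse for $Ω_{Y}$. The toy translators below are assumptions chosen so that the result is easy to verify by hand.

```python
def cycle_consistency_loss(xs, ys, g_x, g_y):
    """Mean round-trip error for scalar samples.

    g_x maps domain X to domain Y; g_y maps domain Y back to domain X.
    """
    loss_x = sum(abs(g_y(g_x(x)) - x) for x in xs) / len(xs)
    loss_y = sum(abs(g_x(g_y(y)) - y) for y in ys) / len(ys)
    return loss_x + loss_y

xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0]

# Exact inverses: every round trip returns to its starting point.
loss_inverse = cycle_consistency_loss(xs, ys, lambda x: 2 * x, lambda y: y / 2)
# A collapsed pair of translators that forgets its input.
loss_collapsed = cycle_consistency_loss(xs, ys, lambda x: 0.0, lambda y: 0.0)
```

The loss is zero exactly when the two translators undo each other on the data, which is what rules out translators that ignore their input.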

Unlike previous work like pix2pix, which requires paired training data, CycleGAN requires no paired data. For example, to train a pix2pix model to turn a summer scenery photo into a winter scenery photo and back, the dataset must contain pairs of the same place in summer and winter, shot at the same angle; CycleGAN would only need a set of summer scenery photos, and an unrelated set of winter scenery photos.

GANs with particularly large or small scales

BigGAN

The BigGAN is essentially a self-attention GAN trained on a large scale (up to 80 million parameters) to generate large images of ImageNet (up to 512 x 512 resolution), with numerous engineering tricks to make it converge.

Invertible data augmentation

When there is insufficient training data, the reference distribution $μ_{ref}$ cannot be well-approximated by the empirical distribution given by the training dataset. In such cases, data augmentation can be applied, to allow training GAN on smaller datasets. Naïve data augmentation, however, brings its problems.

Consider the original GAN game, slightly reformulated as follows: $\begin{cases} \min_{D} L_{D} (D, μ_{G}) = - E_{x \sim μ_{ref}} [\ln D (x)] - E_{x \sim μ_{G}} [\ln (1 - D (x))] \\ \min_{G} L_{G} (D, μ_{G}) = E_{x \sim μ_{G}} [\ln (1 - D (x))] \end{cases}$ Here the discriminator minimizes the negated objective, and the generator minimizes the only term of the objective that depends on $μ_{G}$. Now we use data augmentation by randomly sampling semantic-preserving transforms $T : Ω \to Ω$ and applying them to the dataset, to obtain the reformulated GAN game: $\begin{cases} \min_{D} L_{D} (D, μ_{G}) = - E_{x \sim μ_{ref}, T \sim μ_{trans}} [\ln D (T (x))] - E_{x \sim μ_{G}} [\ln (1 - D (x))] \\ \min_{G} L_{G} (D, μ_{G}) = E_{x \sim μ_{G}} [\ln (1 - D (x))] \end{cases}$ This is equivalent to a GAN game with a different distribution $μ_{ref}^{'}$, sampled by $T (x)$, with $x \sim μ_{ref}, T \sim μ_{trans}$. For example, if $μ_{ref}$ is the distribution of images in ImageNet, and $μ_{trans}$ samples identity-transform with probability 0.5, and horizontal-reflection with probability 0.5, then $μ_{ref}^{'}$ is the distribution of images in ImageNet and horizontally-reflected ImageNet, combined.

The result of such training would be a generator that mimics $μ_{ref}^{'}$. For example, it would generate images that look like they are randomly cropped, if the data augmentation uses random cropping.

The solution is to apply data augmentation to both generated and real images: $\begin{cases} \min_{D} L_{D} (D, μ_{G}) = - E_{x \sim μ_{ref}, T \sim μ_{trans}} [\ln D (T (x))] - E_{x \sim μ_{G}, T \sim μ_{trans}} [\ln (1 - D (T (x)))] \\ \min_{G} L_{G} (D, μ_{G}) = E_{x \sim μ_{G}, T \sim μ_{trans}} [\ln (1 - D (T (x)))] \end{cases}$ The authors demonstrated high-quality generation using datasets of only 100 pictures.
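A minimal Python sketch of this balanced formulation, in which a fresh transform is drawn for every real and every generated sample before it reaches the discriminator. The flip transform, the toy discriminator and the sample images are illustrative assumptions and are not taken from the paper; the generator loss is taken to be the term of the zero-sum objective that depends on the generator, which the generator minimizes.

```python
import math
import random

def identity(image):
    return image

def hflip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def sample_transform(rng):
    """Draw T ~ mu_trans: identity or horizontal flip, each with probability 0.5."""
    return hflip if rng.random() < 0.5 else identity

def augmented_losses(D, real_batch, fake_batch, rng):
    """Monte Carlo estimates of L_D and L_G with T applied to real AND generated samples."""
    real_term = sum(math.log(D(sample_transform(rng)(x))) for x in real_batch) / len(real_batch)
    fake_term = sum(math.log(1.0 - D(sample_transform(rng)(x))) for x in fake_batch) / len(fake_batch)
    loss_d = -real_term - fake_term  # the discriminator minimizes this
    loss_g = fake_term               # the generator minimizes E[ln(1 - D(T(x)))]
    return loss_d, loss_g

def toy_discriminator(image):
    """Hypothetical D: sigmoid of the mean pixel value, so outputs lie in (0, 1)."""
    pixels = [p for row in image for p in row]
    return 1.0 / (1.0 + math.exp(-sum(pixels) / len(pixels)))

rng = random.Random(0)
real = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [1.5, 1.5]]]
fake = [[[-1.0, 0.0], [0.0, -1.0]], [[0.0, 0.0], [0.0, 0.0]]]
loss_d, loss_g = augmented_losses(toy_discriminator, real, fake, rng)
```

Because the same transform distribution is applied on both sides, the discriminator never sees an un-augmented image, so it cannot separate real from generated samples merely by detecting the augmentation.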

The StyleGAN-2-ADA paper points out a further point on data augmentation: it must be invertible. Continue with the example of generating ImageNet pictures. If the data augmentation is "randomly rotate the picture by 0, 90, 180, 270 degrees with equal probability", then there is no way for the generator to know which is the true orientation: Consider two generators $G, G^{'}$ , such that for any latent $z$ , the generated image $G (z)$ is a 90-degree rotation of $G^{'} (z)$ . They would have exactly the same expected loss, and so neither is preferred over the other.

The solution is to only use invertible data augmentation: instead of "randomly rotate the picture by 0, 90, 180, 270 degrees with equal probability", use "randomly rotate the picture by 90, 180, or 270 degrees with probability 0.1 each, and keep the picture as it is with probability 0.7". This way, the generator is still rewarded for keeping images oriented the same way as un-augmented ImageNet pictures.

Abstractly, the effect of randomly sampling transformations $T : Ω \to Ω$ from the distribution $μ_{trans}$ is to define a Markov kernel $K_{trans} : Ω \to P (Ω)$ . Then, the data-augmented GAN game pushes the generator to find some ${\hat{μ}}_{G} \in P (Ω)$ , such that $K_{trans} * μ_{ref} = K_{trans} * {\hat{μ}}_{G}$ where $*$ is the Markov kernel convolution. A data-augmentation method is defined to be invertible if its Markov kernel $K_{trans}$ satisfies $K_{trans} * μ = K_{trans} * μ^{'} ⟹ μ = μ^{'} \forall μ, μ^{'} \in P (Ω)$ Immediately by definition, we see that composing multiple invertible data-augmentation methods results in yet another invertible method. Also by definition, if the data-augmentation method is invertible, then using it in a GAN game does not change the optimal strategy ${\hat{μ}}_{G}$ for the generator, which is still $μ_{ref}$ .

There are two prototypical examples of invertible Markov kernels:

Discrete case: Invertible stochastic matrices, when $Ω$ is finite.

For example, if $Ω = \{↑, ↓, \leftarrow, \to\}$ is the set of four images of an arrow, pointing in 4 directions, and the data augmentation is "randomly rotate the picture by 90, 180, or 270 degrees with probability $p$ each, and keep the picture as it is with probability $(1 - 3 p)$", then the Markov kernel $K_{trans}$ can be represented as a stochastic matrix: $[K_{trans}] = \begin{bmatrix} (1 - 3 p) & p & p & p \\ p & (1 - 3 p) & p & p \\ p & p & (1 - 3 p) & p \\ p & p & p & (1 - 3 p) \end{bmatrix}$ and $K_{trans}$ is an invertible kernel iff $[K_{trans}]$ is an invertible matrix, that is, $p \neq 1 / 4$.
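The arrow example can be checked numerically. The following sketch builds the 4x4 stochastic matrix, pushes a distribution through it, and recovers the original distribution by solving the linear system. The distribution `mu` and the helper names are hypothetical, chosen only for illustration.

```python
def rotation_kernel(p):
    """4x4 stochastic matrix: stay with probability 1 - 3p, move to each other arrow with probability p."""
    return [[1 - 3 * p if i == j else p for j in range(4)] for i in range(4)]

def apply_kernel(K, mu):
    """Distribution after augmentation: (K * mu)_i = sum_j K[i][j] * mu[j]."""
    return [sum(K[i][j] * mu[j] for j in range(len(mu))) for i in range(len(mu))]

def solve(K, b):
    """Solve K x = b by Gauss-Jordan elimination with partial pivoting; raises if K is singular."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(K)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("kernel is not invertible")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

mu = [0.7, 0.1, 0.1, 0.1]  # hypothetical distribution over the four arrows
augmented = apply_kernel(rotation_kernel(0.1), mu)
recovered = solve(rotation_kernel(0.1), augmented)  # equals mu up to rounding
```

With $p = 1 / 4$ every entry of the matrix equals 0.25, every input distribution is mapped to the uniform one, and `solve` raises because the original distribution can no longer be recovered.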

Continuous case: The gaussian kernel, when $Ω = R^{n}$ for some $n \geq 1$ .

For example, if $Ω = R^{256^{2}}$ is the space of 256x256 images, and the data-augmentation method is "generate a gaussian noise $z \sim N (0, I_{256^{2}})$, then add $ϵ z$ to the image", then $K_{trans}$ is just convolution by the density function of $N (0, ϵ^{2} I_{256^{2}})$. This is invertible, because convolution by a gaussian is just convolution by the heat kernel: given any $μ \in P (R^{n})$, the convolved distribution $K_{trans} * μ$ can be obtained by heating up $R^{n}$ precisely according to $μ$, then waiting for time $ϵ^{2} / 2$ under the heat equation $\partial_{t} u = Δ u$, whose kernel at time $t$ is the density of $N (0, 2 t I)$. With that, we can recover $μ$ by running the heat equation backwards in time for $ϵ^{2} / 2$.
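One way to see the invertibility concretely is in the Fourier domain, where Gaussian convolution multiplies each frequency component by a strictly positive factor and can therefore be undone by division. The sketch below is a one-dimensional, periodic, discretized stand-in for the continuous case; the grid, the density `mu` and the function names are assumptions made for illustration.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for a small demo."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * f * k / n) for k, v in enumerate(values))
        for f in range(n)
    ]
    return [v / n for v in out] if inverse else out

def gaussian_multiplier(n, eps):
    """Fourier multiplier exp(-(eps * w)^2 / 2) of N(0, eps^2) at the n grid frequencies."""
    signed = [f if f <= n // 2 else f - n for f in range(n)]
    return [math.exp(-0.5 * (eps * 2 * math.pi * f / n) ** 2) for f in signed]

def blur(density, eps):
    """Apply the Gaussian kernel: multiply each Fourier coefficient by its multiplier."""
    h = gaussian_multiplier(len(density), eps)
    spectrum = [c * m for c, m in zip(dft(density), h)]
    return [v.real for v in dft(spectrum, inverse=True)]

def unblur(blurred, eps):
    """Invert the kernel: divide by the multiplier, which is never zero."""
    h = gaussian_multiplier(len(blurred), eps)
    spectrum = [c / m for c, m in zip(dft(blurred), h)]
    return [v.real for v in dft(spectrum, inverse=True)]

mu = [0.0, 0.1, 0.4, 0.3, 0.1, 0.1, 0.0, 0.0]  # hypothetical density on 8 grid points
recovered = unblur(blur(mu, eps=1.0), eps=1.0)
```

The division amplifies high frequencies strongly, so the inverse exists but is numerically delicate for large noise levels.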

More examples of invertible data augmentations are found in the paper.

SinGAN

SinGAN pushes data augmentation to the limit, by using only a single image as training data and performing data augmentation on it. The GAN architecture is adapted to this training method by using a multi-scale pipeline.

The generator $G$ is decomposed into a pyramid of generators $G = G_{1} \circ G_{2} \circ \dots \circ G_{N}$ , with the lowest one generating the image $G_{N} (z_{N})$ at the lowest resolution, then the generated image is scaled up to $r (G_{N} (z_{N}))$ , and fed to the next level to generate an image $G_{N - 1} (z_{N - 1} + r (G_{N} (z_{N})))$ at a higher resolution, and so on. The discriminator is decomposed into a pyramid as well.
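The coarse-to-fine sampling loop can be sketched as follows. This is only an illustration of the recursion $G_{k} (z_{k} + r (G_{k + 1} (\dots)))$: the generators are replaced by a hypothetical untrained stand-in, and the upscaling $r$ is assumed to be nearest-neighbour doubling, which need not match the scale factor used in the paper.

```python
import math
import random

def upsample(image):
    """r(.): nearest-neighbour upscaling by a factor of 2 (an assumed choice)."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def noise(size, rng):
    """z_k: a size x size grid of independent standard normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]

def toy_generator(x):
    """Hypothetical stand-in for a trained G_k: squashes each pixel into [-1, 1]."""
    return [[math.tanh(p) for p in row] for row in x]

def sample_pyramid(num_levels, base_size, rng):
    """Generate at the coarsest scale, then repeatedly upscale, add noise and refine."""
    image = toy_generator(noise(base_size, rng))  # G_N(z_N)
    for _ in range(num_levels - 1):
        up = upsample(image)                      # r(previous output)
        z = noise(len(up), rng)
        mixed = [[a + b for a, b in zip(z_row, up_row)] for z_row, up_row in zip(z, up)]
        image = toy_generator(mixed)              # G_k(z_k + r(...))
    return image

sample = sample_pyramid(num_levels=3, base_size=4, rng=random.Random(0))
```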

StyleGAN series

The StyleGAN family is a series of architectures published by Nvidia's research division.

Progressive GAN

Progressive GAN is a method for stably training GANs for large-scale image generation, by growing a GAN generator from small to large scale in a pyramidal fashion. Like SinGAN, it decomposes the generator as $G = G_{1} \circ G_{2} \circ \dots \circ G_{N}$, and the discriminator as $D = D_{1} \circ D_{2} \circ \dots \circ D_{N}$.

During training, at first only $G_{N}, D_{N}$ are used in a GAN game to generate 4x4 images. Then $G_{N - 1}, D_{N - 1}$ are added to reach the second stage of GAN game, to generate 8x8 images, and so on, until we reach a GAN game to generate 1024x1024 images.

To avoid shock between stages of the GAN game, each new layer is "blended in" (Figure 2 of the paper). For example, this is how the second stage GAN game starts:

Just before, the GAN game consists of the pair $G_{N}, D_{N}$ generating and discriminating 4x4 images.
Just after, the GAN game consists of the pair $((1 - α) + α \cdot G_{N - 1}) \circ u \circ G_{N}, D_{N} \circ d \circ ((1 - α) + α \cdot D_{N - 1})$ generating and discriminating 8x8 images. Here, the functions $u, d$ are image up- and down-sampling functions, and $α$ is a blend-in factor (much like an alpha in image compositing) that smoothly glides from 0 to 1.
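The generator side of this blend can be sketched in a few lines of Python. The upsampling function and the stand-in for the new layer are assumptions made for illustration; only the mixing rule $(1 - α) \cdot u (x) + α \cdot G_{N - 1} (u (x))$ is taken from the expression above.

```python
def upsample(image):
    """u: nearest-neighbour upsampling by a factor of 2 (an assumed choice)."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def blend_in(low_res, new_block, alpha):
    """((1 - alpha) + alpha * G_{N-1}) applied to u(G_N(z)), where low_res = G_N(z)."""
    up = upsample(low_res)
    refined = new_block(up)
    return [
        [(1 - alpha) * u + alpha * r for u, r in zip(up_row, ref_row)]
        for up_row, ref_row in zip(up, refined)
    ]

def toy_block(image):
    """Hypothetical stand-in for the newly added layer G_{N-1}."""
    return [[p + 1.0 for p in row] for row in image]

low_res = [[0.0, 1.0], [2.0, 3.0]]
start = blend_in(low_res, toy_block, alpha=0.0)    # pure upsampled output of the old stage
halfway = blend_in(low_res, toy_block, alpha=0.5)  # equal mix of old and new paths
end = blend_in(low_res, toy_block, alpha=1.0)      # the new layer has fully taken over
```

At $α = 0$ the new layer has no effect, so the larger network initially behaves like the smaller one that was already trained.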

StyleGAN-1

StyleGAN-1 is designed as a combination of Progressive GAN with neural style transfer.

The key architectural choice of StyleGAN-1 is a progressive growth mechanism, similar to Progressive GAN. Each generated image starts as a constant $4 \times 4 \times 512$ array, and is repeatedly passed through style blocks. Each style block applies a "style latent vector" via an affine transform ("adaptive instance normalization"), similar to how neural style transfer uses the Gramian matrix. It then adds noise, and normalizes (subtracts the mean, then divides by the standard deviation).
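As a rough illustration, adaptive instance normalization in its usual form normalizes each channel to zero mean and unit standard deviation, then rescales and shifts it with per-channel style parameters. In the sketch below the `scale` and `bias` values are given directly; in the actual network they would come from a learned affine map of the style latent vector, which is not modelled here, and the ordering relative to noise injection is also left out.

```python
import math

def adain(feature_map, scale, bias, eps=1e-8):
    """Adaptive instance normalization.

    feature_map: list of channels, each a 2D list of floats.
    scale, bias: one style parameter per channel (assumed to be given).
    """
    out = []
    for channel, s, b in zip(feature_map, scale, bias):
        pixels = [p for row in channel for p in row]
        mean = sum(pixels) / len(pixels)
        std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels) + eps)
        # Normalize the channel, then apply the style's scale and bias.
        out.append([[s * (p - mean) / std + b for p in row] for row in channel])
    return out

features = [[[1.0, 2.0], [3.0, 4.0]]]  # a single hypothetical 2x2 channel
styled = adain(features, scale=[2.0], bias=[5.0])
```

After the call, the channel has mean close to `bias` and standard deviation close to `scale`, regardless of the statistics of the input.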

At training time, usually only one style latent vector is used per image generated, but sometimes two ("mixing regularization") in order to encourage each style block to independently perform its stylization without expecting help from other style blocks (since they might receive an entirely different style latent vector).

After training, multiple style latent vectors can be fed into each style block. Those fed to the lower layers control the large-scale styles, and those fed to the higher layers control the fine-detail styles.

Style-mixing between two images $x, x^{'}$ can be performed as well. First, run a gradient descent to find $z, z^{'}$ such that $G (z) \approx x, G (z^{'}) \approx x^{'}$ . This is called "projecting an image back to style latent space". Then, $z$ can be fed to the lower style blocks, and $z^{'}$ to the higher style blocks, to generate a composite image that has the large-scale style of $x$ , and the fine-detail style of $x^{'}$ . Multiple images can also be composed this way.

StyleGAN-2

StyleGAN-2 improves upon StyleGAN-1, by using the style latent vector to transform the convolution layer's weights instead, thus solving the "blob" problem.

This was updated by the StyleGAN-2-ADA ("ADA" stands for "adaptive"), which uses invertible data augmentation as described above. It also tunes the amount of data augmentation applied by starting at zero, and gradually increasing it until an "overfitting heuristic" reaches a target level, thus the name "adaptive".
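The adaptive part can be pictured as a simple feedback controller on the augmentation probability. The sketch below is a hypothetical illustration only: the heuristic function, the target value and the step size are assumptions and are not claimed to match the paper's settings.

```python
def update_strength(p, heuristic, target=0.6, step=0.01):
    """Raise the augmentation probability when the overfitting heuristic exceeds
    the target, lower it otherwise, and keep the result inside [0, 1]."""
    p = p + step if heuristic > target else p - step
    return min(1.0, max(0.0, p))

def toy_heuristic(p):
    """Hypothetical overfitting measure that falls as augmentation gets stronger."""
    return max(0.0, 0.9 - p)

p = 0.0  # start with no augmentation
for _ in range(100):
    p = update_strength(p, toy_heuristic(p))
# p now hovers near the value where the toy heuristic meets the target (about 0.3)
```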

StyleGAN-3

StyleGAN-3 improves upon StyleGAN-2 by solving the "texture sticking" problem, which can be seen in the official videos. They analyzed the problem by the Nyquist–Shannon sampling theorem, and argued that the layers in the generator learned to exploit the high-frequency signal in the pixels they operate upon.

To solve this, they proposed imposing strict lowpass filters between the generator's layers, so that the generator is forced to operate on the pixels in a way faithful to the continuous signals they represent, rather than operate on them as merely discrete signals. They further imposed rotational and translational invariance by using more signal filters. The resulting StyleGAN-3 is able to solve the texture sticking problem, and to generate images that rotate and translate smoothly.

Other uses

Other than for generative and discriminative modelling of data, GANs have been used for other things.

GANs have been used for transfer learning to enforce the alignment of the latent feature space, such as in deep reinforcement learning. This works by feeding the embeddings of the source and target task to the discriminator which tries to guess the context. The resulting loss is then (inversely) backpropagated through the encoder.

Applications

Science

Iteratively reconstruct astronomical images.
Simulate gravitational lensing for dark matter research.
Model the distribution of dark matter in a particular direction in space and predict the gravitational lensing that will occur.
Model high energy jet formation and showers through calorimeters of high-energy physics experiments.
Approximate bottlenecks in computationally expensive simulations of particle physics experiments. Applications in the context of present and proposed CERN experiments have demonstrated the potential of these methods for accelerating simulation and/or improving simulation fidelity.
Reconstruct velocity and scalar fields in turbulent flows.

GAN-generated molecules were validated experimentally in mice.

Medical

One of the major concerns in medical imaging is preserving patient privacy. For this reason, researchers often face difficulties in obtaining medical images for their research. GANs have been used to generate synthetic medical images, such as MRI and PET images, to address this challenge.

GANs can be used to detect glaucomatous images, supporting early diagnosis, which is essential to avoid partial or total loss of vision.

GANs have been used to create forensic facial reconstructions of deceased historical figures.

Malicious

Concerns have been raised about the potential use of GAN-based human image synthesis for sinister purposes, e.g., to produce fake, possibly incriminating, photographs and videos. GANs can be used to generate unique, realistic profile photos of people who do not exist, in order to automate creation of fake social media profiles.

On October 3, 2019, the state of California passed bill AB-602, which bans the use of human image synthesis technologies to make fake pornography without the consent of the people depicted, and bill AB-730, which prohibits distribution of manipulated videos of a political candidate within 60 days of an election. Both bills were authored by Assembly member Marc Berman and signed by Governor Gavin Newsom. The laws went into effect in 2020.

DARPA's Media Forensics program studies ways to counteract fake media, including fake media produced using GANs.

Fashion, art and advertising

GANs can be used to generate art; The Verge wrote in March 2019 that "The images created by GANs have become the defining look of contemporary AI art." GANs can also be used to

inpaint photographs
generate fashion models, shadows, photorealistic renders of interior design, industrial design, shoes, etc. Such networks were reported to be used by Facebook.

Some have used GANs for artistic creativity, under the name "creative adversarial network". A GAN, trained on a set of 15,000 portraits from WikiArt from the 14th to the 19th century, created the 2018 painting Edmond de Belamy, which sold for US$432,500.

GANs were used by the video game modding community to up-scale low-resolution 2D textures in old video games by recreating them in 4k or higher resolutions via image training, and then down-sampling them to fit the game's native resolution (resembling supersampling anti-aliasing).

In 2020, Artbreeder was used to create the main antagonist in the sequel to the psychological web horror series Ben Drowned. The author would later go on to praise GAN applications for their ability to help generate assets for independent artists who are short on budget and manpower.

In May 2020, Nvidia researchers taught an AI system (termed "GameGAN") to recreate the game of Pac-Man simply by watching it being played.

In August 2019, a large dataset consisting of 12,197 MIDI songs each with paired lyrics and melody alignment was created for neural melody generation from lyrics using conditional GAN-LSTM (refer to sources at GitHub AI Melody Generation from Lyrics).

Miscellaneous

GANs have been used to

show how an individual's appearance might change with age.
reconstruct 3D models of objects from images.
generate novel objects as 3D point clouds.
model patterns of motion in video.
inpaint missing features in maps, transfer map styles in cartography or augment street view imagery.
use feedback to generate images and replace image search systems.
visualize the effect that climate change will have on specific houses.
reconstruct an image of a person's face after listening to their voice.
produce videos of a person speaking, given only a single photo of that person.
perform recurrent sequence generation.

History

In 1991, Juergen Schmidhuber published "artificial curiosity", neural networks in a zero-sum game. The first network is a generative model that models a probability distribution over output patterns. The second network learns by gradient descent to predict the reactions of the environment to these patterns. GANs can be regarded as a case where the environmental reaction is 1 or 0 depending on whether the first network's output is in a given set.

Other people had similar ideas but did not develop them similarly. An idea involving adversarial networks was published in a 2010 blog post by Olli Niemitalo. This idea was never implemented and did not involve stochasticity in the generator and thus was not a generative model. It is now known as a conditional GAN or cGAN. An idea similar to GANs was used to model animal behavior by Li, Gauci and Gross in 2013.

Another inspiration for GANs was noise-contrastive estimation, which uses the same loss function as GANs and which Goodfellow studied during his PhD in 2010–2014.

Adversarial machine learning has other uses besides generative modeling and can be applied to models other than neural networks. In control theory, adversarial learning based on neural networks was used in 2006 to train robust controllers in a game theoretic sense, by alternating the iterations between a minimizer policy, the controller, and a maximizer policy, the disturbance.

In 2017, a GAN was used for image enhancement focusing on realistic textures rather than pixel-accuracy, producing a higher image quality at high magnification. In 2017, the first faces were generated. These were exhibited in February 2018 at the Grand Palais. Faces generated by StyleGAN in 2019 drew comparisons with Deepfakes.

Synthetic media

From Wikipedia, the free encyclopedia

Synthetic media (also known as AI-generated media, media produced by generative AI, personalized media, personalized content, and colloquially as deepfakes) is a catch-all term for the artificial production, manipulation, and modification of data and media by automated means, especially through the use of artificial intelligence algorithms, such as for the purpose of misleading people or changing an original meaning. Synthetic media as a field has grown rapidly since the creation of generative adversarial networks, primarily through the rise of deepfakes as well as music synthesis, text generation, human image synthesis, speech synthesis, and more. Though experts use the term "synthetic media," individual methods such as deepfakes and text synthesis are sometimes not referred to as such by the media but instead by their respective terminology (and the media often use "deepfakes" as a euphemism, e.g. "deepfakes for text" for natural-language generation; "deepfakes for voices" for neural voice cloning, etc.). Significant attention arose towards the field of synthetic media starting in 2017 when Motherboard reported on the emergence of AI-altered pornographic videos to insert the faces of famous actresses. Potential hazards of synthetic media include the spread of misinformation, further loss of trust in institutions such as media and government, the mass automation of creative and journalistic jobs and a retreat into AI-generated fantasy worlds. Synthetic media is an applied form of artificial imagination.

History

Pre-1950s

Synthetic media as a process of automated art dates back to the automata of ancient Greek civilization, where inventors such as Daedalus and Hero of Alexandria designed machines capable of writing text, generating sounds, and playing music. The tradition of automaton-based entertainment flourished throughout history, with mechanical beings' seemingly magical ability to mimic human creativity often drawing crowds throughout Europe, China, India, and so on. Other automated novelties such as Johann Philipp Kirnberger's "Musikalisches Würfelspiel" (Musical Dice Game, 1757) also amused audiences.

Despite the technical capabilities of these machines, however, none were capable of generating original content and were entirely dependent upon their mechanical designs.

Rise of artificial intelligence

The field of AI research was born at a workshop at Dartmouth College in 1956, begetting the rise of digital computing used as a medium of art as well as the rise of generative art. Initial experiments in AI-generated art included the Illiac Suite, a 1957 composition for string quartet which is generally agreed to be the first score composed by an electronic computer. Lejaren Hiller, in collaboration with Leonard Isaacson, programmed the ILLIAC I computer at the University of Illinois at Urbana–Champaign (where both composers were professors) to generate compositional material for his String Quartet No. 4.

In 1960, Russian researcher R. Kh. Zaripov published the world's first paper on algorithmic music composition using the "Ural-1" computer.

In 1965, inventor Ray Kurzweil premiered a piano piece created by a computer that was capable of pattern recognition in various compositions. The computer was then able to analyze and use these patterns to create novel melodies. The computer debuted on Steve Allen's I've Got a Secret program, and stumped the hosts until film star Harry Morgan guessed Ray's secret.

Before 1989, artificial neural networks had been used to model certain aspects of creativity. Peter Todd (1989) first trained a neural network to reproduce musical melodies from a training set of musical pieces. Then he used a change algorithm to modify the network's input parameters. The network was able to randomly generate new music in a highly uncontrolled manner.

In 2014, Ian Goodfellow and his colleagues developed a new class of machine learning systems: generative adversarial networks (GAN). Two neural networks contest with each other in a game (in the sense of game theory, often but not always in the form of a zero-sum game). Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning. In a 2016 seminar, Yann LeCun described GANs as "the coolest idea in machine learning in the last twenty years".

In 2017, Google unveiled transformers, a new type of neural network architecture specialized for language modeling that enabled rapid advancements in natural language processing. Transformers proved capable of high levels of generalization, allowing networks such as GPT-3 and Jukebox from OpenAI to synthesize text and music respectively at a level approaching humanlike ability. There have been some attempts to use GPT-3 and GPT-2 for screenplay writing, resulting in both dramatic (the Italian short film Frammenti di Anime Meccaniche, written by GPT-2) and comedic narratives (the short film Solicitors by YouTube Creator Calamity AI written by GPT-3).

Branches of synthetic media

Deepfakes

Deepfakes (a portmanteau of "deep learning" and "fake") are the most prominent form of synthetic media. Deepfakes are media productions that use an existing image or video and replace the subject with someone else's likeness using artificial neural networks. They often combine and superimpose existing media onto source media using machine learning techniques known as autoencoders and generative adversarial networks (GANs). Deepfakes have garnered widespread attention for their uses in celebrity pornographic videos, revenge porn, fake news, hoaxes, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.

The term deepfakes originated around the end of 2017 from a Reddit user named "deepfakes". He, as well as others in the Reddit community r/deepfakes, shared deepfakes they created; many videos involved celebrities' faces swapped onto the bodies of actresses in pornographic videos, while non-pornographic content included many videos with actor Nicolas Cage's face swapped into various movies. In December 2017, Samantha Cole published an article about r/deepfakes in Vice that drew the first mainstream attention to deepfakes being shared in online communities. Six weeks later, Cole wrote in a follow-up article about the large increase in AI-assisted fake pornography. In February 2018, r/deepfakes was banned by Reddit for sharing involuntary pornography. Other websites have also banned the use of deepfakes for involuntary pornography, including the social media platform Twitter and the pornography site Pornhub. However, some websites have not yet banned Deepfake content, including 4chan and 8chan.

Non-pornographic deepfake content continues to grow in popularity with videos from YouTube creators such as Ctrl Shift Face and Shamook. A mobile application, Impressions, was launched for iOS in March 2020. The app provides a platform for users to deepfake celebrity faces into videos in a matter of minutes.

Image synthesis

Image synthesis is the artificial production of visual media, especially through algorithmic means. In the emerging world of synthetic media, the work of digital-image creation—once the domain of highly skilled programmers and Hollywood special-effects artists—could be automated by expert systems capable of producing realism on a vast scale. One subfield of this includes human image synthesis, which is the use of neural networks to make believable and even photorealistic renditions of human-likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer generated imagery have featured synthetic images of human-like characters digitally composited onto the real or other simulated film material. Towards the end of the 2010s deep learning artificial intelligence has been applied to synthesize images and video that look like humans, without need for human assistance, once the training phase has been completed, whereas the old school 7D-route required massive amounts of human work. The website This Person Does Not Exist showcases fully automated human image synthesis by endlessly generating images that look like facial portraits of human faces.

Audio synthesis

Beyond deepfakes and image synthesis, audio is another area where AI is used to create synthetic media. Synthesized audio will be capable of generating any conceivable sound that can be achieved through audio waveform manipulation, which might conceivably be used to generate stock audio of sound effects or simulate audio of currently imaginary things.

AI art

Artificial intelligence art is visual artwork created or enhanced through the use of artificial intelligence (AI) programs.

Artists began to create artificial intelligence art in the mid to late 20th century when the discipline was founded. Throughout its history, artificial intelligence art has raised many philosophical concerns related to the human mind, artificial beings, and what can be considered art in a human–AI collaboration. Since the 20th century, artists have used AI to create art, some of which has been exhibited in museums and won awards.

During the AI boom of the early 2020s, text-to-image models such as Midjourney, DALL-E, Stable Diffusion, and FLUX.1 became widely available to the public, allowing non-artists to quickly generate imagery with little effort. Commentary about AI art in the 2020s has often focused on issues related to copyright, deception, defamation, and its impact on more traditional artists, including technological unemployment.

There are many tools available to the artist when working with diffusion models. They can define both positive and negative prompts, but they are also afforded a choice in using (or omitting the use of) VAEs, LoRAs, hypernetworks, IP-Adapter, and embeddings/textual inversions. Variables, including CFG, seed, steps, sampler, scheduler, denoise, upscaler, and encoder, are sometimes available for adjustment. Additional influence can be exerted during pre-inference by means of noise manipulation, while traditional post-processing techniques are frequently used post-inference. Artists can also train their own models.

In addition, procedural "rule-based" generation of images using mathematical patterns, algorithms that simulate brush strokes and other painted effects, and deep learning algorithms such as generative adversarial networks (GANs) and transformers have been developed. Several companies have released apps and websites that allow one to forego all the options mentioned entirely while solely focusing on the positive prompt. There also exist programs which transform photos into art-like images in the style of well-known sets of paintings.

There are many options, ranging from simple consumer-facing mobile apps to Jupyter notebooks and webUIs that require powerful GPUs to run effectively. Additional functionalities include "textual inversion," which refers to enabling the use of user-provided concepts (like an object or a style) learned from a few images. Novel art can then be generated from the associated word(s) (the text that has been assigned to the learned, often abstract, concept) and model extensions or fine-tuning (such as DreamBooth).

Music generation

The capacity to generate music through autonomous, non-programmable means has long been sought after since the days of Antiquity, and with developments in artificial intelligence, two particular domains have arisen:

The robotic creation of music, whether through machines playing instruments or sorting of virtual instrument notes (such as through MIDI files)
Directly generating waveforms that perfectly recreate instrumentation and human voice without the need for instruments, MIDI, or organizing premade notes.

Speech synthesis

Speech synthesis has been identified as a popular branch of synthetic media and is defined as the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.

Virtual assistants such as Siri and Alexa have the ability to turn text into audio and synthesize speech.

In 2016, Google DeepMind unveiled WaveNet, a deep generative model of raw audio waveforms that could learn to understand which waveforms best resembled human speech as well as musical instrumentation. Some projects offer real-time generations of synthetic speech using deep learning, such as 15.ai, a web application text-to-speech tool developed by an MIT research scientist.

Natural-language generation

Natural-language generation (NLG, sometimes synonymous with text synthesis) is a software process that transforms structured data into natural language. It can be used to produce long form content for organizations to automate custom reports, as well as produce custom content for a web or mobile application. It can also be used to generate short blurbs of text in interactive conversations (a chatbot) which might even be read out by a text-to-speech system. Interest in natural-language generation increased in 2019 after OpenAI unveiled GPT-2, an AI system that generates text matching its input in subject and tone. GPT-2 is a transformer, a deep machine learning model introduced in 2017 used primarily in the field of natural language processing (NLP).

Interactive media synthesis

AI-generated media can be used to develop a hybrid graphics system that could be used in video games, movies, and virtual reality, as well as text-based games such as AI Dungeon 2, which uses either GPT-2 or GPT-3 to allow for near-infinite possibilities that are otherwise impossible to create through traditional game development methods. Computer hardware company Nvidia has also developed AI-generated video game demos, such as a model that can generate an interactive game based on non-interactive videos.

Concerns and controversies

Beyond attacks on organizations, political organizations and leaders suffer even more from such deepfake videos. In 2022, a deepfake was released in which the Ukrainian president appeared to call for surrender in the fight against Russia. The video shows him telling his soldiers to lay down their arms and surrender.

Deepfakes have been used to misrepresent well-known politicians in videos. In separate videos, the face of the Argentine President Mauricio Macri has been replaced by the face of Adolf Hitler, and Angela Merkel's face has been replaced with Donald Trump's.

In June 2019, a downloadable Windows and Linux application called DeepNude was released which used neural networks, specifically generative adversarial networks, to remove clothing from images of women. The app had both a paid and unpaid version, the paid version costing $50. On June 27 the creators removed the application and refunded consumers.

The US Congress held a senate meeting discussing the widespread impacts of synthetic media, including deepfakes, describing it as having the "potential to be used to undermine national security, erode public trust in our democracy and other nefarious reasons."

In 2019, voice cloning technology was used to successfully impersonate a chief executive's voice and demand a fraudulent transfer of €220,000. The case raised concerns about the lack of encryption methods over telephones as well as the unconditional trust often given to voice and to media in general.

Starting in November 2019, multiple social media networks began banning synthetic media used for purposes of manipulation in the lead-up to the 2020 United States presidential election.

In 2024, Elon Musk shared a parody video without clarifying that it was satire, but he raised his voice against AI in politics. In the shared video, Kamala Harris appears to say things she never said in real life. A few lines from the video transcript: “I, Kamala Harris, am your Democrat candidate for president because Joe Biden finally exposed his senility at the debate.” The voice then calls Kamala a “Diversity hire” and says she does not know “the first thing about running the country”.

These are some examples among thousands of deepfakes targeting celebrities, political parties and organizations, and businesses or multinational corporations (MNCs). The potential to harm the image of such targets is irresistible. Deepfakes will erode trust in public and private institutions, and that trust will be harder to maintain. Citron (2019) lists the public officials who will be most affected as “elected officials, appointed officials, judges, juries, legislators, staffers, and agencies.” Private institutions are also on the verge of facing this crisis if they have an impact on society on a grand scale. Citron (2019) further states, “religious institutions are an obvious target, as are politically engaged entities ranging from Planned Parenthood to the NRA.” The author anticipates that deepfakes will deepen and extend the social hierarchy or class differences which gave rise to them in the first place. The major concern around deepfakes is that it is not only a matter of proving that something is fake; it is also a matter of proving that something is original. A recent study shows that two out of three cybersecurity professionals noticed deepfakes used as part of disinformation against businesses in 2022, an apparent 13% increase from the previous year.

Potential uses and impacts

Synthetic media techniques involve generating, manipulating, and altering data to emulate creative processes on a much faster and more accurate scale. As a result, the potential uses are as wide as human creativity itself, ranging from revolutionizing the entertainment industry to accelerating the research and production of academia. The initial application has been to synchronize lip-movements to increase the engagement of normal dubbing that is growing fast with the rise of OTTs. News organizations have explored ways to use video synthesis and other synthetic media technologies to become more efficient and engaging. Potential future hazards include the use of a combination of different subfields to generate fake news, natural-language bot swarms generating trends and memes, false evidence being generated, and potentially addiction to personalized content and a retreat into AI-generated fantasy worlds within virtual reality.

Advanced text-generating bots could potentially be used to manipulate social media platforms through tactics such as astroturfing.

Deep reinforcement learning-based natural-language generators could potentially be used to create advanced chatbots that could imitate natural human speech.

One use case for natural-language generation is to generate or assist with writing novels and short stories, while other potential developments are that of stylistic editors to emulate professional writers.

Image synthesis tools may be able to streamline or even completely automate the creation of certain aspects of visual illustrations, such as animated cartoons, comic books, and political cartoons. Because the automation process takes away the need for teams of designers, artists, and others involved in the making of entertainment, costs could plunge to virtually nothing and allow for the creation of "bedroom multimedia franchises" where singular people can generate results indistinguishable from the highest budget productions for little more than the cost of running their computer. Character and scene creation tools will no longer be based on premade assets, thematic limitations, or personal skill but instead based on tweaking certain parameters and giving enough input.

A combination of speech synthesis and deepfakes has been used to automatically redub an actor's speech into multiple languages without the need for reshoots or language classes. It can also be used by companies for employee onboarding, eLearning, explainer and how-to videos.

An increase in cyberattacks has also been feared due to methods of phishing, catfishing, and social hacking being more easily automated by new technological methods.

Natural-language generation bots mixed with image synthesis networks may theoretically be used to clog search results, filling search engines with trillions of otherwise useless but legitimate-seeming blogs, websites, and marketing spam.

There has been speculation about deepfakes being used for creating digital actors for future films. Digitally constructed/altered humans have already been used in films before, and deepfakes could contribute new developments in the near future. Amateur deepfake technology has already been used to insert faces into existing films, such as the insertion of Harrison Ford's young face onto Han Solo's face in Solo: A Star Wars Story, and techniques similar to those used by deepfakes were used for the acting of Princess Leia in Rogue One.

GANs can be used to create photos of imaginary fashion models, with no need to hire a model, photographer, makeup artist, or pay for a studio and transportation. GANs can be used to create fashion advertising campaigns including more diverse groups of models, which may increase intent to buy among people resembling the models or family members. GANs can also be used to create portraits, landscapes and album covers. The ability for GANs to generate photorealistic human bodies presents a challenge to industries such as fashion modeling, which may be at heightened risk of being automated.

In 2019, Dadabots unveiled an AI-generated stream of death metal which remains ongoing with no pauses.

Musical artists and their respective brands may also conceivably be generated from scratch, including AI-generated music, videos, interviews, and promotional material. Conversely, existing music can be completely altered at will, such as changing lyrics, singers, instrumentation, and composition. In 2018, using a process by WaveNet for timbre musical transfer, researchers were able to shift entire genres from one to another. Through the use of artificial intelligence, old bands and artists may be "revived" to release new material without pause, which may even include "live" concerts and promotional images.

Neural network-powered photo manipulation has the potential to abet the behaviors of totalitarian and absolutist regimes. A sufficiently paranoid totalitarian government or community may engage in a total wipe-out of history using all manner of synthetic technologies, fabricating history and personalities as well as any evidence of their existence at all times. Even in otherwise rational and democratic societies, certain social and political groups may use synthetic media to craft cultural, political, and scientific cocoons that greatly reduce or even altogether destroy the ability of the public to agree on basic objective facts. Conversely, the existence of synthetic media will be used to discredit factual news sources and scientific facts as "potentially fabricated."

Computing

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Computing

Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and the development of both hardware and software. Computing has scientific, engineering, mathematical, technological, and social aspects. Major computing disciplines include computer engineering, computer science, cybersecurity, data science, information systems, information technology, and software engineering.

The term computing is also synonymous with counting and calculating. In earlier times, it was used in reference to the action performed by mechanical computing machines, and before that, to human computers.

History

The history of computing is longer than the history of computing hardware and includes the history of methods intended for pen and paper (or for chalk and slate) with or without the aid of tables. Computing is intimately tied to the representation of numbers, though mathematical concepts necessary for computing existed before numeral systems. The earliest known tool for use in computation is the abacus, and it is thought to have been invented in Babylon between about 2700 and 2300 BC. Abaci of a more modern design are still used as calculation tools today.

The first recorded proposal for using digital electronics in computing was the 1931 paper "The Use of Thyratrons for High Speed Automatic Counting of Physical Phenomena" by C. E. Wynn-Williams. Claude Shannon's 1938 paper "A Symbolic Analysis of Relay and Switching Circuits" then introduced the idea of using electronics for Boolean algebraic operations.

The concept of a field-effect transistor was proposed by Julius Edgar Lilienfeld in 1925. John Bardeen and Walter Brattain, while working under William Shockley at Bell Labs, built the first working transistor, the point-contact transistor, in 1947. In 1953, the University of Manchester built the first transistorized computer, the Transistor Computer. However, early junction transistors were relatively bulky devices that were difficult to mass-produce, which limited them to a number of specialised applications.

In 1957, Frosch and Derick were able to manufacture the first silicon dioxide field effect transistors at Bell Labs, the first transistors in which drain and source were adjacent at the surface. Subsequently, a team demonstrated a working MOSFET at Bell Labs in 1960. The MOSFET made it possible to build high-density integrated circuits, leading to what is known as the computer revolution or microcomputer revolution.

Computer

A computer is a machine that manipulates data according to a set of instructions called a computer program. The program has an executable form that the computer can use directly to execute the instructions. The same program, in its human-readable source code form, enables a programmer to study and develop a sequence of steps known as an algorithm. Because the instructions can be carried out in different types of computers, a single set of source instructions converts to machine instructions according to the CPU type.

The execution process carries out the instructions in a computer program. Instructions express the computations performed by the computer. They trigger sequences of simple actions on the executing machine. Those actions produce effects according to the semantics of the instructions.
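The execution process described above can be illustrated with a toy example. The following is a minimal sketch, not a model of any real CPU: the instruction names (`PUSH`, `ADD`, `MUL`, `PRINT`) and the `execute` function are hypothetical, chosen only to show how each instruction triggers a simple action whose effect is fixed by the instruction's semantics.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction format for illustration: (opcode, operand).
# The operand is None for instructions that do not need one.
Instruction = Tuple[str, Optional[int]]


def execute(program: List[Instruction]) -> List[int]:
    """Run a program on a tiny stack machine and return the values it printed."""
    stack: List[int] = []
    output: List[int] = []
    for opcode, operand in program:
        if opcode == "PUSH":
            # Place a constant on top of the stack.
            stack.append(operand)
        elif opcode == "ADD":
            # Replace the top two values with their sum.
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":
            # Replace the top two values with their product.
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == "PRINT":
            # Remove the top value and record it as output.
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# The expression (2 + 3) * 4 written as a sequence of instructions.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
    ("PRINT", None),
]
result = execute(program)
print(result)
```

Real machines differ in many ways (registers, memory, binary encodings), but the loop of fetching an instruction and applying its defined effect is the same basic idea.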

Computer hardware

Computer hardware includes the physical parts of a computer, including the central processing unit, memory, and input/output. Computational logic and computer architecture are key topics in the field of computer hardware.

Computer software

Computer software, or just software, is a collection of computer programs and related data, which provides instructions to a computer. Software refers to one or more computer programs and data held in the storage of the computer. It is a set of programs, procedures, algorithms, as well as its documentation concerned with the operation of a data processing system. Program software performs the function of the program it implements, either by directly providing instructions to the computer hardware or by serving as input to another piece of software. The term was coined to contrast with the old term hardware (meaning physical devices). In contrast to hardware, software is intangible.

Software is also sometimes used in a more narrow sense, meaning application software only.

System software

System software, or systems software, is computer software designed to operate and control computer hardware, and to provide a platform for running application software. System software includes operating systems, utility software, device drivers, window systems, and firmware. Frequently used development tools such as compilers, linkers, and debuggers are classified as system software. System software and middleware manage and integrate a computer's capabilities, but typically do not directly apply them in the performance of tasks that benefit the user, unlike application software.

Application software

Application software, also known as an application or an app, is computer software designed to help the user perform specific tasks. Examples include enterprise software, accounting software, office suites, graphics software, and media players. Many application programs deal principally with documents. Apps may be bundled with the computer and its system software, or may be published separately. Some users are satisfied with the bundled apps and need never install additional applications. The system software manages the hardware and serves the application, which in turn serves the user.

Application software applies the power of a particular computing platform or system software to a particular purpose. Some apps, such as Microsoft Office, are developed in multiple versions for several different platforms; others have narrower requirements and are generally referred to by the platform they run on, for example a geography application for Windows, an Android application for education, or a Linux game. Applications that run only on one platform and increase the desirability of that platform due to their popularity are known as killer applications.

Computer network

A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow the sharing of resources and information. When at least one process in one device is able to send or receive data to or from at least one process residing in a remote device, the two devices are said to be in a network. Networks may be classified according to a wide variety of characteristics such as the medium used to transport the data, communications protocol used, scale, topology, and organizational scope.

Communications protocols define the rules and data formats for exchanging information in a computer network, and provide the basis for network programming. One well-known communications protocol is Ethernet, a hardware and link layer standard that is ubiquitous in local area networks. Another common protocol is the Internet Protocol Suite, which defines a set of protocols for internetworking, i.e. for data communication between multiple networks, host-to-host data transfer, and application-specific data transmission formats.
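To make "rules and data formats" concrete, here is a minimal sketch of a made-up protocol, assuming a simple framing rule: every message is a 4-byte big-endian length followed by that many bytes of UTF-8 text. The helper names (`send_message`, `recv_exact`, `recv_message`) are hypothetical, and `socket.socketpair()` stands in for two processes on different devices so the example runs on one machine.

```python
import socket
import struct

# Assumed framing rule for this sketch: 4-byte unsigned length in
# network (big-endian) byte order, then the UTF-8 payload.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, text: str) -> None:
    """Encode text according to the framing rule and send it."""
    payload = text.encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> str:
    """Read one framed message: the length header first, then the payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length).decode("utf-8")


# Two connected endpoints standing in for two networked processes.
a, b = socket.socketpair()
try:
    send_message(a, "hello, network")
    received = recv_message(b)
finally:
    a.close()
    b.close()

print(received)
```

Both sides must agree on the header size, byte order, and text encoding; that shared agreement is what makes this a protocol rather than just a stream of bytes.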

Computer networking is sometimes considered a sub-discipline of electrical engineering, telecommunications, computer science, information technology, or computer engineering, since it relies upon the theoretical and practical application of these disciplines.

Internet

The Internet is a global system of interconnected computer networks that use the standard Internet Protocol Suite (TCP/IP) to serve billions of users. This includes millions of private, public, academic, business, and government networks, ranging in scope from local to global. These networks are linked by a broad array of electronic, wireless, and optical networking technologies. The Internet carries an extensive range of information resources and services, such as the inter-linked hypertext documents of the World Wide Web and the infrastructure to support email.

Computer programming

Computer programming is the process of writing, testing, debugging, and maintaining the source code and documentation of computer programs. This source code is written in a programming language, which is an artificial language that is often more restrictive than natural languages, but easily translated by the computer. Programming is used to invoke some desired behavior (customization) from the machine.

Writing high-quality source code requires knowledge of both the computer science domain and the domain in which the application will be used. The highest-quality software is thus often developed by a team of domain experts, each a specialist in some area of development. However, the term programmer may apply to a range of program quality, from hacker to open source contributor to professional. It is also possible for a single programmer to do most or all of the computer programming needed to generate the proof of concept to launch a new killer application.

Computer programmer

A programmer, computer programmer, or coder is a person who writes computer software. The term computer programmer can refer to a specialist in one area of computer programming or to a generalist who writes code for many kinds of software. One who practices or professes a formal approach to programming may also be known as a programmer analyst. A programmer's primary computer language (C, C++, Java, Lisp, Python, etc.) is often prefixed to the above titles, and those who work in a web environment often prefix their titles with Web. The term programmer can be used to refer to a software developer, software engineer, computer scientist, or software analyst. However, members of these professions typically possess other software engineering skills, beyond programming.

Computer industry

The computer industry is made up of businesses involved in developing computer software, designing computer hardware and computer networking infrastructures, manufacturing computer components, and providing information technology services, including system administration and maintenance.

The software industry includes businesses engaged in development, maintenance, and publication of software. The industry also includes software services, such as training, documentation, and consulting.

Sub-disciplines of computing

Computer engineering

Computer engineering is a discipline that integrates several fields of electrical engineering and computer science required to develop computer hardware and software. Computer engineers usually have training in electronic engineering (or electrical engineering), software design, and hardware-software integration, rather than just software engineering or electronic engineering. Computer engineers are involved in many hardware and software aspects of computing, from the design of individual microprocessors, personal computers, and supercomputers, to circuit design. This field of engineering includes not only the design of hardware within its own domain, but also the interactions between hardware and the context in which it operates.

Software engineering

Software engineering is the application of a systematic, disciplined, and quantifiable approach to the design, development, operation, and maintenance of software, and the study of these approaches. That is, the application of engineering to software. It is the act of using insights to conceive, model and scale a solution to a problem. The first reference to the term is the 1968 NATO Software Engineering Conference, and was intended to provoke thought regarding the perceived software crisis at the time. Software development, a widely used and more generic term, does not necessarily subsume the engineering paradigm. The generally accepted concepts of Software Engineering as an engineering discipline have been specified in the Guide to the Software Engineering Body of Knowledge (SWEBOK). The SWEBOK has become an internationally accepted standard in ISO/IEC TR 19759:2015.

Computer science

Computer science or computing science (abbreviated CS or Comp Sci) is the scientific and practical approach to computation and its applications. A computer scientist specializes in the theory of computation and the design of computational systems.

Its subfields can be divided into practical techniques for its implementation and application in computer systems, and purely theoretical areas. Some, such as computational complexity theory, which studies fundamental properties of computational problems, are highly abstract, while others, such as computer graphics, emphasize real-world applications. Others focus on the challenges in implementing computations. For example, programming language theory studies approaches to the description of computations, while the study of computer programming investigates the use of programming languages and complex systems. The field of human–computer interaction focuses on the challenges in making computers and computations useful, usable, and universally accessible to humans.

Cybersecurity

The field of cybersecurity pertains to the protection of computer systems and networks. This includes information and data privacy, preventing disruption of IT services and prevention of theft of and damage to hardware, software, and data.

Data science

Data science is a field that uses scientific and computing tools to extract information and insights from data, driven by the increasing volume and availability of data. Data mining, big data, statistics, machine learning and deep learning are all interwoven with data science.

Information systems

Information systems (IS) is the study of complementary networks of hardware and software (see information technology) that people and organizations use to collect, filter, process, create, and distribute data. The ACM's Computing Careers describes IS as:

"A majority of IS [degree] programs are located in business schools; however, they may have different names such as management information systems, computer information systems, or business information systems. All IS degrees combine business and computing topics, but the emphasis between technical and organizational issues varies among programs. For example, programs differ substantially in the amount of programming required."

The study of IS bridges business and computer science, using the theoretical foundations of information and computation to study various business models and related algorithmic processes within a computer science discipline. The field of Computer Information Systems (CIS) studies computers and algorithmic processes, including their principles, their software and hardware designs, their applications, and their impact on society while IS emphasizes functionality over design.

Information technology

Information technology (IT) is the application of computers and telecommunications equipment to store, retrieve, transmit, and manipulate data, often in the context of a business or other enterprise. The term is commonly used as a synonym for computers and computer networks, but also encompasses other information distribution technologies such as television and telephones. Several industries are associated with information technology, including computer hardware, software, electronics, semiconductors, internet, telecom equipment, e-commerce, and computer services.

Research and emerging technologies

DNA-based computing and quantum computing are areas of active research for both computing hardware and software, such as the development of quantum algorithms. Potential infrastructure for future technologies includes DNA origami on photolithography and quantum antennae for transferring information between ion traps. By 2011, researchers had entangled 14 qubits. Fast digital circuits, including those based on Josephson junctions and rapid single flux quantum technology, are becoming more nearly realizable with the discovery of nanoscale superconductors.

Fiber-optic and photonic (optical) devices, which already have been used to transport data over long distances, are starting to be used by data centers, along with CPU and semiconductor memory components. This allows the separation of RAM from CPU by optical interconnects. IBM has created an integrated circuit with both electronic and optical information processing in one chip. This is denoted CMOS-integrated nanophotonics (CINP). One benefit of optical interconnects is that motherboards, which formerly required a certain kind of system on a chip (SoC), can now move formerly dedicated memory and network controllers off the motherboards, spreading the controllers out onto the rack. This allows standardization of backplane interconnects and motherboards for multiple types of SoCs, which allows more timely upgrades of CPUs.

Another field of research is spintronics. Spintronics can provide computing power and storage, without heat buildup. Some research is being done on hybrid chips, which combine photonics and spintronics. There is also research ongoing on combining plasmonics, photonics, and electronics.

Cloud computing

Cloud computing is a model that allows for the use of computing resources, such as servers or applications, without the need for interaction between the owner of these resources and the end user. It is typically offered as a service, making it an example of Software as a Service, Platform as a Service, or Infrastructure as a Service, depending on the functionality offered. Key characteristics include on-demand access, broad network access, and the capability of rapid scaling. It allows individual users or small businesses to benefit from economies of scale.

One area of interest in this field is its potential to support energy efficiency. Allowing thousands of instances of computation to occur on one single machine instead of thousands of individual machines could help save energy. It could also ease the transition to renewable energy sources, since it would suffice to power one server farm with renewable energy, rather than millions of homes and offices.

However, this centralized computing model poses several challenges, especially in security and privacy. Current legislation does not sufficiently protect users from companies mishandling their data on company servers. This suggests potential for further legislative regulations on cloud computing and tech companies.

Quantum computing

Quantum computing is an area of research that brings together the disciplines of computer science, information theory, and quantum physics. While the idea of information as part of physics is relatively new, there appears to be a strong tie between information theory and quantum mechanics. Whereas traditional computing operates on a binary system of ones and zeros, quantum computing uses qubits. Qubits are capable of being in a superposition, i.e. in both states of one and zero, simultaneously. Thus, the value of the qubit is not between 1 and 0, but is only determined when it is measured. This trait of qubits is known as quantum superposition; together with quantum entanglement, which links the states of multiple qubits, it is a core idea of quantum computing that allows quantum computers to do large-scale computations. Quantum computing is often used for scientific research in cases where traditional computers do not have the computing power to do the necessary calculations, such as in molecular modeling. Large molecules and their reactions are far too complex for traditional computers to calculate, but the computational power of quantum computers could provide a tool to perform such calculations.
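The idea that a qubit in superposition yields a definite 0 or 1 only when measured can be sketched with a classical simulation. This is an illustrative assumption-laden toy, not a quantum computation: the `measure` function is hypothetical, it models a single qubit with amplitudes `alpha` and `beta`, and it samples an outcome with probabilities |alpha|² and |beta|².

```python
import math
import random
from collections import Counter


def measure(alpha: complex, beta: complex, rng: random.Random) -> int:
    """Simulate measuring a qubit in the state alpha|0> + beta|1>.

    Returns 0 with probability |alpha|^2 and 1 with probability |beta|^2.
    """
    p0 = abs(alpha) ** 2
    p1 = abs(beta) ** 2
    if not math.isclose(p0 + p1, 1.0, rel_tol=1e-9):
        raise ValueError("amplitudes must satisfy |alpha|^2 + |beta|^2 = 1")
    return 0 if rng.random() < p0 else 1


# An equal superposition: both outcomes are equally likely.
rng = random.Random(0)  # fixed seed so repeated runs give the same samples
amp = 1 / math.sqrt(2)
counts = Counter(measure(amp, amp, rng) for _ in range(10_000))
print(counts)
```

Each simulated measurement returns a single classical bit; only the statistics over many runs reflect the amplitudes. Simulating many entangled qubits this way requires tracking a number of amplitudes that grows exponentially with the qubit count, which is one reason such systems are hard for traditional computers.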
