A Medley of Potpourri: Generative artificial intelligence

Monday, October 16, 2023

Generative artificial intelligence

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Generative_artificial_intelligence

Generative artificial intelligence (also generative AI or GenAI) is artificial intelligence capable of generating text, images, or other media, using generative models. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.

In the early 2020s, advances in transformer-based deep neural networks enabled a number of generative AI systems notable for accepting natural language prompts as input. These include large language model chatbots such as ChatGPT, Bing Chat, Bard, and LLaMA, and text-to-image artificial intelligence art systems such as Stable Diffusion, Midjourney, and DALL-E.

Generative AI has uses across a wide range of industries, including art, writing, script writing, software development, product design, healthcare, finance, gaming, marketing, and fashion.Investment in generative AI surged during the early 2020s, with large companies such as Microsoft, Google, and Baidu as well as numerous smaller firms developing generative AI models.However, there are also concerns about the potential misuse of generative AI, including cybercrime or creating fake news or deepfakes which can be used to deceive or manipulate people.

History

The academic discipline of artificial intelligence was founded at a research workshop at Dartmouth College in 1956, and has experienced several waves of advancement and optimism in the decades since. Since its founding, researchers in the field have raised philosophical and ethical arguments about the nature of the human mind and the consequences of creating artificial beings with human-like intelligence; these issues have previously been explored by myth, fiction and philosophy since antiquity. These concepts of automated art date back at least to the automata of ancient Greek civilization, where inventors such as Daedalus and Hero of Alexandria were described as having designed machines capable of writing text, generating sounds, and playing music. The tradition of creative automatons has flourished throughout history, such as Maillardet's automaton, created in the early 1800s.

Since the founding of AI in the 1950s, artists and researchers have used artificial intelligence to create artistic works. By the early 1970s, Harold Cohen was creating and exhibiting generative AI works created by AARON, the computer program Cohen created to generate paintings.

Markov chains have long been used to model natural languages since their development by Russian mathematician Andrey Markov in the early 20th century. Markov published his first paper on the topic in 1906, and analyzed the pattern of vowels and consonants in the novel Eugeny Onegin using Markov chains. Once a Markov chain is learned on a text corpus, it can then be used as a probabilistic text generator.

The field of machine learning often uses statistical models, including generative models, to model and predict data. Beginning in the late 2000s, the emergence of deep learning drove progress and research in image classification, speech recognition, natural language processing and other tasks. Neural networks in this era were typically trained as discriminative models, due to the difficulty of generative modeling.

In 2014, advancements such as the variational autoencoder and generative adversarial network produced the first practical deep neural networks capable of learning generative, rather than discriminative, models of complex data such as images. These deep generative models were the first able to output not only class labels for images, but to output entire images.

In 2017, the Transformer network enabled advancements in generative models, leading to the first generative pre-trained transformer (GPT) in 2018. This was followed in 2019 by GPT-2 which demonstrated the ability to generalize unsupervised to many different tasks as a Foundation model.

In 2021, the release of DALL-E, a transformer-based pixel generative model, followed by Midjourney and Stable Diffusion marked the emergence of practical high-quality artificial intelligence art from natural language prompts.

In March 2023, GPT-4 was released. A team from Microsoft Research argued that "it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system".

Modalities

A generative AI system is constructed by applying unsupervised or self-supervised machine learning to a data set. The capabilities of a generative AI system depend on the modality or type of the data set used.

Generative AI can be either unimodal or multimodal; unimodal systems take only one type of input, whereas multimodal systems can take more than one type of input. For example, one version of OpenAI's GPT-4 accepts both text and image inputs.

Text

Generative AI systems trained on words or word tokens include GPT-3, LaMDA, LLaMA, BLOOM, GPT-4, and others (see List of large language models). They are capable of natural language processing, machine translation, and natural language generation and can be used as foundation models for other tasks. Data sets include BookCorpus, Wikipedia, and others (see List of text corpora).

Code

In addition to natural language text, large language models can be trained on programming language text, allowing them to generate source code for new computer programs. Examples include OpenAI Codex.

Images

Producing high-quality visual art is a prominent application of generative AI. Many such artistic works have received public awards and recognition.

Generative AI systems trained on sets of images with text captions include Imagen, DALL-E, Midjourney, Adobe Firefly, Stable Diffusion and others (see Artificial intelligence art, Generative art, and Synthetic media). They are commonly used for text-to-image generation and neural style transfer. Datasets include LAION-5B and others (See Datasets in computer vision).

Music

Generative AI systems such as MusicLM and MusicGen can be trained on the audio waveforms of recorded music along with text annotations, in order to generate new musical samples based on text descriptions such as a calming violin melody backed by a distorted guitar riff.

Video

Generative AI trained on annotated video can generate temporally-coherent video clips. Examples include Gen1 and Gen2 by RunwayML and Make-A-Video by Meta Platforms.

Molecules

Generative AI systems can be trained on sequences of amino acids or molecular representations such as SMILES representing DNA or proteins. These systems, such as AlphaFold, are used for protein structure prediction and drug discovery. Datasets include various biological datasets.

Robotics

Generative AI can also be trained on the motions of a robotic system to generate new trajectories for motion planning or navigation. For example, UniPi from Google Research uses prompts like "pick up blue bowl" or "wipe plate with yellow sponge" to control movements of a robot arm. Multimodal "vision-language-action" models such as Google's RT-2 can perform rudimentary reasoning in response to user prompts and visual input, such as picking up a toy dinosaur when given the prompt pick up the extinct animal at a table filled with toy animals and other objects.

Planning

The terms generative AI planning or generative planning were used in the 1980s and 1990s to refer to AI planning systems, especially computer-aided process planning, used to generate sequences of actions to reach a specified goal.

Generative AI planning systems used symbolic AI methods such as state space search and constraint satisfaction and were a "relatively mature" technology by the early 1990s. They were used to generate crisis action plans for military use, process plans for manufacturing and decision plans such as in prototype autonomous spacecraft.

Software and hardware

Generative AI models are used to power chatbot products such as ChatGPT, programming tools such as GitHub Copilot, text-to-image products such as Midjourney, and text-to-video products such as Runway Gen-2. Generative AI features have been integrated into a variety of existing commercially-available products such as Microsoft Office, Google Photos, and Adobe Photoshop. Many generative AI models are also available as open-source software, including Stable Diffusion and the LLaMA language model.

Smaller generative AI models with up to a few billion parameters can run on smartphones, embedded devices, and personal computers. For example, LLaMA-7B (a version with 7 billion parameters) can run on a Raspberry Pi 4 and one version of Stable Diffusion can run on an iPhone 11.

Larger models with tens of billions of parameters can run on laptop or desktop computers. To achieve an acceptable speed, models of this size may require accelerators such as the GPU chips produced by Nvidia and AMD or the Neural Engine included in Apple silicon products. For example, the 65 billion parameter version of LLaMA can be configured to run on a desktop PC.

Language models with hundreds of billions of parameters, such as GPT-4 or PaLM, typically run on datacenter computers equipped with arrays of GPUs (such as Nvidia's H100) or AI accelerator chips (such as Google's TPU). These very large models are typically accessed as cloud services over the Internet.

In 2022, the United States New Export Controls on Advanced Computing and Semiconductors to China imposed restrictions on exports to China of GPU and AI accelerator chips used for generative AI. Chips such as the Nvidia A800 and the Biren Technology BR104 were developed to meet the requirements of the sanctions.

Concerns

The development of generative AI has raised concerns from governments, businesses, and individuals, resulting in protests, legal actions, calls to pause AI experiments, and actions by multiple governments. In a July 2023 briefing of the United Nations Security Council, Secretary-General António Guterres stated "Generative AI has enormous potential for good and evil at scale", that AI may "turbocharge global development" and contribute between $10 and $15 trillion to the global economy by 2030, but that its malicious use "could cause horrific levels of death and destruction, widespread trauma, and deep psychological damage on an unimaginable scale".

Job losses

From the early days of the development of AI, there have been arguments put forward by ELIZA creator Joseph Weizenbaum and others about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculations and qualitative, value-based judgements. In April 2023, it was reported that image generation AI has resulted in 70% of the jobs for video game illustrators in China being lost. In July 2023, developments in generative AI contributed to the 2023 Hollywood labor disputes. Fran Drescher, president of the Screen Actors Guild, declared that "artificial intelligence poses an existential threat to creative professions" during the 2023 SAG-AFTRA strike.

Deepfakes

Deepfakes (a portmanteau of "deep learning" and "fake") are AI-generated media that take a person in an existing image or video and replace them with someone else's likeness using artificial neural networks. Deepfakes have garnered widespread attention and concerns for their uses in deepfake celebrity pornographic videos, revenge porn, fake news, hoaxes, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.

Cybercrime

Generative AI's ability to create realistic fake content has been exploited in numerous types of cybercrime, including phishing scams. Deepfake video and audio have been used to create disinformation and fraud. Former Google fraud czar Shuman Ghosemajumder has predicted that while deepfake videos initially created a stir in the media, they would soon become commonplace, and as a result, more dangerous. Cybercriminals have created large language models focused on fraud, including WormGPT and FraudGPT.

Misuse in journalism

In January 2023, Futurism.com broke the story that CNET had been using an undisclosed internal AI tool to write at least 77 of its stories; after the news broke, CNET posted corrections to 41 of the stories.

In April 2023, German tabloid Die Aktuelle published a fake AI-generated interview with former racing driver Michael Schumacher, who had not made any public appearances since 2013 after sustaining a brain injury in a skiing accident. The story included two possible disclosures: the cover included the line "deceptively real", and the interview included an acknowledgement at the end that it was AI-generated. The editor-in-chief was fired shortly thereafter amid the controversy.

Regulation

In the European Union, the proposed Artificial Intelligence Act includes requirements to disclose copyrighted material used to train generative AI systems, and to label any AI-generated output as such.

In the United States, a group of companies including OpenAI, Alphabet, and Meta signed a voluntary agreement with the White House in July 2023 to watermark AI-generated content.

In China, the Interim Measures for the Management of Generative AI Services introduced by the Cyberspace Administration of China regulates any public-facing generative AI. It includes requirements to watermark generated images or videos, regulations on training data and label quality, restrictions on personal data collection, and a guideline that generative AI must "adhere to socialist core values".

A Medley of Potpourri

Search This Blog