
A gentle whisper for the model, a booming wake-up call for humanity: the very first response from the freshly released ChatGPT on November 30, 2022 made it clear to everyone: Generative AI is here! And it will change everything.
Let us dive into the wild world of genAI. Each section of this story comprises a discussion of the topic plus a curated list of resources, sometimes containing sites with more lists of resources:
20+: What is Generative AI?
95x: Generative AI history
600+: Key Technological Concepts
2,350+: Models & Mediums — Text, Image, Video, Sound, Code, etc.
350x: Application Areas, Companies, Startups
3,000+: Prompts, Prompt Engineering, & Prompt Lists
250+: Hardware, Frameworks, Approaches, Tools, & Data
300+: Achievements, Impacts on Society, AI Regulation, & Outlook
20+: What is Generative AI?
Let’s play the comparison game. If classic AI is the wise owl, generative AI is the wiser owl with a paintbrush and a knack for writing. Traditional AI can recognize, classify, and cluster, but it cannot generate new data of the kind it was trained on. Plus, classic AI models are usually focused on a single task. Their generative sisters, on the other hand, are pre-trained on giant amounts of data from a wide range of domains. They build up general knowledge and use it to generate almost any output in their specific medium (text, image, sound, or other).

1x: Introduction to generative AI by Monit Sharma
20+: Introduction and learning journey with resources
95x: Generative AI history
Generative AI has a pretty long history, with early theoretical work from Leibniz, Pascal, Babbage, and Lovelace. This was even preceded by the development of so-called automatons (mechanical figures and calculating machines) of all sorts (Yan Shi, Ctesibius, Heron of Alexandria, the Banū Mūsā brothers, Ismail al-Jazari).
The mathematical groundwork was laid in the 1940s and 1950s (Shannon, Turing, von Neumann, Wiener). The foundations for today’s generative language models were developed in the 1990s (LSTM, Hochreiter, Schmidhuber), and the whole field took off around 2018 (Radford, Devlin, et al.). Major milestones of the last few years include BERT (Google, 2018), GPT-3 (OpenAI, 2020), DALL-E (OpenAI, 2021), Stable Diffusion (Stability AI, LMU Munich, 2022), ChatGPT (OpenAI, 2022), and Mixtral (Mistral, 2023), a Mixture-of-Experts LLM.
1x: Evolution of Generative AI
1x: Timeline of Generative AI
1x: Exciting, amazing, and sometimes a little bit spooky: Early precursors of LLMs
2x: AI timeline, and some striking data visualizations
90+: Current and past notable AI projects

600+: Key technology concepts of generative AI
300+: Deep Learning — the core of any generative AI model:
Deep learning is a central concept of traditional AI that has been adopted and further developed in generative AI. Complex ML problems can only be solved by neural networks with many layers. Incidentally, the same holds for the cognitive processes in the brains of mammals (yes, that means us).

In an artificial neural network, a node represents a neuron, and a connection between nodes is a synapse, which transports information in one direction. Generative AI models usually have millions of neurons and billions of synapses (aka „parameters“). Current models do not use neurons built in silicon; they run as conventional computing algorithms on more or less traditional hardware (sometimes CPUs, usually GPUs/TPUs). In code, the complete deep learning network is represented as matrices of weights. And, yes, I am trying to finally demystify generative AI: both learning and answer generation in all the magic models like ChatGPT can ultimately be broken down to matrix multiplication, good old high school algebra. Just much, much more of it, executed at lightning speed.
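To make that concrete, here is a minimal sketch of a forward pass through a tiny two-layer network in plain NumPy: nothing but matrix multiplications plus a simple non-linearity. The layer sizes and random weights are made-up illustration values, not any real model.

```python
# A minimal sketch of a forward pass: just matrix multiplications + a non-linearity.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(8)                # input vector, e.g. an 8-dimensional embedding
W1 = rng.random((16, 8))         # first layer: 16 neurons, each with 8 incoming synapses
W2 = rng.random((4, 16))         # second layer: 4 neurons, each with 16 incoming synapses

hidden = np.maximum(0, W1 @ x)   # matrix multiplication + ReLU activation
output = W2 @ hidden             # another matrix multiplication
print(output)
```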

More on deep learning and neural network technology:
7x: Best machine learning courses
300+: Deep learning resources: Books, courses, lectures, papers, datasets, conferences, frameworks, tools by Christos Christofidis
200+: Foundation models, pre-training, fine-tuning & prompting
Generative AI is based on foundation models. Foundation models are huge models (billions of parameters), pre-trained on giant datasets (GB or TB of data), and capable of performing a huge variety of tasks in their domain (e.g. text or image generation). Datasets for pre-training usually comprise all genres of data in the domain. For text: scientific papers, haikus, spreadsheets, encyclopedic contents, dialogs, laws, manuals, invoices, screenplays, textbooks, or novels. The pre-trained model is comparable to a super smart and knowledgeable high school graduate with a lot of basic knowledge and the ability to understand many languages but with no specific qualification for a job. To prepare a model for a specific job, like answering questions in a support hotline for a specific product, you may use fine-tuning: additional training with a small dataset of content for the specific task. More often than not, you just use the prompt to specify the job, provide data for the job, and format the response.
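To make the fine-tuning idea tangible, here is a minimal sketch of such a run, assuming the Hugging Face transformers and datasets packages; the small distilgpt2 model and the two toy support-hotline examples are assumptions chosen only to keep the sketch short, not a recipe for a production model.

```python
# A minimal sketch of fine-tuning a small pre-trained language model on a
# task-specific dataset (toy data, assumed model: distilgpt2).
from transformers import (AutoTokenizer, AutoModelForCausalLM, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import Dataset

model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A tiny support-hotline style dataset for the "specific job".
examples = [
    "Q: The printer shows error E4. A: E4 means paper jam; open tray 2.",
    "Q: How do I reset the device? A: Hold the power button for 10 seconds.",
]
dataset = Dataset.from_dict({"text": examples}).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=64), batched=True
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-support-bot",
                           num_train_epochs=1, per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```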

1x: Empowering Language Models: Pre-training, Fine-Tuning, and In-Context Learning
1x: Great video intro to GPT (Generative Pre-trained Transformer) technology by Grant Sanderson of 3Blue1Brown
1x: Deep dive into pre-training LLMs by Yash Bhaskar
1x: Tutorial: Fine-tune a large language model with code examples
200+: A curated list on fine-tuning resources
(see resources on prompting later in the story)
120+: Tokens, embeddings & vectors

I asked ChatGPT how many characters my prompt contained. Its answer: 83. Oh, that is not correct; the prompt contains 89 characters including spaces and punctuation, not 83. Why does the smartest bot on earth fail at this simple counting task? A seven-year-old could do better!
ChatGPT, like any other language model, does not understand language, text, or characters. The ChatGPT model does not even get to see my prompt:
The prompt is first split up („tokenized“) into 19 tokens.

Common English words are not split; they are single tokens. Less common words („ChatGPT“ was not common in the training material before the release of ChatGPT) and misspelled words („inlcuding“) are split into two or more tokens.
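You can see this in action with OpenAI’s open-source tiktoken tokenizer; the example sentence below is an assumption, not my original prompt, so its character and token counts differ from the ones above.

```python
# A minimal sketch of tokenization with the tiktoken package.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
text = "ChatGPT, please count the characters in this text, inlcuding spaces."
token_ids = enc.encode(text)

print(len(text), "characters,", len(token_ids), "tokens")
for tid in token_ids:
    # Common words map to a single token; rare or misspelled words are split.
    print(tid, repr(enc.decode([tid])))
```
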
Each model uses a fixed vocabulary of tokens. Each token is then transformed into an embedding, a high-dimensional vector (often more than 1,000 dimensions), before the model gets to see it. Embeddings represent the semantic value of a token. For semantically similar tokens like king, queen, and prince, the vectors should be close together. Similarly spelled tokens like prince, price, and prance are not close if they have no semantic similarity. The embeddings are machine-generated from the neighbours of a word or token in texts, not from a human meta-explanation of what a word means. So „king“ could be close to both „throne“ and „checkmate“, based on these two contexts in English texts. After the prompt is transformed into a sequence of embeddings (high-dimensional vectors representing tokens), these embeddings are fed to the language model and can then be processed.
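Here is a minimal sketch of comparing embeddings by cosine similarity, assuming the sentence-transformers package and its small all-MiniLM-L6-v2 model (not the embedding layer of any particular LLM); the word choices mirror the king/queen/price example above.

```python
# A minimal sketch of embeddings and cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["king", "queen", "price"])   # one 384-dimensional vector per word

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("king vs queen:", cosine(vectors[0], vectors[1]))  # semantically close
print("king vs price:", cosine(vectors[0], vectors[2]))  # similar spelling, far in meaning
```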

The model doesn’t generate a full answer from these embeddings. No, no, no! It just generates (in ML lingo: „predicts“) the next token. After that, it takes the embeddings of the prompt plus the first predicted token and predicts the second token of its answer … and so on.

While generating token after token, the models usually don’t know (and don’t need to know) where their own contribution to the ongoing flow of text really started. In my view, this is one of the weirdest features of LLM technology.
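Here is a minimal sketch of that token-by-token loop, assuming the Hugging Face transformers package and the small distilgpt2 model; real chat models add sampling strategies, stop tokens, and much larger contexts.

```python
# A minimal sketch of autoregressive, token-by-token generation.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):                    # predict ten tokens, one at a time
        logits = model(ids).logits         # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()   # greedy choice: the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append it and feed everything back in

print(tokenizer.decode(ids[0]))
```
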
More on tokens, shmokens and embeddings:
1x: Word embedding tutorial
1x: Tokenization and token production explained by Bea Stollnitz
1x: Deep dive into the LLM architecture with a particular focus on tokens and embeddings by Vijayarajan Alagumalai
1x: Dense, sparse vectors & embeddings with code examples explained by James Briggs
120+: Embeddings in comparison, MTEB
10+: The transformer architecture
Almost all relevant language models are based on a technology called the transformer architecture. It would have been a great honour and even greater pleasure for me to discuss it here. Unfortunately, any attempt to describe it would exceed the scope of this introduction to gen AI.
1x: The key differentiator in transformer models: the attention mechanism. Super tricky, super well explained by Grant Sanderson of 3Blue1Brown (a tiny numeric sketch of the core formula follows after this list)
10x: I recommend the beautifully illustrated introduction to language generation concepts (from RNN to LSTM to all the concepts in the transformer architecture) by Giuliano Giacaglia to anyone who is not afraid of a well-dosed sip of complexity.
1x: Here’s the original paper from the Google team introducing the transformer concept — read it with awe
50x: Resources to study transformers
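For the brave, here is a tiny numeric sketch of the attention formula mentioned above, softmax(QKᵀ/√d)·V; the three-token, four-dimensional random matrices are made up purely for illustration and leave out everything else the transformer does (multiple heads, learned projections, layer norms, and so on).

```python
# A tiny numeric sketch of scaled dot-product attention.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 4                                    # dimension of each query/key/value vector
Q = rng.random((3, d))                   # one query vector per token
K = rng.random((3, d))                   # one key vector per token
V = rng.random((3, d))                   # one value vector per token

weights = softmax(Q @ K.T / np.sqrt(d))  # how strongly each token attends to every other token
output = weights @ V                     # each token's output: a weighted mix of value vectors
print(weights.round(2))
```
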
10x: Image Generation Technology: Latent Diffusion Models / Stable Diffusion
Latent diffusion models (LDMs) like Stable Diffusion work differently from large language models. The difference starts with the training: while LLMs are trained on unlabelled data, LDMs are trained on text/image pairs. This is what allows text prompting of image generation models.
LDMs don’t process data directly in the vast pixel space but first compress the images into a much smaller, perceptually equivalent latent space, which makes training and generation faster and more efficient.

The image-generation process is counterintuitive. The model is not really drawing a visual; it removes noise, step by step, from the random noise it uses as a starting point. The process is like that of a sculptor removing all the unnecessary marble to get the David statue.
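A minimal sketch of that denoising process in practice, assuming the Hugging Face diffusers package, a CUDA GPU, and the publicly available Stable Diffusion 2.1 weights; the prompt is just an example.

```python
# A minimal sketch of text-to-image generation with a latent diffusion model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Internally, the pipeline starts from random latent noise and removes noise
# step by step, guided by the text prompt (the sculptor analogy from above).
image = pipe("a marble statue of an owl holding a paintbrush",
             num_inference_steps=30).images[0]
image.save("owl.png")
```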

More details:
1x: Latent diffusion models
1x: Rombach et al.: High-Resolution Image Synthesis with Latent Diffusion Models
1x: Ho, et al.: Denoising Diffusion Probabilistic Models
30x: Diffusion models: Image, audio, language, time series, graphs
2350+: Models — Text, Image, Video, Sound, Code and Much More
1200+: Text — Large Language Models
Without a doubt, language is the most important application area for generative AI. And while it is raining dollars in any domain of AI, here the dollars are bigger. These are the most important LLMs:
4x: OpenAI: GPT-4o (OpenAI improved the quality, doubled the speed, and cut prices in half), GPT-4-turbo, GPT-3.5-turbo, ChatGPT
1x: Mistral: Mixtral 8x7B — a high-performing small model with a Mixture-of-Experts architecture. From Paris with love.
3x: Anthropic: The Claude 3 model family — one of the best model families to date
2x: Meta: Llama 2, Llama — not very large (as measured in parameters), but high-performing and open source.
1x: Stanford University: Octopus-V2-2B — a super small (it fits on a single GPU) and fast model; Alpaca — another member of the Camelidae family, based on Llama and small as well (7B parameters).
2x: Google: Gemini, PaLM 2
1x: TII (Abu Dhabi): Falcon 180B
Bloom: Bloomz, Bloom-LoRA
1x: Aleph Alpha: Luminous supreme
1x: Baidu: Ernie Bot — China’s answer to ChatGPT with more than 100m registered users.
5x: Amazon: Titan models
More, more and still more LLMs:
100+: A list of major open source LLMs by Hannibal046
100+: Stanford’s HELM model list
1000+: A graphical overview of thousands of current and historic LLMs
120+: Image Generation Models and Tools
Perhaps not the most important domain of generative AI, but certainly the most enchanting.

5x: CompVis / Stability.ai: Stable Diffusion 1, Stable Diffusion 2.1 — the top open source model
1x: Midjourney — love it!
1x: OpenAI: DALL-E 3 — love it too!
14x: Curated list of image creation models tested with the same prompt by Vinnie Wong
100+: List of image creation models and tools
15x: Code Generation Models and Tools
Code Generation tools support developers in writing, debugging, and documenting code and can be integrated into IDEs or other development tools.
1x: GitHub: Copilot — the most widely adopted code generation tool
1x: OpenAI: Codex, the model behind CoPilot
1x: Tabnine — open source AI code generation
1x: Salesforce: CodeT5 — open source, and read here how to fine-tune it
1x: Meta: Code Llama based on Llama 2
1x: Google: Codey — generation, completion & code chat
10x: AI code generation models by Tracy Phillips
17+: Speech Recognition (STT / ASR), Speech Generation (TTS) Models
There are now models for both transformation processes: Speech to text and text to speech.
1x: OpenAI: Whisper — one of the first huge foundation models in ASR
1x: RevAI ASR — marketed as the most accurate ASR
1x: Google is in the game now with Chirp ASR
3x: Top open source speech recognition models in comparison
1x: Meta: Voicebox voice generator (open source)
10x: Best AI voice generators
15x: Music Generation Models, Tools

It is real fun to create a song with just a ten-word prompt.
1x: Harmonai — community-driven, open-source production tool
1x: Mubert — A royalty-free music ecosystem
1x: MusicLM — A model by Google Research for generating high-fidelity music from text descriptions.
1x: Aiva — Generate songs in 250 styles.
1x: Suno — Took me about 50 seconds to register, write a prompt and create my first shining masterpiece of elevator music
10x: Best AI music generators
18x: Video Generation (Text to Video Models)
Similar to image generation, video generation is often based on diffusion / latent diffusion models:
1x: OpenAI: Sora — many of the first reviewers got a mild form of exophthalmos when experiencing the capabilities of this model
1x: Google: Imagen video generation from text
1x: Synthesia — Generate a video in seconds
1x: DeepBrain AI: Creates video and even the scripts to create the videos
5x: Comparison of video creation AI by Artturi Jalli
10x: And still some more models
7x: Other Generative AI Models
Generative AI can be used in completely different domains, as long as there is similarly structured content (as there is with images and texts) and a gigantic database that can be used for pre-training.
1x: Robotics control. Google: RT-2 repository
2x: Protein folding prediction: AlphaFold. Super interesting: here the foundation model and generative AI approach is used in a completely different domain, one with almost no touchpoints with media content like language or images. A startup with an application in drug creation: Absci
1x: Genomics: Building genome-scale language models (GenSLMs) by adapting large language models (LLMs) for genomic data
1x: Llemma — an open language model for mathematics
1x: AstroLLaMA — a foundation model for astronomy
1x: Antibiotics: Generative AI for designing and validating easily synthesizable and structurally novel antibiotics
1000+: GPT Store:
The GPT Store is OpenAI’s equivalent to an app store. It hosts thousands of custom GPTs based on GPT-4 and DALL-E: from personal prompt engineering tools to daily schedule assistance, presentation and logo design, task management, step-by-step tech troubleshooting, website creation and hosting, AI insight generation, explanations of board and card games, digital visionary painting, text-based adventure games, etc.
Access to the GPT Store is limited to ChatGPT Plus users (around $20 per month).
You can create your own GPT and offer it to other users.

10+: Autonomous Agent AIs
Agent AIs are usually not models of their own but platforms that orchestrate different models (language, image generation, etc.) to perform complex, multimodal tasks. Usually, they employ large language models to plan the task execution and break it down into simple steps.
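To make the plan-and-execute idea concrete, here is a minimal sketch of such an agent loop, assuming the OpenAI Python SDK and a purely hypothetical search_web() tool; it illustrates the pattern, not how AgentGPT or AutoGPT are actually implemented.

```python
# A minimal sketch of an autonomous agent loop: plan, execute steps with a tool,
# then let the LLM assemble the final answer.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def search_web(query: str) -> str:
    # Hypothetical tool; plug in a real search API here.
    return f"(imagine search results for: {query})"

goal = "Write a two-sentence summary of the latest open-source LLMs."

# Step 1: let the LLM plan the task and break it down into simple steps.
plan = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Break this goal into short, numbered steps:\n{goal}"}],
).choices[0].message.content

# Step 2: execute each step with a tool and collect the observations.
notes = ""
for step in filter(str.strip, plan.splitlines()):
    notes += f"\nStep: {step}\nObservation: {search_web(step)}"

# Step 3: let the LLM turn the collected observations into the final answer.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Goal: {goal}\nFindings:{notes}\nWrite the final answer."}],
).choices[0].message.content
print(answer)
```
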
1x: AgentGPT
1x: AutoGPT
10x: Intro to agent AI and overview of agents
350x: Application Areas, Companies, Startups
Generative AI start-ups are mushrooming, and many established companies are building tools and applications in this area. An XXXL-sized thank you to everyone who has made the effort to map this landscape.
150+: Sequoia’s market map by target group & application area:

8x: Generative AI market maps, landscapes, comparisons & timelines
100x: Top generative AI startup list by YCombinator
100x: Generative AI application areas from audit reporting to writing product descriptions
3000+: Prompts, Prompt Engineering & Prompt Lists
The prompt serves as the tool to control a model’s behaviour. Users can provide a description of the desired output to prompt most models, including those generating images, videos, or music.

Prompts can be so much more than just an instruction or question. They can comprise:
- few-shot examples (showing the model how to generate the output),
- data (which the model should use to generate the output),
- a conversation history (for multi-turn conversations),
- an exact definition of the output format,
- and much more.
Prompt engineering is the art of crafting safe, exact, successful, efficient, & robust prompts.
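As an illustration of how these ingredients come together, here is a minimal sketch of a structured prompt, assuming the OpenAI Python SDK; the product, the manual excerpt, and the few-shot examples are all made up.

```python
# A minimal sketch of a structured prompt: instruction + few-shot examples +
# data + output format, sent to a chat model.
from openai import OpenAI

client = OpenAI()

prompt = """You are a support assistant for the AcmePrinter 3000.

Examples (few-shot):
Q: The printer shows error E4. A: E4 means paper jam; open tray 2 and remove the sheet.
Q: How do I reset the device? A: Hold the power button for 10 seconds.

Data (use only this manual excerpt):
Error E7 = low toner. Replace the cartridge behind the front cover.

Output format: JSON with the keys "answer" and "confidence" (low/medium/high).

Q: What does error E7 mean?"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```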

1x: Free prompt engineering course
10x: Overview on the best prompt engineering courses
1x: Prompt engineering cheat sheet.
40+: Prompt engineering guide covering many individual topics
100+: Awesome ChatGPT prompts which can be used for other models as well
3000+: The ChatGPT list of lists, containing dozens of prompt lists with thousands of prompts
250+: Hardware, Frameworks, Approaches, Tools & Data
Generative AI models are huge (they require a lot of memory) and need a lot of processor resources (an incredible number of FLOPs for training, and still many for a single inference). So the hardware is game-changing in gen AI:
1x: Hardware: Generative AI hardware intro
15x: Overview on deep learning hardware with links to other resources
100x: Resources on processing units — CPU, GPU, APU, TPU, VPU, FPGA, QPU
1x: For some people, language processing units (LPUs) have become the latest craze in AI hardware, reportedly up to 10 times faster than GPUs/TPUs in LLM token prediction: read about Groq’s LPU inference engine and test it.
3x: Generative AI frameworks facilitate the development of applications with language and other models: LangChain, LlamaIndex, and a comparison of LangChain and LlamaIndex

1x: RAG — retrieval-augmented generation is the key approach to let LLMs work with your own data: Intro (see the minimal sketch after this list)
10+: Vector databases store your data in gen AI applications and make it retrievable: Intro to vector DBs and top 6 DBs, & a few more
5x: Platforms providing models, resources to use and operate them: HuggingFace, Haystack, Azure AI, Google, Amazon Bedrock
150+: More resources on generative AI tools, frameworks and other contents
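To show how RAG and vector search fit together in practice, here is the minimal sketch referenced in the RAG item above, assuming the sentence-transformers and openai packages; the three document snippets are made up, and a real application would store the embeddings in one of the vector databases listed above instead of an in-memory list.

```python
# A minimal sketch of retrieval-augmented generation (RAG):
# embed documents, retrieve the most relevant one, answer with the LLM.
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

docs = [
    "Our support hotline is open Monday to Friday, 9am to 5pm CET.",
    "The warranty period for the AcmePrinter 3000 is 24 months.",
    "Error E7 means low toner; replace the cartridge behind the front cover.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(docs, normalize_embeddings=True)

question = "How long is the warranty?"
q_vector = embedder.encode([question], normalize_embeddings=True)[0]

# Retrieval: pick the document closest to the question (cosine similarity,
# here a simple dot product because the vectors are normalized).
best_doc = docs[int(np.argmax(doc_vectors @ q_vector))]

# Generation: let the LLM answer using only the retrieved context.
client = OpenAI()
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"}],
).choices[0].message.content
print(answer)
```
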
300+: Generative AI Achievements, Security & Privacy, Impacts on Society, AI Regulation, and an Outlook
40+: Achievements
Generative AI models — and here mostly OpenAI’s models — passed the bar exam, the medical licensing exam, and the SAT college readiness test, scored an IQ of 147 on a verbal intelligence test, and mastered many more tests and exams.
30x: List of ChatGPT / GPT-4 achievements
10x: Here are some more tests gen AI passed, as well as some where it failed
200+: AI security, privacy, AI TRiSM, explainability, hallucination control
AI TRiSM stands for Trust, Risk, and Security Management and comprises these fields:
- hallucination control
- safety & security
- transparency & explainability (XAI)
- accountability
- risk management (AI risk = likelihood x potential effect)
- fairness (and bias)
- alignment
- privacy
More resources:
1x: OWASP AI Security and Privacy Guide
5+: AI security guidelines by Jiadong Chen
200+: More resources on AI security
25+: Impact on Society
Generative AI will have a deep impact on our society on different levels and at different time scales. Usually, we are prone to overestimate the short-term and underestimate the long-term impacts of new technologies.
1x: Macro-, meso- and micro-level impacts of generative AI
1x: Comprehensive paper on fields of impact of generative AI on systems on society
1x: The ILO on how it might affect quality and quantity of jobs
6x: Long-term impacts of AI on humanity
15x: Catastrophic AI risks
2x: Superintelligence, and why we should and how we can make sure that future AIs are aligned with humanity’s goals

50+: AI Regulation
AI regulation will be necessary, if only to define what is permitted in what form in the new fields of application, who is allowed to profit from which intellectual property and how, and who is liable for errors and damage. With its AI Act draft, the EU has started the competition for the toughest AI regulation with a bang. Many insiders hope that other jurisdictions will take a more measured approach and keep their rules adaptable to current technologies (generative AI). In principle, the EU has issued a regulation that essentially addresses the capabilities of pre-generative models.
20+: The new EU AI Act and more resources
1x: US, EU & UK regulation approaches in comparison
30+: A list of the evolving AI regulation approaches around the world
1x: Outlook & the End:
As almost nobody (maybe not even the guys at OpenAI) had predicted how generative AI would take off in 2023, it is really hard to forecast how it will evolve in 2024 and the upcoming years. ZDNet’s Vala Afshar did a great job here. The best outlook for a journey into the unknown is a compilation of outlooks: an exciting overview of what the leading tech fortune tellers like IDC, Gartner, Forrester & Co. expect. Their half-life? A year? A few months? Or just weeks until a groundbreaking development sets us on a new trajectory again?
I am delighted to be on this journey with you! I hope that you were able to take something away from my story. I wish you many, many, many more insights and an insane amount of success in AI!
