Text to Image Historical Perspective

A brief historical perspective.

Sep 22, 2023

The timeline of significant developments in text-to-image generation is depicted below^[1]. AlignDRAW was an early attempt at generating images from text, but its output had limited realism. It was followed by the Text-conditional GAN, the first end-to-end system from character to pixel. While many GAN-based approaches focused on small datasets, autoregressive methods like OpenAI's DALL-E and Google's Parti tapped into larger datasets.

However, these autoregressive methods were computationally intensive and prone to accumulating errors over sequential generation steps. Recently, diffusion models (DM) have risen to prominence in text-to-image synthesis, gaining significant attention both in academia and on social media.

State of Play

There are four main players in the text-to-image (TTI) space:

Midjourney
Stability.AI
DALL-E 2
Adobe Firefly

Midjourney is the leader in consistently producing high-quality images, but it currently lacks an API for use in development.

[1] RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

Patternworks
