The timeline of significant developments in text-to-image generation is depicted below [1]. AlignDRAW was an early attempt at generating images from text, but its outputs lacked realism. It was followed by the text-conditional GAN, the first end-to-end system mapping from characters to pixels. While many GAN-based approaches were trained on small datasets, autoregressive methods such as OpenAI's DALL-E and Google's Parti took advantage of much larger datasets.
However, these autoregressive methods were computationally intensive and prone to error accumulation during sequential generation. More recently, diffusion models (DMs) have risen to prominence in text-to-image synthesis, attracting significant attention both in academia and on social media.
State of Play
There are four main players in the text-to-image (TTI) space:
Midjourney [46]
Stability AI [47]
DALL-E 2 [48]
Adobe Firefly [49]
Midjourney leads in consistently producing high-quality images, but it currently lacks an API for use in development.