Stable Diffusion: Latent Generative Modeling for Efficient Visual Synthesis


Andreas Blattmann

Andreas works as a Research Scientist at the London-based startup StabilityAI and is a doctoral student at LMU Munich. He co-authored the paper that led to the well-known open-source text-to-image model Stable Diffusion, and co-developed the model as well as its successor versions. His research focuses on finding suitable representations for efficiently applying generative AI to high-dimensional visual data such as images and videos. He is also interested in optimally combining existing probabilistic approaches and generative network architectures to achieve better results more efficiently. Andreas is a proponent of open-source ML models.



The open-source release and subsequent success of Stable Diffusion in 2022 sparked strong interest in the previously little-known approach of 'latent generative modeling' for high-dimensional visual data such as images and videos. By highlighting the differences between these visual data types and other domains such as text, the speaker will motivate the benefits of latent generative modeling and trace the design choices that led to the development of Stable Diffusion and its successor models. He will also discuss his more recent research on latent video modeling and give an outlook on upcoming projects from the vision research team at StabilityAI.

YouTube Video

Watch the full video of this event on YouTube