Stable Diffusion

#machine-learning/gnerative-model #machine-learning/image-processing/image-synthesis


Note

Source: 【生成式 AI】Stable Diffusion、DALL-E、Imagen 背後共同的套路 ("[Generative AI] The common recipe behind Stable Diffusion, DALL-E, and Imagen") - YouTube

Three modules: 01:50
1. Text Encoder
2. Generation Model: takes a noise vector as input, conditioned on the text encoding.
3. Decoder
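The three modules above can be sketched as a toy pipeline. This is only an illustration of how the pieces connect: `text_encoder`, `generation_model`, and `decoder` are hypothetical stand-ins written for this note (no trained networks are involved), and the "denoising" update is a placeholder rather than a real diffusion step.

```python
import numpy as np

def text_encoder(prompt: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in: map a prompt to a fixed-size embedding."""
    vec = np.zeros(dim)
    for i, byte in enumerate(prompt.encode("utf-8")):
        vec[i % dim] += byte / 255.0
    return vec / max(len(prompt), 1)

def generation_model(text_emb, rng, latent_shape=(4, 4), steps=10):
    """Hypothetical stand-in: start from pure noise and refine it step by
    step, conditioned on the text embedding."""
    latent = rng.standard_normal(latent_shape)      # noise vector as input
    target = np.resize(text_emb, latent_shape)      # pretend "clean" latent
    for _ in range(steps):
        latent = latent + 0.3 * (target - latent)   # placeholder update
    return latent

def decoder(latent, scale=8):
    """Hypothetical stand-in: upsample the small latent to a full image."""
    return np.kron(latent, np.ones((scale, scale)))

def generate(prompt, seed=0):
    rng = np.random.default_rng(seed)
    text_emb = text_encoder(prompt)                 # module 1
    latent = generation_model(text_emb, rng)        # module 2
    return decoder(latent)                          # module 3

image = generate("a cat in the snow")
print(image.shape)  # (32, 32)
```

In a real system each function would be a separately trained network; the point here is only the data flow from text to embedding to compressed representation to image.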
DALL-E: module 2 can be replaced with an autoregressive model that generates a low-quality, low-resolution, compressed version of the image. 03:11
Text Encoder: GPT, BERT (too old). The size of the text encoder affects output quality more than the size of the diffusion module does. 06:18
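FID (Fréchet Inception Distance): fit a Gaussian to each of two sets of image features (generated and real), then measure the distance between the two Gaussians. This distance is used as a measure of generation quality; smaller is better. 09:09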
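A minimal sketch of the distance computation, assuming the features have already been extracted. The closed-form Fréchet distance between two Gaussians is ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). In practice the features come from an Inception network; here random vectors stand in for them, and the function name `frechet_distance` is chosen for this note.

```python
import numpy as np

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((S_a S_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S_a S_b, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.standard_normal((500, 4))   # stand-in for features of real images
fake = real + 1.0                      # same spread, mean shifted by 1 per dim

print(frechet_distance(real, real))
print(frechet_distance(real, fake))
```

With these stand-in features the first value is approximately 0 (identical sets) and the second approximately 4, since the covariances match and the mean shifts by 1 in each of 4 dimensions.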
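Contrastive Language-Image Pre-Training (CLIP): trained so that matching text and image pairs lie close together in embedding space, while non-matching pairs lie far apart. 11:01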
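The contrastive objective can be sketched as a symmetric cross-entropy over a text-image similarity matrix. This is a minimal NumPy illustration, not CLIP's actual training code: the function name `clip_style_loss`, the fixed temperature of 0.07, and the random embeddings are assumptions made for this note.

```python
import numpy as np

def clip_style_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each array is a matching pair."""
    # Normalise so the dot product is cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature      # (n, n); diagonal = matching pairs

    def diagonal_cross_entropy(scores):
        scores = scores - scores.max(axis=1, keepdims=True)   # stability
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the text-to-image and image-to-text directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

rng = np.random.default_rng(0)
text = rng.standard_normal((4, 16))
image = text + 0.1 * rng.standard_normal((4, 16))   # roughly aligned pairs

aligned = clip_style_loss(text, image)
shuffled = clip_style_loss(text, np.roll(image, 1, axis=0))  # wrong pairing
print(aligned < shuffled)  # True
```

Minimising this loss pulls each matching pair together (high diagonal similarity) and pushes every other pairing in the batch apart.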
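Two ways to train the Decoder: as a super-resolution model, or as the decoder half of an autoencoder (encoder-decoder). In the second case, noise is added to the latent representation. 13:20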
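Adding noise to a latent representation can be sketched with the usual forward-diffusion formula z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps. The linear beta schedule, the 1000 steps, and the latent shape below are assumptions for illustration (the note does not specify them), and a random array stands in for the encoder output.

```python
import numpy as np

def make_alpha_bars(steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, steps)
    return np.cumprod(1.0 - betas)

def add_noise(latent, t, alpha_bars, rng):
    """Return the noised latent at step t and the noise that was mixed in."""
    eps = rng.standard_normal(latent.shape)
    noisy = np.sqrt(alpha_bars[t]) * latent + np.sqrt(1.0 - alpha_bars[t]) * eps
    return noisy, eps

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
latent = rng.standard_normal((4, 8, 8))   # stand-in for an encoder output
noisy, eps = add_noise(latent, t=500, alpha_bars=alpha_bars, rng=rng)
print(noisy.shape)  # (4, 8, 8)
```

During training the generation model would see `noisy` (plus the step index and text encoding) and be asked to predict `eps`; the image itself is never noised in this setup, only its latent.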
Midjourney decodes the latent representation at every step, which is why the generation process is visible as it runs. 18:48
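Decoding at every step can be sketched as below. This is a toy loop under stated assumptions: `decode` is a hypothetical nearest-neighbour upsampler standing in for a trained decoder, and the update rule is a placeholder for a real denoising step. It says nothing about how Midjourney is actually implemented.

```python
import numpy as np

def decode(latent, scale=4):
    """Hypothetical decoder stand-in: upsample the latent to image size."""
    return np.kron(latent, np.ones((scale, scale)))

def denoise_with_previews(steps=5, latent_shape=(4, 4), seed=0):
    """Run a toy denoising loop and decode the latent after every step."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(latent_shape)     # start from pure noise
    target = np.zeros(latent_shape)                # pretend clean latent
    previews = []
    for _ in range(steps):
        latent = latent + 0.5 * (target - latent)  # placeholder update
        previews.append(decode(latent))            # preview for the user
    return previews

previews = denoise_with_previews()
print(len(previews), previews[0].shape)  # 5 (16, 16)
```

A pipeline that decodes only once would call `decode` after the loop instead, which is cheaper but shows nothing until generation finishes.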
