DALL-E 2: Hierarchical Text-Conditional Image Generation with CLIP
2022-08-06 07:54:18【Kun Li】
CLIP is robust to shifts in the image distribution and supports zero-shot transfer. Diffusion models produce diverse samples with good fidelity. DALL-E 2 combines the strengths of both.
The figure above summarizes the architecture well, so let's walk through it. Everything above the dotted line is CLIP. CLIP is pretrained in advance, and its weights are frozen (locked) during DALL-E 2 training: it only supplies embeddings and is not trained again.
CLIP itself is a basic contrastive-learning model. Each modality has its own encoder: a Transformer for text (filling the role BERT plays for language) and a ViT for images. After encoding, the similarity between a text and an image is computed directly as the cosine of their embeddings, so the quality of the two encoders is crucial.
DALL-E 2 is also trained on paired data: a caption and its corresponding image. The caption goes through CLIP's text encoder to produce a text embedding, and the image goes through CLIP's image encoder to produce an image embedding, which serves as the ground truth.
The text embedding is fed into the first module, the prior. The prior is a diffusion model (an autoregressive Transformer can be used instead). It outputs an image embedding that is supervised by the image embedding produced by CLIP, so the prior is essentially a supervised model. The prior is followed by a decoder. In the original DALL-E, the encoder and decoder were trained together as a dVAE; here the decoder is trained on its own and is itself a diffusion model.
Everything below the dotted line is the generative model, which splits a single generation step into two explicit stages: text embedding to image embedding, then image embedding to image. The authors experimented with this explicit generation of the image embedding. The paper calls the approach unCLIP: CLIP turns input text and images into features, while DALL-E 2 runs in the opposite direction, turning text features into image features and then image features into an image, the last step being done by a diffusion model.
The decoder uses both classifier-free guidance and CLIP guidance. Guidance acts on the decoding process: at timestep t the input is a noisy image x_t, the final output is a clean image, and at each step the intermediate result produced by the UNet can be judged by an image classifier.
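To make the contrastive step and the prior's supervision described above concrete, here is a minimal pure-Python sketch. It assumes embeddings are plain lists of floats and uses a symmetric cross-entropy over the matrix of cosine similarities, which is the standard CLIP-style objective. The temperature of 0.07 is an assumed fixed value (CLIP actually learns it), and the function names are hypothetical. The prior's supervision is expressed as a mean-squared error against CLIP's frozen image embedding; this is an illustration, not the paper's training code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (CLIP compares normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric CLIP-style loss: pair i is the positive, all other pairs are negatives.

    temperature=0.07 is an assumed fixed value; CLIP learns it during training.
    """
    n = len(text_embs)
    # logits[i][j] = cos(text_i, image_j) / temperature
    logits = [[cosine_similarity(t, im) / temperature for im in image_embs]
              for t in text_embs]
    # Text -> image: row i should select column i.
    loss_t2i = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Image -> text: column j should select row j.
    loss_i2t = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (loss_t2i + loss_i2t) / 2


def prior_mse_loss(predicted_image_emb, clip_image_emb):
    """Mean squared error between the prior's output and CLIP's frozen image embedding."""
    squared = [(p - g) ** 2 for p, g in zip(predicted_image_emb, clip_image_emb)]
    return sum(squared) / len(clip_image_emb)


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    images = [[0.9, 0.1], [0.2, 0.8]]
    print("contrastive loss:", clip_contrastive_loss(texts, images))
    print("prior loss:", prior_mse_loss([0.8, 0.2], images[0]))
```

Aligned text/image pairs give a loss near zero, while swapping the images makes it large; that gap is the signal that pulls matching embeddings together and pushes mismatched ones apart.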
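Such a classifier is usually trained with a cross-entropy loss (here, for example, for a binary classification), and from it we can compute the gradient of the classification result with respect to the noisy image. That gradient can then be used to guide the diffusion process so that the decoder produces better images.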
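Classifier guidance, as described above, and classifier-free guidance can be sketched with toy numbers. This is a minimal pure-Python illustration, not DALL-E 2's code: the linear softmax classifier `toy_log_prob` and all function names are hypothetical, the classifier gradient is estimated by central finite differences instead of autograd, and the covariance is assumed diagonal. Classifier guidance shifts the denoising mean by `scale * variance * grad log p(y | x_t)`, where the classifier's log-probability is the negative of its cross-entropy loss; CLIP guidance follows the same pattern with a CLIP similarity score in place of the classifier. Classifier-free guidance instead extrapolates from the unconditional noise prediction toward the conditional one.

```python
import math


def toy_log_prob(x, y, weights=((1.0, 0.0), (0.0, 1.0))):
    """Hypothetical linear softmax classifier: log p(y | x) = -cross_entropy.

    In real classifier guidance the classifier is trained on noisy images x_t.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[y] - log_sum_exp


def classifier_gradient(log_prob_fn, x, y, h=1e-5):
    """Central finite-difference estimate of d log p(y | x) / dx."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((log_prob_fn(x_plus, y) - log_prob_fn(x_minus, y)) / (2 * h))
    return grad


def classifier_guided_mean(mean, variance, grad, scale):
    """Shift the denoising mean: mu + scale * Sigma * grad (Sigma assumed diagonal)."""
    return [m + scale * v * g for m, v, g in zip(mean, variance, grad)]


def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """eps_hat = eps_uncond + scale * (eps_cond - eps_uncond); scale=1 gives eps_cond."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    x_t = [0.0, 0.0]  # toy two-pixel "noisy image", also used as the predicted mean
    grad = classifier_gradient(toy_log_prob, x_t, 0)
    guided = classifier_guided_mean(x_t, [1.0, 1.0], grad, scale=2.0)
    print("gradient:", grad)
    print("p(y=0) before:", math.exp(toy_log_prob(x_t, 0)))
    print("p(y=0) after:", math.exp(toy_log_prob(guided, 0)))
    print("CFG:", classifier_free_guidance([0.0, 1.0], [1.0, 1.0], scale=3.0))
```

Following the gradient raises the classifier's probability for the target class. A guidance scale above 1 pushes the prediction past the conditional estimate, trading sample diversity for fidelity.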