DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models

Diffusion models have recently gained enormous popularity thanks to their ability to generate high-quality, controllable images from text prompts written in natural language. However, generating images with the desired details is challenging, because users must write prompts that describe the expected result precisely. Developing such prompts takes trial and error, and the process can often feel random.

DiffusionDB is the first large-scale text-to-image prompt dataset, with 2 million real-world prompt-image pairs. It opens up broad research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools that help users work with these models more easily.

DiffusionDB contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users. Most prompts in the dataset are in English, but it also contains prompts in other languages, such as Spanish, Chinese, and Russian.

DiffusionDB is distributed in a modular file structure: the 2 million images are split into 2,000 folders, each containing 1,000 images and a JSON file that links those 1,000 images to their prompts and hyperparameters. The subfolders are named part-00xxx, and each image is a PNG file with a unique name generated with UUID version 4. The JSON file in a subfolder has the same name as the subfolder and contains key-value pairs that map image file names to their prompts and hyperparameters.
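
As a rough illustration of this layout, here is a minimal Python sketch that pairs the images in one subfolder with their metadata. It assumes the subfolder has already been downloaded and unpacked locally; the function name load_part is hypothetical, and the metadata values are treated as opaque because the exact field names are not described here.

```python
import json
from pathlib import Path


def load_part(part_dir):
    """Return {image_path: metadata} for one DiffusionDB-style subfolder.

    Assumption: the folder holds PNG images plus one JSON file named after
    the folder, keyed by image file name (as described in the text above).
    """
    part_dir = Path(part_dir)
    json_path = part_dir / f"{part_dir.name}.json"
    with json_path.open(encoding="utf-8") as f:
        metadata = json.load(f)

    pairs = {}
    for image_name, info in metadata.items():
        image_path = part_dir / image_name
        if image_path.is_file():  # skip entries whose image is missing locally
            pairs[image_path] = info
    return pairs
```

Each value in the returned dictionary is whatever the JSON file stores for that image, that is, its prompt and hyperparameters.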

DiffusionDB is quite large: 1.6 TB in total. However, thanks to the modular file structure, you can easily load only as many images as you need, together with their prompts and hyperparameters.
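
One way to take advantage of the modular structure is to read subfolders lazily and stop as soon as enough pairs have been collected. The sketch below is only an illustration under stated assumptions: the subfolders are already on disk under a common root directory, their names start with "part-", and iter_pairs and take_pairs are hypothetical helper names, not part of any official DiffusionDB tooling.

```python
import json
from itertools import islice
from pathlib import Path


def iter_pairs(root):
    """Lazily yield (image_path, metadata) pairs, one subfolder at a time."""
    part_dirs = sorted(p for p in Path(root).glob("part-*") if p.is_dir())
    for part_dir in part_dirs:
        json_path = part_dir / f"{part_dir.name}.json"
        if not json_path.is_file():
            continue  # folder without its JSON index: nothing to pair
        with json_path.open(encoding="utf-8") as f:
            metadata = json.load(f)
        for image_name, info in metadata.items():
            yield part_dir / image_name, info


def take_pairs(root, n):
    """Collect at most n pairs; later subfolders are never opened."""
    return list(islice(iter_pairs(root), n))
```

Because iter_pairs is a generator, asking for 1,500 pairs would read the JSON files of only the first two subfolders rather than all 2,000.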

Links to the project, paper, and code are below.

Project - https://poloclub.github.io/diffusiondb/
Paper - https://arxiv.org/abs/2210.14896
Code - https://github.com/poloclub/diffusiondb
Hugging Face - https://huggingface.co/datasets/poloclub/diffusiondb

Also recall another recent addition to the world of AI: eDiff-I, a new-generation generative AI content creation tool that offers unprecedented text-to-image synthesis, instant style transfer, and intuitive painting-with-words capabilities.
