DeepSeek is making the most of its time in the spotlight. Earlier this week, the AI research lab released Janus-Pro, a multimodal model family that includes 1.5B and 7B parameter models for image understanding and generation. According to DeepSeek's own evaluations, Janus-Pro achieves state-of-the-art scores on DPG-Bench, a collection of dense text prompts designed to test how well visual generation models adhere to prompt specifications. On this benchmark, Janus-Pro-7B surpasses strong rivals including DALL-E 3 and SD3-Medium. The models were also tested on GenEval and on popular visual understanding benchmarks, including GQA and MMMU.
Janus-Pro builds on the earlier research that produced the Janus framework for unified image understanding and generation. Like Janus, Janus-Pro is an autoregressive framework that uses separate encoders for understanding and generation. This sets Janus and Janus-Pro apart from other popular approaches, which usually rely on a single encoder for both tasks. The research team hypothesized that, because the two visual tasks have different requirements, a single image encoder hurts multimodal understanding. They developed Janus-Pro to show that separate encoders mitigate that deficiency in image understanding and deliver strong performance across understanding and generation tasks despite the models' relatively small parameter counts.
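The idea of separate encoders feeding one autoregressive model can be illustrated with a toy sketch. The code below is not DeepSeek's implementation: the function names, the per-row averaging, and the 16-entry codebook are all hypothetical stand-ins chosen to keep the example small. It only shows the routing, in which the understanding path yields continuous features, the generation path yields discrete token ids, and both end up in a single sequence for a shared model.

```python
from typing import List, Union

# Toy grayscale image: rows of pixel values in the range 0..255.
Image = List[List[int]]
Token = Union[float, int, str]


def understanding_encoder(image: Image) -> List[float]:
    """Hypothetical stand-in for a semantic encoder: one continuous feature per row."""
    return [sum(row) / len(row) for row in image]


def generation_encoder(image: Image, codebook_size: int = 16) -> List[int]:
    """Hypothetical stand-in for a discrete tokenizer: one codebook index per pixel."""
    bucket = 256 // codebook_size
    return [pixel // bucket for row in image for pixel in row]


def build_sequence(task: str, image: Image, prompt: str) -> List[Token]:
    """Assemble the single sequence a shared autoregressive model would consume.

    Each task uses its own encoder; only the downstream sequence model is shared.
    """
    text_tokens = prompt.split()
    if task == "understanding":
        # Image features come first, followed by the question about the image.
        return [*understanding_encoder(image), *text_tokens]
    if task == "generation":
        # The text prompt comes first, followed by the image tokens to predict.
        return [*text_tokens, *generation_encoder(image)]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    toy_image = [[0, 255], [128, 128]]
    print(build_sequence("understanding", toy_image, "describe this"))
    print(build_sequence("generation", toy_image, "a cat"))
```

A single-encoder design would replace both encoder functions with one, forcing the same representation to serve both tasks, which is the trade-off the researchers set out to avoid.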
Janus-Pro-7B and 1B can be downloaded from Hugging Face, where users can also take Janus-Pro-7B for a spin. Janus-Pro has some clear limitations: it can only process visual inputs at resolutions up to 384 x 384, and its images often show less fine detail than rivals' outputs because of the limited resolution and reconstruction losses. However, the models are available under an MIT license, which means they can be used for both commercial and non-commercial applications. As with the recently unveiled R1, their open availability makes the Janus-Pro models an attractive option for users looking for strong performance at a low cost and without vendor lock-in.