
The massive DeepSeek V3 rivals many of the latest openly available models

DeepSeek, a Chinese AI firm backed by the quantitative hedge fund High-Flyer, recently released DeepSeek V3, a model available for both commercial and non-commercial use. DeepSeek V3 delivers competitive performance approaching that of recent frontier models, and is particularly proficient in coding.

by Ellie Ramirez-Camara

The recently released DeepSeek V3, one of the largest openly available models yet, is a 671 billion parameter mixture-of-experts (MoE) text-only model trained on 14.8 trillion tokens. DeepSeek released the model last week under a permissive license that covers downloading and modification for both commercial and non-commercial purposes.
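For readers unfamiliar with the MoE design, the sketch below illustrates the general idea in PyTorch: a learned router activates only the top-k experts for each token, so only a fraction of the total parameters is used per input. This is a minimal toy example of top-k routing, not DeepSeek's actual implementation; the TopKMoE class and all dimensions are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: a learned router sends each token to
    its top-k experts, so only a fraction of all parameters is active."""

    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])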

Internal testing shows that DeepSeek V3 outperforms competing large, openly available models, including Qwen 2.5 72B and Llama 3.1 405B. Based on benchmarks alone, DeepSeek V3 demonstrates competitive performance against Claude 3.5 Sonnet (1022) and GPT-4o (0513), with notable proficiency in coding: DeepSeek V3 outperforms GPT-4o in every coding benchmark tested, and only falls behind Sonnet on the SWE-bench Verified and Aider-Edit benchmarks.

But perhaps the most remarkable fact about DeepSeek V3 is not its benchmark performance but the hardware used to develop the model. DeepSeek V3 was trained on the firm's cluster of 2,048 Nvidia H800 GPUs for a total of 2.788 million GPU-hours. To put this into perspective, Llama 3.1 405B, a smaller model by total parameter count, used 30.8 million GPU-hours. If, as in DeepSeek V3's technical report, one assumes a cost of $2 per GPU-hour, DeepSeek V3 cost under $6 million to train, while Llama 3.1 405B would have cost just over $60 million, more than ten times as much.
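Under the report's $2 per GPU-hour assumption, the comparison is simple arithmetic; the short sketch below reproduces the numbers (the variable names are ours):

# Back-of-the-envelope training cost comparison; the GPU-hour figures and
# the $2/GPU-hour rate come from DeepSeek V3's technical report.
deepseek_v3_gpu_hours = 2.788e6   # total on the 2,048-GPU H800 cluster
llama_405b_gpu_hours = 30.8e6     # reported for Llama 3.1 405B

rate_usd = 2.0  # assumed rental cost per GPU-hour

deepseek_cost = deepseek_v3_gpu_hours * rate_usd
llama_cost = llama_405b_gpu_hours * rate_usd
print(f"DeepSeek V3:    ${deepseek_cost / 1e6:.2f}M")  # $5.58M
print(f"Llama 3.1 405B: ${llama_cost / 1e6:.2f}M")     # $61.60M
print(f"ratio: {llama_cost / deepseek_cost:.1f}x")     # 11.0x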

Despite the US Department of Commerce's ban on exports of advanced AI chips to China, High-Flyer, the Chinese quantitative hedge fund that backs the DeepSeek research lab, builds its own clusters for model training. Using this infrastructure, DeepSeek has released a series of strong-performing, openly available models. One of DeepSeek's latest releases is R1-Lite-Preview, a model that, like OpenAI's o series, leverages additional test-time (inference) compute to solve more challenging reasoning problems.
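Self-consistency voting is one simple, widely known form of test-time compute: sample many candidate solutions and keep the majority answer. The sketch below illustrates only that generic idea; neither DeepSeek nor OpenAI has fully disclosed its method, and sample_answer is a hypothetical stand-in for a model call.

from collections import Counter
import random

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one stochastic model call; a real system
    would sample a full chain-of-thought completion here."""
    return random.choice(["42", "42", "42", "41"])  # noisy toy "model"

def solve_with_test_time_compute(question: str, n_samples: int = 16):
    """Self-consistency: spend extra inference compute by sampling many
    candidate solutions, then keep the majority answer."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples

answer, agreement = solve_with_test_time_compute("What is 6 * 7?")
print(answer, f"(agreement: {agreement:.0%})")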
