Meta releases models trained with multi-token prediction for research purposes
Meta recently announced the release of its models pre-trained for code completion using its multi-token prediction approach. The models, which fall under Meta's non-commercial research license, are now available under gated access at the AI at Meta Hugging Face repository. Meta first announced the upcoming availability of these models about two weeks ago, when the company published an update on several of its research artifacts, including Meta Chameleon (released as a limited version that outputs text only), a text-to-music generation model that accepts conditioning inputs, a digital watermarking technique, and the PRISM dataset.
As the name suggests, multi-token prediction approaches teach models to predict the next n tokens at once, as opposed to the standard next-token prediction approach, which has the model maximize the probability of a single next token in a sequence given the history of previous tokens. Although multi-token prediction is not new, the research team behind Meta's models developed a novel approach that adds no training time or memory overhead, is shown to be beneficial at scale, with larger models solving more problems on coding benchmarks, and is demonstrated to enable inference up to 3 times faster than a next-token baseline.
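To illustrate the idea, the following is a minimal PyTorch sketch of multi-token prediction: a shared trunk produces hidden states, and n independent output heads each predict the token at a different future offset, with their losses summed during training. The layer sizes, module names, and training step below are illustrative assumptions, not Meta's actual architecture or training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Toy language model with one output head per future token offset."""

    def __init__(self, vocab_size=32000, d_model=512, n_future=4):
        super().__init__()
        self.n_future = n_future
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shared trunk: stands in for the transformer body (hypothetical size).
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        )
        # n independent heads instead of a single next-token head.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, tokens):
        # Causal mask so position t only attends to positions <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.trunk(self.embed(tokens), mask=mask)
        # Head i predicts the token at offset i + 1 from every position.
        return [head(hidden) for head in self.heads]

def multi_token_loss(logits_per_head, tokens):
    """Sum cross-entropy losses for predicting the tokens at offsets 1..n."""
    loss = 0.0
    for i, logits in enumerate(logits_per_head):
        offset = i + 1
        pred = logits[:, :-offset, :]   # position t predicts token t + offset
        target = tokens[:, offset:]
        loss = loss + F.cross_entropy(
            pred.reshape(-1, pred.size(-1)), target.reshape(-1)
        )
    return loss

# Toy training step on random token ids.
model = MultiTokenPredictor()
batch = torch.randint(0, 32000, (2, 16))
loss = multi_token_loss(model(batch), batch)
loss.backward()
```

Note that this sketch materializes all heads' logits at once for clarity; the efficiency claims in Meta's work rely on more careful scheduling of the per-head forward and backward passes so that the extra heads do not increase peak memory or training time.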
The model release includes two pairs of 7-billion-parameter baseline and multi-token prediction models. The first pair was trained on 200 billion tokens of code, while the other was trained on 1 trillion tokens.