Image credit: Meta (from the research paper)

Meta released its models trained on multi-token prediction for research purposes

Meta has released pre-trained code completion models licensed for research use. The models were trained using a novel multi-token prediction approach, which improves performance and inference speed without additional training time or memory overhead.

by Ellie Ramirez-Camara

Meta recently announced the release of its models pre-trained for code completion using its multi-token prediction approach. The models, which fall under Meta's non-commercial research license, are now available under gated access at the AI at Meta Hugging Face repository. Meta first announced the upcoming availability of these models about two weeks ago, when the company published an update on several of its research artifacts, including Meta Chameleon (which was released as a limited version that outputs text only), a text-to-music generation model that accepts conditioning inputs, a digital watermarking technique, and the PRISM dataset.

As the name suggests, multi-token prediction trains models to predict the next n tokens at once, in contrast to the standard next-token prediction objective, in which the model maximizes the probability of only the next token in the sequence given the preceding ones. Although multi-token prediction is not new, the research team behind Meta's models developed an approach that adds no training time or memory overhead, is shown to be beneficial at scale, with models solving more problems on coding benchmarks, and is demonstrated to enable inference up to 3 times faster than the baseline.
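
To make the contrast concrete, here is a minimal sketch of the general idea in PyTorch: a shared trunk encodes the sequence once, and n independent output heads each predict one of the next n tokens from the same hidden state, with the per-head losses summed. The layer sizes, the simple linear heads, and the plain summed loss are illustrative assumptions rather than Meta's exact architecture or training recipe (the paper also describes tricks such as sequencing the heads' forward/backward passes to keep memory flat).

```python
# Minimal multi-token prediction sketch (illustrative only): a shared trunk
# plus n independent output heads, each predicting one of the next n tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_future=4):
        super().__init__()
        self.n_future = n_future  # how many future tokens each position predicts
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        # One output head per future offset (simple linear heads for brevity).
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, tokens):
        # Causal mask so position t only attends to positions <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.trunk(self.embed(tokens), mask=mask)
        # logits[k][:, t] is the prediction for token t + 1 + k.
        return [head(hidden) for head in self.heads]

def multi_token_loss(logits, tokens):
    """Sum the cross-entropy losses over the n future-token targets."""
    loss = 0.0
    for k, head_logits in enumerate(logits):
        # Keep only positions that have a (k+1)-steps-ahead target.
        preds = head_logits[:, : tokens.size(1) - (k + 1)]
        targets = tokens[:, k + 1 :]
        loss = loss + F.cross_entropy(
            preds.reshape(-1, preds.size(-1)), targets.reshape(-1)
        )
    return loss

# Toy usage with random token ids standing in for a tokenized code corpus.
model = MultiTokenPredictor()
batch = torch.randint(0, 32000, (2, 128))
loss = multi_token_loss(model(batch), batch)
loss.backward()
```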

The model release includes two pairs of 7-billion-parameter baseline and multi-token prediction models. The first pair was trained on 200 billion tokens of code, while the other was trained on 1 trillion tokens of code.
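
For readers who want to try the checkpoints, the snippet below sketches one way to fetch a gated checkpoint with the huggingface_hub client. The repository ID shown is an assumption, so check the AI at Meta organization page on Hugging Face for the exact name and request access there first.

```python
# Sketch of downloading one of the gated checkpoints with huggingface_hub.
# The repo_id below is an assumption; verify the exact name on the
# AI at Meta Hugging Face page and request access before running this.
from huggingface_hub import login, snapshot_download

login()  # paste an access token for an account that has been granted access

local_dir = snapshot_download(repo_id="facebook/multi-token-prediction")
print("Checkpoint files downloaded to:", local_dir)
```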
