Groq trained a state-of-the-art tool use model based on Llama using synthetic data only

Groq has released two open-source models specialized in tool use: Llama-3-Groq-70B-Tool-Use and Llama-3-Groq-8B-Tool-Use. The models were developed in collaboration with Glaive, a company that focuses on helping customers build and enhance custom models using synthetic data. Upon launch, Llama-3-Groq-70B-Tool-Use and Llama-3-Groq-8B-Tool-Use ranked first and third for tool use on the Berkeley Function Calling Leaderboard (BFCL), outperforming models including Claude 3.5 Sonnet and GPT-4o.
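Tool use (also called function calling) means the model answers with a structured call to a developer-defined function rather than free text, and the application executes that call. The sketch below illustrates the pattern with a hypothetical `get_weather` tool: the OpenAI-style JSON schema is representative of what these models consume, and the model response is simulated rather than fetched from the Groq API.

```python
import json

# Tool definition in the OpenAI-style JSON schema commonly used by
# tool-use models (illustrative; names here are hypothetical).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> str:
    # Stand-in implementation; a real tool would query a weather service.
    return f"22C in {city}"

# Simulated model output: instead of prose, the model emits a tool call
# naming the function and its JSON-encoded arguments.
model_tool_call = {"name": "get_weather",
                   "arguments": json.dumps({"city": "Berlin"})}

def dispatch(tool_call: dict, registry: dict) -> str:
    """Route a model-issued tool call to the matching Python function."""
    fn = registry[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

print(dispatch(model_tool_call, {"get_weather": get_weather}))  # 22C in Berlin
```

In a real application the tool result would be appended to the conversation and sent back to the model so it can compose a final answer.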

To achieve this performance using synthetic data exclusively, the models were trained using a combination of full fine-tuning and Direct Preference Optimization (DPO). The synthetic data was decontaminated against the benchmark using the LMSYS method described in this blog post. The analysis found very low contamination rates: 5.6% for the SFT data and 1.3% for the DPO data. According to Groq, this means there is little to no overfitting with respect to the test set data.
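The core idea of decontamination is to flag training samples that overlap with the evaluation set so they can be removed before the contamination rate is reported. The LMSYS approach combines embedding similarity search with an LLM judge to also catch rephrased duplicates; the sketch below shows only a much simpler word n-gram overlap check, as a minimal illustration of how such a rate can be computed.

```python
def ngrams(text: str, n: int = 8) -> set:
    """Set of lowercased word n-grams of a string."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_samples, test_samples, n: int = 8) -> float:
    """Fraction of training samples sharing at least one word n-gram
    with any test sample -- a crude proxy for dataset contamination."""
    test_grams = set()
    for t in test_samples:
        test_grams |= ngrams(t, n)
    flagged = sum(1 for s in train_samples if ngrams(s, n) & test_grams)
    return flagged / len(train_samples) if train_samples else 0.0

train = ["call the get weather function with the city name",
         "a totally unrelated training sample"]
test = ["call the get weather function"]
print(contamination_rate(train, test, n=3))  # 0.5
```

Flagged samples would normally be dropped from the training set; the residual rate is what Groq reports per dataset.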

Although the models were fine-tuned specifically for tool use, there is a slight impact on general-purpose performance. For this reason, Groq recommends a hybrid approach in which queries are routed to the most appropriate model, for instance, Llama-3-Groq-70B-Tool-Use for queries involving function calling, API interactions, or structured data manipulation, and the unmodified Llama 3 70B for any general-purpose queries. Llama-3-Groq-70B-Tool-Use and Llama-3-Groq-8B-Tool-Use are available under the same permissive license as their unmodified counterparts. The models can be previewed through the Groq API, and are available on the GroqCloud Developer Hub and on Hugging Face.
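The recommended hybrid setup can be sketched as a simple router that inspects each query and picks a model id. The keyword heuristic and the model id strings below are assumptions for illustration; a production router would more likely use a small classifier or an LLM-based dispatcher.

```python
# Hypothetical keyword router for the hybrid approach: tool-related
# queries go to the tool-use model, everything else to unmodified Llama 3.
TOOL_KEYWORDS = ("function", "api", "json", "tool", "schema", "call")

def pick_model(query: str) -> str:
    """Return the model id to serve this query (ids are assumed)."""
    q = query.lower()
    if any(keyword in q for keyword in TOOL_KEYWORDS):
        return "llama3-groq-70b-8192-tool-use-preview"
    return "llama3-70b-8192"

print(pick_model("Call the weather API for Berlin"))
print(pick_model("Write a short poem about autumn"))
```

The chosen id would then be passed as the `model` parameter of the chat completion request.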