Cosine has launched Genie, which the company describes as an "AI software engineer". Genie's most noteworthy result is a 30% score on SWE-Bench, a standard benchmark in software engineering. With this score, Genie attains state-of-the-art performance, surpassing Amazon's coding assistant Q, Factory's Code Droid, and Cognition's Devin. Genie can work autonomously or in collaboration with human developers, performing tasks such as fixing bugs, building features, and refactoring code in 15 popular programming languages, including JavaScript, Python, and TypeScript.

The secret to Genie's performance is Cosine's proprietary dataset. The company reports that it has spent months building a dataset based on real development activity from human programmers, enhanced with AI-powered methods that reconstruct the implied reasoning and decision-making that led to the final result. An upshot of Cosine's data pipeline is that as the LLMs used to reconstruct the reasoning behind the data improve, the quality of the extracted data increases, which in turn improves the performance of the final model. Once trained, each model is also added to the pipeline, allowing the next version to learn from its predecessor's mistakes.
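
The feedback loop described above can be pictured with a toy sketch. This is not Cosine's implementation, which has not been published: every name here (`Trace`, `Example`, `build_dataset`, `train`, `collect_mistakes`, `run_rounds`) is hypothetical, the "annotator" stands in for an LLM that reconstructs reasoning, and "training" is replaced by simple memorization so that the loop stays runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Trace:
    """Hypothetical record of real development activity: a task and its final change."""
    task: str
    final_diff: str


@dataclass
class Example:
    """A training example: the trace plus the reasoning reconstructed for it."""
    task: str
    reasoning: str
    final_diff: str


# Stand-in for an LLM that reconstructs the reasoning behind a trace (assumption).
Annotator = Callable[[Trace], str]
# Stand-in for a trained model: maps a task description to a proposed diff.
Model = Callable[[str], str]


def build_dataset(traces: List[Trace], annotator: Annotator) -> List[Example]:
    """Attach reconstructed reasoning to every trace."""
    return [Example(t.task, annotator(t), t.final_diff) for t in traces]


def train(dataset: List[Example]) -> Model:
    """Toy replacement for fine-tuning: the 'model' memorizes the diffs it has seen."""
    known: Dict[str, str] = {ex.task: ex.final_diff for ex in dataset}
    return lambda task: known.get(task, "")


def collect_mistakes(model: Model, traces: List[Trace]) -> List[Trace]:
    """Return the traces whose final diff the model fails to reproduce."""
    return [t for t in traces if model(t.task) != t.final_diff]


def run_rounds(seed: List[Trace], evaluation: List[Trace],
               annotator: Annotator, rounds: int = 2) -> Tuple[Model, List[int]]:
    """Train, evaluate, and feed each model's mistakes into the next round's data."""
    traces = list(seed)
    history: List[int] = []
    model: Model = lambda task: ""
    for _ in range(rounds):
        model = train(build_dataset(traces, annotator))
        mistakes = collect_mistakes(model, evaluation)
        history.append(len(mistakes))
        # The predecessor's mistakes become training material for the next version.
        traces.extend(m for m in mistakes if m not in traces)
    return model, history


seed = [Trace("fix null check", "+if x is None: return")]
evaluation = seed + [Trace("rename helper", "-def f():\n+def load():")]
model, history = run_rounds(seed, evaluation, lambda t: f"reasoning for {t.task}")
```

In this toy run, `history` records how many evaluation traces each successive model gets wrong. A better annotator would change only the `reasoning` field, which is where the article places the expected gains from improved LLMs.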

Cosine shared that it uses long-context models from OpenAI as its foundation models. However, the company aims to apply its flexible training method to release Genie models in various sizes, each suited to different tasks. Cosine also plans to context-extend a leading open-source foundation model so that it has options in both the open- and closed-source ecosystems. More generally, the company plans to strengthen its dataset and model offerings in parallel, gathering as much feedback as possible. The coding assistant space is shaping up to be a tough one given the amount of competition, but Cosine's novel approach could help it make a lasting impact.