Arcee AI has released Trinity-Large-Thinking, a model the startup touts as "the strongest open model ever released outside of China." The claim appears to rest on benchmark results Arcee published that pit the model against Chinese competitors MiniMax-M2.7, GLM-5, and Kimi-K2.5. For further comparison, the testing also included Anthropic's Claude Opus 4.6, which Trinity-Large-Thinking beats on the Tau2-Telecom and Tau2-Airline benchmarks. These two benchmarks evaluate how LLM-powered conversational agents perform in controlled customer-service scenarios involving telecom and airline issues.

Released under the permissive Apache 2.0 license, Trinity-Large-Thinking targets developers seeking alternatives to both Chinese models and closed-source systems from OpenAI and Anthropic. Its positioning as a US-developed reasoning model for powering AI agents has become especially relevant since Anthropic decided to charge separately for connecting Claude to the popular AI agent tool OpenClaw, offering it as a pay-as-you-go service not covered by subscriptions. OpenRouter data shows that its predecessor, Trinity-Large-Preview, became the most-used open model in the US for OpenClaw workflows and ranked fourth globally.

Trinity-Large-Thinking excels at multi-turn tool use, context coherence, and long-horizon agent runs. On PinchBench, which measures performance on agent-relevant tasks, it ranks second, just behind Claude Opus 4.6, while costing $0.90 per million output tokens, roughly 96% less than Opus 4.6. The model is available now through Arcee's platform and API and on OpenRouter, and companies can also download the weights for on-premises deployment.