Fastino has announced a $17.5 million seed round led by Khosla Ventures that will support the startup as it works towards its mission of developing AI models that are specialized in a single task, can be deployed on diverse hardware setups, and do not compromise on speed or affordability. The round brings Fastino's total funding to nearly $25 million, following a $7 million pre-seed round led by Insight Partners and Microsoft's M12 venture fund last November. Other participants in the current round include Valor Equity Partners, former Docker CEO Scott Johnson, and Weights & Biases co-founders Lukas Biewald (CEO) and Shawn Lewis (CTO).
The AI industry is no stranger to the benefits of smaller specialized models in terms of cost, size, and hardware requirements. Because of these advantages, small language models have gained traction in the open-source ecosystem and have been released by tech giants and research organizations alike. Still, the overall trend continues to favor massive general-purpose models that handle everything from chatbots to code generation. Fastino is boldly betting against this trend by developing specialized models that excel at specific enterprise tasks.
One of the more interesting claims the startup makes is that its models leverage a novel architecture capable of matching or surpassing frontier-model performance on task-specific benchmarks. What's more, Fastino says it is developing its task-specific models using only consumer-grade gaming GPUs with a training budget under $100,000—no H100s required. According to the company, its models are 99 times faster than standard LLMs while being small enough to run efficiently on CPUs. The initial suite includes models for summarization, function calling, text-to-JSON conversion, PII redaction, classification, profanity censoring, and information extraction.
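To make the task-specific idea concrete, here is a minimal sketch of what a request to a single-purpose text-to-JSON model might look like. The endpoint URL, model name, and payload fields are illustrative assumptions, not Fastino's actual API.

```python
import json

# Placeholder endpoint -- NOT a real Fastino URL.
API_URL = "https://api.example.com/v1/text-to-json"

def build_request(text: str, schema: dict) -> dict:
    """Assemble a payload asking a task-specific model to convert
    free text into JSON matching the given schema (hypothetical)."""
    return {
        "model": "text-to-json",  # one narrow task per model
        "input": text,
        "schema": schema,         # the desired output structure
    }

payload = build_request(
    "Order #123 shipped to Alice on 2025-01-15.",
    {"order_id": "string", "recipient": "string", "date": "string"},
)
print(json.dumps(payload, indent=2))
```

The point of the sketch is the shape of the interface: each model does exactly one job, so the request carries only the input and the structure you want back, rather than a long general-purpose prompt.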
Fastino is also disrupting AI pricing models. Instead of per-token fees, the company has introduced flat pricing in the form of subscription plans, which include a free tier that offers up to 10,000 requests per month—running entirely on CPUs to minimize environmental impact. The other two subscriptions Fastino markets are the Pro and Team tiers. The former brings a 100K monthly request limit, a faster GPU-hosted models API, a private mode, and batch processing support. Among other additional benefits, the Team tier ups the request limit to 3 million and expands the context window from 16,000 to 128,000 tokens.
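A quick back-of-the-envelope calculation shows why flat pricing matters at these volumes. The per-token rate and average request size below are illustrative assumptions, not published prices from Fastino or any competitor; only the 100K request limit comes from the tier description above.

```python
# Assumed per-token rate and request size -- illustrative only.
PER_TOKEN_RATE = 0.50 / 1_000_000   # assumed $0.50 per 1M tokens
TOKENS_PER_REQUEST = 2_000          # assumed average tokens per request
REQUESTS_PER_MONTH = 100_000        # the Pro tier's monthly request limit

# What the same volume would cost under per-token billing.
per_token_cost = PER_TOKEN_RATE * TOKENS_PER_REQUEST * REQUESTS_PER_MONTH
print(f"Per-token cost at this volume: ${per_token_cost:,.2f}/month")
```

Under these assumptions, per-token billing comes to $100 a month and scales linearly with token usage, whereas a flat subscription caps spend regardless of how long each request is.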
"AI developers don't need an LLM trained on trillions of irrelevant data points – they need the right model for their task," said Hurn-Maloney, Fastino's COO.