Mistral AI, perhaps the most notable European AI startup, launched a document understanding API this week that processes PDF files with OCR and generates a Markdown document interleaving text and images. As expected, the company's main target is retrieval-augmented generation (RAG). PDF has become the standard format for complex, multimodal documents such as slide decks or scientific papers with equations and figures, but it is not particularly accessible to AI models, which excel at processing raw text. An API like Mistral's can therefore tap into previously inaccessible troves of data.

At first glance, this is an obvious benefit for a startup whose main activity is developing generative AI models. However, the Mistral OCR API should also be of great interest to Mistral's enterprise customers as a way to give their internal AI systems access to proprietary data sources that would otherwise be out of reach. Additionally, since the API outputs formatted, human-readable Markdown instead of intractable walls of text, it seems well suited to just about any use case that requires transforming a PDF document, paper, or slide deck into a text-based, editable format.
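To illustrate how Markdown that interleaves text and images might feed a RAG pipeline, here is a minimal sketch. It assumes the OCR output arrives as one Markdown string per page; that shape, and the `pages_to_chunks` helper, are hypothetical and not taken from Mistral's documentation. The sketch separates image references from text so each page can be indexed as a text chunk with its images attached.

```python
import re

# Matches standard Markdown image syntax: ![alt text](target)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")


def pages_to_chunks(pages):
    """Turn per-page Markdown strings into retrieval chunks.

    `pages` is assumed to be a list of Markdown strings, one per page
    (a hypothetical shape chosen for this illustration).
    """
    chunks = []
    for number, markdown in enumerate(pages, start=1):
        # Collect the targets of every image reference on the page.
        images = IMAGE_REF.findall(markdown)
        # Drop the image references so only indexable text remains.
        text = IMAGE_REF.sub("", markdown).strip()
        chunks.append({"page": number, "text": text, "images": images})
    return chunks
```

Keeping the page number and image targets alongside each text chunk lets a retrieval system cite the source page and surface the relevant figure next to the answer.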

Evaluation results on Mistral's text-only test set (higher is better).

Mistral says that the Mistral OCR API accepts multimodal PDF files as input, something that rivals from Google, Microsoft, and OpenAI cannot do, and that it delivers unprecedented performance, as demonstrated by the company's internal testing of Mistral OCR against rival products. The evaluations show that Mistral OCR excels at accurately transcribing mathematical equations, scanned documents, and multilingual fonts and scripts. The Mistral OCR API is also remarkably fast, processing up to 2,000 pages per minute per node.

Mistral OCR is now the default model for document understanding on the Le Chat service. Via la Plateforme, customers can access the mistral-ocr-latest API and process documents at a rate of a thousand pages per dollar, or almost twice as many using batch inference. As with many of its other services, the Mistral OCR API is available for private deployments on a case-by-case basis, and it will be available via Mistral's cloud and inference partners soon.
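As a rough back-of-the-envelope aid, the quoted pricing and throughput figures can be turned into a small estimator. This is a sketch under stated assumptions: the batch rate is rounded up to exactly twice the standard rate (the article says only "almost twice"), so batch costs come out slightly low, and the throughput figure is treated as a peak, so the time estimate is a lower bound. The function and constant names are illustrative.

```python
# Figures quoted in the article; treat them as approximate.
PAGES_PER_DOLLAR = 1000           # standard processing
BATCH_PAGES_PER_DOLLAR = 2000     # assumption: "almost twice as many" rounded to 2x
PAGES_PER_MINUTE_PER_NODE = 2000  # quoted peak throughput


def estimate_cost_usd(pages: int, batch: bool = False) -> float:
    """Approximate cost in dollars for processing `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = BATCH_PAGES_PER_DOLLAR if batch else PAGES_PER_DOLLAR
    return pages / rate


def estimate_minutes(pages: int, nodes: int = 1) -> float:
    """Lower bound on processing time in minutes at peak throughput."""
    if pages < 0 or nodes < 1:
        raise ValueError("pages must be non-negative and nodes at least 1")
    return pages / (PAGES_PER_MINUTE_PER_NODE * nodes)
```

Under these assumptions, a 50,000-page archive would cost about $50 at the standard rate, roughly $25 with batch inference, and take at least 25 minutes on a single node.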