ChatGPT was an overnight success as a consumer product, reaching its first million users in less than five days. Nevertheless, enterprise adoption of this and other generative AI products has been noticeably slower, with many potential enterprise clients still hesitant to adopt and deploy AI solutions as part of their workflows.
Upon realizing that the lack of objective evaluation and security solutions was a major hurdle delaying the adoption of generative AI in enterprise settings, Anand Kannappan and Rebecca Qian decided that “[they were] on a mission to boost enterprise confidence in generative AI.” The product of their labor is Patronus AI, an automated evaluation and security platform for LLMs. With the help of Patronus AI, enterprises can deploy AI products with the confidence that they are using them correctly and safely.
Currently, some of the most frequent and urgent issues with LLMs are the following:
- Hallucinations: LLMs are known for making things up to compensate for their lack of knowledge on a topic, and do so quite confidently. This makes it hard to adopt LLMs in any field where accuracy is a priority.
- Safety: LLMs can unknowingly leak private or sensitive information. They can also behave unsafely and unexpectedly at runtime.
- Alignment: That a given LLM aligns with personal or consumer goals does not mean that it will also align with the goals of an enterprise.
These issues have proven difficult to solve because evaluation and security processes are costly and ineffective. According to Lightspeed, enterprises usually hire expensive external advisors and testers. Since there are no enterprise-specific standards, engineers spend significant time manually designing tests tailored to a given enterprise’s needs. Tests designed for an enterprise at a specific point in time are inherently unscalable, so enterprises often decide to cut back on evaluation as they grow.
The situation only worsens when one considers that the non-deterministic nature of LLMs makes their mistakes hard to predict, and that most models are trained and evaluated on academic benchmarks that only sometimes reflect real-life scenarios, much less enterprise scenarios.
Patronus AI wants to establish itself as a platform that automates scoring, test generation, and benchmarking of AI products. Rather than having developers claim their products are the best solution for consumers everywhere, Patronus AI would step in and provide an unbiased, independent perspective.