Pinecone debuts Pinecone serverless, a revamped vector database
Pinecone, the most popular vector database provider, has launched Pinecone serverless, the next generation of its vector database. With the rising popularity of chat-based interfaces and Q&A applications for generative AI, retrieval augmented generation (RAG) has become the standard method for improving the performance of LLMs, superseding both fine-tuning on proprietary data and prompt engineering as techniques for reducing hallucinations. Moreover, RAG backed by a vector database has proven to deliver faster and more cost-effective results than a standalone vector index, especially when dealing with massive, dynamically growing amounts of vector data.
With serverless indexes, Pinecone has introduced another remarkable improvement to its service, letting developers focus less on data storage and more on other critical (and rewarding) aspects of their workflow. Working with a standalone vector index requires a separate vector data storage solution: the index is built on top of that storage and used only for context-based searches, while every other operation, such as inserting, removing, or updating vectors, must be handled directly in the chosen storage solution. Pinecone's pod-based indexes simplify some of these tasks by unifying vector data storage and indexing into a single workflow. However, clients still have to manage some aspects of their database, including scaling the hardware (pods) they use as their data grows.
Pinecone serverless goes further, eliminating every concern beyond uploading and querying data. For those thinking this may be too good to be true, Pinecone is showcasing a selection of example notebooks featuring use cases from simple semantic search to chatbot agents. These demonstrate Pinecone's claim that serverless indexes lose no functionality, accuracy, or performance relative to pod-based indexing. However, Pinecone still advises prospective customers to estimate their costs and to bear in mind that the public preview offering is not designed for high-throughput applications, which may see reads throttled and will be subject to updated pricing in the future.
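To illustrate how little is left for the developer to manage, here is a minimal sketch of the upload-and-query workflow using Pinecone's Python client (the v3 SDK, which introduced `ServerlessSpec`). The index name, dimension, cloud, and region below are illustrative placeholders, and the snippet assumes a `PINECONE_API_KEY` environment variable; it is a sketch of the workflow, not an official quickstart.

```python
# Hedged sketch: the full serverless workflow is create, upsert, query.
# Assumes the `pinecone` Python client (v3+) and a PINECONE_API_KEY env var;
# index name, dimension, cloud, and region are illustrative placeholders.
import os


def make_records(vectors):
    """Shape raw (id, embedding) pairs into Pinecone's upsert payload format."""
    return [{"id": vid, "values": emb} for vid, emb in vectors]


def main():
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

    # A serverless index needs only a cloud/region spec -- no pod sizing,
    # no capacity planning as the data grows.
    pc.create_index(
        name="quickstart",      # placeholder index name
        dimension=8,            # must match your embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

    index = pc.Index("quickstart")

    # Inserting, updating, and deleting all happen through the same index
    # handle -- no separate storage layer to keep in sync.
    index.upsert(vectors=make_records([("a", [0.1] * 8), ("b", [0.2] * 8)]))

    # Querying returns the nearest neighbors under the chosen metric.
    results = index.query(vector=[0.1] * 8, top_k=2)
    print(results)


if __name__ == "__main__" and os.environ.get("PINECONE_API_KEY"):
    main()
```

The guard at the bottom keeps the sketch importable without credentials; in practice the create/upsert/query calls are the entire surface a serverless client touches.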