Treeverse, the creator of the data version control system lakeFS, has announced the general availability of lakeFS 1.0. The release comes three years after lakeFS first became available, having been launched to bring Git-like functionality to data lakes. Since that initial release, lakeFS has, according to Treeverse, become "the only scalable, high-performance data version control option in the market suitable for enterprise-level data operations." Last year, Treeverse also launched lakeFS Cloud, a managed data version control service. Perhaps the most notable feature of this release is guaranteed forward and backward compatibility with all releases of lakeFS, which preserves compatibility with a wide range of tools, platforms, and frameworks such as Microsoft Azure, Databricks, and Apache Iceberg.
Technologies such as Databricks and Apache Iceberg can create versions of specific tables or schemas, but that falls well short of a full data version control system, which is what lakeFS offers. Where those tools version individual tables and schemas, lakeFS can version entire data pipelines and their associated workflows. For this reason, lakeFS is better seen as complementing these technologies than competing with them. The metadata lakeFS collects helps users reproduce results and integrate lakeFS into their existing workflows. lakeFS is also format-agnostic, which gives it a broader range of applications than technologies tied to a specific data format.
Treeverse suggests use cases in data science and engineering such as data pre-processing, reproducible model training, and isolated Dev/Test environments, and claims that lakeFS already helps thousands of developers reduce costs, double efficiency, and increase production. That does not mean the company lacks big plans for the future: in an interview with VentureBeat, Treeverse co-founder and CEO Einat Orr said that the company has just begun researching data version control for vector databases.