Aleph Alpha's Pharia-1-LLM model family focuses on transparency and compliance
Aleph Alpha has released Pharia-1-LLM-7B, a new openly available foundation model family designed for concise, length-controlled responses, optimized for German, French, and Spanish, and trained on carefully curated data in compliance with EU regulations.
Aleph Alpha recently introduced the Pharia-1-LLM family of models, comprising Pharia-1-LLM-7B-control and its safety-aligned counterpart, Pharia-1-LLM-7B-control-aligned, which adds extra guardrails. Together, the two models are meant to cover a variety of multilingual tasks in English, German, Spanish, and French. Because it lacks the additional safety alignment, Pharia-1-LLM-7B-control provides concise, direct responses well suited to text extraction and summarization tasks; Pharia-1-LLM-7B-control-aligned, by contrast, is the better fit for conversational applications such as chatbots and AI-powered assistants.
The models are available under the Open Aleph License, which covers non-commercial research and educational use. One of the most notable features of the Pharia-1-LLM models is that they were trained on a painstakingly curated dataset to comply with European copyright and data-privacy regulations. The models also perform well on domain-specific tasks, particularly in the automotive and engineering industries. Alongside the openly released model weights, Aleph Alpha is releasing Scaling, its model-training codebase, for non-commercial purposes. Furthermore, detailed information on the models' architecture, training dataset and techniques, and evaluation results is available in the official announcement and the Pharia-1-LLM model card.
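For readers who want to try the models, the weights can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the repository id Aleph-Alpha/Pharia-1-LLM-7B-control-aligned and a tokenizer that ships a chat template; neither detail comes from the announcement itself, so check the model card before running it.

```python
# Minimal sketch: loading Pharia-1-LLM-7B-control-aligned with transformers.
# The repo id and chat-template usage are assumptions, not confirmed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Aleph-Alpha/Pharia-1-LLM-7B-control-aligned"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 7B weights within ~16 GB
    device_map="auto",
    trust_remote_code=True,      # the model may ship custom architecture code
)

# Chat-style prompt: the aligned variant targets assistant-style use cases.
messages = [{"role": "user", "content": "Summarize the Pharia-1-LLM release in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For extraction or summarization workloads where the extra guardrails are unnecessary, the same sketch applies with the control variant swapped in as the repository id.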