Aleph Alpha's Pharia-1-LLM model family focuses on transparency and compliance

Aleph Alpha recently introduced the Pharia-1-LLM family of models, which comprises the base model Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned, an instruction-tuned variant with additional guardrails. Together, the two models are meant to cover a variety of multilingual tasks in English, German, Spanish, and French. Because it lacks further safety or instruction-following tuning, Pharia-1-LLM-7B-control provides concise, direct responses well suited to text extraction and summarization tasks; Pharia-1-LLM-7B-control-aligned, by contrast, is better suited to conversational applications such as chatbots and AI-powered assistants.

The models are available under the Open Aleph License, which covers non-commercial research and educational use. One of the most notable features of the Pharia-1-LLM models is that they were trained on a carefully curated dataset designed to comply with European copyright and data privacy regulations. The models are proficient at domain-specific applications, demonstrating expertise in the automotive and engineering industries. In addition to openly releasing the model weights, Aleph Alpha is also releasing Scaling, its model training codebase, for non-commercial use. Detailed information on the models' architecture, training dataset, training techniques, and evaluation is available in the official announcement and the Pharia-1-LLM model card.
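
Since the weights are openly released, the models can be loaded with standard open-source tooling. The sketch below shows one plausible way to run Pharia-1-LLM-7B-control for a summarization-style prompt using the Hugging Face transformers library; the repository id, dtype, prompt format, and the need for `trust_remote_code` are illustrative assumptions rather than confirmed usage, so consult the Pharia-1-LLM model card for the exact instructions.

```python
# Minimal sketch: loading and prompting Pharia-1-LLM-7B-control with
# Hugging Face transformers. The repo id below is an assumption for
# illustration; check the official model card for the actual id and
# recommended prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Aleph-Alpha/Pharia-1-LLM-7B-control"  # hypothetical repo id

# Custom model architectures typically require trusting remote code
# (assumption; the model card documents whether this is needed).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half-precision to reduce memory use
    device_map="auto",           # place layers on available GPU(s)/CPU
    trust_remote_code=True,
)

# A concise, direct task of the kind the control model is suited for.
prompt = "Summarize the following text in two sentences:\n<your text here>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```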