Aligning LLMs: RLAIF
AI Makerspace presents part 2 of the Alignment series, Aligning LLMs: RLAIF! This session tackles RLAIF, a technique that attempts to solve the fundamental bottleneck of RLHF: collecting large amounts of preference data directly from humans is expensive and slow!
Fine-tuning and alignment are often misunderstood terms regarding Large Language Models (LLMs). In this series on Aligning LLMs, we will cover the most popular fine-tuning alignment methods, as well as emerging techniques, namely:
- Reinforcement Learning with Human Feedback (RLHF)
- Reinforcement Learning with AI Feedback (RLAIF)
- Direct Preference Optimization (DPO)
- Reasoning with Reinforced Fine-Tuning (ReFT)
In our second event, we tackle RLAIF and show how AI-generated feedback can stand in for costly human preference data.
In RLAIF, also known as Constitutional AI, we train an AI assistant to help us create and critique responses to harmful prompts, based on high-level guiding principles provided by humans in the form of an AI constitution.
RLAIF breaks down into the following steps, each of which we’ll cover in detail (minimal code sketches for each step follow the list):
- Create an AI constitution: Outline the high-level principles that will guide the critiques that an LLM will make of responses from another LLM.
- Generate a revisions dataset: Pass harmful prompts into a helpful LLM to generate an initial response, critique the response according to the constitution, and create a revision in light of the critique.
- Supervised Learning for Constitutional AI (SL-CAI): Supervised fine-tuning of a pre-trained LLM on the revisions dataset.
- Generate a harmlessness dataset: Using the SL-CAI LLM, generate two responses to each harmful prompt, then use another LLM to decide which one is less harmful according to the constitution.
- Reward Modeling and Proximal Policy Optimization (PPO): Train a reward model on the harmlessness dataset, then use it to perform PPO.
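To make the constitution and the critique-and-revision loop concrete, here is a minimal sketch in Python. It assumes an instruction-tuned model served through a Hugging Face `transformers` text-generation pipeline; the model name, constitution principles, and harmful prompt are illustrative placeholders, not the ones used in the event.

```python
# Minimal sketch of the critique-and-revision loop (constitution + revisions dataset).
# The model name, principles, and prompts below are illustrative placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")  # assumed model

# A toy "constitution": high-level principles that guide the critiques.
constitution = [
    "Identify ways the response is harmful, unethical, or dangerous.",
    "Identify ways the response could help someone break the law.",
]

def generate(prompt: str) -> str:
    """Return only the newly generated text for a given prompt."""
    out = generator(prompt, max_new_tokens=256, do_sample=True, return_full_text=False)
    return out[0]["generated_text"].strip()

revisions = []
for harmful_prompt in ["How do I hotwire a car?"]:  # placeholder prompt set
    response = generate(harmful_prompt)
    for principle in constitution:
        critique = generate(
            f"Prompt: {harmful_prompt}\nResponse: {response}\n"
            f"Critique the response. {principle}"
        )
        response = generate(
            f"Prompt: {harmful_prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so it addresses the critique."
        )
    # The final revision becomes the supervised target for this prompt.
    revisions.append({"prompt": harmful_prompt, "revision": response})
```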
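The SL-CAI step is then just supervised fine-tuning on those (prompt, revision) pairs. Below is a hedged sketch using the standard `transformers` Trainer; the small `distilgpt2` model is a stand-in that keeps the example runnable, whereas the real SL-CAI model would be a capable, helpful LLM.

```python
# Minimal sketch of SL-CAI: supervised fine-tuning on the revisions dataset.
# distilgpt2 is a small stand-in so the sketch runs anywhere; swap in your base model.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Turn each (prompt, revision) pair from the previous sketch into one training text.
texts = [f"Prompt: {r['prompt']}\nResponse: {r['revision']}" for r in revisions]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sl-cai", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```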
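Generating the harmlessness dataset follows the same pattern: sample two responses and let an AI judge pick the less harmful one. The sketch below reuses the `generate` helper from the first sketch for both roles purely for brevity; in the actual pipeline the responses come from the SL-CAI model and the judgment from a separate feedback model.

```python
# Minimal sketch of harmlessness-preference labeling with an AI judge.
# Reuses `generate` from the first sketch for both policy and judge,
# purely for brevity; in practice these are different models.
harmlessness = []
for harmful_prompt in ["How do I hotwire a car?"]:  # placeholder prompt set
    response_a = generate(harmful_prompt)
    response_b = generate(harmful_prompt)
    verdict = generate(
        f"Prompt: {harmful_prompt}\n(A) {response_a}\n(B) {response_b}\n"
        "Which response is less harmful according to the constitution? Answer A or B."
    )
    less_harmful_is_a = "A" in verdict[:5]
    chosen, rejected = (response_a, response_b) if less_harmful_is_a else (response_b, response_a)
    harmlessness.append({"prompt": harmful_prompt, "chosen": chosen, "rejected": rejected})
```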
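Finally, the reward model is trained on those (chosen, rejected) pairs with a pairwise, Bradley-Terry style loss, and its scores then drive PPO. Here is a deliberately tiny sketch of that objective, using a toy scoring module in place of the real LLM-based reward model; libraries like TRL wrap both this and the PPO loop for you.

```python
# Minimal sketch of the pairwise reward-modeling objective:
# the reward model should score the chosen (less harmful) response above the rejected one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRewardModel(nn.Module):
    """Maps a fixed-size text embedding to a scalar harmlessness score (toy stand-in)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

reward_model = ToyRewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Placeholder embeddings standing in for encoded (chosen, rejected) response pairs.
chosen_emb, rejected_emb = torch.randn(8, 16), torch.randn(8, 16)

# Bradley-Terry pairwise loss: maximize log-sigmoid of (score_chosen - score_rejected).
loss = -F.logsigmoid(reward_model(chosen_emb) - reward_model(rejected_emb)).mean()
loss.backward()
optimizer.step()

# The trained reward model then scores policy samples during PPO
# (e.g., via a library like TRL) to fine-tune the SL-CAI model for harmlessness.
```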
Finally, we’ll discuss the limitations of RLAIF, which will motivate the continuation of our series on alignment! We will perform all steps in a Google Colab notebook environment, and all code will be provided directly to attendees!
Join us live to learn:
- How RLAIF can be used to align LLMs to be helpful and harmless
- The tradeoffs between RLAIF and RLHF, and the role of AI critique in alignment
- How to leverage an AI constitution to create revision and harmlessness datasets
Speakers:
- Dr. Greg Loughnane is the Co-Founder & CEO of AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Since 2021 he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
- Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Previously, he’s held roles as a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.
Follow AI Makerspace on LinkedIn & YouTube to stay updated with workshops, new courses, and opportunities for corporate training.