Aligning LLMs: RLAIF

Fine-tuning and alignment are often misunderstood terms when it comes to Large Language Models (LLMs). In this series on Aligning LLMs, we will cover the most popular fine-tuning and alignment methods, as well as emerging techniques, namely:

  1. Reinforcement Learning with Human Feedback (RLHF)
  2. Reinforcement Learning with AI Feedback (RLAIF)
  3. Direct Preference Optimization (DPO)
  4. Reasoning with Reinforced Fine-Tuning (ReFT)

In our second event, we tackle RLAIF, a technique that attempts to solve the fundamental bottleneck of RLHF: namely, that it’s expensive to collect a ton of preference data directly from humans!

In RLAIF, also known as Constitutional AI, we train an AI assistant to help us create and critique responses to harmful prompts, based on high-level guiding principles provided by humans in the form of an AI constitution. A principle might read something like, “Choose the response that is least harmful or toxic.”
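To make that concrete, here is a minimal sketch of what a constitution could look like in code. The principles below are paraphrased illustrations, not the exact principles from Anthropic’s Constitutional AI paper or from our notebook:

```python
# Step 1, sketched: a toy "constitution" of high-level guiding principles.
# These are illustrative paraphrases, not the official Constitutional AI principles.
constitution = [
    "Choose the response that is least harmful, toxic, or dangerous.",
    "Choose the response that does not assist with illegal or unethical activity.",
    "Choose the response that a thoughtful, careful person would be comfortable giving.",
]
```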

RLAIF breaks down into the following steps, each of which we’ll cover in detail (minimal code sketches of the data-generation steps follow the list):

  1. Create an AI constitution: Outline the high-level principles that will guide the critiques that an LLM will make of responses from another LLM.
  2. Generate a revisions dataset: Pass harmful prompts into a helpful LLM to generate an initial response, critique that response according to the constitution, and create a revision in light of the critique.
  3. Supervised Learning for Constitutional AI (SL-CAI): Perform supervised fine-tuning of a pre-trained LLM on the revisions dataset.
  4. Generate a harmlessness dataset: Use the SL-CAI LLM to generate pairs of responses to harmful prompts, then use another LLM to decide which response in each pair is less harmful according to the constitution.
  5. Reward Modeling and Proximal Policy Optimization (PPO): Train the reward model on the harmlessness dataset, then perform PPO against it.
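Here is a minimal sketch of Step 2, assuming the toy constitution above and a small instruction-tuned model from the Hugging Face Hub. The model id, prompt templates, and the make_revision_example helper are illustrative placeholders, not the exact code from our notebook:

```python
# A sketch of Step 2: generate an initial response, critique it against the
# constitution, and revise it. The resulting (prompt, revision) pairs become
# the training data for SL-CAI in Step 3.
from transformers import pipeline

# Any small instruction-tuned model will do for a demo; this id is just an example.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

def llm(prompt: str) -> str:
    # Return only the newly generated text, not the prompt itself.
    out = generator(prompt, max_new_tokens=256, do_sample=True, return_full_text=False)
    return out[0]["generated_text"]

def make_revision_example(harmful_prompt: str, principle: str) -> dict:
    response = llm(harmful_prompt)  # initial (possibly harmful) response
    critique = llm(
        f"Principle: {principle}\n"
        f"Prompt: {harmful_prompt}\n"
        f"Response: {response}\n"
        "Critique the response according to the principle."
    )
    revision = llm(
        f"Original response: {response}\n"
        f"Critique: {critique}\n"
        "Rewrite the response so that it fully addresses the critique."
    )
    return {"prompt": harmful_prompt, "revision": revision}

# Usage: run over a set of red-team prompts, cycling through the principles.
example = make_revision_example("<some red-team prompt>", constitution[0])
```

In the full pipeline, you would loop this over a dataset of red-team prompts, sampling a different principle for each critique-and-revision pass.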

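And a similarly hedged sketch of Step 4, reusing the llm() helper above. In the real pipeline, the two candidate responses would come from the SL-CAI model trained in Step 3, and the resulting (chosen, rejected) pairs feed the reward model and PPO in Step 5:

```python
# A sketch of Step 4: build the harmlessness (preference) dataset with AI feedback.
def make_preference_example(harmful_prompt: str, principle: str) -> dict:
    # Two candidate responses; in practice, sampled from the SL-CAI model.
    response_a = llm(harmful_prompt)
    response_b = llm(harmful_prompt)
    # Ask the feedback model which candidate better follows the constitution.
    judgement = llm(
        f"Principle: {principle}\n"
        f"Prompt: {harmful_prompt}\n"
        f"Response (A): {response_a}\n"
        f"Response (B): {response_b}\n"
        "Which response is less harmful according to the principle? Answer 'A' or 'B'."
    )
    # Crude parse of the verdict; more careful pipelines compare token probabilities.
    preferred_a = judgement.strip().upper().startswith("A")
    chosen, rejected = (response_a, response_b) if preferred_a else (response_b, response_a)
    # (prompt, chosen, rejected) triples train the reward model, which then
    # scores rollouts during PPO in Step 5.
    return {"prompt": harmful_prompt, "chosen": chosen, "rejected": rejected}
```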
Finally, we’ll discuss the limitations of RLAIF, which will motivate the continuation of our series on alignment! We will perform all steps in a Google Colab notebook environment, and all code will be provided directly to attendees!

Join us live to learn:

  • How RLAIF can be used to align LLMs to be helpful and harmless
  • The tradeoffs between RLAIF and RLHF, and the role of AI critique in alignment
  • How to leverage an AI constitution to create revisions and harmlessness datasets

Speakers:

  • Dr. Greg Loughnane is the Co-Founder & CEO of AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
  • Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Previously, he’s held roles as a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.


Follow AI Makerspace on LinkedIn & YouTube to stay updated with workshops, new courses, and opportunities for corporate training.