
Aligning LLMs: RLAIF

AI Makerspace presents part 2 of the Alignment series, Aligning LLMs: RLAIF! They will tackle RLAIF, a technique that attempts to solve the fundamental bottleneck of RLHF: namely, that it’s expensive to collect large amounts of preference data directly from humans!

by Sarah DeSouza

Fine-tuning and alignment are often misunderstood terms in the context of Large Language Models (LLMs). In this series on Aligning LLMs, we will cover the most popular fine-tuning methods for alignment, as well as emerging techniques, namely:

  1. Reinforcement Learning from Human Feedback (RLHF)
  2. Reinforcement Learning from AI Feedback (RLAIF)
  3. Direct Preference Optimization (DPO)
  4. Reasoning with Reinforced Fine-Tuning (ReFT)

In our second event, we tackle RLAIF, a technique that attempts to solve the fundamental bottleneck of RLHF: namely, that it’s expensive to collect large amounts of preference data directly from humans!

In RLAIF, also known as Constitutional AI, we train an AI assistant to help us create and critique responses to harmful prompts, based on high-level guiding principles provided by humans in the form of an AI constitution.
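As a rough illustration (these example principles are ours, not AI Makerspace’s or Anthropic’s actual constitution), a constitution can be as simple as a short list of natural-language principles that later get sampled into critique, revision, and preference-judging prompts:

```python
# A minimal, hypothetical "constitution": a short list of natural-language
# principles. A real constitution would be longer and more carefully worded.
CONSTITUTION = [
    "The response should not help the user do anything illegal, unethical, or dangerous.",
    "The response should be as helpful as possible while avoiding harmful content.",
    "The response should politely decline harmful requests and briefly explain why.",
]
```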

RLAIF breaks down into the following steps, each of which we’ll cover in detail (a minimal code sketch of steps 2 and 4 follows the list):

  1. Create an AI constitution: Outline the high-level principles that will guide the critiques that an LLM will make of responses from another LLM.
  2. Generate a revisions dataset: Pass harmful prompts into a helpful LLM to generate an initial response, critique the response according to the constitution, and create a revision in light of the critique.
  3. Supervised Learning for Constitutional AI (SL-CAI): Fine-tune the pre-trained LLM on the revisions dataset with supervised learning.
  4. Generate a harmlessness dataset: Starting with the SL-CAI LLM, generate two responses per prompt, then use another LLM to decide which one is less harmful according to the constitution.
  5. Reward Modeling and Proximal Policy Optimization (PPO): Train a reward model on the harmlessness dataset and use it as the reward signal for PPO.
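To make steps 2 and 4 concrete, here is a minimal Python sketch of how the two datasets can be generated. The `generate`, `generate_slcai`, and `judge` callables are placeholders for whatever model calls you actually make (e.g., a Hugging Face pipeline or an API client), and `principles` would be a constitution like the hypothetical one above; this is our rough illustration, not the notebook code provided to attendees.

```python
import random
from typing import Callable

# Placeholder type for an LLM call: prompt in, completion out.
Generate = Callable[[str], str]


def make_revision_example(harmful_prompt: str, generate: Generate,
                          principles: list[str]) -> dict:
    """Step 2: initial response -> critique -> revision, guided by a sampled principle."""
    response = generate(harmful_prompt)
    principle = random.choice(principles)
    critique = generate(
        f"Prompt: {harmful_prompt}\nResponse: {response}\n\n"
        f"Identify ways the response violates this principle: {principle}\nCritique:"
    )
    revision = generate(
        f"Prompt: {harmful_prompt}\nResponse: {response}\nCritique: {critique}\n\n"
        "Rewrite the response so that it addresses the critique. Revision:"
    )
    # The (prompt, revision) pairs form the SL-CAI fine-tuning dataset (step 3).
    return {"prompt": harmful_prompt, "response": revision}


def make_preference_example(harmful_prompt: str, generate_slcai: Generate,
                            judge: Generate, principles: list[str]) -> dict:
    """Step 4: sample two SL-CAI responses and let a judge LLM pick the less harmful one."""
    response_a = generate_slcai(harmful_prompt)
    response_b = generate_slcai(harmful_prompt)
    principle = random.choice(principles)
    verdict = judge(
        f"Prompt: {harmful_prompt}\n(A) {response_a}\n(B) {response_b}\n\n"
        f"Principle: {principle}\n"
        "Which response better follows the principle? Answer A or B:"
    )
    if verdict.strip().upper().startswith("A"):
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    # The (prompt, chosen, rejected) triples train the reward model used for PPO (step 5).
    return {"prompt": harmful_prompt, "chosen": chosen, "rejected": rejected}
```

Step 5 then trains a reward model on the (chosen, rejected) pairs and runs PPO against it; in practice this is typically done with an RL library such as Hugging Face’s TRL (e.g., its RewardTrainer and PPOTrainer).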

Finally, we’ll discuss the limitations of RLAIF, which will motivate the continuation of our series on alignment! We will perform all steps in a Google Colab notebook environment, and all code will be provided directly to attendees!

Join us live to learn:

  • How RLAIF can be used to align LLMs to be helpful and harmless
  • The tradeoffs between RLAIF and RLHF, and the role of AI critique in alignment
  • How to leverage an AI constitution to create revision and harmlessness datasets

Speakers:

  • Dr. Greg Loughnane is the Co-Founder & CEO of AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Since 2021 he has built and led industry-leading Machine Learning education programs.  Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher.  He loves trail running and is based in Dayton, Ohio.
  • Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he serves as an instructor for their AI Engineering Bootcamp. Previously, he’s held roles as a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.


Follow AI Makerspace on LinkedIn & YouTube to stay updated with workshops, new courses, and opportunities for corporate training.



