One of the biggest problems when it comes to building LLM applications is that the Language Models are really Large.
In fact, they’re often so large that we can’t leverage them directly without a lot of compute and access to some serious hardware. This is where quantization comes in: an often misunderstood technique that makes it more cost-effective to work with LLMs, whether we are 1) loading them onto our local machines or into limited-capacity cloud workspaces, 2) fine-tuning them on a single GPU to improve task-level performance, or 3) performing inference with them during development or in production.
In this event, we’ll talk about the essence of quantization and how it reduces the size of the LLM parameters that we load, train, and perform inference with.
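To make the idea concrete, here is a minimal sketch of one simple scheme, absmax 8-bit quantization, written in plain Python. It is an illustration only and not necessarily the method used in the demos; the function names and the sample weight values are hypothetical, and real libraries work block by block on tensors and use more refined data types.

```python
def quantize_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weight values, chosen only for illustration.
weights = [0.12, -0.53, 0.98, -1.27, 0.05]

quantized, scale = quantize_absmax_int8(weights)
restored = dequantize(quantized, scale)

# Rounding moves each value by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", quantized)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each integer fits in one byte instead of the four bytes of a 32-bit float, at the cost of a small rounding error bounded by half the scale.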
We’ll also discuss the intuition behind why this works so well! First, we’ll cover how the bitsandbytes library is used to load quantized versions of LLM parameters directly, bypassing the need to download all model weights for our Mistral-7B demos.
Then, building on our previous event that covered fine-tuning via Parameter Efficient Fine-Tuning and Low-Rank Adaptation (PEFT-LoRA), we’ll dive into the details of how to perform fine-tuning using a quantized approach based on LoRA known as QLoRA. We will leverage Mistral-7B-Instruct for all loading, fine-tuning, and inference demonstrations, and as always, code will be provided!
Join us live to speed up your LLM application development cycle by streamlining your ability to load and fine-tune models!
- What is quantization and what do I need to know related to LLMs?
- How to load quantized LLMs directly using bitsandbytes
- How to fine-tune LLMs using a quantized approach with QLoRA
- Greg Loughnane is the Founder & CEO of AI Makerspace, where he serves as an instructor for their LLM Ops: LLMs in Production courses. Since 2021 he has built and led industry-leading Machine Learning & AI boot camp programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
- Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he serves as an instructor for their LLM Ops: LLMs in Production courses. A former Data Scientist, he also works as the Founding Machine Learning Engineer at Ox. As an experienced online instructor, curriculum developer, and YouTube creator, he’s always learning, building, shipping, and sharing his work! He loves Dungeons & Dragons and is based in Toronto, Canada.
Follow AI Makerspace on LinkedIn & YouTube to stay updated with workshops, new courses, and opportunities for corporate training.