Quantization of LLMs and Fine-Tuning with QLoRA

Join AI Makerspace for a workshop on Quantization of LLMs and Fine-Tuning with QLoRA! Learn how to use quantization for loading, fine-tuning, and performing inference on LLMs!

by Sarah DeSouza

Updated January 09, 2024

One of the biggest problems when it comes to building LLM applications is that the Language Models are really Large.

In fact, they’re often so large that we can’t leverage them directly without a lot of compute and access to some serious hardware. This is where quantization comes in: an often misunderstood technique that makes it more cost-effective to work with LLMs, whether we are 1) loading them onto our local machines or into limited-capacity cloud workspaces, 2) fine-tuning them on a single GPU to improve task-level performance, or 3) performing inference on them during development or in production.

In this event, we’ll talk about the essence of quantization and what it does to reduce the size of the parameters that we load, train, and perform inference on within the LLMs.
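To make the idea concrete, here is a minimal sketch of one simple scheme, absmax quantization to 8-bit integers, written in plain Python. It is an illustration only: the function names and the sample weights are made up for this example, and production libraries use more elaborate block-wise schemes rather than a single scale for a whole tensor.

```python
def quantize_absmax(weights, bits=8):
    """Map floats to signed integers using one shared absmax scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.53, 0.98, -1.27, 0.04]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)

# The round trip is lossy, but the error per weight is at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is now stored as a small integer plus one shared scale, which is where the memory saving comes from; the price is a bounded rounding error.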

We’ll also discuss the intuition behind why this works so well! First, we’ll cover how the bitsandbytes library is used to load quantized versions of LLM parameters directly, bypassing the need to download all model weights for our Mistral-7B demos.
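As a rough sketch of what loading a quantized model can look like, the snippet below uses the Hugging Face transformers integration with bitsandbytes. Several things here are assumptions rather than details from the event: the exact model version, the NF4 and double-quantization settings, and the helper name `load_quantized_model`. Running the loader needs a CUDA GPU with the transformers, accelerate, and bitsandbytes packages installed.

```python
# Assumed model ID: the announcement names Mistral-7B-Instruct but not a version.
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"

# Assumed 4-bit settings (NF4 with double quantization); not taken from the event.
FOUR_BIT_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def load_quantized_model(model_id=MODEL_ID):
    """Load a causal LM with 4-bit weights (hypothetical helper)."""
    # Imported inside the function so the settings above can be inspected
    # on a machine that has no GPU stack installed.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    bnb_config = BitsAndBytesConfig(
        # dtype used for computation; torch.float16 is an option on older GPUs
        bnb_4bit_compute_dtype=torch.bfloat16,
        **FOUR_BIT_SETTINGS,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # let accelerate place layers on available devices
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer
```

Calling `load_quantized_model()` returns a model whose linear layers hold 4-bit weights, along with its tokenizer.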

Then, building on our previous event that covered fine-tuning via Parameter Efficient Fine-Tuning and Low-Rank Adaptation (PEFT-LoRA), we’ll dive into the details of how to perform fine-tuning using a quantized approach based on LoRA known as QLoRA. We will leverage Mistral-7B-Instruct for all loading, fine-tuning, and inference demonstrations, and as always, code will be provided!
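A quick back-of-the-envelope calculation shows why LoRA-style fine-tuning is cheap. LoRA freezes a weight matrix of shape d_out x d_in and trains two small matrices of shapes d_out x r and r x d_in instead; QLoRA additionally keeps the frozen base weights in 4-bit form. The sketch below counts trainable parameters for one square projection at Mistral-7B's hidden size of 4096. The rank of 16 is an illustrative assumption, not a setting taken from the event.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors: (d_out x rank) + (rank x d_in)."""
    return rank * (d_out + d_in)


hidden = 4096  # Mistral-7B hidden size
rank = 16      # assumed LoRA rank, for illustration only

full = hidden * hidden  # parameters in one full 4096 x 4096 projection
adapter = lora_trainable_params(hidden, hidden, rank)
fraction = adapter / full

print(f"full: {full:,}  adapter: {adapter:,}  fraction: {fraction:.4%}")
```

Under these assumptions, the adapter holds under 1% of the parameters of the matrix it adapts, so gradients and optimizer state are only needed for that small fraction.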

Join us live to speed up your LLM application development cycle by streamlining your ability to load and fine-tune models!

You’ll learn:

What is quantization and what do I need to know related to LLMs?
How to load quantized LLMs directly using bitsandbytes
How to fine-tune LLMs using a quantized approach with QLoRA

Speakers:

Greg Loughnane is the Founder & CEO of AI Makerspace, where he serves as an instructor for their LLM Ops: LLMs in Production courses. Since 2021 he has built and led industry-leading Machine Learning & AI boot camp programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he serves as an instructor for their LLM Ops: LLMs in Production courses. A former Data Scientist, he also works as the Founding Machine Learning Engineer at Ox. As an experienced online instructor, curriculum developer, and YouTube creator, he’s always learning, building, shipping, and sharing his work! He loves Dungeons & Dragons and is based in Toronto, Canada.

Follow AI Makerspace on LinkedIn & YouTube to stay updated with workshops, new courses, and opportunities for corporate training.
