Mastering Generative AI: Advanced Fine-Tuning for LLMs
edX
Course
Intermediate
Free to Audit
Certificate


IBM

Build advanced, job-ready skills in fine-tuning with Hugging Face, reinforcement learning with proximal policy optimization (PPO), and optimal solutions for direct preference optimization (DPO).

3 hrs/week · 2 weeks · English · 825 enrolled

About this Course

Employers are actively hunting for AI engineers who know how to fine-tune transformers for generative AI applications. This Mastering Generative AI: Advanced Fine-Tuning for LLMs course is designed to give AI engineers and other AI specialists these highly sought-after skills. AI engineers use advanced fine-tuning to tailor pre-trained large language models (LLMs) to specific tasks, ensuring accuracy and relevance in applications like chatbots, translation, and content generation.

During this course, you'll explore the basics of instruction-tuning with Hugging Face, reward modeling, and training a reward model. You'll look at proximal policy optimization (PPO) with Hugging Face and its configuration, LLMs as distributions, and reinforcement learning from human feedback (RLHF). Plus, you'll investigate direct preference optimization (DPO) with Hugging Face using the partition function. As you progress through the course, you'll also build practical hands-on experience in online labs where you'll work on reward modeling, PPO, and DPO.

If you're keen to extend your gen AI engineering skills to include advanced fine-tuning for LLMs so you can catch the eye of an employer, ENROLL TODAY and power up your resume in just 2 weeks!

Prerequisites: To take this course, you need knowledge of LLMs, instruction-tuning, and reinforcement learning. Familiarity with machine learning and neural network concepts is useful too.
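The course's lab code isn't shown on this page, but the reward-modeling idea it covers can be sketched in a few lines. A reward model is trained on pairs of responses, one preferred ("chosen") and one dispreferred ("rejected"), using a pairwise Bradley-Terry loss that pushes the chosen reward above the rejected one. The function name and scalar setup below are hypothetical for illustration; real implementations operate on batched tensors (e.g. in PyTorch):

```python
import math

def sigmoid(x):
    """Logistic function: maps a real-valued margin to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise Bradley-Terry loss for reward-model training.

    r_chosen / r_rejected are the scalar rewards the model assigns
    to the preferred and dispreferred responses for the same prompt.
    The loss is -log sigmoid(r_chosen - r_rejected): it shrinks as the
    reward margin in favor of the chosen response grows.
    """
    return -math.log(sigmoid(r_chosen - r_rejected))

# A wider margin in favor of the chosen response gives a lower loss.
print(reward_model_loss(2.0, 0.5))  # margin 1.5
print(reward_model_loss(3.0, 0.5))  # margin 2.5, lower loss
```

In practice the rewards come from a transformer with a scalar head, and the loss is averaged over a batch of preference pairs, but the objective is exactly this pairwise term.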

What You'll Learn

  • Advanced, job-ready fine-tuning skills for LLMs that employers are looking for, in just 2 weeks.
  • How to perform instruction-tuning and reward modeling with Hugging Face.
  • How to use large language models (LLMs) as policies and apply reinforcement learning from human feedback (RLHF).
  • How to apply direct preference optimization (DPO) with Hugging Face, using the partition function, and derive an optimal solution to a DPO problem.
  • How to use proximal policy optimization (PPO) with Hugging Face to create a scoring function and perform dataset tokenization.
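To give a flavor of the DPO objective listed above: DPO sidesteps explicit reward modeling by comparing how much the policy and a frozen reference model prefer the chosen response over the rejected one, scaled by a temperature β. The scalar helper below is a hypothetical illustration (real implementations, such as those built on Hugging Face libraries, work with per-token log-probabilities summed over batched sequences):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct preference optimization loss for one preference pair.

    Each argument is a sequence log-probability log p(y | x).
    The implicit reward of a response is beta times the log-ratio
    between the policy and the frozen reference model; the loss is
    -log sigmoid of the chosen-minus-rejected reward margin.
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy still matches the reference, the margin is 0 and the
# loss is log(2); as the policy shifts toward the chosen response
# relative to the reference, the loss falls.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))  # log(2) ~ 0.693
print(dpo_loss(-0.5, -2.0, -1.0, -1.0))  # lower loss
```

Minimizing this over a preference dataset trains the policy directly, with β controlling how far it may drift from the reference model.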

Prerequisites

  • Basic knowledge of LLMs, instruction-tuning, and reinforcement learning. Familiarity with machine learning and neural network concepts.

Instructors

Joseph Santarcangelo

PhD, Data Scientist

Course Info

Platform: edX
Level: Intermediate
Pacing: Unknown
Certificate: Available
Price: Free to Audit

Start Learning Now