
Sample-based Learning Methods

University of Alberta

Learn algorithms that derive near-optimal policies through trial-and-error interaction with the environment, relying solely on an agent's experience without prior knowledge.

5 weeks • English • 38,204 enrolled

About this Course

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, learning from the agent's own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, as well as temporal-difference learning methods, including Q-learning. We will wrap up the course by investigating how to get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal-difference updates to radically accelerate learning.

By the end of this course you will be able to:

  • Understand temporal-difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
  • Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
  • Understand the connections between Monte Carlo, dynamic programming, and TD
  • Implement and apply the TD algorithm for estimating value functions
  • Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
  • Understand the difference between on-policy and off-policy control
  • Understand planning with simulated experience (as opposed to classic planning strategies)
  • Implement a model-based approach to RL, called Dyna, which uses simulated experience
  • Conduct an empirical study to see the improvements in sample efficiency when using Dyna
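As a taste of the temporal-difference prediction methods described above, here is a minimal sketch of tabular TD(0) value estimation on a small random-walk environment. This is a hypothetical illustration, not course code: the environment, function name, and hyperparameters are all assumptions chosen for brevity.

```python
import random

# Hypothetical example (not from the course): tabular TD(0) prediction
# on a 5-state random walk. States 0..4; episodes start in state 2;
# stepping left of 0 or right of 4 terminates. Reward is +1 only when
# terminating on the right, so V(s) estimates the probability of
# finishing on the right under a uniformly random policy.

def td0_random_walk(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = [0.0] * 5  # one value estimate per non-terminal state
    for _ in range(episodes):
        s = 2
        while True:
            s_next = s + rng.choice([-1, 1])  # random policy
            if s_next < 0:
                # left terminal: reward 0, V(terminal) = 0
                V[s] += alpha * (0.0 - V[s])
                break
            if s_next > 4:
                # right terminal: reward 1, V(terminal) = 0
                V[s] += alpha * (1.0 - V[s])
                break
            # TD(0) update: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))
            V[s] += alpha * (0.0 + gamma * V[s_next] - V[s])
            s = s_next
    return V

print([round(v, 2) for v in td0_random_walk()])
```

With enough episodes the estimates approach the true values 1/6, 2/6, ..., 5/6, without the agent ever being given the environment's transition dynamics.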

What You'll Learn

  • Understand Temporal-Difference learning and Monte Carlo strategies for estimating value functions from sampled experience
  • Analyze the importance of exploration when learning from sampled experience rather than dynamic programming sweeps within a model
  • Implement and apply the Temporal-Difference algorithm for value function estimation
  • Implement and apply Expected Sarsa and Q-learning methods for control
  • Distinguish between on-policy and off-policy control methods
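To illustrate the last three bullets, here is a hedged sketch contrasting the Q-learning (off-policy) and Expected Sarsa (on-policy) update rules. The function names, Q-table layout, and toy values are assumptions made for illustration, not course code.

```python
# Illustrative sketch (not course code): the two TD control updates,
# applied to a Q-table stored as a dict mapping (state, action) -> value.

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy (max) action in s'."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def expected_sarsa_update(Q, s, a, r, s_next, actions,
                          alpha=0.1, gamma=0.9, eps=0.1):
    """On-policy update for an epsilon-greedy behavior policy:
    bootstrap from the expected value of Q(s', .) under that policy."""
    qs = {b: Q[(s_next, b)] for b in actions}
    greedy = max(qs, key=qs.get)
    n = len(actions)
    # epsilon-greedy probabilities: eps/n for every action,
    # plus the remaining (1 - eps) mass on the greedy action
    expected = sum((eps / n + (1 - eps) * (b == greedy)) * qs[b]
                   for b in actions)
    Q[(s, a)] += alpha * (r + gamma * expected - Q[(s, a)])

# Tiny demonstration with hand-picked values
actions = [0, 1]
Q = {("s0", 0): 0.0, ("s0", 1): 0.0, ("s1", 0): 1.0, ("s1", 1): 0.0}
q_learning_update(Q, "s0", 0, 0.5, "s1", actions)
print(round(Q[("s0", 0)], 4))  # 0.1 * (0.5 + 0.9 * 1.0) = 0.14
```

The only difference is the bootstrap target: Q-learning always backs up the maximum over next actions regardless of how the agent actually behaves, which is what makes it off-policy.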

Prerequisites

  • Basic knowledge of probabilities and expectations
  • Basic linear algebra
  • Basic calculus

Instructors

Martha White

Assistant Professor

Adam White

Assistant Professor

Topics

Machine Learning
Data Science
Algorithms
Computer Science
Artificial Intelligence and Machine Learning (AI/ML)
Simulations
Sampling (Statistics)
Machine Learning Algorithms
Probability Distribution
Reinforcement Learning

Course Info

Platform: Coursera
Price: Free

Skills

Machine Learning
Data Science
Algorithms
Computer Science
Artificial Intelligence
Simulations
Statistics
Learning Algorithms
Probability Distribution
Reinforcement Learning
