
Sample-based Learning Methods

University of Alberta

Learn algorithms that derive near-optimal policies through trial-and-error interaction with the environment, relying solely on an agent's experience without prior knowledge.

5 weeks • English • 38,204 enrolled

About this Course

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, learning from the agent's own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, as well as temporal-difference learning methods, including Q-learning. We will wrap up the course by investigating how to get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal-difference updates to radically accelerate learning.

By the end of this course you will be able to:

  • Understand temporal-difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
  • Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
  • Understand the connections between Monte Carlo, dynamic programming, and TD
  • Implement and apply the TD algorithm for estimating value functions
  • Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
  • Understand the difference between on-policy and off-policy control
  • Understand planning with simulated experience (as opposed to classic planning strategies)
  • Implement a model-based approach to RL, called Dyna, which uses simulated experience
  • Conduct an empirical study to see the improvements in sample efficiency when using Dyna
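As a taste of the temporal-difference prediction methods described above, here is a minimal sketch of tabular TD(0) value estimation on a small random-walk environment. This is a hypothetical illustration, not course code: the environment, function name, and hyperparameters are all assumptions chosen for brevity.

```python
import random

# Hypothetical example (not from the course): tabular TD(0) prediction
# on a 5-state random walk. States 0..4; episodes start in state 2;
# stepping left of 0 or right of 4 terminates. Reward is +1 only when
# terminating on the right, so V(s) estimates the probability of
# finishing on the right under a uniformly random policy.

def td0_random_walk(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = [0.0] * 5  # one value estimate per non-terminal state
    for _ in range(episodes):
        s = 2
        while True:
            s_next = s + rng.choice([-1, 1])  # random policy
            if s_next < 0:
                # left terminal: reward 0, V(terminal) = 0
                V[s] += alpha * (0.0 - V[s])
                break
            if s_next > 4:
                # right terminal: reward 1, V(terminal) = 0
                V[s] += alpha * (1.0 - V[s])
                break
            # TD(0) update: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))
            V[s] += alpha * (0.0 + gamma * V[s_next] - V[s])
            s = s_next
    return V

print([round(v, 2) for v in td0_random_walk()])
```

With enough episodes the estimates approach the true values 1/6, 2/6, ..., 5/6, without the agent ever being given the environment's transition dynamics.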

What You'll Learn

  • Understand Temporal-Difference learning and Monte Carlo strategies for estimating value functions from sampled experience
  • Analyze the importance of exploration when learning from sampled experience rather than dynamic programming sweeps within a model
  • Implement and apply the Temporal-Difference algorithm for value function estimation
  • Implement and apply Expected Sarsa and Q-learning methods for control
  • Distinguish between on-policy and off-policy control methods
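To illustrate the last three bullets, here is a hedged sketch contrasting the Q-learning (off-policy) and Expected Sarsa (on-policy) update rules. The function names, Q-table layout, and toy values are assumptions made for illustration, not course code.

```python
# Illustrative sketch (not course code): the two TD control updates,
# applied to a Q-table stored as a dict mapping (state, action) -> value.

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy (max) action in s'."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def expected_sarsa_update(Q, s, a, r, s_next, actions,
                          alpha=0.1, gamma=0.9, eps=0.1):
    """On-policy update for an epsilon-greedy behavior policy:
    bootstrap from the expected value of Q(s', .) under that policy."""
    qs = {b: Q[(s_next, b)] for b in actions}
    greedy = max(qs, key=qs.get)
    n = len(actions)
    # epsilon-greedy probabilities: eps/n for every action,
    # plus the remaining (1 - eps) mass on the greedy action
    expected = sum((eps / n + (1 - eps) * (b == greedy)) * qs[b]
                   for b in actions)
    Q[(s, a)] += alpha * (r + gamma * expected - Q[(s, a)])

# Tiny demonstration with hand-picked values
actions = [0, 1]
Q = {("s0", 0): 0.0, ("s0", 1): 0.0, ("s1", 0): 1.0, ("s1", 1): 0.0}
q_learning_update(Q, "s0", 0, 0.5, "s1", actions)
print(round(Q[("s0", 0)], 4))  # 0.1 * (0.5 + 0.9 * 1.0) = 0.14
```

The only difference is the bootstrap target: Q-learning always backs up the maximum over next actions regardless of how the agent actually behaves, which is what makes it off-policy.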

Prerequisites

  • Basic knowledge of probabilities and expectations
  • Basic linear algebra
  • Basic calculus

Instructors

Martha White

Assistant Professor

Adam White

Assistant Professor

Topics

Machine Learning
Data Science
Algorithms
Computer Science
Artificial Intelligence and Machine Learning (AI/ML)
Simulations
Sampling (Statistics)
Machine Learning Algorithms
Probability Distribution
Reinforcement Learning

Course Info

Platform: Coursera
Price: Free

Skills

Machine Learning
Data Science
Algorithms
Computer Science
Artificial Intelligence
Simulations
Statistics
Learning Algorithms
Probability Distribution
Reinforcement Learning
