
Inverse Reinforcement Learning

Last edit: 19 Apr 2023 23:45 UTC by Phib

From ChatGPT(4):

Inverse Reinforcement Learning (IRL) is a technique in the field of machine learning where an AI system learns the preferences or objectives of an agent, typically a human, by observing their behavior. Unlike traditional Reinforcement Learning (RL), where an agent learns to optimize its actions based on given reward functions, IRL works by inferring the underlying reward function from the demonstrated behavior.

In other words, IRL aims to understand the motivations and goals of an agent by examining their actions in various situations. Once the AI system has learned the inferred reward function, it can then use this information to make decisions that align with the preferences or objectives of the observed agent.

IRL is particularly relevant in the context of AI alignment, as it provides a potential approach to align AI systems with human values. By learning from human demonstrations, AI systems can be designed to better understand and respect the preferences, intentions, and values of the humans they interact with or serve.

(Cunningham's Law this if you please; it was empty when I came across it and I thought something was better than nothing.)
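
For readers who want a concrete picture of the loop described above, here is a minimal sketch of one standard IRL algorithm, maximum-entropy IRL (in the spirit of Ziebart et al. 2008), on a toy chain MDP. The environment, one-hot state features, learning rate, and iteration counts are all illustrative assumptions rather than anything drawn from the posts below; the point is only to show the basic recipe of inferring a reward function from demonstrations and checking it against the behaviour it induces.

```python
# A minimal sketch of maximum-entropy IRL on a toy chain MDP.
# The MDP, features, and hyperparameters are illustrative assumptions,
# not taken from any of the posts listed on this page.
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 5, 2, 10


def transition(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)


def rollout(policy, start=0):
    """Collect a length-HORIZON trajectory of (state, action) pairs."""
    traj, s = [], start
    for _ in range(HORIZON):
        a = policy(s)
        traj.append((s, a))
        s = transition(s, a)
    return traj


def soft_value_iteration(reward):
    """Backward pass: time-indexed soft-optimal policies under a reward estimate."""
    V, policies = np.zeros(N_STATES), []
    for _ in range(HORIZON):
        Q = np.array([[reward[s] + V[transition(s, a)]
                       for a in range(N_ACTIONS)] for s in range(N_STATES)])
        m = Q.max(axis=1, keepdims=True)                      # stabilised log-sum-exp
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        policies.append(np.exp(Q - V[:, None]))               # stochastic policy per step
    return policies[::-1]  # index 0 is the policy for the first time step


def expected_visits(policies, start=0):
    """Forward pass: expected state-visitation counts induced by those policies."""
    d, visits = np.zeros(N_STATES), np.zeros(N_STATES)
    d[start] = 1.0
    for pi in policies:
        visits += d
        d_next = np.zeros(N_STATES)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                d_next[transition(s, a)] += d[s] * pi[s, a]
        d = d_next
    return visits


# The demonstrator optimises a hidden reward (here: reward only at the right end),
# so its behaviour is simply "always move right". The learner never sees this reward.
demos = [rollout(lambda s: 1) for _ in range(20)]

# Empirical state-visitation counts from the demonstrations
# (the expert feature expectations, since the features are one-hot states).
expert_visits = np.zeros(N_STATES)
for traj in demos:
    for s, _ in traj:
        expert_visits[s] += 1.0
expert_visits /= len(demos)

# Gradient ascent on a per-state reward estimate: the MaxEnt IRL gradient is
# (expert visitation counts) - (visitation counts induced by the current estimate).
theta = np.zeros(N_STATES)
for _ in range(200):
    theta += 0.1 * (expert_visits - expected_visits(soft_value_iteration(theta)))

print("Inferred reward (up to shift/scale):", np.round(theta, 2))
```

Running this should recover a reward estimate that is largest at the right end of the chain, matching the demonstrator's hidden objective only up to the usual shift-and-scale ambiguity; that underdetermination is the degeneracy problem several of the posts below discuss.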

Thoughts on “Human-Compatible”

TurnTrout, 10 Oct 2019 5:24 UTC
64 points
34 comments, 5 min read, LW link

Model Mis-specification and Inverse Reinforcement Learning

9 Nov 2018 15:33 UTC
34 points
3 comments, 16 min read, LW link

Our take on CHAI’s research agenda in under 1500 words

Alex Flint, 17 Jun 2020 12:24 UTC
113 points
18 comments, 5 min read, LW link

Learning biases and rewards simultaneously

Rohin Shah, 6 Jul 2019 1:45 UTC
41 points
3 comments, 4 min read, LW link

IRL 1/8: Inverse Reinforcement Learning and the problem of degeneracy

RAISE, 4 Mar 2019 13:11 UTC
20 points
2 comments, 1 min read, LW link
(app.grasple.com)

Delegative Inverse Reinforcement Learning

Vanessa Kosoy, 12 Jul 2017 12:18 UTC
15 points
13 comments, 16 min read, LW link

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?

Jade Bishop, 15 Apr 2019 3:23 UTC
12 points
5 comments, 3 min read, LW link

Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences

orthonormal, 18 Jun 2016 0:55 UTC
17 points
2 comments, 3 min read, LW link

[Question] Is CIRL a promising agenda?

Chris_Leong, 23 Jun 2022 17:12 UTC
28 points
16 comments, 1 min read, LW link

A Survey of Foundational Methods in Inverse Reinforcement Learning

adamk, 1 Sep 2022 18:21 UTC
27 points
0 comments, 12 min read, LW link

Inverse reinforcement learning on self, pre-ontology-change

Stuart_Armstrong, 18 Nov 2015 13:23 UTC
0 points
2 comments, 1 min read, LW link

Biased reward-learning in CIRL

Stuart_Armstrong, 5 Jan 2018 18:12 UTC
8 points
3 comments, 7 min read, LW link

CIRL Wireheading

tom4everitt, 8 Aug 2017 6:33 UTC
3 points
4 comments, 2 min read, LW link

(C)IRL is not solely a learning process

Stuart_Armstrong, 15 Sep 2016 8:35 UTC
1 point
29 comments, 3 min read, LW link

Book Review: Human Compatible

Scott Alexander, 31 Jan 2020 5:20 UTC
78 points
6 comments, 16 min read, LW link
(slatestarcodex.com)

Book review: Human Compatible

PeterMcCluskey, 19 Jan 2020 3:32 UTC
37 points
2 comments, 5 min read, LW link
(www.bayesianinvestor.com)

AXRP Episode 2 - Learning Human Biases with Rohin Shah

DanielFilan, 29 Dec 2020 20:43 UTC
13 points
0 comments, 35 min read, LW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong, 24 Oct 2017 12:03 UTC
3 points
1 comment, 4 min read, LW link

Problems integrating decision theory and inverse reinforcement learning

agilecaveman, 8 May 2018 5:11 UTC
7 points
2 comments, 3 min read, LW link

My take on Michael Littman on “The HCI of HAI”

Alex Flint, 2 Apr 2021 19:51 UTC
59 points
4 comments, 7 min read, LW link

AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell

DanielFilan, 8 Jun 2021 23:20 UTC
22 points
1 comment, 72 min read, LW link

ACI#9: What is Intelligence

Akira Pyinya, 9 Dec 2024 21:54 UTC
3 points
0 comments, 8 min read, LW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea, 4 Nov 2023 17:34 UTC
27 points
0 comments, 1 min read, LW link
(arxiv.org)

RAISE is launching their MVP

null, 26 Feb 2019 11:45 UTC
67 points
1 comment, 1 min read, LW link

Human-AI Collaboration

Rohin Shah, 22 Oct 2019 6:32 UTC
42 points
7 comments, 2 min read, LW link
(bair.berkeley.edu)

Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet

steven0461, 11 Jul 2018 2:59 UTC
28 points
11 comments, 1 min read, LW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong, 13 Oct 2017 11:29 UTC
16 points
6 comments, 4 min read, LW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules, 11 Nov 2021 7:04 UTC
2 points
2 comments, 1 min read, LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23, 1 Jun 2022 13:36 UTC
7 points
0 comments, 7 min read, LW link

Data for IRL: What is needed to learn human values?

Jan Wehner, 3 Oct 2022 9:23 UTC
18 points
6 comments, 12 min read, LW link

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W, 3 Feb 2024 4:00 UTC
14 points
0 comments, 5 min read, LW link