
Inverse Reinforcement Learning


Inverse Reinforcement Learning (IRL) is a machine learning technique in which an AI system learns the preferences or objectives of an agent, typically a human, by observing its behavior. Unlike traditional Reinforcement Learning (RL), where an agent learns to optimize its actions against a given reward function, IRL infers the underlying reward function from demonstrated behavior.

In other words, IRL aims to understand an agent's motivations and goals by examining its actions across situations. Once the system has inferred a reward function, it can use it to make decisions that align with the preferences or objectives of the observed agent.
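To make this concrete, the sketch below shows one simple form of IRL: given demonstrations from an expert assumed to be Boltzmann-rational (noisily optimal), it recovers a per-state reward by maximum likelihood. The toy chain MDP, the temperature BETA, and every other detail are illustrative assumptions, not taken from any of the posts below.

```python
# A minimal IRL sketch (illustrative assumptions throughout): a 5-state
# chain MDP where the expert is assumed to be Boltzmann-rational, i.e. it
# picks actions with probability proportional to exp(BETA * Q(s, a)).
# We recover a per-state reward by maximizing the likelihood of the
# demonstrations. Names like BETA and q_values are ours, not from a library.
import numpy as np

N_STATES = 5
ACTIONS = (-1, +1)   # move left / move right
GAMMA = 0.9          # discount factor
BETA = 5.0           # expert rationality (inverse temperature)

def step(s, a):
    """Deterministic transition: move along the chain, clamped at the ends."""
    return min(max(s + a, 0), N_STATES - 1)

def q_values(reward, iters=100):
    """Soft value iteration: V(s) = (1/BETA) * logsumexp(BETA * Q(s, .))."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = np.array([[reward[step(s, a)] + GAMMA * V[step(s, a)]
                       for a in ACTIONS] for s in range(N_STATES)])
        m = Q.max(axis=1)
        V = m + np.log(np.exp(BETA * (Q - m[:, None])).sum(axis=1)) / BETA
    return Q

def log_likelihood(reward, demos):
    """Log-probability of the demonstrated (state, action-index) pairs."""
    Z = BETA * q_values(reward)
    Z = Z - Z.max(axis=1, keepdims=True)
    log_pi = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    return sum(log_pi[s, a] for s, a in demos)

# Demonstrations: the expert always moves right, consistent with a true
# reward that is highest at the rightmost state.
demos = [(s, 1) for s in range(N_STATES - 1)] * 10

# Recover the reward by finite-difference gradient ascent on the likelihood.
reward = np.zeros(N_STATES)
eps, lr = 1e-4, 0.1
for _ in range(200):
    grad = np.zeros(N_STATES)
    for i in range(N_STATES):
        bump = np.zeros(N_STATES)
        bump[i] = eps
        grad[i] = (log_likelihood(reward + bump, demos)
                   - log_likelihood(reward - bump, demos)) / (2 * eps)
    reward += lr * grad

print("inferred reward (identified only up to shift/scale):", reward.round(2))
```

Note that the recovered reward is only identified up to transformations that leave the policy unchanged; this degeneracy is one of the core difficulties discussed in several of the posts below.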

IRL is particularly relevant in the context of AI alignment, as it provides a potential approach to align AI systems with human values. By learning from human demonstrations, AI systems can be designed to better understand and respect the preferences, intentions, and values of the humans they interact with or serve.

Thoughts on “Human-Compatible”

TurnTrout · Oct 10, 2019, 5:24 AM
64 points
34 comments · 5 min read · LW link

Model Mis-specification and Inverse Reinforcement Learning

Nov 9, 2018, 3:33 PM
34 points
3 comments · 16 min read · LW link

Our take on CHAI’s research agenda in under 1500 words

Alex Flint · Jun 17, 2020, 12:24 PM
113 points
18 comments · 5 min read · LW link

Learning biases and rewards simultaneously

Rohin Shah · Jul 6, 2019, 1:45 AM
41 points
3 comments · 4 min read · LW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong · Oct 24, 2017, 12:03 PM
3 points
1 comment · 4 min read · LW link

Problems integrating decision theory and inverse reinforcement learning

agilecaveman · May 8, 2018, 5:11 AM
7 points
2 comments · 3 min read · LW link

My take on Michael Littman on “The HCI of HAI”

Alex Flint · Apr 2, 2021, 7:51 PM
59 points
4 comments · 7 min read · LW link

AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell

DanielFilan · Jun 8, 2021, 11:20 PM
22 points
1 comment · 72 min read · LW link

IRL 1/8: Inverse Reinforcement Learning and the problem of degeneracy

RAISE · Mar 4, 2019, 1:11 PM
20 points
2 comments · 1 min read · LW link
(app.grasple.com)

Delegative Inverse Reinforcement Learning

Vanessa Kosoy · Jul 12, 2017, 12:18 PM
15 points
13 comments · 16 min read · LW link

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?

Jade Bishop · Apr 15, 2019, 3:23 AM
12 points
5 comments · 3 min read · LW link

Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences

orthonormal · Jun 18, 2016, 12:55 AM
17 points
2 comments · 3 min read · LW link

[Question] Is CIRL a promising agenda?

Chris_Leong · Jun 23, 2022, 5:12 PM
28 points
16 comments · 1 min read · LW link

A Survey of Foundational Methods in Inverse Reinforcement Learning

adamk · Sep 1, 2022, 6:21 PM
27 points
0 comments · 12 min read · LW link

Inverse reinforcement learning on self, pre-ontology-change

Stuart_Armstrong · Nov 18, 2015, 1:23 PM
0 points
2 comments · 1 min read · LW link

Biased reward-learning in CIRL

Stuart_Armstrong · Jan 5, 2018, 6:12 PM
8 points
3 comments · 7 min read · LW link

(C)IRL is not solely a learning process

Stuart_Armstrong · Sep 15, 2016, 8:35 AM
1 point
29 comments · 3 min read · LW link

CIRL Wireheading

tom4everitt · Aug 8, 2017, 6:33 AM
3 points
4 comments · 2 min read · LW link

Book Review: Human Compatible

Scott Alexander · Jan 31, 2020, 5:20 AM
78 points
6 comments · 16 min read · LW link
(slatestarcodex.com)

Book review: Human Compatible

PeterMcCluskey · Jan 19, 2020, 3:32 AM
37 points
2 comments · 5 min read · LW link
(www.bayesianinvestor.com)

AXRP Episode 2 - Learning Human Biases with Rohin Shah

DanielFilan · Dec 29, 2020, 8:43 PM
13 points
0 comments · 35 min read · LW link

ACI#9: What is Intelligence

Akira Pyinya · Dec 9, 2024, 9:54 PM
3 points
0 comments · 8 min read · LW link

The Theoretical Reward Learning Research Agenda: Introduction and Motivation

Joar Skalse · Feb 28, 2025, 7:20 PM
25 points
4 comments · 14 min read · LW link

Partial Identifiability in Reward Learning

Joar Skalse · Feb 28, 2025, 7:23 PM
15 points
0 comments · 12 min read · LW link

Misspecification in Inverse Reinforcement Learning

Joar Skalse · Feb 28, 2025, 7:24 PM
19 points
0 comments · 11 min read · LW link

Misspecification in Inverse Reinforcement Learning—Part II

Joar Skalse · Feb 28, 2025, 7:24 PM
9 points
0 comments · 7 min read · LW link

Defining and Characterising Reward Hacking

Joar Skalse · Feb 28, 2025, 7:25 PM
15 points
0 comments · 4 min read · LW link

Other Papers About the Theory of Reward Learning

Joar Skalse · Feb 28, 2025, 7:26 PM
16 points
0 comments · 5 min read · LW link

How to Contribute to Theoretical Reward Learning Research

Joar Skalse · Feb 28, 2025, 7:27 PM
16 points
0 comments · 21 min read · LW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea · Nov 4, 2023, 5:34 PM
27 points
0 comments · 1 min read · LW link
(arxiv.org)

RAISE is launching their MVP

null · Feb 26, 2019, 11:45 AM
67 points
1 comment · 1 min read · LW link

Human-AI Collaboration

Rohin Shah · Oct 22, 2019, 6:32 AM
42 points
7 comments · 2 min read · LW link
(bair.berkeley.edu)

Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet

steven0461 · Jul 11, 2018, 2:59 AM
28 points
11 comments · 1 min read · LW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong · Oct 13, 2017, 11:29 AM
16 points
6 comments · 4 min read · LW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules · Nov 11, 2021, 7:04 AM
2 points
2 comments · 1 min read · LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23 · Jun 1, 2022, 1:36 PM
7 points
0 comments · 7 min read · LW link

Data for IRL: What is needed to learn human values?

Jan Wehner · Oct 3, 2022, 9:23 AM
18 points
6 comments · 12 min read · LW link

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W · Feb 3, 2024, 4:00 AM
16 points
0 comments · 5 min read · LW link