
Reinforcement Learning

Last edit: 29 May 2023 19:17 UTC by Roman Leventov

Within the field of machine learning, reinforcement learning refers to the study of how to train agents to complete tasks by updating (“reinforcing”) them with feedback signals.

Related: Inverse Reinforcement Learning, Machine learning, Friendly AI, Game Theory, Prediction, Agency.

Consider an agent that receives an input informing it of the environment’s state. Based only on that information, the agent must decide which action to take from a set of available actions. The chosen action changes the state of the environment, which produces a new input, and so on, with the environment also presenting the agent at each step with a reward (or reinforcement signal) reflecting the consequences of its actions. In “policy gradient” approaches, the reinforcement signal is used to update the agent (the “policy”), although sometimes an agent will instead do limited online (model-based) heuristic search to optimize the reward signal plus a heuristic evaluation.
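
To make this loop concrete, here is a minimal sketch in Python. The corridor environment, the agent interface, and every name in it are invented for illustration; they are not drawn from any particular RL library.

```python
import random

class Corridor:
    """Toy environment: the agent starts at cell 0 and is rewarded at cell 4."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.pos = max(0, min(4, self.pos + action))
        reward = 1.0 if self.pos == 4 else 0.0  # the reinforcement signal
        done = self.pos == 4
        return self.pos, reward, done

class RandomAgent:
    """Placeholder policy; a learning agent would adjust itself in update()."""
    def act(self, state):
        return random.choice([-1, +1])

    def update(self, state, action, reward, next_state):
        pass  # e.g. a policy-gradient step would go here

env, agent = Corridor(), RandomAgent()
state, done = env.reset(), False
while not done:
    action = agent.act(state)                        # decide based only on the state
    next_state, reward, done = env.step(action)      # the action changes the environment
    agent.update(state, action, reward, next_state)  # feedback "reinforces" the policy
    state = next_state
```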

RL is distinguished from energy-based architectures such as Active Inference, the Joint Embedding Predictive Architecture (JEPA), and GFlowNets.

Exploration and Optimization

Since randomly selecting actions results in poor performance, one of the biggest problems in reinforcement learning is exploring the available set of actions enough to avoid getting stuck in sub-optimal choices and to move on to better ones.

This is the problem of exploration, which is best illustrated by the most studied reinforcement learning problem: the k-armed bandit. In it, an agent has to decide which sequence of levers to pull in a gambling room, having no information about each machine’s probability of paying out beyond the rewards it receives each time. The problem revolves around deciding which lever is optimal and what criteria define it as such.
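
To make the trade-off concrete, here is a minimal sketch of one classic answer, an ε-greedy strategy, on a k-armed bandit. The payout probabilities and the value of ε below are invented for illustration.

```python
import random

k = 5
true_payout = [0.2, 0.5, 0.1, 0.8, 0.3]  # hidden from the agent (illustrative values)
counts = [0] * k                          # pulls per lever
estimates = [0.0] * k                     # running mean reward per lever
epsilon = 0.1                             # fraction of pulls spent exploring

for t in range(10_000):
    if random.random() < epsilon:
        lever = random.randrange(k)       # explore: try a random lever
    else:                                 # exploit: pull the best-looking lever
        lever = max(range(k), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_payout[lever] else 0.0
    counts[lever] += 1
    estimates[lever] += (reward - estimates[lever]) / counts[lever]  # incremental mean

print(max(range(k), key=lambda a: estimates[a]))  # usually lever 3, the true optimum
```

With ε = 0.1 the agent spends a tenth of its pulls exploring at random, which on a problem this small is typically enough for the running means to single out the best lever; setting ε = 0 risks locking in whichever lever happened to pay out early.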

In parallel with implementing exploration, it is still necessary to choose the criteria that make one action optimal compared to another. The study of this question has led to several methods, from brute force to approaches that take into account temporal differences in the received reward. Despite this, and despite the strong results reinforcement learning methods have obtained on small problems, the field suffers from a lack of scalability, having difficulty with larger, close-to-human scenarios.
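
As a sketch of the temporal-difference idea just mentioned: instead of waiting for a final outcome, the agent nudges its value estimate for a state toward the immediate reward plus the discounted estimate for the successor state. Below is a minimal tabular TD(0) update; the state names, step size, and discount factor are placeholders.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Move V[state] toward the one-step bootstrapped target."""
    td_error = reward + gamma * V[next_state] - V[state]  # the temporal difference
    V[state] += alpha * td_error
    return V

V = {"s0": 0.0, "s1": 0.0}                 # illustrative two-state value table
td0_update(V, "s0", reward=1.0, next_state="s1")
print(V["s0"])                             # 0.1: nudged toward the target of 1.0
```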

Further Reading & References

See Also

Think carefully before calling RL policies “agents”
TurnTrout, 2 Jun 2023 3:46 UTC · 133 points · 36 comments · 4 min read · LW link

Reward is not the optimization target
TurnTrout, 25 Jul 2022 0:03 UTC · 376 points · 123 comments · 10 min read · LW link · 3 reviews

Draft papers for REALab and Decoupled Approval on tampering
28 Oct 2020 16:01 UTC · 47 points · 2 comments · 1 min read · LW link

EfficientZero: How It Works
1a3orn, 26 Nov 2021 15:17 UTC · 297 points · 50 comments · 29 min read · LW link · 1 review

Models Don’t “Get Reward”
Sam Ringer, 30 Dec 2022 10:37 UTC · 313 points · 61 comments · 5 min read · LW link · 1 review

AGI will have learnt utility functions
beren, 25 Jan 2023 19:42 UTC · 36 points · 3 comments · 13 min read · LW link

Interpreting the Learning of Deceit
RogerDearnaley, 18 Dec 2023 8:12 UTC · 30 points · 14 comments · 9 min read · LW link

Why almost every RL agent does learned optimization
Lee Sharkey, 12 Feb 2023 4:58 UTC · 32 points · 3 comments · 5 min read · LW link

Remaking EfficientZero (as best I can)
Hoagy, 4 Jul 2022 11:03 UTC · 36 points · 9 comments · 22 min read · LW link

Jitters No Evidence of Stupidity in RL
1a3orn, 16 Sep 2021 22:43 UTC · 96 points · 18 comments · 3 min read · LW link

Book Review: Reinforcement Learning by Sutton and Barto
billmei, 20 Oct 2020 19:40 UTC · 52 points · 3 comments · 10 min read · LW link

DeepMind: Generally capable agents emerge from open-ended play
Daniel Kokotajlo, 27 Jul 2021 14:19 UTC · 247 points · 53 comments · 2 min read · LW link (deepmind.com)

LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem
Steven Byrnes, 8 May 2023 19:35 UTC · 137 points · 37 comments · 15 min read · LW link

Multi-dimensional rewards for AGI interpretability and control
Steven Byrnes, 4 Jan 2021 3:08 UTC · 19 points · 8 comments · 10 min read · LW link

Training My Friend to Cook
lsusr, 29 Aug 2021 5:54 UTC · 70 points · 33 comments · 3 min read · LW link

Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts
sage_bergerson, 7 Sep 2021 16:11 UTC · 5 points · 0 comments · 1 min read · LW link

My take on Vanessa Kosoy’s take on AGI safety
Steven Byrnes, 30 Sep 2021 12:23 UTC · 103 points · 10 comments · 31 min read · LW link

Inference-Only Debate Experiments Using Math Problems
6 Aug 2024 17:44 UTC · 31 points · 0 comments · 2 min read · LW link

[Aspiration-based designs] 1. Informal introduction
28 Apr 2024 13:00 UTC · 41 points · 4 comments · 8 min read · LW link

(Appetitive, Consummatory) ≈ (RL, reflex)
Steven Byrnes, 15 Jun 2024 15:57 UTC · 38 points · 1 comment · 3 min read · LW link

Scalar reward is not enough for aligned AGI
Peter Vamplew, 17 Jan 2022 21:02 UTC · 13 points · 3 comments · 11 min read · LW link

Emotions = Reward Functions
jpyykko, 20 Jan 2022 18:46 UTC · 16 points · 10 comments · 5 min read · LW link

Hedonic Loops and Taming RL
beren, 19 Jul 2023 15:12 UTC · 20 points · 14 comments · 9 min read · LW link

[Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning
Steven Byrnes, 23 Feb 2022 14:44 UTC · 52 points · 27 comments · 20 min read · LW link

[Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL
Steven Byrnes, 2 Mar 2022 15:26 UTC · 68 points · 17 comments · 16 min read · LW link

Reinforcement Learning in the Iterated Amplification Framework
William_S, 9 Feb 2019 0:56 UTC · 25 points · 12 comments · 4 min read · LW link

Refinement of Active Inference agency ontology
Roman Leventov, 15 Dec 2023 9:31 UTC · 16 points · 0 comments · 5 min read · LW link (arxiv.org)

Reinforcement learning with imperceptible rewards
Vanessa Kosoy, 7 Apr 2019 10:27 UTC · 26 points · 1 comment · 32 min read · LW link

RLHF
Ansh Radhakrishnan, 12 May 2022 21:18 UTC · 18 points · 5 comments · 5 min read · LW link

Reinforcement Learning: A Non-Standard Introduction (Part 1)
royf, 29 Jul 2012 0:13 UTC · 33 points · 19 comments · 2 min read · LW link

Reinforcement, Preference and Utility
royf, 8 Aug 2012 6:23 UTC · 14 points · 5 comments · 3 min read · LW link

Reinforcement Learning: A Non-Standard Introduction (Part 2)
royf, 2 Aug 2012 8:17 UTC · 16 points · 7 comments · 3 min read · LW link

Applying reinforcement learning theory to reduce felt temporal distance
Kaj_Sotala, 26 Jan 2014 9:17 UTC · 19 points · 6 comments · 3 min read · LW link

Imitative Reinforcement Learning as an AGI Approach
TIMUR ZEKI VURAL, 21 May 2018 14:47 UTC · 2 points · 1 comment · 1 min read · LW link

Shard Theory: An Overview
David Udell, 11 Aug 2022 5:44 UTC · 166 points · 34 comments · 10 min read · LW link

Delegative Reinforcement Learning with a Merely Sane Advisor
Vanessa Kosoy, 5 Oct 2017 14:15 UTC · 1 point · 2 comments · 14 min read · LW link

Inverse reinforcement learning on self, pre-ontology-change
Stuart_Armstrong, 18 Nov 2015 13:23 UTC · 0 points · 2 comments · 1 min read · LW link

Clarification: Behaviourism & Reinforcement
Zaine, 10 Oct 2012 5:30 UTC · 13 points · 30 comments · 2 min read · LW link

Is Global Reinforcement Learning (RL) a Fantasy?
[deleted], 31 Oct 2016 1:49 UTC · 5 points · 50 comments · 12 min read · LW link

Delegative Inverse Reinforcement Learning
Vanessa Kosoy, 12 Jul 2017 12:18 UTC · 15 points · 13 comments · 16 min read · LW link

Vector-Valued Reinforcement Learning
orthonormal, 1 Nov 2016 0:21 UTC · 2 points · 1 comment · 4 min read · LW link

Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences
orthonormal, 18 Jun 2016 0:55 UTC · 17 points · 2 comments · 3 min read · LW link

Evolution as Backstop for Reinforcement Learning: multi-level paradigms
gwern, 12 Jan 2019 17:45 UTC · 19 points · 0 comments · 1 min read · LW link (www.gwern.net)

IRL 1/8: Inverse Reinforcement Learning and the problem of degeneracy
RAISE, 4 Mar 2019 13:11 UTC · 20 points · 2 comments · 1 min read · LW link (app.grasple.com)

Keeping up with deep reinforcement learning research: /r/reinforcementlearning
gwern, 16 May 2017 19:12 UTC · 6 points · 2 comments · 1 min read · LW link (www.reddit.com)

psychology and applications of reinforcement learning: where do I learn more?
jsalvatier, 26 Jun 2011 20:56 UTC · 5 points · 1 comment · 1 min read · LW link

Reward/value learning for reinforcement learning
Stuart_Armstrong, 2 Jun 2017 16:34 UTC · 0 points · 2 comments · 2 min read · LW link

Model Mis-specification and Inverse Reinforcement Learning
9 Nov 2018 15:33 UTC · 34 points · 3 comments · 16 min read · LW link

“Human-level control through deep reinforcement learning”—computer learns 49 different games
skeptical_lurker, 26 Feb 2015 6:21 UTC · 19 points · 19 comments · 1 min read · LW link

Making a Difference Tempore: Insights from ‘Reinforcement Learning: An Introduction’
TurnTrout, 5 Jul 2018 0:34 UTC · 33 points · 6 comments · 8 min read · LW link

Problems integrating decision theory and inverse reinforcement learning
agilecaveman, 8 May 2018 5:11 UTC · 7 points · 2 comments · 3 min read · LW link

“AIXIjs: A Software Demo for General Reinforcement Learning”, Aslanides 2017
gwern, 29 May 2017 21:09 UTC · 7 points · 1 comment · 1 min read · LW link (arxiv.org)

Some work on connecting UDT and Reinforcement Learning
IAFF-User-111, 17 Dec 2015 23:58 UTC · 4 points · 5 comments · 1 min read · LW link (drive.google.com)

[Question] What problem would you like to see Reinforcement Learning applied to?
Julian Schrittwieser, 8 Jul 2020 2:40 UTC · 43 points · 4 comments · 1 min read · LW link

200 COP in MI: Interpreting Reinforcement Learning
Neel Nanda, 10 Jan 2023 17:37 UTC · 25 points · 1 comment · 10 min read · LW link

[Question] What messy problems do you see Deep Reinforcement Learning applicable to?
Riccardo Volpato, 5 Apr 2020 17:43 UTC · 5 points · 0 comments · 1 min read · LW link

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?
Jade Bishop, 15 Apr 2019 3:23 UTC · 12 points · 5 comments · 3 min read · LW link

Podcast with Divia Eden on operant conditioning
DanielFilan, 15 Jan 2023 2:44 UTC · 14 points · 0 comments · 1 min read · LW link (youtu.be)

Modeling the capabilities of advanced AI systems as episodic reinforcement learning
jessicata, 19 Aug 2016 2:52 UTC · 4 points · 6 comments · 5 min read · LW link

FHI is accepting applications for internships in the area of AI Safety and Reinforcement Learning
crmflynn, 7 Nov 2016 16:33 UTC · 9 points · 0 comments · 1 min read · LW link (www.fhi.ox.ac.uk)

We have promising alignment plans with low taxes
Seth Herd, 10 Nov 2023 18:51 UTC · 40 points · 9 comments · 5 min read · LW link

Heritability, Behaviorism, and Within-Lifetime RL
Steven Byrnes, 2 Feb 2023 16:34 UTC · 39 points · 3 comments · 4 min read · LW link

Beyond Reinforcement Learning: Predictive Processing and Checksums
lsusr, 15 Feb 2023 7:32 UTC · 12 points · 14 comments · 3 min read · LW link

Human beats SOTA Go AI by learning an adversarial policy
Vanessa Kosoy, 19 Feb 2023 9:38 UTC · 58 points · 32 comments · 1 min read · LW link (goattack.far.ai)

RL, but don’t do anything I wouldn’t do
Gunnar_Zarncke, 7 Dec 2024 22:54 UTC · 63 points · 5 comments · 1 min read · LW link (arxiv.org)

A brief review of the reasons multi-objective RL could be important in AI Safety Research
Ben Smith, 29 Sep 2021 17:09 UTC · 30 points · 7 comments · 10 min read · LW link

AXRP Episode 1 - Adversarial Policies with Adam Gleave
DanielFilan, 29 Dec 2020 20:41 UTC · 12 points · 5 comments · 34 min read · LW link

AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch
DanielFilan, 29 Dec 2020 20:45 UTC · 27 points · 0 comments · 28 min read · LW link

Is RL involved in sensory processing?
Steven Byrnes, 18 Mar 2021 13:57 UTC · 21 points · 21 comments · 5 min read · LW link

My AGI Threat Model: Misaligned Model-Based RL Agent
Steven Byrnes, 25 Mar 2021 13:45 UTC · 74 points · 40 comments · 16 min read · LW link

My take on Michael Littman on “The HCI of HAI”
Alex Flint, 2 Apr 2021 19:51 UTC · 59 points · 4 comments · 7 min read · LW link

Mode collapse in RL may be fueled by the update equation
19 Jun 2023 21:51 UTC · 49 points · 10 comments · 8 min read · LW link

Big picture of phasic dopamine
Steven Byrnes, 8 Jun 2021 13:07 UTC · 66 points · 22 comments · 36 min read · LW link

Supplement to “Big picture of phasic dopamine”
Steven Byrnes, 8 Jun 2021 13:08 UTC · 14 points · 2 comments · 9 min read · LW link

Reward Is Not Enough
Steven Byrnes, 16 Jun 2021 13:52 UTC · 123 points · 19 comments · 10 min read · LW link · 1 review

A model of decision-making in the brain (the short version)
Steven Byrnes, 18 Jul 2021 14:39 UTC · 20 points · 0 comments · 3 min read · LW link

MDPs and the Bellman Equation, Intuitively Explained
Jack O'Brien, 27 Dec 2022 5:50 UTC · 11 points · 3 comments · 14 min read · LW link

Notes on Meta’s Diplomacy-Playing AI
Erich_Grunewald, 22 Dec 2022 11:34 UTC · 14 points · 2 comments · 14 min read · LW link (www.erichgrunewald.com)

On the Importance of Open Sourcing Reward Models
elandgre, 2 Jan 2023 19:01 UTC · 18 points · 5 comments · 6 min read · LW link

[Linkpost] DreamerV3: A General RL Architecture
simeon_c, 12 Jan 2023 3:55 UTC · 23 points · 3 comments · 1 min read · LW link (arxiv.org)

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning
Roman Leventov, 12 Jan 2023 16:43 UTC · 17 points · 2 comments · 2 min read · LW link (arxiv.org)

Experiment Idea: RL Agents Evading Learned Shutdownability
Leon Lang, 16 Jan 2023 22:46 UTC · 31 points · 7 comments · 17 min read · LW link (docs.google.com)

Parameter Scaling Comes for RL, Maybe
1a3orn, 24 Jan 2023 13:55 UTC · 99 points · 3 comments · 14 min read · LW link

Temporally Layered Architecture for Adaptive, Distributed and Continuous Control
Roman Leventov, 2 Feb 2023 6:29 UTC · 6 points · 4 comments · 1 min read · LW link (arxiv.org)

Powerful mesa-optimisation is already here
Roman Leventov, 17 Feb 2023 4:59 UTC · 35 points · 1 comment · 2 min read · LW link (arxiv.org)

Validator models: A simple approach to detecting goodharting
beren, 20 Feb 2023 21:32 UTC · 14 points · 1 comment · 4 min read · LW link

Human-level Full-Press Diplomacy (some bare facts).
Cleo Nardo, 22 Nov 2022 20:59 UTC · 50 points · 7 comments · 3 min read · LW link

Krueger Lab AI Safety Internship 2024
Joey Bream, 24 Jan 2024 19:17 UTC · 3 points · 0 comments · 1 min read · LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search
Arjun Panickssery, 12 Feb 2024 0:56 UTC · 55 points · 13 comments · 3 min read · LW link

The Carnot Engine of Economics
StrivingForLegibility, 9 Aug 2024 15:59 UTC · 5 points · 0 comments · 5 min read · LW link

[Aspiration-based designs] 2. Formal framework, basic algorithm
28 Apr 2024 13:02 UTC · 16 points · 2 comments · 16 min read · LW link

Measuring Learned Optimization in Small Transformer Models
J Bostock, 8 Apr 2024 14:41 UTC · 22 points · 0 comments · 11 min read · LW link

Speedrun ruiner research idea
lemonhope, 13 Apr 2024 23:42 UTC · 2 points · 11 comments · 2 min read · LW link

Finding the estimate of the value of a state in RL agents
3 Jun 2024 20:26 UTC · 7 points · 4 comments · 4 min read · LW link

Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis
Giulio, 14 Jun 2024 19:35 UTC · 4 points · 0 comments · 8 min read · LW link (www.giuliostarace.com)

Towards shutdownable agents via stochastic choice
8 Jul 2024 10:14 UTC · 59 points · 11 comments · 23 min read · LW link (arxiv.org)

On predictability, chaos and AIs that don’t game our goals
Alejandro Tlaie, 15 Jul 2024 17:16 UTC · 4 points · 8 comments · 6 min read · LW link

Pacing Outside the Box: RNNs Learn to Plan in Sokoban
25 Jul 2024 22:00 UTC · 59 points · 8 comments · 2 min read · LW link (arxiv.org)

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
25 Sep 2024 14:52 UTC · 30 points · 2 comments · 4 min read · LW link (arxiv.org)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
7 Nov 2024 15:39 UTC · 47 points · 6 comments · 11 min read · LW link

The Explore vs. Exploit Dilemma
nathanjzhao, 14 Oct 2024 6:20 UTC · 1 point · 0 comments · 1 min read · LW link (nathanzhao.cc)

[Question] Reinforcement Learning: Essential Step Towards AGI or Irrelevant?
Double, 17 Oct 2024 3:37 UTC · 1 point · 0 comments · 1 min read · LW link

Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear
Nassim_A, 24 Nov 2024 17:17 UTC · 1 point · 0 comments · 9 min read · LW link

Automated monitoring systems
hiki_t, 28 Nov 2024 18:54 UTC · 1 point · 0 comments · 2 min read · LW link

Imitation Learning from Language Feedback
30 Mar 2023 14:11 UTC · 71 points · 3 comments · 10 min read · LW link

Exploratory Analysis of RLHF Transformers with TransformerLens
Curt Tigges, 3 Apr 2023 16:09 UTC · 21 points · 2 comments · 11 min read · LW link (blog.eleuther.ai)

[Question] How is reinforcement learning possible in non-sentient agents?
SomeoneKind, 5 Jan 2021 20:57 UTC · 3 points · 5 comments · 1 min read · LW link

[Question] Where to begin in ML/AI?
Jake the Student, 6 Apr 2023 20:45 UTC · 9 points · 4 comments · 1 min read · LW link

Wireheading and misalignment by composition on NetHack
pierlucadoro, 27 Oct 2023 17:43 UTC · 34 points · 4 comments · 4 min read · LW link

AISC project: SatisfIA – AI that satisfies without overdoing it
Jobst Heitzig, 11 Nov 2023 18:22 UTC · 12 points · 0 comments · 1 min read · LW link (docs.google.com)

Reinforcement Learning using Layered Morphology (RLLM)
MiguelDev, 1 Dec 2023 5:18 UTC · 7 points · 0 comments · 29 min read · LW link

Planning in LLMs: Insights from AlphaGo
jco, 4 Dec 2023 18:48 UTC · 8 points · 10 comments · 11 min read · LW link

Reward is the optimization target (of capabilities researchers)
Max H, 15 May 2023 3:22 UTC · 32 points · 4 comments · 5 min read · LW link

A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)
Joseph Bloom, 16 May 2023 22:59 UTC · 36 points · 2 comments · 16 min read · LW link

Optimization happens inside the mind, not in the world
azsantosk, 3 Jun 2023 21:36 UTC · 17 points · 10 comments · 5 min read · LW link

Direct Preference Optimization in One Minute
lukemarks, 26 Jun 2023 11:52 UTC · 22 points · 3 comments · 2 min read · LW link

[Question] Least-problematic Resource for learning RL?
Dalcy, 18 Jul 2023 16:30 UTC · 9 points · 7 comments · 1 min read · LW link

Optimization, loss set at variance in RL
Clairstan, 22 Jul 2023 18:25 UTC · 1 point · 1 comment · 3 min read · LW link

Exploring the Multiverse of Large Language Models
franky, 6 Aug 2023 2:38 UTC · 1 point · 0 comments · 5 min read · LW link

Aspiration-based Q-Learning
27 Oct 2023 14:42 UTC · 38 points · 5 comments · 11 min read · LW link

AlphaStar: Impressive for RL progress, not for AGI progress
orthonormal, 2 Nov 2019 1:50 UTC · 113 points · 58 comments · 2 min read · LW link · 1 review

Goodhart’s Law in Reinforcement Learning
16 Oct 2023 0:54 UTC · 126 points · 22 comments · 7 min read · LW link

[Question] Training a RL Model with Continuous State & Action Space in a Real-World Scenario
Alexander Ries, 15 Oct 2023 22:59 UTC · 0 points · 0 comments · 1 min read · LW link

Unity Gridworlds
WillPetillo, 15 Oct 2023 4:36 UTC · 9 points · 0 comments · 1 min read · LW link

VLM-RM: Specifying Rewards with Natural Language
23 Oct 2023 14:11 UTC · 20 points · 2 comments · 5 min read · LW link (far.ai)

Which animals can suffer?
Valentin2026, 1 Jun 2021 3:42 UTC · 7 points · 16 comments · 1 min read · LW link

Extraction of human preferences 👨→🤖
arunraja-hub, 24 Aug 2021 16:34 UTC · 18 points · 2 comments · 5 min read · LW link

Proposal: Scaling laws for RL generalization
axioman, 1 Oct 2021 21:32 UTC · 14 points · 12 comments · 11 min read · LW link

[Proposal] Method of locating useful subnets in large models
Quintin Pope, 13 Oct 2021 20:52 UTC · 9 points · 0 comments · 2 min read · LW link

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
gwern, 2 Nov 2021 2:32 UTC · 137 points · 52 comments · 1 min read · LW link (arxiv.org)

Behavior Cloning is Miscalibrated
leogao, 5 Dec 2021 1:36 UTC · 77 points · 3 comments · 3 min read · LW link

Demanding and Designing Aligned Cognitive Architectures
Koen.Holtman, 21 Dec 2021 17:32 UTC · 8 points · 5 comments · 5 min read · LW link

Reinforcement Learning Study Group
Kay Kozaronek, 26 Dec 2021 23:11 UTC · 20 points · 8 comments · 1 min read · LW link

Question 1: Predicted architecture of AGI learning algorithm(s)
Cameron Berg, 10 Feb 2022 17:22 UTC · 13 points · 1 comment · 7 min read · LW link

[Question] What is a training “step” vs. “episode” in machine learning?
Evan R. Murphy, 28 Apr 2022 21:53 UTC · 10 points · 4 comments · 1 min read · LW link

Open Problems in Negative Side Effect Minimization
6 May 2022 9:37 UTC · 12 points · 6 comments · 17 min read · LW link

RL with KL penalties is better seen as Bayesian inference
25 May 2022 9:23 UTC · 114 points · 17 comments · 12 min read · LW link

Machines vs Memes Part 1: AI Alignment and Memetics
Harriet Farlow, 31 May 2022 22:03 UTC · 19 points · 1 comment · 6 min read · LW link

Machines vs Memes Part 3: Imitation and Memes
ceru23, 1 Jun 2022 13:36 UTC · 7 points · 0 comments · 7 min read · LW link

[Link] OpenAI: Learning to Play Minecraft with Video PreTraining (VPT)
Aryeh Englander, 23 Jun 2022 16:29 UTC · 53 points · 3 comments · 1 min read · LW link

Reinforcement Learner Wireheading
Nate Showell, 8 Jul 2022 5:32 UTC · 8 points · 2 comments · 3 min read · LW link

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?
25 Oct 2022 20:48 UTC · 14 points · 2 comments · 4 min read · LW link

Conditioning, Prompts, and Fine-Tuning
Adam Jermyn, 17 Aug 2022 20:52 UTC · 38 points · 9 comments · 4 min read · LW link

Deep Q-Networks Explained
Jay Bailey, 13 Sep 2022 12:01 UTC · 58 points · 8 comments · 20 min read · LW link

Leveraging Legal Informatics to Align AI
John Nay, 18 Sep 2022 20:39 UTC · 11 points · 0 comments · 3 min read · LW link (forum.effectivealtruism.org)

Towards deconfusing wireheading and reward maximization
leogao, 21 Sep 2022 0:36 UTC · 81 points · 7 comments · 4 min read · LW link

Reward IS the Optimization Target
Carn, 28 Sep 2022 17:59 UTC · −2 points · 3 comments · 5 min read · LW link

[Question] What Is the Idea Behind (Un-)Supervised Learning and Reinforcement Learning?
Morpheus, 30 Sep 2022 16:48 UTC · 9 points · 6 comments · 2 min read · LW link

Instrumental convergence in single-agent systems
12 Oct 2022 12:24 UTC · 33 points · 4 comments · 8 min read · LW link (www.gladstone.ai)

Misalignment-by-default in multi-agent systems
13 Oct 2022 15:38 UTC · 21 points · 8 comments · 20 min read · LW link (www.gladstone.ai)

Instrumental convergence: scale and physical interactions
14 Oct 2022 15:50 UTC · 22 points · 0 comments · 17 min read · LW link (www.gladstone.ai)

Learning societal values from law as part of an AGI alignment strategy
John Nay, 21 Oct 2022 2:03 UTC · 5 points · 18 comments · 54 min read · LW link

POWERplay: An open-source toolchain to study AI power-seeking
Edouard Harris, 24 Oct 2022 20:03 UTC · 29 points · 0 comments · 1 min read · LW link (github.com)

AGIs may value intrinsic rewards more than extrinsic ones
catubc, 17 Nov 2022 21:49 UTC · 8 points · 6 comments · 4 min read · LW link

A Short Dialogue on the Meaning of Reward Functions
19 Nov 2022 21:04 UTC · 45 points · 0 comments · 3 min read · LW link

Utility ≠ Reward
Vlad Mikulik, 5 Sep 2019 17:28 UTC · 130 points · 24 comments · 1 min read · LW link · 2 reviews

Sets of objectives for a multi-objective RL agent to optimize
23 Nov 2022 6:49 UTC · 11 points · 0 comments · 8 min read · LW link