
Reinforcement Learning

Last edit: 29 May 2023 19:17 UTC by Roman Leventov

Within the field of machine learning, reinforcement learning refers to the study of how to train agents to complete tasks by updating (“reinforcing”) the agents with feedback signals.

Related: Inverse Reinforcement Learning, Machine learning, Friendly AI, Game Theory, Prediction, Agency.

Consider an agent that receives an input informing it of the environment’s state. Based only on that information, the agent must choose an action from a set of available actions. The chosen action changes the state of the environment, which produces a new input, and so on; at each step the environment also presents the agent with a reward (or reinforcement signal) reflecting the consequences of its actions. In “policy gradient” approaches, the reinforcement signal is used to update the agent (the “policy”), although sometimes an agent will instead do limited online (model-based) heuristic search to optimize the reward signal plus a heuristic evaluation.
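The interaction loop described above can be sketched in a few lines. The environment and policy here are hypothetical toy stand-ins invented for illustration, not any particular library’s API:

```python
import random

class CoinFlipEnv:
    """Toy environment: the state is 0 or 1; an action matching the state pays 1."""
    def __init__(self):
        self.state = random.randint(0, 1)

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = random.randint(0, 1)  # the environment transitions to a new state
        return self.state, reward

def policy(state):
    """A trivial policy mapping each observed state to an action."""
    return state  # always match the observed state

env = CoinFlipEnv()
state, total = env.state, 0.0
for _ in range(100):
    action = policy(state)            # decide based only on the current input
    state, reward = env.step(action)  # the action changes the environment; new input + reward
    total += reward
print(total)  # the matching policy earns reward 1.0 on every step, so 100.0
```

In a learning setting, the reward would be fed back to update `policy`; here the policy is fixed purely to show the state → action → reward → new state cycle.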

RL is distinguished from energy-based architectures such as Active Inference, Joint Embedding Predictive Architectures (JEPA), and GFlowNets.

Exploration and Optimization

Since randomly selecting actions results in poor performance, one of the biggest problems in reinforcement learning is exploring the available set of actions so as to avoid getting stuck in sub-optimal choices and to move on to better ones.

This is the problem of exploration, which is best illustrated by the most-studied reinforcement learning problem: the k-armed bandit. In it, an agent has to decide which sequence of levers to pull in a room of slot machines, having no information about each machine’s probability of paying out beyond the reward it receives after each pull. The problem revolves around deciding which lever is optimal and what criterion defines a lever as such.
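A common baseline for the bandit problem is epsilon-greedy selection: mostly pull the lever with the best estimated payout, but occasionally pull a random one to keep exploring. This is a minimal sketch with made-up payout probabilities:

```python
import random

random.seed(0)
probs = [0.2, 0.5, 0.8]   # true win probability of each lever (unknown to the agent)
k = len(probs)
counts = [0] * k           # number of pulls of each lever
values = [0.0] * k         # running estimate of each lever's expected reward
epsilon = 0.1              # fraction of the time we explore at random

for _ in range(5000):
    if random.random() < epsilon:
        a = random.randrange(k)                      # explore: try a random lever
    else:
        a = max(range(k), key=lambda i: values[i])   # exploit: best estimate so far
    reward = 1.0 if random.random() < probs[a] else 0.0
    counts[a] += 1
    # incremental sample-average update of the value estimate
    values[a] += (reward - values[a]) / counts[a]

best = max(range(k), key=lambda i: values[i])
print(best)  # with enough pulls, the estimates single out lever 2 as best
```

The tension is visible in `epsilon`: set it to 0 and the agent can lock onto whichever lever happened to pay out first; set it too high and it wastes pulls on levers it already knows are poor.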

In parallel with exploration, it is still necessary to choose the criterion that makes one action optimal compared to another. The study of this question has led to several methods, from brute-force search to approaches that take into account temporal differences in the received reward. Despite this, and the strong results reinforcement learning methods have obtained on small problems, they suffer from a lack of scalability and have difficulty with larger, closer-to-human scenarios.
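Temporal-difference methods make this concrete: each state’s value estimate is nudged toward the immediate reward plus the discounted estimate of the next state, so reward information propagates backward through time. A TD(0) sketch on an invented five-state chain, where only reaching the final state pays:

```python
n_states = 5                 # states 0..4; reaching state 4 ends the episode
V = [0.0] * n_states         # value estimate for each state
alpha, gamma = 0.1, 0.9      # learning rate and discount factor

for _ in range(500):         # episodes
    s = 0
    while s != 4:
        s_next = s + 1                        # deterministic "move right" policy
        reward = 1.0 if s_next == 4 else 0.0  # reward only on reaching the goal
        # TD(0): move V[s] toward the bootstrapped target r + gamma * V[s']
        V[s] += alpha * (reward + gamma * V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V])  # values decay by a factor of gamma per step from the goal
```

After convergence the estimates approach 0.9³, 0.9², 0.9, and 1.0 for states 0 through 3, so comparing values already yields an ordering of how good each state is, without ever enumerating whole trajectories.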

Further Reading & References

See Also

Think care­fully be­fore call­ing RL poli­cies “agents”

TurnTrout2 Jun 2023 3:46 UTC
133 points
36 comments4 min readLW link

Re­ward is not the op­ti­miza­tion target

TurnTrout25 Jul 2022 0:03 UTC
376 points
123 comments10 min readLW link3 reviews

Draft pa­pers for REALab and De­cou­pled Ap­proval on tampering

28 Oct 2020 16:01 UTC
47 points
2 comments1 min readLW link

Re­mak­ing Effi­cien­tZero (as best I can)

Hoagy4 Jul 2022 11:03 UTC
36 points
9 comments22 min readLW link

Why al­most ev­ery RL agent does learned optimization

Lee Sharkey12 Feb 2023 4:58 UTC
32 points
3 comments5 min readLW link

Models Don’t “Get Re­ward”

Sam Ringer30 Dec 2022 10:37 UTC
312 points
61 comments5 min readLW link1 review

In­ter­pret­ing the Learn­ing of Deceit

RogerDearnaley18 Dec 2023 8:12 UTC
30 points
14 comments9 min readLW link

AGI will have learnt util­ity functions

beren25 Jan 2023 19:42 UTC
36 points
3 comments13 min readLW link

Effi­cien­tZero: How It Works

1a3orn26 Nov 2021 15:17 UTC
297 points
50 comments29 min readLW link1 review

Book Re­view: Re­in­force­ment Learn­ing by Sut­ton and Barto

billmei20 Oct 2020 19:40 UTC
52 points
3 comments10 min readLW link

Jit­ters No Ev­i­dence of Stu­pidity in RL

1a3orn16 Sep 2021 22:43 UTC
96 points
18 comments3 min readLW link

Deep­Mind: Gen­er­ally ca­pa­ble agents emerge from open-ended play

Daniel Kokotajlo27 Jul 2021 14:19 UTC
247 points
53 comments2 min readLW link
(deepmind.com)

[Aspira­tion-based de­signs] 1. In­for­mal in­tro­duc­tion

28 Apr 2024 13:00 UTC
41 points
4 comments8 min readLW link

Train­ing My Friend to Cook

lsusr29 Aug 2021 5:54 UTC
70 points
33 comments3 min readLW link

Multi-Agent In­verse Re­in­force­ment Learn­ing: Subop­ti­mal De­mon­stra­tions and Alter­na­tive Solu­tion Concepts

sage_bergerson7 Sep 2021 16:11 UTC
5 points
0 comments1 min readLW link

My take on Vanessa Kosoy’s take on AGI safety

Steven Byrnes30 Sep 2021 12:23 UTC
103 points
10 comments31 min readLW link

In­fer­ence-Only De­bate Ex­per­i­ments Us­ing Math Problems

6 Aug 2024 17:44 UTC
31 points
0 comments2 min readLW link

Refine­ment of Ac­tive In­fer­ence agency ontology

Roman Leventov15 Dec 2023 9:31 UTC
16 points
0 comments5 min readLW link
(arxiv.org)

(Ap­pet­i­tive, Con­sum­ma­tory) ≈ (RL, re­flex)

Steven Byrnes15 Jun 2024 15:57 UTC
36 points
1 comment3 min readLW link

He­donic Loops and Tam­ing RL

beren19 Jul 2023 15:12 UTC
20 points
14 comments9 min readLW link

Scalar re­ward is not enough for al­igned AGI

Peter Vamplew17 Jan 2022 21:02 UTC
13 points
3 comments11 min readLW link

Emo­tions = Re­ward Functions

jpyykko20 Jan 2022 18:46 UTC
16 points
10 comments5 min readLW link

Re­in­force­ment Learn­ing in the Iter­ated Am­plifi­ca­tion Framework

William_S9 Feb 2019 0:56 UTC
25 points
12 comments4 min readLW link

[In­tro to brain-like-AGI safety] 5. The “long-term pre­dic­tor”, and TD learning

Steven Byrnes23 Feb 2022 14:44 UTC
52 points
27 comments20 min readLW link

[In­tro to brain-like-AGI safety] 6. Big pic­ture of mo­ti­va­tion, de­ci­sion-mak­ing, and RL

Steven Byrnes2 Mar 2022 15:26 UTC
68 points
17 comments16 min readLW link

We have promis­ing al­ign­ment plans with low taxes

Seth Herd10 Nov 2023 18:51 UTC
33 points
9 comments5 min readLW link

Re­in­force­ment learn­ing with im­per­cep­ti­ble rewards

Vanessa Kosoy7 Apr 2019 10:27 UTC
26 points
1 comment32 min readLW link

Re­in­force­ment Learn­ing: A Non-Stan­dard In­tro­duc­tion (Part 1)

royf29 Jul 2012 0:13 UTC
33 points
19 comments2 min readLW link

RLHF

Ansh Radhakrishnan12 May 2022 21:18 UTC
18 points
5 comments5 min readLW link

Re­in­force­ment, Prefer­ence and Utility

royf8 Aug 2012 6:23 UTC
14 points
5 comments3 min readLW link

Re­in­force­ment Learn­ing: A Non-Stan­dard In­tro­duc­tion (Part 2)

royf2 Aug 2012 8:17 UTC
16 points
7 comments3 min readLW link

Ap­ply­ing re­in­force­ment learn­ing the­ory to re­duce felt tem­po­ral distance

Kaj_Sotala26 Jan 2014 9:17 UTC
19 points
6 comments3 min readLW link

Imi­ta­tive Re­in­force­ment Learn­ing as an AGI Approach

TIMUR ZEKI VURAL21 May 2018 14:47 UTC
2 points
1 comment1 min readLW link

Del­ega­tive Re­in­force­ment Learn­ing with a Merely Sane Advisor

Vanessa Kosoy5 Oct 2017 14:15 UTC
1 point
2 comments14 min readLW link

Shard The­ory: An Overview

David Udell11 Aug 2022 5:44 UTC
165 points
34 comments10 min readLW link

In­verse re­in­force­ment learn­ing on self, pre-on­tol­ogy-change

Stuart_Armstrong18 Nov 2015 13:23 UTC
0 points
2 comments1 min readLW link

Clar­ifi­ca­tion: Be­havi­ourism & Reinforcement

Zaine10 Oct 2012 5:30 UTC
13 points
30 comments2 min readLW link

Is Global Re­in­force­ment Learn­ing (RL) a Fan­tasy?

[deleted]31 Oct 2016 1:49 UTC
5 points
50 comments12 min readLW link

Del­ega­tive In­verse Re­in­force­ment Learning

Vanessa Kosoy12 Jul 2017 12:18 UTC
15 points
13 comments16 min readLW link

Vec­tor-Valued Re­in­force­ment Learning

orthonormal1 Nov 2016 0:21 UTC
2 points
1 comment4 min readLW link

Co­op­er­a­tive In­verse Re­in­force­ment Learn­ing vs. Ir­ra­tional Hu­man Preferences

orthonormal18 Jun 2016 0:55 UTC
17 points
2 comments3 min readLW link

Evolu­tion as Back­stop for Re­in­force­ment Learn­ing: multi-level paradigms

gwern12 Jan 2019 17:45 UTC
19 points
0 comments1 min readLW link
(www.gwern.net)

IRL 1/​8: In­verse Re­in­force­ment Learn­ing and the prob­lem of degeneracy

RAISE4 Mar 2019 13:11 UTC
20 points
2 comments1 min readLW link
(app.grasple.com)

Keep­ing up with deep re­in­force­ment learn­ing re­search: /​r/​reinforcementlearning

gwern16 May 2017 19:12 UTC
6 points
2 comments1 min readLW link
(www.reddit.com)

psy­chol­ogy and ap­pli­ca­tions of re­in­force­ment learn­ing: where do I learn more?

jsalvatier26 Jun 2011 20:56 UTC
5 points
1 comment1 min readLW link

Re­ward/​value learn­ing for re­in­force­ment learning

Stuart_Armstrong2 Jun 2017 16:34 UTC
0 points
2 comments2 min readLW link

Model Mis-speci­fi­ca­tion and In­verse Re­in­force­ment Learning

9 Nov 2018 15:33 UTC
34 points
3 comments16 min readLW link

“Hu­man-level con­trol through deep re­in­force­ment learn­ing”—com­puter learns 49 differ­ent games

skeptical_lurker26 Feb 2015 6:21 UTC
19 points
19 comments1 min readLW link

Mak­ing a Differ­ence Tem­pore: In­sights from ‘Re­in­force­ment Learn­ing: An In­tro­duc­tion’

TurnTrout5 Jul 2018 0:34 UTC
33 points
6 comments8 min readLW link

Prob­lems in­te­grat­ing de­ci­sion the­ory and in­verse re­in­force­ment learning

agilecaveman8 May 2018 5:11 UTC
7 points
2 comments3 min readLW link

“AIXIjs: A Soft­ware Demo for Gen­eral Re­in­force­ment Learn­ing”, As­lanides 2017

gwern29 May 2017 21:09 UTC
7 points
1 comment1 min readLW link
(arxiv.org)

Some work on con­nect­ing UDT and Re­in­force­ment Learning

IAFF-User-11117 Dec 2015 23:58 UTC
4 points
5 comments1 min readLW link
(drive.google.com)

[Question] What prob­lem would you like to see Re­in­force­ment Learn­ing ap­plied to?

Julian Schrittwieser8 Jul 2020 2:40 UTC
43 points
4 comments1 min readLW link

[Question] What messy prob­lems do you see Deep Re­in­force­ment Learn­ing ap­pli­ca­ble to?

Riccardo Volpato5 Apr 2020 17:43 UTC
5 points
0 comments1 min readLW link

200 COP in MI: In­ter­pret­ing Re­in­force­ment Learning

Neel Nanda10 Jan 2023 17:37 UTC
25 points
1 comment10 min readLW link

[Question] Can co­her­ent ex­trap­o­lated vo­li­tion be es­ti­mated with In­verse Re­in­force­ment Learn­ing?

Jade Bishop15 Apr 2019 3:23 UTC
12 points
5 comments3 min readLW link

Model­ing the ca­pa­bil­ities of ad­vanced AI sys­tems as epi­sodic re­in­force­ment learning

jessicata19 Aug 2016 2:52 UTC
4 points
6 comments5 min readLW link

Pod­cast with Divia Eden on op­er­ant conditioning

DanielFilan15 Jan 2023 2:44 UTC
14 points
0 comments1 min readLW link
(youtu.be)

FHI is ac­cept­ing ap­pli­ca­tions for in­tern­ships in the area of AI Safety and Re­in­force­ment Learning

crmflynn7 Nov 2016 16:33 UTC
9 points
0 comments1 min readLW link
(www.fhi.ox.ac.uk)

Her­i­ta­bil­ity, Be­hav­iorism, and Within-Life­time RL

Steven Byrnes2 Feb 2023 16:34 UTC
39 points
3 comments4 min readLW link

Beyond Re­in­force­ment Learn­ing: Pre­dic­tive Pro­cess­ing and Checksums

lsusr15 Feb 2023 7:32 UTC
12 points
14 comments3 min readLW link

Hu­man beats SOTA Go AI by learn­ing an ad­ver­sar­ial policy

Vanessa Kosoy19 Feb 2023 9:38 UTC
58 points
32 comments1 min readLW link
(goattack.far.ai)

AXRP Epi­sode 1 - Ad­ver­sar­ial Poli­cies with Adam Gleave

DanielFilan29 Dec 2020 20:41 UTC
12 points
5 comments34 min readLW link

AXRP Epi­sode 3 - Ne­go­tiable Re­in­force­ment Learn­ing with An­drew Critch

DanielFilan29 Dec 2020 20:45 UTC
26 points
0 comments28 min readLW link

A brief re­view of the rea­sons multi-ob­jec­tive RL could be im­por­tant in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC
30 points
7 comments10 min readLW link

Multi-di­men­sional re­wards for AGI in­ter­pretabil­ity and control

Steven Byrnes4 Jan 2021 3:08 UTC
19 points
8 comments10 min readLW link

Mode col­lapse in RL may be fueled by the up­date equation

19 Jun 2023 21:51 UTC
49 points
10 comments8 min readLW link

Is RL in­volved in sen­sory pro­cess­ing?

Steven Byrnes18 Mar 2021 13:57 UTC
21 points
21 comments5 min readLW link

My AGI Threat Model: Misal­igned Model-Based RL Agent

Steven Byrnes25 Mar 2021 13:45 UTC
74 points
40 comments16 min readLW link

My take on Michael Littman on “The HCI of HAI”

Alex Flint2 Apr 2021 19:51 UTC
59 points
4 comments7 min readLW link

LeCun’s “A Path Towards Au­tonomous Ma­chine In­tel­li­gence” has an un­solved tech­ni­cal al­ign­ment problem

Steven Byrnes8 May 2023 19:35 UTC
136 points
37 comments15 min readLW link

Big pic­ture of pha­sic dopamine

Steven Byrnes8 Jun 2021 13:07 UTC
66 points
22 comments36 min readLW link

Sup­ple­ment to “Big pic­ture of pha­sic dopamine”

Steven Byrnes8 Jun 2021 13:08 UTC
14 points
2 comments9 min readLW link

Re­ward Is Not Enough

Steven Byrnes16 Jun 2021 13:52 UTC
123 points
19 comments10 min readLW link1 review

A model of de­ci­sion-mak­ing in the brain (the short ver­sion)

Steven Byrnes18 Jul 2021 14:39 UTC
20 points
0 comments3 min readLW link

Notes on Meta’s Di­plo­macy-Play­ing AI

Erich_Grunewald22 Dec 2022 11:34 UTC
14 points
2 comments14 min readLW link
(www.erichgrunewald.com)

On the Im­por­tance of Open Sourc­ing Re­ward Models

elandgre2 Jan 2023 19:01 UTC
18 points
5 comments6 min readLW link

[Linkpost] Dream­erV3: A Gen­eral RL Architecture

simeon_c12 Jan 2023 3:55 UTC
23 points
3 comments1 min readLW link
(arxiv.org)

Re­ward is not Ne­c­es­sary: How to Create a Com­po­si­tional Self-Pre­serv­ing Agent for Life-Long Learning

Roman Leventov12 Jan 2023 16:43 UTC
17 points
2 comments2 min readLW link
(arxiv.org)

Ex­per­i­ment Idea: RL Agents Evad­ing Learned Shutdownability

Leon Lang16 Jan 2023 22:46 UTC
31 points
7 comments17 min readLW link
(docs.google.com)

Pa­ram­e­ter Scal­ing Comes for RL, Maybe

1a3orn24 Jan 2023 13:55 UTC
99 points
3 comments14 min readLW link

Tem­po­rally Lay­ered Ar­chi­tec­ture for Adap­tive, Distributed and Con­tin­u­ous Control

Roman Leventov2 Feb 2023 6:29 UTC
6 points
4 comments1 min readLW link
(arxiv.org)

Pow­er­ful mesa-op­ti­mi­sa­tion is already here

Roman Leventov17 Feb 2023 4:59 UTC
35 points
1 comment2 min readLW link
(arxiv.org)

Val­ida­tor mod­els: A sim­ple ap­proach to de­tect­ing goodharting

beren20 Feb 2023 21:32 UTC
14 points
1 comment4 min readLW link

Hu­man-level Full-Press Di­plo­macy (some bare facts).

Cleo Nardo22 Nov 2022 20:59 UTC
50 points
7 comments3 min readLW link

Krueger Lab AI Safety In­tern­ship 2024

Joey Bream24 Jan 2024 19:17 UTC
3 points
0 comments1 min readLW link

Skep­ti­cism About Deep­Mind’s “Grand­mas­ter-Level” Chess Without Search

Arjun Panickssery12 Feb 2024 0:56 UTC
55 points
13 comments3 min readLW link

The Carnot Eng­ine of Economics

StrivingForLegibility9 Aug 2024 15:59 UTC
5 points
0 comments5 min readLW link

[Aspira­tion-based de­signs] 2. For­mal frame­work, ba­sic algorithm

28 Apr 2024 13:02 UTC
16 points
2 comments16 min readLW link

Mea­sur­ing Learned Op­ti­miza­tion in Small Trans­former Models

J Bostock8 Apr 2024 14:41 UTC
22 points
0 comments11 min readLW link

Speedrun ru­iner re­search idea

lemonhope13 Apr 2024 23:42 UTC
2 points
11 comments2 min readLW link

Find­ing the es­ti­mate of the value of a state in RL agents

3 Jun 2024 20:26 UTC
7 points
4 comments4 min readLW link

Lan­guage for Goal Mis­gen­er­al­iza­tion: Some For­mal­isms from my MSc Thesis

Giulio14 Jun 2024 19:35 UTC
4 points
0 comments8 min readLW link
(www.giuliostarace.com)

Towards shut­down­able agents via stochas­tic choice

8 Jul 2024 10:14 UTC
59 points
11 comments23 min readLW link
(arxiv.org)

On pre­dictabil­ity, chaos and AIs that don’t game our goals

Alejandro Tlaie15 Jul 2024 17:16 UTC
4 points
8 comments6 min readLW link

Pac­ing Out­side the Box: RNNs Learn to Plan in Sokoban

25 Jul 2024 22:00 UTC
59 points
8 comments2 min readLW link
(arxiv.org)

[Paper] Hid­den in Plain Text: Emer­gence and Miti­ga­tion of Stegano­graphic Col­lu­sion in LLMs

25 Sep 2024 14:52 UTC
30 points
2 comments4 min readLW link
(arxiv.org)

On Tar­geted Ma­nipu­la­tion and De­cep­tion when Op­ti­miz­ing LLMs for User Feedback

7 Nov 2024 15:39 UTC
47 points
6 comments11 min readLW link

The Ex­plore vs. Ex­ploit Dilemma

nathanjzhao14 Oct 2024 6:20 UTC
1 point
0 comments1 min readLW link
(nathanzhao.cc)

[Question] Re­in­force­ment Learn­ing: Essen­tial Step Towards AGI or Ir­rele­vant?

Double17 Oct 2024 3:37 UTC
1 point
0 comments1 min readLW link

Imi­ta­tion Learn­ing from Lan­guage Feedback

30 Mar 2023 14:11 UTC
71 points
3 comments10 min readLW link

Ex­plo­ra­tory Anal­y­sis of RLHF Trans­form­ers with TransformerLens

Curt Tigges3 Apr 2023 16:09 UTC
21 points
2 comments11 min readLW link
(blog.eleuther.ai)

[Question] How is re­in­force­ment learn­ing pos­si­ble in non-sen­tient agents?

SomeoneKind5 Jan 2021 20:57 UTC
3 points
5 comments1 min readLW link

[Question] Where to be­gin in ML/​AI?

Jake the Student6 Apr 2023 20:45 UTC
9 points
4 comments1 min readLW link

Wire­head­ing and mis­al­ign­ment by com­po­si­tion on NetHack

pierlucadoro27 Oct 2023 17:43 UTC
34 points
4 comments4 min readLW link

AISC pro­ject: Satis­fIA – AI that satis­fies with­out over­do­ing it

Jobst Heitzig11 Nov 2023 18:22 UTC
12 points
0 comments1 min readLW link
(docs.google.com)

Re­in­force­ment Learn­ing us­ing Lay­ered Mor­phol­ogy (RLLM)

MiguelDev1 Dec 2023 5:18 UTC
7 points
0 comments29 min readLW link

Plan­ning in LLMs: In­sights from AlphaGo

jco4 Dec 2023 18:48 UTC
8 points
10 comments11 min readLW link

Re­ward is the op­ti­miza­tion tar­get (of ca­pa­bil­ities re­searchers)

Max H15 May 2023 3:22 UTC
32 points
4 comments5 min readLW link

A Mechanis­tic In­ter­pretabil­ity Anal­y­sis of a GridWorld Agent-Si­mu­la­tor (Part 1 of N)

Joseph Bloom16 May 2023 22:59 UTC
36 points
2 comments16 min readLW link

Op­ti­miza­tion hap­pens in­side the mind, not in the world

azsantosk3 Jun 2023 21:36 UTC
17 points
10 comments5 min readLW link

Direct Prefer­ence Op­ti­miza­tion in One Minute

lukemarks26 Jun 2023 11:52 UTC
22 points
3 comments2 min readLW link

[Question] Least-prob­le­matic Re­source for learn­ing RL?

Dalcy18 Jul 2023 16:30 UTC
9 points
7 comments1 min readLW link

Op­ti­miza­tion, loss set at var­i­ance in RL

Clairstan22 Jul 2023 18:25 UTC
1 point
1 comment3 min readLW link

Ex­plor­ing the Mul­ti­verse of Large Lan­guage Models

franky6 Aug 2023 2:38 UTC
1 point
0 comments5 min readLW link

Aspira­tion-based Q-Learning

27 Oct 2023 14:42 UTC
38 points
5 comments11 min readLW link

AlphaS­tar: Im­pres­sive for RL progress, not for AGI progress

orthonormal2 Nov 2019 1:50 UTC
113 points
58 comments2 min readLW link1 review

Good­hart’s Law in Re­in­force­ment Learning

16 Oct 2023 0:54 UTC
126 points
22 comments7 min readLW link

[Question] Train­ing a RL Model with Con­tin­u­ous State & Ac­tion Space in a Real-World Scenario

Alexander Ries15 Oct 2023 22:59 UTC
0 points
0 comments1 min readLW link

Unity Gridworlds

WillPetillo15 Oct 2023 4:36 UTC
9 points
0 comments1 min readLW link

VLM-RM: Spec­i­fy­ing Re­wards with Nat­u­ral Language

23 Oct 2023 14:11 UTC
20 points
2 comments5 min readLW link
(far.ai)

Which an­i­mals can suffer?

Valentin20261 Jun 2021 3:42 UTC
7 points
16 comments1 min readLW link

Ex­trac­tion of hu­man prefer­ences 👨→🤖

arunraja-hub24 Aug 2021 16:34 UTC
18 points
2 comments5 min readLW link

Pro­posal: Scal­ing laws for RL generalization

axioman1 Oct 2021 21:32 UTC
14 points
12 comments11 min readLW link

[Pro­posal] Method of lo­cat­ing use­ful sub­nets in large models

Quintin Pope13 Oct 2021 20:52 UTC
9 points
0 comments2 min readLW link

Effi­cien­tZero: hu­man ALE sam­ple-effi­ciency w/​MuZero+self-supervised

gwern2 Nov 2021 2:32 UTC
137 points
52 comments1 min readLW link
(arxiv.org)

Be­hav­ior Clon­ing is Miscalibrated

leogao5 Dec 2021 1:36 UTC
77 points
3 comments3 min readLW link

De­mand­ing and De­sign­ing Aligned Cog­ni­tive Architectures

Koen.Holtman21 Dec 2021 17:32 UTC
8 points
5 comments5 min readLW link

Re­in­force­ment Learn­ing Study Group

Kay Kozaronek26 Dec 2021 23:11 UTC
20 points
8 comments1 min readLW link

Ques­tion 1: Pre­dicted ar­chi­tec­ture of AGI learn­ing al­gorithm(s)

Cameron Berg10 Feb 2022 17:22 UTC
13 points
1 comment7 min readLW link

[Question] What is a train­ing “step” vs. “epi­sode” in ma­chine learn­ing?

Evan R. Murphy28 Apr 2022 21:53 UTC
10 points
4 comments1 min readLW link

Open Prob­lems in Nega­tive Side Effect Minimization

6 May 2022 9:37 UTC
12 points
6 comments17 min readLW link

RL with KL penalties is bet­ter seen as Bayesian inference

25 May 2022 9:23 UTC
114 points
17 comments12 min readLW link

Machines vs Memes Part 1: AI Align­ment and Memetics

Harriet Farlow31 May 2022 22:03 UTC
19 points
1 comment6 min readLW link

Machines vs Memes Part 3: Imi­ta­tion and Memes

ceru231 Jun 2022 13:36 UTC
7 points
0 comments7 min readLW link

[Link] OpenAI: Learn­ing to Play Minecraft with Video PreTrain­ing (VPT)

Aryeh Englander23 Jun 2022 16:29 UTC
53 points
3 comments1 min readLW link

Re­in­force­ment Learner Wireheading

Nate Showell8 Jul 2022 5:32 UTC
8 points
2 comments3 min readLW link

Re­in­force­ment Learn­ing Goal Mis­gen­er­al­iza­tion: Can we guess what kind of goals are se­lected by de­fault?

25 Oct 2022 20:48 UTC
14 points
2 comments4 min readLW link

Con­di­tion­ing, Prompts, and Fine-Tuning

Adam Jermyn17 Aug 2022 20:52 UTC
38 points
9 comments4 min readLW link

Deep Q-Net­works Explained

Jay Bailey13 Sep 2022 12:01 UTC
58 points
8 comments20 min readLW link

Lev­er­ag­ing Le­gal In­for­mat­ics to Align AI

John Nay18 Sep 2022 20:39 UTC
11 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

Towards de­con­fus­ing wire­head­ing and re­ward maximization

leogao21 Sep 2022 0:36 UTC
81 points
7 comments4 min readLW link

Re­ward IS the Op­ti­miza­tion Target

Carn28 Sep 2022 17:59 UTC
−2 points
3 comments5 min readLW link

[Question] What Is the Idea Be­hind (Un-)Su­per­vised Learn­ing and Re­in­force­ment Learn­ing?

Morpheus30 Sep 2022 16:48 UTC
9 points
6 comments2 min readLW link

In­stru­men­tal con­ver­gence in sin­gle-agent systems

12 Oct 2022 12:24 UTC
33 points
4 comments8 min readLW link
(www.gladstone.ai)

Misal­ign­ment-by-de­fault in multi-agent systems

13 Oct 2022 15:38 UTC
21 points
8 comments20 min readLW link
(www.gladstone.ai)

In­stru­men­tal con­ver­gence: scale and phys­i­cal interactions

14 Oct 2022 15:50 UTC
22 points
0 comments17 min readLW link
(www.gladstone.ai)

Learn­ing so­cietal val­ues from law as part of an AGI al­ign­ment strategy

John Nay21 Oct 2022 2:03 UTC
5 points
18 comments54 min readLW link

POWER­play: An open-source toolchain to study AI power-seeking

Edouard Harris24 Oct 2022 20:03 UTC
29 points
0 comments1 min readLW link
(github.com)

AGIs may value in­trin­sic re­wards more than ex­trin­sic ones

catubc17 Nov 2022 21:49 UTC
8 points
6 comments4 min readLW link

A Short Dialogue on the Mean­ing of Re­ward Functions

19 Nov 2022 21:04 UTC
45 points
0 comments3 min readLW link

Utility ≠ Reward

Vlad Mikulik5 Sep 2019 17:28 UTC
130 points
24 comments1 min readLW link2 reviews

Sets of ob­jec­tives for a multi-ob­jec­tive RL agent to optimize

23 Nov 2022 6:49 UTC
11 points
0 comments8 min readLW link

MDPs and the Bel­l­man Equa­tion, In­tu­itively Explained

Jack O'Brien27 Dec 2022 5:50 UTC
11 points
3 comments14 min readLW link