
The Pointers Problem

Last edit: Oct 18, 2024, 9:29 AM by Rafael Harth

Consider an agent with a model of the world, W. How does W relate to the real world? W might contain a chair. For W to be useful, it needs to map onto reality, i.e. there should be a function f that maps W_chair to the real-world chair: W_chair ↦ R_chair.

The pointers problem is about figuring out f.

In the words of John Wentworth, who introduced the concept here:

What functions of what variables (if any) in the environment and/​or another world-model correspond to the latent variables in the agent’s world-model?
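Put slightly more formally (a sketch using the notation of the example above, not a definition taken from the linked posts): if $\Lambda_1, \dots, \Lambda_n$ are the latent variables of the world-model $W$ and $X$ is the state of the environment, the pointers problem asks which functions $g_i$ of $X$ (if any) the latents correspond to:

$$f(\Lambda_i) = g_i(X), \qquad \text{e.g. } f(W_{\text{chair}}) = R_{\text{chair}}.$$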

This relates to alignment: we would like an AI that acts based on real-world human values, not just human estimates of their own values, and the two will differ in many situations, since humans are neither all-seeing nor all-knowing. Therefore we'd like to figure out how to point to our values directly.
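To make the gap concrete (illustrative notation only, not taken from the linked posts): write $u(\Lambda)$ for a human's values as a function of the latent variables $\Lambda$ in their world-model, $\hat{\Lambda}$ for the human's current estimates of those latents, and $\Lambda^*$ for their real-world referents under $f$. An aligned AI should optimize

$$u(\Lambda^*) \quad \text{rather than} \quad u(\hat{\Lambda}),$$

and the two come apart exactly in the situations where the human's estimates are mistaken.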

The Pointers Problem: Clarifications/Variations

abramdemski · Jan 5, 2021, 5:29 PM
61 points
16 comments · 18 min read · LW link

The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables

johnswentworth · Nov 18, 2020, 5:47 PM
128 points
50 comments · 11 min read · LW link · 2 reviews

Don’t design agents which exploit adversarial inputs

Nov 18, 2022, 1:48 AM
72 points
64 comments · 12 min read · LW link

The Pointer Resolution Problem

Jozdien · Feb 16, 2024, 9:25 PM
41 points
20 comments · 3 min read · LW link

Robust Delegation

Nov 4, 2018, 4:38 PM
116 points
10 comments · 1 min read · LW link

[Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation

Steven Byrnes · Mar 23, 2022, 12:48 PM
46 points
11 comments · 22 min read · LW link

People care about each other even though they have imperfect motivational pointers?

TurnTrout · Nov 8, 2022, 6:15 PM
33 points
25 comments · 7 min read · LW link

Don’t align agents to evaluations of plans

TurnTrout · Nov 26, 2022, 9:16 PM
48 points
49 comments · 18 min read · LW link

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout · Nov 29, 2022, 6:23 AM
62 points
41 comments · 15 min read · LW link

Stable Pointers to Value II: Environmental Goals

abramdemski · Feb 9, 2018, 6:03 AM
19 points
3 comments · 4 min read · LW link

Stable Pointers to Value III: Recursive Quantilization

abramdemski · Jul 21, 2018, 8:06 AM
20 points
4 comments · 4 min read · LW link

Stable Pointers to Value: An Agent Embedded in Its Own Utility Function

abramdemski · Aug 17, 2017, 12:22 AM
15 points
9 comments · 5 min read · LW link

Half-baked idea: a straightforward method for learning environmental goals?

Q Home · Feb 4, 2025, 6:56 AM
16 points
7 comments · 5 min read · LW link

[Question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?

Q Home · Jan 22, 2025, 3:30 AM
5 points
0 comments · 1 min read · LW link

Human sexuality as an interesting case study of alignment

beren · Dec 30, 2022, 1:37 PM
39 points
26 comments · 3 min read · LW link

Updating Utility Functions

May 9, 2022, 9:44 AM
41 points
6 comments · 8 min read · LW link

Clarifying Alignment Fundamentals Through the Lens of Ontology

eternal/ephemera · Oct 7, 2024, 8:57 PM
12 points
4 comments · 24 min read · LW link