Spurious Counterfactuals

Tag · Last edit: Jun 20, 2022, 4:04 PM by abramdemski

Spurious counterfactuals are spuriously low evaluations of the quality of a potential action, which are provable only because they are self-fulfilling (usually via Löb's theorem). For example, if I know that I go left, then it is logically true that if I went right, I would get −10 utility, because in classical logic a false statement implies any statement. Having concluded that going right is bad, I would go left. So if it were provable that I go left, then I would indeed go left. By Löb's theorem, it follows that I do go left, and the claim that going right yields −10 becomes provable regardless of what going right would actually yield.
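The argument above can be laid out step by step. This is only a sketch under stated assumptions: the symbols T, □, A(), U() and L are notation introduced here for illustration, and step (3) is an assumption about how the agent is built (it goes left whenever it finds a proof that going right yields −10), not something stated on this page.

```latex
% Notation (assumed for this sketch, not taken from the original page):
%   T        the proof system the agent reasons in
%   \Box P   "P is provable in T"
%   A()      the agent's action;  U()  the resulting utility
%   L        the sentence  A() = \mathrm{left}
\begin{align*}
&\text{(1)}\quad T \vdash L \rightarrow \bigl(A() = \mathrm{right} \rightarrow U() = -10\bigr)
  && \text{vacuous truth: } L \text{ rules out } A() = \mathrm{right} \\
&\text{(2)}\quad T \vdash \Box L \rightarrow \Box\bigl(A() = \mathrm{right} \rightarrow U() = -10\bigr)
  && \text{(1), carried under provability} \\
&\text{(3)}\quad T \vdash \Box\bigl(A() = \mathrm{right} \rightarrow U() = -10\bigr) \rightarrow L
  && \text{assumed agent design} \\
&\text{(4)}\quad T \vdash \Box L \rightarrow L
  && \text{chaining (2) and (3)} \\
&\text{(5)}\quad T \vdash L
  && \text{L\"ob's theorem applied to (4)}
\end{align*}
```

Once (5) holds, (1) gives a proof of "A() = right → U() = −10" no matter what going right would really be worth, which is what makes the evaluation spurious.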

Building agents who avoid this line of reasoning, despite having full access to their own source code and the ability to logically reason about their own behavior, is one goal of Embedded Agency.

A Possible Resolution To Spurious Counterfactuals

JoshuaOSHickman · Dec 6, 2021, 6:26 PM
15 points
5 comments · 4 min read

Modeling naturalized decision problems in linear logic

jessicata · May 6, 2020, 12:15 AM
14 points
2 comments · 6 min read
(unstableontology.com)

An Introduction to Löb's Theorem in MIRI Research

orthonormal · Mar 23, 2015, 10:22 PM
29 points
27 comments · 2 min read

Embedded Agency (full-text version)

Nov 15, 2018, 7:49 PM
201 points
17 comments · 54 min read

Threatening to do the impossible: A solution to spurious counterfactuals for functional decision theory via proof theory

Christopher King · Feb 11, 2023, 7:57 AM
5 points
4 comments · 5 min read