Deconfusion

Last edit: Mar 17, 2021, 7:13 PM by abramdemski

Narrowly, deconfusion is a specific branch of AI alignment research, discussed in MIRI’s 2018 research update. More broadly, the term applies to any domain. Quoting from the research update:

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

Looking Deeper at Deconfusion

adamShimi · Jun 13, 2021, 9:29 PM
62 points
13 comments · 15 min read · LW link

Builder/Breaker for Deconfusion

abramdemski · Sep 29, 2022, 5:36 PM
72 points
9 comments · 9 min read · LW link

Traps of Formalization in Deconfusion

adamShimi · Aug 5, 2021, 10:40 PM
28 points
7 comments · 6 min read · LW link

On MIRI’s new research directions

Rob Bensinger · Nov 22, 2018, 11:42 PM
53 points
12 comments · 1 min read · LW link
(intelligence.org)

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley · Nov 17, 2023, 8:55 PM
16 points
8 comments · 15 min read · LW link

[Question] Open problem: how can we quantify player alignment in 2x2 normal-form games?

TurnTrout · Jun 16, 2021, 2:09 AM
23 points
59 comments · 1 min read · LW link

Deceptive Alignment and Homuncularity

Jan 16, 2025, 1:55 PM
25 points
12 comments · 22 min read · LW link

Deconfusing Direct vs Amortised Optimization

beren · Dec 2, 2022, 11:30 AM
134 points
19 comments · 10 min read · LW link

My research agenda in agent foundations

Alex_Altair · Jun 28, 2023, 6:00 PM
75 points
9 comments · 11 min read · LW link

My Central Alignment Priority (2 July 2023)

Nicholas / Heather Kross · Jul 3, 2023, 1:46 AM
12 points
1 comment · 3 min read · LW link

Strategy is the Deconfusion of Action

ryan_b · Jan 2, 2019, 8:56 PM
69 points
4 comments · 6 min read · LW link

Classification of AI alignment research: deconfusion, “good enough” non-superintelligent AI alignment, superintelligent AI alignment

philip_b · Jul 14, 2020, 10:48 PM
35 points
25 comments · 3 min read · LW link

Exercises in Comprehensive Information Gathering

johnswentworth · Feb 15, 2020, 5:27 PM
141 points
18 comments · 3 min read · LW link · 1 review

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi · Jun 23, 2021, 9:44 AM
15 points
3 comments · 3 min read · LW link

Musings on general systems alignment

Alex Flint · Jun 30, 2021, 6:16 PM
31 points
11 comments · 3 min read · LW link

Applications for Deconfusing Goal-Directedness

adamShimi · Aug 8, 2021, 1:05 PM
38 points
3 comments · 5 min read · LW link · 1 review

Goal-Directedness and Behavior, Redux

adamShimi · Aug 9, 2021, 2:26 PM
16 points
4 comments · 2 min read · LW link

Power-seeking for successive choices

adamShimi · Aug 12, 2021, 8:37 PM
11 points
9 comments · 4 min read · LW link

A review of “Agents and Devices”

adamShimi · Aug 13, 2021, 8:42 AM
21 points
0 comments · 4 min read · LW link

Approaches to gradient hacking

adamShimi · Aug 14, 2021, 3:16 PM
16 points
8 comments · 8 min read · LW link

Modelling Transformative AI Risks (MTAIR) Project: Introduction

Aug 16, 2021, 7:12 AM
91 points
0 comments · 9 min read · LW link

My summary of the alignment problem

Peter Hroššo · Aug 11, 2022, 7:42 PM
15 points
3 comments · 2 min read · LW link
(threadreaderapp.com)

Reward is the optimization target (of capabilities researchers)

Max H · May 15, 2023, 3:22 AM
32 points
4 comments · 5 min read · LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks · Jun 11, 2023, 12:13 AM
22 points
0 comments · 5 min read · LW link

Simulators

janus · Sep 2, 2022, 12:45 PM
632 points
168 comments · 41 min read · LW link · 8 reviews
(generative.ink)

[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou · Sep 24, 2022, 2:33 AM
7 points
6 comments · 3 min read · LW link

Reality and reality-boxes

Jim Pivarski · May 13, 2023, 2:14 PM
37 points
11 comments · 21 min read · LW link

Reward is not the optimization target

TurnTrout · Jul 25, 2022, 12:03 AM
376 points
123 comments · 10 min read · LW link · 3 reviews

Trying to isolate objectives: approaches toward high-level interpretability

Jozdien · Jan 9, 2023, 6:33 PM
49 points
14 comments · 8 min read · LW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera · May 11, 2023, 4:16 AM
11 points
1 comment · 3 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · Dec 18, 2023, 8:12 AM
30 points
14 comments · 9 min read · LW link

The Point of Trade

Eliezer Yudkowsky · Jun 22, 2021, 5:56 PM
175 points
77 comments · 4 min read · LW link · 1 review

The Plan

johnswentworth · Dec 10, 2021, 11:41 PM
260 points
78 comments · 14 min read · LW link · 1 review

Clarifying inner alignment terminology

evhub · Nov 9, 2020, 8:40 PM
109 points
17 comments · 3 min read · LW link · 1 review