Shard Theory

Jul 14, 2022, 1:36 AM

Written by Quintin Pope, Alex Turner, Charles Foster, and Logan Smith. Card image generated by DALL-E 2.

Humans provide an untapped wealth of evidence about alignment

TurnTrout and Quintin Pope

Jul 14, 2022, 2:31 AM

211 points

94 comments, 9 min read, LW link, 1 review

Human values & biases are inaccessible to the genome

TurnTrout

Jul 7, 2022, 5:29 PM

94 points

54 comments, 6 min read, LW link, 1 review

General alignment properties

TurnTrout

Aug 8, 2022, 11:40 PM

50 points

2 comments, 1 min read, LW link

Evolution is a bad analogy for AGI: inner alignment

Quintin Pope

Aug 13, 2022, 10:15 PM

79 points

15 comments, 8 min read, LW link

Reward is not the optimization target

TurnTrout

Jul 25, 2022, 12:03 AM

375 points

123 comments, 10 min read, LW link, 3 reviews

The shard theory of human values

Quintin Pope and TurnTrout

Sep 4, 2022, 4:28 AM

255 points

67 comments, 24 min read, LW link, 2 reviews

Understanding and avoiding value drift

TurnTrout

Sep 9, 2022, 4:16 AM

48 points

14 comments, 6 min read, LW link

A shot at the diamond-alignment problem

TurnTrout

Oct 6, 2022, 6:29 PM

95 points

67 comments, 15 min read, LW link

Don’t design agents which exploit adversarial inputs

TurnTrout and Garrett Baker

Nov 18, 2022, 1:48 AM

72 points

64 comments, 12 min read, LW link

Don’t align agents to evaluations of plans

TurnTrout

Nov 26, 2022, 9:16 PM

48 points

49 comments, 18 min read, LW link

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout

Nov 29, 2022, 6:23 AM

62 points

41 comments, 15 min read, LW link

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

Dec 2, 2022, 2:43 AM

148 points

22 comments, 47 min read, LW link, 3 reviews
