Paul Christiano on Dwarkesh Podcast
Link post
Dwarkesh’s summary:
Paul Christiano is the world’s leading AI safety researcher. My full episode with him is out!
We discuss:
Does he regret inventing RLHF, and is alignment necessarily dual-use?
Why he has relatively modest timelines (40% by 2040, 15% by 2030),
What do we want the post-AGI world to look like (do we want to keep gods enslaved forever)?
Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon,
His current research into a new proof system, and how this could solve alignment by explaining a model’s behavior,
and much more.