Two points where I disagree with this argument:
- We may not be able to prove something about an arbitrary AGI, but we could interpret the resulting program and prove things about it.
- Alignment does not mean provably correct; I would define it as “empirically doesn’t kill us”.