orthonormal comments on Ends Don’t Justify Means (Among Humans)

orthonormal 14 Apr 2020 3:03 UTC
12 points
The notion of untrusted hardware seems like something wholly outside the realm of classical decision theory. (What it does to reflective decision theory I can’t yet say, but that would seem to be the appropriate level to handle it.)
It’s nice to see the genesis of corrigibility before Eliezer had unconfused himself enough to take that first step.
What links here?
- Corrigibility as outside view by TurnTrout (8 May 2020 21:56 UTC; 36 points)