The notion of untrusted hardware seems like something wholly outside the realm of classical decision theory. (What it does to reflective decision theory I can’t yet say, but that would seem to be the appropriate level to handle it.)
It’s nice to see the genesis of corrigibility before Eliezer had unconfused himself enough to take that first step.
It’s nice to see the genesis of corrigibility before Eliezer had unconfused himself enough to take that first step.