Interesting, thanks! Serves me right for not reading the “Indifference” paper!
I think the discussions here and especially here are strong evidence that at least Eliezer & Nate are expecting powerful AGIs to be pure-long-term-consequentialist. (I didn’t ask, I’m just going by what they wrote.) I surmise they have a (correct) picture in their head of how super-powerful a pure-long-term-consequentialist AI can be—e.g. it can self-modify, it can pursue creative instrumental goals, it’s reflectively stable, etc.—but they have not similarly envisioned a partially-but-not-completely-long-term-consequentialist AI that is only modestly less powerful (and in particular can still self-modify, can still pursue creative instrumental goals, and is still reflectively stable). That’s what “My corrigibility proposal sketch” was trying to offer.
I’ll reword to try to describe the situation better, thanks again.