You’re assuming that:
- There is a single AGI instance running.
- There will be a single person telling that AGI what to do.
- The AGI’s obedience to this person will be total.
I can see these assumptions holding approximately true if we get really, really good at corrigibility and if, at the same time, running inference on some discontinuously more capable future model is absurdly expensive. I don’t find that scenario very likely, though.