I didn’t mean to make 1. sound bad. I’m only trying to put my finger on a crux. My impression is that most prosaic alignment work has 2. in mind, even though MIRI/Bostrom/LW seem to believe that 1. is what we should actually be aiming towards. Do prosaic alignment people think that work on human ‘control’ now will lead to scenario 1 in the long run, or do they simply reject scenario 1?
I’m not sure I understand the “prosaic alignment” position well enough to answer this.
I guess, personally, I can see the appeal of scenario 2, of keeping a super-optimizer under control and using it in limited ways to solve specific problems. I also find that scenario incredibly terrifying, because super-optimizers that don’t optimize for the full set of human values are dangerous.