So, to my ears, it sounds like we don’t have much of an idea at all where the CEV would end up—which means that it most likely ends up somewhere bad, since most random places are bad.
Well, if it captures the key parts of what you want, you can know it will turn out fine even if you’re extremely ignorant about what exactly the result will be.
Yes, as the Spartans answered Philip II of Macedon, Alexander the Great’s father, when he warned them, “You are advised to submit without further delay, for if I bring my army into your land, I will destroy your farms, slay your people, and raze your city”:
“If”.
Yup. So, perhaps, focus on that “if.”
Shouldn’t we be able to rule out at least some classes of scenarios? For instance, paperclip maximization seems like an unlikely CEV output.
Most likely we can rule out most scenarios that all humans agree are bad. So better than Clippy, probably.
But we really need a better model of what CEV does! Then we can start to talk sensibly about it.
“which means that it most likely ends up somewhere bad, since most random places are bad.”

I don’t think that follows, at all. CEV isn’t a random walk. It will at the very least end up at some subset of human values. Maybe you meant something different here by the word ‘bad’?