The first thing generally, or CEV specifically, is unworkable because the complexity of what needs to be aligned or meta-aligned for our Real Actual Values is far out of reach for our FIRST TRY at AGI. Yes I mean specifically that the dataset, meta-learning algorithm, and what needs to be learned, is far out of reach for our first try. It’s not just non-hand-codable, it is unteachable on-the-first-try because the thing you are trying to teach is too weird and complicated.
Why is CEV so difficult? And if CEV is impossible to learn first try, why not shoot for something less ambitious? Value is fragile, OK, but aren’t there easier utopias?
Many humans would be able to distinguish utopia from dystopia if they saw them, and humanity’s only advantage over an AI is that the brain has “evolution presets”.
Humans are relatively dumb, so why can’t even a relatively dumb AI learn the same ability to distinguish utopias from dystopias?
To anyone reading: don’t interpret these questions as disagreement. If someone doesn’t, for example, understand a mathematical proof, they might express disagreement with the proof while knowing full well that they haven’t discovered a mistake in it and that they are simply confused.
I’ll give myself a provisional answer. I’m not sure if it satisfies me, but it’s enough to make me pause: Anything short of CEV might leave open an unacceptably high chance of fates worse than death.
CEV is difficult because our values seem to be very complex.
Value is fragile, OK, but aren’t there easier utopias?
Building an AGI (let alone a superintelligent AGI) that aimed for an ‘easier utopia’ would require somehow convincing/persuading/aligning the AI to give up a LOT of value. I don’t think that’s possible without solving alignment anyway. Essentially, it seems like we’d be trying to ‘convince’ the AGI to ‘not go too fast, because that might be bad’. The problem is that we don’t know how to precisely specify what “bad” is in the first place.
Many humans would be able to distinguish utopia from dystopia if they saw them
That’s very much not obvious. I don’t think that humans from even 100 years ago, for example, teleported to today, would be able to reliably distinguish the current world from a ‘dystopia’.
I haven’t myself noticed much agreement about the various utopias people have already described! That seems like pretty strong evidence that ‘utopia’ is in fact very hard to specify.