One underlying idea concerns how AI misalignment is expected to arise. If superintelligent AI systems are misaligned, does this misalignment look like an inaccurate generalization from what their overseers wanted, or like a 'randomly rolled utility function': a deceptively misaligned goal entirely unrelated to anything their overseers intended to train? This corresponds, more or less, to Levels 1-4 versus Levels 5+ on my difficulty scale. If the misalignment is the result of economic pressures and a 'race to the bottom' dynamic, it is more likely to produce systems that care about human welfare alongside other things.
If the misaligned AI ends up 'egregiously' misaligned and doesn't care at all about anything valuable to us, as Eliezer thinks is most likely, then it places zero terminal value on human welfare, and only trade, threats, or compromise would get it to be nice. If the AI is superintelligent and you aren't, none of those considerations apply. Hence, nothing is left for humans.
If the AI is misaligned but doesn't have an arbitrary value system, then it may value human survival at least a little and do some equivalent of leaving a hole in the Dyson sphere.