My first objection is: human value formation doesn’t work like this. There’s no way to raise a human such that their value system cleanly revolves around the one single goal of duplicating a strawberry, and nothing else. By asking for a method of forming values which would permit such a narrow specification of end goals, you’re asking for a value formation process that’s fundamentally different from the one humans use. There’s no guarantee that such a thing even exists, and implicitly aiming to avoid the one value formation process we know is compatible with our own values seems like a terrible idea.
I think that’s true of humans. But humans are not very coherent on the scale of things.
If you think that an AI (or a human, for that matter) reflecting on its decision process converges to something AIXI-like in the long run, you should think that it does actually end up with a value system that cleanly revolves around one goal, or at least around a single utility function.
(My understanding is that Quintin doesn’t buy this claim, and thinks that this kind of convergence to coherence doesn’t actually happen the way LessWrongers typically imagine it. I don’t speak for him, but I think his reasons have to do with the computational difficulty of working out all the trades between shards that would lead to coherence, or something like that?)