Cutting away all the word games, this paper appears to claim that if an agent is intelligent in a way that isn’t limited to some narrow part of the world, then it can’t stably have a narrow goal, because reasoning about its goals will destabilize them. This is incorrect. I think AIXI-tl is a straightforward counterexample.
(AIXI-tl is an AI that is mathematically simple to describe, but which can’t be instantiated in this universe because it uses too much computation. Because it is mathematically simple, its properties are easy to reason about. It is unambiguously superintelligent, and does not exhibit the unstable-goal behavior you predict.)
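(To unpack “mathematically simple to describe” a bit, here is Hutter’s standard definition, which is background I’m adding rather than anything from the paper: at each cycle k, AIXI picks the action that maximizes expected future reward under a Solomonoff-style mixture over all environment programs consistent with its history so far,

$$a_k := \arg\max_{a_k}\sum_{o_k r_k}\cdots\max_{a_m}\sum_{o_m r_m}\big[r_k+\cdots+r_m\big]\sum_{q\,:\,U(q,\,a_1\ldots a_m)\,=\,o_1 r_1\ldots o_m r_m}2^{-\ell(q)},$$

where $U$ is a universal Turing machine, $q$ ranges over environment programs, $\ell(q)$ is $q$’s length, and $m$ is the horizon. AIXI-tl is the computable variant that only considers programs of length at most $l$ and per-cycle runtime at most $t$, at a cost of roughly $t \cdot 2^l$ computation per cycle. The goal, the bracketed reward sum, is baked into the formula itself; nothing in the agent’s operation involves reflecting on or revising it.)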
I think we’re in a sort of weird part of concept-space where we’re thinking both about absolutes (“all X are Y”, disproved by exhibiting an X that is not Y) and distributions (“the connection between goals and intelligence is normally accidental rather than necessary”), and I think this counterexample is aimed at a part of the paper that’s trying to make a distributional claim rather than an absolute one.
Roughly, their argument as I understand it is:
1. Large amounts of instrumental intelligence can be applied to nearly any goal.
2. Large amounts of frame-capable intelligence will take over civilization’s steering wheel from humans.
3. Frame-capable intelligence won’t be as bad as the randomly chosen intelligence implied by Bostrom, and so this argument for AI x-risk doesn’t hold water; superintelligence risk isn’t as bad as it seems.
I think I differ on the 3rd point a little (as discussed in more depth here), but roughly agree that the situation we’re in probably isn’t as bad as the “AIXI-tl with a random utility function implemented on a hypercomputer” world, for structural reasons; those same structural reasons are why AIXI-tl isn’t a compelling counterexample to the paper’s distributional claim.
Like, in my view, much of the work of “why be worried about the transition instead of blasé?” is done by stuff like Value is Fragile, which isn’t really part of the standard argument as they’re describing it here.
AIXI is not an example of a system that can reason about goals without incurring goal instability, because it is not an example of a system that can reason about goals.
… plus we say that in the paper :)
apologies, I don’t recognise the paper here :)