Is there a strong theoretical basis for guessing what capabilities superhuman intelligence may have, whether it arrives sooner or later? I’m aware of the speed & quality superintelligence frameworks, but I have issues with them.
Speed alone seems relatively weak as an axis of superiority; I can only speculate about what I might be able to accomplish if, for example, my cognition were sped up 1000x, but I find it hard to believe it would extend to achieving strategic dominance over all of humanity, especially if my ability to act and to perceive new information is still bounded by normal-human timescales. One could shorthand this to “how much more optimal could your decisions be if you could take maximal time to research and reflect on them in advance,” to which my answer is “only about as good as my decisions turned out to be when I wasn’t under time pressure and did do the research.” I’d be the greatest Starcraft player to ever exist, but I don’t think that generalizes outside the domain of [tactics measured in frames rather than minutes or hours or days].
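A minimal sketch of that intuition, with entirely made-up numbers: suppose each decision improves with subjective thinking time, but the world only delivers actionable new information at fixed wall-clock intervals. Once thinking is no longer the bottleneck, a 1000x speedup buys almost nothing. The decision-quality curve below is an arbitrary illustrative assumption, not a claim about how cognition actually scales.

```python
import math

def decision_quality(thinking_seconds: float) -> float:
    """Toy diminishing-returns curve: quality in [0, 1) as a function of
    subjective thinking time. Purely illustrative, not a model of cognition."""
    return 1 - math.exp(-thinking_seconds / 600)  # ~10 minutes captures most of the value

# Assume the world delivers actionable new information once per hour of wall-clock
# time, so each decision can use at most that much wall-clock time before it goes stale.
WALL_CLOCK_BUDGET = 3600  # seconds between external feedback events

for speedup in (1, 10, 1000):
    subjective_time = WALL_CLOCK_BUDGET * speedup
    print(f"{speedup:>5}x speedup -> quality {decision_quality(subjective_time):.6f}")

# Typical output: 1x already reaches ~0.9975; 10x and 1000x are both ~1.000000.
# Under these (assumed) diminishing returns, speed stops mattering once the
# external feedback loop, not cognition, is the limiting factor.
```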
To me quality superiority is the far more load-bearing but much muddier part of the argument for the dangers of AGI. Writing about the lives and minds of human prodigies like Von Neumann or Terry Tao or whoever you care to name frequently verges on the mystical; I don’t think even the very intelligent among us have a good gears-level model of how intelligence works. To me this is a double-edged sword: if Ramanujan’s brain might as well have been magic, that’s evidence against our collective ability to guess what a quality superintelligence could accomplish. We don’t know what intelligence can do at very high levels (bad for our ability to survive AGI), but we also don’t know what it can’t do, which could turn out to be just as important. What if there are rapidly diminishing returns on the accuracy of prediction as the system has to account for more and more entropy? If that were true, an incredibly intelligent agent might still only have a marginal edge in decision-making, which could be overwhelmed by other factors. What if the Kolmogorov complexity of an x-risk-scale plan is just straight up too many bits, or demands precision of measurement beyond what the AI has access to?
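One concrete, well-worn version of the diminishing-returns worry comes from chaotic dynamics: when a system has sensitive dependence on initial conditions, the usable prediction horizon grows only logarithmically with measurement precision, so enormous gains in modeling power buy only modest gains in foresight. A quick illustrative sketch using the logistic map (my choice of example, not anything from the discussion above):

```python
# Sensitive dependence on initial conditions in the logistic map (r = 4).
# Two trajectories that start a tiny distance apart diverge after a few dozen
# steps, and better initial measurements only delay the divergence slightly.
def logistic(x: float, r: float = 4.0) -> float:
    return r * x * (1 - x)

def divergence_step(error: float, threshold: float = 0.1, steps: int = 200) -> int:
    """Return the first step at which two trajectories separated by `error`
    at the start differ by more than `threshold`."""
    a, b = 0.2, 0.2 + error
    for t in range(steps):
        if abs(a - b) > threshold:
            return t
        a, b = logistic(a), logistic(b)
    return steps

for err in (1e-3, 1e-6, 1e-9, 1e-12):
    print(f"initial error {err:.0e}: trajectories diverge at step {divergence_step(err)}")

# Each thousandfold improvement in initial-condition precision adds only a
# roughly constant number of predictable steps (log-scaling with precision),
# which is the "marginal edge swamped by entropy" intuition in miniature.
```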
I don’t want to privilege the hypothesis that maybe the smartest thing we can build is still not that scary because the world is chaotic, but I feel I’ve seen many arguments that privilege the opposite: that the “sharp left turn” will hit and the rest is merely moving chess pieces through a solved endgame. So what is the best work on the topic?
In some ways this doesn’t matter. For as long as there is no AGI disaster, AGI timelines are also timelines to commercial success and abundance, by which point AGIs are collectively in control. The problem is that despite being useful and apparently aligned in current behavior (if that somehow works out and there is no disaster before then), AGIs still by default remain misaligned in the long term, in the goals they settle towards after reflecting on what those goals should be. They are motivated to capture the option to do that, and being put in control of a lot of the infrastructure makes it easy; it doesn’t even require coordination. There are some stories about that.
This could be countered by steering the long-term goals and managing current alignment security, but it’s unclear how to do that at all, and by the time AGIs are a commercial success it’s too late, unless the AGIs that are aligned in current behavior can be leveraged to solve such problems in time. Which is, itself, unclear.
This sort of failure probably takes away the cosmic endowment, but it might preserve human civilization in a tiny corner of the future if there is a tiny bit of sympathy/compassion in AGI goals, which is plausible for goals built out of training on human culture, or if sympathy is part of the generic values that most CEV processes starting from disparate initial volitions settle on. This can’t work out for AGIs whose reflectively stable goals hold no sympathy, so that’s one way apparent alignment can backfire.