at least some humans (e.g. most transhumanists) are “fanatical maximizers”: we want to fill the lightcone with flourishing sentience, without letting a single solar system go to waste.
I agree that humans have a variety of objectives, which I think is actually more evidence for the hot mess theory?
the goals of an AI don’t have to be simple in order for them not to be best fulfilled by keeping humans around.
The point is not about having simple goals, but rather about optimizing goals to the extreme.
I think there is another point of disagreement. As I’ve written before, I believe the future is inherently chaotic. So even a super-intelligent entity would still be limited in predicting it. (Indeed, you seem to concede this, by acknowledging that even super-intelligent entities don’t have exponential time computation and hence need to use “sophisticated heuristics” to do tree search.)
What it means is that there is an inherent uncertainty in the world, and whenever there is uncertainty, you want to “regularize” and not go all out exhausting a resource that you might turn out to need later on.
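To make the limits-of-prediction point concrete, here is a minimal toy sketch (my own illustration, using the logistic map as a stand-in for a chaotic environment): two trajectories that start essentially identical become completely decorrelated within a few dozen steps, no matter how much compute is spent simulating them.

```python
# Toy illustration (not from the original discussion): sensitive dependence
# on initial conditions in the logistic map x_{t+1} = 4 * x_t * (1 - x_t),
# a standard chaotic system. Two trajectories starting 1e-10 apart end up
# differing by order 1 within a few dozen steps, so long-horizon prediction
# fails even with near-perfect knowledge of the initial state.

def logistic_trajectory(x0: float, steps: int) -> list[float]:
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3, 50)
b = logistic_trajectory(0.3 + 1e-10, 50)

for t in (0, 10, 20, 30, 40, 50):
    print(f"t={t:2d}  |difference| = {abs(a[t] - b[t]):.3e}")
```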
Just to be clear, I think a “hot mess super-intelligent AI” could still result in an existential risk for humans. But that would probably be the case if humans were an actual threat to it, and there was more of a conflict. (E.g., I don’t see it as a good use of energy for us to hunt down every ant and kill it, even if they are nutritious.)
I think the hot mess theory (more intelligence ⇒ less coherence) is just not true. Two objections:
It’s not really using a useful definition of coherence (the author notes this limitation):
However, this large disagreement between subjects should make us suspicious of exactly what we are measuring when we ask about coherence.
Most of the examples (animals, current AI systems, organizations) are not above the threshold where any definition of intelligence or coherence is particularly meaningful.
My own working definition is that intelligence is mainly about ability to steer towards a large set of possible futures, and an agent’s values / goals / utility function determine which futures in its reachable set it actually chooses to steer towards.
Given the same starting resources, more intelligent agents will be capable of steering into a larger set of possible futures. Being coherent in this framework means that an agent tends not to work at cross purposes against itself (“step on its own toes”) or take actions far from the Pareto-optimal frontier. Having complicated goals which directly or indirectly require making trade-offs doesn’t make one incoherent in this framework, even if some humans might rate agents with such goals as less coherent in an experimental setup.
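One way to make this notion of coherence concrete (a toy illustration of my own, not a formal definition): treat each available action as a vector of scores on the agent’s goals, and call an agent incoherent to the extent that it chooses actions that are Pareto-dominated by other actions it could have taken.

```python
# Toy illustration (hypothetical scores): coherence as not choosing actions
# that are Pareto-dominated with respect to the agent's own goals. A coherent
# agent may trade its goals off against each other however it likes, but it
# should not pick an action that some alternative beats on every goal at once.

actions = {
    "A": (3.0, 1.0),  # (score on goal 1, score on goal 2)
    "B": (1.0, 3.0),
    "C": (2.0, 2.0),
    "D": (1.0, 1.0),  # dominated by C: worse on both goals
}

def pareto_front(options: dict[str, tuple[float, float]]) -> set[str]:
    front = set()
    for name, (x, y) in options.items():
        dominated = any(
            ox >= x and oy >= y and (ox, oy) != (x, y)
            for other, (ox, oy) in options.items()
            if other != name
        )
        if not dominated:
            front.add(name)
    return front

print(pareto_front(actions))  # {'A', 'B', 'C'} -- choosing D is "stepping on your own toes"
```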
Whether the future is “inherently chaotic” or not might limit the set of reachable futures even for a superintelligence, but that doesn’t necessarily affect which future(s) the superintelligence will try to reach. And there are plenty of very bad (and very good) futures that seem well within reach even for humans, let alone ASI, regardless of any inherent uncertainty about or unpredictability of the future.
The larger issue is that, even from a capabilities perspective, the sort of essentially unconstrained instrumental convergence assumed in a paperclip maximizer is actually bad. In particular, I suspect the human case of potential fanaticism in pursuing instrumental goals is fundamentally an anomaly, driven both by the huge time scales involved and by the fact that evolution has way more compute than we do, often over 20 orders of magnitude more.
We can of course define “intelligence” in a way that presumes agency and coherence. But I don’t want to quibble about definitions.
Generally when you have uncertainty, this corresponds to a potential “distribution shift” between your beliefs/knowledge and reality. When you have such a shift, you want to regularize, which means not optimizing to the maximum.
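Here is a toy numerical sketch of that point (made-up numbers, purely illustrative): an agent that spends a resource to the maximum under its nominal model does strictly worse in expectation than a hedged agent, once you account for a small probability that the model is wrong about future needs.

```python
# Toy sketch (hypothetical numbers): an agent chooses what fraction of a
# resource to spend now. Under its nominal model the resource has no future
# use, so "optimizing to the maximum" means spending it all. But if there is
# a 10% chance the model is wrong (a distribution shift under which unspent
# reserves are worth 20x), the all-out policy stops being the best one.

def expected_value(spend: float, p_shift: float) -> float:
    reserve = 1.0 - spend
    return spend + p_shift * 20.0 * reserve  # payoff now + expected payoff of reserves

for spend in (1.0, 0.5):
    nominal = expected_value(spend, p_shift=0.0)  # what the agent's model predicts
    actual = expected_value(spend, p_shift=0.1)   # expected value under real uncertainty
    print(f"spend {spend:.0%}: model says {nominal:.2f}, reality gives {actual:.2f}")

# spend 100%: model says 1.00, reality gives 1.00
# spend 50%: model says 0.50, reality gives 1.50
```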