I think the hot mess theory (more intelligence ⇒ less coherence) is just not true. Two objections:
It’s not really using a useful definition of coherence (the author notes this limitation):
However, this large disagreement between subjects should make us suspicious of exactly what we are measuring when we ask about coherence.
Most of the examples (animals, current AI systems, organizations) are not above the threshold where any definition of intelligence or coherence is particularly meaningful.
My own working definition is that intelligence is mainly about the ability to steer towards a large set of possible futures, and an agent’s values / goals / utility function determine which futures in its reachable set it actually chooses to steer towards.
Given the same starting resources, more intelligent agents will be capable of steering into a larger set of possible futures. Being coherent in this framework means that an agent tends not to work at cross purposes against itself (“step on its own toes”) or take actions far from the Pareto-optimal frontier. Having complicated goals which directly or indirectly require making trade-offs doesn’t make one incoherent in this framework, even if some humans might rate agents with such goals as less coherent in an experimental setup.
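To pin that down a little (this is just a rough sketch of my own framing, nothing precise): fix the starting resources, write $\mathcal{F}$ for the space of possible futures, $R_A \subseteq \mathcal{F}$ for the set of futures agent $A$ can reliably steer into, and $u_A : \mathcal{F} \to \mathbb{R}$ for $A$’s values. Then, roughly,

$$
A \text{ at least as intelligent as } B \;\iff\; R_B \subseteq R_A,
\qquad
\text{incoherence}(A) \;\approx\; \max_{f \in R_A} u_A(f) \;-\; u_A(f_{\text{realized}}),
$$

where $f_{\text{realized}}$ is the future $A$’s actual behavior steers towards. On this reading, incoherence is the gap between the best reachable future by the agent’s own lights and the one it actually gets; complicated goals that force trade-offs just move the maximizer around, they don’t widen the gap.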
Whether the future is “inherently chaotic” or not might limit the set of reachable futures even for a superintelligence, but that doesn’t necessarily affect which future(s) the superintelligence will try to reach. And there are plenty of very bad (and very good) futures that seem well within reach even for humans, let alone ASI, regardless of any inherent uncertainty about or unpredictability of the future.
The larger issue is that, even from a capabilities perspective, the essentially unconstrained instrumental convergence assumed in a paperclip maximizer is actually bad. In particular, I suspect the human case of potential fanaticism in pursuing instrumental goals is fundamentally an anomaly, a product both of the huge time scales involved and of the fact that evolution has far more compute than we do, often over 20 orders of magnitude more.
We can of course define “intelligence” in a way that presumes agency and coherence. But I don’t want to quibble about definitions.
Generally, when you have uncertainty, this corresponds to a potential “distribution shift” between your beliefs/knowledge and reality. When you have such a shift, you want to regularize, which means not optimizing to the maximum.
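As a minimal illustration of why (my own toy model, with heavy-tailed estimation error standing in for the distribution shift; none of the numbers matter): if the score you optimize is your true objective plus an error term with heavier tails than the objective itself, then the single highest-scoring option is almost always an error artifact, and settling for a merely-good option recovers more true value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_options, n_trials = 10_000, 500

argmax_true, quantile_true = [], []
for _ in range(n_trials):
    true_value = rng.normal(0.0, 1.0, n_options)   # what we actually care about
    error = rng.standard_cauchy(n_options)         # heavy-tailed model error from the assumed "shift"
    proxy = true_value + error                     # the score the optimizer actually sees

    # Optimize to the maximum: take the single best-looking option.
    argmax_true.append(true_value[np.argmax(proxy)])

    # Regularized alternative: settle for a random option from the top 10%.
    top = np.argsort(proxy)[-n_options // 10:]
    quantile_true.append(true_value[rng.choice(top)])

print("mean true value, argmax of proxy :", np.mean(argmax_true))
print("mean true value, top-10% sample  :", np.mean(quantile_true))
```

In this toy setup the milder rule reliably beats the hard argmax on true value. The point is only directional: the bigger the mismatch between your model and reality, the less you should trust the extreme end of your own ranking.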