Moreover, it pursues this goal with such fanaticism that it cannot spare any resources, and so the whole galaxy is not big enough for humans and this goal to co-exist.
The paperclip maximizer is a fine metaphor, but I find it unlikely that highly intelligent entities will monomaniacally pursue any particular goal.
“Fanaticism” and “monomaniacally” seem like over-anthropomorphizing to me. Also, at least some humans (e.g. most transhumanists) are “fanatical maximizers”: we want to fill the lightcone with flourishing sentience, without leaving a single solar system to burn away as waste.[1]
And the goals of an AI don’t have to be simple to not be best fulfilled by keeping humans around. The squiggle maximizer thought experiment was just one example of a goal for which it is particularly obvious that keeping humans around is not optimal; see this tweet by Eliezer for more.
Just as humans have learned the wisdom of diversifying our investments and the precautionary principle (even if to a more limited extent than some readers would like), I suspect AIs would know this too.
Some internal part of the AI might be Wise, but that doesn’t mean that the system as a whole is Wise. My own suspicion is that it’s possible for an optimization process to be powerful and general enough to re-arrange all the matter and energy in the universe basically arbitrarily, without the “driver” of that optimization process being Wise (or even conscious / sentient) itself.
More generally, I think that when people say things like “a sufficiently smart AI would ‘know’” something like the precautionary principle or have wisdom generally, they’re usually not carefully distinguishing between three distinct things, each of which might or might not contain wisdom:
The thing-which-optimizes (e.g. a human, an AI system as a whole)
The method by which it optimizes (e.g. for humans, thinking and reflecting for a while and then carrying out actions based on a combination of that reflection and general vibes; for an AI, maybe something like explicit tree search over world states using sophisticated heuristics)
What the thing optimizes for (molecular squiggles, flourishing sentient life, iterating SHA256 hashes of audio clips of cows mooing, etc.)
Any sufficiently powerful method of optimization (second bullet) is likely to contain some subsystem which has an understanding of and use for various wise concepts, because wisdom is instrumentally useful for achieving many different kinds of goals and therefore likely to be convergent. But that doesn’t mean that the thing-which-optimizes is necessarily wise itself, nor that whatever it is optimizing for has any relation to what humans are optimizing for. (The toy sketch below is one way to picture that separation.)
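To make the separation concrete, here is a deliberately toy sketch (purely illustrative, not anyone’s actual design; `count_squiggles` and `flourishing_score` are made-up placeholder goals): the same depth-limited tree search runs unchanged no matter which evaluation function you hand it, and nothing inside the search loop itself is wise, conscious, or aligned with anything.

```python
# Purely illustrative sketch: the search machinery ("the method by which it optimizes")
# never looks inside the goal ("what the thing optimizes for") except through `evaluate`.

from typing import Any, Callable, Iterable, Optional, Tuple

def plan(state: Any,
         actions: Callable[[Any], Iterable[Any]],
         transition: Callable[[Any, Any], Any],
         evaluate: Callable[[Any], float],
         depth: int) -> Tuple[float, Optional[Any]]:
    """Depth-limited tree search over world states.

    `actions(state)`   -> available actions in `state`
    `transition(s, a)` -> successor state after taking action `a`
    `evaluate(state)`  -> heuristic score; the *only* place the goal enters
    Returns (best achievable score found, first action of that plan).
    """
    if depth == 0:
        return evaluate(state), None
    best_value, best_action = float("-inf"), None
    for action in actions(state):
        value, _ = plan(transition(state, action), actions, transition, evaluate, depth - 1)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action

# The same `plan` is equally happy to pursue either of these (hypothetical) goals:
#   plan(s0, actions, transition, evaluate=count_squiggles, depth=3)
#   plan(s0, actions, transition, evaluate=flourishing_score, depth=3)
```

Whatever “wisdom” shows up in a system like this lives in the heuristics and world model it uses along the way, not in the loop that picks the highest-scoring branch.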
[1] Tangential, but I actually think most people would want this with enough reflection—lots of people have an innate disgust or aversion to e.g. throwing away food or not recycling. Leaving solar systems unused is just waste on a much grander scale.
at least some humans (e.g. most transhumanists) are “fanatical maximizers”: we want to fill the lightcone with flourishing sentience, without leaving a single solar system to burn away as waste.
I agree that humans have a variety of objectives, which I think is actually more evidence for the hot mess theory?
the goals of an AI don’t have to be simple to not be best fulfilled by keeping humans around.
The point is not about having simple goals, but rather about optimizing goals to the extreme.
I think there is another point of disagreement. As I’ve written before, I believe the future is inherently chaotic. So even a super-intelligent entity would still be limited in predicting it. (Indeed, you seem to concede this by acknowledging that even super-intelligent entities don’t have exponential-time computation and hence need to use “sophisticated heuristics” to do tree search.)
What this means is that there is inherent uncertainty in the world, and whenever there is uncertainty, you want to “regularize” and not go all out in exhausting a resource that you might turn out to need later on.
Just to be clear, I think a “hot mess super-intelligent AI” could still result in an existential risk for humans. But that would probably be the case if humans were an actual threat to it and there was more of a conflict. (E.g., I don’t see it as a good use of energy for us to hunt down and kill every ant, even if they are nutritious.)
I think the hot mess theory (more intelligence ⇒ less coherence) is just not true. Two objections:
It’s not really using a useful definition of coherence (the author notes this limitation):
However, this large disagreement between subjects should make us suspicious of exactly what we are measuring when we ask about coherence.
Most of the examples (animals, current AI systems, organizations) are not above the threshold where any definition of intelligence or coherence is particularly meaningful.
My own working definition is that intelligence is mainly about the ability to steer towards a large set of possible futures, and an agent’s values / goals / utility function determine which futures in its reachable set it actually chooses to steer towards.
Given the same starting resources, more intelligent agents will be capable of steering into a larger set of possible futures. Being coherent in this framework means that an agent tends not to work at cross purposes against itself (“step on its own toes”) or take actions far from the Pareto-optimal frontier. Having complicated goals which directly or indirectly require making trade-offs doesn’t make one incoherent in this framework, even if some humans might rate agents with such goals as less coherent in an experimental setup.
Whether the future is “inherently chaotic” or not might limit the set of reachable futures even for a superintelligence, but that doesn’t necessarily affect which future(s) the superintelligence will try to reach. And there are plenty of very bad (and very good) futures that seem well within reach even for humans, let alone ASI, regardless of any inherent uncertainty about or unpredictability of the future.
The larger issue is that, even from a capabilities perspective, the sort of essentially unconstrained instrumental convergence assumed to be present in a paperclip maximizer is actually bad. In particular, I suspect the human case of potential fanaticism in pursuing instrumental goals is fundamentally an anomaly of both the huge time scales involved and the fact that evolution has way more compute than we do, often over 20 orders of magnitude more.
We can of course define “intelligence” in a way that presumes agency and coherence. But I don’t want to quibble about definitions.
Generally, when you have uncertainty, this corresponds to a potential “distribution shift” between your beliefs/knowledge and reality. When you have such a shift, you want to regularize, which means not optimizing to the maximum.
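A toy numerical version of this argument (all numbers invented for illustration): if there is even a modest chance you will need a resource later, and running out at that point is very costly, then the expected-utility-maximizing plan already holds some in reserve rather than consuming everything now.

```python
# Toy illustration with made-up numbers: under uncertainty about future needs,
# "going all out" on a resource is not the expected-utility-maximizing plan.

P_NEED_LATER = 0.10       # chance the resource turns out to be needed later
VALUE_PER_UNIT_NOW = 1.0  # utility per unit consumed immediately
SHORTFALL_COST = 50.0     # utility lost if it is needed later and none is left
TOTAL_UNITS = 10.0

def expected_utility(units_consumed_now: float) -> float:
    reserve = TOTAL_UNITS - units_consumed_now
    immediate = VALUE_PER_UNIT_NOW * units_consumed_now
    # Pay the shortfall cost only if the need arises and less than one unit remains.
    expected_shortfall = P_NEED_LATER * SHORTFALL_COST if reserve < 1.0 else 0.0
    return immediate - expected_shortfall

print(expected_utility(10.0))  # exhaust everything: 10.0 - 0.1 * 50.0 = 5.0
print(expected_utility(9.0))   # keep one unit back:  9.0 - 0.0       = 9.0
```

The conclusion obviously depends on the chosen numbers (one unit of immediate value versus a 10% chance of a cost of 50), but it shows how uncertainty about future needs acts like a penalty on exhausting a resource now, which is one concrete reading of “regularize.”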