Are Solomonoff Daemons exponentially dense?

Some doomers have very strong intuitions that doom is almost assured for almost any way of building AI. Yudkowsky likes to say that alignment is about hitting a tiny part of value space in a vast universe of deeply alien values.
Is there a way to make this more formal? Is there a formal model in which some kind of Solomonoff daemon / mesa-optimizer / gremlin in the machine starts popping up all over the place as the cognitive power of the agent is scaled up?
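One crude way to start (a back-of-envelope counting model, nothing from the literature): treat a value specification as n independent bits and call an agent “aligned” when it matches a fixed target on k designated bits. Under a uniform draw over specifications,

```latex
\Pr[\text{aligned}] = \frac{2^{\,n-k}}{2^{\,n}} = 2^{-k},
\qquad
\frac{\#\{\text{misaligned}\}}{\#\{\text{aligned}\}} = \frac{2^{n}-2^{\,n-k}}{2^{\,n-k}} = 2^{k}-1,
```

so the misaligned-to-aligned ratio grows exponentially in the number of bits that have to be right. That is one (very crude) reading of “exponentially dense”; whether a uniform draw is the right reference distribution at all is the real question, as the discussion below illustrates.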
Imagine that a magically powerful AI decides to set up a new political system for humans and create a “Constitution of Earth” that will be perfectly enforced by local smaller AIs, while the greatest one travels away to explore other galaxies.
The AI decides that the fairest way to create the constitution is randomly. It will choose a length, for example 10,000 words of English text. Then it will generate all possible combinations of 10,000 English words. (It is magical, so let’s not worry about how much compute that would actually take.) Out of the generated combinations, it will remove the ones that don’t make any sense (an overwhelming majority of them) and the ones that could not be meaningfully interpreted as “a constitution” of a country (this is kinda subjective, but the AI does not mind reading them all, evaluating each of them patiently using the same criteria, and accepting only the ones that pass a certain threshold). Out of the remaining ones, the AI will choose the “Constitution of Earth” randomly, using a fair quantum randomness generator.
Shortly before the result is announced, how optimistic would you feel about your future life, as a citizen of Earth?
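A toy, non-magical sketch of that procedure, shrunk to 3-word “constitutions” over a 5-word vocabulary; the two filtering predicates are invented stand-ins for the AI’s subjective “makes sense” and “reads as a constitution” judgments:

```python
import itertools
import random

VOCAB = ["liberty", "tax", "banana", "council", "the"]
LENGTH = 3  # stand-in for 10,000 words

def makes_sense(words):
    # Stand-in for "is not gibberish".
    return words[0] != "the" and "banana" not in words

def reads_as_constitution(words):
    # Stand-in for the AI's threshold for "could be read as a constitution".
    return "council" in words or "liberty" in words

# Enumerate every combination, keep only the ones that pass both filters.
candidates = [
    c for c in itertools.product(VOCAB, repeat=LENGTH)
    if makes_sense(c) and reads_as_constitution(c)
]

# Uniform choice over the survivors -- the "fair quantum randomness" step.
constitution = random.choice(candidates)
print(" ".join(constitution))
```

Everything about the outcome is fixed by the filter plus the uniform draw over whatever survives it, which is exactly where the replies below push back.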
As an aside (that’s still rather relevant, IMO), it is a huge pet peeve of mine when people use the word “randomly” in technical or semi-technical contexts (like this one) to mean “uniformly at random” instead of just “according to some probability distribution.” I think the former elevates and reifies a way-too-common confusion and draws attention away from the important upstream generator of disagreements, namely how exactly the constitution is sampled.
I wouldn’t normally have said this, but given your obvious interest in math, it’s worth pointing out that the answers to these questions you have raised naturally depend very heavily on what distribution we would be drawing from. If we are talking about, again, a uniform distribution from “the design space of minds-in-general” (so we are just summoning a “random” demon or shoggoth), then we might expect one answer. If, however, the search is inherently biased towards a particular submanifold of that space, because of the very nature of how these AIs are trained/fine-tuned/analyzed/etc., then you could expect a different answer.
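A minimal numerical sketch of that dependence (the labels, proportions, and the 500× bias factor are all invented for illustration):

```python
import random
from collections import Counter

# A candidate "mind space" where human-compatible minds are 1% of the space.
candidates = ["human-compatible"] * 10 + ["alien"] * 990

def draw(n, biased):
    # Hypothetical bias: training on human text/feedback makes each
    # human-compatible candidate 500x more likely than under a uniform draw.
    weights = [500.0 if c == "human-compatible" else 1.0 for c in candidates] if biased else None
    return Counter(random.choices(candidates, weights=weights, k=n))

print("uniform draw:", draw(10_000, biased=False))  # ~99% "alien"
print("biased draw: ", draw(10_000, biased=True))   # ~83% "human-compatible"
```

The disagreement then shifts to how strong that bias actually is for real training pipelines.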
Fair point. (I am not convinced by the argument that if the AIs are trained on human texts and feedback, they are likely to end up with values similar to humans, but that would be a long debate.)
Most configurations of matter, most courses of action, and most mind designs, are not conducive to flourishing intelligent life. Just like most parts of the universe don’t contain flourishing intelligent life. I’m sure this stuff has been formally stated somewhere, but the underlying intuition seems pretty clear, doesn’t it?
This sounds related to my complaint about the YUDKOWSKY + WOLFRAM ON AI RISK debate:

I wish there had been some effort to quantify @stephen_wolfram’s “pockets of irreducibility” (sections 1.2 & 4.2), because if we can prove that there aren’t many, or that they are hard to find & exploit by ASI, then the risk might be lower.
I got this tweet wrong. I meant if pockets of irreducibility are common and non-pockets are rare and hard to find, then the risk from superhuman AI might be lower. I think Stephen Wolfram’s intuition has merit but needs more analysis to be convincing.
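For concreteness, here is a minimal sketch of what a “pocket of reducibility” looks like in the cellular-automaton setting Wolfram usually uses (Rule 90 has a well-known closed-form shortcut; Rule 30 is not known to have one). This only illustrates the concept; it is not the quantification the tweet asks for:

```python
from math import comb

def rule90_cell_direct(t, x):
    # Rule 90 (each cell is the XOR of its two neighbors) is "reducible":
    # starting from a single live cell, the state at time t is row t of
    # Pascal's triangle mod 2, so any cell can be computed without
    # simulating the intermediate steps.
    if abs(x) > t or (t + x) % 2:
        return 0
    return comb(t, (t + x) // 2) % 2

def simulate(rule_fn, steps, width):
    cells = [0] * (2 * width + 1)
    cells[width] = 1  # single live cell at the center
    for _ in range(steps):
        cells = [
            rule_fn(cells[i - 1] if i > 0 else 0,
                    cells[i],
                    cells[i + 1] if i + 1 < len(cells) else 0)
            for i in range(len(cells))
        ]
    return cells

rule90 = lambda l, c, r: l ^ r
rule30 = lambda l, c, r: l ^ (c | r)

t = 20
assert simulate(rule90, t, width=t + 2) == \
    [rule90_cell_direct(t, x) for x in range(-(t + 2), t + 3)]

# Rule 30 can be simulated the same way, but no comparable shortcut is known;
# predicting it seems to require running it step by step.
row30 = simulate(rule30, t, width=t + 2)
```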