The “manipulative” program is something like: run this simple cellular automaton and output the value at location X, once every T steps, starting from step T0. It doesn’t encode extra stuff to become manipulative at some point; it’s just that the program simulates consequentialists who can influence its output, and for whom the manipulative strategy is a natural way to expand their influence.
(I’m happy to just drop this, or someone else can try to explain; sorry, the original post is not very clear.)
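To make the kind of program being described concrete, here is a minimal sketch; the particular automaton (Rule 110), the lattice width, and the values of X, T, and T0 are placeholders chosen for illustration, not anything specified in the post.

    # Illustrative sketch only: a short program whose entire behavior is
    # "simulate a simple world and read off one cell on a schedule".
    # Rule 110, the width, and X/T/T0 are arbitrary choices.

    RULE_110_ONES = {(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 1, 0), (0, 0, 1)}

    def step(cells):
        """Advance a periodic row of 0/1 cells by one Rule 110 step."""
        n = len(cells)
        return [1 if (cells[(i - 1) % n], cells[i], cells[(i + 1) % n]) in RULE_110_ONES
                else 0
                for i in range(n)]

    def outputs(width=256, X=17, T=1000, T0=10_000):
        """Yield the value at location X once every T steps, starting from step T0."""
        cells = [0] * width
        cells[width // 2] = 1          # simple, low-complexity starting condition
        for _ in range(T0):            # run forward to step T0
            cells = step(cells)
        while True:
            yield cells[X]
            for _ in range(T):
                cells = step(cells)

Nothing in the code singles out manipulation; whether the resulting bit stream ever turns manipulative depends on what ends up evolving inside the simulated world and caring about what gets read off at location X.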
After reading this comment I went back and reread your post, and it suddenly made sense. Thanks! I don’t see any obvious holes anymore.
I’m probably being clueless again, but here are two more questions:
1) If they have enough computing power to simulate many universes like ours, and their starting conditions are simpler than ours, what do they want our universe for?
2) If they have so much power and want our universe for some reason, why do they need to manipulate our AIs? Can’t they just simulation-argument us?
I think I find plausible the claim that Solomonoff Induction is maybe bad at induction because it’s big enough to build consequentialists who correctly output the real world for a while and then switch to doing something different, and perhaps such hypotheses compete with running the code for the real world, such that we can’t quite trust SI. The thing that I find implausible is that… the consequentialists know what they’re doing, or something? [Which is important because the reason to be afraid of reasoners is that they can know what they’re doing, and deliberately get things right instead of randomly getting them right.]
That is, in a world where we’re simulating programs that take our observations and predict our future observations, it seems like a reasoner in our hypothesis space could build a model and compete with pure models. In a world where we’re simply conditioning on programs that got the right outputs so far, the only reasoners that we see are ones that randomly guess 1) the right universe to simulate, 2) the right random seed for that universe, and 3) the right output cell to manipulate. But for none of those three is their reasoning important: if you think of intelligence as a mapping from observations to desirable outcomes, they don’t have any observations (even of what outcomes are ‘desirable’).
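Spelled out, the point about blind guessing is just that the chance of a given program getting all three right factors into three independent long shots (the notation is a gloss introduced here, not anything from the thread):

    P(\text{all three right}) \;=\; P(\text{universe}) \cdot P(\text{seed} \mid \text{universe}) \cdot P(\text{cell} \mid \text{universe}, \text{seed})

and none of those factors is improved by any reasoning happening inside the program, since the simulated reasoners never observe which universe, seed, or output cell the outer induction actually cares about.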
We can still view them as coordinated in some way: for each of those three things where they needed to guess right, the set of all of them can split their guesses such that ‘at least one’ of them is right, and trade acausally to act as one group. But it seems shocking to me that [the largest element of the (set of them big enough that one of them gets it right)] is small relative to the real universe, and it seems less shocking but still surprising that all sets of such sets end up pushing the outermost predictions in a predictable direction. [Basically, I understand this as a claim that the deviations of the “agent-rich” predictions from the “agent-free” predictions will not just be random noise pointed in different directions by agents with different preferences, or further a claim that no such agents have preferences different enough that they don’t acausally trade to average.]
That is, it’s easy for me to imagine a short computer program that learns how to recognize cat pictures given data, one that’s actually shorter than the computer program that by itself knows how to recognize cat pictures and only uses data to fuel immediate judgments. But the SI consequentialists feel instead like a world model plus a random seed that identifies a location and a world model plus a random seed that happens to match the external world’s random seed, and this feels like it should get much less measure than just a world model plus a random seed that happens to match the external world’s random seed. Am I imagining some part of this hypothesis incorrectly?
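In rough description-length terms, the intuition can be written as (a gloss introduced here, not notation from the thread):

    K(\text{direct}) \;\approx\; K(\text{world model}) + K(\text{seed matching ours})
    K(\text{manipulative}) \;\approx\; K(\text{their world model}) + K(\text{seed identifying a location}) + \text{(their internal guess of our world model and seed)}

so the manipulative hypothesis looks like it has to pay for everything the direct hypothesis pays for, plus the overhead of the consequentialists’ own universe.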
Solomonoff induction has to “guess” the world model + random seed + location.
The consequentialists also have to guess the world model + random seed + location. The basic situation seems to be the same. If the consequentialists just picked a random program and ran it, then they’d do as well as Solomonoff induction (ignoring computational limitations). Of course, they would then have only a small amount of control over the predictions, since they only control a small fraction of the measure under Solomonoff induction.
The consequentialists can do better by restricting to “interesting” worlds+locations (i.e. those where someone is applying SI in a way that determines what happens with astronomical resources), e.g. by guessing again and again until they get an interesting one. I argue that the extra advantage they get, from restricting to interesting worlds+locations, is probably significantly larger than 1/(fraction of measure they control). This is because the fraction of measure controlled by consequentialists is probably much larger than the fraction of measure corresponding to “interesting” invocations of SI.
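A rough way to write out that comparison (the symbols f_C, f_I, and N are introduced here only for illustration): let f_C be the fraction of universal-prior measure controlled by consequentialists, f_I the fraction of measure on the intended hypotheses behind “interesting” invocations of SI, and N the number of such invocations. If the consequentialists split their influence evenly across the interesting invocations, then roughly

    \frac{f_C}{N} \;\text{(measure they can aim at one interesting invocation)} \quad\text{vs.}\quad \frac{f_I}{N} \;\text{(measure of the intended hypothesis for that invocation)}

so the manipulative predictions dominate at a given invocation when f_C exceeds f_I; equivalently, the advantage from restricting attention, roughly 1/f_I, is larger than the cost 1/f_C.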
(This argument ignores computational limitations of the consequentialists.)
Ah, okay, now I see how the reasoning helps them, but it seems like there’s a weak form of this and a strong form of this.
The weak form is just that the consequentialists, rather than considering all coherent world models, can make some anthropic-style arguments in order to restrict their attention to world models that could in theory sustain intelligent life. This plausibly gives them an edge over ‘agent-free’ SI, which naively spends measure on all possible world models, and that edge may cover the cost of specifying the universe that contains the consequentialists. [It seems unlikely that it covers the cost of the consequentialists having to guess where their output channel is, unless this is also a cost paid by the ‘agent-free’ hypothesis?]
The strong form relates to the computational limitations of the consequentialists that you bring up: it seems like they have to solve something halting-problem-like in order to determine that (given that you correctly guessed the outer universe’s dynamics) a particular random seed leads to agents running SI and giving it large amounts of control, and so you probably can’t do search on random seeds (especially not if you want to seriously threaten completeness). This seems like a potentially larger source of saved measure, and so it would matter more if the consequentialists didn’t have computational limitations. [But it seems like most ways of giving them more computational resources also make it harder for them to find their output channel; perhaps there’s a theorem here? It’s not obvious this is more fruitful than just using a speed prior instead.]