I’m imagining that the consequentialists care about something, like e.g. human flourishing. They think that they could use their control over the universal prior to achieve more of what they care about, i.e. by achieving a bunch of human flourishing in some other universe where someone thinks about the universal prior. Randomizing is one strategy available to them to do that.
So I’m saying that I expect they will do better—i.e. get more influence over the outside world (per unit of cost paid in their world)—than if they had simply randomized. That’s because randomizing is one of the strategies available to them and they are trying to pick the best one.
(In fact I think they will do many orders of magnitude better than randomizing since they can simultaneously win for many different output methods, and they can ignore the overwhelming majority of output rules which have no chance of describing something interesting about the world).
You seem to be saying that they will get less influence than if they randomized. Something about how this behavior is not sensible “goal-oriented behavior,” and instead the sensible goal-oriented behavior is something that doesn’t get them any influence? In what sense do you think it is sensible goal-oriented behavior, if it doesn’t result in getting any influence?
Maybe the key difference is that I’m talking about a scenario where the consequentialists have the goal of influencing the universal prior, and that possibility seems so weird to you that you aren’t even engaging with it?
It’s definitely not too weird a possibility for me. I’m trying to reason backwards here—the best strategy available to them can’t be effective in expectation at achieving whatever their goals are with the output tape, because of information-theoretic impossibilities, and therefore, any given strategy will be that bad or worse, including randomization.
To express my confusion more precisely: I feel like this story has run aground on an impossibility result. If a random variable’s value is unknowable (but its distribution is known) and an intelligent agent wants to act on its value, and they randomize their actions, the expected log probability of them acting on the true value cannot exceed minus the entropy of the distribution, no matter their intelligence.
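For concreteness, here is a minimal numerical illustration of the bound being invoked (a toy example of mine, not part of the original exchange): for any randomized strategy q over the possible values of the unknown, the expected log probability of acting on the true value, E_{x∼p}[log q(x)], is maximized by q = p, where it equals minus the entropy of p.

```python
import math

# Toy check of the bound: E_{x~p}[log2 q(x)] <= -H(p), with equality at q = p.
p = [0.5, 0.25, 0.125, 0.125]  # hypothetical true distribution over output rules
entropy = -sum(pi * math.log2(pi) for pi in p)

def expected_log_prob(q):
    """Expected log2-probability of acting on the true value when acting according to q."""
    return sum(pi * math.log2(qi) for pi, qi in zip(p, q))

candidates = {
    "q = p (match the true distribution)": p,
    "uniform randomization": [0.25] * 4,
    "all-in on the most likely rule": [0.97, 0.01, 0.01, 0.01],
}
print(f"-H(p) = {-entropy:.3f} bits")
for name, q in candidates.items():
    print(f"{name}: {expected_log_prob(q):.3f} bits")
```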
I think that’s right (other than the fact that they can win simultaneously for many different output rules, but I’m happy ignoring that for now). But I don’t see why it contradicts the story at all. In the story the best case is that we know the true distribution of output rules, and then we do the utility-maximizing thing, and that results in our sequence having way more probability than some random camera on old earth.
If you want to talk about the information theory, and ignore the fact that we can do multiple things, then we control the single output channel with maximal probability, while the camera is just some random output channel (presumably with some much smaller probability).
The information theory isn’t very helpful, because actually all of the action is about which output channels are controllable. If you restrict to some subset of “controllable” channels, and believe that any output rule that outputs the camera is controllable, then the conclusion still holds. So the only way it fails is when the camera is higher probability than the best controllable output channels.
I currently don’t understand the information-theoretic argument at all (and it feels like it must come down to some kind of miscommunication), so it seems easiest to talk about how the impossibility argument applies to the situation being discussed.
If we want to instead engage on the abstract argument, I think it would be helpful to me to have it presented as a series of steps that ends up saying “And that’s why the consequentialists can’t have any influence.” I think the key place I get lost is the connection between the math you are invoking and a conclusion about the influence that the consequentialists have.
If these consequentialists ascribed a value of 100 to the next output bit being 1, and a value of 0 to the next output bit being 0, and they valued nothing else, would you agree that all actions available to them have identical expected value under the distribution over Turing machines that I have described?
I don’t agree, but I may still misunderstand something. Stepping back to the beginning:
Suppose they know the sequence that actually gets fed to the camera. It is x= 010...011.
They want to make the next bit 1. That is, they want to maximize the probability of the sequence (x+1)=010...0111.
They have developed a plan for controlling an output channel to get it to output (x+1).
For concreteness imagine that they did this by somehow encoding x+1 in a sequence of ultra high-energy photons sent in a particular direction. Maybe they encode 1 as a photon with frequency A and a 0 as a photon with frequency B.
There is no way this plan results in the next bit being 0. If they are wrong about how the output channel decodes photons (i.e. it decodes A as 0 and B as 1), then that channel isn’t going to end up with any probability.
You don’t try to encode 010...0111 and then accidentally end up encoding 010...0110. You end up encoding something like 101...1000, or something totally different.
If you’re saying that they know their Turing machine has output x so far, then I 100% agree. What about in the case where they don’t know?

I don’t think I understand what you mean. Their goal is to increase the probability of the sequence x+1, so that someone who has observed the sequence x will predict 1.
What do you mean when you say “What about in the case where they don’t know”?
I agree that under your prior, someone has no way to increase e.g. the fraction of sequences in the universal prior that start with 1 (or the fraction of 1s in a typical sequence under the universal prior, or any other property that is antisymmetric under exchange of 0 and 1).
Okay, now suppose they want the first N bits of the output of their Turing machine to obey predicate P, and they assign that a value of 100, and they assign a value of 0 to any N-bit string that does not obey predicate P. And they don’t value anything else. If some actions have a higher value than other actions, what information about the output tape dynamics are they using, and how did they acquire it?
They are using their highest probability guess about the output channel, which will be higher probability than the output channel exactly matching some camera on old earth (but may still be very low probability). I still don’t understand the relevance.
I’m probably going to give up soon, but there was one hint about a possible miscommunication:
> Suppose they want the first N bits of the output of their Turing machine to obey predicate P, and they assign that a value of 100
They don’t care about “their” Turing machine, indeed they live in an infinite number of Turing machines that (among other things) output bits in different ways. They just care about the probability of the bitstring x+1 under the universal prior—they want to make the mass of x+1 larger than the mass of x+0. So they will behave in a way that causes some of the Turing machines containing them to output x+1.
And then the question is whether the total mass of Turing machines (i.e. probability of noise strings fed into the UTM) that they are able to get to output x+1 is larger or smaller than the mass of Turing machines that output x for the “intended” reason.
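One way to write the comparison in the previous paragraph (the notation here is mine): with $\ell(M)$ denoting the length of the noise string that programs the UTM, the question is whether

$$\sum_{\substack{M \text{ steered by the consequentialists}\\ \text{into outputting } x1}} 2^{-\ell(M)} \quad\gtrless\quad \sum_{\substack{M \text{ outputting } x\\ \text{for the “intended” reason}}} 2^{-\ell(M)}.$$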
> They are using their highest probability guess about the output channel, which will be higher probability than the output channel exactly matching some camera on old earth (but may still be very low probability). I still don’t understand the relevance.
I’m trying to find the simplest setting where we have a disagreement. We don’t need to think about cameras on earth quite yet. I understand the relevance isn’t immediate.
> They don’t care about “their” Turing machine, indeed they live in an infinite number of Turing machines that (among other things) output bits in different ways.
I think I see the distinction between the frameworks in which we most naturally think about the situation. I agree that they live in an infinite number of Turing machines, in the sense that their conscious patterns appear in many different Turing machines. All of these Turing machines have weight in some prior. When they change their behavior, they (potentially) change the outputs of any of these Turing machines. Taking these Turing machines as a set, weighted by those prior weights, we can consider the probability that the output obeys a predicate P.

The answer to this question can be arrived at through an equivalent process. Let the inhabitants imagine that there is a correct answer to the question “which Turing machine do I really live in?” They then reason anthropically about which Turing machines give rise to conscious experiences such as theirs, using the same prior over Turing machines that I described above, and they make the same calculation about the probability that “their” Turing machine outputs something that obeys the predicate P. So on the one hand, we could ask “what is the probability that the section of the universal prior which gives rise to these inhabitants produces an output that obeys predicate P?” Or we could equivalently ask “what probability does this inhabitant ascribe to ‘its’ Turing machine outputting a string that obeys predicate P?”
There are facts that I find much easier to incorporate when thinking in the latter framework, such as “a work tape inhabitant knows nothing about the behavior of its Turing machine’s output tape, except that it has relative simplicity given the world that it knows.” (If it believes that its conscious existence depends on its Turing machine never having output a bit that differs from a data stream in a base world, it will infer other things about its output tape, but you seem to disagree that it would make that assumption, and I’m fine to go along with that). (If the fact were much simpler—“a work tape inhabitant knows nothing about the behavior of its Turing machine’s output tape” full stop—I would feel fairly comfortable in either framework.)
If it is the case that, for any action that a work tape inhabitant takes, the following is unchanged: [the probability that it (anthropically) ascribes to “its” Turing machine printing an output that obeys predicate P after it takes that action], then, no matter its choice of action, the probability under the universal prior that the output obeys predicate P is also unchanged.
What if the work tape inhabitant only cares about the output when the universal prior is being used for important applications? Let Q be the predicate [P and “the sequence begins with a sequence which is indicative of an important application of the universal prior”]. The same logic that applies to P applies to Q. (It feels easier to talk about probabilities of predicates (expectations of Boolean functions) rather than expectations of general functions, but if we wanted to do importance weighting instead of using a strict predicate on importance, the logic is the same.)
Writing about the fact I described above about what the inhabitants believe about their Turing machine’s output has actually clarified my thinking a bit. Here’s a predicate where I think inhabitants could expect certain actions to make it more likely that their Turing machine output obeys that predicate. “The output contains the string [particular 1000 bit string]”. They believe that their world’s output is simple given their world’s dynamics, so if they write that 1000 bit string somewhere, it is more likely for the predicate to hold. (Simple manipulations of the string are nearly equally more likely to be output).
So there are severe restrictions on the precision with which they can control even low-probability changes to the output, but not total restrictions. So I wasn’t quite right in describing it as a max-entropy situation. But the one piece of information that distinguishes their situation from one of maximum uncertainty about the output is very slight. So I think it’s useful to try to think in terms of how they get from that information to their goal for the output tape.
I was describing the situation where I wanted to maximize the probability where the output of our world obeys the predicate: “this output causes decision-maker simulators to believe that virtue pays”. I think I could very slightly increase that probability by trying to reward virtuous people around me. Consider consequentialists who want to maximize the probability of the predicate “this output causes simulator-decision-makers to run code that recreates us in their world”. They want to make the internals of their world such that there are simple relative descriptions for outputs for which that predicate holds. I guess I think that approach offers extremely limited and imprecise ability to deliberately influence the output, no matter how smart you are.
If an approach has very limited success probability (i.e. very limited sway over the universal prior), they can focus all their effort on mimicking a few worlds, but then we’ll probably get lucky, and ours won’t be one of the ones they focus on.
From a separate recent comment,
> But now that we’ve learned that physics is the game of life, we can make much better guesses about how to build a dataset so that a TM could output it. For example, we can:
> Build the dataset at a large number of places.
> [etc.]
> ...
> I challenge you to find any plausible description of a rule that outputs the bits observed by a camera, for which I can’t describe a simpler extraction rule that would output some set of bits controlled by the sophisticated civilization.
You’re comparing the probability that one of these many controlled locations drives the output of the machine to the probability that a random camera on an earth-like Turing machine drives the output. Whereas it seems to me like the right question is to look at the absolute probability that one of these controlled locations drives the output. The reason is that what they attempt to output is a mixture over many sequences that a decision-maker-simulator might want to know about. But if the sequence we’re feeding in is from a camera on earth, then their antics only matter to the extent that their mixture puts weight on a random camera on earth. So they have to specify the random camera on an earth-like Turing machine too. They’re paying the same cost, minus any anthropic update. So the costs to compare are roughly [−log prob. of successful control of output + bits to specify camera on earth − bits saved from anthropic update] vs. [bits to specify camera on earth − bits saved from directly programmed anthropic update]. This framing seems to imply we can cross off [bits to specify camera on earth] from both sides.
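Rendering that comparison with explicit symbols (the abbreviations are mine): write $C_{\text{cam}}$ for the bits needed to specify the camera on earth, $S$ and $S_{\text{direct}}$ for the bits saved by the consequentialists’ anthropic update and by a directly programmed one, and $-\log p_{\text{ctrl}}$ for the cost of successfully controlling an output channel. The comparison is then

$$\bigl(-\log p_{\text{ctrl}}\bigr) + C_{\text{cam}} - S \quad\text{vs.}\quad C_{\text{cam}} - S_{\text{direct}},$$

and to the extent the two anthropic savings are comparable, $C_{\text{cam}}$ cancels from both sides.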
> bits to specify camera on earth − bits saved from anthropic update
I think the relevant number is just “log_2 of the number of predictions that the manipulators want to influence.” It seems tricky to think about this (rather small) number as the difference between two (giant) numbers.
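As I read this claim (my paraphrase): if the manipulators spread their mixture over $K$ prediction problems drawn from roughly the same “important predictions” distribution that picks out our camera, then the two giant terms cancel, and the residual cost of their mixture landing on our particular prediction is about $\log_2 K$ bits.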
> So they have to specify the random camera on an earth-like Turing machine too.
They are just looking at the earth-like Turing machine, looking for the inductors whose predictions are important, and then trying to copy those input sequences. This seems mostly unrelated to the complexity of adding states to the Turing machine so that it reads data from a particular location on a particular hard drive. It just rests on them being able to look at the simulation and figure out what’s going on.
On the other hand, the complexity of adding states to the Turing machine so that it reads data from a particular location on a particular hard drive seems very closely related to the complexity of adding states to the Turing machine so that it outputs data encoded by the sophisticated civilization in the format that they thought was easiest for the Turing machine to output.
> bits to specify camera on earth − bits saved from directly programmed anthropic update
Do you have some candidate “directly programmed anthropic update” in mind? (That said, my original claim was just about the universal prior, not about a modified version with an anthropic update)
I still feel like the quantitative question we’re discussing is a blow-out and it’s not clear to me where we are diverging on that. My main uncertainty about the broader question is about whether any sophisticated civilizations are motivated to do this kind of thing (which may depend on the nature of the inductor and how much reasoning they have time to do, since that determines whether the inductor’s prediction is connected in the decision-theoretically relevant way with the civilization’s decisions or commitments).
> Do you have some candidate “directly programmed anthropic update” in mind? (That said, my original claim was just about the universal prior, not about a modified version with an anthropic update)
I’m talking about the weight of an anthropically updated prior within the universal prior. I should have added “+ bits to encode anthropic update directly” to that side of the equation. That is, it takes some number of bits to encode “the universal prior, but conditioned on the strings being important to decision-makers in important worlds”. I don’t know how to encode this, but there is presumably a relatively simple direct encoding, since it’s a relatively simple concept. This is what I was talking about in my response to the section “The competition”.
One way that might be helpful for thinking about the bits saved from the anthropic update is that it is $-\log \Pr_{\text{string} \sim \text{universal prior}}(\text{string is important to decision-makers in important worlds})$. I think this gives us a handle for reasoning about anthropic savings as a self-contained object, even if it’s a big number.
> bits to specify camera on earth − bits saved from anthropic update
>
> I think the relevant number is just “log_2 of the number of predictions that the manipulators want to influence.” It seems tricky to think about this (rather small) number as the difference between two (giant) numbers.
But suppose they picked only one string to try to manipulate. The cost would go way down, but then it probably wouldn’t be us that they hit. If the log of the number of predictions that the manipulators want to influence is 7 bits smaller than [bits to specify camera on earth − bits saved from anthropic update], then there’s a 99% chance we’re okay. If different manipulators in different worlds are choosing differently, we can expect 1% of them to choose our world, and so we start worrying again, but we add the 7 bits back because it’s only 1% of them.
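Spelling out the arithmetic behind those figures (my gloss on the paragraph above): if the manipulators target $2^{k}$ predictions out of roughly $2^{k+7}$ comparably important ones, the chance that ours is among them is about

$$2^{-7} = \tfrac{1}{128} \approx 1\%,$$

which is the “99% chance we’re okay.” Conversely, if each world’s manipulators choose independently, about 1% of them target our world, which is exactly the factor of $2^{7}$ that gets added back.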
So let’s consider two Turing machines. Each row will have a cost in bits.

| A | B |
| --- | --- |
| Consequentialists emerge, | Directly programmed anthropic update. |
| make good guesses about controllable output, | |
| decide to output anthropically updated prior. | |
| Weight of earth-camera within anthropically updated prior | Weight of earth-camera within anthropically updated prior |
The last point can be decomposed into [description length of camera in our world − anthropic savings], but it doesn’t matter; it appears in both options.
I don’t think this is what you have in mind, but I’ll add another case, in case this is what you meant by “They are just looking at the earth-like Turing machine”. Maybe, just skip this though.
| A | B |
| --- | --- |
| Consequentialists emerge in a world like ours, | Directly programmed anthropic update. |
| make good guesses about controllable output, | |
| output (strong) anthropically updated prior. | |
| Weight of earth-camera in strong anthropic update | Weight of earth-camera in normal anthropic update |
They can make a stronger anthropic update by using information about their world, but the savings will be equal to the extra cost of specifying that the consequentialists are in a world like ours. This is basically the case I mentioned above where different manipulators choose different sets of worlds to try to influence, but then the set of manipulators that choose our world has smaller weight.
------ end potential skip
What I think it boils down to is the question:
Is the anthropically updated version of the universal prior most simply described as “the universal prior, but conditioned on the strings being important to decision-makers in important worlds” or “that thing consequentialists sometimes output”? (And consequentialists themselves may be more simply described as “those things that often emerge”). “Sometimes” is of course doing a lot of work, and it will take bits to specify which “sometimes” we are talking about. If the latter is more simple, then we might expect the natural continuation of those sequences to usually contain treacherous turns, and if the former is more simple, then we wouldn’t. This is why I don’t think the weight of an earth-camera in the universal prior ever comes into it.
But/so I don’t understand if I’m missing the point of a couple paragraphs of your comment—the one which starts “They are just looking at the earth-like Turing machine”, and the next paragraph, which I agree with.
Here’s my current understanding of your position:

1. The easiest way to specify an important prediction problem (in the sense of a prediction that would be valuable for someone to influence) is likely to be by saying “Run the following Turing machine, then pick an important decision from within it.” Let’s say the complexity of that specification is N bits.
2. You think that if consequentialists dedicate some fraction of their resources to doing something that’s easy for the universal prior to output, it will still likely take more than N bits or not much less.
3. [Probably] You think the differences may be small enough that they can be influenced by factors of 1/1000 or 1/billion (i.e. 10-30 bits) of improbability of consequentialists spending significant resources on this task.
4. [Probably] You think the TM-definition update (where the manipulators get to focus on inductors who put high probability on their own universe) or the philosophical sophistication update (where manipulators use the “right” prior over possible worlds rather than choosing some programming language) are small relative to these other considerations.
I think the biggest disagreement is about 1+2. It feels implausible to me that “sample a data stream that is being used by someone to make predictions that would be valuable to manipulate” is simpler than any of the other extraction procedures that consequentialists could manipulate (like sample the sequence that appears the most times, sample the highest energy experiments, sample the weirdest thing on some other axis...)
> But suppose they picked only one string to try to manipulate. The cost would go way down, but then it probably wouldn’t be us that they hit.
I think we’re probably on the same page now, but I’d say: the consequentialists can also sample from the “important predictions” prior (i.e. the same thing as that fragment of the universal prior). If “sample output channel controlled by consequentialists” has higher probability than “Sample an important prediction,” then the consequentialists control every important prediction. If on the other hand “Sample an important prediction” has higher probability than the consequentialists, I guess maybe they could take over a few predictions, but unless they were super close it would be a tiny fraction and I agree we wouldn’t care.
Yeah, seems about right.

I think with 4, I’ve been assuming for the sake of argument that manipulators get free access to the right prior, and I don’t have a strong stance on the question, but it’s not complicated for a directly programmed anthropic update to be built on that right prior too.
I guess I can give some estimates for how many bits I think are required for each of the rows in the table. I’ll give a point estimate, and a range for a 50% confidence interval for what my point estimate would be if I thought about it for an hour by myself and had to write up my thinking along the way.
I don’t have a good sense for how many bits it takes to get past things that are just extremely basic, like an empty string, or an infinite string of 0s. But whatever that number is, add it to both 1 and 6.
1) Consequentialists emerge, 10-50 bits; point estimate 18
2) TM output has not yet begun, 10-30 bits; point estimate 18
3) make good guesses about controllable output, 18-150 bits; point estimate 40
4) decide to output anthropically updated prior, 8-35 bits; point estimate 15
5) decide to do a treacherous turn, 1-12 bits; point estimate 5
vs. 6) direct program for anthropic update, 18-100 bits; point estimate 30

The ranges are fairly correlated.
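To make the totals explicit, here is a minimal tally of the point estimates above (the arithmetic is mine; the row labels are paraphrases, and the shared “get past the extremely basic strings” overhead is ignored since it is added to both sides):

```python
# Tally the point estimates for the two routes in the table and convert the
# gap in bits to a ratio of prior weights (2^gap).

consequentialist_route = {
    "1) consequentialists emerge": 18,
    "2) TM output has not yet begun": 18,
    "3) good guesses about controllable output": 40,
    "4) output anthropically updated prior": 15,
    "5) treacherous turn": 5,
}
direct_route = {"6) direct program for anthropic update": 30}

total_a = sum(consequentialist_route.values())  # 96 bits
total_b = sum(direct_route.values())            # 30 bits
gap = total_a - total_b                         # 66 bits

print(f"route A: {total_a} bits, route B: {total_b} bits, gap: {gap} bits")
print(f"under these point estimates, B outweighs A by about 2^{gap} = {2.0**gap:.2e}")
```

Under these point estimates the directly programmed anthropic update dominates the consequentialist route by a wide margin; the back-and-forth below is mostly about whether the individual rows (especially 3, 4+5, and 6) are estimated fairly.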
By (3) do you mean the same thing as “Simplest output channel that is controllable by advanced civilization with modest resources”?
I assume (6) means that your “anthropic update” scans across possible universes to find those that contain important decisions you might want to influence?
If you want to compare most easily to models like that, then instead of using (1)+(2)+(3) you should compare to (6′) = “Simplest program that scans across many possible worlds to find those that contain some pattern that can be engineered by consequentialists trying to influence prior.”
Then the comparison is between specifying “important predictor to influence” and whatever the easiest-to-specify pattern that can be engineered by a consequentialist. It feels extremely likely to me that the second category is easier, indeed it’s kind of hard for me to see any version of (6) that doesn’t have an obviously simpler analog that could be engineered by a sophisticated civilization.
With respect to (4)+(5), I guess you are saying that your point estimate is that only 1/million of consequentialists decide to try to influence the universal prior. I find that surprisingly low but not totally indefensible, and it depends on exactly how expensive this kind of influence is. I also don’t really see why you are splitting them apart, shouldn’t we just combine them into “wants to influence predictors”? If you’re doing that presumably you’d both use the anthropic prior and then the treacherous turn.
But it’s also worth noting that (6′) gets to largely skip (4′) if it can search for some feature that is mostly brought about deliberately by consequentialists (who are trying to create a beacon recognizable by some program that scans across possible worlds looking for it, doing the same thing that “predictor that influences the future” is doing in (6)).
> I assume (6) means that your “anthropic update” scans across possible universes to find those that contain important decisions you might want to influence?
Yes, and then outputs strings from that set with probability proportional to their weight in the universal prior.
> By (3) do you mean the same thing as “Simplest output channel that is controllable by advanced civilization with modest resources”?
I would say “successfully controlled” instead of controllable, although that may be what you meant by the term. (I decomposed this as controllable + making good guesses.) For some definitions of controllable, I might have given a point estimate of maybe 1 or 5 bits. But there has to be an output channel for which the way you transmit a bitstring out is the way the evolved consequentialists expect. Recasting it in these terms implicitly suggests that the specification of the output channel can take on some of the character of (6′), which makes me want to bring my range down to 15-60; point estimate 25.
> instead of using (1)+(2)+(3) you should compare to (6′) = “Simplest program that scans across many possible worlds to find those that contain some pattern that can be engineered by consequentialists trying to influence prior.”
Similarly, I would replace “can be” with “seems to have been”. And just to make sure we’re talking about the same thing, it takes this list of patterns, and outputs them with probability proportional to their weight in the universal prior.
Yeah, this seems like it would make some significant savings compared to (1)+(2)+(3). I think replacing parts of the story from being specified as [arising from natural world dynamics] to being specified as [picked out “deliberately” by a program] generally leads to savings.
> Then the comparison is between specifying “important predictor to influence” and whatever the easiest-to-specify pattern that can be engineered by a consequentialist. It feels extremely likely to me that the second category is easier, indeed it’s kind of hard for me to see any version of (6) that doesn’t have an obviously simpler analog that could be engineered by a sophisticated civilization.
I don’t quite understand the sense in which [worlds with consequentialist beacons/geoglyphs] can be described as [easiest-to-specify controllable pattern]. (And if you accept the change of “can be” to “seems to have been”, it propagates here). Scanning for important predictors to influence does feel very similar to me to scanning for consequentialist beacons, especially since the important worlds are plausibly the ones with consequentialists.
There’s a bit more work to be done in (6′) besides just scanning for consequentialist beacons. If the output channel is selected “conveniently” for the consequentialists (since the program is looking for the beacons, rather than the consequentialists giving it their best guess(es) and putting up a bunch of beacons), there has to be some part of the program which aggregates the information from multiple beacons (e.g. by searching for coherence), or else determines which beacon takes precedence, and then also determines how to interpret their physical signature as a bitstring.
Tangent: in heading down a path trying to compare [scan for “important to influence”] vs. [scan for “consequentialist attempted output messages”] just now, my first attempt had an error, so I’ll point it out. It’s not necessarily harder to specify “scan for X” than “scan for Y” when X is a subset of Y. For instance “scan for primes” is probably simpler than “scan for numbers with less than 6 factors”.
Maybe clarifying or recasting the language around “easiest-to-specify controllable pattern” will clear this up, but can you explain more why it feels to you that [scan for “consequentialists’ attempted output messages”] is so much simpler than [scan for “important-to-influence data streams”]? My very preliminary first take is that they are within 8-15 bits.
> I also don’t really see why you are splitting them [(4) + (5)] apart, shouldn’t we just combine them into “wants to influence predictors”? If you’re doing that presumably you’d both use the anthropic prior and then the treacherous turn.
I split them in part in case there is a contingent of consequentialists who believe that outputting the right bitstring is key to their continued existence, believing that they stop being simulated if they output the wrong bit. I haven’t responded to your claim that this would be faulty metaphysics on their part; it still seems fairly tangential to our main discussion. But you can interpret my 5 bit point estimate for (5) as claiming that 31 times out of 32 that a civilization of consequentialists tries to influence their world’s output, it is in an attempt to survive. Tell me if you’re interested in a longer justification that responds to your original “line by line comments” comment.
Just look at the prior—for any set of instructions for the work tape heads of the Turing machine, flipping the “write-1” instructions of the output tape with the “write-0” instructions gives an equally probable Turing machine.
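Stating that symmetry explicitly (my formalization of the sentence above): for any machine $M$ of program length $\ell(M)$ under the prior being described, let $\bar{M}$ be the machine with the output tape’s “write-1” and “write-0” instructions exchanged. Then $\ell(\bar{M}) = \ell(M)$ and $\bar{M}$ outputs the bitwise complement of whatever $M$ outputs, so for any prefix $s$,

$$\Pr(\text{output begins with } s) = \sum_{M \text{ outputs } s\cdots} 2^{-\ell(M)} = \sum_{\bar{M} \text{ outputs } \bar{s}\cdots} 2^{-\ell(\bar{M})} = \Pr(\text{output begins with } \bar{s}),$$

which is why no one can shift any property that is antisymmetric under exchanging 0s and 1s.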
I basically agree that if the civilization has a really good grasp of the situation, and in particular has no subjective uncertainty (merely uncertainty over which particular TM they are), then they can do even better by just focusing their effort on the single best set of channels rather than randomizing.
(Randomization is still relevant for reducing the cost to them though.)
With randomization, you reduce the cost and the upside in concert. If a pair of shoes costs $100, and that’s more than I’m willing to pay, I could buy the shoes with probability 1%, and it will only cost me $1 in expectation, but I will only get the shoes with probability 1⁄100.
I agree that randomization reduces the “upside” in the sense of “reducing our weight in the universal prior.” But utility is not linear in that weight.
I’m saying that the consequentialists completely dominate the universal prior, and they will still completely dominate if you reduce their weight by 2x. So either way they get all the influence. (Quantitatively, suppose the consequentialists currently have probability 1000 times greater than the intended model. Then they have 99.9% of the posterior. If they decreased their probability of acting by two, then they’d have 500 times the probability of the intended model, and so have 99.8% of the posterior. This is almost as good as 99.9%.)
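A quick check of the numbers in that parenthetical (my toy calculation; it lumps everything other than the intended model into the consequentialists’ weight):

```python
# If the consequentialist models carry k times the prior weight of the intended
# model, their posterior share is k / (k + 1); halving k barely changes it.

def posterior_share(weight_ratio):
    """Posterior mass of the consequentialist models relative to one intended model."""
    return weight_ratio / (weight_ratio + 1.0)

for k in (1000, 500):
    print(f"weight ratio {k}: posterior share {posterior_share(k):.3%}")
# weight ratio 1000: 99.900%
# weight ratio  500: 99.800%
```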
That could fail if e.g. there are a bunch of other consequentialists also trying to control the sequence. Or if some other model beyond the intended one has much higher probability. But if you think that the consequentialists are X bits simpler than the intended model, and you are trying to argue that the intended model dominates the posterior, then you need to argue that the consequentialists wouldn’t try to grab the universal prior even when doing so only requires acting in $2^{-X}$ of worlds.
Someone in the basement universe is reasoning about the output of a randomized Turing machine that I’m running on.
I care about what they believe about that Turing machine. Namely, I want them to believe that most of the time when the sequence x appears, it is followed by a 1.
Their beliefs depend in a linear way on my probabilities of action.
(At least if e.g. I committed to that policy at an early enough time for them to reason about it, or if my policy is sufficiently predictable to be correlated with their predictions, or if they are able to actually simulate me in a universe with reflective oracles… If I’m not able to influence their beliefs about me, then of course I can’t influence their beliefs about anything and the whole manipulative project doesn’t get off the ground.)
But my utility is a non-linear function of their beliefs, since P(1|x) is a non-linear function of their beliefs.
So my utility is a non-linear function of my policy.
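A sketch of that non-linearity (the numbers and the two-model setup are mine, purely for illustration): suppose the “intended” machines, with total weight w_intended, reproduce x and then emit 1 with probability p0, while the machines containing the consequentialists, with total weight w_consq, reproduce x followed by a 1 only when the civilization acts (which it does with probability q) and output something unrelated otherwise. Then P(1|x) is a ratio of terms linear in q, so it is a non-linear (and sharply diminishing-returns) function of the policy.

```python
# Toy illustration: the predictor's P(1|x) as a function of the consequentialists'
# probability of acting, q. It is a ratio of affine functions of q, not linear.

def p1_given_x(q, w_intended=1.0, w_consq=1000.0, p0=0.5):
    p_x = w_intended + w_consq * q        # prior mass on outputs beginning with x
    p_x1 = w_intended * p0 + w_consq * q  # prior mass on outputs beginning with x, then 1
    return p_x1 / p_x

for q in (0.0, 0.25, 0.5, 1.0):
    print(f"q = {q:.2f}  ->  P(1|x) = {p1_given_x(q):.4f}")
```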
I’m imagining that the consequentialists care about something, like e.g. human flourishing. They think that they could use their control over the universal prior to achieve more of what they care about, i.e. by achieving a bunch of human flourishing in some other universe where someone thinks about the universal prior. Randomizing is one strategy available to them to do that.
So I’m saying that I expect they will do better—i.e. get more influence over the outside world (per unit of cost paid in their world)---than if they had simply randomized. That’s because randomizing is one of the strategies available to them and they are trying to pick the best one.
(In fact I think they will do many orders of magnitude better than randomizing since they can simultaneously win for many different output methods, and they can ignore the overwhelming majority of output rules which have no chance of describing something interesting about the world).
You seem to be saying that they will get less influence than if they randomized. Something about how this behavior is not sensible “goal-oriented behavior,” and instead the sensible goal-oriented behavior is something that doesn’t get them any influence? In what sense do you think it is sensible goal-oriented behavior, if it doesn’t result in getting any influence?
Maybe the key difference is that I’m talking about a scenario where the consequentialists have the goal of influencing the universal prior, and that possibility seems so weird to you that you aren’t even engaging with it?
It’s definitely not too weird a possibility for me. I’m trying to reason backwards here—the best strategy available to them can’t be effective in expectation at achieving whatever their goals are with the output tape, because of information-theoretic impossibilities, and therefore, any given strategy will be that bad or worse, including randomization.
To express my confusion more precisely:
I think that’s right (other than the fact that they can win simultaneously for many different output rules, but I’m happy ignoring that for now). But I don’t see why it contradicts the story at all. In the story the best case is that we know the true distribution of output rules, and then we do the utility-maximizing thing, and that results in our sequence having way more probability than some random camera on old earth.
If you want to talk about the information theory, and ignore the fact that we can do multiple things, then we control the single output channel with maximal probability, while the camera is just some random output channel (presumably with some much smaller probability).
The information theory isn’t very helpful, because actually all of the action is about which output channels are controllable. If you restrict to some subset of “controllable” channels, and believe that any output rule that outputs the camera is controllable, then the conclusion still holds. So the only way it fails is when the camera is higher probability than the best controllable output channels.
I currently don’t understand the information-theoretic argument at all (and feels like it must come down to some kind of miscommunication), so it seems easiest to talk about how the impossibility argument applies to the situation being discussed.
If we want to instead engage on the abstract argument, I think it would be helpful to me to present it as a series of steps that ends up saying “And that’s why the consequentialists can’t have any influence.” I think the key place I get lost is the connection between the math you are saying and a conclusion about the influence that the consequentialists have.
If these consequentialists ascribed a value of 100 to the next output bit being 1, and a value of 0 to the next output bit being 0, and they valued nothing else, would you agree that all actions available to them have identical expected value under the distribution over Turing machines that I have described?
I don’t agree, but I may still misunderstand something. Stepping back to the beginning:
Suppose they know the sequence that actually gets fed to the camera. It is x= 010...011.
They want to make the next bit 1. That is, they want to maximize the probability of the sequence (x+1)=010...0111.
They have developed a plan for controlling an output channel to get it to output (x+1).
For concreteness imagine that they did this by somehow encoding x+1 in a sequence of ultra high-energy photons sent in a particular direction. Maybe they encode 1 as a photon with frequency A and a 0 as a photon with frequency B.
There is no way this plan results in the next bit being 0. If they are wrong about how the output channel encodes photons (i.e. it decodes A as 1 and B as 0) then that channel isn’t going to end up with any probability.
You don’t try to encode 010...0111 and then accidentally end up encoding 010...0110. You end up encoding something like 101...1000, or something totally different.
If you’re saying that they know their Turing machine has output x so far, then I 100% agree. What about in the case where they don’t know?
I don’t think I understand what you mean. Their goal is to increase the probability of the sequence x+1, so that someone who has observed the sequence x will predict 1.
What do you mean when you say “What about in the case where they don’t know”?
I agree that under your prior, someone has no way to increase e.g. the fraction of sequences in the universal prior that start with 1 (or the fraction of 1s in a typical sequence under the universal prior, or any other property that is antisymmetric under exchange of 0 and 1).
Okay, now suppose they want the first N bits of the output of their Turing machine to obey predicate P, and they assign that a value of 100, and a they assign a value of 0 to any N-bit string that does not obey predicate P. And they don’t value anything else. If some actions have a higher value than other actions, what information about the output tape dynamics are they using, and how did they acquire it?
They are using their highest probability guess about the output channel, which will be higher probability than the output channel exactly matching some camera on old earth (but may still be very low probability). I still don’t understand the relevance.
I’m probably going to give up soon, but there was one hint about a possible miscommunication:
They don’t care about “their” Turing machine, indeed they live in an infinite number of Turing machines that (among other things) output bits in different ways. They just care about the probability of the bitstring x+1 under the universal prior—they want to make the mass of x+1 larger than the mass of x+0. So they will behave in a way that causes some of the Turing machines containing them to output x+1.
And then the question is whether the total mass of Turing machines (i.e. probability of noise strings fed into the UTM) that they are able to get to output x+1 is larger or smaller than the mass of Turing machines that output x for the “intended” reason.
I’m trying to find the simplest setting where we have a disagreement. We don’t need to think about cameras on earth quite yet. I understand the relevance isn’t immediate.
I think I see the distinction between the frameworks we most naturally think about the situation. I agree that they live in an infinite number of Turing machines, in the sense that their conscious patterns appear in many different Turing machines. All of these Turing machines have weight in some prior. When they change their behavior, they (potentially) change the outputs of any of these Turing machines. Taking these Turing machines as a set, weighted by those prior weights we can consider the probability that the output obeys a predicate P. The answer to this question can be arrived at through an equivalent process. Let the inhabitants imagine that there is a correct answer to the question “which Turing machine do I really live in?” They then reason anthropically about which Turing machines give rise to such conscious experiences as theirs. They then use the same prior over Turing machines that I described above. And then they make the same calculation about the probability that “their” Turing machine outputs something that obeys the predicate P. So on the one hand, we could say that we are asking “what is the probability that the section of the universal prior which gives rise to these inhabitants outputs an output that obeys predicate P?” Or we could equivalently ask “what is the probability that this inhabitant ascribes to ‘its’ Turing machine outputting a string that obeys predicate P?”
There are facts that I find much easier to incorporate when thinking in the latter framework, such as “a work tape inhabitant knows nothing about the behavior of its Turing machine’s output tape, except that it has relative simplicity given the world that it knows.” (If it believes that its conscious existence depends on its Turing machine never having output a bit that differs from a data stream in a base world, it will infer other things about its output tape, but you seem to disagree that it would make that assumption, and I’m fine to go along with that). (If the fact were much simpler—“a work tape inhabitant knows nothing about the behavior of its Turing machine’s output tape” full stop—I would feel fairly comfortable in either framework.)
If it is the case that, for any action that a work tape inhabitant takes, the following is unchanged: [the probability that it (anthropically) ascribes to “its” Turing machine printing an output that obeys predicate P after it takes that action], then, no matter its choice of action, then the probability under the universal prior that the output obeys predicate P is also unchanged.
What if the work tape inhabitant only cares about the output when the the universal prior is being used for important applications? Let Q be the predicate [P and “the sequence begins with a sequence which is indicative of important application of the universal prior”]. The same logic that applies to P applies to Q. (It feels easier to talk about probabilities of predicates (expectations of Boolean functions) rather than expectations of general functions, but if we wanted to do importance weighting instead of using a strict predicate on importance, the logic is the same).
Writing about the fact I described above about what the inhabitants believe about their Turing machine’s output has actually clarified my thinking a bit. Here’s a predicate where I think inhabitants could expect certain actions to make it more likely that their Turing machine output obeys that predicate. “The output contains the string [particular 1000 bit string]”. They believe that their world’s output is simple given their world’s dynamics, so if they write that 1000 bit string somewhere, it is more likely for the predicate to hold. (Simple manipulations of the string are nearly equally more likely to be output).
So there are severe restrictions on the precision with which they can control even low-probability changes to the output, but not total restrictions. So I wasn’t quite right in describing it as a max-entropy situation. But the one piece of information that distinguishes their situation from one of maximum uncertainty about the output is very slight. So I think it’s useful to try to think in terms of how they get from that information to their goal for the output tape.
I was describing the situation where I wanted to maximize the probability where the output of our world obeys the predicate: “this output causes decision-maker simulators to believe that virtue pays”. I think I could very slightly increase that probability by trying to reward virtuous people around me. Consider consequentialists who want to maximize the probability of the predicate “this output causes simulator-decision-makers to run code that recreates us in their world”. They want to make the internals of their world such that there are simple relative descriptions for outputs for which that predicate holds. I guess I think that approach offers extremely limited and imprecise ability to deliberately influence the output, no matter how smart you are.
If an approach has very limited success probability, (i.e. very limited sway over the universal prior), they can focus all their effort on mimicking a few worlds, but then we’ll probably get lucky, and ours won’t be one of the ones they focus on.
From a separate recent comment,
You’re comparing the probability of one of these many controlled locations driving the output of the machine to the probability that a random camera does on an earth-like Turing machine drives the output. Whereas it seems to me like the right question is to look at the absolute probabilities that one of these controlled locations drives the output. The reason is that what they attempt to output is a mixture over many sequences that a decision-maker-simulator might want to know about. But if the sequence we’re feeding in is from a camera on earth, than their antics only matter to the extent that their mixture puts weight on a random camera on earth. So they have to specify the random camera on an earth-like Turing machine too. They’re paying the same cost, minus any anthropic update. So the costs to compare are roughly [- log prob. successful control of output + bits to specify camera on earth—bits saved from anthropic update] vs. [bits to specify camera on earth—bits saved from directly programmed anthropic update]. This framing seems to imply we can cross off [bits to specify camera on earth] from both sides.
I think the relevant number is just “log_2 of the number of predictions that the manipulators want to influence.” It seems tricky to think about this (rather small) number as the difference between two (giant) numbers.
They are just looking at the earth-like Turing machine, looking for the inductors whose predictions are important, and then trying to copy those input sequences. This seems mostly unrelated to the complexity of adding states to the Turing machine so that it reads data from a particular location on a particular hard drive. It just rests on them being able to look at the simulation and figure out what’s going on.
On the other hand, the complexity of adding states to the Turing machine so that it reads data from a particular location on a particular hard drive seems very closely related to the complexity of adding states to the Turing machine so that it outputs data encoded by the sophisticated civilization in the format that they thought was easiest for the Turing machine to output.
Do you have some candidate “directly programmed anthropic update” in mind? (That said, my original claim was just about the universal prior, not about a modified version with an anthropic update)
I still feel like the quantitative question we’re discussing is a blow-out and it’s not clear to me where we are diverging on that. My main uncertainty about the broader question is about whether any sophisticated civilizations are motivated to do this kind of thing (which may depend on the nature of the inductor and how much reasoning they have time to do, since that determines whether the inductor’s prediction is connected in the decision-theoretically relevant way with the civilization’s decisions or commitments).
I’m talking about the weight of an anthropically updated prior within the universal prior. I should have added “+ bits to encode anthropic update directly” to that side of the equation. That is, it takes some number of bits to encode “the universal prior, but conditioned on the strings being important to decision-makers in important worlds”. I don’t know how to encode this, but there is presumably a relatively simple direct encoding, since it’s a relatively simple concept. This is what I was talking about in my response to the section “The competition”.
One way that might be helpful about thinking about the bits saved from the anthropic update is that it is −logprobstring∼universal prior(string is important to decision-makers in important worlds). I think this gives us a handle in reasoning about anthropic savings as a self-contained object, even if it’s a big number.
But suppose they picked only one string to try to manipulate. The cost would go way down, but then it probably wouldn’t be us that they hit. If log of the number of predictions that the manipulators want to influence is 7 bits shorter than [bits to specify camera on earth—bits saved from anthropic update], then there’s a 99% chance we’re okay. If different manipulators in different worlds are choosing differently, we can expect 1% of them to choose our world, and so we start worrying again, but we add the 7 bits back because it’s only 1% of them.
So let’s consider two Turing machines. Each row will have a cost in bits.
A B
Consequentialists emerge, Directly programmed anthropic update.
make good guesses about controllable output,
decide to output anthropically updated prior.
Weight of earth-camera within anthropically updated prior
The last point can be decomposed into [description length of camera in our world—anthropic savings], but it doesn’t matter; it appears in both options.
I don’t think this is what you have in mind, but I’ll add another case, in case this is what you meant by “They are just looking at the earth-like Turing machine”. Maybe, just skip this though.
A B
Consq-alists emerge in a world like ours, Directly prog. anthropic update.
make good guesses about controllable output,
output (strong) anth. updated prior.
Weight of earth-camera in strong anth. update … in normal anth. update
They can make a stronger anthropic update by using information about their world, but the savings will be equal to the extra cost of specifying that the consequentialists are in a world like ours. This is basically the case I mentioned above where different manipulators choose different sets of worlds to try to influence, but then the set of manipulators that choose our world has smaller weight.
------ end potential skip
What I think it boils down to is the question:
Is the anthropically updated version of the universal prior most simply described as “the universal prior, but conditioned on the strings being important to decision-makers in important worlds” or “that thing consequentialists sometimes output”? (And consequentialists themselves may be more simply described as “those things that often emerge”). “Sometimes” is of course doing a lot of work, and it will take bits to specify which “sometimes” we are talking about. If the latter is more simple, then we might expect the natural continuation of those sequences to usually contain treacherous turns, and if the former is more simple, then we wouldn’t. This is why I don’t think the weight of an earth-camera in the universal prior ever comes into it.
But/so I don’t understand if I’m missing the point of a couple paragraphs of your comment—the one which starts “They are just looking at the earth-like Turing machine”, and the next paragraph, which I agree with.
Here’s my current understanding of your position:
The easiest way to specify an important prediction problem (in the sense of a prediction that would be valuable for someone to influence) is likely to be by saying “Run the following Turing machine, then pick an important decision from within it.” Let’s say the complexity of that specification is N bits.
You think that if consequentialists dedicate some fraction of their resources to doing something that’s easy for the universal prior to output, it will still likely take more than N bits or not much less.
[Probably] You think the differences may be small enough that they can be influenced by factors of 1/1000 or 1/billion (i.e. 10-30 bits) of improbability of consequentialists spending significant resources in this task.
[Probably] You think the TM-definition update (where the manipulators get to focus on inductors who put high probability on their own universe) or the philosophical sophistication update (where manipulators use the “right” prior over possible worlds rather than choosing some programming language) are small relative to these other considerations.
I think the biggest disagreement is about 1+2. It feels implausible to me that “sample a data stream that is being used by someone to make predictions that would be valuable to manipulate” is simpler than any of the other extraction procedures that consequentialists could manipulate (like sample the sequence that appears the most times, sample the highest energy experiments, sample the weirdest thing on some other axis...)
I think we’re probably on the same page now, but I’d say: the consequentialists can also sample from the “important predictions” prior (i.e. the same thing as that fragment of the universal prior). If “sample output channel controlled by consequentialists” has higher probability than “Sample an important prediction,” then the consequentialists control every important prediction. If on the other hand “Sample an important prediction” has higher probability than the consequentialists, I guess maybe they could take over a few predictions, but unless they were super close it would be a tiny fraction and I agree we wouldn’t care.
Yeah, seems about right.
I think with 4, I’ve been assuming for the sake of argument that manipulators get free access to the right prior, and I don’t have a strong stance on the question, but it’s not complicated for a directly programmed anthropic update to be built on that right prior too.
I guess I can give some estimates for how many bits I think are required for each of the rows in the table. I’ll give a point estimate, and a range for a 50% confidence interval for what my point estimate would be if I thought about it for an hour by myself and had to write up my thinking along the way.
I don’t have a good sense for how many bits it takes to get past things that are just extremely basic, like an empty string, or an infinite string of 0s. But whatever that number is, add it to both 1 and 6.
1) Consequentialists emerge, 10 − 50 bits; point estimate 18
2) TM output has not yet begun, 10 − 30 bits; point estimate 18
3) make good guesses about controllable output, 18 − 150 bits; point estimate 40
4) decide to output anthropically updated prior, 8 − 35 bits; point estimate 15
5) decide to do a treacherous turn. 1 − 12 bits; point estimate 5
vs. 6) direct program for anthropic update. 18-100 bits; point estimate 30
The ranges are fairly correlated.
By (3) do you mean the same thing as “Simplest output channel that is controllable by advanced civilization with modest resources”?
I assume (6) means that your “anthropic update” scans across possible universes to find those that contain important decisions you might want to influence?
If you want to compare most easily to models like that, then instead of using (1)+(2)+(3) you should compare to (6′) = “Simplest program that scans across many possible worlds to find those that contain some pattern that can be engineered by consequentialists trying to influence the prior.”
Then the comparison is between specifying “important predictor to influence” and specifying whatever pattern is easiest for a consequentialist to engineer. It feels extremely likely to me that the second category is easier; indeed, it’s kind of hard for me to see any version of (6) that doesn’t have an obviously simpler analog that could be engineered by a sophisticated civilization.
With respect to (4)+(5), I guess you are saying that your point estimate is that only about 1/million of consequentialists ((4)+(5) = 20 bits) decide to try to influence the universal prior. I find that surprisingly low but not totally indefensible, and it depends on exactly how expensive this kind of influence is. I also don’t really see why you are splitting them apart; shouldn’t we just combine them into “wants to influence predictors”? If you’re doing that, presumably you’d both use the anthropic prior and then do the treacherous turn.
But it’s also worth noting that (6′) gets to largely skip (4′), i.e. the analog of (4), if it can search for some feature that is mostly brought about deliberately by consequentialists (who are trying to create a beacon recognizable by some program that scans across possible worlds looking for it, playing the same role that “predictor that influences the future” plays in (6)).
Yes, and then outputs strings from that set with probability proportional to their weight in the universal prior.
I would say “successfully controlled” instead of “controllable,” although that may be what you meant by the term. (I decomposed this as controllable + making good guesses.) For some definitions of controllable, I might have given a point estimate of maybe 1 or 5 bits. But there has to be an output channel for which the way you transmit a bitstring out is the way the evolved consequentialists expect. But recasting it in these terms implicitly suggests that the specification of the output channel can take on some of the character of (6′), which makes me want to bring my range down to 15–60, with a point estimate of 25.
Similarly, I would replace “can be” with “seems to have been”. And just to make sure we’re talking about the same thing, it takes this list of patterns, and outputs them with probability proportional to their weight in the universal prior.
Yeah, this seems like it would yield some significant savings compared to (1)+(2)+(3). I think changing parts of the story from being specified as [arising from natural world dynamics] to being specified as [picked out “deliberately” by a program] generally leads to savings.
I don’t quite understand the sense in which [worlds with consequentialist beacons/geoglyphs] can be described as [easiest-to-specify controllable pattern]. (And if you accept the change of “can be” to “seems to have been,” it propagates here.) Scanning for important predictors to influence does feel very similar to me to scanning for consequentialist beacons, especially since the important worlds are plausibly the ones with consequentialists.
There’s a bit more work to be done in (6′) besides just scanning for consequentialist beacons. If the output channel is selected “conveniently” for the consequentialists (the program goes looking for the beacons, rather than the consequentialists giving it their best guess(es) and putting up a bunch of beacons), then some part of the program has to aggregate the information from multiple beacons (by searching for coherence, say), or else determine which beacon takes precedence, and then also determine how to interpret their physical signature as a bitstring.
Tangent: in heading down a path just now trying to compare [scan for “important to influence”] vs. [scan for “consequentialist attempted output messages”], my first attempt had an error, so I’ll point it out. It’s not necessarily harder to specify “scan for X” than “scan for Y” when X is a subset of Y. For instance, “scan for primes” is probably simpler than “scan for numbers with fewer than 6 factors.”
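To make the tangent concrete, here is a toy illustration (mine, not part of the original exchange) in Python: the primes are a subset of the numbers with fewer than 6 divisors, yet the membership test for the smaller set is no longer to write down.

```python
def is_prime(n):
    # Membership test for the *smaller* set (the primes).
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def fewer_than_6_divisors(n):
    # Membership test for the *larger* set (numbers with fewer than 6 divisors).
    return sum(1 for d in range(1, n + 1) if n % d == 0) < 6

# Every prime has exactly 2 divisors, so the first set is contained in the second,
# but "scan for primes" is not a longer specification than "scan for < 6 divisors".
assert all(fewer_than_6_divisors(n) for n in range(2, 200) if is_prime(n))
```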
Maybe clarifying or recasting the language around “easiest-to-specify controllable pattern” will clear this up, but can you explain more why it feels to you that [scan for “consequentialists’ attempted output messages”] is so much simpler than [scan for “important-to-influence data streams”]? My very preliminary first take is that they are within 8–15 bits of each other.
I split them in part in case there is a contingent of consequentialists who believe that outputting the right bitstring is key to their continued existence, i.e. that they stop being simulated if they output the wrong bit. I haven’t responded to your claim that this would be faulty metaphysics on their part; it still seems fairly tangential to our main discussion. But you can interpret my 5-bit point estimate for (5) as claiming that 31 times out of 32, when a civilization of consequentialists tries to influence their world’s output, it is in an attempt to survive. Tell me if you’re interested in a longer justification that responds to your original “line by line comments” comment.
Just look at the prior: for any set of instructions for the work tape heads of the Turing machine, flipping the “write-1” instructions of the output tape with the “write-0” instructions gives an equally probable Turing machine.
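A sketch of that symmetry, assuming an encoding in which the output symbol written by each instruction occupies a fixed field: let $\sigma(T)$ be the machine obtained from $T$ by swapping 0 and 1 in every output-tape write instruction, leaving the work-tape behavior untouched. Then $\sigma$ is an involution that preserves description length, so under the length-based prior

$$P(\sigma(T)) = P(T),$$

and hence, conditional on any fixed work-tape dynamics, the prior is symmetric under complementing the output bits.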
I basically agree that if the civilization has a really good grasp of the situation, and in particular has no subjective uncertainty (merely uncertainty over which particular TM they are), then they can do even better by just focusing their effort on the single best set of channels rather than randomizing.
(Randomization is still relevant for reducing the cost to them though.)
With randomization, you reduce the cost and the upside in concert. If a pair of shoes costs $100, and that’s more than I’m willing to pay, I could buy the shoes with probability 1%, and it will only cost me $1 in expectation, but I will only get the shoes with probability 1/100.
I agree that randomization reduces the “upside” in the sense of “reducing our weight in the universal prior.” But utility is not linear in that weight.
I’m saying that the consequentialists completely dominate the universal prior, and they will still completely dominate if you reduce their weight by 2x. So either way they get all the influence. (Quantitatively, suppose the consequentialists currently have probability 1000 times greater than the intended model. Then they have 99.9% of the posterior. If they halved their probability of acting, they’d have 500 times the probability of the intended model, and so have 99.8% of the posterior. This is almost as good as 99.9%.)
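Spelling out that arithmetic: with prior odds $R$ in favor of the consequentialists over the intended model, their posterior share is $R/(R+1)$, so cutting their weight in half barely matters while $R$ is large:

$$\frac{1000}{1001} \approx 99.9\%, \qquad \frac{500}{501} \approx 99.8\%.$$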
That could fail if, e.g., there are a bunch of other consequentialists also trying to control the sequence, or if some other model beyond the intended one has much higher probability. But if you think that the consequentialists are X bits simpler than the intended model, and you are trying to argue that the intended model dominates the posterior, then you need to argue that the consequentialists wouldn’t try to grab the universal prior even when doing so only requires acting in 2^-X of worlds.
If I flip a coin to randomize between two policies, I don’t see how that mixed policy could produce more value for me than the better of the two base policies.
(ETA: the logical implications of the fact that I randomized don’t have any weird anti-adversarial effects here.)
Someone in the basement universe is reasoning about the output of a randomized Turing machine that I’m running on.
I care about what they believe about that Turing machine. Namely, I want them to believe that most of the time when the sequence x appears, it is followed by a 1.
Their beliefs depend in a linear way on my probabilities of action.
(At least if e.g. I committed to that policy at an early enough time for them to reason about it, or if my policy is sufficiently predictable to be correlated with their predictions, or if they are able to actually simulate me in a universe with reflective oracles… If I’m not able to influence their beliefs about me, then of course I can’t influence their beliefs about anything and the whole manipulative project doesn’t get off the ground.)
But my utility is a non-linear function of their beliefs, since the conditional P(1|x) = P(x1)/P(x) is a ratio, and hence a non-linear function, of their unconditional beliefs.
So my utility is a non-linear function of my policy.
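A toy numerical version of that last step (numbers invented for illustration): suppose committing to policy A leads the predictor to believe $P_A(x) = 0.5$ and $P_A(x1) = 0.5$ (so $P_A(1 \mid x) = 1$), while policy B leads to $P_B(x) = 0.001$ and $P_B(x1) = 0$ (so $P_B(1 \mid x) = 0$). A 50/50 randomization mixes the unconditional beliefs linearly, but the conditional is not the 50/50 average of 1 and 0:

$$P(1 \mid x) = \frac{\tfrac{1}{2}\cdot 0.5 + \tfrac{1}{2}\cdot 0}{\tfrac{1}{2}\cdot 0.5 + \tfrac{1}{2}\cdot 0.001} = \frac{0.25}{0.2505} \approx 0.998 \;\neq\; 0.5.$$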