Completeness of simulations
Suppose I have an exact simulation of a human. Feeling ambitious, I decide to print out a GLUT of the action this human will take in every circumstance; while the simulation of course works at the level of quarks, I have a different program that takes lists of quark movements and translates them into a suitably high-level language, such as “Confronted with the evidence that his wife is also his mother, the subject will blind himself and abdicate”.
Now, one possible situation is “The subject is confronted with the evidence that his wife is also his mother, and additionally with the fact that this GLUT predicts he will do X”. Is it clear that an accurate X exists? In high-level language, I would say that, whatever the prediction is, the subject may choose to do something different. More formally we can notice that the simulation is now self-referential: Part of the result is to be used as the input to the calculation, and therefore affects the result. It is not obvious to me that a self-consistent solution necessarily exists.
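To state the requirement symbolically (the notation is mine): writing $S(E, p)$ for the subject’s response when confronted with evidence $E$ together with a purported prediction $p$, an accurate self-referential entry needs some action $X$ with
\[
S(E, \text{“the GLUT predicts } X\text{”}) = X,
\]
i.e. a fixed point of the map $X \mapsto S(E, X)$, and it is not obvious that such a fixed point must exist.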
It seems to me that this is somehow reminiscent of the Halting Problem, and can perhaps be reduced to it. That is, it may be possible to show that an algorithm that can produce X for arbitrary Turing machines would also be a Halting Oracle. If so, this seems to say something interesting about limitations on what a simulation can do, but I’m not sure exactly what.
For clarity, a GLUT is a Giant Lookup Table, enumerating the response of a system to every possible sequence of stimuli.
http://lesswrong.com/lw/pa/gazp_vs_glut/
If the subject is a contrarian, the desired X doesn’t exist. But this doesn’t prevent the GLUT from predicting all actions correctly.
We are imagining the following GLUT entry:
“The subject is presented with evidence E and the [possibly false] fact that the predicted action for this entry is X” → The subject will do action Y.
X need not be equal to Y. It may be that there is no X for which the predicted action Y equals the action X that we told the subject we predicted. This is trivially possible if the subject is a contrarian who will always do the opposite of what you tell him he will do. But the GLUT is still completely accurate! We simply lied to the subject about what the GLUT predicted. We told him we predicted he would do X on being told he would do X, but actually we knew he would do Y.
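A toy version of this in code (my own construction, with a two-action subject standing in for the full simulation):

```python
def subject(evidence, told_prediction):
    """A contrarian: he does the opposite of whatever he is told he will do."""
    return "leave" if told_prediction == "stay" else "stay"

# One GLUT entry per prediction we might show him; each records his true response Y.
glut = {("E", told): subject("E", told) for told in ("stay", "leave")}

# The GLUT is perfectly accurate, yet no entry is self-fulfilling: there is no X
# with glut[("E", X)] == X, so any X we show him misrepresents what the GLUT says.
assert all(glut[("E", x)] != x for x in ("stay", "leave"))
```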
Sure, but that’s equivalent to just not showing him the GLUT in the first place. All you’ve done is move the problem one level up. There is still a circumstance in which the GLUT cannot reliably predict his action, namely the one in which you show him an accurate GLUT.
Where did you get the accurate GLUT? Before you had an accurate GLUT, you couldn’t show him the accurate GLUT that would be needed to calculate the accurate GLUT.
Sure, you can calculate the GLUT for all possible inputs, some of which may be accurate GLUT inputs, but then you have to solve the problem of finding which of all possible inputs are accurate GLUT inputs. To answer the question, someone has to do calculations above and beyond just simulating a human.
OK, that’s fair. Here are some more thoughts.
To be concrete, suppose the input you give to the subject is a finite string of bits of length N. The GLUT is computed by simulating the output produced by each of the 2^N possible inputs, observing the result, and writing it down. Now we want to know what happens when we encode the GLUT as a bit string of length N and give it to the subject as an input. I think the answer is that in general we can’t do this, because encoding all 2^N entries of the GLUT clearly requires far more than N bits, so the GLUT is too big to be a valid input to the simulation. So it seems an accurate GLUT is too big for us to be able to show it to the subject. Maybe this is one answer to the question.
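Spelled out as an inequality (my arithmetic): each of the 2^N entries takes at least one bit to record, so
\[
\lvert \mathrm{GLUT} \rvert \;\ge\; 2^N \text{ bits} \;>\; N \text{ bits} \quad \text{for every } N \ge 1,
\]
hence the uncompressed table can never fit inside its own input window.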
But maybe for some subjects we can do enough compression on the GLUT that we can fit it into N bits. As a trivial example, if the input string is large but the subject’s behavior is simply to say “yes” for any input, then we can show him the accurate compressed GLUT: “For any input, the subject will say ‘yes.’”
Less trivially, we could think of the simulator program itself as a highly compressed GLUT. And maybe the simulator executable is a mere 100 MB file, while the simulation accepts 100 GB of input. Then we could supply the simulated subject with the executable that is running him as an accurate compressed GLUT.
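A toy sketch of both points (the “simulator” here is a stand-in with made-up behavior, not an actual brain simulation):

```python
def simulate_subject(input_bits: str) -> str:
    """Stand-in for the (hypothetical) quark-level simulator executable."""
    return "yes" if input_bits.count("1") % 2 == 0 else "no"  # arbitrary toy behavior

# The simulator itself is a maximally compressed GLUT: instead of storing all
# 2^N entries, it recomputes any requested entry on demand.
def glut_lookup(input_bits: str) -> str:
    return simulate_subject(input_bits)

# The trivially predictable subject compresses even further, to a constant.
def compressed_glut_yes_man(input_bits: str) -> str:
    return "yes"  # correct for every one of the 2^N possible inputs
```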
Suppose the simulated subject is a contrarian and tries to prove that the compressed GLUT we gave him is not accurate. He loads the executable onto a simulated computer and runs the executable on itself to see what it will output. Of course, it creates a simulated simulated subject who takes the executable and loads it onto a simulated simulated computer to see what it will output. This creates a simulated simulated simulated subject...
So this simulation will never halt when given this input. Presumably an accurate uncompressed GLUT would have an entry reading “Subject is supplied with his own source code → simulation does not halt.”
So maybe we can indeed connect to the halting problem?
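A toy sketch of why that input never yields an answer (my construction, with the physics abstracted away entirely):

```python
def simulate(source_code: str) -> str:
    """Stand-in for running the simulator executable on a given input."""
    # The simulated contrarian loads the executable onto a simulated computer
    # and runs it on its own source to see what it will predict...
    return simulate(source_code)  # ...spawning a simulation of a simulation, forever

# simulate("<the simulator's own source>")  # uncommenting this raises RecursionError
```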
Very likely; but I was only going to show him the bits predicting his response to his immediate situation + being shown those bits. We can save a bit of paper by not printing out his predicted reaction to chocolate elephants falling from the sky. :)
It seems to me that your construction relies on the subject having access to the executable. If we don’t give him that access, he cannot use this method of attempted disproof, no matter how contrarian he is.
OK, but again this simply may not be possible even if you have an accurate GLUT. If you give him anything that lets him compute your true prediction in finite time, then he can compute your true prediction and then do the opposite. Even if we have a complete and accurate GLUT, we can never supply him with a true prediction if the accurate GLUT contains no entries of the form “Subject is told he is predicted to do X” → subject does X.
Well, that’s precisely my point. But see prase’s comment below, with the very interesting point that every sufficiently-nice function f(x) has some x for which f(x)=x. The question is whether the human brain is sufficiently nice.
Your actions are compact and continuous, thanks to the laws of physics. If the GLUT outputs a prediction in a way that is also compact and continuous, either because it follows the laws of physics or you just programmed it that way, then there’s at least one fixed point where he’ll take the action he sees. Text output is discrete, and thus this wouldn’t work, but there are other ways of doing this. For example, you could show him video of what he’ll do.
It says that it’s not necessarily possible to make a self-fulfilling prophecy. You can predict what someone will do without input from you, but you may or may not be able to make an accurate prediction given that you tell them the prediction.
For those interested: Brouwer fixed-point theorem
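For reference, the statement being invoked: if $K \subset \mathbb{R}^n$ is nonempty, compact and convex, and $f : K \to K$ is continuous, then there is some $x^* \in K$ with $f(x^*) = x^*$.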
I don’t see compactness of the set of possible predictions.
I’d use the Schauder fixed-point theorem so that you don’t have to worry as much about what space you use.
One way to define the prediction space would be to have it predict the state of the universe immediately after the prediction is stated. Since anything after that is a function of that point in time, it’s sufficient. Every particle is within the distance light would have moved in that time. Each particle has at most the energy of the entire universe plus the maximum energy output of the GLUT response (I’m assuming it’s somehow working from outside this universe). This includes some impossible predictions, but that just means that they’re not in the range. They’re still in the domain. Just take the Cartesian product of the positions and momenta of the particles, and you end up in a Euclidean space.
If you want to take quantum physics into account, you’d need to use the quantum configuration space for each prediction. Again, you only need to worry about everything in the light cone of the prediction. You define the distance between any two configurations as the integral of the squared magnitude of the difference of their wavefunctions at each point. Since the integral of the squared magnitude of each wavefunction is one, the space is bounded. Since it’s always exactly one, you get a sphere, which isn’t convex, but that’s just the range. You just take the convex hull, a ball, as the domain.
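In symbols, the distance described above is (up to a square root, which I have added so the triangle inequality holds) the usual $L^2$ metric:
\[
d(\psi_1, \psi_2) = \left( \int \lvert \psi_1(q) - \psi_2(q) \rvert^2 \, dq \right)^{1/2}, \qquad \int \lvert \psi \rvert^2 \, dq = 1,
\]
so normalized states sit on the unit sphere of $L^2$, and the convex hull taken as the domain is the closed unit ball.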
The actual output of the GLUT may or may not be compact depending on how you display the output.
I don’t understand why that follows; can you elaborate?
Additionally, “at least one fixed point” seems distinct from “we can construct X for all situations”.
A sufficiently nice function is mathematically guaranteed to have at least one fixed point where f(x) = x. We need to make some assumptions to make it nice enough, but once we do that, we just set x as the hypothetical GLUT output, and f(x) to the GLUT output of the subject’s reaction to x, and we know there’s some value of x where the GLUT output of the reaction is the same as what the subject is reacting to.
f is the situation, and the fixed point is what you’re calling X. Also, I’m not sure if there’s a method to construct X. We just know it exists. You can’t even just check every value of X, because that only works if it’s discrete, which means it’s not sufficiently nice.
Thanks for the elaboration; this is a very interesting point that I wasn’t aware of. But it does seem to rely on the function having the same domain as its range, which presumably is one of the assumptions going into the niceness. It is not clear to me, although perhaps I’m just not thinking it through, that “future movements of quarks” is the same as “symbols to be interpreted as future movements of quarks”.
You could think of it as: x is the GLUT output, f(x) is the subject’s response, and g(f(x)) is the GLUT’s interpretation of the subject’s response. f maps from GLUT output to subject response, and g maps from subject response to GLUT output. Neither f nor g can have fixed points, because they don’t have the same domain and range. The composition g∘f, however, maps from GLUT output to GLUT output, so it does have the same domain and range. I was just calling it f, but this way it might be less confusing.
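In symbols (the names of the spaces are mine): with $P$ the space of GLUT outputs and $R$ the space of subject responses,
\[
f : P \to R, \qquad g : R \to P, \qquad g \circ f : P \to P,
\]
and if $P$ is compact and convex and $g \circ f$ is continuous, Brouwer/Schauder guarantees some $x^*$ with $g(f(x^*)) = x^*$.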
There is not a general solution.
Suppose that the human is just a mathematical function, where the input is a real number x and his output is a real number y. The person is the function y=f(x); and (in the non-self-referential case) an accurate GLUT just predicts: if the person is given the number x, then he will give you the number y.
In the second case, where the person is told what the GLUT says, he is now a function of two variables h(x,y)=z. The person is given the number x, and told that the GLUT predicts that he will output y, and then he outputs z. In order for the GLUT to be an accurate predictor we must have y=z, but that is not possible in general. For instance, consider h(x,y)=y+1; the person always outputs a number that is 1 more than what the GLUT said he would output.
The problem that the GLUT-maker faces is that she knows the human’s function, h(x,y)=z, and she must create a GLUT function, y=g(x), such that g(x)=h(x,g(x)) for all x. I presume that there is math out there which says what features h(x,y) must have in order for this to be possible, but I don’t know it.
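A toy sketch of the GLUT-maker’s problem (fixed-point iteration is just one naive way to look for a self-consistent prediction; it is not guaranteed to find one even when one exists):

```python
def h_contrarian(x: float, y: float) -> float:
    """Subject who always outputs one more than the GLUT said he would."""
    return y + 1.0

def h_agreeable(x: float, y: float) -> float:
    """Subject who moves halfway toward zero from the prediction; a fixed point exists."""
    return y / 2.0

def find_self_consistent_prediction(h, x, y0=0.0, tries=1000, tol=1e-9):
    """Naive fixed-point iteration y -> h(x, y); returns y with h(x, y) ~= y, or None."""
    y = y0
    for _ in range(tries):
        z = h(x, y)
        if abs(z - y) < tol:
            return y
        y = z
    return None

print(find_self_consistent_prediction(h_agreeable, x=3.0))   # ~0.0: g(x) = 0 works
print(find_self_consistent_prediction(h_contrarian, x=3.0))  # None: no accurate g exists
```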
What about Exists y: h(x,y) = y? If this holds, take Forall x: g(x) = y and we are done.
Edit: to be more general, we can let y depend on x, which gives Forall x: Exists y(x): h(x,y(x)) = y(x), but that is actually the original statement with the letter y standing in place of the original g. I doubt there is anything more to say about such functions, at least in general.
Is the fact that the simulated subject is a human important for the proposed thought experiment, besides that it activates all sorts of wrong intuitions about free will and makes the lookup table unimaginably huge or even infinite?
It is not; why should it be? By assumption the subject does whatever the GLUT predicts, but it doesn’t follow that the GLUT includes a proposition of the form “if the subject is confronted with the information that the GLUT predicts that he will do X, he will do X”.
I don’t think so; any Turing machine will do.
The exact simulation will be quantum. You can make statements of quantum mechanics that include multiple ‘cat-states’, and in general this is the sort of result you will get from the GLUT. You will only be able to observe one of these, so such a prediction, though correct, will appear wrong.
At this point you are not talking about a simulation of a person, but rather a simulation of the entire world. Confronted with the knowledge that a GLUT exists predicting everything I will do, I will resolve to pick some relatively trivial decision each day, write down 10 choices, numbered 0 to 9, and then choose the item based on the first numerical digit I see written out on the 11th page of the first section of the New York Times for that day, excluding page numbers or references to page numbers. At that point, the simulation of me has to simulate the production of the next day’s New York Times, which is tantamount to having to simulate the entire world.
Not that that is necessarily any harder to do than a complete and accurate simulation of an individual person, mind you.
Sure. A mere computational detail.
There are Löbian situations and Gödelian situations. In the Löbian situation, the simulation makes no difference to the outcome. You’re hungry, you just made breakfast, and the simulation predicts you will eat the breakfast because there’s no perverse impulse present in your psyche right now that would drive you to go hungry just to contradict the prediction. The Gödelian situations arise when knowing the prediction does matter, factually or motivationally. If an agent has the intention and the power to always do A if you say B and B if you say A, then there’s no way to say A or B and get it right. But you can “predict”, for example, that it will listen to your prediction and … become annoyed, because you didn’t say A or B.
This dilemma could not possibly exist, because the map must be smaller than the territory. (Has this concept ever been formalized anywhere on lesswrong before? I can’t seem to find it. Maybe it should be?)
Every time you add a scenario of the form:
“The subject is confronted with the evidence that his wife is also his mother, and additionally with the fact that this GLUT predicts he will do X”
Where the GLUT itself comes into play, you increase the number of scenarios that the GLUT must compute by one.
Now the GLUT needs to factor in not just situation “n,” but also situation “n + GLUT.” The GLUT that simulates this new “n + 1” complex of situations is now the new “GLUT-beta,” and it is a more-complex lookup table than the original GLUT.
So, yes, GLUT-beta can simulate “person + interaction with GLUT” and come up with a unique prediction X.
What GLUT-beta CANNOT DO is simulate “person + interaction with GLUT-beta” because “person + GLUT-beta” is more complex than “GLUT-beta” by itself, in the same way that “n + 1” will always be larger than n. The lookup table that would simulate “person + interaction with GLUT-beta” would have to be an even more complex lookup table...call it “GLUT-gamma,” perhaps.
The problem is that a Giant Lookup Table is a map, but it is not a very compressed map. In fact, it is the least compressed map that is possible. It is a map that is already as complex as the territory that it is mapping.
A Giant Lookup Table is like making a 1:1 scale model of the Titanic. What you end up with is just the original Titanic. Likewise, a 1:1 map of England would be...a replica of England.
Now, a 1:1 Giant Lookup Table can still be useful because you can map from one medium to another. This is what a Giant Lookup Table does. If you think of learning a foreign language as learning a Giant Lookup Table of 1:1 vocabulary translations (which is not a perfect description of foreign language learning, but just bear with the thought-experiment for a moment), we can see how a Giant Lookup Table could still be useful even though it is not compressing the information at all.
So your 1:1 replica of the Titanic might be made out of cardboard rather than steel, and your 1:1 map of England might be made out of paper rather than dirt. But your 1:1 cardboard replica of the Titanic is still going to have to be hundreds of meters in length, and your 1:1 map of England is still going to have to be hundreds of kilometers across.
Now, let’s say you come to me wanting to create a 1:1 replica of “the Titanic + a 1:1 replica of the Titanic.” Okay, fine. You end up basically with two 1:1 replicas of the Titanic. This is our “GLUT-beta.”
Now, let’s say you demand that I make a 1:1 replica of “the Titanic + a 1:1 replica of the Titanic,” but you only have room in your drydock for 1 ship, so you demand that the result be compressed into the space of 1 Titanic. It cannot be done. You will have to make the drydock (the territory) larger, or you will have to compress the replicas somehow (in other words, go from a 1:1 replica to a 1:2 scale-model).
This is the same dilemma you get from asking the GLUT to simulate “person + GLUT.” Either the GLUT has to get bigger (in which case, it is no longer simulating “person + itself,” but rather, “person + earlier simpler version of itself”), or you have to replace the 1:1 GLUT with a more efficient (compressed) prediction algorithm.
Likewise, if you ask a computer to perfectly, quark-by-quark, in 1:1 fashion, simulate the entire universe, it can’t be done, because the computer would have to simulate “itself + the rest of the universe,” and a 1:1 simulation of itself is already as big as itself. So in order to simulate the rest of the universe, it has to get bigger, in which case it has more of itself to model, so it must get bigger again, in which case it has even more of itself to model, and so on in an infinite regress.
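One way to spell out the regress as an inequality (notation mine): write $M(\cdot)$ for the number of bits an exact, uncompressed 1:1 description requires. Then
\[
M(\text{simulation}) \ge M(\text{universe}) = M(\text{computer}) + M(\text{rest of universe}),
\]
but the simulation must be stored inside the computer, so $M(\text{computer}) \ge M(\text{simulation})$, which forces $M(\text{computer}) \ge M(\text{computer}) + M(\text{rest of universe})$, impossible whenever the rest of the universe takes any bits at all to describe.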
The map must always be less than or equal to the territory (and if the map is equal to the territory, then it is not really a “map” as we ordinarily use the term, but more like a 1:1 scale model). So all of this can be simplified by saying:
The map must be smaller than the territory.
Or perhaps, this saying should be further refined to:
The map must be less complex than the territory.
That is because, after all, one could create a map of England that was twice as big as the original England. However, such a map would not be able to exceed the complexity of the original England. Everywhere in the original England where there was 1 quark, you would have 2 quarks on your map. You wouldn’t get increasingly complex configurations of quarks in the larger map of England. If you did, it would not be a faithful map of the original England.
I don’t see why this is so.
If we understand “smaller” as “less complex” then the claim is that a model must be less complex than reality which it represents. That doesn’t look true to me.
You mean, he is confronted with the statement that this GLUT predicts he will do X. That statement may or may not be true, depending on his behavior. He can choose a strategy of either always choosing to do what is predicted, always choosing to do the opposite of what is predicted, or ignoring the prediction and choosing based on unrelated criteria. It is possible to construct a lookup table containing accurate predictions of this sort only in the first and third cases, but not in the second.
Except that if the simulation really is accurate, his response should already be taken into account. Reality is deterministic; an adequately accurate and detailed program should be able to predict it exactly. Human free will relies on the fact that our behavior has too many influences to be predicted by any past or current means. Currently, we can’t even define all of the influences.
If the simulation is really accurate, then the GLUT would enter an infinite loop if he uses an ‘always do the opposite’ strategy.
I.e., “Choose either heads or tails. The oracle predicts you will choose [X].” If his strategy is ‘choose heads because I like heads’ then the oracle will correctly predict it. If his strategy is ‘do what the oracle says’, then the oracle can choose either heads or tails, and the oracle will predict that and get it correct. If his strategy is ‘flip a coin and choose what it says’ then the oracle will predict that action and, if it is a sufficiently powerful oracle, get it correct by modeling all the physical interactions that could change the state of the coin.
However, if his strategy is ‘do the opposite’, then the oracle will never halt. It will get in an infinite recursion choosing heads, then tails, then heads, then tails, etc. until it crashes. It’s no different than an infinite loop in a computer program.
It’s not that the oracle is inaccurate. It’s that a recursive GLUT cannot be constructed for all possible agents.
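A toy version of the loop (my construction; the real oracle would be simulating physics, not calling a Python function):

```python
def oracle(strategy, guess="heads"):
    """Tentatively predict `guess`, simulate the subject's reaction to that
    prediction, and accept the guess only if it turns out self-consistent."""
    action = strategy(guess)
    if action == guess:
        return guess                     # accurate, self-fulfilling prediction
    return oracle(strategy, action)      # revise the guess and re-simulate

obedient = lambda predicted: predicted                                       # 'do what the oracle says'
contrarian = lambda predicted: "tails" if predicted == "heads" else "heads"  # 'do the opposite'

print(oracle(obedient))   # 'heads': the very first guess is self-consistent
# oracle(contrarian)      # heads -> tails -> heads -> ...: RecursionError, never halts
```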
In what situation does the GLUT predict that he will do X? Let’s say, for example, that the GLUT predicts: “Confronted with the evidence that his wife is also his mother, the subject will do X.” When the subject is confronted with the evidence, and also with the GLUT’s prediction, he does Y. But the GLUT is not wrong, because it predicted his actions under a different set of information than he actually had. Alternatively, let’s define A as: “evidence that his wife is also his mother, and that when shown evidence A, does X”. There is no particular reason that there has to exist an X such that it is possible to construct A.
Your A is the scenario I had in mind.
The concept of an “exact simulation” disquiets me on a deep level, which historically has suggested a wrong question is in the neighborhood.
I do agree with your sense that this is an interpretation of the Halting Problem, with the quirk that self-referential programs are explicitly mentioned.