This dilemma could not possibly exist, because the map must be smaller than the territory. (Has this concept ever been formalized anywhere on LessWrong before? I can’t seem to find it. Maybe it should be?)
Every time you add a scenario of the form:
“The subject is confronted with the evidence that his wife is also his mother, and additionally with the fact that this GLUT predicts he will do X”
where the GLUT itself comes into play, you increase the number of scenarios that the GLUT must compute by one.
Now the GLUT needs to factor in not just situation “n,” but also situation “n + GLUT.” The GLUT that simulates this new “n + 1” complex of situations is now the new “GLUT-beta,” and it is a more-complex lookup table than the original GLUT.
So, yes, GLUT-beta can simulate “person + interaction with GLUT” and come up with a unique prediction X.
What GLUT-beta CANNOT DO is simulate “person + interaction with GLUT-beta” because “person + GLUT-beta” is more complex than “GLUT-beta” by itself, in the same way that “n + 1” will always be larger than n. The lookup table that would simulate “person + interaction with GLUT-beta” would have to be an even more complex lookup table...call it “GLUT-gamma,” perhaps.
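One informal way to write the regress down (my notation, not anything stated above): let $|\mathrm{GLUT}_k|$ be the number of scenarios the stage-$k$ table covers. The next table must contain at least one additional entry for each scenario in which the subject is also confronted with $\mathrm{GLUT}_k$’s prediction, so

$$|\mathrm{GLUT}_{k+1}| \;\ge\; |\mathrm{GLUT}_k| + 1 \;>\; |\mathrm{GLUT}_k| \quad \text{for every } k,$$

and by induction no table in the sequence is ever large enough to cover interactions with itself; each one only covers interactions with its predecessor.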
The problem is that a Giant Lookup Table is a map, but it is not a very compressed map. In fact, it is the least compressed map possible: a map that is already as complex as the territory it is mapping.
A Giant Lookup Table is like making a 1:1 scale model of the Titanic. What you end up with is, in effect, just another Titanic. Likewise, a 1:1 map of England would be...a replica of England.
Now, a 1:1 Giant Lookup Table can still be useful because you can map from one medium to another. This is what a Giant Lookup Table does. If you think of learning a foreign language as learning a Giant Lookup Table of 1:1 vocabulary translations (which is not a perfect description of foreign language learning, but just bear with the thought-experiment for a moment), we can see how a Giant Lookup Table could still be useful even though it is not compressing the information at all.
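As a toy sketch of that translation analogy (the vocabulary here is invented for illustration; nothing hinges on the particular words): a lookup table simply pairs each input in one medium with an output in another, entry by entry, with no compression.

```python
# A toy "Giant Lookup Table": one entry per input it can handle,
# mapping one medium (English words) to another (French words).
# There is no compression and no generalization; the table is exactly
# as large as the set of cases it covers.
glut = {
    "ship": "navire",
    "map": "carte",
    "territory": "territoire",
}

def translate(word: str) -> str:
    # Pure lookup: if a scenario (word) is not already in the table,
    # the table has no answer for it at all.
    return glut[word]

print(translate("map"))   # -> "carte"
print(len(glut))          # the table's size equals the number of cases it handles
```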
So your 1:1 replica of the Titanic might be made out of cardboard rather than steel, and your 1:1 map of England might be made out of paper rather than dirt. But your 1:1 cardboard replica of the Titanic is still going to have to be hundreds of meters in length, and your 1:1 map of England is still going to have to be hundreds of kilometers across.
Now, let’s say you come to me wanting to create a 1:1 replica of “the Titanic + a 1:1 replica of the Titanic.” Okay, fine. You end up basically with two 1:1 replicas of the Titanic. This is our “GLUT-beta.”
Now, let’s say you demand that I make a 1:1 replica of “the Titanic + a 1:1 replica of the Titanic,” but you only have room in your drydock for 1 ship, so you demand that the result be compressed into the space of 1 Titanic. It cannot be done. You will have to make the drydock (the territory) larger, or you will have to compress the replicas somehow (in other words, go from a 1:1 replica to a 1:2 scale-model).
This is the same dilemma you get from asking the GLUT to simulate “person + GLUT.” Either the GLUT has to get bigger (in which case, it is no longer simulating “person + itself,” but rather, “person + earlier simpler version of itself”), or you have to replace the 1:1 GLUT with a more efficient (compressed) prediction algorithm.
Likewise, if you ask a computer to simulate the entire universe perfectly, quark by quark, in 1:1 fashion, it can’t be done, because the computer would have to simulate “itself + the rest of the universe.” A 1:1 simulation of itself is already as big as itself, so in order to simulate the rest of the universe as well, it has to get bigger; in which case it has more of itself to model, so it must get bigger again; in which case it has even more of itself to model, and so on in an infinite regress.
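The same point can be put as a crude counting argument (my framing, with bits standing in for quarks): if the computer can store $C$ bits and the rest of the universe takes $U > 0$ bits to describe, then a 1:1 snapshot of “itself + the rest of the universe” needs

$$C + U \;>\; C$$

bits, which is more than the computer has. Enlarging the computer raises $C$, but the snapshot it must hold grows by at least as much, so the shortfall of $U$ bits never closes.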
The map must always be less than or equal to the territory (and if the map is equal to the territory, then it is not really a “map” as we ordinarily use the term, but more like a 1:1 scale model). So all of this can be simplified by saying:
The map must be smaller than the territory.
Or perhaps, this saying should be further refined to:
The map must be less complex than the territory.
That is because, after all, one could create a map of England that was twice as big as the original England. However, such a map would not be able to exceed the complexity of the original England. Everywhere in the original England where there was 1 quark, you would have 2 quarks on your map. You wouldn’t get increasingly complex configurations of quarks in the larger map of England. If you did, it would not be a faithful map of the original England.
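One way to make “less complex” precise (my gloss, treating description length as the measure of complexity, which is only one possible reading): a faithful 2:1 map of England can be specified as England’s own description plus the short rule “double every distance,” so roughly

$$K(\text{map}) \;\le\; K(\text{England}) + c$$

for some small constant $c$. Making the map physically larger buys it no additional complexity, which is why “smaller” here is better read as “less complex.”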
I don’t see why this is so.
If we understand “smaller” as “less complex,” then the claim is that a model must be less complex than the reality it represents. That doesn’t look true to me.