No, it's not just about the information; it's about the information, our utility function, and our epistemic capabilities. Suppose I had taken ultra-high-resolution electron microscope images of one particular brick in the wall, and buried the hard drives on the moon. Most of the information about the wall that isn't located near the wall would then be on those hard drives. But if you are trying to reach the top, and want to know how big a ladder to get, you still don't care about my electron microscope images.
Humans don’t track the entire causal graph. We just track the fragments that are most important for achieving our utility function, given our mental limitations. A superintelligent AI might be able to track the consequences of brick parity all over the place. All we know is that we can’t track it very far: if we are too far from the wall to see the brick parity, we can’t track it.
Information about brick-parity just doesn’t propagate very far in the causal graph of the world; it’s quickly wiped out by noise in other variables.
How do you distinguish the info not being there from you being unable to see it? A function can be perfectly deterministic, but seem random to you because you can’t compute it.
The problem with the hard-drive example is that the information is only on that one hard drive, buried somewhere on the moon. It’s not about how much information is relevant far away; it’s about how many different far-away places the information is relevant to. Information which is relevant to many different neighborhoods of far-away variables is more likely to be relevant to something humans care about (because it’s relevant to many things); information which is relevant to only a few far-away chunks of variables is less likely to touch anything humans care about.
What makes wall-height interesting is that it’s relevant to a lot of different variables in the world—or, equivalently, we can learn something about the wall-height by observing many different things from many different places. If I’m standing on the lawn next door and look down and see the building’s shadow, then I’ve gained info about the building height. If I’m looking at the block from far away and see the building rising over the surrounding buildings, I’ve learned something about the height. If I’m moving a couch around inside the building and find that I have enough space to stand the couch on its end, then I’ve learned something about the height.
To put it differently: I can learn about the height from many different vantage points.
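To make the “many vantage points” point concrete, here is a minimal numerical sketch (my own toy construction, not anything from the discussion above): a single latent height variable observed through several independent noisy channels, each of which reduces our uncertainty about the same quantity.

```python
import numpy as np

# Toy illustration (my own construction): one latent "height" variable,
# observed indirectly from several different vantage points, each through
# its own noisy channel.
rng = np.random.default_rng(0)
n = 100_000

height = rng.normal(10.0, 1.0, n)                  # latent height
shadow = 1.5 * height + rng.normal(0, 0.5, n)      # shadow length on the lawn
skyline = height - 8.0 + rng.normal(0, 0.5, n)     # how far it pokes above the neighbors
couch = (height > 9.5).astype(float) + rng.normal(0, 0.3, n)  # "couch stands upright" signal

for name, obs in [("shadow", shadow), ("skyline", skyline), ("couch", couch)]:
    # residual std of height after the best linear prediction from this observation alone
    slope = np.cov(height, obs)[0, 1] / np.var(obs)
    resid = height - slope * obs
    print(f"{name:8s}: prior std {height.std():.2f} -> residual std {resid.std():.2f}")
```

Each channel on its own narrows down the height; that’s the sense in which the height information shows up in many different far-away variables, rather than sitting in one isolated chunk of the world.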
A toy model I use to study this sort of thing: we have a sparse causal network of normally distributed variables. Pick one neighborhood of variables in this network, and calculate what it tells you about the variables in some other neighborhood elsewhere in the network. The main empirical result is that, if we fix one neighborhood X and ask what information we can gain about X by examining many different neighborhoods Y1, Y2, ..., it turns out that most of the neighborhoods Y contain approximately the same information about X. (Specifically: we can apply a singular value decomposition to the covariance matrix of X with each of the Y’s, and it turns out that it’s usually low-rank, and that the X-side singular vectors are approximately the same for a wide variety of Y’s.) I’ll have a post on this at some point.
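For anyone who wants to poke at this themselves, here is a rough sketch of that kind of experiment. The exact setup is my own guess at the details, not the actual model from the forthcoming post: a sparse linear-Gaussian causal chain, one fixed neighborhood X near the start, and cross-covariances with several far-away neighborhoods Y.

```python
import numpy as np

# Sketch of the described experiment (details are my own assumptions):
# sparse local causality on a chain, fixed neighborhood X, several far-away
# neighborhoods Y; compare the X-side singular vectors of cov(X, Y).
rng = np.random.default_rng(0)
n_vars, n_samples = 120, 50_000

# Each variable depends only on its two predecessors (sparse, local causal structure).
x = np.zeros((n_samples, n_vars))
x[:, 0] = rng.normal(size=n_samples)
x[:, 1] = 0.7 * x[:, 0] + rng.normal(size=n_samples)
for i in range(2, n_vars):
    x[:, i] = 0.7 * x[:, i - 1] + 0.28 * x[:, i - 2] + rng.normal(size=n_samples)

X_idx = np.arange(0, 5)                               # fixed neighborhood X
for start in (40, 70, 100):                           # several far-away Y's
    Y_idx = np.arange(start, start + 5)
    joint = np.cov(x[:, X_idx].T, x[:, Y_idx].T)      # 10x10 joint covariance
    cov_XY = joint[:5, 5:]                            # cross-covariance block of X with Y
    U, S, Vt = np.linalg.svd(cov_XY)
    u = U[:, 0] * np.sign(U[np.argmax(np.abs(U[:, 0])), 0])  # fix the sign convention
    print(f"Y at {start:3d}: singular values {np.round(S, 3)}, "
          f"top X-side vector {np.round(u, 2)}")
```

Under these assumptions, the printed X-side vectors should come out nearly identical across the three Y’s, while only the overall scale (the leading singular value) shrinks with distance.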
In the hard drive example, the information is only in one little chunk of the world. (Well, two little chunks: the hard drive and the original brick.) By contrast, information about the wall height is contained in a wide(r) variety of other variables in other places.
How do you distinguish the info not being there from you being unable to see it? A function can be perfectly deterministic, but seem random to you because you can’t compute it.
Well, at least in the toy models, I can calculate exactly what information is available, and I do expect the key assumptions of these toy models to carry over to the real world. More generally, for chaotic systems (including e.g. the motions of air molecules), we know that information is quickly wiped out given any uncertainty at all in the initial conditions.
If my only evidence were “it looks random”, then yes, I’d agree that’s weak evidence. Things we don’t understand look random, not mysterious. But we do have theory backing up the idea that information is quickly wiped out in the real world, given even very small uncertainty in initial conditions.
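For the “quickly wiped out” claim, here is a standard textbook illustration (not the toy model above, just the logistic map) of how a tiny uncertainty in initial conditions gets amplified to the scale of the whole system within a few dozen steps:

```python
import numpy as np

# Standard chaos demo: two trajectories of the logistic map starting from
# almost-identical initial conditions diverge to order-1 differences.
def logistic_traj(x0, steps=60, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return np.array(xs)

a = logistic_traj(0.3)
b = logistic_traj(0.3 + 1e-10)   # essentially the same initial condition

for t in (0, 10, 20, 30, 40, 50):
    print(f"t={t:2d}  |difference| = {abs(a[t] - b[t]):.3e}")
```

Once the difference saturates, knowing the system’s current state to any ordinary measurement precision tells you essentially nothing about which of the two initial conditions you started from; that is the sense in which the initial-condition information gets wiped out.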