[Question] Does the Telephone Theorem give us a free lunch?
There’s something called the No Free Lunch theorem, which says, approximately, that there’s no truly general algorithm for learning: if an algorithm predicts some environments better than chance, there must exist adversarial environments on which it does at least that much worse than chance, so that averaged over all possible environments it can do no better than chance. (Yes, this is even true of Solomonoff induction.)
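To make the averaging claim concrete, here is a minimal sketch (my own toy illustration, not from the theorem’s original statement): enumerate every possible labeling of a tiny domain, train any fixed rule on half the points, and its average accuracy on the held-out points comes out to exactly chance. The `majority_learner` below is a hypothetical stand-in for "any learner".

```python
import itertools

# Toy no-free-lunch check: average off-training-set accuracy of a fixed
# learner, over ALL possible boolean labelings of a small domain, is 0.5.

domain = [0, 1, 2, 3]                       # four input points
train_x, test_x = domain[:2], domain[2:]    # fixed train/test split

def majority_learner(train_pairs):
    """Predict the majority training label everywhere (ties -> 0)."""
    ones = sum(label for _, label in train_pairs)
    guess = 1 if ones > len(train_pairs) / 2 else 0
    return lambda x: guess

accuracies = []
# Each "environment" is one possible labeling of the whole domain.
for labels in itertools.product([0, 1], repeat=len(domain)):
    f = dict(zip(domain, labels))
    predictor = majority_learner([(x, f[x]) for x in train_x])
    correct = sum(predictor(x) == f[x] for x in test_x)
    accuracies.append(correct / len(test_x))

print(sum(accuracies) / len(accuracies))    # -> 0.5, i.e. chance
```

Swapping in any other learner gives the same average, since for every test point exactly half the remaining labelings agree with whatever it predicts.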
In the real world, this is almost completely irrelevant; empirically, general intelligence exists. However, leaving anthropics aside for a moment, we ought to find this irrelevance at least somewhat surprising: a robust theory of learning first needs to answer the question of why, in our particular universe, it’s possible to learn anything at all.
I suspect that Wentworth’s Telephone Theorem, which says that in the limit of causal distance, information is either completely preserved or completely destroyed, may be a component of a possible answer. The Telephone Theorem is a general mathematical fact rather than a contingent property of our universe, but it does single out a property of the things we expect to be learnable in the first place: mostly, we can only make observations at large causal distance, since we ourselves are very large in terms of the underlying physics, and therefore we only care about the preserved information, not the destroyed information. A maximum-entropy universe, of the sort usually considered by no-free-lunch theorems, would actually look simpler to a macroscale observer, since macroscopic properties like temperature, density, etc. would be approximately uniform throughout.
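A small simulation may help build intuition for the "preserved or destroyed" dichotomy. This is a sketch of my own, under the assumption that a chain of noisy channels is a reasonable toy model of increasing causal distance: a bit passed through repeated noisy copies carries almost no information about the source after many steps, while a bit copied losslessly keeps its full bit.

```python
import random
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def telephone(n_samples=20000, steps=30, flip_prob=0.1):
    """Pass a random source bit through `steps` channels, two ways."""
    noisy_pairs, clean_pairs = [], []
    for _ in range(n_samples):
        source = random.randint(0, 1)
        noisy, clean = source, source
        for _ in range(steps):
            if random.random() < flip_prob:  # noisy channel: sometimes flips
                noisy ^= 1
            # clean channel: copied exactly, nothing to do
        noisy_pairs.append((source, noisy))
        clean_pairs.append((source, clean))
    return mutual_information(noisy_pairs), mutual_information(clean_pairs)

noisy_bits, clean_bits = telephone()
print(f"noisy channel after 30 steps: ~{noisy_bits:.3f} bits")    # close to 0
print(f"lossless channel after 30 steps: ~{clean_bits:.3f} bits")  # close to 1
```

The noisy component’s mutual information with the source decays toward zero as the number of steps grows, while the losslessly copied component stays at one bit; a faraway observer effectively only ever sees the second kind of signal.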
I expect this implies something about the class of learning algorithms that work well on the kind of data we actually want to predict, but I’m not sure what.