Note that pure prediction is not understanding. As a simple example, take the case of predicting the outcomes of 100 fair coin tosses. Predicting tails on every flip gives you the maximum expected predictive accuracy (50%), but it is not the correct generative model for the data. Over the course of this sequence, we will come to formally understand why this is the case.
You are just not using a proper scoring rule. If you used the log or Brier score, then predicting 50% heads, 50% tails would indeed give the highest expected score.
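To make that concrete, here is a quick sketch (my own, not from the post) of the expected log and Brier scores for always reporting a fixed probability of heads on a fair coin; both are optimized by reporting 0.5, unlike raw accuracy, which is indifferent:

import numpy as np

# Candidate reported probabilities of heads (avoiding 0 and 1 so the log score stays finite).
forecasts = np.array([0.01, 0.25, 0.5, 0.75, 0.99])

# Fair coin: heads and tails each occur with probability 0.5.
expected_log = 0.5 * np.log(forecasts) + 0.5 * np.log(1 - forecasts)  # higher is better
expected_brier = 0.5 * (forecasts - 1) ** 2 + 0.5 * forecasts ** 2    # lower is better

for p, ls, bs in zip(forecasts, expected_log, expected_brier):
    print(f"p(heads)={p:.2f}  expected log score={ls:.3f}  expected Brier={bs:.3f}")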
import numpy as np
data = []
data_length = 99  # assumed value; any multiple of 3
for i in range(data_length // 3):
    data.append(0)
    data.append(1)
    data.append(np.random.choice([0, 1]))
Here you are cheating by using an external library. Just calling random_bits[i] is also very compact. You might at least include your pseudo-random number generator.
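For instance, a tiny linear congruential generator would already do the job (the constants below are the standard Numerical Recipes ones, included purely for illustration):

# Minimal pseudo-random bit source: a linear congruential generator.
seed = 12345  # arbitrary

def next_bit():
    global seed
    seed = (1664525 * seed + 1013904223) % 2**32
    return seed >> 31  # take the high bit, which is better mixed than the low bits

random_bits = [next_bit() for _ in range(99)]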
conceptual downside if we want Kolmogorov complexity to measure structure in natural systems: it assigns maximal ‘complexity’ to random strings.
Maybe I’ve read too much Jaynes recently, but I think the scare quotes around ‘complexity’ in that sentence are misplaced. Random and pseudo-random processes really are complex. Jaynes actually uses the coin-tossing example and shows how easy it is to cheat at coin tossing in an undetectable way (page 317). Random sounds simple, but that’s just our brains being ‘lazy’ and giving up (beware the mind projection fallacy, blah...). If you don’t have infinite compute (as seems to be the case in reality), you have to make compromises, which is why this looks pretty promising.
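A crude way to see this concretely is to use compressed length as a stand-in for Kolmogorov complexity (zlib here, purely as an illustration):

import os
import zlib

n = 3000  # string length in symbols (arbitrary)

structured = bytes([0, 1, 0] * (n // 3))           # periodic, highly structured
random_str = bytes(b & 1 for b in os.urandom(n))   # one random bit per byte

# Compressed length is a rough, machine-dependent upper bound on Kolmogorov complexity.
print("structured:", len(zlib.compress(structured, 9)), "bytes")
print("random:    ", len(zlib.compress(random_str, 9)), "bytes")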
These points are well taken. I agree re: log score. We were trying to compare against the most straightforward/naive reward-maximization setup, like a game where you get +1 point for a correct answer and −1 for an incorrect one. But I take your point that other scoring rules lead to different (in this case better) results.
Re: cheating. Yes! That is correct. It was too much to explain in this post, but ultimately we would like to argue that we can augment a Turing machine with a heat source (i.e., a source of entropy) so that it can generate random bits. Under that kind of setup, the “random number generator” becomes much simpler/easier than having to use a deterministic algorithm. In addition, the argument will go, this augmented Turing machine is in better correspondence with the natural systems that we want to understand as computing.
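As a toy sketch of the distinction (just to convey the idea, not the formal construction): once the machine is allowed to read bits from an entropy source, the generative program for the 0, 1, random pattern above is about as short as the pattern’s description, with no pseudo-random machinery at all.

import os

def coin_flip():
    # Stand-in for the "heat source": one bit drawn from the OS entropy pool.
    return os.urandom(1)[0] & 1

def generate(data_length):
    # With an entropy source available, the program is just the pattern itself.
    data = []
    for _ in range(data_length // 3):
        data.extend([0, 1, coin_flip()])
    return data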
That leads to your last point, which I think is very fundamental. I disagree a little bit, in a specific sense. While it is true that “randomness” comes from a specific type of “laziness,” I think it’s equally true that this laziness actually confers computational powers of a certain sort. For now, I’ll just say that this has to do with abstraction and uncertainty, and leave the explanation for another post.
Looking forward to the rest of the sequence!
My understanding has improved since writing this post.
Generative and predictive models can indeed be substantially different, but, as you point out, the reason we give here is unsatisfying.
The better thing to point to is that there are finite generative models for which the optimal predictive model is infinite.
See this paper for more.
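To give the flavor of that result, here is a sketch with a made-up two-state hidden Markov generator (not necessarily the example in the paper): the generator is finite, but the belief state an optimal predictor must track, i.e. the posterior over hidden states given the observed symbols, keeps taking new values as more data comes in.

import numpy as np

# T[s][i, j] = P(emit symbol s and move to hidden state j | currently in state i)
# Two hidden states, A (=0) and B (=1); the numbers are made up for illustration.
T = {
    0: np.array([[0.5, 0.0],    # A: emit 0 and stay in A, half the time
                 [0.0, 0.0]]),  # B never emits 0
    1: np.array([[0.0, 0.5],    # A: emit 1 and move to B, half the time
                 [0.5, 0.5]]),  # B: emit 1 and move to A or B, each half the time
}

rng = np.random.default_rng(0)

def distinct_belief_states(n_steps):
    state = 0                      # true hidden state, starting in A
    belief = np.array([1.0, 0.0])  # predictor's posterior over hidden states
    seen = set()
    for _ in range(n_steps):
        # Generator: sample (symbol, next state) from the current state's outgoing probabilities.
        options = [(s, j) for s in (0, 1) for j in (0, 1)]
        probs = [T[s][state, j] for s, j in options]
        symbol, state = options[rng.choice(4, p=probs)]
        # Predictor: Bayesian update of the belief given only the emitted symbol.
        belief = belief @ T[symbol]
        belief = belief / belief.sum()
        seen.add(tuple(belief.round(10)))
    return len(seen)

# In exact arithmetic every run length of 1s produces a new belief, so the count
# below keeps growing (slowly, since long runs are rare) as we observe more data.
for n in (1_000, 10_000, 100_000):
    print(n, "steps ->", distinct_belief_states(n), "distinct belief states")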