We can salvage a counting argument. But it needs to be a little subtle. And it’s all about the comments, not the code.
Suppose a neural network has 1 megabyte of memory. To slightly oversimplify, let's say it can represent a Python file of 1 megabyte.
One option is for the network to store a giant lookup table. Let's say the network needs half a megabyte to store the training data in this table. This leaves the other half free to be any rubbish. Hence around 2^4,000,000 possible networks.
The other option is for the network to implement a simple algorithm, using up only 1 kB. Then the remaining 999 kB can be used for gibberish comments. This gives around 2^7,992,000 possible networks, which is a lot more.
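To make the arithmetic explicit, here is a quick sketch of the two counts (a toy calculation, assuming 1 megabyte means 1,000,000 bytes and each unconstrained byte can take any of 2^8 values):

BITS_PER_BYTE = 8

# Lookup-table option: roughly half a megabyte is pinned down by the
# training data, the other half is free to be any rubbish.
free_bits_lookup = 500_000 * BITS_PER_BYTE   # 4,000,000 free bits
lookup_networks = 2 ** free_bits_lookup      # 2^4,000,000 possible networks

# Simple-algorithm option: ~1 kB of code, 999 kB of unconstrained "comments".
free_bits_algo = 999_000 * BITS_PER_BYTE     # 7,992,000 free bits
algo_networks = 2 ** free_bits_algo          # 2^7,992,000 possible networks

print(algo_networks > lookup_networks)            # True
print(free_bits_algo - free_bits_lookup)          # 3,992,000 extra free bits

Nothing here depends on the exact byte counts; what matters is that the exponent, the number of free bits, is far larger for the simple algorithm.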
The comments can be any form of data that doesn't show up during training. Whether that data could show up in other circumstances or is a pure comment makes no difference to the training dynamics.
If the line between training and test is simple, there isn’t a strong counting argument against nonsense showing up in test.
But programs that go
if in_training():
    return sensible_algorithm()
else:
    return "random nonsense goes here"
have to pay the extra cost of an "in_training" function that returns true in training. If the test data is similar to the training data, the cost of a check that returns true in training but false in test can be large. This is assuming that there is a unique sensible algorithm.
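To see why that branch loses the counting argument, here is a toy version of the same accounting (all the bit costs below are made-up assumptions for illustration, not measurements):

BITS_PER_BYTE = 8
TOTAL_BITS = 1_000_000 * BITS_PER_BYTE   # the 1 MB network

# Hypothetical description lengths, in bits (illustrative assumptions only).
sensible_algorithm = 8_000     # the unique sensible algorithm
nonsense_payload   = 500       # "random nonsense goes here"
in_training_check  = 50_000    # a check separating training from test;
                               # expensive when the two look alike

plain_program     = sensible_algorithm
branching_program = sensible_algorithm + in_training_check + nonsense_payload

# Free bits left over for "comments" determine how many networks
# implement each program.
plain_count     = 2 ** (TOTAL_BITS - plain_program)
branching_count = 2 ** (TOTAL_BITS - branching_program)

# The plain program wins by a factor of 2^(cost of the extra machinery).
ratio_exponent = in_training_check + nonsense_payload
print(plain_count // branching_count == 2 ** ratio_exponent)   # True

On this accounting, every extra bit spent on the in_training check and the nonsense branch halves the number of networks implementing the branching program, which is exactly the counting pressure against it.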