I’m not sure I understand the search vs discriminative distinction. If my hand touches fire and thus immediately moves backwards by reflex, would this be an example of a discriminative policy, because an input signal directly causes an action without being processed in the brain?
About the goal of winning at chess: in the case of minimax search, p generates the complete game tree using D and then selects the winning policy; as you said, this is probably the simplest agent (in terms of Kolmogorov complexity, given D) that wins at chess, and in fact at any game that can be solved by minimax/backward induction. In the case of RL(p), p reads the environmental data D about chess to assign reward 1 to winning states and 0 elsewhere, and RL stands for an ideal RL procedure that exploits interaction with the environment to generate the optimal policy π maximising the reward function created by p. The key point is that in both cases, when the environment gets bigger and D grows, the description length of the two algorithms given D doesn't change: you could use minimax or the ideal RL procedure to generate a winning policy even for chess on a larger board, for example. If instead you used a giant lookup table, you would have to extend the algorithm every time a new state is added to the environment.
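To make the minimax case concrete, here is a rough Python sketch of the kind of fixed-length program I have in mind; the interface to D (legal_moves, apply, to_move, is_terminal, result) is invented just for illustration:

```python
# Rough sketch only: D is assumed to expose the game rules through these
# hypothetical methods; none of this is a real, fixed API.

def minimax_value(state, D, player):
    """Backward-induction value of `state` for `player`: +1 win, 0 draw, -1 loss."""
    if D.is_terminal(state):
        return D.result(state, player)
    values = [minimax_value(D.apply(state, move), D, player)
              for move in D.legal_moves(state)]
    # Maximise on the player's turn, minimise on the opponent's.
    return max(values) if D.to_move(state) == player else min(values)

def winning_policy(state, D):
    """Pick the move with the best minimax value in the current state."""
    player = D.to_move(state)
    return max(D.legal_moves(state),
               key=lambda move: minimax_value(D.apply(state, move), D, player))
```

The code above stays the same when the board grows; only D (and the running time) changes.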
I guess the confusion may come from the fact that D is underspecified. I tried to formalise it more precisely using logic, but I ran into some problems and it's still a work in progress.
By the way, thanks for the links! I hope I’ll learn something new about how the brain works, I’m definitely not an expert on cognitive science :)
Hmm, maybe we’re talking past each other. Let’s say I have something like AlphaZero, where 50,000 bytes of machine code trains an AlphaZero-type chess-playing agent, whose core is a 1-billion-parameter ConvNet. The ConvNet takes 1 billion bytes to specify. Meanwhile, the reward-calculator p, which calculates whether checkmate has occurred, is 100 bytes of machine code.
Would you say that the complexity of the trained chess-playing agent is 100 bytes or 50,000 bytes or 1 billion bytes?
I guess you're going to say 50,000, because you're imagining a Turing machine that spends a year doing the self-play to calculate the billion-parameter ConvNet, and then the same Turing machine immediately starts running the ConvNet it just calculated. From the perspective of Kolmogorov complexity, it doesn't matter that it spends a year calculating the ConvNet, as long as it does so eventually.
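A toy, self-contained analogue of what I mean (nothing AlphaZero-specific; the lookup table stands in for the billion-parameter ConvNet):

```python
# The program below is a few lines long, even though the artefact it computes
# (a big lookup table, standing in for the trained ConvNet) is enormous.

def make_agent():
    table = {n: n * n for n in range(10**6)}   # "training": costly to compute, huge to store
    def act(observation):                       # the policy that actually runs afterwards
        return table[observation]
    return act

act = make_agent()    # the same short program computes the artefact...
print(act(12345))     # ...and then immediately acts using it
```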
By the same token, you can always turn a search-y agent into an equivalent discriminative-y agent, given infinite processing time and storage, by training the latter on a googol queries of the former. If you're thinking about Kolmogorov complexity, then you don't care about a googol queries, as long as it works eventually.
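Again as a toy sketch (a dictionary standing in for the distilled network):

```python
# A "search-y" agent that answers by explicit search, distilled into a
# discriminative agent that just maps input to output.

def searchy_agent(x):
    # stand-in for an expensive search: try every candidate action, keep the best
    return max(range(100), key=lambda a: -(a - x % 100) ** 2)

# Query the search agent on every input we care about ("a googol queries")...
dataset = {x: searchy_agent(x) for x in range(10_000)}

# ...and the distilled policy answers with a single lookup, no search at run time.
def distilled_agent(x):
    return dataset[x]

assert all(distilled_agent(x) == searchy_agent(x) for x in range(0, 10_000, 500))
```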
Therefore, my first comment is not really relevant to what you’re thinking about. Sorry. I was not thinking about algorithms-that-write-arbitrary-code-and-then-immediately-run-it, I was thinking about the complexity of the algorithms that are actually in operation as the agent acts in the world.
If my hand touches fire and thus immediately moves backwards by reflex, would this be an example of a discriminative policy, because an input signal directly causes an action without being processed in the brain?
Yes. But the lack-of-processing-in-the-brain is not the important part. A typical ConvNet image classifier does involve many steps of processing, but it is still discriminative, not search-y, because it does not work by trying out different generative models and picking the one that best explains the data. You can build a search-y image classifier that does exactly that, but most people these days don't.
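If it helps, here is a toy contrast between the two styles (Gaussians standing in for the candidate generative models, a threshold standing in for the ConvNet; all numbers made up):

```python
import math, random

# Search-y / generative: try each candidate generative model and pick the class
# whose model best explains the observation.
MODELS = {0: (0.0, 1.0), 1: (3.0, 1.0)}            # (mean, std) per class

def log_likelihood(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def searchy_classify(x):
    return max(MODELS, key=lambda c: log_likelihood(x, *MODELS[c]))

# Discriminative: a fixed input-to-label map; no generative models are tried at all.
def discriminative_classify(x, threshold=1.5):
    return int(x > threshold)

x = random.gauss(3.0, 1.0)
print(searchy_classify(x), discriminative_classify(x))
```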
No worries, I think your comment still provides good food for thought!