Well, you are clearly an expert here. And indeed bridging from the neurons to algorithms has been an open problem since forever. What I meant is, assuming you needed to code, say, an NPC in a game, you would code an “urge” a certain way, probably in just a few dozen lines of code. Plus the underlying language, plus the compiler, plus the OS, plus the hardware, which is basically gates upon gates upon gates, all alike. There is no reinforcement learning there at all, and yet rows of nearly identical gates become an algorithm. Maybe some parts of the brain work like that, as well?
LOL, “fake it til you make it” ftw! I disagree, but that’s very kind of you to say. :-)
assuming you needed to code, say, an NPC in a game, you would code an “urge” a certain way, probably in just a few dozen lines of code.
Hmm, if the NPC didn’t need to be very good, I would use bacteria-level logic like “if you’re getting attacked, move away from the last hit, if you can attack, attack, otherwise move towards the player” or whatever. Then an “urge to be more aggressive” would be, like, move towards the player more quickly, or changing some thresholds. But there’s no foresight / planning here. So that’s not exactly relevant to this post. The rats are making a plan to get saltwater.
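For concreteness, here is a rough sketch of what that bacteria-level logic might look like in Python. Everything in it is made up for illustration: the 1-D world, the `Urges` knobs, and the `reactive_step` function are assumptions, not anything from a real game engine. The point is just that the “urge” is a couple of numbers that rescale a fixed rule, with no lookahead anywhere.

```python
from dataclasses import dataclass


@dataclass
class Urges:
    # Hypothetical knobs; an "urge to be more aggressive" just turns these up.
    aggression: float = 1.0    # multiplies the approach speed
    attack_range: float = 1.0  # distance threshold for attacking


def reactive_step(npc_x, player_x, last_hit_x, under_attack, urges, base_speed=1.0):
    """Pick one action in a toy 1-D world and return (action, new_npc_x).

    Purely reactive: it looks only at the current situation, never ahead.
    """
    if under_attack:
        # Move away from where the last hit came from.
        direction = 1.0 if npc_x >= last_hit_x else -1.0
        return "flee", npc_x + direction * base_speed
    distance = abs(player_x - npc_x)
    if distance <= urges.attack_range:
        return "attack", npc_x
    # Otherwise move towards the player, faster when more aggressive,
    # but never overshooting the player's position.
    direction = 1.0 if player_x > npc_x else -1.0
    step = min(base_speed * urges.aggression, distance)
    return "approach", npc_x + direction * step
```

With the defaults, `reactive_step(0.0, 10.0, 0.0, False, Urges())` approaches by one unit; with `Urges(aggression=3.0)` it approaches by three. Same rule, different threshold, still no plan.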
So, if I were going to make the NPC better, maybe I would next incorporate planning. Like: Define an “expected reward” function (related to position, health, etc.), consider possible things to do next, and then pick the thing with the highest expected reward at the end. That might be more than a few dozen lines of code … I guess it depends on the library functions available :-P Then you could have an “urge” be expressed through tweaking the parameters of the expected-reward calculation. And then the NPC would be able to make plans to satisfy the urge, as opposed to just acting based on what’s right in front of it, or whatever.
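Here is a toy version of that planning idea, again only a sketch under assumptions I am making up: a 1-D world with a player at `PLAYER_X` and a health pack at `HEALTH_PACK_X`, a hand-written `step` world model, and an `expected_reward` function whose `weights` dict plays the role of the “urges”. The planner brute-forces every action sequence up to a short horizon and keeps the one whose end state scores highest.

```python
from itertools import product

# Hypothetical toy world: positions on a line.
MOVES = {"left": -1, "stay": 0, "right": 1}
PLAYER_X = 5
HEALTH_PACK_X = -5


def step(x, action):
    """Hand-written world model: where the NPC ends up after one action."""
    return x + MOVES[action]


def expected_reward(x, weights):
    """Score a position. The weights are the 'urges': a larger weight
    means the NPC cares more about being close to that thing."""
    return -(weights["attack"] * abs(x - PLAYER_X)
             + weights["heal"] * abs(x - HEALTH_PACK_X))


def plan(x, weights, horizon=3):
    """Try every action sequence of length `horizon`, simulate it with the
    world model, and return the sequence with the best end-state score."""
    best_plan, best_score = None, float("-inf")
    for candidate in product(MOVES, repeat=horizon):
        end_x = x
        for action in candidate:
            end_x = step(end_x, action)
        score = expected_reward(end_x, weights)
        if score > best_score:
            best_plan, best_score = candidate, score
    return best_plan
```

Calling `plan(0, {"attack": 1.0, "heal": 0.0})` gives a plan that heads right towards the player, while `plan(0, {"attack": 0.0, "heal": 1.0})` heads left towards the health pack. Nothing about the planner changed between the two calls, only the parameters of the expected-reward calculation. And nothing here is learned: both the world model and the reward function are written by hand.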
This thing where it considers multiple possible courses of action is not yet model-based reinforcement learning. There’s no learning! But I would say it’s a first step. There is indeed something like that as an ingredient in AlphaZero, for example.
But that’s actually all that matters for this post anyway, I think. The real RL part (where you learn things) didn’t come up. Maybe I shouldn’t have brought up RL at all, now that I think about it :-P