LOL, “fake it til you make it” ftw! I disagree, but that’s very kind of you to say. :-)
> assuming you needed to code, say, an NPC in a game, you would code an “urge” a certain way, probably in just a few dozen lines of code.
Hmm, if the NPC didn’t need to be very good, I would use bacteria-level logic like “if you’re getting attacked, move away from the last hit; if you can attack, attack; otherwise move towards the player”, or whatever. Then an “urge to be more aggressive” would be, like, moving towards the player more quickly, or changing some thresholds. But there’s no foresight / planning here, so that’s not exactly relevant to this post. The rats are making a plan to get saltwater.
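To make that concrete, here’s roughly the kind of thing I mean, as a toy Python sketch. Everything in it (the 1-D positions, the NPC fields, the tunable constants) is made up for illustration:

```python
from dataclasses import dataclass

ATTACK_RANGE = 1.5        # how close the NPC must be to attack
AGGRESSION_SPEED = 1.0    # the "urge to be more aggressive" knob

@dataclass
class NPC:
    x: float                   # position on a toy 1-D map
    being_attacked: bool = False
    last_hit_dx: float = 0.0   # direction the last hit came from

def npc_step(npc: NPC, player_x: float) -> None:
    if npc.being_attacked:
        # Bacteria-level avoidance: just back away from the last hit.
        npc.x -= npc.last_hit_dx
    elif abs(player_x - npc.x) <= ATTACK_RANGE:
        print("attack!")  # stand-in for the actual attack logic
    else:
        # "More aggressive" = raise AGGRESSION_SPEED or ATTACK_RANGE.
        # Note there's no foresight or planning anywhere in here.
        npc.x += AGGRESSION_SPEED if player_x > npc.x else -AGGRESSION_SPEED
```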
So, if I were going to make the NPC better, maybe I would next incorporate planning. Like: define an “expected reward” function (related to position, health, etc.), consider the possible things to do next, and pick whichever one ends up with the highest expected reward. That might be more than a few dozen lines of code … I guess it depends on the library functions available :-P Then an “urge” could be expressed by tweaking the parameters of the expected-reward calculation, and the NPC would be able to make plans to satisfy the urge, as opposed to just reacting to whatever is right in front of it.
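Again just as a hedged sketch, with a made-up toy world model, actions, and reward features:

```python
from dataclasses import dataclass, replace

REWARD_WEIGHTS = {"health": 1.0, "closeness": 0.5}  # tweak these to express "urges"

@dataclass(frozen=True)
class State:
    health: float
    dist_to_player: float

def simulate(state: State, action: str) -> State:
    # Toy hand-written world model: predict the next state for each action.
    if action == "advance":
        return replace(state, dist_to_player=state.dist_to_player - 1)
    if action == "retreat":
        return replace(state, dist_to_player=state.dist_to_player + 1,
                       health=min(state.health + 0.2, 1.0))  # safer back here
    return state  # "wait"

def expected_reward(state: State) -> float:
    # Weighted features of a predicted state; changing the weights changes
    # what the NPC "wants", i.e. its urges.
    return (REWARD_WEIGHTS["health"] * state.health
            - REWARD_WEIGHTS["closeness"] * state.dist_to_player)

def plan_one_step(state: State, actions=("advance", "retreat", "wait")) -> str:
    # Consider each possible next action, simulate its outcome, pick the best.
    return max(actions, key=lambda a: expected_reward(simulate(state, a)))
```

E.g. plan_one_step(State(health=1.0, dist_to_player=5)) returns "advance" with these weights; crank the health weight way up and it starts preferring "retreat" when hurt.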
This thing where it considers multiple possible courses of action is not yet model-based reinforcement learning. There’s no learning! But I would say it’s a first step. Something like it is indeed an ingredient in AlphaZero, for example.
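FWIW, the simplest version of that “consider multiple courses of action” ingredient is just a depth-limited search stacked on the previous sketch, still pure hand-written search with nothing learned anywhere:

```python
# Reuses State, simulate(), and expected_reward() from the sketch above.
ACTIONS = ("advance", "retreat", "wait")

def lookahead_value(state: State, depth: int) -> float:
    # Depth-limited search: a state's value is the best expected reward
    # reachable within `depth` more actions.
    if depth == 0:
        return expected_reward(state)
    return max(lookahead_value(simulate(state, a), depth - 1) for a in ACTIONS)

def plan_with_lookahead(state: State, depth: int = 3) -> str:
    # Pick the first action of the best-looking multi-step course of action.
    return max(ACTIONS, key=lambda a: lookahead_value(simulate(state, a), depth - 1))
```

(AlphaZero’s search is much fancier, i.e. MCTS guided by a learned value network instead of a hand-written expected_reward, but it’s in this family.)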
But that’s actually all that matters for this post anyway, I think. The real RL part (where you learn things) didn’t come up. Maybe I shouldn’t have brought up RL at all, now that I think about it :-P