Thanks for explaining that! Really. For one thing, it clarified a bunch of things I’d been wondering about learning architectures, the evolution of complicated psychologies like ours, and the universe at large. (Yeah, I wish my Machine Learning course had covered reinforcement learners and active environments, but apparently “active environments” means AI, whereas passive learning means ML. Oh well.)
For instance, I now have a clear answer to the question: why would a value architecture more complex than reinforcement learning evolve in the first place? Answer: because pure reinforcement learning falls into a self-destructive bliss-out attractor (a toy sketch of that attractor follows below). So even though it’s computationally (and therefore physically/biologically) simpler, it gets eliminated by natural selection very quickly.
Neat!
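To make that attractor concrete, here is a minimal toy sketch. It is my own construction, not anything from this exchange: the environment, the two actions, and all the numbers are invented for illustration. A pure reward-maximizer lives in a world where one action stimulates the reward channel directly while quietly draining the resource that keeps it alive.

```python
# Toy sketch (invented for illustration): a pure reward-maximizing agent in a
# world where "wirehead" pays more reward than "forage" but drains the energy
# the agent needs to stay alive. The agent only ever optimizes reward.
import random

random.seed(0)

ACTIONS = ["forage", "wirehead"]
REWARD = {"forage": 1.0, "wirehead": 10.0}   # wireheading always pays more...
ENERGY = {"forage": +1.0, "wirehead": -2.0}  # ...but costs the body

q = {a: 0.0 for a in ACTIONS}  # simple action-value table (bandit-style)
alpha, epsilon = 0.1, 0.1
energy, steps_alive = 20.0, 0

while energy > 0:
    # epsilon-greedy choice over estimated reward; note that nothing here
    # "sees" energy, because a pure reinforcement learner only tracks reward
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=q.get)
    r = REWARD[a]
    q[a] += alpha * (r - q[a])
    energy += ENERGY[a]
    steps_alive += 1

print(f"dead after {steps_alive} steps; learned values: {q}")
```

On the only signal it optimizes, the agent quickly learns that wireheading dominates foraging, and then rides that preference until its energy hits zero; nothing in the architecture pushes back.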
Well, this is limited by the agent’s ability to hack its reward system, and most natural agents are less than perfect in that respect. I think the answer to “why aren’t we all pure reinforcement learners?” is a little less clean than you suggest: it probably has something to do with the layers of reflexive and semi-reflexive agency our GI architecture is built on, something to do with the fact that we have multiple reward channels (another symptom of messy ad-hoc evolution), and something to do with the bounds on our ability to anticipate future rewards (that last point gets a rough sketch below).
Even so, it’s not perfect. Heroin addicts do exist.
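To illustrate just that last point about bounded anticipation, here is an equally artificial sketch (again my own construction, with invented names and numbers): the same wirehead payoff, but gated behind a few unrewarded setup steps, evaluated by an agent that can only look a fixed number of steps ahead.

```python
# Toy follow-up to the sketch above (invented for illustration): the wirehead
# payoff now sits behind a k-step setup, and the agent evaluates plans over a
# bounded horizon. A short horizon makes the setup look like pure cost.

SETUP_COST, SETUP_STEPS = -1.0, 5  # five unrewarding steps to rig the reward channel
WIREHEAD_REWARD = 10.0             # per-step payoff once rigged
FORAGE_REWARD = 1.0                # the ordinary alternative

def plan_value(horizon: int) -> float:
    """Undiscounted return of 'start wireheading now' over a bounded horizon."""
    total = 0.0
    for t in range(horizon):
        total += SETUP_COST if t < SETUP_STEPS else WIREHEAD_REWARD
    return total

for horizon in (3, 5, 8, 20):
    wire = plan_value(horizon)
    forage = FORAGE_REWARD * horizon
    choice = "wirehead" if wire > forage else "forage"
    print(f"horizon={horizon:2d}: wirehead plan={wire:6.1f}, forage={forage:5.1f} -> {choice}")
```

With a horizon of 3 or 5 the setup never looks worth it, so the myopic agent keeps foraging; only an agent that can anticipate far enough ahead discovers the bliss-out plan at all.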
True, true.
However, a reality in which pure reinforcement learners self-destruct from blissing out remains simpler than one in which a sufficiently good reinforcement learner goes FOOM and takes over the universe.