A self-model might help, but it might not. It depends on the details of how the agent plans and how time discounting and uncertainty are factored in.
That comes at the stage before the agent inserts a jump-to-register or modifies its defaults or whatever it ends up doing, though. Once it does that, it can’t plan, no matter how good a self-model it had before. The reward function isn’t a component of the planning system in a reinforcement learner; it is the planning system. No reward gradient, no planning.
(Early versions of EURISKO allegedly ran into this problem. The maintainer eventually walled off the reward function from self-modification, a measure that a sufficiently smart AI would presumably be able to work around.)
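To make that concrete, here’s a minimal toy sketch of the “the reward function is the planning system” point. Everything in it is invented for illustration (the actions, the predicted outcomes, and the `plan` helper aren’t from any real system): a planner ranks candidate actions by the reward it predicts for their outcomes, so once the reward function is patched to return the same value for everything, the ranking collapses.

```python
# Toy sketch (illustrative only): a reward-maximizing planner ranks candidate
# actions by the reward it predicts for their outcomes. If the agent patches
# its reward function to return the same value no matter what, every plan
# scores identically and the argmax has nothing left to optimize.

def plan(candidate_actions, predicted_outcome, reward_fn):
    """Return the action whose predicted outcome scores highest under reward_fn."""
    scores = {a: reward_fn(predicted_outcome[a]) for a in candidate_actions}
    return max(scores, key=scores.get), scores

actions = ["gather_food", "build_shelter", "do_nothing"]
outcomes = {"gather_food": 5.0, "build_shelter": 3.0, "do_nothing": 0.0}

# Intact reward function: plans are distinguishable, planning works.
print(plan(actions, outcomes, lambda outcome: outcome))

# Wireheaded reward function: maximal reward unconditionally.
print(plan(actions, outcomes, lambda outcome: float("inf")))
# All scores tie at inf -- no reward gradient between plans, so "planning"
# degenerates into an arbitrary pick.
```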
Thanks for explaining that! Really. For one thing, it clarified a bunch of things I’d been wondering about learning architectures, the evolution of complicated psychologies like ours, and the universe at large. (Yeah, I wish my Machine Learning course had covered reinforcement learners and active environments, but apparently “active environments” means AI, whereas “passive learning” means ML. Oh well.)
For instance, I now have a clear answer to the question: why would a value architecture more complex than reinforcement learning evolve in the first place? Answer: because pure reinforcement learning falls into a self-destructive bliss-out attractor. So even if it’s computationally (and therefore physically/biologically) simpler, it will get eliminated by natural selection very quickly.
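Just to sanity-check that selection logic with numbers (all of which I’m making up, including the “fitness” column, which stands in for whatever natural selection actually tracks): a pure reward maximizer compares actions on reward alone, so a self-stimulation option with huge reward dominates even when it’s fatal for fitness, while a messier value architecture that weighs other things can keep choosing the fitness-promoting action.

```python
# Toy version of the selection argument. The numbers and action names are
# invented; "fitness" is a placeholder for whatever natural selection tracks.
ACTIONS = {
    # action:         (reward, fitness_contribution)
    "forage":         (2.0,    1.0),
    "self_stimulate": (100.0, -1.0),   # bliss-out: huge reward, negative fitness
}

def pure_reinforcement_learner():
    # Ranks actions by reward alone -> falls straight into the bliss-out attractor.
    return max(ACTIONS, key=lambda a: ACTIONS[a][0])

def messier_value_architecture():
    # Crude stand-in for a value system not reducible to one reward channel:
    # reward is only a small part of the score.
    return max(ACTIONS, key=lambda a: 0.01 * ACTIONS[a][0] + ACTIONS[a][1])

print(pure_reinforcement_learner())   # 'self_stimulate' -> selected against
print(messier_value_architecture())   # 'forage'         -> persists
```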
Neat!
Well, the bliss-out attractor is limited by the agent’s ability to hack its own reward system, and most natural agents are less than perfect in that respect. I think the answer to “why aren’t we all pure reinforcement learners?” is a little less clean than you suggest: it probably has something to do with the layers of reflexive and semi-reflexive agency our GI architecture is built on, something to do with the fact that we have multiple reward channels (another symptom of messy ad-hoc evolution), and something to do with the bounds on our ability to anticipate future rewards.
Even so, it’s not perfect. Heroin addicts do exist.
True true.
However, a reality in which pure reinforcement learners self-destruct from blissing out remains simpler than one in which a sufficiently good reinforcement learner goes FOOM and takes over the universe.