RL creates agents, and RL seemed to be the path to AGI. In the 2010s, reinforcement learning was the dominant paradigm for those interested in AGI (e.g. OpenAI). RL lends itself naturally to creating agents that pursue rewards/utility/objectives. So there was reason to expect that agentic AI would be the first (and, by the theoretical arguments, last) form that superintelligence would take.
Why are you confident that RL creates agents? Is it the non-stochasticity of optimal policies for almost all reward functions? The on-policy data collection of PPO? I think there are a few valid reasons to suspect that, but this excerpt seems surprisingly confident.
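For what it's worth, the first point can be illustrated concretely. Here's a minimal sketch (a toy MDP of my own construction; `P`, `R`, and all parameters are illustrative, not from the excerpt): with rewards drawn from a continuous distribution, argmax ties over Q-values occur with probability zero, so the greedy optimal policy produced by value iteration is deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Random MDP: P[s, a] is a distribution over next states;
# rewards R[s, a] are drawn i.i.d., i.e. a "generic" reward function.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

# Value iteration to (near-)convergence.
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * P @ V  # Q[s, a] = R[s, a] + gamma * E[V(s')]
    V = Q.max(axis=1)

# Continuously distributed rewards make Q-value ties measure-zero,
# so the greedy optimal policy is deterministic for almost all R.
pi = Q.argmax(axis=1)
print("optimal policy:", pi)
```

Whether "deterministic optimal policy" is enough to call the resulting system an *agent* is, of course, exactly the point in dispute.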