These problems are partly related to poor planning, but they are clearly also related to the nature of language models, which are largely restricted to operating on text. Actual AGI will likely have to work more like an animal or human brain, which predicts sensory data (or rather, latent representations of sensory data, as in JEPA) instead of text tokens. An LLM with good planning may finally be able to beat Pokémon, but it will almost certainly not be able to do robotics, driving, or anything else involving complex or real-time visual data.