I think:
long-term planning is hard, but maybe not super hard
if your training uses short-term feedback (which is typically all anyone who isn't evolution has time for, and which even evolved systems have to rely on heavily), then there's usually some simpler solution than long-term planning that satisfies that feedback, so gradient descent on the short-term signal doesn't typically reach the long-term-planning solution (first toy sketch below)
under recursive self-improvement, a long-term planner will tend to preserve its long-term-planning nature, while a non-long-term planner won't care either way, which makes long-term planning an attractor state (second sketch below)
sufficiently advanced metacognition might be equivalent to recursive self-improvement
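
to make the second point concrete, here's a deliberately tiny sketch (my own toy example, with made-up actions, rewards, and learning rate, not anything from the argument above): a one-parameter softmax policy over two actions, "greedy" and "plan", trained by exact policy gradient. when the training signal is the immediate reward, the optimizer locks onto the myopic action even though the long-run return of the planning action is three times larger; switch the signal to the long-run return and it finds the planner

```python
# Toy sketch: gradient ascent on a softmax policy over two actions.
# "greedy" pays off immediately; "plan" pays off only in the long run.
# Training on the short-horizon signal converges to "greedy" even though
# the long-horizon return of "plan" is larger. All numbers are arbitrary.
import numpy as np

IMMEDIATE = np.array([1.0, 0.0])   # short-term reward of [greedy, plan]
LONG_RUN  = np.array([1.0, 3.0])   # total return of [greedy, plan]

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def train(reward, steps=2000, lr=0.1):
    logits = np.zeros(2)            # start indifferent between the two actions
    for _ in range(steps):
        p = softmax(logits)
        # exact policy gradient of E[reward] under the softmax policy:
        # d E[R] / d logit_a = p_a * (r_a - E[R])
        grad = p * (reward - p @ reward)
        logits += lr * grad
    return softmax(logits)

print("trained on short-term feedback:", train(IMMEDIATE))  # ~[1, 0]: myopic
print("trained on long-run return:   ", train(LONG_RUN))    # ~[0, 1]: planner
```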
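
and for the attractor-state point, an equally toy sketch (again my own illustration, with made-up transition probabilities): treat repeated self-modification as a two-state Markov chain over {non-planner, planner}. a planner almost never gives up its planning nature, since preserving it is useful for whatever it's planning toward; a non-planner doesn't care, so its successors drift. even a small drift rate into planning then makes the chain spend nearly all of its time in the planner state

```python
# Toy sketch: self-modification as a two-state Markov chain.
# The asymmetry (planners preserve themselves, non-planners drift)
# is what makes planning an attractor; the exact numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

P_DRIFT_INTO_PLANNING = 0.05    # a non-planner stumbles into planning
P_PLANNER_SLIPS       = 0.001   # a planner fails to preserve itself

def simulate(steps=100_000):
    state = 0                   # 0 = non-planner, 1 = planner
    time_as_planner = 0
    for _ in range(steps):
        if state == 1:
            # a planner treats keeping its planning nature as instrumentally
            # useful, so it rarely loses it across a self-modification
            state = 0 if rng.random() < P_PLANNER_SLIPS else 1
        else:
            # a non-planner has no stake either way, so it just drifts
            state = 1 if rng.random() < P_DRIFT_INTO_PLANNING else 0
        time_as_planner += state
    return time_as_planner / steps

print("fraction of steps spent as a planner:", simulate())
# ~0.98, far above the 0.05 per-step drift rate: the asymmetry alone
# is enough to make the planner state dominate
```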