I don’t think “Adaptation-Executors vs. Fitness-Maximizers” is a good way of thinking about this. All of the behaviors described in the post can be understood as a consequence of next-word prediction; it’s just that what extremely good next-word prediction looks like in practice is counterintuitive. There’s no need to posit a difference between the inner and outer objective.
Is there a reason to suspect an exact match between inner and outer objective?
An exact match? No. But the observations in this post don’t point towards any particular mismatch, because the behaviors described would be seen even if the inner objective were perfectly aligned with the outer.