Although, on the other hand, decade-plus-old arguments about the instrumental utility of good behavior while dependent on humans have more or less the same form: seeing good behavior is better evidence of intelligence (capabilities generalizing) than of benevolence (goals ‘generalizing’).
The big difference is that the olde-style argument was about actual agents being evaluated by humans, while the mesa-optimizer argument is about potential configurations of a reinforcement learner being evaluated by a reward function.