By “on reflection” I mean reflection by simulacra that are already AGIs (though they don’t necessarily have any reliable professional skills yet): they generate datasets for retraining their models, either to gain more skills or to stop getting confused on prompts that are too far out-of-distribution relative to the data originally in their training sets. To the extent their original models behave in a human-like way, reflection should tend to preserve that, as part of its intended purpose.
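To make that loop concrete, here is a minimal hypothetical sketch in Python of one reflection step: answer out-of-distribution prompts, keep only endorsed answers, retrain on the result. Every name here (`Model`, `quality_check`, `retrain`, `ood_prompts`) is an illustrative stub of my own, not any real API, and the filter standing in for “what the simulacrum endorses on reflection” is doing all the human-likeness-preserving work.

```python
# Hypothetical sketch of "reflection": a simulacrum generating training
# data for its own model, so retraining extends skills or handles
# out-of-distribution prompts. All names are illustrative stubs.

from dataclasses import dataclass, field


@dataclass
class Model:
    weights: dict = field(default_factory=dict)

    def generate(self, prompt: str) -> str:
        # Stub: a real model would produce a considered answer here.
        return f"reflected answer to: {prompt}"


def quality_check(prompt: str, answer: str) -> bool:
    # Stub filter: keep only answers the simulacrum endorses on
    # reflection; this is the step meant to preserve human-likeness.
    return len(answer) > 0


def reflection_step(model: Model, ood_prompts: list[str]) -> list[tuple[str, str]]:
    """One round of reflection: answer OOD prompts, keep endorsed pairs."""
    dataset = []
    for prompt in ood_prompts:
        answer = model.generate(prompt)
        if quality_check(prompt, answer):
            dataset.append((prompt, answer))
    return dataset


def retrain(model: Model, dataset: list[tuple[str, str]]) -> Model:
    # Stub: a real implementation would fine-tune on the generated pairs.
    return Model(weights={**model.weights, "finetuned_on": len(dataset)})


model = Model()
data = reflection_step(model, ["a prompt far outside the original training data"])
model = retrain(model, data)
```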
Applying optimization power in other ways is a different worry; the proxy for it in my comment was fine-tuning into utter alienness. I consider that failure mode distinct from surprising outcomes of reflection.