simeon_c comments on A central AI alignment problem: capabilities generalization, and the sharp left turn

simeon_c 13 Jul 2022 23:11 UTC
1 point
Do you think we could use grokking or other existing generalization phenomena (e.g., induction heads) to test your theory? Or do you expect the generalizations that would lead to the sharp left turn to be greater or more significant than those that occurred earlier in training?