Just to be clear, I also think that your grokking work increases alignment much more than capabilities on balance.
I think the way in which it increases capabilities would roughly look like this: “your insight on grokking is a key to understanding fast generalization better; other people build on this insight and then modify training; this improves the speed of learning and thus capabilities”.
I think your work is clearly net positive, I just wanted to use a concrete example in the post to show that there are trade-offs worth taking.
Just to be clear, I also think that your grokking work increases alignment much more than capabilities on balance.
I think the way in which it increases capabilities would roughly look like this: “your insight on grokking is a key to understanding fast generalization better; other people build on this insight and then modify training; this improves the speed of learning and thus capabilities”.
I think your work is clearly net positive, I just wanted to use a concrete example in the post to show that there are trade-offs worth taking.