gwern comments on Trading off compute in training and inference (Overview)