I thoroughly enjoyed this paper, and would very much like to see the same experiment performed on language models in the billion-parameter range. Would you expect the results to change, and how?
Good question! We’re not sure. The fact that PHF scales well with dataset size might provide weak evidence that it would scale well with model size too.