Do you suspect that black-box knowledge will be transferable between different models, or that the findings will be idiosyncratic to each system?
I suspect that some knowledge transfers. For example, I suspect that increasingly large LMs learn features of language roughly in order of their importance for predicting English, so I’d expect LMs that reach similar language modeling losses to know roughly the same features of English. (You could run two LMs on the same text, record each one’s logprob on the correct next token at every position, and make a scatter plot; presumably there will be a lot of correlation, but you might notice patterns in the tokens that one LM predicts much better than the other.)
And the methodology for playing with LMs probably transfers.
But I generally have no idea here, and it seems really useful to know more about this.