I was going to ask for interesting examples. But perhaps we can do even better and choose examples with the highest value of… uhm… something.
I am just wildly guessing here, but it seems to me that if these features are somehow implied by the human text, the ones that are “implied most strongly” could be the most interesting ones. Unless they are just random artifacts of the process of learning.
If we trained the LLM using the same text database, but randomly arranged the sources, or otherwise introduced some noise, would the same concepts appear?