If we can really just think about the feed-forward layers as encoding simple key-value knowledge pairs…
Once upon a time, people thought that you could make AI simply by putting a sufficiently large body of facts into a database for the system to reason over. Later we realised that of course that was silly and would never work.
But apparently they were right all along, and training a neural network is just an efficient way of persuading a computer to do the data entry for you?
On one hand, maybe? Maybe training over a differentiable representation with SGD was the only missing ingredient.
But I think I’ll believe it when I see large neural models distilled into relatively tiny symbolic models with no big loss of function. If that’s hard, it means that partial activations and small differences in coefficients are doing important work.
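For what it's worth, the key-value reading of a feed-forward layer is easy to make concrete. A minimal numpy sketch, with toy shapes and random weights standing in for a trained model (all names and sizes here are illustrative assumptions): each row of the input matrix acts as a "key" pattern matched against the hidden state, and the corresponding row of the output matrix is the "value" mixed into the result.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # toy hidden size and FFN width

W_in = rng.normal(size=(d_ff, d_model))   # one "key" per row
W_out = rng.normal(size=(d_ff, d_model))  # one "value" per row

def ffn(x):
    # m[i] measures how strongly input x matches key i
    m = np.maximum(W_in @ x, 0.0)  # ReLU "memory coefficients"
    # output is a weighted sum of value vectors -- a soft lookup
    return W_out.T @ m

x = rng.normal(size=d_model)
y = ffn(x)

# The same computation written explicitly as a key-value sum:
y_kv = sum(max(W_in[i] @ x, 0.0) * W_out[i] for i in range(d_ff))
assert np.allclose(y, y_kv)
```

Note that the "lookup" is soft: many keys fire partially and their values superpose, which is exactly the part that a crisp symbolic database has no obvious analogue for, and why the distillation test above seems like the right one.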