Very nice post. It is certainly useful to do this exercise of manually encoding language rules into the weights of a transformer in order to better understand the machinery involved.
“The ultimate ambition of this work would be to go toe-to-toe with a comparably-sized Transformer model trained in the traditional way on a modern-sized data set. This might require several people-years of focused effort though.”
There is a long history of attempting to parse natural language with hand-designed rules and heuristics. The general consensus now is that hand engineering is insufficient, and some learning from data is necessary. To me it seems that this direction inherits the problems of those old-fashioned language systems, since you are codifying your own hand-designed heuristics and rules into the network weights.
Do you see a way to introduce learning from data without sacrificing the interpretability that your approach provides?
There are a number of ways to combine this approach with learning, but I haven’t had time to try any of them yet. Some ideas I have thought of:
Use hard-coded weights, plus some random noise, to initialize the weights of a transformer that you then train in the traditional fashion
Doesn’t really help with interpretability or alignment, but might (speculatively) help with performance
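A minimal numpy sketch of this first idea, just to make it concrete. The noise scale and the identity-matrix "hand-coded" example are made up for illustration; nothing here comes from the post itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_init(hard_coded: np.ndarray, noise_scale: float = 0.01) -> np.ndarray:
    """Initialize a weight matrix from hand-coded values plus small Gaussian noise.

    The noise breaks symmetry so gradient descent can move away from the
    hand-coded solution wherever the training data disagrees with it.
    `noise_scale` is a hypothetical hyperparameter, not from the original post.
    """
    return hard_coded + noise_scale * rng.standard_normal(hard_coded.shape)

# Illustrative example: a hand-coded projection that just copies its input.
W_hand = np.eye(4)
W_init = noisy_init(W_hand)  # close to W_hand, but no longer exactly equal
```

The rest of training would proceed exactly as usual; only the initialization changes.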
Write out all the weight and bias parameters as combinations of semes and outer products of semes, then learn seme embeddings by gradient descent
Semantic seme embeddings could be initialized from something like WordNet relationships, or learned with word2vec, to automate that part
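Here is a rough numpy sketch of what "weights as outer products of semes" could look like. The rule triples and dimensions are invented for illustration; in the actual proposal the rules would be hand-written and only the seme embeddings learned:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_semes = 8, 3

# Seme embedding vectors (one per row). These could instead be initialized
# from WordNet relations or word2vec; random values here for illustration.
semes = rng.standard_normal((n_semes, d_model))

# Hypothetical hand-written rules: each (a, b, c) contributes
# c * outer(semes[a], semes[b]) to the weight matrix.
rules = [(0, 1, 1.0), (2, 0, -0.5)]

def weight_from_rules(semes: np.ndarray, rules) -> np.ndarray:
    """Build a weight matrix as a sum of outer products of seme pairs."""
    d = semes.shape[1]
    W = np.zeros((d, d))
    for a, b, c in rules:
        W += c * np.outer(semes[a], semes[b])
    return W

W = weight_from_rules(semes, rules)
# Gradients with respect to `semes` flow through this construction, so an
# autograd framework (torch, JAX) could learn the embeddings while keeping
# the rule structure fixed and human-readable.
```

One nice property of this parameterization: the resulting matrix has rank at most the number of rules, so each rule stays individually inspectable.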
You could do smallish amounts of gradient descent to suggest new rules to add, but then add them by hand
Still would be very slow
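A toy numpy sketch of how gradient descent might "suggest" a rule: start from the hand-coded weights, take a few descent steps on data, and inspect where the weights moved most. The linear-regression setup and all values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

W_hard = np.eye(d)                 # stand-in for a hand-coded weight matrix
X = rng.standard_normal((100, d))  # toy inputs
# The "true" mapping differs from W_hard by one missing rule at entry (0, 2).
W_true = np.eye(d) + 0.3 * np.outer(np.eye(d)[0], np.eye(d)[2])
Y = X @ W_true

W = W_hard.copy()
lr = 0.01
for _ in range(200):                    # a smallish amount of gradient descent
    grad = X.T @ (X @ W - Y) / len(X)   # gradient of mean squared error
    W -= lr * grad

delta = W - W_hard
i, j = np.unravel_index(np.abs(delta).argmax(), delta.shape)
# The largest entry of `delta` points at a candidate rule, which a human
# would then interpret and add back to the hand-coded weights.
```

The point is that the human stays in the loop: the optimizer proposes, but the weights that ship are still hand-written.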
Perhaps it is possible to start with a strong learned transformer, gradually identify human-legible rules that it is using, and replace those specific parts with hard-coded weights
Could prove very difficult!
It seems almost certain to me that hard-coding weights would at least help us build the muscles needed to recognize what is going on, to the extent that we are able to