To predict the inner workings of a language model well enough to understand its outputs, you need to know not only the structure of the model but also the weights and how they interact. That is very hard to do without a deep understanding of the training data, and so effectively predicting what the model will do requires understanding both the model and the world the model was trained on.
Here is a concrete example:
Let’s say I have two functions, defined as follows:
import random

words = []

def do_training(n):
    # "Training": store whatever words the user types in.
    for i in range(n):
        word = input('Please enter a word: ')
        words.append(word)

def do_inference(n):
    # "Inference": sample n words from whatever was stored during training.
    output = []
    for i in range(n):
        word = random.choice(words)
        output.append(word)
    return output
If I call do_training(100), hand the computer to you so you can enter 100 words, and you then hand it back to me (with the screen cleared), I would be able to tell you that do_inference(100) will spit out 100 words pulled from some distribution, but I wouldn’t be able to tell you what that distribution is without seeing the training data.
See this post for a more in-depth exploration of this idea.
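To make the point concrete, here is a small non-interactive variant of the example above (the word lists and the train_on_list helper are my own additions, purely for illustration): the same sampling code gives completely different output distributions depending on what it was "trained" on.

import random
from collections import Counter

words = []

def train_on_list(training_words):
    # Non-interactive stand-in for do_training: store whatever "training data" we are given.
    words.extend(training_words)

def sample(n):
    # Same sampling logic as do_inference above.
    return [random.choice(words) for _ in range(n)]

# Two made-up training sets give the same code very different output distributions.
train_on_list(["cat"] * 90 + ["dog"] * 10)
print(Counter(sample(1000)))   # heavily skewed toward "cat"

words.clear()
train_on_list(["red", "green", "blue"] * 30)
print(Counter(sample(1000)))   # roughly even across the three colours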
Sounds like you haven’t done much programming. It’s hard enough to understand the code one wrote oneself six months ago. (Or indeed, why the thing I wrote five minutes ago isn’t behaving as expected.) Just because I wrote it doesn’t mean I memorized it. Understanding what someone else wrote is usually much harder, especially if they wrote it poorly, or in an unfamiliar language.
A machine learning system is even harder to understand than that. I’m sure there are some who understand in great detail what the human-written parts of the algorithm do. But to get anything useful out of a machine learning system, it needs to learn. You apply it to an enormous amount of data, and in the end, what it’s learned amounts to possibly gigabytes of inscrutable matrices of floating-point numbers. On paper, a gigabyte is about 4 million pages of text. That is far larger than the human-written source code that generated it, which could typically fit in a small book. How that works is anyone’s guess.
Reading this would be like trying to read someone’s mind by examining their brain under a microscope. Maybe it’s possible in principle, but don’t expect a human to be able to do it. We’d need better tools. That’s “interpretability research”.
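To get a feel for what those inscrutable matrices look like, here is a toy sketch (it assumes NumPy is available, and the sizes are made up; a real language model is thousands of times larger):

import numpy as np

# Toy stand-in for a model's learned parameters: two weight matrices of floats.
rng = np.random.default_rng(0)
w1 = rng.standard_normal((4096, 4096)).astype(np.float32)
w2 = rng.standard_normal((4096, 4096)).astype(np.float32)

n_params = w1.size + w2.size
print(f"{n_params:,} parameters, about {n_params * 4 / 1e6:.0f} MB of float32 values")
print(w1[:2, :4])  # a peek at the "learned knowledge": just unreadable numbers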
There are approaches to machine learning that are indeed closer to cross-breeding than to designing cars (genetic algorithms), but the paradigm currently in vogue is based on neural networks, a kind of artificial brain made of virtual neurons.
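For readers who haven’t seen one, a virtual neuron is nothing mysterious: roughly, it takes a weighted sum of its inputs and passes it through a nonlinearity. A minimal sketch (the numbers here are arbitrary):

import math

def neuron(inputs, weights, bias):
    # One virtual neuron: a weighted sum of its inputs passed through a nonlinearity.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid "activation"

# These weights are arbitrary; in a trained network they are learned from data.
print(neuron([0.5, -1.0, 2.0], weights=[0.8, 0.1, -0.3], bias=0.05))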
It is true that programmers sometimes build things while ignoring the underlying wiring of the systems they are using. But programmers generally create things relying on tools that have been thoroughly tested. Besides that, they are builders and doers, not academics. Think of really good guitar players: they probably don’t understand how sound propagates through matter, but they can play their instrument beautifully.
I personally think the cross-breeder analogy is pretty reasonable for modern ML systems.