I find your argument interesting, but I don’t understand how it applies to the GPT-n family.
From my understanding, GPT-3 (the only one I have really read about) is merely a probabilistic language-generation algorithm. In other words, you feed it a sequence of words and it tries to guess the most likely word that follows, based on all the text it read during training. However, I may not have correctly understood how GPT-n works; in that case I’d love an explanation or a link to one.
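To make that description concrete, here is a toy sketch of "predict the most likely next word from training text". This is my own illustration using simple bigram counts, not GPT-3's actual mechanism (which is a large transformer over subword tokens and samples from a full probability distribution rather than always taking the single most likely word):

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction from previously seen text.
# GPT-3 itself is a transformer, not a bigram table; this only mirrors
# the high-level idea of "guess the most likely continuation".

training_text = "the cat sat on the mat and the cat slept"

# Count, for each word, which words follow it in the training data.
follow_counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequently seen word after `word` in training."""
    candidates = follow_counts.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (seen twice after "the")
```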
On the other hand I find the idea of making an AI introspect very interesting, even if I’m not qualified enough to understand the technical implications of that.
Bear in mind that the AI doing the interpreting need not be the same as the network being interpreted.
Why do you think a mere autocomplete engine could not do interpretability work? It has already been shown to write comments for code and code from specs.