Hey, I wanted to clarify my thoughts on the concrete AI problem that is being solved here. No comment on the fantastic grant making/give-away scheme.
I don’t have much expertise on the mechanisms of GPT-3 systems, but I wonder if there is a more efficient way of providing human-comprehensible intermediaries that expose the workings of the algorithm.
My worry is that many of the annotated thoughts imputed by authors are irrelevant to the actual process the AI goes through to create its output. Asking the machine to produce a line of ‘thoughts’ alongside its final statement is fair play, but this doesn’t seem to solve the problem of creating human-comprehensible intermediaries; instead it gives the AI a pattern-matching/prediction task similar to the one it goes through to create the original output. Wouldn’t it be the case that the ‘thoughts’ the machine creates have no more effect on the process of calculation than the original output (prompt)?
This process still seems to serve the rudimentary function of indirectly shedding more light on the process of calculation, much as a bigger prompt would. Yet, puzzlingly, we in fact want to “get different sensible outputs by intervening on the thoughts”, which indicates we expect the thoughts to have an effect on the calculation of the final output. I suppose we could feed the thought output through into the creation of the final output, but my intuition suggests this would limit the complexity of the final output by shackling its creation to an unnecessary component, the thought.
I say intuition because, again, I have little knowledge of the operation of this algorithm. Most of my musings here are just guesses!
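For concreteness, here is a minimal sketch of what “feeding the thoughts through into the final output” might look like, assuming an autoregressive model (GPT-2 via the Hugging Face transformers library as a stand-in for GPT-3) and a made-up Thought/Story framing; this is illustrative, not the proposal’s actual setup:

```python
# Minimal sketch: generate an intermediate "thought", then condition the final
# output on prompt + thought. GPT-2 via Hugging Face transformers is used as a
# stand-in for GPT-3; the "Thought:"/"Story:" framing is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate(text, max_new_tokens=40):
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

prompt = "Write the opening of a story set in 15th century France."

# Step 1: ask the model for an intermediate "thought" about the task.
with_thought = generate(prompt + "\nThought:")

# Step 2: the final output is conditioned on prompt + thought, so editing the
# thought text at this point genuinely changes the distribution over outputs.
final = generate(with_thought + "\nStory:")
print(final)
```

In that setup the thought is not merely a side channel: it becomes part of the conditioning context, which is exactly why intervening on it can change the output.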
That being said, it seems to me that another way of tackling this critical problem is to identify the processes that the algorithm DOES already use to create the output, and then to find data that expresses those processes with human-compatible annotations. Instead of imposing another method of calculation in the form of Thoughts, maybe just make the existing method more comprehensible?
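Purely as an illustration of the “look at what the model already computes” direction (my own addition, and far shallower than the Circuits-style work mentioned in the reply below), a transformer already exposes intermediate structure such as attention patterns without being trained to emit anything extra:

```python
# Shallow illustration: read out attention patterns the model computes anyway,
# instead of training it to emit extra "thoughts". Much cruder than circuit-level
# interpretability, but it inspects the existing computation directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The knight rode through fifteenth century France."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions holds one tensor per layer, shape (batch, heads, seq, seq).
tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0].tolist())
for layer, attn in enumerate(out.attentions):
    head_avg = attn[0].mean(dim=0)          # average attention over heads
    target = head_avg[-1].argmax().item()   # where the last token looks most
    print(f"layer {layer}: last token attends most to {tokens[target]!r}")
```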
If I’m missing something frightfully obvious here, or am just barking up the wrong tree, please let me know where I’m going wrong!
I think you’re essentially correct—but if I understand you, what you’re suggesting is similar to Chris Olah et al’s Circuits work (mentioned above in the paragraph starting “This sort of interpretability is distinct...”). If you have a viable approach aiming at that kind of transparency, many people will be eager to provide whatever resources are necessary. This is being proposed as something different, and almost certainly easier.
One specific thought:
but my intuition suggests this would limit the complexity of the final output by shackling its creation to an unnecessary component, the thought
To the extent that this is correct, it’s more of a feature than a bug. You’d want the thoughts to narrow the probability distribution over outputs. However, I don’t think it’s quite right: the output can still have just as much complexity; the thoughts only serve to focus that complexity.
E.g. consider [This will be a realist novel about 15th century France] vs [This will be a surrealist space opera]. An output corresponding to either can be similarly complex.
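To make that concrete, here is a rough sketch (GPT-2 through the transformers library as a stand-in; the bracketed thoughts are the examples above) of how conditioning on different thoughts shifts the distribution over continuations without capping how complex the continuation can be:

```python
# Sketch: the same model conditioned on different "thoughts" puts probability
# mass in different places, but either continuation can be sampled at arbitrary
# length. GPT-2 via Hugging Face transformers stands in for GPT-3.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def top_next_tokens(prefix, k=5):
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]    # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    values, indices = torch.topk(probs, k)
    return [(tokenizer.decode(int(i)), round(float(p), 3))
            for i, p in zip(indices, values)]

thoughts = [
    "[This will be a realist novel about 15th century France.] Chapter 1:",
    "[This will be a surrealist space opera.] Chapter 1:",
]
for thought in thoughts:
    # Different thoughts -> visibly different next-token distributions,
    # i.e. the thought narrows/redirects the distribution over outputs.
    print(thought)
    print(top_next_tokens(thought))
```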
I don’t have much direct experience with transformers (I was part of some research with BERT once where we found it was really hard to use without adding hard-coded rules on top, but I have no experience with the modern GPT stuff). However, what you are saying makes a lot of sense to me based on my experience with CNNs and the attempts I’ve seen to explain/justify CNN behaviour with side channels (for instance this medical image classification system that also generates text as a side output).
See also my comment on Facebook.