So in your analogy, would the ‘seed text’ provided to GPT-2 be analogous to a single keyframe provided to an artist, and GPT-2’s output be essentially what happens when you provide an interpolator (I know nothing about the craft of animation and am probably using this word wrong) a ‘start’ frame but no ‘finish’ frame?
I would argue that an approach in animation where a keyframe artist, unsure exactly where to go with a scene, draws the keyframe, hands it to interpolative animators with the request to ‘start drawing where you think this is going’, and looks at the results for inspiration for the next ‘keyframe’ will probably result in a lot of wasted effort by the interpolators, and is probably inferior (in terms of cost and time) to plenty of other techniques available to the keyframe artist; but it also has a moderate to high probability of eventually inspiring something useful if you do it enough times.
In that context, I would view the unguided interpolation artwork as ‘original’ and ‘interesting’, even though the majority of it would never be used.
Unlike the time spent by animators interpolating, running trained GPT-2 is essentially free. So, in absolute terms, this approach, even if it produces garbage the overwhelming majority of the time (which it will), is moderately to very likely to find interesting approaches at a probability that is low but still reasonable for human reviewers (meaning the human must review dozens of worthless outputs, not the hundreds of millions implied by the monkeys on typewriters).
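A back-of-the-envelope sketch of that ‘dozens, not millions’ claim. If each sampled output is independently ‘interesting’ with probability p, the number of outputs reviewed before the first hit follows a geometric distribution, with mean 1/p. (The 2% hit rate below is an invented number, purely for illustration.)

```python
def expected_reviews(p: float) -> float:
    """Mean number of samples a reviewer sees before the first
    interesting one, under a geometric model with hit rate p."""
    return 1.0 / p

# A 2% hit rate means reviewing ~50 outputs on average -- "dozens",
# not the astronomical counts of the typewriter monkeys.
print(expected_reviews(0.02))  # 50.0
```

So the argument turns entirely on whether p is closer to 1-in-50 than to 1-in-a-million for a given reviewer and prompt.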
I suspect that a mathematician with the tool I proposed could type in a thesis, see what emerges, and have a moderate to high probability of eventually encountering some text that inspires something like the following thought: ‘well, this is clearly wrong, but I would not have thought to associate this thesis with that particular technique, let me do some work of my own and see if there is anything to this’.
I view the output in that particular example as ‘encountering something interesting’; I rate the probability of it occurring at least once, if my proposed tool were developed, as moderate to high; and I expect the cost in time spent reviewing outputs would not be high enough to give the approach negative value to the proposed user community.
I price the value of bringing this tool into existence, in terms of the resources available to me personally, at ‘worth a bit less than $1,000 USD’.
So in your analogy, would the ‘seed text’ provided to GPT-2 be analogous to a single keyframe provided to an artist, and GPT-2’s output be essentially what happens when you provide an interpolator (I know nothing about the craft of animation and am probably using this word wrong) a ‘start’ frame but no ‘finish’ frame?
Precisely. Also, are you familiar with Google’s DeepDream?
https://en.wikipedia.org/wiki/DeepDream
GPT-2 is best described IMHO as “DeepDream for text.” They use different neural network architectures, but that’s because analyzing images and natural language requires different architectures. Fundamentally their complete-the-prompt-using-training-data design is the same.
And while DeepDream creates incredibly surreal visual imagery, it is simply not capable of deep insights or originality. It really just morphs an image into a reflection of its training data, but in a fractally complex way that has the surface appearance of meaning. So too does GPT-2 extend a prompt as a reflection of its training data, but without any deep insight or understanding.
I would argue that an approach in animation where a keyframe artist, unsure exactly where to go with a scene, draws the keyframe, hands it to interpolative animators with the request to ‘start drawing where you think this is going’, and looks at the results for inspiration for the next ‘keyframe’ will probably result in a lot of wasted effort by the interpolators, and is probably inferior (in terms of cost and time) to plenty of other techniques available to the keyframe artist; but it also has a moderate to high probability of eventually inspiring something useful if you do it enough times.
Your problem (excuse me) is that you keep imagining these scenarios with a human actor as the inbetweener or the writer. Yes, if an artist with even basic skills is given the opportunity to ‘in-between’ without limits and invent their own animation, some of the results are bound to be good. But if you hand the same keyframe to DeepDream and expect it to ‘interpolate’ the next frame, and then the frame after, etc., you’d be crazy to expect anything other than a fractal replay of its training data. That’s all it can do.
All GPT-2 can do is replay remixed versions of its own training data based on the prompts. Originality is not in the architecture.
But by all means, spend your $1000 on it. Maybe you’ll learn something in the process.
GPT-2 is best described IMHO as “DeepDream for text.” They use different neural network architectures, but that’s because analyzing images and natural language requires different architectures. Fundamentally their complete-the-prompt-using-training-data design is the same.
If by ‘fundamentally the same’ you mean ‘actually they’re completely different and optimize completely different things and give completely different results on completely different modalities’, then yeah, sure. (Also, a dog is an octopus.) DeepDream is an iterative optimization process which tries to maximize the class-ness of an image input (usually, dogs); a language model like GPT-2 predicts the most likely next observation in a natural-text dataset and can be fed its own guesses. They bear about as much relation as a propaganda poster and a political science paper.
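The two loops being contrasted here can be sketched side by side. This is a toy illustration only, not either system’s real code: the bigram table stands in for a language model, and the one-variable ‘class-ness’ function stands in for a network activation.

```python
import random

# --- GPT-2-style loop: autoregressive sampling -------------------
# The model predicts the next token, and that guess is appended to
# the input for the following step.  `toy_model` is a stand-in
# bigram table, not the real GPT-2.
toy_model = {"the": ["cat", "dog"], "cat": ["sat"], "dog": ["ran"],
             "sat": ["down"], "ran": ["away"], "down": ["."], "away": ["."]}

def sample_text(prompt: str, steps: int) -> list[str]:
    tokens = prompt.split()
    for _ in range(steps):
        nxt = random.choice(toy_model.get(tokens[-1], ["."]))
        tokens.append(nxt)          # the guess becomes the next input
    return tokens

# --- DeepDream-style loop: iterative optimization ----------------
# Gradient ascent repeatedly edits the *input* to maximize some
# activation.  Here "class-ness" is a toy scalar peaking at x = 3.
def classness(x: float) -> float:
    return -(x - 3.0) ** 2

def dream(x: float, lr: float = 0.1, steps: int = 100) -> float:
    for _ in range(steps):
        grad = -2.0 * (x - 3.0)     # d(classness)/dx
        x += lr * grad              # modify the input, not the output
    return x
```

The structural difference is visible even at toy scale: one loop samples forward and grows its input; the other climbs a gradient on a fixed-size input until it saturates an objective.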
Gwern, I respect you but sometimes you miss the mark. I was describing a particular application of deep dream in which the output is fed in as input, which doesn’t strike me as any different from your own description of GPT-2.
A little less hostility in your comment and it would be received better.
Feeding in output as input is exactly what is iterative about DeepDream, and the scenario does not change the fact that GPT-2 and DeepDream are fundamentally different in many important ways and there is no sense in which they are ‘fundamentally the same’, not even close.
And let’s consider the chutzpah of complaining about tone when you ended your own highly misleading comment with the snide:
But by all means, spend your $1000 on it. Maybe you’ll learn something in the process.
There was nothing snide there. I honestly think he’ll learn something of value. I don’t think he’ll get the result he wanted, but he will learn something in the process.
https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/
I’m gonna say I won this one.