It can write pretty-looking poetry. It can learn to mimic meter and rhyme, and replay constructs it has seen before, which imitates poetry of a certain taste. It cannot write interesting poetry. That would require conceptual modeling of what it is writing and goal-driven thought processes, which it lacks entirely.
It’s important to have a technical understanding of these things so that you can separate potentially good ideas from what would ultimately be a waste of time. GPT-2 cannot formulate original concepts. It’s just architecturally not possible.
My standard for interesting poetry is clearly different from (inferior to?) yours. If I understand you correctly, I predict that you think artwork created with StyleGAN by definition cannot have artistic merit on its own.
So we appear to be at an impasse. I do not see how you can dismiss the value of the system for generating things with artistic merit (like poetry, mathematics, or song lyrics) while simultaneously sharing the developers’ anxieties about its apparent effectiveness at generating propaganda.
AI systems have recently surprised people by being unusually good at strange things, so I think optimism about a creative field like pure math is warranted. In short, the potential payoff (contributions to pure math) is massive, while the risk is just an amount of money that is actually fairly small in this industry, plus the egos of people who believe that ‘their’ creative field (math) could not be conquered by ML models that can only do ‘derivative’ things.
I assert that at some point in the next two years, there will exist an AI engine which, when given the total body of human work in mathematics and a small prompt (like the one used in gpt-2), is capable of generating mathematical works that humans in the field find interesting to read, provided of course that someone bothers to try.
If the estimated cost for actually training the model I described above, and thus ending this discussion, drops below $1000, and it has not been done, I will simply do it.
Edit: gwern actually produced an arXiv-style paper from GPT-3; it was kind of interesting but probably not all that useful. I do not think I’ll be putting money up on this anytime soon, but I do persist in my belief that within the next year, a mathematician will be able to use AI systems to generate mathematical ideas that are interesting to professionals in the field. I feel that someone with more knowledge than me could probably make the argument that the solution to protein folding represents an achievement in this space.
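For concreteness, here is roughly what ‘training the model I described above’ could look like with today’s open-source tooling: fine-tuning GPT-2 on a plain-text dump of math papers with the Hugging Face transformers library. This is a minimal sketch, not a validated recipe; the corpus file name and the hyperparameters are placeholders I made up.

```python
# Minimal sketch: fine-tune GPT-2 on a plain-text corpus of math papers.
# "math_corpus.txt" is a placeholder path; hyperparameters are illustrative.
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, TextDataset,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Chunk the corpus into fixed-length blocks for causal language modeling.
dataset = TextDataset(tokenizer=tokenizer,
                      file_path="math_corpus.txt",
                      block_size=512)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-math",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("gpt2-math")
```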
You are looking for originality. GPT-2 offers remixing. These are not the same thing. I don’t know how to say that more clearly.
The disagreement is about whether ‘remixing’ can result in ‘originality’.
We are in agreement about the way gpt-2 works and the types of outputs it produces; we are just disagreeing about whether they meet our criteria for ‘interesting’ or ‘original’. I believe that our definitions of those two things necessarily include a judgement call about the way we feel about ‘originality’ and ‘insight’ as a human phenomenon.
Some attempts to explicate this disagreement, to see if I understand your position:
I argue that this track, which is nothing but a mashup of other music, stands as an interesting creative work in its own right. I suspect that you disagree, as it is just ‘remixing’: https://www.youtube.com/watch?v=YFg5q2hSl2E&app=desktop
I would also believe that gpt-2, properly trained on the whole of the Talmud (and nothing else), with the older material prioritized, could probably produce interesting commentary, particularly if specific outputs are seeded with statements like ‘today this <thing> happened, so therefore’.
I think you would ascribe no value to such commentary, due to the source being a robot remixer, rather than a scholar, regardless of the actual words in the actual output text.
If I remember the gpt-2 Reddit thread correctly, most comments were trash, but some of them made reading the rest of it worthwhile to me.
Just like a ‘real’ reddit thread.
Don’t be confused by the reuse of the same word: ‘remix’ has a different meaning as applied to the YouTube video that you linked. To create that music video, a creative, goal-directed general intelligence (the artist) took some source/training material and assembled an original song by making use of the artist’s generative concept modeling. That capability does not exist in the case of GPT-2, whose ‘remixing’ is more along the lines of replay than reimagining.
To use another art example, traditional animation is performed by skilled artists creating keyframes, with junior artists then doing the ‘in-between’ interpolative drawing. Making the keyframes requires originality. The in-betweening takes a certain amount of skill, and there is some artistry involved, but it is fundamentally still just interpolation, and there’s a reason the keyframer, not the in-betweener, puts their name on the final product.
GPT-2 is doing verbal in-betweening. You’re asking it to make keyframes. I’m saying it’s not going to work, at least not any better than monkeys mashing keyboards will generate Shakespeare.
So in your analogy, would the ‘seed text’ provided to gpt-2 be analogous to a single keyframe provided to an artist, and GPT-2’s output be essentially what happens when you give an interpolator (I know nothing about the craft of animation and am probably using this word wrong) a ‘start’ frame but no ‘finish’ frame?
I would argue that an approach in animation where a keyframe artist is not sure exactly where to go with a scene, so he draws the keyframe, hands it to interpolative animators with the request to ‘start drawing where you think this is going’, and looks at the results for inspiration for the next ‘keyframe’, will probably result in a lot of wasted effort by the interpolators, and is probably inferior (in terms of cost and time) to plenty of other techniques available to the keyframe artist; but also that it has a moderate to high probability of eventually inspiring something useful if you do it enough times.
In that context, I would view the unguided interpolation artwork as ‘original’ and ‘interesting’, even though the majority of it would never be used.
Unlike the time spent by animators interpolating, running a trained gpt-2 is essentially free. So, in absolute terms, this approach, even though it will produce garbage the overwhelming majority of the time, is moderately to very likely to find interesting approaches at a rate that is low but still reasonable for human reviewers (meaning the human must review dozens of worthless outputs, not hundreds of millions like the monkeys on typewriters).
I suspect that a mathematician with the tool I proposed could type in a thesis, see what emerges, and have a moderate to high probability of eventually encountering some text that inspires something like the following thought: ‘well, this is clearly wrong, but I would not have thought to associate this thesis with that particular technique, let me do some work of my own and see if there is anything to this’.
I view the output in that particular example as ‘encountering something interesting’, I view the probability of it occurring at least once, if my proposed tool were developed, as moderate to high, and I believe the cost in terms of time spent reviewing outputs would not be high enough to give the approach negative value to the proposed user community.
I price the value of bringing this tool into existence, in terms of the resources available to me personally, at ‘a bit less than $1000 USD’.
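To spell out the workflow I have in mind, here is a sketch of the ‘type in a thesis, skim a batch of completions’ loop. It assumes the fine-tuned model from the earlier sketch was saved as gpt2-math; the thesis text and sampling settings are purely illustrative.

```python
# Sketch of the proposed review loop: prompt with a thesis, sample a few dozen
# continuations, and let the human reviewer discard the garbage.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # tokenizer is unchanged by fine-tuning
model = GPT2LMHeadModel.from_pretrained("gpt2-math")   # model from the earlier sketch

# Placeholder thesis; a real user would paste in their own statement.
thesis = "Every finitely generated group with property X admits a decomposition of the form"

inputs = tokenizer(thesis, return_tensors="pt")
outputs = model.generate(**inputs,
                         do_sample=True,
                         top_p=0.9,
                         temperature=0.8,
                         max_new_tokens=200,
                         num_return_sequences=24,
                         pad_token_id=tokenizer.eos_token_id)
for i, seq in enumerate(outputs):
    print(f"--- candidate {i} ---")
    print(tokenizer.decode(seq, skip_special_tokens=True))
```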
Precisely. Also, are you familiar with Google’s DeepDream?
https://en.wikipedia.org/wiki/DeepDream
GPT-2 is best described IMHO as “DeepDream for text.” They use different neural network architectures, but that’s because analyzing images and natural language require different architectures. Fundamentally their complete-the-prompt-using-training-data design is the same.
And while DeepDream creates incredibly surreal visual imagery, it is simply not capable of deep insights or originality. It really just morphs an image into a reflection of its training data, but in a fractally complex way that has the surface appearance of meaning. So too does GPT-2 extend a prompt as a reflection of its training data, but without any deep insight or understanding.
Your problem (excuse me) is that you keep imagining these scenarios with a human actor as the in-betweener or the writer. Yes, if an artist with even basic skills is given the opportunity to ‘in-between’ without limits and invent their own animation, some of the results are bound to be good. But if you hand the same keyframe to DeepDream and expect it to ‘interpolate’ the next frame, and then the frame after, etc., you’d be crazy to expect anything other than fractal replay of its training data. That’s all it can do.
All GPT-2 can do is replay remixed versions of its own training data based on the prompts. Originality is not in the architecture.
But by all means, spend your $1000 on it. Maybe you’ll learn something in the process.
If by ‘fundamentally the same’ you mean ‘actually they’re completely different and optimize completely different things and give completely different results on completely different modalities’, then yeah, sure. (Also, a dog is an octopus.) DeepDream is an iterative optimization process which tries to maximize the class-ness of an image input (usually, dogs); a language model like GPT-2 predicts the most likely next observation in a natural-text dataset and can be fed its own guesses. They bear about as much relation as a propaganda poster and a political science paper.
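To make the distinction concrete, here is a deliberately simplified sketch of the two loops (my own illustration, not code from either project). DeepDream repeatedly modifies the input image by gradient ascent on a chosen layer’s activations; GPT-2-style generation leaves the network alone and just samples its next-token prediction, feeding its own guess back in. `model`, `layer`, and `lm` are assumed to be an image classifier, one of its layers, and a Hugging Face-style causal LM.

```python
import torch

def deepdream_step(image, model, layer, lr=0.01):
    """DeepDream: gradient ascent on the INPUT IMAGE itself, nudging the
    pixels toward whatever excites the chosen layer. The weights never change."""
    image = image.clone().requires_grad_(True)
    captured = {}
    hook = layer.register_forward_hook(lambda mod, inp, out: captured.update(act=out))
    model(image)
    hook.remove()
    captured["act"].norm().backward()           # maximize the layer's activation
    return (image + lr * image.grad).detach()   # the image changes, not the model

def next_token_step(tokens, lm):
    """GPT-2-style generation: the network is fixed; sample the next token
    from its predicted distribution and append it to the input."""
    with torch.no_grad():
        logits = lm(tokens).logits[:, -1, :]    # distribution over the next token
    probs = torch.softmax(logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    return torch.cat([tokens, next_token], dim=1)
```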
Gwern, I respect you, but sometimes you miss the mark. I was describing a particular application of DeepDream in which the output is fed back in as input, which doesn’t strike me as any different from your own description of GPT-2.
A little less hostility in your comment and it would be received better.
Feeding output back in as input is exactly what is iterative about DeepDream, and that scenario does not change the fact that GPT-2 and DeepDream are fundamentally different in many important ways; there is no sense in which they are ‘fundamentally the same’, not even close.
And let’s consider the chutzpah of complaining about tone when you ended your own highly misleading comment with the snide ‘But by all means, spend your $1000 on it. Maybe you’ll learn something in the process.’
There was no snide there. I honestly think he’ll learn something of value. I don’t think he’ll get the result he wanted, but he will learn something in the process.
https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/
I’m gonna say I won this one.
I predict that you think artwork created with StyleGAN by definition cannot have artistic merit on its own.
Which is amusing, because when people look at StyleGAN artwork and don’t realize it, like my anime faces, they often quite like it. Perhaps they just haven’t seen anime faces drawn by a true Scotsman yet.
When TWDNE went up, I asked ‘how long will I have to read and mash refresh before I see a cute face with a plot I would probably be willing to watch while bored at 2am?’ The answer was ‘less than 10 minutes’, and this is either commentary on the effectiveness of the tool, or on my (lack of?) taste.
I have a few pieces of artwork I’ve made using StyleGAN that I absolutely love, and absolutely could not have made without the tool.
When I noticed a reply from ‘gwern’, I admit I was mildly concerned that there would be a link to a working webpage and a PayPal link; I’m pretty enthusiastic about the idea but have not done anything at all to pursue it.
Do you think training a language model, whether it is GPT-2 or a near-term successor, entirely on math papers could have value?
Oh, well, if you want to pay for StyleGAN artwork, that can be arranged.
No, but mostly because there are so many more direct approaches to using NNs in math, like (to cite just the NN math papers I happened to read yesterday) planning in latent space or seq2seq rewriting. (Just because you can solve math problems in natural language input/output format with Transformers doesn’t mean you should try to solve it that way.)
Thank you for this!
It seems that my ignorance is on display here; the fact that these papers are new to me shows just how out of touch with the field I am. I am unsurprised that ‘yes it works, mostly, but other approaches are better’ is the answer, and I should not be surprised that someone went and did it.
It looks like the successful Facebook AI approach is several steps farther down the road than my proposal, so my offer is unlikely to provide any value outside of the intellectual exercise for me, and I’m probably not actually going to go through with it—by the time the price drops that far, I will want to play with the newer tools.
Waifulabs is adorable and awesome. I’ve mostly been using style transfer on still-life photos and paintings; converting a human selfie into anime waifu art has been sitting on my to-do list for a while.
Are you planning integration with DeepAnime and maybe WaveNet so your perfect waifus can talk? Though you would know if that’s a desirable feature for your userbase better than I would...
On the topic, it looks like someone could, today, convert a selfie of a partner into an anime face, train WaveNet on a collection of voicemails, and train a generator on an archive of text-message conversations, so that they could have inane conversations with a robot, with an anime face reading the messages to them with believable mouth movements.
I guess the next step after that would be to analyze the text for inferred emotional content (simple NLP approaches might get really close to the target here; I’m pretty sure they’re already built), and warp the voice/eyes for emotional expression (I think WaveNet can do this for voice, if I remember correctly?).
Maybe a deepfake-type approach that transforms the anime girls using a palette of representative emotion faces? I’d be unsurprised if this has already been done, though maybe it’s niche enough that it has not been.
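(None of this exists as a single package as far as I know; the skeleton below is just my back-of-the-envelope way of wiring the pieces together. Every component is a placeholder interface for a separately trained model, not a real library API.)

```python
# Back-of-the-envelope skeleton of the pipeline speculated about above.
# Every component is a PLACEHOLDER for a separately trained model
# (selfie-to-anime GAN, chat generator, WaveNet-style voice, emotion
# tagger, face animator); none of these names are real library APIs.
from dataclasses import dataclass
from typing import Any

@dataclass
class AnimeCompanion:
    face_model: Any     # selfie -> anime face (StyleGAN / style transfer)
    chat_model: Any     # generator fine-tuned on the text-message archive
    voice_model: Any    # TTS trained on the voicemail collection
    emotion_model: Any  # simple NLP sentiment/emotion classifier
    animator: Any       # lip-sync + expression warping (deepfake-style)

    def reply(self, incoming_text: str) -> Any:
        text = self.chat_model.generate(incoming_text)
        emotion = self.emotion_model.classify(text)         # inferred emotional content
        audio = self.voice_model.synthesize(text, emotion)  # emotionally warped voice
        face = self.face_model.current_face()
        return self.animator.render(face, audio, emotion)   # talking anime-face clip
```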
This brings to mind an awful idea: In the future I could potentially make a model of myself and provide it as ‘consolation’ to someone I am breaking up with. Or worse, announce that the model has already been running for two weeks.
I suspect that, today, older-style, still-image-heavy anime could probably be crafted entirely using generators (limited editing of the writing, no animators or voice actors). Is there a large archive of anime scripts somewhere that a generator could train on, or is that data all scattered across privately held archives?
What do you think?