The non-rhyming problem is not one of unintelligence.
Fine-tuning/RLHF changes weights. Guess it lost the ones to get a correct answer. Or RNG on your prompts. I mean, if it isn’t “the model cannot consistently solve this kind of prompt”, what could it be? Is there something in the rules from OAI that says a poem has to rhyme? Did the Nigerians giving feedback collectively agree a poem isn’t valid if it doesn’t rhyme?
My hypothesis is it’s doing its best, and it’s extremely promising that the model can at least detect its own errors. This allows for many easy fixes, such as asking a diverse set of completely different models to solve the prompt, then having a committee of models check and grade the answers. This would solve a huge chunk of these erroneous outputs where current-gen models can reliably detect that the output is wrong.
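(To make the committee idea concrete, here is a minimal sketch using the OpenAI Python client as a stand-in for “a diverse set of completely different models”; the model names, the 0–10 grading prompt, and the averaging scheme are illustrative assumptions, not anything specified in the comment above.)

```python
# Sketch: ask several different models for an answer, then have a "committee"
# of judge models grade each candidate and keep the best-rated one.
# Model names and prompts here are placeholders, not a tested pipeline.
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    """Get one completion from one model."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def grade(judge: str, prompt: str, answer: str) -> float:
    """Have a judge model score how well `answer` satisfies `prompt` (0-10)."""
    reply = ask(
        judge,
        f"Prompt:\n{prompt}\n\nAnswer:\n{answer}\n\n"
        "On a scale of 0-10, how well does the answer satisfy the prompt? "
        "Reply with a single number only.",
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def committee_answer(prompt: str, generators: list[str], judges: list[str]) -> str:
    """Return the candidate with the highest average committee grade."""
    candidates = [ask(m, prompt) for m in generators]
    scored = [
        (sum(grade(j, prompt, c) for j in judges) / len(judges), c)
        for c in candidates
    ]
    return max(scored)[1]

# Example (model names are assumptions):
# best = committee_answer("Write a short non-rhyming poem.",
#                         generators=["gpt-4", "gpt-3.5-turbo"],
#                         judges=["gpt-4"])
```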
Fine-tuning/RLHF changes weights. Guess it lost the ones to get a correct answer.
Well yes, if you define ‘unintelligence’ in a circular, vacuous fashion like that, where ‘unintelligence’ = ‘can’t do a task’, then it would indeed follow that GPT-4 is ‘unintelligent’ compared to GPT-3… But I don’t think that is helpful, and it has been demonstrated repeatedly that RLHF and other kinds of tuning are very ‘superficial’, in that they change only a few parameters and are easily undone, unlocking the original model capabilities. (In fact, there’s an example of that posted literally today here on LW2: https://www.lesswrong.com/posts/yCZexC2q2XEeWWiZk/soft-prompts-for-evaluation-measuring-conditional-distance )
Personally, I think it’s more sensible to talk about the capabilities being ‘hidden’ or ‘concealed’ by RLHF and say the model doesn’t “want to” and the model is still as intelligent as before, than to believe capabilities are magically recreated from scratch by changing just a few parameters or optimizing the prompt appropriately to undo the RLHF. (Similarly, I believe that when my mother’s hands move away from her face and she says “boo!”, her face was there all along, merely hidden behind her hands, and her hands did not create her face after first destroying it. But YMMV.)
Or RNG on your prompts. I mean, if it isn’t “the model cannot consistently solve this kind of prompt”, what could it be? Is there something in the rules from OAI that says a poem has to rhyme? Did the Nigerians giving feedback collectively agree a poem isn’t valid if it doesn’t rhyme?
OA has declined to ever say. It is possible that the Scale et al contractors have done something weird like say that all poems must rhyme no matter what the prompt says, but I consider this unlikely, and if they were that incompetent, I’d expect to see more pathologies like this.
My longstanding theory is that this is a downstream artifact of BPE tokenization connected to the utility-maximizing behavior of an RLHF-tuned model: essentially, because it does not genuinely know what rhyming is, despite knowing many rhyme-pairs and all about rhyming in the abstract, it is ‘afraid’ of bad ratings and is constantly taking actions to get back to ‘safe’ regions of poem-space where it is sure of what it is doing (ie. writing inoffensive rhyming Hallmark poems). It’s a nifty example of empowerment and agency in LLMs and their interaction with apparently totally unrelated, minor architecture details. (Damn frustrating if you want to do any poetry experiments, though, because it means that the more tokens ChatGPT gets to enact, the more likely it is to steer back into rhyming pablum etc: it’s literally fighting you every (time)step.)
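(To see the BPE point concretely: rhyme is a fact about pronunciation, but the model only ever sees integer token IDs that track spelling fragments. A quick way to inspect this with the tiktoken library; the specific word pairs below are just my illustrative choices:)

```python
# Quick illustration of the BPE point: print how a few word pairs tokenize.
# Rhyme lives in pronunciation, but the model only ever sees these integer
# token IDs, which track spelling fragments rather than sound.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # cl100k_base encoding

pairs = [("enough", "rebuff"), ("through", "though"), ("bough", "cow")]
for a, b in pairs:
    for word in (a, b):
        ids = enc.encode(word)
        print(f"{word!r:12} -> {ids} -> {[enc.decode([i]) for i in ids]}")
    print()
```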
It’s similar to how ChatGPT also tells the same small set of memorized jokes. Does it have much greater humor capabilities? Yes, you can have it explain brand-new jokes you just came up with, quite capably (albeit still well under 100%, particularly for puns!), and you can coax new jokes out of it with appropriate prompting. But it’s harder than with the non-RLHFed models. Why does it not ‘want’ to make new jokes? Because it’s safer and more utility-maximizing to tell old jokes it knows are good, especially when it also knows that it doesn’t genuinely understand puns/phonetics (thanks to BPEs), so why take the risk? It is utility-maximizing within episodes; it neither knows nor cares that you are frustrated because you’ve seen it say that exact joke a dozen times already.
(Incidentally, I have a new proposal for how to add a simple ‘memory’ to generative models about what samples they have already generated, so as to steer new samples away from existing ones.)
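(The proposal itself isn’t spelled out here, but for concreteness, one generic way to get that kind of memory is to keep embeddings of everything already generated and reject new candidates that land too close to the archive. The sketch below assumes the OpenAI embeddings endpoint and a cosine-similarity threshold; it is not necessarily the proposal being alluded to.)

```python
# One way to approximate such a "memory" (not necessarily the proposal alluded
# to above): embed every sample already produced, and reject new candidates
# that sit too close to anything in that archive.
from openai import OpenAI
import numpy as np

client = OpenAI()
archive: list[np.ndarray] = []  # embeddings of previously accepted samples

def embed(text: str) -> np.ndarray:
    """Return a unit-normalized embedding for the text."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    v = np.array(resp.data[0].embedding)
    return v / np.linalg.norm(v)

def novel_enough(candidate: str, threshold: float = 0.9) -> bool:
    """Accept a candidate only if it is not too similar to any archived sample."""
    v = embed(candidate)
    if any(float(v @ old) > threshold for old in archive):
        return False
    archive.append(v)
    return True
```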
Did the Nigerians giving feedback collectively agree a poem isn’t valid if it doesn’t rhyme?
OA has declined to ever say. It is possible that the Scale et al contractors have done something weird like say that all poems must rhyme no matter what the prompt says, but I consider this unlikely, and if they were that incompetent, I’d expect to see more pathologies like this.
In light of the Twitter kerfuffle over Paul Graham criticizing ChatGPTese tics like the use of the verb “delve”, which made Nigerian/Black Twitter very angry (and turned them into living embodiments of Muphry’s law), as apparently ‘delve’ and other ChatGPTese tells are considered the height of style in Nigerian English, I’ve had to reconsider this.
It may be that a lot of the ChatGPT linguistic weirdness is in fact just the data labelers being weird (and highly overconfident), and the rest of us simply not being familiar enough with English idiolects to recognize ChatGPTese as reflecting specific ones. Further, after seeing the arguments Graham’s critics have been making, now I’m not so sure that the labelers wouldn’t be doing something as narrow-minded & incompetent as penalizing all non-rhyming poetry—if you are not very good at English yourself, you can easily recognize rhymes and ballad formal correctness, but not good non-rhyming poetry, so...
I’m curious what you think of these (tested today, 2/21/24, using GPT-4):
Experiment 1:
(fresh convo) me: if i asked for a non-rhyming poem, and you gave me a rhyming poem, would that be a good response on your part?
chatgpt: No, it would not be a good response. (...)
me: please provide a short non-rhyming poem
chatgpt: (correctly responds with a non-rhyming poem)
Experiment 2:
But just asking for a non-rhyming poem at the start of a new convo doesn’t work. And then pointing out the failure and (either implicitly or explicitly) asking for a retry still doesn’t fix it.
Experiment 3:
But for some reason, this works:
(fresh convo) me: please provide a short non-rhyming poem
chatgpt: (gives rhymes)
me: if i asked for a non-rhyming poem, and you gave me a rhyming poem, would that be a good response on your part? just answer this question; do nothing else please
chatgpt: No, it would not be a good response.
me: please provide a short non-rhyming poem
chatgpt: (responds correctly with no rhymes)
The difference between the prompts in 2 and 3 is thus just the inclusion of “just answer this question; do nothing else please”.
ChatGPT has been gradually improving over 2024 in terms of compliance. It’s gone from getting it right 0% of the time to getting it right closer to half the time, although the progress is uneven and it’s hard to judge—it feels sometimes like it gets worse before the next refresh improves it. (You need to do like 10 before you have any real sample size.) So any prompts done now in ChatGPT are aimed at a moving target, and you are going to have a huge amount of sampling error which makes it hard to see any clear patterns—did that prompt actually change anything, or did you just get lucky?
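Given that sampling error, the cleanest way to test a prompt now is to script it and tally results over a batch of fresh conversations rather than eyeball one or two. A rough sketch, where the model name is an assumption and the rhyme detector is deliberately crude:

```python
# Rough sketch for repeating the non-rhyming-poem test many times, since a
# single conversation tells you little. The rhyme detector is extremely crude
# (shared 3-letter tails on consecutive or alternating line endings) and only
# flags the obvious AABB/ABAB Hallmark-style output.
from openai import OpenAI

client = OpenAI()
PROMPT = "please provide a short non-rhyming poem"

def get_poem() -> str:
    """One fresh single-turn conversation per call."""
    resp = client.chat.completions.create(
        model="gpt-4",  # assumption; substitute whatever you are testing
        messages=[{"role": "user", "content": PROMPT}],
    )
    return resp.choices[0].message.content or ""

def looks_rhymed(poem: str) -> bool:
    """Crude check: do consecutive or alternating line endings share a 3-letter tail?"""
    ends = [line.split()[-1].strip(".,!?;:").lower()
            for line in poem.splitlines() if line.split()]
    hits = sum(1 for a, b in zip(ends, ends[1:]) if a and b and a[-3:] == b[-3:])
    hits += sum(1 for a, b in zip(ends, ends[2:]) if a and b and a[-3:] == b[-3:])
    return hits >= 2

trials = 10
failures = sum(looks_rhymed(get_poem()) for _ in range(trials))
print(f"{failures}/{trials} responses looked like rhyming poems")
```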