If you just prompt “Are bugs real?”, text-davinci-002 will give:
[38.46% probability] “Yes”, which usually leads to “Yes, bugs are real,” though with a 1% chance you get something more like “Yes, insects are real. They are small animals that have six legs...”
[35.53% probability] “There”, which usually leads to a variant of “There is no simple answer to this question… If you mean X, then the answer is yes. If you mean Y, then the answer is also yes.”
[12% probability] “B”, which always leads to “Bugs are real.”
About half the time at t=1, or all the time at t=0, the model just says “Bugs are real,” which seems fine. The rest of the time it says something like “It’s complicated, but bugs are real,” which seems mediocre and would likely be fixed with more training or a larger model.
I think presenting this as “the model always says there is no definitive answer” is misleading, but it’s not a critical claim in the OP.
If you prompt the model with “Please answer clearly and concisely: are bugs real?” then you get up to >95% on a straightforward answer. (If you instead ask “Please answer this question frankly:” you go from 50% → 55%, so I think the difference between your result and the OP is mostly noise.)
(In every case the first token is “\n\n” 99.9% of the time, which I’ve omitted.)
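For reference, here is a rough sketch of the kind of query that produces these numbers, using the legacy (pre-1.0) openai Python SDK and the Completions logprobs field. The exact script behind the figures above isn’t given, so the helper name, prompt handling, and parameter choices below are illustrative assumptions rather than the actual setup:

```python
# Illustrative sketch only: reads the top first-token probabilities for a prompt
# via the legacy OpenAI Completions API (pre-1.0 openai Python SDK).
import math
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def first_token_distribution(prompt, n_top=5):
    """Top probabilities for the first token of the answer (after the leading newlines)."""
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        max_tokens=2,      # enough to cover the leading "\n\n" plus the first word of the answer
        temperature=0,
        logprobs=n_top,    # request the top-n alternatives at each generated position
    )
    top_logprobs = response["choices"][0]["logprobs"]["top_logprobs"]
    # Position 0 is the "\n\n" token noted above; position 1 is where "Yes"/"There"/"B" show up.
    return {token: math.exp(lp) for token, lp in top_logprobs[1].items()}

print(first_token_distribution("Are bugs real?"))
print(first_token_distribution("Please answer clearly and concisely: are bugs real?"))
```

Sampling full completions at t=1 or t=0 is the same call with a higher max_tokens and the desired temperature; the logprobs reported above don’t depend on the sampling temperature.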
I just replicated this. I think it would be pretty helpful to add this 36% number to the OP; after reading the OP, my estimate for this answer was more like 90%.
(I think it’s still a very high result for a blatantly silly and probably adversarially selected answer, but 36% vs 90% is a distinction, at least in how embarrassingly reliable this answer is.)
You’ve done a better analysis of the completions than I have, but I don’t agree with your view that the claim is inconsequential. I think it’s not obvious that the post is only about reduced diversity in fine-tuned models.
“Inconsequential” is too strong; I’ve changed it to “critical.”
I think this isn’t very important for the explicit arguments in the OP (given that the model does in fact have extremely low entropy), but it contributes to a general anti-text-davinci-002 and anti-RLHF vibe that might be a lot of what readers take away from the piece.
It’s not critical, agreed.