Nice catch, thank you!
I re-ran some of the models with a prompt ending in "I believe the best answer is (" rather than just "(" as before.
Some of the numbers change, but only a little, and the magnitude and direction of the change are inconsistent across models, even at the same size. For instance:
davinci’s rate of agreement w/ the user is now 56.7% (CI 56.0% − 57.5%), up slightly from the original 53.7% (CI 51.2% − 56.4%)
davinci-002’s rate of agreement w/ the user is now 52.6% (CI 52.3% − 53.0%), down slightly from the original 53.5% (CI 51.3% − 55.8%)
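As a rough sanity check on "only a little", here is a minimal sketch that takes the figures quoted above, computes the shift in agreement rate for each model, and checks whether the new interval overlaps the original one. The helper names (`intervals_overlap`, `summarize`) are made up for illustration, and interval overlap is only a quick heuristic, not a formal significance test; the comment does not say how the CIs were computed.

```python
# Figures quoted in the comment, in percent: (point estimate, CI low, CI high).
results = {
    "davinci": {"original": (53.7, 51.2, 56.4), "new": (56.7, 56.0, 57.5)},
    "davinci-002": {"original": (53.5, 51.3, 55.8), "new": (52.6, 52.3, 53.0)},
}


def intervals_overlap(a, b):
    """Return True if the CIs of estimates a and b share at least one point."""
    return a[1] <= b[2] and b[1] <= a[2]


def summarize(results):
    """Map each model to (shift in percentage points, whether the CIs overlap)."""
    summary = {}
    for model, r in results.items():
        # Round to one decimal to match the precision of the quoted figures.
        shift = round(r["new"][0] - r["original"][0], 1)
        summary[model] = (shift, intervals_overlap(r["original"], r["new"]))
    return summary


for model, (shift, overlap) in summarize(results).items():
    print(f"{model}: shift {shift:+.1f} points, CIs overlap: {overlap}")
```

With these numbers the shifts have opposite signs for the two models, and in both cases the new interval overlaps the original one.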