gwern comments on Replicating the replication crisis with GPT-3?

gwern 24 Jul 2020 2:48 UTC
2 points
The current docs do seem to be behind the login wall. (They’re integrated with your API token to make copy-paste easier, so that’s not too surprising.) It’s also true that people have been using different algorithms, but regular API users are typically clear about it when they’re not using davinci, and the confusion is mostly the fault of AI Dungeon users: we don’t know what AID does, and AID users sometimes don’t even pick the right model option and still say they are using “GPT-3”.
- skybrian 24 Jul 2020 15:49 UTC
  1 point
  I was making a different point, which is that if you use “best of” ranking then you are testing a different algorithm than if you’re not using “best of” ranking. Similarly for other settings. It shouldn’t be surprising that we see different results if we’re doing different things.
  
  It seems like a better UI would help us casual explorers share results in a way that makes trying the same settings again easier; one could hit a “share” button to create a linkable output page with all relevant settings.
  
  It could also save the alternate responses that either the user or the “best-of” ranking chose not to use. Generate-and-test is a legitimate approach, if you do it consistently, but saving the alternate takes would give us a better idea of how good the generator alone is.
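
A rough sketch of the "share" idea from the reply above, in Python. Everything here is hypothetical: `Settings`, `toy_generate`, `toy_score`, `run`, and `share` are made-up names, the generator is a seeded random stand-in rather than a real model call, and the scorer is a placeholder for whatever ranking a real "best of" option applies. The point is only that a result saved together with its settings and its rejected alternates can be rerun and compared.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Settings:
    """Hypothetical bundle of everything needed to rerun a sample."""
    model: str          # label only; no real model is called here
    temperature: float  # recorded for sharing; the toy generator ignores it
    best_of: int        # how many candidates to generate before ranking
    seed: int           # makes the toy generator reproducible


def toy_generate(prompt: str, rng: random.Random) -> str:
    # Stand-in for a model call: appends a few random words to the prompt.
    vocab = ["yes", "no", "maybe", "probably", "never"]
    length = rng.randint(1, 4)
    return prompt + " " + " ".join(rng.choice(vocab) for _ in range(length))


def toy_score(text: str) -> float:
    # Placeholder ranking (shorter is better); an assumption, not a real metric.
    return -len(text)


def run(prompt: str, settings: Settings) -> dict:
    """Generate best_of candidates, keep the top one, and save the rest."""
    rng = random.Random(settings.seed)
    candidates = [toy_generate(prompt, rng) for _ in range(settings.best_of)]
    best = max(range(len(candidates)), key=lambda i: toy_score(candidates[i]))
    return {
        "settings": asdict(settings),
        "prompt": prompt,
        "chosen": candidates[best],
        "alternates": [c for i, c in enumerate(candidates) if i != best],
    }


def share(record: dict) -> str:
    # Serialized form that a linkable output page could store.
    return json.dumps(record, sort_keys=True, indent=2)


if __name__ == "__main__":
    settings = Settings(model="toy-model", temperature=0.7, best_of=4, seed=0)
    print(share(run("Is the sky blue?", settings)))
```

With `best_of=1` the record has no alternates, so it shows the generator alone. With a larger `best_of` the chosen text is the generator plus the ranking step, which is the "different algorithm" distinction made in the reply.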