The only correctness filters are the hidden testcases (as is standard in most competitive coding competitions). You can check the leaderboard; the positions correlate with the cumulative time taken to solve problems and the number of Codex assists. If there are any hidden metrics, I wouldn’t know.
If so, how was Codex deployed solo? Did they just sample it many times on the same prompt until it produced something that passed the tests? Or something more sophisticated?
They didn’t reveal this publicly. We can only guess here.
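One plausible guess is a plain sample-and-filter loop against the hidden testcases, something like the sketch below. This is entirely speculative; the helper names are made up and nothing here reflects how the organizers actually ran solo Codex.

```python
from typing import Callable, Optional

# Speculative harness: resample the same prompt until a candidate passes.
# `sample_fn` and `passes_hidden_tests` are made-up stand-ins, not real APIs.
def solve_by_resampling(
    prompt: str,
    sample_fn: Callable[[str], str],             # one Codex completion per call
    passes_hidden_tests: Callable[[str], bool],  # the judge on the hidden testcases
    max_attempts: int = 100,
) -> Optional[str]:
    """Keep sampling until something passes the hidden tests, or give up."""
    for _ in range(max_attempts):
        candidate = sample_fn(prompt)
        if passes_hidden_tests(candidate):
            return candidate
    return None
```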
This makes no sense to me. Do you assume solo-Codex exploited the prompts submitted by other competitors? Or that the assistant-Codexes communicated with each other somehow? I kinda doubt either of those happened.
After I was done, I played around with Codex (from a new account). You could only use Codex in the editors within problems. In one of the problems, I cleared the editor and just put in a simple prompt (unrelated to the problem). I remember that in one of the assists, it actually generated the code for that specific problem. This is why I assumed there was some state saving or context awareness.
Hmm, I suppose they might be combining the problem statement and the prompt provided by the user into a single prompt somehow, and feeding that to the network? Either that or they’re cheating :)
Yes, that’s what they did! (Emphasis on the “somehow”—details a mystery to me.) Some piece of intro text for the challenge explained that Codex would receive, as input, both the problem statement (which always included a handful of example inputs/output/explanation triplets), and the user’s current code up to their cursor.
There is no state saving or learning at test time. The prompts were prepended to the API calls; you could see them in the requests.
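For concreteness, here is a rough sketch of what that prepended prompt could have looked like. The field names, ordering, and formatting are my guesses based on the intro text described above, not the actual implementation.

```python
# Guesswork: combine the problem statement, its example triplets, and the
# user's code up to the cursor into one completion prompt.
def build_codex_input(problem_statement, examples, code_up_to_cursor):
    """Prepend the problem statement and examples to the user's partial code."""
    example_blocks = "\n\n".join(
        f"Input:\n{ex['input']}\nOutput:\n{ex['output']}\nExplanation:\n{ex['explanation']}"
        for ex in examples
    )
    return (
        f'"""\n{problem_statement}\n\n{example_blocks}\n"""\n\n'
        f"{code_up_to_cursor}"
    )

prompt = build_codex_input(
    problem_statement="Return the sum of all even numbers in the list.",
    examples=[{"input": "[1, 2, 3, 4]", "output": "6",
               "explanation": "2 + 4 = 6."}],
    code_up_to_cursor="def sum_even(nums):\n    ",
)
# `prompt` is what would be sent along with each completion request, which is
# why the full problem text was visible in the network requests.
print(prompt)
```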