gjm comments on Testing PaLM prompts on GPT3

gjm 6 Apr 2022 16:41 UTC
10 points
Aha! Thanks. (To save others a click: “change list” really just means “change” or “commit”: a single thing checked into version control or submitted for review.) I’m not sure the joke really lands for me—maybe I’m stupider than both GPT-3 and PaLM. It seems like the joke could be (1) the intern produced a hilariously excessive amount of code, perhaps because they e.g. failed to use elementary techniques like functions and loops for removing redundancy, or (2) the intern produced a normal amount of code but it was so bad that reading it was as painful as if it had been War-and-Peace-sized, or (3) the reviewer is incredibly lazy (so is telling a joke against himself) and finds reading even small amounts of other people’s code terribly hard work. Normally I’d use the obvious heuristic that the intended meaning is the one that’s funny, but unfortunately none of them seems very funny to me. I guess probably it’s #2?
(This is the difficulty about making up one’s own jokes for this sort of test...)
- Qumeric 6 Apr 2022 19:21 UTC
  4 points
  Parent
  I am sure the situation is that the intern never pushed his code to VCS for a few months, just wrote it locally, and then pushed tons of code at once. The reviewer is dreading it because one day is very little time to review so much code.
  - Yitz 6 Apr 2022 20:30 UTC
    5 points
    Parent
    The fact that we humans are having trouble understanding this joke does not bode well for its use as an AI benchmark…
- Measure 6 Apr 2022 18:33 UTC
  2 points
  Parent
  Since it was the intern’s last day, they might have been less careful with their coding (or, depending on why they’re leaving, even added deliberate errors), so the reviewer will have to be extra thorough checking it.
- Yitz 6 Apr 2022 18:42 UTC
  1 point
  Parent
  Yeah, I do wonder how most of the example jokes not actually being very funny is affecting the results… It is also weird that they make an explicit reference to a term which is only used internally, and which presumably PaLM has little-to-no training on. Was that on purpose, or a slip-up by the authors?