I don’t think experiments like this are meaningful without a bunch of trials and statistical significance. The outputs of models (even RLHF models) on these kinds of prompts have pretty high variance, so it’s really hard to draw any conclusion from single-sample comparisons like this.
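For concreteness, here’s a minimal sketch of the kind of comparison I have in mind. The `query_model` and `looks_agentic` helpers are hypothetical placeholders (you’d need a real API call and some objective scoring rule), and Fisher’s exact test is just one reasonable choice for small counts:

```python
# Minimal sketch: run N trials per model, count "agentic" responses,
# then test whether the difference in proportions is significant.
# query_model and looks_agentic are hypothetical stand-ins.
from scipy.stats import fisher_exact

N_TRIALS = 50

def query_model(model: str, prompt: str) -> str:
    # Placeholder: call the model's API here.
    raise NotImplementedError

def looks_agentic(response: str) -> bool:
    # Placeholder: score the response against some fixed rubric here.
    raise NotImplementedError

def count_agentic(model: str, prompt: str) -> int:
    return sum(
        looks_agentic(query_model(model, prompt)) for _ in range(N_TRIALS)
    )

prompt = "..."  # whatever prompt the original comparison used
a = count_agentic("gpt-4", prompt)
b = count_agentic("other-model", prompt)

# 2x2 contingency table: agentic vs. not, per model.
table = [[a, N_TRIALS - a], [b, N_TRIALS - b]]
oddsratio, p_value = fisher_exact(table)
print(f"GPT-4: {a}/{N_TRIALS}, other: {b}/{N_TRIALS}, p = {p_value:.3f}")
```

Even something this crude would tell you whether the difference survives resampling, which a single side-by-side screenshot can’t.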
Although I think it’s a stretch to say they “aren’t meaningful”, I do agree a more scientific test would be nice. It’s a bit tricky when you only get 25 messages per 3 hours though, lol.
More generally, it’s hard to see how to objectively quantify agency in the responses, and how to rule out competing hypotheses (like GPT-4 simply being more familiar with itself than with other AIs).