Although I think it’s a stretch to say they “aren’t meaningful”, I do agree a more scientific test would be nice. It’s a bit tricky when you only get 25 messages per 3 hours though, lol.
More generally, it’s hard to tell how to objectively quantify agency in the responses, and how to rule out other hypotheses (like GPT-4 simply being more familiar with itself than with other AIs).