Thanks for the reference! I was aware of some shortcomings of PANAS, but its advantages (very well-studied, with lots of freely available human baseline data) are also hard to pass up.
The cool thing about doing these tests with large language models is that it costs almost nothing to get insanely large sample sizes (by social science standards) and that it’s (by design) super replicable. When done in a smart way, this procedure might even produce insight into biases in the test design, or it might help verify shaky results from psychology (as GPT should capture a fair bit of human psychology). The flip side, of course, is that there are a lot of moving parts, and interpreting the output is challenging.