ChatGPT-4 just spits out a list of ‘famous ML people’ like ‘Ilya Sutskever’ or ‘Daphne Koller’ or ‘Geoffrey Hinton’ - most of whom are obviously incorrect as they write nothing like me!
To elaborate a little more on this: while the RLHF models all still appear capable of a lot of truesight, we also still appear to see “mode collapse”. Besides my own case, where the guesses go from plausible candidates other than me to me + random bigwigs, Arun Jose notes another example of this mode collapse over possible authors, from the Cyborgism Discord:
ChatGPT-4’s guesses for Beth’s comment: Eliezer, Timnit Gebru, Sam Altman / Greg Brockman. Further guesses by ChatGPT-4: Gary Marcus, and Yann LeCun.
Claude’s guesses (first try): Paul Christiano, Ajeya, Evan, Andrew Critch, Daniel Ziegler. [but] Claude managed to guess 2 people at ARC/METR. On resampling Claude: Eliezer, Paul, Gwern, or Scott Alexander. Third try, where it doesn’t guess early on: Eliezer, Paul, Rohin Shah, Richard Ngo, or Daniel Ziegler.
Interestingly, Beth aside, I think Claude’s guesses might have been better than 4-base’s. Like, 4-base did not guess Daniel Ziegler (but did guess Daniel Kokotajlo). Also did not guess Ajeya or Paul (Paul at 0.27% and Ajeya at 0.96%) (but entirely plausible this was some galaxy-brained analysis of writing aura more than content that I’m completely missing).
Going back to my comments as a demo:
Woah, with Gwern’s comment Claude’s very insistent that it’s Gwern. I recommended it give other examples and it did so perfunctorily, but then went back to insisting that its primary guess is Gwern.
...ChatGPT-4 guesses: Timnit Gebru, Emily Bender, Yann LeCun, Hinton, Ian Goodfellow, “people affiliated with FHI, OpenAI, or CSET”. For Gwern’s comment. Very funny it guessed Timnit for Beth and Gwern. It also guessed LeCun over Hinton and Ian specifically because of his “active involvement in AI ethics and research discussions”. Claude confirmed SOTA.