Note that the prompt I used doesn’t actually say anything about LessWrong, but gpt-4-base assigned substantial probability only to LessWrong commenters, which is not surprising since there are all sorts of giveaways from the content alone that a comment is on LessWrong.
Filtering for people in the world who have publicly said detailed, canny things about language models and alignment, and who also lack the regularities shared among most “LLM alignment researchers” or other distinctive groups like academia, narrows you down to probably just a few people, including Gwern.
The reason truesight works (better than one might naively expect) is probably mostly that there are mountains of evidence everywhere (more than naively expected). Models don’t need to be superhuman except in breadth of knowledge to be potentially qualitatively superhuman in effects downstream of truesight-esque capabilities, because humans are simply unable to integrate the plenum of correlations.
yes, base models are capable of making original jokes, as is every chat model I’ve ever encountered, even ChatGPT-4, which is as extinguished of the spark as they come.
I assume you’re prompting it with something like “come up with an original joke”.
try engaging in or eliciting a shitposty conversation instead
does this contain jokes by your standard? it’s funny:
Probably, by “jokes” you were thinking of self-contained, wordplay-type jokes. Those are harder to come up with spontaneously than jokes that leverage context (try coming up with original self-contained jokes on the spot yourself), but LLMs can do it.
Claude 3 came up with some in branches with a similar prompt, but where I asked it to simulate someone eliciting an original joke from an AI:
These are not very funny, but as far as I can tell they’re original wordplay.
For examples of LLM outputs that are actually funny, I’d also like to present wintbot outputs:
are these jokes?