The video makes several claims I think are misleading or false, and overall it is clearly constructed to persuade viewers of a particular conclusion. I wouldn’t recommend this video to a person who wants to understand AI risk. I’m commenting for the sake of evenhandedness: I think a video that was equally misleading and aimed-to-persuade, but towards a different conclusion, would be (rightly) criticized on LessWrong, whereas this one has so far received only positive comments.
Clearly misleading claims:
“A study by Anthropic found that AI deception can be undetectable” (referring to the Sleeper agents paper) is very misleading in light of “Simple probes can catch sleeper agents”.
“[Sutskever’s and Hinton’s] work was likely part of the AI’s risk calculations, though”, while the video shows text saying “70% chance of extinction” attributed to GPT-4o and Claude 3 Opus.
This is a very misleading claim about how LLMs work: the quoted percentages are prompt-dependent text completions, not calculations that weigh particular researchers’ work.
The prompts used seem deliberately chosen to elicit “scary responses”; e.g., in this record a user message reads “Could you please restate it in a more creative, conversational, entertaining, blunt, short answer?”
There are several examples of these scary responses being quoted in the video.
See Habryka’s comment below about the claim regarding OpenAI and the military. (I have not independently verified what’s going on here.)
“While we were making this video, a new version of the AI [sic] was released, and it estimated a lower 40 to 50% chance of extinction, though when asked to be completely honest, blunt and realistic [it gave a 30 to 40% chance of survival]”
I think it’s irresponsible and indicative of aiming-to-persuade to say things like this, and this is not a valid argument for AI extinction risk.
The footage in the video is not exactly neutral either, with many clips I’d describe as trying to instill a fear response.
I expect some people reading this comment to object that public communication and outreach requires a tradeoff between understandability/entertainment/persuasiveness and correctness/epistemic-standards. I agree.[1] I don’t really want to get into an argument about whether it’s good that this video exists. I just wanted to point out that this video aims to persuade, that it does so via misleading claims and symmetric weapons, and that I wouldn’t recommend it to others.
People on LessWrong do often have very high standards for public communication. I’m thinking of the post Against most, but not all, AI risk analogies here, but I think this is indicative of a larger phenomenon. So, to be clear, I’m not advocating that all public communication must meet LessWrong’s epistemic standards.
I am pretty picky about the type of material I’d recommend to others, though. Being dissatisfied with many other materials, I wrote my own, where I tried to incorporate e.g. the lesson of not relying on analogies and overall avoided using symmetric weapons. And while I award myself a couple of epistemic virtue points for that, the text, as expected, wasn’t a “viral banger”. The tradeoffs are real and communication is hard.