Daniel Kokotajlo

Karma: 28,154

Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker’s Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:

(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."

(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)

Alex Blechman @AlexBlechman Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus 5:49 PM Nov 8, 2021. Twitter Web App

Vitalik’s Response to AI 2027

Daniel KokotajloJul 11, 2025, 9:43 PM

107 points

43 comments12 min readLW link

(vitalik.eth.limo)

My pitch for the AI Village

Daniel KokotajloJun 24, 2025, 3:00 PM

164 points

32 comments5 min readLW link

METR’s Observations of Reward Hacking in Recent Frontier Models

Daniel KokotajloJun 9, 2025, 6:03 PM

99 points

9 comments11 min readLW link

(metr.org)

Training AGI in Secret would be Unsafe and Unethical

Daniel KokotajloApr 18, 2025, 12:27 PM

139 points

15 comments6 min readLW link

AI 2027: What Superintelligence Looks Like

Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V and romeo

Apr 3, 2025, 4:23 PM

656 points

222 comments41 min readLW link

(ai-2027.com)

OpenAI: Detecting misbehavior in frontier reasoning models

Daniel KokotajloMar 11, 2025, 2:17 AM

183 points

26 comments4 min readLW link

(openai.com)

What goals will AIs have? A list of hypotheses

Daniel KokotajloMar 3, 2025, 8:08 PM

87 points

19 comments18 min readLW link

Extended analogy between humans, corporations, and AIs.

Daniel KokotajloFeb 13, 2025, 12:03 AM

36 points

2 comments6 min readLW link

Why Don’t We Just… Shoggoth+Face+Paraphraser?

Daniel Kokotajlo and abramdemski

Nov 19, 2024, 8:53 PM

152 points

58 comments14 min readLW link

Self-Awareness: Taxonomy and eval suite proposal

Daniel KokotajloFeb 17, 2024, 1:47 AM

65 points

2 comments11 min readLW link

AI Timelines

habryka, Daniel Kokotajlo, Ajeya Cotra and Ege Erdil

Nov 10, 2023, 5:28 AM

300 points

136 comments51 min readLW link 2 reviews

Linkpost for Jan Leike on Self-Exfiltration

Daniel KokotajloSep 13, 2023, 9:23 PM

59 points

1 comment2 min readLW link

(aligned.substack.com)

Paper: On measuring situational awareness in LLMs

Owain_Evans, Daniel Kokotajlo, Mikita Balesni, Tomek Korbak, Asa Cooper Stickland, Meg and Maximilian Kaufmann

Sep 4, 2023, 12:54 PM

109 points

17 comments5 min readLW link

(arxiv.org)

AGI is easier than robotaxis

Daniel KokotajloAug 13, 2023, 5:00 PM

41 points

30 comments4 min readLW link

Pulling the Rope Sideways: Empirical Test Results

Daniel KokotajloJul 27, 2023, 10:18 PM

61 points

18 comments1 min readLW link

[Question] What money-pumps exist, if any, for deontologists?

Daniel KokotajloJun 28, 2023, 7:08 PM

39 points

35 comments1 min readLW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel KokotajloMay 22, 2023, 5:49 AM

55 points

5 comments2 min readLW link

(thetreacherousturn.ai)

My version of Simulacra Levels

Daniel KokotajloApr 26, 2023, 3:50 PM

42 points

15 comments3 min readLW link

Kallipolis, USA

Daniel KokotajloApr 1, 2023, 2:06 AM

13 points

1 comment1 min readLW link

(docs.google.com)

Russell Conjugations list & voting thread

Daniel KokotajloFeb 20, 2023, 6:39 AM

23 points

64 comments1 min readLW link