
Daniel Kokotajlo

Karma: 22,992

Was a philosophy PhD student, left to work at AI Impacts, then the Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker’s Rules (http://sl4.org/crocker.html) and am especially interested to hear unsolicited constructive criticism.

Some of my favorite memes:


(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."

(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)



Alex Blechman (@AlexBlechman), 8 Nov 2021: "Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale. Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus."

Why Don’t We Just… Shoggoth+Face+Paraphraser?

19 Nov 2024 20:53 UTC
136 points
55 comments · 14 min read · LW link

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo · 17 Feb 2024 1:47 UTC
63 points
2 comments · 11 min read · LW link

AI Timelines

10 Nov 2023 5:28 UTC
293 points
133 comments · 51 min read · LW link · 2 reviews

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo · 13 Sep 2023 21:23 UTC
59 points
1 comment · 2 min read · LW link
(aligned.substack.com)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
109 points
16 comments · 5 min read · LW link
(arxiv.org)

AGI is easier than robotaxis

Daniel Kokotajlo · 13 Aug 2023 17:00 UTC
41 points
30 comments · 4 min read · LW link

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo · 27 Jul 2023 22:18 UTC
61 points
18 comments · 1 min read · LW link

[Question] What money-pumps exist, if any, for deontologists?

Daniel Kokotajlo · 28 Jun 2023 19:08 UTC
39 points
35 comments · 1 min read · LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo · 22 May 2023 5:49 UTC
55 points
5 comments · 2 min read · LW link
(thetreacherousturn.ai)

My version of Simulacra Levels

Daniel Kokotajlo · 26 Apr 2023 15:50 UTC
42 points
15 comments · 3 min read · LW link

Kallipolis, USA

Daniel Kokotajlo · 1 Apr 2023 2:06 UTC
13 points
1 comment · 1 min read · LW link
(docs.google.com)

Russell Conjugations list & voting thread

Daniel Kokotajlo · 20 Feb 2023 6:39 UTC
22 points
62 comments · 1 min read · LW link

Important fact about how people evaluate sets of arguments

Daniel Kokotajlo · 14 Feb 2023 5:27 UTC
33 points
11 comments · 2 min read · LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo · 30 Nov 2022 7:16 UTC
53 points
5 comments · 1 min read · LW link

ACT-1: Transformer for Actions

Daniel Kokotajlo · 14 Sep 2022 19:09 UTC
52 points
4 comments · 1 min read · LW link
(www.adept.ai)

Linkpost: Github Copilot productivity experiment

Daniel Kokotajlo · 8 Sep 2022 4:41 UTC
88 points
4 comments · 1 min read · LW link
(github.blog)

Replacement for PONR concept

Daniel Kokotajlo · 2 Sep 2022 0:09 UTC
58 points
6 comments · 2 min read · LW link

Immanuel Kant and the Decision Theory App Store

Daniel Kokotajlo · 10 Jul 2022 16:04 UTC
92 points
12 comments · 5 min read · LW link

Forecasting Fusion Power

Daniel Kokotajlo · 18 Jun 2022 0:04 UTC
29 points
8 comments · 1 min read · LW link
(astralcodexten.substack.com)

Why agents are powerful

Daniel Kokotajlo · 6 Jun 2022 1:37 UTC
37 points
7 comments · 7 min read · LW link