Value drift threat models

Garrett Baker 12 May 2023 23:03 UTC

27 points

4 comments 5 min read LW link

Aggregating Utilities for Corrigible AI [Feedback Draft]

Dan H and Simon Goldstein

12 May 2023 20:57 UTC

28 points

7 comments 22 min read LW link

Turning off lights with model editing

Sam Marks 12 May 2023 20:25 UTC

68 points

5 comments 2 min read LW link

(arxiv.org)

Dark Forest Theories

Raemon 12 May 2023 20:21 UTC

145 points

53 comments 2 min read LW link 2 reviews

DELBERTing as an Adversarial Strategy

Matthew_Opitz 12 May 2023 20:09 UTC

8 points

3 comments 5 min read LW link

Microsoft/GitHub Copilot Chat’s confidential system Prompt: “You must refuse to discuss life, existence or sentience.”

Marvin von Hagen 12 May 2023 19:46 UTC

6 points

2 comments 1 min read LW link

(twitter.com)

Retrospective: Lessons from the Failed Alignment Startup AISafety.com

Søren Elverlin 12 May 2023 18:07 UTC

105 points

9 comments 3 min read LW link

The way AGI wins could look very stupid

Christopher King 12 May 2023 16:34 UTC

49 points

22 comments 1 min read LW link

Towards Measures of Optimisation

mattmacdermott and Alexander Gietelink Oldenziel

12 May 2023 15:29 UTC

53 points

37 comments 4 min read LW link

The Eden Project

rogersbacon 12 May 2023 14:58 UTC

−1 points

1 comment 2 min read LW link

(www.secretorum.life)

Another formalization attempt: Central Argument That AGI Presents a Global Catastrophic Risk

avturchin 12 May 2023 13:22 UTC

16 points

4 comments 2 min read LW link

Infinite-width MLPs as an “ensemble prior”

Vivek Hebbar 12 May 2023 11:45 UTC

46 points

0 comments 5 min read LW link

Input Swap Graphs: Discovering the role of neural network components at scale

Alexandre Variengien 12 May 2023 9:41 UTC

92 points

0 comments 33 min read LW link

Uploads are Impossible

PashaKamyshev 12 May 2023 8:03 UTC

−5 points

37 comments 8 min read LW link

Formulating the AI Doom Argument for Analytic Philosophers

JonathanErhardt 12 May 2023 7:54 UTC

13 points

0 comments 2 min read LW link

Three Iterative Processes

LoganStrohl 12 May 2023 2:50 UTC

49 points

0 comments 3 min read LW link

Zuzalu LW Sequences Discussion

veronica 12 May 2023 0:14 UTC

1 point

0 comments 1 min read LW link

[Question] Term/Category for AI with Neutral Impact?

isomic 11 May 2023 22:00 UTC

6 points

1 comment 1 min read LW link

Thoughts on LessWrong norms, the Art of Discourse, and moderator mandate

Ruby 11 May 2023 21:20 UTC

37 points

20 comments 5 min read LW link

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

Violet Hour 11 May 2023 18:06 UTC

20 points

2 comments 13 min read LW link

Sequence opener: Jordan Harbinger’s 6 minute networking

Severin T. Seehrich 11 May 2023 17:06 UTC

4 points

0 comments 1 min read LW link

Advice for newly busy people

Severin T. Seehrich 11 May 2023 16:46 UTC

149 points

3 comments 5 min read LW link

AI #11: In Search of a Moat

Zvi 11 May 2023 15:40 UTC

67 points

28 comments 81 min read LW link

(thezvi.wordpress.com)

[Question] Bayesian update from sensationalistic sources

houkime 11 May 2023 15:26 UTC

1 point

0 comments 1 min read LW link

I bet $500 on AI winning the IMO gold medal by 2026

azsantosk 11 May 2023 14:46 UTC

37 points

29 comments 1 min read LW link

Fatebook for Slack: Track your forecasts, right where your team works

Sage Future and Adam B

11 May 2023 14:11 UTC

24 points

3 comments 1 min read LW link

Contra Caller Signs

jefftk 11 May 2023 13:10 UTC

10 points

0 comments 1 min read LW link

(www.jefftk.com)

Notes on the importance and implementation of safety-first cognitive architectures for AI

Brendon_Wong 11 May 2023 10:03 UTC

3 points

0 comments 3 min read LW link

A more grounded idea of AI risk

Iknownothing 11 May 2023 9:48 UTC

3 points

4 comments 1 min read LW link

Separating the “control problem” from the “alignment problem”

Yi-Yang 11 May 2023 9:41 UTC

12 points

1 comment 4 min read LW link

[Question] Is Infra-Bayesianism Applicable to Value Learning?

RogerDearnaley 11 May 2023 8:17 UTC

5 points

4 comments 1 min read LW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera 11 May 2023 4:16 UTC

11 points

1 comment 3 min read LW link

The Academic Field Pyramid—any point to encouraging broad but shallow AI risk engagement?

Matthew_Opitz 11 May 2023 1:32 UTC

20 points

1 comment 6 min read LW link

[Question] How should one feel morally about using chatbots?

Adam Zerner 11 May 2023 1:01 UTC

18 points

4 comments 1 min read LW link

[Question] AI interpretability could be harmful?

Roman Leventov 10 May 2023 20:43 UTC

13 points

2 comments 1 min read LW link

Athens, Greece – ACX Meetups Everywhere Spring 2023

Spyros Dovas 10 May 2023 19:45 UTC

1 point

0 comments 1 min read LW link

Better debates

TsviBT 10 May 2023 19:34 UTC

70 points

7 comments 3 min read LW link

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

Chris Scammell and DivineMango

10 May 2023 19:04 UTC

255 points

54 comments 21 min read LW link

A Corrigibility Metaphore—Big Gambles

WCargo 10 May 2023 18:13 UTC

16 points

0 comments 4 min read LW link

Roadmap for a collaborative prototype of an Open Agency Architecture

Deger Turan 10 May 2023 17:41 UTC

31 points

0 comments 12 min read LW link

AGI-Automated Interpretability is Suicide

__RicG__ 10 May 2023 14:20 UTC

25 points

33 comments 7 min read LW link

Class-Based Addressing

jefftk 10 May 2023 13:40 UTC

22 points

6 comments 1 min read LW link

(www.jefftk.com)

In defence of epistemic modesty [distillation]

Luise 10 May 2023 9:44 UTC

17 points

2 comments 9 min read LW link

[Question] How much of a concern are open-source LLMs in the short, medium and long terms?

JavierCC 10 May 2023 9:14 UTC

5 points

0 comments 1 min read LW link

10 great reasons why Lex Fridman should invite Eliezer and Robin to re-do the FOOM debate on his podcast

chaosmage 10 May 2023 8:27 UTC

−7 points

1 comment 1 min read LW link

(www.reddit.com)

New OpenAI Paper—Language models can explain neurons in language models

MrThink 10 May 2023 7:46 UTC

47 points

14 comments 1 min read LW link

Naturalist Experimentation

LoganStrohl 10 May 2023 4:28 UTC

62 points

14 comments 10 min read LW link

[Question] Could A Superintelligence Out-Argue A Doomer?

tjaffee 10 May 2023 2:40 UTC

−16 points

6 comments 1 min read LW link

Gradient hacking via actual hacking

Max H 10 May 2023 1:57 UTC

12 points

7 comments 3 min read LW link

Red teaming: challenges and research directions

joshc 10 May 2023 1:40 UTC

31 points

1 comment 10 min read LW link