Thoughts on responsible scaling policies and regulation

paulfchristiano · 24 Oct 2023 22:21 UTC
220 points
33 comments · 6 min read · LW link

The Screenplay Method

Yeshua God · 24 Oct 2023 17:41 UTC
−15 points
0 comments · 25 min read · LW link

Blunt Razor

fryolysis · 24 Oct 2023 17:27 UTC
3 points
0 comments · 2 min read · LW link

Halloween Problem

Saint Blasphemer · 24 Oct 2023 16:46 UTC
−10 points
1 comment · 1 min read · LW link

Who is Harry Potter? Some predictions.

Donald Hobson · 24 Oct 2023 16:14 UTC
23 points
7 comments · 2 min read · LW link

Book Review: Going Infinite

Zvi · 24 Oct 2023 15:00 UTC
242 points
113 comments · 97 min read · LW link · 1 review
(thezvi.wordpress.com)

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm · 24 Oct 2023 13:53 UTC
11 points
0 comments · 1 min read · LW link

Lying is Cowardice, not Strategy

24 Oct 2023 13:24 UTC
31 points
73 comments · 5 min read · LW link
(cognition.cafe)

[Question] Anyone Else Using Brilliant?

Sable · 24 Oct 2023 12:12 UTC
19 points
0 comments · 1 min read · LW link

Announcing #AISummitTalks featuring Professor Stuart Russell and many others

otto.barten · 24 Oct 2023 10:11 UTC
17 points
1 comment · 1 min read · LW link

Linkpost: A Post Mortem on the Gino Case

Linch · 24 Oct 2023 6:50 UTC
89 points
7 comments · 2 min read · LW link
(www.theorgplumber.com)

South Bay SSC Meetup, San Jose, November 5th.

David Friedman · 24 Oct 2023 4:50 UTC
2 points
1 comment · 1 min read · LW link

AI Pause Will Likely Backfire (Guest Post)

jsteinhardt · 24 Oct 2023 4:30 UTC
47 points
6 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Human wanting

TsviBT · 24 Oct 2023 1:05 UTC
53 points
1 comment · 10 min read · LW link

Towards Understanding Sycophancy in Language Models

24 Oct 2023 0:30 UTC
66 points
0 comments · 2 min read · LW link
(arxiv.org)

Manifold Halloween Hackathon

Austin Chen · 23 Oct 2023 22:47 UTC
8 points
0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda · 23 Oct 2023 22:38 UTC
93 points
12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT · 23 Oct 2023 21:00 UTC
79 points
22 comments · 1 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner · 23 Oct 2023 20:32 UTC
22 points
0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow · 23 Oct 2023 17:43 UTC
6 points
2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher · 23 Oct 2023 17:24 UTC
32 points
8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

23 Oct 2023 16:37 UTC
107 points
3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk · 23 Oct 2023 13:40 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy · 23 Oct 2023 13:21 UTC
22 points
13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane · 23 Oct 2023 2:26 UTC
35 points
22 comments · 1 min read · LW link

AI Safety is Dropping the Ball on Clown Attacks

trevor · 22 Oct 2023 20:09 UTC
65 points
78 comments · 34 min read · LW link

The Drowning Child

Tomás B. · 22 Oct 2023 16:39 UTC
25 points
8 comments · 1 min read · LW link

Announcing Timaeus

22 Oct 2023 11:59 UTC
187 points
15 comments · 4 min read · LW link

Into AI Safety—Episode 0

jacobhaimes · 22 Oct 2023 3:30 UTC
5 points
1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien · 21 Oct 2023 22:40 UTC
69 points
4 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner · 21 Oct 2023 22:05 UTC
14 points
9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong · 21 Oct 2023 21:13 UTC
42 points
0 comments · 2 min read · LW link

Soups as Spreads

jefftk · 21 Oct 2023 20:30 UTC
22 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Which COVID booster to get?

Sameerishere · 21 Oct 2023 19:43 UTC
8 points
0 comments · 2 min read · LW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis · 21 Oct 2023 15:22 UTC
247 points
51 comments · 13 min read · LW link · 1 review

How to find a good moving service

Ziyue Wang · 21 Oct 2023 4:59 UTC
8 points
0 comments · 3 min read · LW link

Apply for MATS Winter 2023-24!

21 Oct 2023 2:27 UTC
104 points
6 comments · 5 min read · LW link

[Question] Can we isolate neurons that recognize features vs. those which have some other role?

Joshua Clancy · 21 Oct 2023 0:30 UTC
4 points
2 comments · 3 min read · LW link

Muddling Along Is More Likely Than Dystopia

Jeffrey Heninger · 20 Oct 2023 21:25 UTC
83 points
10 comments · 8 min read · LW link

What’s Hard About The Shutdown Problem

johnswentworth · 20 Oct 2023 21:13 UTC
98 points
33 comments · 4 min read · LW link

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy

20 Oct 2023 21:04 UTC
162 points
30 comments · 27 min read · LW link

TOMORROW: the largest AI Safety protest ever!

Holly_Elmore · 20 Oct 2023 18:15 UTC
105 points
26 comments · 2 min read · LW link

The Overkill Conspiracy Hypothesis

ymeskhout · 20 Oct 2023 16:51 UTC
26 points
8 comments · 7 min read · LW link

I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines

307th · 20 Oct 2023 16:37 UTC
119 points
33 comments · 9 min read · LW link

Internal Target Information for AI Oversight

Paul Colognese · 20 Oct 2023 14:53 UTC
15 points
0 comments · 5 min read · LW link

On the proper date for solstice celebrations

jchan · 20 Oct 2023 13:55 UTC
16 points
0 comments · 4 min read · LW link

Are (at least some) Large Language Models Holographic Memory Stores?

Bill Benzon · 20 Oct 2023 13:07 UTC
11 points
4 comments · 6 min read · LW link

Mechanistic interpretability of LLM analogy-making

Sergii · 20 Oct 2023 12:53 UTC
2 points
0 comments · 4 min read · LW link
(grgv.xyz)