Manifold Halloween Hackathon

Austin Chen · 23 Oct 2023 22:47 UTC
8 points
0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda · 23 Oct 2023 22:38 UTC
93 points
12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT · 23 Oct 2023 21:00 UTC
79 points
22 comments · 1 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner · 23 Oct 2023 20:32 UTC
22 points
0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow · 23 Oct 2023 17:43 UTC
6 points
2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher · 23 Oct 2023 17:24 UTC
32 points
8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

23 Oct 2023 16:37 UTC
107 points
3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk · 23 Oct 2023 13:40 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy · 23 Oct 2023 13:21 UTC
22 points
13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane · 23 Oct 2023 2:26 UTC
35 points
22 comments · 1 min read · LW link

AI Safety is Dropping the Ball on Clown Attacks

trevor · 22 Oct 2023 20:09 UTC
64 points
78 comments · 34 min read · LW link

The Drowning Child

Tomás B. · 22 Oct 2023 16:39 UTC
25 points
8 comments · 1 min read · LW link

Announcing Timaeus

22 Oct 2023 11:59 UTC
187 points
15 comments · 4 min read · LW link

Into AI Safety—Episode 0

jacobhaimes · 22 Oct 2023 3:30 UTC
5 points
1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien · 21 Oct 2023 22:40 UTC
69 points
4 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner · 21 Oct 2023 22:05 UTC
14 points
9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong · 21 Oct 2023 21:13 UTC
42 points
0 comments · 2 min read · LW link

Soups as Spreads

jefftk · 21 Oct 2023 20:30 UTC
22 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Which COVID booster to get?

Sameerishere · 21 Oct 2023 19:43 UTC
8 points
0 comments · 2 min read · LW link

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis · 21 Oct 2023 15:22 UTC
247 points
51 comments · 13 min read · LW link · 1 review

How to find a good moving service

Ziyue Wang · 21 Oct 2023 4:59 UTC
8 points
0 comments · 3 min read · LW link

Apply for MATS Winter 2023-24!

21 Oct 2023 2:27 UTC
104 points
6 comments · 5 min read · LW link

[Question] Can we isolate neurons that recognize features vs. those which have some other role?

Joshua Clancy · 21 Oct 2023 0:30 UTC
4 points
2 comments · 3 min read · LW link

Muddling Along Is More Likely Than Dystopia

Jeffrey Heninger · 20 Oct 2023 21:25 UTC
83 points
10 comments · 8 min read · LW link

What’s Hard About The Shutdown Problem

johnswentworth · 20 Oct 2023 21:13 UTC
98 points
33 comments · 4 min read · LW link

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy

20 Oct 2023 21:04 UTC
162 points
30 comments · 27 min read · LW link

TOMORROW: the largest AI Safety protest ever!

Holly_Elmore · 20 Oct 2023 18:15 UTC
105 points
26 comments · 2 min read · LW link

The Overkill Conspiracy Hypothesis

ymeskhout · 20 Oct 2023 16:51 UTC
26 points
8 comments · 7 min read · LW link

I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines

307th · 20 Oct 2023 16:37 UTC
119 points
33 comments · 9 min read · LW link

Internal Target Information for AI Oversight

Paul Colognese · 20 Oct 2023 14:53 UTC
15 points
0 comments · 5 min read · LW link

On the proper date for solstice celebrations

jchan · 20 Oct 2023 13:55 UTC
16 points
0 comments · 4 min read · LW link

Are (at least some) Large Language Models Holographic Memory Stores?

Bill Benzon · 20 Oct 2023 13:07 UTC
11 points
4 comments · 6 min read · LW link

Mechanistic interpretability of LLM analogy-making

Sergii · 20 Oct 2023 12:53 UTC
2 points
0 comments · 4 min read · LW link
(grgv.xyz)

How To Socialize With Psycho(logist)s

Sable · 20 Oct 2023 11:33 UTC
37 points
11 comments · 3 min read · LW link
(affablyevil.substack.com)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp · 20 Oct 2023 7:32 UTC
119 points
15 comments · 22 min read · LW link

Features and Adversaries in MemoryDT

20 Oct 2023 7:32 UTC
31 points
6 comments · 25 min read · LW link

AI Safety Hub Serbia Soft Launch

DusanDNesic · 20 Oct 2023 7:11 UTC
65 points
1 comment · 3 min read · LW link
(forum.effectivealtruism.org)

Announcing new round of “Key Phenomena in AI Risk” Reading Group

20 Oct 2023 7:11 UTC
15 points
2 comments · 1 min read · LW link

Unpacking the dynamics of AGI conflict that suggest the necessity of a preemptive pivotal act

Eli Tyre · 20 Oct 2023 6:48 UTC
61 points
2 comments · 8 min read · LW link

Genocide isn’t Decolonization

robotelvis · 20 Oct 2023 4:14 UTC
33 points
19 comments · 5 min read · LW link
(messyprogress.substack.com)

Trying to understand John Wentworth’s research agenda

20 Oct 2023 0:05 UTC
92 points
13 comments · 12 min read · LW link

Boost your productivity, happiness and health with this one weird trick

ajc586 · 19 Oct 2023 23:30 UTC
9 points
9 comments · 1 min read · LW link

A Good Explanation of Differential Gears

Johannes C. Mayer · 19 Oct 2023 23:07 UTC
47 points
4 comments · 1 min read · LW link
(youtu.be)

Evening Wiki(pedia) Workout

mcint · 19 Oct 2023 21:29 UTC
1 point
1 comment · 1 min read · LW link

New roles on my team: come build Open Phil’s technical AI safety program with me!

Ajeya Cotra · 19 Oct 2023 16:47 UTC
83 points
6 comments · 4 min read · LW link

[Question] Infinite tower of meta-probability

fryolysis · 19 Oct 2023 16:44 UTC
6 points
5 comments · 3 min read · LW link

A NotKillEveryoneIsm Argument for Accelerating Deep Learning Research

Logan Zoellner · 19 Oct 2023 16:28 UTC
−7 points
6 comments · 5 min read · LW link
(midwitalignment.substack.com)

Knowledge Base 5: Business model

iwis · 19 Oct 2023 16:06 UTC
−4 points
2 comments · 1 min read · LW link