The slingshot helps with learning

Wilson Wu · 31 Oct 2024 23:18 UTC
33 points
0 comments · 8 min read · LW link

Toward Safety Case Inspired Basic Research

31 Oct 2024 23:06 UTC
55 points
3 comments · 13 min read · LW link

Spooky Recommendation System Scaling

phdead · 31 Oct 2024 22:00 UTC
11 points
0 comments · 4 min read · LW link

‘Meta’, ‘mesa’, and mountains

Lorec · 31 Oct 2024 17:25 UTC
1 point
0 comments · 3 min read · LW link

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link

AI #88: Thanks for the Memos

Zvi · 31 Oct 2024 15:00 UTC
46 points
5 comments · 77 min read · LW link
(thezvi.wordpress.com)

The Compendium, A full argument about extinction risk from AGI

31 Oct 2024 12:01 UTC
193 points
52 comments · 2 min read · LW link
(www.thecompendium.ai)

Some Preliminary Notes on the Promise of a Wisdom Explosion

Chris_Leong · 31 Oct 2024 9:21 UTC
2 points
0 comments · 1 min read · LW link
(aiimpacts.org)

What TMS is like

Sable · 31 Oct 2024 0:44 UTC
206 points
23 comments · 6 min read · LW link
(affablyevil.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo · 31 Oct 2024 0:09 UTC
3 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde · 30 Oct 2024 22:50 UTC
27 points
0 comments · 12 min read · LW link

Generic advice caveats

Saul Munn · 30 Oct 2024 21:03 UTC
27 points
1 comment · 3 min read · LW link
(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt · 30 Oct 2024 20:13 UTC
104 points
23 comments · 1 min read · LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner · 30 Oct 2024 18:31 UTC
46 points
8 comments · 4 min read · LW link

[Question] How might language influence how an AI “thinks”?

bodry · 30 Oct 2024 17:41 UTC
3 points
0 comments · 1 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
45 points
7 comments · 52 min read · LW link

Updating the NAO Simulator

jefftk · 30 Oct 2024 13:50 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi · 30 Oct 2024 11:00 UTC
65 points
11 comments · 11 min read · LW link
(thezvi.wordpress.com)

Three Notions of “Power”

johnswentworth · 30 Oct 2024 6:10 UTC
89 points
44 comments · 4 min read · LW link

Introduction to Choice set Misspecification in Reward Inference

Rahul Chand · 29 Oct 2024 22:57 UTC
1 point
0 comments · 8 min read · LW link

Gothenburg LW/ACX meetup

Stefan · 29 Oct 2024 20:40 UTC
2 points
0 comments · 1 min read · LW link

The Alignment Trap: AI Safety as Path to Power

crispweed · 29 Oct 2024 15:21 UTC
57 points
17 comments · 5 min read · LW link
(upcoder.com)

Housing Roundup #10

Zvi · 29 Oct 2024 13:50 UTC
32 points
2 comments · 32 min read · LW link
(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes · 29 Oct 2024 13:36 UTC
51 points
2 comments · 16 min read · LW link

Review: “The Case Against Reality”

David Gross · 29 Oct 2024 13:13 UTC
19 points
9 comments · 5 min read · LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob · 29 Oct 2024 12:41 UTC
12 points
0 comments · 9 min read · LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean · 29 Oct 2024 12:16 UTC
36 points
8 comments · 26 min read · LW link

AI #87: Staying in Character

Zvi · 29 Oct 2024 7:10 UTC
57 points
3 comments · 33 min read · LW link
(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger · 29 Oct 2024 3:02 UTC
53 points
16 comments · 20 min read · LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer · 29 Oct 2024 1:21 UTC
47 points
13 comments · 6 min read · LW link

Gwern: Why So Few Matt Levines?

kave · 29 Oct 2024 1:07 UTC
78 points
10 comments · 1 min read · LW link
(gwern.net)

October 2024 Progress in Guaranteed Safe AI

Quinn · 28 Oct 2024 23:34 UTC
7 points
0 comments · 1 min read · LW link
(gsai.substack.com)

5 homegrown EA projects, seeking small donors

Austin Chen · 28 Oct 2024 23:24 UTC
85 points
4 comments · 1 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
54 points
5 comments · 32 min read · LW link

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

ozziegooen · 28 Oct 2024 21:44 UTC
7 points
0 comments · 1 min read · LW link

AI & wisdom 3: AI effects on amortised optimisation

L Rudolf L · 28 Oct 2024 21:08 UTC
18 points
0 comments · 14 min read · LW link
(rudolf.website)

AI & wisdom 2: growth and amortised optimisation

L Rudolf L · 28 Oct 2024 21:07 UTC
18 points
0 comments · 8 min read · LW link
(rudolf.website)

AI & wisdom 1: wisdom, amortised optimisation, and AI

L Rudolf L · 28 Oct 2024 21:02 UTC
29 points
0 comments · 15 min read · LW link
(rudolf.website)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi · 28 Oct 2024 20:17 UTC
94 points
5 comments · 4 min read · LW link
(manifund.org)

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis · 28 Oct 2024 19:45 UTC
20 points
2 comments · 33 min read · LW link
(aiimpacts.org)

Quantitative Trading Bootcamp [Nov 6-10]

Ricki Heicklen · 28 Oct 2024 18:39 UTC
7 points
0 comments · 1 min read · LW link

Winners of the Essay competition on the Automation of Wisdom and Philosophy

28 Oct 2024 17:10 UTC
40 points
3 comments · 30 min read · LW link
(blog.aiimpacts.org)

Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority

Zach Stein-Perlman · 28 Oct 2024 17:00 UTC
22 points
4 comments · 3 min read · LW link
(milesbrundage.substack.com)

[Question] somebody explain the word “epistemic” to me

KvmanThinking · 28 Oct 2024 16:40 UTC
7 points
8 comments · 1 min read · LW link

~80 Interesting Questions about Foundation Model Agent Safety

28 Oct 2024 16:37 UTC
46 points
4 comments · 15 min read · LW link

AI Safety Newsletter #43: White House Issues First National Security Memo on AI. Plus, AI and Job Displacement, and AI Takes Over the Nobels

28 Oct 2024 16:03 UTC
6 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Death notes - 7 thoughts on death

Nathan Young · 28 Oct 2024 15:01 UTC
26 points
1 comment · 5 min read · LW link
(nathanpmyoung.substack.com)

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI · 28 Oct 2024 14:48 UTC
27 points
0 comments · 10 min read · LW link

Bridging the VLM and mech interp communities for multimodal interpretability

Sonia Joseph · 28 Oct 2024 14:41 UTC
19 points
5 comments · 15 min read · LW link

How Likely Are Various Precursors of Existential Risk?

NunoSempere · 28 Oct 2024 13:27 UTC
55 points
4 comments · 15 min read · LW link
(blog.sentinel-team.org)