Collapsing the Belief/Knowledge Distinction

Jeremias

11 Sep 2024 21:24 UTC

−7 points

8 comments · 1 min read · LW link

Programming Refusal with Conditional Activation Steering

Bruce W. Lee

11 Sep 2024 20:57 UTC

41 points

0 comments · 11 min read · LW link

(arxiv.org)

Checking public figures on whether they “answered the question” quick analysis from Harris/Trump debate, and a proposal

david reinstein

11 Sep 2024 20:25 UTC

7 points

4 comments · 1 min read · LW link

(open.substack.com)

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

Corin Katzke, Julius, andrewz and Dan H

11 Sep 2024 19:14 UTC

5 points

1 comment · 5 min read · LW link

(newsletter.safe.ai)

Refactoring cryonics as structural brain preservation

Andy_McKenzie

11 Sep 2024 18:36 UTC

102 points

14 comments · 3 min read · LW link

[Question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal

doomyeser

11 Sep 2024 18:07 UTC

9 points

9 comments · 3 min read · LW link

How to discover the nature of sentience, and ethics

Gustavo Ramires

11 Sep 2024 17:22 UTC

−2 points

4 comments · 5 min read · LW link

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities

c.trout

11 Sep 2024 15:09 UTC

24 points

2 comments · 3 min read · LW link

Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions

James Stephen Brown

11 Sep 2024 9:53 UTC

5 points

0 comments · 8 min read · LW link

(nonzerosum.games)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Andrew_Critch

11 Sep 2024 4:41 UTC

53 points

11 comments · 3 min read · LW link

A necessary Membrane formalism feature

ThomasCederborg

10 Sep 2024 21:33 UTC

20 points

6 comments · 11 min read · LW link

Formalizing the Informal (event invite)

abramdemski

10 Sep 2024 19:22 UTC

42 points

0 comments · 1 min read · LW link

AI #80: Never Have I Ever

Zvi

10 Sep 2024 17:50 UTC

45 points

20 comments · 39 min read · LW link

(thezvi.wordpress.com)

The Best Lay Argument is not a Simple English Yud Essay

J Bostock

10 Sep 2024 17:34 UTC

247 points

15 comments · 5 min read · LW link

Economics Roundup #3

Zvi

10 Sep 2024 13:50 UTC

44 points

9 comments · 20 min read · LW link

(thezvi.wordpress.com)

Amplify is hiring! Work with us to support field-building initiatives through digital marketing

gergogaspar

10 Sep 2024 8:56 UTC

0 points

1 comment · 4 min read · LW link

What bootstraps intelligence?

invertedpassion

10 Sep 2024 7:11 UTC

2 points

2 comments · 1 min read · LW link

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)

Declan Molony

10 Sep 2024 5:54 UTC

16 points

12 comments · 2 min read · LW link

Simon DeDeo on Explore vs Exploit in Science

Elizabeth

10 Sep 2024 3:40 UTC

20 points

0 comments · 1 min read · LW link

(acesounderglass.com)

Virtue is a Vector

robotelvis

10 Sep 2024 3:02 UTC

9 points

1 comment · 9 min read · LW link

(messyprogress.substack.com)

MIT FutureTech are hiring for a Technical Associate role

peterslattery

9 Sep 2024 20:16 UTC

3 points

0 comments · 3 min read · LW link

AI forecasting bots incoming

Dan H and Mantas Mazeika

9 Sep 2024 19:14 UTC

29 points

44 comments · 4 min read · LW link

(www.safe.ai)

My takes on SB-1047

leogao

9 Sep 2024 18:38 UTC

151 points

8 comments · 4 min read · LW link

[Question] Building an Inexpensive, Aesthetic, Private Forum

Aaron Graifman

9 Sep 2024 17:10 UTC

13 points

15 comments · 1 min read · LW link

[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)

Fernando Avalos

9 Sep 2024 3:33 UTC

6 points

1 comment · 1 min read · LW link

(forum.effectivealtruism.org)

[Question] Has Anyone Here Consciously Changed Their Passions?

Spade

9 Sep 2024 1:36 UTC

11 points

12 comments · 1 min read · LW link

Pollsters Should Publish Question Translations

jefftk

8 Sep 2024 22:10 UTC

60 points

3 comments · 2 min read · LW link

(www.jefftk.com)

On Fables and Nuanced Charts

Niko_McCarty

8 Sep 2024 17:09 UTC

35 points

2 comments · 8 min read · LW link

(www.asimov.press)

Contra Yudkowsky on 2-4-6 Game Difficulty Explanations

Josh Hickman

8 Sep 2024 16:13 UTC

6 points

1 comment · 2 min read · LW link

(xn--2r8hmb.ws)

Attachment THEORY AND THE EFFECTS OF SECURE ATTACHMENT ON CHILD DEVELOPMENT

Mihriban Temel

8 Sep 2024 16:09 UTC

−8 points

0 comments · 9 min read · LW link

Fictional parasites very different from our own

Abhishaike Mahajan

8 Sep 2024 14:59 UTC

25 points

0 comments · 4 min read · LW link

(www.owlposting.com)

My Number 1 Epistemology Book Recommendation: Inventing Temperature

adamShimi

8 Sep 2024 14:30 UTC

116 points

18 comments · 3 min read · LW link

(epistemologicalfascinations.substack.com)

[Question] I want a good multi-LLM API-powered chatbot

rotatingpaguro

8 Sep 2024 9:40 UTC

10 points

3 comments · 1 min read · LW link

That Alien Message—The Animation

Writer

7 Sep 2024 14:53 UTC

144 points

9 comments · 8 min read · LW link

(youtu.be)

Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps

Daniel C

7 Sep 2024 10:04 UTC

17 points

18 comments · 2 min read · LW link

(x.com)

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl

7 Sep 2024 2:37 UTC

202 points

19 comments · 1 min read · LW link

Excerpts from “A Reader’s Manifesto”

Arjun Panickssery

6 Sep 2024 22:37 UTC

72 points

1 comment · 13 min read · LW link

(arjunpanickssery.substack.com)

Fun With CellxGene

sarahconstantin

6 Sep 2024 22:00 UTC

30 points

2 comments · 7 min read · LW link

(sarahconstantin.substack.com)

[Question] Is this voting system strategy proof?

Donald Hobson

6 Sep 2024 20:44 UTC

17 points

9 comments · 1 min read · LW link

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Diego Caples and rrenaud

6 Sep 2024 17:55 UTC

70 points

7 comments · 4 min read · LW link

Backdoors as an analogy for deceptive alignment

Jacob_Hilton and Mark Xu

6 Sep 2024 15:30 UTC

104 points

2 comments · 8 min read · LW link

(www.alignment.org)

A Cable Holder for 2 Cent

Johannes C. Mayer

6 Sep 2024 11:01 UTC

1 point

1 comment · 1 min read · LW link

Perhaps Try a Little Therapy, As a Treat?

segfault

6 Sep 2024 8:51 UTC

−178 points

61 comments · 16 min read · LW link

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs

Daniel Lee and StefanHex

6 Sep 2024 2:28 UTC

28 points

0 comments · 12 min read · LW link

Distinguish worst-case analysis from instrumental training-gaming

Olli Järviniemi and Buck

5 Sep 2024 19:13 UTC

37 points

0 comments · 5 min read · LW link

AI x Human Flourishing: Introducing the Cosmos Institute

Brendan McCord

5 Sep 2024 18:23 UTC

14 points

5 comments · 6 min read · LW link

(cosmosinstitute.substack.com)

What is SB 1047 for?

Raemon

5 Sep 2024 17:39 UTC

61 points

8 comments · 3 min read · LW link

instruction tuning and autoregressive distribution shift

nostalgebraist

5 Sep 2024 16:53 UTC

40 points

5 comments · 5 min read · LW link

Conflating value alignment and intent alignment is causing confusion

Seth Herd

5 Sep 2024 16:39 UTC

48 points

18 comments · 5 min read · LW link

A bet for Samo Burja

Nathan Helm-Burger

5 Sep 2024 16:01 UTC

13 points

2 comments · 2 min read · LW link