MIT FutureTech are hiring for a Technical Associate role

peterslattery 9 Sep 2024 20:16 UTC

3 points

0 comments 3 min read LW link

AI forecasting bots incoming

Dan H and Mantas Mazeika

9 Sep 2024 19:14 UTC

29 points

44 comments 4 min read LW link

(www.safe.ai)

My takes on SB-1047

leogao 9 Sep 2024 18:38 UTC

151 points

8 comments 4 min read LW link

[Question] Building an Inexpensive, Aesthetic, Private Forum

Aaron Graifman 9 Sep 2024 17:10 UTC

13 points

15 comments 1 min read LW link

[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)

Fernando Avalos 9 Sep 2024 3:33 UTC

6 points

1 comment 1 min read LW link

(forum.effectivealtruism.org)

[Question] Has Anyone Here Consciously Changed Their Passions?

Spade 9 Sep 2024 1:36 UTC

11 points

12 comments 1 min read LW link

Pollsters Should Publish Question Translations

jefftk 8 Sep 2024 22:10 UTC

60 points

3 comments 2 min read LW link

(www.jefftk.com)

On Fables and Nuanced Charts

Niko_McCarty 8 Sep 2024 17:09 UTC

35 points

2 comments 8 min read LW link

(www.asimov.press)

Contra Yudkowsky on 2-4-6 Game Difficulty Explanations

Josh Hickman 8 Sep 2024 16:13 UTC

6 points

1 comment 2 min read LW link

(xn--2r8hmb.ws)

Attachment THEORY AND THE EFFECTS OF SECURE ATTACHMENT ON CHILD DEVELOPMENT

Mihriban Temel 8 Sep 2024 16:09 UTC

−8 points

0 comments 9 min read LW link

Fictional parasites very different from our own

Abhishaike Mahajan 8 Sep 2024 14:59 UTC

25 points

0 comments 4 min read LW link

(www.owlposting.com)

My Number 1 Epistemology Book Recommendation: Inventing Temperature

adamShimi 8 Sep 2024 14:30 UTC

116 points

18 comments 3 min read LW link

(epistemologicalfascinations.substack.com)

[Question] I want a good multi-LLM API-powered chatbot

rotatingpaguro 8 Sep 2024 9:40 UTC

10 points

3 comments 1 min read LW link

That Alien Message—The Animation

Writer 7 Sep 2024 14:53 UTC

144 points

9 comments 8 min read LW link

(youtu.be)

Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps

Daniel C 7 Sep 2024 10:04 UTC

17 points

18 comments 2 min read LW link

(x.com)

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl 7 Sep 2024 2:37 UTC

202 points

19 comments 1 min read LW link

Excerpts from “A Reader’s Manifesto”

Arjun Panickssery 6 Sep 2024 22:37 UTC

72 points

1 comment 13 min read LW link

(arjunpanickssery.substack.com)

Fun With CellxGene

sarahconstantin 6 Sep 2024 22:00 UTC

30 points

2 comments 7 min read LW link

(sarahconstantin.substack.com)

[Question] Is this voting system strategy proof?

Donald Hobson 6 Sep 2024 20:44 UTC

17 points

9 comments 1 min read LW link

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Diego Caples and rrenaud

6 Sep 2024 17:55 UTC

70 points

7 comments 4 min read LW link

Backdoors as an analogy for deceptive alignment

Jacob_Hilton and Mark Xu

6 Sep 2024 15:30 UTC

104 points

2 comments 8 min read LW link

(www.alignment.org)

A Cable Holder for 2 Cent

Johannes C. Mayer 6 Sep 2024 11:01 UTC

1 point

1 comment 1 min read LW link

Perhaps Try a Little Therapy, As a Treat?

segfault 6 Sep 2024 8:51 UTC

−178 points

61 comments 16 min read LW link

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs

Daniel Lee and StefanHex

6 Sep 2024 2:28 UTC

28 points

0 comments 12 min read LW link

Distinguish worst-case analysis from instrumental training-gaming

Olli Järviniemi and Buck

5 Sep 2024 19:13 UTC

37 points

0 comments 5 min read LW link

AI x Human Flourishing: Introducing the Cosmos Institute

Brendan McCord 5 Sep 2024 18:23 UTC

14 points

5 comments 6 min read LW link

(cosmosinstitute.substack.com)

What is SB 1047 for?

Raemon 5 Sep 2024 17:39 UTC

61 points

8 comments 3 min read LW link

instruction tuning and autoregressive distribution shift

nostalgebraist 5 Sep 2024 16:53 UTC

40 points

5 comments 5 min read LW link

Conflating value alignment and intent alignment is causing confusion

Seth Herd 5 Sep 2024 16:39 UTC

48 points

18 comments 5 min read LW link

A bet for Samo Burja

Nathan Helm-Burger 5 Sep 2024 16:01 UTC

13 points

2 comments 2 min read LW link

Universal basic income isn’t always AGI-proof

Kevin Kohler 5 Sep 2024 15:39 UTC

5 points

3 comments 7 min read LW link

(machinocene.substack.com)

Why Reflective Stability is Important

Johannes C. Mayer 5 Sep 2024 15:28 UTC

19 points

2 comments 1 min read LW link

Why Swiss watches and Taylor Swift are AGI-proof

Kevin Kohler 5 Sep 2024 13:23 UTC

17 points

11 comments 6 min read LW link

(machinocene.substack.com)

Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?

Alexander de Vries 5 Sep 2024 10:23 UTC

7 points

20 comments 10 min read LW link

(2ndhandecon.substack.com)

What program structures enable efficient induction?

Daniel C 5 Sep 2024 10:12 UTC

21 points

5 comments 3 min read LW link

How to Fake Decryption

ohmurphy 5 Sep 2024 9:18 UTC

12 points

0 comments 4 min read LW link

(ohmurphy.substack.com)

We Should Try to Directly Measure the Value of Scientific Papers

ohmurphy 5 Sep 2024 9:08 UTC

1 point

0 comments 5 min read LW link

(ohmurphy.substack.com)

on Science Beakers and DDT

bhauth 5 Sep 2024 3:21 UTC

23 points

13 comments 9 min read LW link

(bhauth.com)

Massive Activations and why <bos> is important in Tokenized SAE Unigrams

Louka Ewington-Pitsos 5 Sep 2024 2:19 UTC

1 point

0 comments 3 min read LW link

The Forging of the Great Minds: An Unfinished Tale

Aryeh Englander 5 Sep 2024 0:58 UTC

−3 points

0 comments 5 min read LW link

The Chatbot of Babble

Aryeh Englander 5 Sep 2024 0:56 UTC

−3 points

0 comments 7 min read LW link

[Question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?

Double 5 Sep 2024 0:35 UTC

8 points

9 comments 1 min read LW link

Executable philosophy as a failed totalizing meta-worldview

jessicata 4 Sep 2024 22:50 UTC

93 points

40 comments 4 min read LW link

(unstableontology.com)

Against Explosive Growth

c.trout 4 Sep 2024 21:45 UTC

14 points

1 comment 5 min read LW link

The Fragility of Life Hypothesis and the Evolution of Cooperation

KristianRonn 4 Sep 2024 21:04 UTC

50 points

6 comments 11 min read LW link

Emotion-Informed Valuation Mechanism for Improved AI Alignment in Large Language Models

Javier Marin Valenzuela 4 Sep 2024 17:00 UTC

2 points

4 comments 6 min read LW link

What happens if you present 500 people with an argument that AI is risky?

KatjaGrace and Nathan Young

4 Sep 2024 16:40 UTC

102 points

7 comments 3 min read LW link

(blog.aiimpacts.org)

Automating LLM Auditing with Developmental Interpretability

htlou and evhub

4 Sep 2024 15:50 UTC

17 points

0 comments 3 min read LW link

Michael Dickens’ Caffeine Tolerance Research

niplav 4 Sep 2024 15:41 UTC

46 points

3 comments 2 min read LW link

(mdickens.me)

[Question] Are UV-C Air purifiers so useful?

JohnBuridan 4 Sep 2024 14:16 UTC

9 points

0 comments 1 min read LW link