Excerpts from “A Reader’s Manifesto”

Arjun Panickssery 6 Sep 2024 22:37 UTC

72 points

1 comment · 13 min read · LW link

(arjunpanickssery.substack.com)

Fun With CellxGene

sarahconstantin 6 Sep 2024 22:00 UTC

30 points

2 comments · 7 min read · LW link

(sarahconstantin.substack.com)

[Question] Is this voting system strategy proof?

Donald Hobson 6 Sep 2024 20:44 UTC

17 points

9 comments · 1 min read · LW link

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Diego Caples and rrenaud

6 Sep 2024 17:55 UTC

70 points

7 comments · 4 min read · LW link

Backdoors as an analogy for deceptive alignment

Jacob_Hilton and Mark Xu

6 Sep 2024 15:30 UTC

104 points

2 comments · 8 min read · LW link

(www.alignment.org)

A Cable Holder for 2 Cent

Johannes C. Mayer 6 Sep 2024 11:01 UTC

1 point

1 comment · 1 min read · LW link

Perhaps Try a Little Therapy, As a Treat?

segfault 6 Sep 2024 8:51 UTC

−178 points

61 comments · 16 min read · LW link

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs

Daniel Lee and StefanHex

6 Sep 2024 2:28 UTC

28 points

0 comments · 12 min read · LW link

Distinguish worst-case analysis from instrumental training-gaming

Olli Järviniemi and Buck

5 Sep 2024 19:13 UTC

37 points

0 comments · 5 min read · LW link

AI x Human Flourishing: Introducing the Cosmos Institute

Brendan McCord 5 Sep 2024 18:23 UTC

14 points

5 comments · 6 min read · LW link

(cosmosinstitute.substack.com)

What is SB 1047 for?

Raemon 5 Sep 2024 17:39 UTC

61 points

8 comments · 3 min read · LW link

instruction tuning and autoregressive distribution shift

nostalgebraist 5 Sep 2024 16:53 UTC

40 points

5 comments · 5 min read · LW link

Conflating value alignment and intent alignment is causing confusion

Seth Herd 5 Sep 2024 16:39 UTC

48 points

18 comments · 5 min read · LW link

A bet for Samo Burja

Nathan Helm-Burger 5 Sep 2024 16:01 UTC

13 points

2 comments · 2 min read · LW link

Universal basic income isn’t always AGI-proof

Kevin Kohler 5 Sep 2024 15:39 UTC

5 points

3 comments · 7 min read · LW link

(machinocene.substack.com)

Why Reflective Stability is Important

Johannes C. Mayer 5 Sep 2024 15:28 UTC

19 points

2 comments · 1 min read · LW link

Why Swiss watches and Taylor Swift are AGI-proof

Kevin Kohler 5 Sep 2024 13:23 UTC

17 points

11 comments · 6 min read · LW link

(machinocene.substack.com)

Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?

Alexander de Vries 5 Sep 2024 10:23 UTC

7 points

20 comments · 10 min read · LW link

(2ndhandecon.substack.com)

What program structures enable efficient induction?

Daniel C 5 Sep 2024 10:12 UTC

21 points

5 comments · 3 min read · LW link

How to Fake Decryption

ohmurphy 5 Sep 2024 9:18 UTC

12 points

0 comments · 4 min read · LW link

(ohmurphy.substack.com)

We Should Try to Directly Measure the Value of Scientific Papers

ohmurphy 5 Sep 2024 9:08 UTC

1 point

0 comments · 5 min read · LW link

(ohmurphy.substack.com)

on Science Beakers and DDT

bhauth 5 Sep 2024 3:21 UTC

23 points

13 comments · 9 min read · LW link

(bhauth.com)

Massive Activations and why <bos> is important in Tokenized SAE Unigrams

Louka Ewington-Pitsos 5 Sep 2024 2:19 UTC

1 point

0 comments · 3 min read · LW link

The Forging of the Great Minds: An Unfinished Tale

Aryeh Englander 5 Sep 2024 0:58 UTC

−3 points

0 comments · 5 min read · LW link

The Chatbot of Babble

Aryeh Englander 5 Sep 2024 0:56 UTC

−3 points

0 comments · 7 min read · LW link

[Question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?

Double 5 Sep 2024 0:35 UTC

8 points

9 comments · 1 min read · LW link

Executable philosophy as a failed totalizing meta-worldview

jessicata 4 Sep 2024 22:50 UTC

93 points

40 comments · 4 min read · LW link

(unstableontology.com)

Against Explosive Growth

c.trout 4 Sep 2024 21:45 UTC

14 points

1 comment · 5 min read · LW link

The Fragility of Life Hypothesis and the Evolution of Cooperation

KristianRonn 4 Sep 2024 21:04 UTC

50 points

6 comments · 11 min read · LW link

Emotion-Informed Valuation Mechanism for Improved AI Alignment in Large Language Models

Javier Marin Valenzuela 4 Sep 2024 17:00 UTC

2 points

4 comments · 6 min read · LW link

What happens if you present 500 people with an argument that AI is risky?

KatjaGrace and Nathan Young

4 Sep 2024 16:40 UTC

102 points

7 comments · 3 min read · LW link

(blog.aiimpacts.org)

Automating LLM Auditing with Developmental Interpretability

htlou and evhub

4 Sep 2024 15:50 UTC

17 points

0 comments · 3 min read · LW link

Michael Dickens’ Caffeine Tolerance Research

niplav 4 Sep 2024 15:41 UTC

46 points

3 comments · 2 min read · LW link

(mdickens.me)

[Question] Are UV-C Air purifiers so useful?

SebastianG 4 Sep 2024 14:16 UTC

9 points

0 comments · 1 min read · LW link

AI and the Technological Richter Scale

Zvi 4 Sep 2024 14:00 UTC

48 points

8 comments · 13 min read · LW link

(thezvi.wordpress.com)

[Question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet) 4 Sep 2024 12:40 UTC

19 points

7 comments · 1 min read · LW link

A Comparison Between The Pragmatosphere And Less Wrong

Zero Contradictions 4 Sep 2024 9:39 UTC

−19 points

10 comments · 2 min read · LW link

(zerocontradictions.net)

Announcing the Ultimate Jailbreaking Championship

InnerHufflepuff 4 Sep 2024 0:35 UTC

15 points

1 comment · 1 min read · LW link

AI Safety at the Frontier: Paper Highlights, August ’24

gasteigerjo 3 Sep 2024 19:17 UTC

28 points

0 comments · 6 min read · LW link

(aisafetyfrontier.substack.com)

The Checklist: What Succeeding at AI Safety Will Involve

Sam Bowman 3 Sep 2024 18:18 UTC

142 points

49 comments · 22 min read · LW link

(sleepinyourhat.github.io)

Democracy beyond majoritarianism

Arturo Macias 3 Sep 2024 15:10 UTC

5 points

2 comments · 4 min read · LW link

On the UBI Paper

Zvi 3 Sep 2024 14:50 UTC

57 points

6 comments · 19 min read · LW link

(thezvi.wordpress.com)

An Opinionated Look at Inference Rules

Gianluca Calcagni 3 Sep 2024 13:32 UTC

−5 points

2 comments · 13 min read · LW link

Announcing the PIBBSS Symposium ’24!

DusanDNesic and clem_acs

3 Sep 2024 11:19 UTC

19 points

0 comments · 3 min read · LW link

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach

Ben Smith 3 Sep 2024 5:28 UTC

16 points

2 comments · 1 min read · LW link

How I got 4.2M YouTube views without making a single video

Closed Limelike Curves 3 Sep 2024 3:52 UTC

379 points

36 comments · 1 min read · LW link

Duped: AI and the Making of a Global Suicide Cult

izzyness 2 Sep 2024 18:51 UTC

−8 points

0 comments · 1 min read · LW link

A gentle introduction to sparse autoencoders

Nick Jiang 2 Sep 2024 18:11 UTC

9 points

0 comments · 6 min read · LW link

What makes math problems hard for reinforcement learning: a case study

Anibal, Bartek, Sergei, Shehper and Piotr 2 Sep 2024 18:11 UTC

1 point

0 comments · 2 min read · LW link

(arxiv.org)

Survey: How Do Elite Chinese Students Feel About the Risks of AI?

Nick Corvino 2 Sep 2024 18:11 UTC

141 points

13 comments · 10 min read · LW link