21 points

1 comment, 1 min read

(arxiv.org)

Open Source Automated Interpretability for Sparse Autoencoder Features

kh4dien, SrGonao, jacob_drori and Nora Belrose

30 Jul 2024 21:11 UTC

67 points

1 comment, 13 min read

(blog.eleuther.ai)

Caterpillars and Philosophy

Zero Contradictions, 30 Jul 2024 20:54 UTC

2 points

0 comments, 1 min read

(thewaywardaxolotl.blogspot.com)

François Chollet on the limitations of LLMs in reasoning

2PuNCheeZ, 30 Jul 2024 20:04 UTC

1 point

1 comment, 2 min read

(x.com)

Against AI As An Existential Risk

Noah Birnbaum, 30 Jul 2024 19:10 UTC

6 points

13 comments, 1 min read

(irrationalitycommunity.substack.com)

[Question] Is objective morality self-defeating?

dialectica, 30 Jul 2024 18:23 UTC

−4 points

3 comments, 2 min read

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten, 30 Jul 2024 16:36 UTC

6 points

0 comments, 9 min read

Self-Other Overlap: A Neglected Approach to AI Alignment

Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena, Cameron Berg and AE Studio

30 Jul 2024 16:22 UTC

193 points

43 comments, 12 min read

Investigating the Ability of LLMs to Recognize Their Own Writing

Christopher Ackerman and Nina Panickssery

30 Jul 2024 15:41 UTC

32 points

0 comments, 15 min read

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?

scasper, 30 Jul 2024 14:57 UTC

25 points

0 comments, 4 min read

RTFB: California’s AB 3211

Zvi, 30 Jul 2024 13:10 UTC

62 points

2 comments, 11 min read

(thezvi.wordpress.com)

If You Can Climb Up, You Can Climb Down

jefftk, 30 Jul 2024 0:00 UTC

34 points

9 comments, 1 min read

(www.jefftk.com)

What is Morality?

Zero Contradictions, 29 Jul 2024 19:19 UTC

−1 points

0 comments, 1 min read

(thewaywardaxolotl.blogspot.com)

Arch-anarchism and immortality

Peter lawless, 29 Jul 2024 18:10 UTC

−5 points

1 comment, 2 min read

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering

Corin Katzke, Alexa Pan, Julius and Dan H

29 Jul 2024 17:50 UTC

17 points

1 comment, 6 min read

(newsletter.safe.ai)

New Blog Post Against AI Doom

Noah Birnbaum, 29 Jul 2024 17:21 UTC

1 point

5 comments, 1 min read

(substack.com)

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan, 29 Jul 2024 14:50 UTC

9 points

3 comments, 1 min read

[Question] How tokenization influences prompting?

Boris Kashirin, 29 Jul 2024 10:28 UTC

9 points

4 comments, 1 min read

Understanding Positional Features in Layer 0 SAEs

bilalchughtai and Yeu-Tong Lau

29 Jul 2024 9:36 UTC

43 points

0 comments, 5 min read

Prediction Markets Explained

Benjamin_Sturisky, 29 Jul 2024 8:02 UTC

8 points

0 comments, 9 min read

San Francisco ACX Meetup “First Saturday”

Nate Sternberg, 29 Jul 2024 6:11 UTC

3 points

2 comments, 1 min read

Relativity Theory for What the Future ‘You’ Is and Isn’t

FlorianH, 29 Jul 2024 2:01 UTC

7 points

48 comments, 4 min read

Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought

cleanwhiteroom, 28 Jul 2024 19:55 UTC

2 points

2 comments, 2 min read

Making Beliefs Pay Rent

Screwtape and NoSignalNoNoise

28 Jul 2024 17:59 UTC

7 points

2 comments, 1 min read

This is already your second chance

Malmesbury, 28 Jul 2024 17:13 UTC

175 points

13 comments, 8 min read

[Question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?

kaler, 28 Jul 2024 12:23 UTC

10 points

14 comments, 1 min read

Family and Society

Zero Contradictions, 28 Jul 2024 7:05 UTC

1 point

0 comments, 1 min read

(thewaywardaxolotl.blogspot.com)

[Question] What is AI Safety’s line of retreat?

Remmelt, 28 Jul 2024 5:43 UTC

12 points

12 comments, 1 min read

AXRP Episode 34 - AI Evaluations with Beth Barnes

DanielFilan, 28 Jul 2024 3:30 UTC

23 points

0 comments, 69 min read

Rats, Back a Candidate

Blake, 28 Jul 2024 3:19 UTC

−40 points

19 comments, 1 min read

AI existential risk probabilities are too unreliable to inform policy

Oleg Trott, 28 Jul 2024 0:59 UTC

18 points

5 comments, 1 min read

(www.aisnakeoil.com)

Idle Speculations on Pipeline Parallelism

DaemonicSigil, 27 Jul 2024 22:40 UTC

1 point

0 comments, 4 min read

(pbement.com)

Re: Anthropic’s suggested SB-1047 amendments

RobertM, 27 Jul 2024 22:32 UTC

87 points

13 comments, 9 min read

(www.documentcloud.org)

The problem with psychology is that it has no theory.

Nicholas D., 27 Jul 2024 19:36 UTC

2 points

7 comments, 4 min read

(nicholasdecker.substack.com)

Bryan Johnson and a search for healthy longevity

NancyLebovitz, 27 Jul 2024 15:28 UTC

18 points

17 comments, 1 min read

What are matching markets?

ohmurphy, 27 Jul 2024 15:05 UTC

12 points

0 comments, 8 min read

(ohmurphy.substack.com)

Safety consultations for AI lab employees

Zach Stein-Perlman, 27 Jul 2024 15:00 UTC

181 points

4 comments, 1 min read

The Case Against UBI

Zero Contradictions, 27 Jul 2024 6:36 UTC

−1 points

2 comments, 2 min read

(thewaywardaxolotl.blogspot.com)

Unlocking Solutions—By Understanding Coordination Problems

James Stephen Brown, 27 Jul 2024 4:52 UTC

54 points

4 comments, 5 min read

(nonzerosum.games)

Utilitarianism and the replaceability of desires and attachments

MichaelStJules, 27 Jul 2024 1:57 UTC

5 points

2 comments, 1 min read

Inspired by: Failures in Kindness

X4vier, 27 Jul 2024 1:21 UTC

61 points

2 comments, 3 min read

My Experience Using Gamification

Wyatt S, 26 Jul 2024 23:06 UTC

13 points

4 comments, 4 min read

How the AI safety technical landscape has changed in the last year, according to some practitioners

tlevin, 26 Jul 2024 19:06 UTC

55 points

6 comments, 2 min read

A Visual Task that’s Hard for GPT-4o, but Doable for Primary Schoolers

Lennart Finke, 26 Jul 2024 17:51 UTC

25 points

4 comments, 2 min read

Unaligned AI is coming regardless.

verbalshadow, 26 Jul 2024 16:41 UTC

−15 points

3 comments, 2 min read

Index of rationalist groups in the Bay Area July 2024

Lucie Philippon, Czynski and Screwtape

26 Jul 2024 16:32 UTC

35 points

10 comments, 2 min read

End Single Family Zoning by Overturning Euclid V Ambler

Maxwell Tabarrok, 26 Jul 2024 14:08 UTC

32 points

1 comment, 7 min read

(www.maximum-progress.com)

Common Uses of “Acceptance”

Yi-Yang, 26 Jul 2024 11:18 UTC

9 points

5 comments, 24 min read

Universal Basic Income and Poverty

Eliezer Yudkowsky, 26 Jul 2024 7:23 UTC

289 points

135 comments, 9 min read

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication

johnswentworth and David Lorell

26 Jul 2024 0:33 UTC

93 points

1 comment, 13 min read