Recommendation: reports on the search for missing hiker Bill Ewasko

eukaryote, 31 Jul 2024 22:15 UTC
168 points
28 comments, 14 min read, LW link
(eukaryotewritesblog.com)

Economics101 predicted the failure of special card payments for refugees; 3 months later the whole of Germany wants to adopt it

Yanling Guo, 31 Jul 2024 21:09 UTC
2 points
1 comment, 2 min read, LW link

Ambiguity in Prediction Market Resolution is Still Harmful

aphyer, 31 Jul 2024 20:32 UTC
43 points
17 comments, 3 min read, LW link

AI labs can boost external safety research

Zach Stein-Perlman, 31 Jul 2024 19:30 UTC
31 points
1 comment, 1 min read, LW link

Women in AI Safety London Meetup

njg, 31 Jul 2024 18:13 UTC
1 point
0 comments, 1 min read, LW link

Constructing Neural Network Parameters with Downstream Trainability

ch271828n, 31 Jul 2024 18:13 UTC
1 point
0 comments, 1 min read, LW link
(github.com)

Want to work on US emerging tech policy? Consider the Horizon Fellowship.

Elika, 31 Jul 2024 18:12 UTC
4 points
0 comments, 1 min read, LW link

[Question] What are your cruxes for imprecise probabilities / decision rules?

Anthony DiGiovanni, 31 Jul 2024 15:42 UTC
36 points
29 comments, 1 min read, LW link

The new UK government’s stance on AI safety

Elliot Mckernon, 31 Jul 2024 15:23 UTC
17 points
0 comments, 4 min read, LW link

Solutions to problems with Bayesianism

B Jacobs, 31 Jul 2024 14:18 UTC
6 points
0 comments, 21 min read, LW link
(bobjacobs.substack.com)

Cat Sustenance Fortification

jefftk, 31 Jul 2024 2:30 UTC
14 points
7 comments, 1 min read, LW link
(www.jefftk.com)

Twitter thread on open-source AI

Richard_Ngo, 31 Jul 2024 0:26 UTC
33 points
6 comments, 2 min read, LW link
(x.com)

Twitter thread on AI takeover scenarios

Richard_Ngo, 31 Jul 2024 0:24 UTC
37 points
0 comments, 2 min read, LW link
(x.com)

Twitter thread on AI safety evals

Richard_Ngo, 31 Jul 2024 0:18 UTC
62 points
3 comments, 2 min read, LW link
(x.com)

Twitter thread on politics of AI safety

Richard_Ngo, 31 Jul 2024 0:00 UTC
35 points
2 comments, 1 min read, LW link
(x.com)

An ML paper on data stealing provides a construction for “gradient hacking”

David Scott Krueger (formerly: capybaralet), 30 Jul 2024 21:44 UTC
21 points
1 comment, 1 min read, LW link
(arxiv.org)

Open Source Automated Interpretability for Sparse Autoencoder Features

30 Jul 2024 21:11 UTC
67 points
1 comment, 13 min read, LW link
(blog.eleuther.ai)

Caterpillars and Philosophy

Zero Contradictions, 30 Jul 2024 20:54 UTC
2 points
0 comments, 1 min read, LW link
(thewaywardaxolotl.blogspot.com)

François Chollet on the limitations of LLMs in reasoning

2PuNCheeZ, 30 Jul 2024 20:04 UTC
1 point
1 comment, 2 min read, LW link
(x.com)

Against AI As An Existential Risk

Noah Birnbaum, 30 Jul 2024 19:10 UTC
6 points
13 comments, 1 min read, LW link
(irrationalitycommunity.substack.com)

[Question] Is objective morality self-defeating?

dialectica, 30 Jul 2024 18:23 UTC
−4 points
3 comments, 2 min read, LW link

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning

Tom Angsten, 30 Jul 2024 16:36 UTC
6 points
0 comments, 9 min read, LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

30 Jul 2024 16:22 UTC
192 points
43 comments, 12 min read, LW link

Investigating the Ability of LLMs to Recognize Their Own Writing

30 Jul 2024 15:41 UTC
32 points
0 comments, 15 min read, LW link

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?

scasper, 30 Jul 2024 14:57 UTC
25 points
0 comments, 4 min read, LW link

RTFB: California’s AB 3211

Zvi, 30 Jul 2024 13:10 UTC
62 points
2 comments, 11 min read, LW link
(thezvi.wordpress.com)

If You Can Climb Up, You Can Climb Down

jefftk, 30 Jul 2024 0:00 UTC
34 points
9 comments, 1 min read, LW link
(www.jefftk.com)

What is Morality?

Zero Contradictions, 29 Jul 2024 19:19 UTC
−1 points
0 comments, 1 min read, LW link
(thewaywardaxolotl.blogspot.com)

Arch-anarchism and immortality

Peter lawless, 29 Jul 2024 18:10 UTC
−5 points
1 comment, 2 min read, LW link

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy; Plus, Safety Engineering

29 Jul 2024 17:50 UTC
17 points
1 comment, 6 min read, LW link
(newsletter.safe.ai)

New Blog Post Against AI Doom

Noah Birnbaum, 29 Jul 2024 17:21 UTC
1 point
5 comments, 1 min read, LW link
(substack.com)

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan, 29 Jul 2024 14:50 UTC
9 points
3 comments, 1 min read, LW link

[Question] How does tokenization influence prompting?

Boris Kashirin, 29 Jul 2024 10:28 UTC
9 points
4 comments, 1 min read, LW link

Understanding Positional Features in Layer 0 SAEs

29 Jul 2024 9:36 UTC
43 points
0 comments, 5 min read, LW link

Prediction Markets Explained

Benjamin_Sturisky, 29 Jul 2024 8:02 UTC
1 point
0 comments, 9 min read, LW link

San Francisco ACX Meetup “First Saturday”

Nate Sternberg, 29 Jul 2024 6:11 UTC
3 points
2 comments, 1 min read, LW link

Relativity Theory for What the Future ‘You’ Is and Isn’t

FlorianH, 29 Jul 2024 2:01 UTC
7 points
48 comments, 4 min read, LW link

Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought

cleanwhiteroom, 28 Jul 2024 19:55 UTC
2 points
2 comments, 2 min read, LW link

Making Beliefs Pay Rent

28 Jul 2024 17:59 UTC
7 points
2 comments, 1 min read, LW link

This is already your second chance

Malmesbury, 28 Jul 2024 17:13 UTC
174 points
13 comments, 8 min read, LW link

[Question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?

kaler, 28 Jul 2024 12:23 UTC
10 points
14 comments, 1 min read, LW link

Family and Society

Zero Contradictions, 28 Jul 2024 7:05 UTC
1 point
0 comments, 1 min read, LW link
(thewaywardaxolotl.blogspot.com)

[Question] What is AI Safety’s line of retreat?

Remmelt, 28 Jul 2024 5:43 UTC
12 points
12 comments, 1 min read, LW link

AXRP Episode 34 - AI Evaluations with Beth Barnes

DanielFilan, 28 Jul 2024 3:30 UTC
23 points
0 comments, 69 min read, LW link

Rats, Back a Candidate

Blake, 28 Jul 2024 3:19 UTC
−40 points
19 comments, 1 min read, LW link

AI existential risk probabilities are too unreliable to inform policy

Oleg Trott, 28 Jul 2024 0:59 UTC
18 points
5 comments, 1 min read, LW link
(www.aisnakeoil.com)

Idle Speculations on Pipeline Parallelism

DaemonicSigil, 27 Jul 2024 22:40 UTC
1 point
0 comments, 4 min read, LW link
(pbement.com)

Re: Anthropic’s suggested SB-1047 amendments

RobertM, 27 Jul 2024 22:32 UTC
87 points
13 comments, 9 min read, LW link
(www.documentcloud.org)

The problem with psychology is that it has no theory.

Nicholas D., 27 Jul 2024 19:36 UTC
2 points
7 comments, 4 min read, LW link
(nicholasdecker.substack.com)

Bryan Johnson and a search for healthy longevity

NancyLebovitz, 27 Jul 2024 15:28 UTC
18 points
17 comments, 1 min read, LW link