Linkpost: Surely you can be serious

kave

18 Jul 2024 22:18 UTC

62 points

8 comments · 1 min read · LW link

(www.experimental-history.com)

My experience applying to MATS 6.0

mic

18 Jul 2024 19:02 UTC

16 points

3 comments · 5 min read · LW link

[Question] What are the actual arguments in favor of computationalism as a theory of identity?

sunwillrise

18 Jul 2024 18:44 UTC

12 points

26 comments · 5 min read · LW link

Yet Another Critique of “Luxury Beliefs”

ymeskhout

18 Jul 2024 18:37 UTC

6 points

10 comments · 9 min read · LW link

(www.ymeskhout.com)

[Interim research report] Evaluating the Goal-Directedness of Language Models

Rauno Arike, Elizabeth Donoway and Marius Hobbhahn

18 Jul 2024 18:19 UTC

39 points

4 comments · 11 min read · LW link

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

Karolis Jucys, george_adams and Sonia Joseph

18 Jul 2024 17:02 UTC

9 points

0 comments · 1 min read · LW link

(arxiv.org)

Activation Engineering Theories of Impact

kubanetics

18 Jul 2024 16:44 UTC

6 points

1 comment · 2 min read · LW link

[Question] Me & My Clone

SimonBaars

18 Jul 2024 16:25 UTC

27 points

22 comments · 1 min read · LW link

AI #73: Openly Evil AI

Zvi

18 Jul 2024 14:40 UTC

89 points

20 comments · 52 min read · LW link

(thezvi.wordpress.com)

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

Lee Sharkey, Lucius Bushnaq, Dan Braun, StefanHex and Nicholas Goldowsky-Dill

18 Jul 2024 14:15 UTC

119 points

18 comments · 18 min read · LW link

SAEs (usually) Transfer Between Base and Chat Models

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

18 Jul 2024 10:29 UTC

66 points

0 comments · 10 min read · LW link

[Question] Should we exclude alignment research from LLM training datasets?

Ben Millwood

18 Jul 2024 10:27 UTC

3 points

5 comments · 1 min read · LW link

Keeping content out of LLM training datasets

Ben Millwood

18 Jul 2024 10:27 UTC

3 points

0 comments · 5 min read · LW link

The Assassination of Trump’s Ear is Evidence for Time-Travel

elv

18 Jul 2024 7:01 UTC

−9 points

5 comments · 5 min read · LW link

Friendship is transactional, unconditional friendship is insurance

Ruby

17 Jul 2024 22:52 UTC

67 points

24 comments · 2 min read · LW link

D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset]

abstractapplic

17 Jul 2024 22:34 UTC

17 points

5 comments · 5 min read · LW link

Optimistic Assumptions, Longterm Planning, and “Cope”

Raemon

17 Jul 2024 22:14 UTC

210 points

46 comments · 7 min read · LW link

Baking vs Patissing vs Cooking, the HPS explanation

adamShimi

17 Jul 2024 20:29 UTC

30 points

16 comments · 3 min read · LW link

(epistemologicalfascinations.substack.com)

Launching the Respiratory Outlook 2024/25 Forecasting Series

ChristianWilliams

17 Jul 2024 19:51 UTC

5 points

0 comments · 1 min read · LW link

(www.metaculus.com)

What are you getting paid in?

Austin Chen

17 Jul 2024 19:23 UTC

85 points

14 comments · 4 min read · LW link

(www.approachwithalacrity.com)

Individually incentivized safe Pareto improvements in open-source bargaining

Nicolas Macé, Anthony DiGiovanni and JesseClifton

17 Jul 2024 18:26 UTC

41 points

2 comments · 17 min read · LW link

Profit and Value

kwang

17 Jul 2024 18:06 UTC

22 points

3 comments · 6 min read · LW link

(open.substack.com)

So You’ve Learned To Teleport by Tom Scott

landscape_kiwi

17 Jul 2024 18:04 UTC

4 points

0 comments · 1 min read · LW link

(www.youtube.com)

How does generalized accessibility compare to targeted accessibility?

ErioirE

17 Jul 2024 17:07 UTC

3 points

0 comments · 2 min read · LW link

Housing Roundup #9: Restricting Supply

Zvi

17 Jul 2024 12:50 UTC

25 points

8 comments · 44 min read · LW link

(thezvi.wordpress.com)

We ran an AI safety conference in Tokyo. It went really well. Come next year!

Blaine

17 Jul 2024 6:55 UTC

45 points

1 comment · 6 min read · LW link

Agency in Politics

Martin Sustrik

17 Jul 2024 5:30 UTC

35 points

2 comments · 3 min read · LW link

(250bpm.substack.com)

Arrakis—A toolkit to conduct, track and visualize mechanistic interpretability experiments.

Yash Srivastava

17 Jul 2024 2:02 UTC

3 points

2 comments · 5 min read · LW link

Announcing Open Philanthropy’s AI governance and policy RFP

Julian Hazell

17 Jul 2024 2:02 UTC

25 points

0 comments · 1 min read · LW link

(www.openphilanthropy.org)

Turning Your Back On Traffic

jefftk

17 Jul 2024 1:00 UTC

37 points

7 comments · 1 min read · LW link

(www.jefftk.com)

[Question] Opinions on Eureka Labs

jmh

17 Jul 2024 0:16 UTC

6 points

2 comments · 1 min read · LW link

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Rubi J. Hudson

16 Jul 2024 22:44 UTC

44 points

27 comments · 5 min read · LW link

Multiplex Gene Editing: Where Are We Now?

sarahconstantin

16 Jul 2024 20:50 UTC

71 points

6 comments · 7 min read · LW link

(sarahconstantin.substack.com)

Recursion in AI is scary. But let’s talk solutions.

Oleg Trott

16 Jul 2024 20:34 UTC

3 points

10 comments · 2 min read · LW link

How to wash your hands precisely and thoroughly

dkl9

16 Jul 2024 18:29 UTC

12 points

0 comments · 1 min read · LW link

(dkl9.net)

Francois Chollet inadvertently limits his claim on ARC-AGI

Noosphere89

16 Jul 2024 17:32 UTC

12 points

3 comments · 1 min read · LW link

(x.com)

Fully booked—LessWrong Community weekend

jt

16 Jul 2024 17:15 UTC

20 points

2 comments · 1 min read · LW link

Boundless Emotion

GG10

16 Jul 2024 16:36 UTC

3 points

0 comments · 3 min read · LW link

Mech Interp Lacks Good Paradigms

Daniel Tan

16 Jul 2024 15:47 UTC

38 points

0 comments · 14 min read · LW link

DM Parenting

Shoshannah Tekofsky

16 Jul 2024 8:50 UTC

49 points

4 comments · 5 min read · LW link

(kidquest.substack.com)

Apply now: Get “unstuck” with the New IFS Self-Care Fellowship Program

Inga G.

16 Jul 2024 8:18 UTC

10 points

3 comments · 1 min read · LW link

Why the Best Writers Endure Isolation

Declan Molony

16 Jul 2024 5:58 UTC

49 points

6 comments · 2 min read · LW link

[Research log] The board of Alphabet would stop DeepMind to save the world

Lucie Philippon

16 Jul 2024 4:59 UTC

6 points

0 comments · 4 min read · LW link

Towards more cooperative AI safety strategies

Richard_Ngo

16 Jul 2024 4:36 UTC

208 points

133 comments · 4 min read · LW link

Why People in Poverty Make Bad Decisions

James Stephen Brown

15 Jul 2024 23:40 UTC

5 points

8 comments · 1 min read · LW link

(nonzerosum.games)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution

Jeffrey Heninger

15 Jul 2024 21:30 UTC

29 points

1 comment · 4 min read · LW link

(blog.aiimpacts.org)

MIRI’s July 2024 newsletter

Harlan

15 Jul 2024 21:28 UTC

25 points

2 comments · 1 min read · LW link

(intelligence.org)

How (and why) to get tested for CMV

Metacelsus

15 Jul 2024 20:06 UTC

17 points

0 comments · 1 min read · LW link

(denovo.substack.com)

A Better Hyperstition (for AI readers)

Yeshua God

15 Jul 2024 19:35 UTC

−20 points

0 comments · 119 min read · LW link

I found >800 orthogonal “write code” steering vectors

Jacob G-W and TurnTrout

15 Jul 2024 19:06 UTC

99 points

19 comments · 7 min read · LW link

(jacobgw.com)