0 Motivation Mapping through Information Theory

P. João 16 Dec 2024 23:17 UTC

9 points

0 comments, 28 min read, LW link

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

Caspar Oesterheld, Ethan Perez and Chi Nguyen

16 Dec 2024 22:42 UTC

49 points

1 comment, 2 min read, LW link

(arxiv.org)

A practical guide to tiling the universe with hedonium

Vittu Perkele 16 Dec 2024 21:25 UTC

−9 points

1 comment, 1 min read, LW link

(perkeleperusing.substack.com)

AI Safety Seed Funding Network—Join as a Donor or Investor

Alexandra Bos 16 Dec 2024 19:30 UTC

30 points

0 comments, 1 min read, LW link

Is this a better way to do matchmaking?

Chipmonk 16 Dec 2024 19:06 UTC

9 points

4 comments, 1 min read, LW link

I read every major AI lab’s safety plan so you don’t have to

sarahhw 16 Dec 2024 18:51 UTC

20 points

0 comments, 12 min read, LW link

(longerramblings.substack.com)

Grokking revisited: reverse engineering grokking modulo addition in LSTM

Nikita Khomich and Danik

16 Dec 2024 18:48 UTC

4 points

0 comments, 6 min read, LW link

Progress links and short notes, 2024-12-16

jasoncrawford 16 Dec 2024 17:24 UTC

7 points

0 comments, 2 min read, LW link

(newsletter.rootsofprogress.org)

Effective Altruism FAQ

omnizoid 16 Dec 2024 16:27 UTC

0 points

7 comments, 12 min read, LW link

Variably compressibly studies are fun

dkl9 16 Dec 2024 16:00 UTC

0 points

0 comments, 2 min read, LW link

(dkl9.net)

AIs Will Increasingly Attempt Shenanigans

Zvi 16 Dec 2024 15:20 UTC

114 points

2 comments, 26 min read, LW link

(thezvi.wordpress.com)

Testing which LLM architectures can do hidden serial reasoning

Filip Sondej 16 Dec 2024 13:48 UTC

81 points

9 comments, 4 min read, LW link

NeuroAI for AI safety: A Differential Path

nz and Patrick Mineault

16 Dec 2024 13:17 UTC

14 points

0 comments, 7 min read, LW link

(arxiv.org)

Circling as practice for “just be yourself”

Kaj_Sotala 16 Dec 2024 7:40 UTC

86 points

5 comments, 4 min read, LW link

(kajsotala.fi)

Reanalyzing the 2023 Expert Survey on Progress in AI

AI Impacts 16 Dec 2024 6:10 UTC

8 points

0 comments, 1 min read, LW link

(blog.aiimpacts.org)

Ideas for benchmarking LLM creativity

gwern 16 Dec 2024 5:18 UTC

57 points

11 comments, 1 min read, LW link

(gwern.net)

Comparing the AirFanta 3Pro to the Coway AP-1512

jefftk 16 Dec 2024 1:40 UTC

13 points

0 comments, 1 min read, LW link

(www.jefftk.com)

[Question] are IQ tests a good measure of intelligence?

KvmanThinking 15 Dec 2024 23:06 UTC

0 points

5 comments, 1 min read, LW link

Madison Secular Solstice

svfritz 15 Dec 2024 21:52 UTC

1 point

0 comments, 1 min read, LW link

[Question] Is AI alignment a purely functional property?

Roko 15 Dec 2024 21:42 UTC

13 points

8 comments, 1 min read, LW link

[Question] How counterfactual are logical counterfactuals?

Donald Hobson 15 Dec 2024 21:16 UTC

11 points

10 comments, 1 min read, LW link

Debunking the myth of safe AI

henophilia 15 Dec 2024 17:44 UTC

−11 points

8 comments, 1 min read, LW link

(henophilia.substack.com)

Introducing Avatarism: A Rational Framework for Building actual Heaven

ratiba ro 15 Dec 2024 17:17 UTC

2 points

2 comments, 2 min read, LW link

A Public Choice Take on Effective Altruism

vaishnav92 15 Dec 2024 16:58 UTC

9 points

4 comments, 3 min read, LW link

(www.optimaloutliers.com)

World Models I’m Currently Building

temporary 15 Dec 2024 16:29 UTC

5 points

1 comment, 1 min read, LW link

(samuelshadrach.com)

Dress Up For Secular Solstice

Gordon H.S. 15 Dec 2024 16:28 UTC

33 points

13 comments, 7 min read, LW link

Remap your caps lock key

bilalchughtai 15 Dec 2024 14:03 UTC

84 points

18 comments, 1 min read, LW link

Effective Evil’s AI Misalignment Plan

lsusr 15 Dec 2024 7:39 UTC

81 points

9 comments, 3 min read, LW link

Write Good Enough Code, Quickly

Oliver Daniels 15 Dec 2024 4:45 UTC

19 points

10 comments, 8 min read, LW link

How to Edit an Essay into a Solstice Speech?

Czynski 15 Dec 2024 4:30 UTC

5 points

1 comment, 1 min read, LW link

(thepdv.wordpress.com)

How Your Physiology Affects the Mind’s Projection Fallacy

YanLyutnev 14 Dec 2024 21:10 UTC

0 points

0 comments, 6 min read, LW link

Introducing the Evidence Color Wheel

Larry Lee 14 Dec 2024 16:08 UTC

6 points

0 comments, 3 min read, LW link

An Illustrated Summary of “Robust Agents Learn Causal World Model”

Dalcy 14 Dec 2024 15:02 UTC

63 points

2 comments, 10 min read, LW link

Best-of-N Jailbreaking

John Hughes, saraprice, Aengus Lynch, Rylan Schaeffer, Fazl, Henry Sleight, Ethan Perez and mrinank_sharma

14 Dec 2024 4:58 UTC

78 points

5 comments, 2 min read, LW link

(arxiv.org)

D&D.Sci Dungeonbuilding: the Dungeon Tournament

aphyer 14 Dec 2024 4:30 UTC

49 points

16 comments, 3 min read, LW link

Creating Interpretable Latent Spaces with Gradient Routing

Jacob G-W 14 Dec 2024 4:00 UTC

26 points

6 comments, 2 min read, LW link

(jacobgw.com)

Probability of death by suicide by a 26 year old

John Wiseman 14 Dec 2024 3:33 UTC

−25 points

4 comments, 1 min read, LW link

Matryoshka Sparse Autoencoders

Noa Nabeshima 14 Dec 2024 2:52 UTC

91 points

15 comments, 11 min read, LW link

[Question] What is MIRI currently doing?

Roko 14 Dec 2024 2:39 UTC

32 points

14 comments, 1 min read, LW link

The o1 System Card Is Not About o1

Zvi 13 Dec 2024 20:30 UTC

116 points

5 comments, 16 min read, LW link

(thezvi.wordpress.com)

Arch-anarchy and The Fable of the Dragon-Tyrant

Peter lawless 13 Dec 2024 20:15 UTC

−10 points

0 comments, 1 min read, LW link

Communications in Hard Mode (My new job at MIRI)

tanagrabeast 13 Dec 2024 20:13 UTC

202 points

25 comments, 5 min read, LW link

First Thoughts on Detachmentism

Jacob Peterson 13 Dec 2024 1:19 UTC

−11 points

5 comments, 9 min read, LW link

How to Build Heaven: A Constrained Boltzmann Brain Generator

High Tides 13 Dec 2024 1:04 UTC

−8 points

3 comments, 5 min read, LW link

Representing Irrationality in Game Theory

Larry Lee 13 Dec 2024 0:50 UTC

−1 points

3 comments, 11 min read, LW link

“Charity” as a conflationary alliance term

Jan_Kulveit 12 Dec 2024 21:49 UTC

34 points

2 comments, 5 min read, LW link

Just one more exposure bro

Chipmonk 12 Dec 2024 21:37 UTC

51 points

6 comments, 2 min read, LW link

(chrislakin.blog)

The Dangers of Mirrored Life

Niko_McCarty and fin

12 Dec 2024 20:58 UTC

119 points

7 comments, 29 min read, LW link

(www.asimov.press)

Effective Networking as Sending Hard to Fake Signals

vaishnav92 12 Dec 2024 20:32 UTC

25 points

2 comments, 7 min read, LW link

(www.optimaloutliers.com)

Mini PAPR Review

jefftk 12 Dec 2024 19:10 UTC

10 points

0 comments, 2 min read, LW link

(www.jefftk.com)