Thoughts for and against an ASI figuring out ethics for itself

sweenesm 20 Feb 2024 23:40 UTC

6 points

10 comments · 3 min read · LW link

AI #51: Altman’s Ambition

Zvi 20 Feb 2024 19:50 UTC

83 points

5 comments · 38 min read · LW link

(thezvi.wordpress.com)

The Third Gemini

Zvi 20 Feb 2024 19:50 UTC

30 points

2 comments · 9 min read · LW link

(thezvi.wordpress.com)

Why does generalization work?

Martín Soto 20 Feb 2024 17:51 UTC

43 points

16 comments · 4 min read · LW link

ChatGPT refuses to accept a challenge where it would get shot between the eyes [game theory]

Bill Benzon 20 Feb 2024 16:55 UTC

4 points

6 comments · 4 min read · LW link

Inducing human-like biases in moral reasoning LMs

Artyom Karpov, Austin Meek, Bogdan Ionut Cirstea and SCho

20 Feb 2024 16:28 UTC

23 points

3 comments · 14 min read · LW link

Monthly Roundup #15: February 2024

Zvi 20 Feb 2024 13:10 UTC

22 points

7 comments · 32 min read · LW link

(thezvi.wordpress.com)

Selections From “The Trouble With Being Born”

Arjun Panickssery 20 Feb 2024 10:07 UTC

23 points

2 comments · 2 min read · LW link

(arjunpanickssery.substack.com)

Difficulty classes for alignment properties

Jozdien 20 Feb 2024 9:08 UTC

34 points

5 comments · 2 min read · LW link

Lessons from Failed Attempts to Model Sleeping Beauty Problem

Ape in the coat 20 Feb 2024 6:43 UTC

13 points

16 comments · 14 min read · LW link

flowing like water; hard like stone

lsusr and SilverFlame

20 Feb 2024 3:20 UTC

27 points

4 comments · 4 min read · LW link

Theism Isn’t So Crazy

omnizoid 20 Feb 2024 3:20 UTC

−31 points

11 comments · 19 min read · LW link

[Question] Getting started at distillations: can critique mine?

Joyee Chen 20 Feb 2024 0:49 UTC

2 points

0 comments · 1 min read · LW link

Auditing LMs with counterfactual search: a tool for control and ELK

Jacob Pfau 20 Feb 2024 0:02 UTC

28 points

6 comments · 10 min read · LW link

Rationalist Storytelling (French)

Camille Berger 19 Feb 2024 22:25 UTC

3 points

0 comments · 1 min read · LW link

Abs-E (or, speak only in the positive)

dkl9 19 Feb 2024 21:14 UTC

29 points

24 comments · 2 min read · LW link

(dkl9.net)

Retirement Accounts and Short Timelines

jefftk 19 Feb 2024 18:50 UTC

83 points

35 comments · 2 min read · LW link

(www.jefftk.com)

How Technical AI Safety Researchers Can Help Implement Punitive Damages to Mitigate Catastrophic AI Risk

Gabriel Weil 19 Feb 2024 18:00 UTC

18 points

0 comments · 4 min read · LW link

Protocol evaluations: good analogies vs control

Fabien Roger 19 Feb 2024 18:00 UTC

42 points

10 comments · 11 min read · LW link

When Should Copyright Get Shorter?

Maxwell Tabarrok 19 Feb 2024 16:03 UTC

11 points

14 comments · 4 min read · LW link

(www.maximum-progress.com)

Auto-matching hidden layers in Pytorch LLMs

chanind 19 Feb 2024 12:40 UTC

2 points

0 comments · 3 min read · LW link

I’d also take $7 trillion

bhauth 19 Feb 2024 3:31 UTC

45 points

12 comments · 10 min read · LW link

(www.bhauth.com)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19

viking_math 19 Feb 2024 1:14 UTC

62 points

28 comments · 14 min read · LW link

Solution to the two envelopes problem for moral weights

MichaelStJules 19 Feb 2024 0:15 UTC

9 points

1 comment · 1 min read · LW link

Conspiracy Investigation Done Right

ymeskhout 19 Feb 2024 0:09 UTC

24 points

0 comments · 6 min read · LW link

Scientific Method

Andrij “Androniq” Ghorbunov 18 Feb 2024 21:06 UTC

24 points

4 comments · 30 min read · LW link

[Question] Weighing reputational and moral consequences of leaving Russia or staying

spza 18 Feb 2024 19:36 UTC

29 points

24 comments · 1 min read · LW link

Things I’ve Grieved

Raemon 18 Feb 2024 19:32 UTC

125 points

6 comments · 2 min read · LW link

Senses of “knowing” a person

dkl9 18 Feb 2024 19:13 UTC

3 points

0 comments · 1 min read · LW link

(dkl9.net)

The Jolly Green Giant Chronicles [ChatGPT]

Bill Benzon 18 Feb 2024 17:28 UTC

4 points

0 comments · 8 min read · LW link

Intuition for 1 + 2 + 3 + … = −1/12

Shankar Sivarajan 18 Feb 2024 16:46 UTC

18 points

28 comments · 3 min read · LW link

No Clickbait—Misalignment Database

Kabir Kumar 18 Feb 2024 5:35 UTC

6 points

10 comments · 1 min read · LW link

Idea: NV⁻ Centers for Brain Interpretability

James Camacho 18 Feb 2024 5:28 UTC

6 points

1 comment · 3 min read · LW link

Celiacs don’t need to live in fear

Jarrah 18 Feb 2024 2:30 UTC

16 points

4 comments · 4 min read · LW link

“What if we could redesign society from scratch? The promise of charter cities.” [Rational Animations video]

Jackson Wagner 18 Feb 2024 0:57 UTC

40 points

7 comments · 1 min read · LW link

(www.youtube.com)

Evaluating Solar

jefftk 17 Feb 2024 21:50 UTC

26 points

5 comments · 2 min read · LW link

(www.jefftk.com)

Opinions survey 2 (with rationalism score at the end)

tailcalled 17 Feb 2024 12:03 UTC

2 points

11 comments · 1 min read · LW link

(docs.google.com)

Achieving AI Alignment through Deliberate Uncertainty in Multiagent Systems

Florian_Dietz 17 Feb 2024 8:45 UTC

4 points

0 comments · 13 min read · LW link

Communication, consciousness, and belief strength measures

Jakub Smékal 17 Feb 2024 5:45 UTC

1 point

0 comments · 3 min read · LW link

San Fernando Valley Rationality: February 22, 2024

Thomas Broadley 17 Feb 2024 1:58 UTC

3 points

0 comments · 1 min read · LW link

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo 17 Feb 2024 1:47 UTC

63 points

2 comments · 11 min read · LW link

Opinions survey (with rationalism score at the end)

tailcalled 17 Feb 2024 0:41 UTC

8 points

14 comments · 1 min read · LW link

(docs.google.com)

Phallocentricity in GPT-J’s bizarre stratified ontology

mwatkins 17 Feb 2024 0:16 UTC

50 points

37 comments · 9 min read · LW link

FUTARCHY NOW BABY

sapphire 17 Feb 2024 0:03 UTC

−1 points

7 comments · 1 min read · LW link

Making the “stance” explicit

NicholasKees 16 Feb 2024 23:57 UTC

23 points

3 comments · 2 min read · LW link

2023 Survey Results

Screwtape 16 Feb 2024 22:24 UTC

150 points

26 comments · 44 min read · LW link

Physics-based early warning signal shows that AMOC is on tipping course

Annapurna 16 Feb 2024 22:07 UTC

19 points

3 comments · 1 min read · LW link

(www.science.org)

Kingfisher Winter Tour 2024

jefftk 16 Feb 2024 21:40 UTC

8 points

0 comments · 1 min read · LW link

(www.jefftk.com)

The Pointer Resolution Problem

Jozdien 16 Feb 2024 21:25 UTC

41 points

20 comments · 3 min read · LW link

Every “Every Bay Area House Party” Bay Area House Party

Richard_Ngo 16 Feb 2024 18:53 UTC

179 points

6 comments · 4 min read · LW link