
Don’t want Goodhart? — Specify the damn variables

Yan Lyutnev, 21 Nov 2024 22:45 UTC

−3 points

2 comments, 5 min read

Don’t want Goodhart? — Specify the variables more

YanLyutnev, 21 Nov 2024 22:43 UTC

3 points

2 comments, 5 min read

Aligning AI Safety Projects with a Republican Administration

Deric Cheng, 21 Nov 2024 22:12 UTC

33 points

1 comment, 8 min read

Entropic strategy in Two Truths and a Lie

dkl9, 21 Nov 2024 22:03 UTC

4 points

2 comments, 1 min read

(dkl9.net)

The Three Warnings of the Zentradi

Trevor Hill-Hand, 21 Nov 2024 20:28 UTC

13 points

1 comment, 5 min read

[Question] Which things were you surprised to learn are not metaphors?

Eric Neyman, 21 Nov 2024 18:56 UTC

132 points

88 comments, 1 min read

Epistemic status: poetry (and other poems)

Richard_Ngo, 21 Nov 2024 18:13 UTC

50 points

5 comments, 2 min read

(www.narrativeark.xyz)

OpenAI’s CBRN tests seem unclear

LucaRighetti, 21 Nov 2024 17:28 UTC

124 points

6 comments, 7 min read

(www.planned-obsolescence.org)

I Have A New Paper Out Arguing Against The Asymmetry And For The Existence of Happy People Being Very Good

omnizoid, 21 Nov 2024 17:21 UTC

9 points

3 comments, 9 min read

Dangerous capability tests should be harder

LucaRighetti, 21 Nov 2024 17:20 UTC

44 points

3 comments, 5 min read

(www.planned-obsolescence.org)

Action derivatives: You’re not doing what you think you’re doing

PatrickDFarley, 21 Nov 2024 16:24 UTC

26 points

0 comments, 3 min read

AI #91: Deep Thinking

Zvi, 21 Nov 2024 14:30 UTC

47 points

11 comments, 56 min read

(thezvi.wordpress.com)

Secular Solstice Round Up 2024

dspeyer, 21 Nov 2024 10:49 UTC

76 points

15 comments, 1 min read

An Epistemological Nightmare

Ariel Cheng, 21 Nov 2024 2:08 UTC

6 points

1 comment, 2 min read

(www.mit.edu)

A Conflicted Linkspost

Screwtape, 21 Nov 2024 0:37 UTC

52 points

0 comments, 3 min read

DeepSeek beats o1-preview on math, ties on coding; will release weights

Zach Stein-Perlman, 20 Nov 2024 23:50 UTC

113 points

26 comments, 1 min read

Expected Utility, Geometric Utility, and Other Equivalent Representations

StrivingForLegibility, 20 Nov 2024 23:28 UTC

10 points

0 comments, 11 min read

[Question] Green thumb

Pug stanky, 20 Nov 2024 21:52 UTC

−12 points

1 comment, 2 min read

Cost, Not Sacrifice

Joe Rogero, 20 Nov 2024 21:32 UTC

75 points

13 comments, 1 min read

(subatomicarticles.com)

China Hawks are Manufacturing an AI Arms Race

garrison, 20 Nov 2024 18:17 UTC

138 points

44 comments, 1 min read

(garrisonlovely.substack.com)

Why I Think All The Species Of Significantly Debated Consciousness Are Conscious And Suffer Intensely

omnizoid, 20 Nov 2024 16:48 UTC

25 points

5 comments, 33 min read

aspirational leadership

dhruvmethi, 20 Nov 2024 16:07 UTC

2 points

0 comments, 7 min read

Zvi’s Thoughts on His 2nd Round of SFF

Zvi, 20 Nov 2024 13:40 UTC

91 points

2 comments, 10 min read

(thezvi.wordpress.com)

A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers

Bogdan Ionut Cirstea, 20 Nov 2024 11:48 UTC

16 points

0 comments, 1 min read

(openreview.net)

[Question] What changes should happen in the HHS?

ChristianKl, 20 Nov 2024 11:04 UTC

0 points

19 comments, 1 min read

[Question] What are the good rationality films?

Ben Pace, 20 Nov 2024 6:04 UTC

82 points

54 comments, 1 min read

Valence Need Not Be Bounded; Utility Need Not Synthesize

Lorec, 20 Nov 2024 1:37 UTC

8 points

0 comments, 6 min read

Value/Utility: A History

Lorec, 19 Nov 2024 23:01 UTC

9 points

0 comments, 10 min read

Why Don’t We Just… Shoggoth+Face+Paraphraser?

Daniel Kokotajlo and abramdemski

19 Nov 2024 20:53 UTC

140 points

57 comments, 14 min read

Every niche event should also be a meetup

DMMF, 19 Nov 2024 20:47 UTC

18 points

0 comments, 3 min read

(danfrank.ca)

U.S.-China Economic and Security Review Commission pushes Manhattan Project-style AI initiative

Phib, 19 Nov 2024 18:42 UTC

56 points

7 comments, 1 min read

Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake

TurnTrout, 19 Nov 2024 18:36 UTC

40 points

5 comments, 1 min read

(turntrout.com)

Evolution’s selection target depends on your weighting

tailcalled, 19 Nov 2024 18:24 UTC

23 points

22 comments, 1 min read

AISN #44: The Trump Circle on AI Safety Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems

Corin Katzke, Julius, andrewz and Dan H

19 Nov 2024 16:36 UTC

9 points

0 comments, 5 min read

(newsletter.safe.ai)

Jakarta ACX December 2024 Meetup

Aud, 19 Nov 2024 15:01 UTC

1 point

0 comments, 1 min read

Visualizing small Attention-only Transformers

WCargo, 19 Nov 2024 9:37 UTC

4 points

0 comments, 8 min read

Americans are fat and sick—and it’s their fault…right?

Declan Molony, 19 Nov 2024 6:41 UTC

10 points

6 comments, 7 min read

Announcing the CLR Foundations Course and CLR S-Risk Seminars

JamesFaville, 19 Nov 2024 1:18 UTC

18 points

0 comments, 1 min read

No Electricity in Manchuria

winstonBosan, 19 Nov 2024 1:11 UTC

25 points

0 comments, 5 min read

Looking back on the Future of Humanity Institute—Asterisk

jakeeaton, 19 Nov 2024 0:44 UTC

48 points

0 comments, 1 min read

Don’t Dismiss on Epistemics

ggex, 19 Nov 2024 0:44 UTC

8 points

3 comments, 2 min read

Training AI agents to solve hard problems could lead to Scheming

Marius Hobbhahn and AlexMeinke

19 Nov 2024 0:10 UTC

61 points

12 comments, 28 min read

Proactive ‘If-Then’ Safety Cases

Nathan Helm-Burger, 18 Nov 2024 21:16 UTC

10 points

0 comments, 4 min read

[Question] Will Orion/Gemini 2/Llama-4 outperform o1

LuigiPagani, 18 Nov 2024 21:15 UTC

2 points

3 comments, 1 min read

How to use bright light to improve your life.

Nat Martin, 18 Nov 2024 19:32 UTC

40 points

10 comments, 10 min read

Social events with plausible deniability

Chipmonk, 18 Nov 2024 18:25 UTC

25 points

24 comments, 1 min read

(chrislakin.blog)

How likely is brain preservation to work?

Andy_McKenzie, 18 Nov 2024 16:58 UTC

26 points

3 comments, 6 min read

Why imperfect adversarial robustness doesn’t doom AI control

Buck and Claude+

18 Nov 2024 16:05 UTC

62 points

25 comments, 2 min read

Ethical Implications of the Quantum Multiverse

Jonah Wilberg, 18 Nov 2024 16:00 UTC

7 points

22 comments, 6 min read

Reducing x-risk might be actively harmful

MountainPath, 18 Nov 2024 14:25 UTC

5 points

5 comments, 1 min read