Monosemanticity & Quantization

Rahul Chand · 22 Oct 2024 22:57 UTC
1 point
0 comments · 9 min read · LW link

[Question] What is the alpha in one bit of evidence?

J Bostock · 22 Oct 2024 21:57 UTC
20 points
13 comments · 1 min read · LW link

Catastrophic sabotage as a major threat model for human-level AI systems

evhub · 22 Oct 2024 20:57 UTC
91 points
11 comments · 15 min read · LW link

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Elizabeth · 22 Oct 2024 18:20 UTC
75 points
79 comments · 1 min read · LW link
(acesounderglass.com)

Decision-Making Under Uncertainty: Lessons From AI

Jonasb · 22 Oct 2024 17:54 UTC
−1 points
0 comments · 5 min read · LW link
(www.denominations.io)

Testing Genetic Engineering Detection with Spike-Ins

jefftk · 22 Oct 2024 17:20 UTC
9 points
0 comments · 1 min read · LW link
(naobservatory.org)

Predictions as Public Works Project — What Metaculus Is Building Next

ChristianWilliams · 22 Oct 2024 16:35 UTC
4 points
0 comments · 1 min read · LW link
(www.metaculus.com)

Gorges of gender on a terrain of traits

dkl9 · 22 Oct 2024 16:18 UTC
−7 points
1 comment · 3 min read · LW link
(dkl9.net)

A Defense of Peer Review

22 Oct 2024 16:16 UTC
23 points
1 comment · 22 min read · LW link
(www.asimov.press)

BIG-Bench Canary Contamination in GPT-4

Jozdien · 22 Oct 2024 15:40 UTC
123 points
13 comments · 4 min read · LW link

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang · 22 Oct 2024 13:57 UTC
50 points
1 comment · 18 min read · LW link
(arxiv.org)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE

Steven Byrnes · 22 Oct 2024 13:23 UTC
62 points
8 comments · 21 min read · LW link

Resolving von Neumann-Morgenstern Inconsistent Preferences

niplav · 22 Oct 2024 11:45 UTC
31 points
5 comments · 58 min read · LW link

Lenses of Control

WillPetillo · 22 Oct 2024 7:51 UTC
14 points
0 comments · 9 min read · LW link

A Brief Explanation of AI Control

Aaron_Scher · 22 Oct 2024 7:00 UTC
7 points
1 comment · 6 min read · LW link

Longevity, AI, and Cognitive Research Hackathon @ MIT

ekkolápto · 22 Oct 2024 6:19 UTC
1 point
0 comments · 1 min read · LW link

Conversational Signposts—An Antidote to Dull Social Interactions

Declan Molony · 22 Oct 2024 5:37 UTC
11 points
6 comments · 2 min read · LW link

I got dysentery so you don’t have to

eukaryote · 22 Oct 2024 4:55 UTC
315 points
4 comments · 17 min read · LW link
(eukaryotewritesblog.com)

Transformers Explained (Again)

RohanS · 22 Oct 2024 4:06 UTC
3 points
0 comments · 18 min read · LW link

Sleeping on Stage

jefftk · 22 Oct 2024 0:50 UTC
26 points
3 comments · 1 min read · LW link
(www.jefftk.com)

The Mask Comes Off: At What Price?

Zvi · 21 Oct 2024 23:50 UTC
71 points
16 comments · 8 min read · LW link
(thezvi.wordpress.com)

Distinguishing ways AI can be “concentrated”

Matthew Barnett · 21 Oct 2024 22:21 UTC
28 points
2 comments · 1 min read · LW link

Jailbreaking ChatGPT and Claude using Web API Context Injection

Jaehyuk Lim · 21 Oct 2024 21:34 UTC
4 points
0 comments · 3 min read · LW link

How to Teach Your Brain to Hate Procrastination

10xyz · 21 Oct 2024 20:12 UTC
3 points
0 comments · 2 min read · LW link

Pausing for what?

MountainPath · 21 Oct 2024 20:12 UTC
0 points
1 comment · 1 min read · LW link

What is autonomy? Why boundaries are necessary.

Chipmonk · 21 Oct 2024 17:56 UTC
8 points
1 comment · 1 min read · LW link
(chrislakin.blog)

Could randomly choosing people to serve as representatives lead to better government?

John Huang · 21 Oct 2024 17:10 UTC
75 points
13 comments · 10 min read · LW link

There aren’t enough smart people in biology doing something boring

Abhishaike Mahajan · 21 Oct 2024 15:52 UTC
27 points
13 comments · 10 min read · LW link

Automation collapse

21 Oct 2024 14:50 UTC
70 points
9 comments · 7 min read · LW link

What AI companies should do: Some rough ideas

Zach Stein-Perlman · 21 Oct 2024 14:00 UTC
33 points
10 comments · 5 min read · LW link

[Question] What should OpenAI do that it hasn’t already done, to stop their vacancies from being advertised on the 80k Job Board?

WitheringWeights · 21 Oct 2024 13:57 UTC
21 points
0 comments · 1 min read · LW link

A Rocket–Interpretability Analogy

plex · 21 Oct 2024 13:55 UTC
149 points
31 comments · 1 min read · LW link

Tokyo AI Safety 2025: Call For Papers

Blaine · 21 Oct 2024 8:43 UTC
24 points
0 comments · 3 min read · LW link
(www.tais2025.cc)

OpenAI defected, but we can take honest actions

Remmelt · 21 Oct 2024 8:41 UTC
17 points
16 comments · 1 min read · LW link

Slightly More Than You Wanted To Know: Pregnancy Length Effects

JustisMills · 21 Oct 2024 1:26 UTC
62 points
4 comments · 5 min read · LW link
(justismills.substack.com)

Information vs Assurance

johnswentworth · 20 Oct 2024 23:16 UTC
185 points
17 comments · 2 min read · LW link

Liquid vs Illiquid Careers

vaishnav92 · 20 Oct 2024 23:03 UTC
33 points
7 comments · 7 min read · LW link
(vaishnavsunil.substack.com)

AI Can be “Gradient Aware” Without Doing Gradient hacking.

Sodium · 20 Oct 2024 21:02 UTC
20 points
0 comments · 2 min read · LW link

A brief theory of why we think things are good or bad

David Johnston · 20 Oct 2024 20:31 UTC
7 points
10 comments · 1 min read · LW link

Thinking in 2D

sarahconstantin · 20 Oct 2024 19:30 UTC
27 points
0 comments · 8 min read · LW link
(sarahconstantin.substack.com)

Podcast discussing Hanson’s Cultural Drift Argument

20 Oct 2024 17:58 UTC
3 points
0 comments · 1 min read · LW link
(moralmayhem.substack.com)

Advice on Communicating Concisely

EvolutionByDesign · 20 Oct 2024 16:45 UTC
2 points
9 comments · 1 min read · LW link

Ambiguities or the issues we face with AI in medicine

Thehumanproject.ai · 20 Oct 2024 16:45 UTC
2 points
0 comments · 5 min read · LW link

The Personal Implications of AGI Realism

xizneb · 20 Oct 2024 16:43 UTC
7 points
7 comments · 5 min read · LW link

Safety tax functions

owencb · 20 Oct 2024 14:08 UTC
30 points
0 comments · 6 min read · LW link
(strangecities.substack.com)

Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data

rokosbasilisk · 20 Oct 2024 8:40 UTC
3 points
2 comments · 1 min read · LW link

Electoral Systems

RedFishBlueFish · 20 Oct 2024 3:25 UTC
1 point
0 comments · 14 min read · LW link

Overcoming Bias Anthology

Arjun Panickssery · 20 Oct 2024 2:01 UTC
164 points
14 comments · 2 min read · LW link
(overcoming-bias-anthology.com)

D/acc AI Security Salon

Allison Duettmann · 19 Oct 2024 22:17 UTC
19 points
0 comments · 1 min read · LW link

Who Should Have Been Killed, and Contains Neato? Who Else Could It Be, but that Villain Magneto!

Ace Delgado · 19 Oct 2024 20:39 UTC
−16 points
0 comments · 1 min read · LW link