Gell-Mann checks

Cleo Scrolls 26 Sep 2024 22:45 UTC

20 points

7 comments 3 min read LW link

[Question] Doing Nothing Utility Function

k64 26 Sep 2024 22:05 UTC

9 points

9 comments 1 min read LW link

Stanislav Petrov Quarterly Performance Review

Ricki Heicklen 26 Sep 2024 21:20 UTC

145 points

3 comments 5 min read LW link

(bayesshammai.substack.com)

Self location for LLMs by LLMs: Self-Assessment Checklist.

weightt an 26 Sep 2024 19:57 UTC

11 points

0 comments 5 min read LW link

Four Levels of Voting Methods

hive 26 Sep 2024 18:15 UTC

17 points

3 comments 9 min read LW link

(hiveism.substack.com)

Characterizing stable regions in the residual stream of LLMs

Jett Janiak, jacek, Chatrik, Giorgi Giglemiani, nlpet and StefanHex

26 Sep 2024 13:44 UTC

38 points

4 comments 1 min read LW link

(arxiv.org)

Chevy Bolt Review

jefftk 26 Sep 2024 13:40 UTC

13 points

2 comments 1 min read LW link

(www.jefftk.com)

AI #83: The Mask Comes Off

Zvi 26 Sep 2024 12:00 UTC

82 points

20 comments 36 min read LW link

(thezvi.wordpress.com)

The Existential Dread of Being a Powerful AI System

testingthewaters 26 Sep 2024 10:56 UTC

6 points

1 comment 2 min read LW link

[Question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?

ChristianKl 26 Sep 2024 9:17 UTC

27 points

21 comments 1 min read LW link

[Completed] The 2024 Petrov Day Scenario

Ben Pace and Raemon

26 Sep 2024 8:08 UTC

136 points

114 comments 5 min read LW link

Source Control for Prototyping and Analysis

jefftk 26 Sep 2024 1:50 UTC

12 points

0 comments 1 min read LW link

(www.jefftk.com)

[Linkpost] Play with SAEs on Llama 3

Tom McGrath, Eric Ho and Dan Balsam

25 Sep 2024 22:35 UTC

40 points

2 comments 1 min read LW link

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Sodium 25 Sep 2024 21:15 UTC

58 points

4 comments 2 min read LW link

Comparing Forecasting Track Records for AI Benchmarking and Beyond

ChristianWilliams 25 Sep 2024 21:01 UTC

11 points

0 comments 1 min read LW link

(www.metaculus.com)

Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility

OwenChen 25 Sep 2024 20:38 UTC

3 points

0 comments 4 min read LW link

Evaluating Synthetic Activations composed of SAE Latents in GPT-2

Giorgi Giglemiani, nlpet, Chatrik, Jett Janiak and StefanHex

25 Sep 2024 20:37 UTC

27 points

0 comments 3 min read LW link

(arxiv.org)

Climate Change And Global Warming

Zero Contradictions 25 Sep 2024 19:13 UTC

−7 points

0 comments 1 min read LW link

(zerocontradictions.net)

How to prevent collusion when using untrusted models to monitor each other

Buck 25 Sep 2024 18:58 UTC

81 points

8 comments 22 min read LW link

Alignment by default: the simulation hypothesis

gb 25 Sep 2024 16:26 UTC

21 points

39 comments 1 min read LW link

A Dialogue on Deceptive Alignment Risks

Rauno Arike 25 Sep 2024 16:10 UTC

11 points

0 comments 18 min read LW link

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

Yohan Mathew, joanv, robert mccarthy, ollie, Nandi and Dylan Cope

25 Sep 2024 14:52 UTC

30 points

2 comments 4 min read LW link

(arxiv.org)

AIS Hungary Operations Officer role, Deadline: 2024 October 6th

gergogaspar 25 Sep 2024 13:54 UTC

1 point

0 comments 1 min read LW link

[Intuitive self-models] 2. Conscious Awareness

Steven Byrnes 25 Sep 2024 13:29 UTC

81 points

48 comments 16 min read LW link

Book Review: On the Edge: The Business

Zvi 25 Sep 2024 12:20 UTC

38 points

0 comments 36 min read LW link

(thezvi.wordpress.com)

Join the $10K AutoHack 2024 Tournament

Paul Bricman 25 Sep 2024 11:54 UTC

5 points

0 comments 1 min read LW link

(noemaresearch.com)

[Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

chanind, TomasD, hrdkbhatnagar and Joseph Bloom

25 Sep 2024 9:31 UTC

71 points

16 comments 3 min read LW link

(arxiv.org)

[Question] Non-human centric view of existence

ZY 25 Sep 2024 5:47 UTC

−3 points

14 comments 1 min read LW link

How to Live Well: My Philosophy of Life

Philosofer123 25 Sep 2024 1:13 UTC

−8 points

0 comments 1 min read LW link

(philosofer123.wordpress.com)

An open response to Wittkotter and Yampolskiy

Donald Hobson 24 Sep 2024 22:27 UTC

8 points

0 comments 4 min read LW link

A Path out of Insufficient Views

Unreal 24 Sep 2024 20:00 UTC

55 points

46 comments 9 min read LW link

How to give effectively to US Dems

Hauke Hillebrandt 24 Sep 2024 14:38 UTC

2 points

0 comments 1 min read LW link

(www.slowboring.com)

[Question] How do you follow AI (safety) news?

PeterH 24 Sep 2024 13:58 UTC

4 points

2 comments 1 min read LW link

Instruction Following without Instruction Tuning

Bogdan Ionut Cirstea 24 Sep 2024 13:49 UTC

17 points

0 comments 1 min read LW link

(arxiv.org)

Book Review: On the Edge: The Gamblers

Zvi 24 Sep 2024 11:50 UTC

35 points

1 comment 89 min read LW link

(thezvi.wordpress.com)

Editing at the Take Level

jefftk 24 Sep 2024 11:30 UTC

12 points

1 comment 1 min read LW link

(www.jefftk.com)

Using LLM’s for AI Foundation research and the Simple Solution assumption

Donald Hobson 24 Sep 2024 11:00 UTC

5 points

0 comments 2 min read LW link

When to join a respectability cascade

B Jacobs 24 Sep 2024 7:54 UTC

10 points

1 comment 2 min read LW link

(bobjacobs.substack.com)

Sampling Effects on Strategic Behavior in Supervised Learning Models

Phil Bland 24 Sep 2024 7:44 UTC

1 point

0 comments 6 min read LW link

In Praise of the Beatitudes

robotelvis 24 Sep 2024 5:08 UTC

9 points

7 comments 3 min read LW link

(messyprogress.substack.com)

[Question] What are the best arguments for/against AIs being “slightly ‘nice’”?

Raemon 24 Sep 2024 2:00 UTC

94 points

58 comments 31 min read LW link

Struggling like a Shadowmoth

Raemon 24 Sep 2024 0:47 UTC

176 points

38 comments 7 min read LW link

Bounty for Evidence on Some of Palisade Research’s Beliefs

benwr and Jeffrey Ladish

23 Sep 2024 20:01 UTC

46 points

4 comments 2 min read LW link

Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data

jefftk 23 Sep 2024 17:25 UTC

27 points

0 comments 4 min read LW link

(naobservatory.org)

A primer on ML in antibody engineering

Abhishaike Mahajan 23 Sep 2024 17:03 UTC

11 points

0 comments 25 min read LW link

(www.owlposting.com)

[Question] On the subject of in-house large language models versus implementing frontier models

Annapurna 23 Sep 2024 15:00 UTC

7 points

1 comment 1 min read LW link

A basic systems architecture for AI agents that do autonomous research

Buck 23 Sep 2024 13:58 UTC

187 points

15 comments 8 min read LW link

Book Review: On the Edge: The Fundamentals

Zvi 23 Sep 2024 13:40 UTC

64 points

3 comments 31 min read LW link

(thezvi.wordpress.com)

Switching to a 4GB SD

jefftk 23 Sep 2024 11:20 UTC

11 points

1 comment 1 min read LW link

(www.jefftk.com)

Model evals for dangerous capabilities

Zach Stein-Perlman 23 Sep 2024 11:00 UTC

51 points

11 comments 3 min read LW link