Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c, 25 Oct 2023 23:46 UTC
122 points
35 comments · 22 min read · LW link · 1 review
(www.navigatingrisks.ai)

[Question] What are the long-term outcomes for Bitcoin and other cryptocurrencies?

Auspicious, 25 Oct 2023 21:12 UTC
−4 points
1 comment · 1 min read · LW link

AI as a science, and three obstacles to alignment strategies

So8res, 25 Oct 2023 21:00 UTC
185 points
80 comments · 11 min read · LW link

My hopes for alignment: Singular learning theory and whole brain emulation

Garrett Baker, 25 Oct 2023 18:31 UTC
61 points
5 comments · 12 min read · LW link

[Question] Lying to chess players for alignment

Zane, 25 Oct 2023 17:47 UTC
96 points
54 comments · 1 min read · LW link

Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund

Zach Stein-Perlman, 25 Oct 2023 15:20 UTC
31 points
8 comments · 4 min read · LW link
(www.frontiermodelforum.org)

“The Economics of Time Travel”—call for reviewers (Seeds of Science)

rogersbacon, 25 Oct 2023 15:13 UTC
4 points
2 comments · 1 min read · LW link

Compositional preference models for aligning LMs

Tomek Korbak, 25 Oct 2023 12:17 UTC
18 points
2 comments · 5 min read · LW link

[Question] Should the US House of Representatives adopt rank choice voting for leadership positions?

jmh, 25 Oct 2023 11:16 UTC
16 points
6 comments · 1 min read · LW link

Researchers believe they have found a way for artists to fight back against AI style capture

vernamcipher, 25 Oct 2023 10:54 UTC
3 points
1 comment · 1 min read · LW link
(finance.yahoo.com)

Why We Disagree

zulupineapple, 25 Oct 2023 10:50 UTC
7 points
2 comments · 2 min read · LW link

Beyond the Data: Why aid to poor doesn’t work

Lyrongolem, 25 Oct 2023 5:03 UTC
2 points
31 comments · 12 min read · LW link

Announcing Epoch’s newly expanded Parameters, Compute and Data Trends in Machine Learning database

25 Oct 2023 2:55 UTC
18 points
0 comments · 1 min read · LW link
(epochai.org)

What is a Sequencing Read?

jefftk, 25 Oct 2023 2:10 UTC
17 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Verifiable private execution of machine learning models with Risc0?

mako yass, 25 Oct 2023 0:44 UTC
30 points
2 comments · 2 min read · LW link

[Question] How to Resolve Forecasts With No Central Authority?

Nathan Young, 25 Oct 2023 0:28 UTC
17 points
6 comments · 1 min read · LW link

Thoughts on responsible scaling policies and regulation

paulfchristiano, 24 Oct 2023 22:21 UTC
220 points
33 comments · 6 min read · LW link

The Screenplay Method

Yeshua God, 24 Oct 2023 17:41 UTC
−15 points
0 comments · 25 min read · LW link

Blunt Razor

fryolysis, 24 Oct 2023 17:27 UTC
3 points
0 comments · 2 min read · LW link

Halloween Problem

Saint Blasphemer, 24 Oct 2023 16:46 UTC
−10 points
1 comment · 1 min read · LW link

Who is Harry Potter? Some predictions.

Donald Hobson, 24 Oct 2023 16:14 UTC
23 points
7 comments · 2 min read · LW link

Book Review: Going Infinite

Zvi, 24 Oct 2023 15:00 UTC
242 points
113 comments · 97 min read · LW link · 1 review
(thezvi.wordpress.com)

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm, 24 Oct 2023 13:53 UTC
11 points
0 comments · 1 min read · LW link

Lying is Cowardice, not Strategy

24 Oct 2023 13:24 UTC
31 points
73 comments · 5 min read · LW link
(cognition.cafe)

[Question] Anyone Else Using Brilliant?

Sable, 24 Oct 2023 12:12 UTC
19 points
0 comments · 1 min read · LW link

Announcing #AISummitTalks featuring Professor Stuart Russell and many others

otto.barten, 24 Oct 2023 10:11 UTC
17 points
1 comment · 1 min read · LW link

Linkpost: A Post Mortem on the Gino Case

Linch, 24 Oct 2023 6:50 UTC
89 points
7 comments · 2 min read · LW link
(www.theorgplumber.com)

South Bay SSC Meetup, San Jose, November 5th.

David Friedman, 24 Oct 2023 4:50 UTC
2 points
1 comment · 1 min read · LW link

AI Pause Will Likely Backfire (Guest Post)

jsteinhardt, 24 Oct 2023 4:30 UTC
47 points
6 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Human wanting

TsviBT, 24 Oct 2023 1:05 UTC
53 points
1 comment · 10 min read · LW link

Towards Understanding Sycophancy in Language Models

24 Oct 2023 0:30 UTC
66 points
0 comments · 2 min read · LW link
(arxiv.org)

Manifold Halloween Hackathon

Austin Chen, 23 Oct 2023 22:47 UTC
8 points
0 comments · 1 min read · LW link

Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

Neel Nanda, 23 Oct 2023 22:38 UTC
93 points
12 comments · 9 min read · LW link

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

EJT, 23 Oct 2023 21:00 UTC
79 points
22 comments · 1 min read · LW link
(philpapers.org)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner, 23 Oct 2023 20:32 UTC
22 points
0 comments · 6 min read · LW link
(midwitalignment.substack.com)

z is not the cause of x

hrbigelow, 23 Oct 2023 17:43 UTC
6 points
2 comments · 9 min read · LW link

Some of my predictable updates on AI

Aaron_Scher, 23 Oct 2023 17:24 UTC
32 points
8 comments · 9 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

23 Oct 2023 16:37 UTC
107 points
3 comments · 8 min read · LW link

Machine Unlearning Evaluations as Interpretability Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments · 11 min read · LW link

VLM-RM: Specifying Rewards with Natural Language

23 Oct 2023 14:11 UTC
20 points
2 comments · 5 min read · LW link
(far.ai)

Contra Dance Dialect Survey

jefftk, 23 Oct 2023 13:40 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Which LessWrongers are (aspiring) YouTubers?

Mati_Roy, 23 Oct 2023 13:21 UTC
22 points
13 comments · 1 min read · LW link

[Question] What is an “anti-Occamian prior”?

Zane, 23 Oct 2023 2:26 UTC
35 points
22 comments · 1 min read · LW link

AI Safety is Dropping the Ball on Clown Attacks

trevor, 22 Oct 2023 20:09 UTC
64 points
78 comments · 34 min read · LW link

The Drowning Child

Tomás B., 22 Oct 2023 16:39 UTC
25 points
8 comments · 1 min read · LW link

Announcing Timaeus

22 Oct 2023 11:59 UTC
187 points
15 comments · 4 min read · LW link

Into AI Safety—Episode 0

jacobhaimes, 22 Oct 2023 3:30 UTC
5 points
1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Thoughts On (Solving) Deep Deception

Jozdien, 21 Oct 2023 22:40 UTC
69 points
4 comments · 6 min read · LW link

Best effort beliefs

Adam Zerner, 21 Oct 2023 22:05 UTC
14 points
9 comments · 4 min read · LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong, 21 Oct 2023 21:13 UTC
42 points
0 comments · 2 min read · LW link