Archive: Page 2
Auditing games for high-level interpretability · Paul Colognese · Nov 1, 2022, 10:44 AM · 33 points · 1 comment · 7 min read · LW link

Remember to translate your thoughts back again · brook · Nov 1, 2022, 8:49 AM · 25 points · 11 comments · 3 min read · LW link (forum.effectivealtruism.org)

Conversations on Alcohol Consumption · Annapurna · Nov 1, 2022, 5:09 AM · 20 points · 6 comments · 9 min read · LW link

ML Safety Scholars Summer 2022 Retrospective · TW123 · Nov 1, 2022, 3:09 AM · 29 points · 0 comments · LW link

EA & LW Forums Weekly Summary (24–30th Oct '22) · Zoe Williams · Nov 1, 2022, 2:58 AM · 13 points · 1 comment · LW link

Caution when interpreting Deepmind's In-context RL paper · Sam Marks · Nov 1, 2022, 2:42 AM · 105 points · 8 comments · 4 min read · LW link

What sorts of systems can be deceptive? · Andrei Alexandru · Oct 31, 2022, 10:00 PM · 16 points · 0 comments · 7 min read · LW link

"Cars and Elephants": a handwavy argument/analogy against mechanistic interpretability · David Scott Krueger (formerly: capybaralet) · Oct 31, 2022, 9:26 PM · 51 points · 25 comments · 2 min read · LW link

Superintelligent AI is necessary for an amazing future, but far from sufficient · So8res · Oct 31, 2022, 9:16 PM · 132 points · 48 comments · 34 min read · LW link

Sanity-checking in an age of hyperbole · Ciprian Elliu Ivanof · Oct 31, 2022, 8:04 PM · 2 points · 4 comments · 2 min read · LW link

Why Aren't There More Schelling Holidays? · johnswentworth · Oct 31, 2022, 7:31 PM · 63 points · 21 comments · 1 min read · LW link

The circular problem of epistemic irresponsibility · Roman Leventov · Oct 31, 2022, 5:23 PM · 5 points · 2 comments · 8 min read · LW link

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution · PashaKamyshev · Oct 31, 2022, 5:03 PM · 7 points · 4 comments · 14 min read · LW link

Average utilitarianism is non-local · Yair Halberstadt · Oct 31, 2022, 4:36 PM · 29 points · 13 comments · 1 min read · LW link

Marvel Snap: Phase 1 · Zvi · Oct 31, 2022, 3:20 PM · 23 points · 1 comment · 14 min read · LW link (thezvi.wordpress.com)

Boundaries vs Frames · Scott Garrabrant · Oct 31, 2022, 3:14 PM · 58 points · 10 comments · 7 min read · LW link

Embedding safety in ML development · zeshen · Oct 31, 2022, 12:27 PM · 24 points · 1 comment · 18 min read · LW link

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable · Esben Kran · Oct 31, 2022, 11:38 AM · 20 points · 1 comment · 1 min read · LW link (christophm.github.io)

My (naive) take on Risks from Learned Optimization · Artyom Karpov · Oct 31, 2022, 10:59 AM · 7 points · 0 comments · 5 min read · LW link

Tactical Nuclear Weapons Aren't Cost-Effective Compared to Precision Artillery · Lao Mein · Oct 31, 2022, 4:33 AM · 28 points · 7 comments · 3 min read · LW link

Gandalf or Saruman? A Soldier in Scout's Clothing · DirectedEvolution · Oct 31, 2022, 2:40 AM · 41 points · 1 comment · 4 min read · LW link

Me (Steve Byrnes) on the "Brain Inspired" podcast · Steven Byrnes · Oct 30, 2022, 7:15 PM · 26 points · 1 comment · 1 min read · LW link (braininspired.co)

"Normal" is the equilibrium state of past optimization processes · Alex_Altair · Oct 30, 2022, 7:03 PM · 82 points · 5 comments · 5 min read · LW link

AI as a Civilizational Risk Part 2/6: Behavioral Modification · PashaKamyshev · Oct 30, 2022, 4:57 PM · 9 points · 0 comments · 10 min read · LW link

Instrumental ignoring AI, Dumb but not useless. · Donald Hobson · Oct 30, 2022, 4:55 PM · 7 points · 6 comments · 2 min read · LW link

Weekly Roundup #3 · Zvi · Oct 30, 2022, 12:20 PM · 23 points · 5 comments · 15 min read · LW link (thezvi.wordpress.com)

Quickly refactoring the U.S. Constitution · lc · Oct 30, 2022, 7:17 AM · 7 points · 25 comments · 4 min read · LW link

«Boundaries», Part 3a: Defining boundaries as directed Markov blankets · Andrew_Critch · Oct 30, 2022, 6:31 AM · 90 points · 20 comments · 15 min read · LW link

Am I secretly excited for AI getting weird? · porby · Oct 29, 2022, 10:16 PM · 116 points · 4 comments · 4 min read · LW link

AI as a Civilizational Risk Part 1/6: Historical Priors · PashaKamyshev · Oct 29, 2022, 9:59 PM · 2 points · 2 comments · 7 min read · LW link

Don't expect your life partner to be better than your exes in more than one way: a mathematical model · mdd · Oct 29, 2022, 6:47 PM · 7 points · 1 comment · 9 min read · LW link

The Social Recession: By the Numbers · antonomon · Oct 29, 2022, 6:45 PM · 165 points · 29 comments · 8 min read · LW link (novum.substack.com)

Electric Kettle vs Stove · jefftk · Oct 29, 2022, 12:50 PM · 18 points · 7 comments · 1 min read · LW link (www.jefftk.com)

Quantum Immortality, foiled · Ben · Oct 29, 2022, 11:00 AM · 27 points · 4 comments · 2 min read · LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small · RowanWang, Alexandre Variengien, Arthur Conmy, Buck, and jsteinhardt · Oct 28, 2022, 11:55 PM · 101 points · 9 comments · 9 min read · LW link · 2 reviews (arxiv.org)

Resources that (I think) new alignment researchers should know about · Orpheus16 · Oct 28, 2022, 10:13 PM · 70 points · 9 comments · 4 min read · LW link

How often does One Person succeed? · Mayank Modi · Oct 28, 2022, 7:32 PM · 1 point · 3 comments · LW link

aisafety.community—A living document of AI safety communities · zeshen and plex · Oct 28, 2022, 5:50 PM · 58 points · 23 comments · 1 min read · LW link

Rapid Test Throat Swabbing? · jefftk · Oct 28, 2022, 4:30 PM · 18 points · 2 comments · 1 min read · LW link (www.jefftk.com)

Join the interpretability research hackathon · Esben Kran · Oct 28, 2022, 4:26 PM · 15 points · 0 comments · LW link

Syncretism · Annapurna · Oct 28, 2022, 4:08 PM · 16 points · 4 comments · 1 min read · LW link (jorgevelez.substack.com)

Pondering computation in the real world · Adam Shai · Oct 28, 2022, 3:57 PM · 24 points · 13 comments · 5 min read · LW link

Ukraine and the Crimea Question · ChristianKl · Oct 28, 2022, 12:26 PM · −2 points · 153 comments · 11 min read · LW link

New book on s-risks · Tobias_Baumann · Oct 28, 2022, 9:36 AM · 68 points · 1 comment · LW link

Cryptic symbols · Adam Scherlis · Oct 28, 2022, 6:44 AM · 6 points · 17 comments · 1 min read · LW link (adam.scherlis.com)

All life's helpers' beliefs · Tehdastehdas · Oct 28, 2022, 5:47 AM · −12 points · 1 comment · 5 min read · LW link

Prizes for ML Safety Benchmark Ideas · joshc · Oct 28, 2022, 2:51 AM · 36 points · 5 comments · 1 min read · LW link

Worldview iPeople—Future Fund's AI Worldview Prize · Toni MUENDEL · Oct 28, 2022, 1:53 AM · −22 points · 4 comments · 9 min read · LW link

Anatomy of change · Jose Miguel Cruz y Celis · Oct 28, 2022, 1:21 AM · 1 point · 0 comments · 1 min read · LW link

Nash equilibria of symmetric zero-sum games · Ege Erdil · Oct 27, 2022, 11:50 PM · 14 points · 0 comments · 14 min read · LW link