Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

StefanHex and Julian_R

Oct 25, 2022, 8:48 PM

14 points

2 comments 4 min read LW link

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda Oct 25, 2022, 8:24 PM

52 points

7 comments 1 min read LW link

(www.youtube.com)

Nothing.

rogersbacon Oct 25, 2022, 4:33 PM

−10 points

4 comments 6 min read LW link

(www.secretorum.life)

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann Oct 25, 2022, 4:29 PM

24 points

1 comment 5 min read LW link

Consider Applying to the Future Fellowship at MIT

jefftk Oct 25, 2022, 3:40 PM

29 points

0 comments 1 min read LW link

(www.jefftk.com)

Beyond Kolmogorov and Shannon

Alexander Gietelink Oldenziel and Adam Shai

Oct 25, 2022, 3:13 PM

63 points

22 comments 5 min read LW link

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes Oct 25, 2022, 2:47 PM

208 points

49 comments 30 min read LW link 1 review

Refine: what helped me write more?

Alexander Gietelink Oldenziel Oct 25, 2022, 2:44 PM

12 points

0 comments 2 min read LW link

Logical Decision Theories: Our final failsafe?

Noosphere89 Oct 25, 2022, 12:51 PM

−7 points

8 comments 1 min read LW link

(www.lesswrong.com)

What will the scaled up GATO look like? (Updated with questions)

Amal Oct 25, 2022, 12:44 PM

34 points

22 comments 1 min read LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson Oct 25, 2022, 3:54 AM

15 points

3 comments LW link

Furry Rationalists & Effective Anthropomorphism both exist

agentydragon Oct 25, 2022, 3:37 AM

42 points

3 comments 1 min read LW link

EA & LW Forums Weekly Summary (17 − 23 Oct 22′)

Zoe Williams Oct 25, 2022, 2:57 AM

10 points

0 comments LW link

Dance Weekends: Tests not Masks

jefftk Oct 25, 2022, 2:10 AM

12 points

0 comments 2 min read LW link

(www.jefftk.com)

[Question] What is good Cyber Security Advice?

Gunnar_Zarncke Oct 24, 2022, 11:27 PM

30 points

12 comments 2 min read LW link

Connections between Mind-Body Problem & Civilizations

oblivion Oct 24, 2022, 9:55 PM

−3 points

1 comment 1 min read LW link

[Question] Rationalism and money

David K Oct 24, 2022, 9:22 PM

−5 points

2 comments 1 min read LW link

[Question] Game semantics

David K Oct 24, 2022, 9:22 PM

2 points

2 comments 1 min read LW link

A Good Future (rough draft)

Michael Soareverix Oct 24, 2022, 8:45 PM

10 points

5 comments 3 min read LW link

A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda Oct 24, 2022, 8:45 PM

64 points

12 comments 3 min read LW link

(neelnanda.io)

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris Oct 24, 2022, 8:03 PM

29 points

0 comments 1 min read LW link

(github.com)

Consider trying Vivek Hebbar’s alignment exercises

Orpheus16 Oct 24, 2022, 7:46 PM

38 points

1 comment 4 min read LW link

[Question] Education not meant for mass-consumption

Tolo Oct 24, 2022, 7:45 PM

7 points

5 comments 2 min read LW link

Realizations in Regards to Masculinity

nmc Oct 24, 2022, 7:42 PM

−2 points

2 comments 2 min read LW link

The Futility of Religion

nmc Oct 24, 2022, 7:42 PM

−1 points

5 comments 3 min read LW link

The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook Oct 24, 2022, 5:42 PM

62 points

0 comments LW link

AGI in our lifetimes is wishful thinking

niknoble Oct 24, 2022, 11:53 AM

1 point

25 comments 8 min read LW link

DeepMind on Stratego, an imperfect information game

sanxiyn Oct 24, 2022, 5:57 AM

15 points

9 comments 1 min read LW link

(arxiv.org)

[Question] TOMT: Post from 1-2 years ago talking about a paper on social networks

Simon Berens Oct 24, 2022, 1:29 AM

5 points

1 comment 1 min read LW link

AI researchers announce NeuroAI agenda

Cameron Berg Oct 24, 2022, 12:14 AM

37 points

12 comments 6 min read LW link

(arxiv.org)

Empowerment is (almost) All We Need

jacob_cannell Oct 23, 2022, 9:48 PM

61 points

44 comments 17 min read LW link

“Originality is nothing but judicious imitation”—Voltaire

Vestozia Oct 23, 2022, 7:00 PM

0 points

0 comments 13 min read LW link

Mid-Peninsula ACX/LW Meetup [CANCELLED]

moshezadka Oct 23, 2022, 5:37 PM

1 point

0 comments 1 min read LW link

I am a Memoryless System

Nicholas / Heather Kross Oct 23, 2022, 5:34 PM

25 points

2 comments 9 min read LW link

(www.thinkingmuchbetter.com)

Accountability Buddies: Why you might want one.

Samuel Nellessen Oct 23, 2022, 4:25 PM

10 points

3 comments LW link

How to get past Haidt’s elephant and listen

Astynax Oct 23, 2022, 4:06 PM

13 points

4 comments 2 min read LW link

Writing Russian and Ukrainian words in Latin script

Viliam Oct 23, 2022, 3:25 PM

19 points

22 comments 6 min read LW link

[Question] Have you noticed any ways that rationalists differ? [Brainstorming session]

tailcalled Oct 23, 2022, 11:32 AM

23 points

22 comments 1 min read LW link

Mnestics

Jarred Filmer Oct 23, 2022, 12:30 AM

122 points

6 comments 4 min read LW link

Telic intuitions across the sciences

mrcbarbier Oct 22, 2022, 9:31 PM

4 points

0 comments 17 min read LW link

A basic lexicon of telic concepts

mrcbarbier Oct 22, 2022, 9:28 PM

2 points

0 comments 3 min read LW link

Do we have the right kind of math for roles, goals and meaning?

mrcbarbier Oct 22, 2022, 9:28 PM

13 points

5 comments 7 min read LW link

[Question] The Last Year - is there an existing novel about the last year before AI doom?

Luca Petrolati Oct 22, 2022, 8:44 PM

4 points

4 comments 1 min read LW link

The highest-probability outcome can be out of distribution

tailcalled Oct 22, 2022, 8:00 PM

14 points

5 comments 1 min read LW link

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran Oct 22, 2022, 4:17 PM

25 points

0 comments LW link

Crypto loves impact markets: Notes from Schelling Point Bogotá

Rachel Shu Oct 22, 2022, 3:58 PM

17 points

2 comments LW link

[Question] When trying to define general intelligence is ability to achieve goals the best metric?

jmh Oct 22, 2022, 3:09 AM

5 points

0 comments 1 min read LW link

[Question] Simple question about corrigibility and values in AI.

jmh Oct 22, 2022, 2:59 AM

6 points

1 comment 1 min read LW link

Moorean Statements

David Udell Oct 22, 2022, 12:50 AM

11 points

11 comments 1 min read LW link

Wisdom Cannot Be Unzipped

Sable Oct 22, 2022, 12:28 AM

74 points

17 comments 7 min read LW link 1 review

(affablyevil.substack.com)
