[Question] Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018?

Chris_Leong · Dec 14, 2022, 11:28 PM
8 points
0 comments · 1 min read · LW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch · Dec 14, 2022, 10:34 PM
72 points
7 comments · 13 min read · LW link

Aligning alignment with performance

Marv K · Dec 14, 2022, 10:19 PM
2 points
0 comments · 2 min read · LW link

Contrary to List of Lethality’s point 22, alignment’s door number 2

False Name · Dec 14, 2022, 10:01 PM
−2 points
5 comments · 22 min read · LW link

Kolmogorov Complexity and Simulation Hypothesis

False Name · Dec 14, 2022, 10:01 PM
−3 points
0 comments · 7 min read · LW link

[Question] Stanley Meyer’s water fuel cell

mikbp · Dec 14, 2022, 9:19 PM
2 points
6 comments · 1 min read · LW link

[Question] Is the AI timeline too short to have children?

Yoreth · Dec 14, 2022, 6:32 PM
38 points
20 comments · 1 min read · LW link

Predicting GPU performance

Dec 14, 2022, 4:27 PM
60 points
26 comments · 1 min read · LW link
(epochai.org)

[Incomplete] What is Computation Anyway?

DragonGod · Dec 14, 2022, 4:17 PM
16 points
1 comment · 13 min read · LW link
(arxiv.org)

Chair Hanging Peg

jefftk · Dec 14, 2022, 3:30 PM
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

My AGI safety research—2022 review, ’23 plans

Steven Byrnes · Dec 14, 2022, 3:15 PM
51 points
10 comments · 7 min read · LW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

Dec 14, 2022, 2:33 PM
29 points
5 comments · 11 min read · LW link

Key Mostly Outward-Facing Facts From the Story of VaccinateCA

Zvi · Dec 14, 2022, 1:30 PM
61 points
2 comments · 23 min read · LW link
(thezvi.wordpress.com)

Discovering Latent Knowledge in Language Models Without Supervision

Xodarap · Dec 14, 2022, 12:32 PM
45 points
1 comment · 1 min read · LW link
(arxiv.org)

[Question] COVID China Personal Advice (No mRNA vax, possible hospital overload, bug-chasing edition)

Lao Mein · Dec 14, 2022, 10:31 AM
20 points
11 comments · 1 min read · LW link

Beyond a better world

Davidmanheim · Dec 14, 2022, 10:18 AM
14 points
7 comments · 4 min read · LW link
(progressforum.org)

Proof as mere strong evidence

adamShimi · Dec 14, 2022, 8:56 AM
28 points
16 comments · 2 min read · LW link
(epistemologicalvigilance.substack.com)

Trying to disambiguate different questions about whether RLHF is “good”

Buck · Dec 14, 2022, 4:03 AM
106 points
47 comments · 7 min read · LW link · 1 review

[Question] How can one literally buy time (from x-risk) with money?

Alex_Altair · Dec 13, 2022, 7:24 PM
24 points
3 comments · 1 min read · LW link

[Question] Best introductory overviews of AGI safety?

JakubK · Dec 13, 2022, 7:01 PM
21 points
9 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Applications open for AGI Safety Fundamentals: Alignment Course

Richard_Ngo · Dec 13, 2022, 6:31 PM
49 points
0 comments · 2 min read · LW link

What Does It Mean to Align AI With Human Values?

Algon · Dec 13, 2022, 4:56 PM
8 points
3 comments · 1 min read · LW link
(www.quantamagazine.org)

It Takes Two Paracetamol?

Eli_ · Dec 13, 2022, 4:29 PM
33 points
10 comments · 2 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Dec 13, 2022, 3:41 PM
150 points
23 comments · 22 min read · LW link · 2 reviews

[Question] Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi · Dec 13, 2022, 3:41 PM
18 points
7 comments · 1 min read · LW link

Existential AI Safety is NOT separate from near-term applications

scasper · Dec 13, 2022, 2:47 PM
37 points
17 comments · 3 min read · LW link

What is the correlation between upvoting and benefit to readers of LW?

banev · Dec 13, 2022, 2:26 PM
7 points
15 comments · 1 min read · LW link

Limits of Superintelligence

Aleksei Petrenko · Dec 13, 2022, 12:19 PM
1 point
5 comments · 1 min read · LW link

Bay 2022 Solstice

Raemon · Dec 13, 2022, 8:58 AM
17 points
0 comments · 1 min read · LW link

Last day to nominate things for the Review. Also, 2019 books still exist.

Raemon · Dec 13, 2022, 8:53 AM
15 points
0 comments · 1 min read · LW link

AI alignment is distinct from its near-term applications

paulfchristiano · Dec 13, 2022, 7:10 AM
255 points
21 comments · 2 min read · LW link
(ai-alignment.com)

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner · Dec 13, 2022, 7:04 AM
37 points
3 comments · 2 min read · LW link

[Question] Are lawsuits against AGI companies extending AGI timelines?

SlowingAGI · Dec 13, 2022, 6:00 AM
1 point
1 comment · 1 min read · LW link

EA & LW Forums Weekly Summary (5th Dec – 11th Dec ’22)

Zoe Williams · Dec 13, 2022, 2:53 AM
7 points
0 comments · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · Dec 13, 2022, 2:17 AM
10 points
5 comments · 45 min read · LW link

Revisiting algorithmic progress

Dec 13, 2022, 1:39 AM
95 points
15 comments · 2 min read · LW link · 1 review
(arxiv.org)

An exploration of GPT-2’s embedding weights

Adam Scherlis · Dec 13, 2022, 12:46 AM
44 points
4 comments · 10 min read · LW link

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus16 · Dec 12, 2022, 10:36 PM
20 points
0 comments · 2 min read · LW link

Concept extrapolation for hypothesis generation

Dec 12, 2022, 10:09 PM
20 points
2 comments · 3 min read · LW link

Let’s go meta: Grammatical knowledge and self-referential sentences [ChatGPT]

Bill Benzon · Dec 12, 2022, 9:50 PM
5 points
0 comments · 9 min read · LW link

D&D.Sci December 2022 Evaluation and Ruleset

abstractapplic · Dec 12, 2022, 9:21 PM
17 points
8 comments · 2 min read · LW link

Log-odds are better than Probabilities

Robert_AIZI · Dec 12, 2022, 8:10 PM
22 points
4 comments · 4 min read · LW link
(aizi.substack.com)

Bengaluru LW/ACX Social Meetup—December 2022

faiz · Dec 12, 2022, 7:30 PM
4 points
0 comments · 1 min read · LW link

Psychological Disorders and Problems

Dec 12, 2022, 6:15 PM
39 points
6 comments · 1 min read · LW link

Confusing the goal and the path

adamShimi · Dec 12, 2022, 4:42 PM
44 points
7 comments · 1 min read · LW link
(epistemologicalvigilance.substack.com)

Meaningful things are those the universe possesses a semantics for

Abhimanyu Pallavi Sudhir · Dec 12, 2022, 4:03 PM
16 points
14 comments · 14 min read · LW link

Tradeoffs in complexity, abstraction, and generality

Dec 12, 2022, 3:55 PM
32 points
0 comments · 2 min read · LW link

Green Line Extension Opening Dates

jefftk · Dec 12, 2022, 2:40 PM
12 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Join the AI Testing Hackathon this Friday

Esben Kran · Dec 12, 2022, 2:24 PM
10 points
0 comments · LW link

Side-channels: input versus output

davidad · Dec 12, 2022, 12:32 PM
44 points
16 comments · 2 min read · LW link