Takeaways from calibration training

Olli Järviniemi 29 Jan 2023 19:09 UTC

38 points

1 comment 3 min read LW link

Structure, creativity, and novelty

TsviBT 29 Jan 2023 14:30 UTC

18 points

4 comments 7 min read LW link

What is the ground reality of countries taking steps to recalibrate AI development towards Alignment first?

Nebuch 29 Jan 2023 13:26 UTC

8 points

6 comments 3 min read LW link

Compendium of problems with RLHF

Charbel-Raphaël 29 Jan 2023 11:40 UTC

120 points

16 comments 10 min read LW link

My biggest takeaway from Redwood Research REMIX

Alok Singh 29 Jan 2023 11:00 UTC

0 points

0 comments 1 min read LW link

(alok.github.io)

EA novel published on Amazon

Timothy Underwood 29 Jan 2023 8:33 UTC

17 points

0 comments 1 min read LW link

Reverse RSS Stats

jefftk 29 Jan 2023 3:40 UTC

12 points

2 comments 1 min read LW link

(www.jefftk.com)

Why and How to Graduate Early [U.S.]

Tego 29 Jan 2023 1:28 UTC

29 points

5 comments 8 min read LW link

Stop-gradients lead to fixed point predictions

Johannes Treutlein, Caspar Oesterheld, Rubi J. Hudson and Emery Cooper

28 Jan 2023 22:47 UTC

37 points

2 comments 24 min read LW link

Eli Dourado AMA on the Progress Forum

jasoncrawford 28 Jan 2023 22:18 UTC

19 points

0 comments 1 min read LW link

(rootsofprogress.org)

LW Filter Tags (Rationality/World Modeling now promoted in Latest Posts)

Ruby and RobertM

28 Jan 2023 22:14 UTC

60 points

4 comments 3 min read LW link

No Fire in the Equations

Carlos Ramirez 28 Jan 2023 21:16 UTC

−16 points

4 comments 3 min read LW link

Optimality is the tiger, and annoying the user is its teeth

Christopher King 28 Jan 2023 20:20 UTC

25 points

6 comments 2 min read LW link

On not getting contaminated by the wrong obesity ideas

Natália 28 Jan 2023 20:18 UTC

305 points

69 comments 30 min read LW link

Advice I found helpful in 2022

Akash 28 Jan 2023 19:48 UTC

36 points

5 comments 2 min read LW link

The Knockdown Argument Paradox

Bryan Frances 28 Jan 2023 19:23 UTC

−12 points

6 comments 8 min read LW link

Less Wrong/ACX Budapest Feb 4th Meetup

Richard Horvath and Timothy Underwood

28 Jan 2023 14:49 UTC

2 points

0 comments 1 min read LW link

Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review)

Shoshannah Tekofsky 28 Jan 2023 5:26 UTC

53 points

7 comments 7 min read LW link

A Simple Alignment Typology

Shoshannah Tekofsky 28 Jan 2023 5:26 UTC

34 points

2 comments 2 min read LW link

Spooky action at a distance in the loss landscape

Jesse Hoogland and Filip Sondej

28 Jan 2023 0:22 UTC

61 points

4 comments 7 min read LW link

(www.jessehoogland.com)

WaPo: “Big Tech was moving cautiously on AI. Then came ChatGPT.”

Julian Bradshaw 27 Jan 2023 22:54 UTC

26 points

5 comments 1 min read LW link

(www.washingtonpost.com)

Literature review of TAI timelines

Jsevillamol, keith_wynroe and David Atkinson

27 Jan 2023 20:07 UTC

35 points

7 comments 2 min read LW link

(epochai.org)

Scaling Laws Literature Review

Pablo Villalobos 27 Jan 2023 19:57 UTC

36 points

1 comment 4 min read LW link

(epochai.org)

The role of Bayesian ML in AI safety—an overview

Marius Hobbhahn 27 Jan 2023 19:40 UTC

31 points

6 comments 10 min read LW link

Assigning Praise and Blame: Decoupling Epistemology and Decision Theory

adamShimi and Gabriel Alfour

27 Jan 2023 18:16 UTC

59 points

5 comments 3 min read LW link

[Question] How could humans dominate over a super intelligent AI?

Marco Discendenti 27 Jan 2023 18:15 UTC

−5 points

8 comments 1 min read LW link

ChatGPT understands language

philosophybear 27 Jan 2023 7:14 UTC

27 points

4 comments 6 min read LW link

(philosophybear.substack.com)

Jar of Chocolate

jefftk 27 Jan 2023 3:40 UTC

10 points

0 comments 1 min read LW link

(www.jefftk.com)

Basics of Rationalist Discourse

Duncan Sabien (Deactivated) 27 Jan 2023 2:40 UTC

267 points

192 comments 36 min read LW link 4 reviews

The recent banality of rationality (and effective altruism)

CraigMichael 27 Jan 2023 1:19 UTC

−6 points

7 comments 11 min read LW link

11 heuristics for choosing (alignment) research projects

Akash and danesherbs

27 Jan 2023 0:36 UTC

50 points

5 comments 1 min read LW link

A different observation of Vavilov Day

Elizabeth 26 Jan 2023 21:50 UTC

30 points

1 comment 1 min read LW link

(acesounderglass.com)

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

mwatkins and Robert Miles

26 Jan 2023 21:01 UTC

39 points

81 comments 2 min read LW link

Just another thought experiment

Bohdan Kudlai 26 Jan 2023 19:29 UTC

−11 points

0 comments 1 min read LW link

Exquisite Oracle: A Dadaist-Inspired Literary Game for Many Friends (or 1 AI)

Yitz 26 Jan 2023 18:26 UTC

6 points

1 comment 1 min read LW link

AI Risk Management Framework | NIST

DragonGod 26 Jan 2023 15:27 UTC

36 points

4 comments 2 min read LW link

(www.nist.gov)

“How to Escape from the Simulation”—Seeds of Science call for reviewers

rogersbacon 26 Jan 2023 15:11 UTC

12 points

0 comments 1 min read LW link

Loom: Why and How to use it

brook 26 Jan 2023 14:34 UTC

2 points

5 comments 1 min read LW link

Covid 1/26/23: Case Count Crash

Zvi 26 Jan 2023 12:50 UTC

32 points

5 comments 9 min read LW link

(thezvi.wordpress.com)

[Question] How are you currently modeling COVID contagiousness?

CounterBlunder 26 Jan 2023 4:46 UTC

2 points

2 comments 1 min read LW link

[Question] What’s the simplest concrete unsolved problem in AI alignment?

agg 26 Jan 2023 4:15 UTC

28 points

4 comments 1 min read LW link

2022 Less Wrong Census/Survey: Request for Comments

Screwtape 25 Jan 2023 20:57 UTC

5 points

29 comments 1 min read LW link

Next steps after AGISF at UMich

JakubK 25 Jan 2023 20:57 UTC

10 points

0 comments 5 min read LW link

(docs.google.com)

AGI will have learnt utility functions

beren 25 Jan 2023 19:42 UTC

36 points

3 comments 13 min read LW link

[RFC] Possible ways to expand on “Discovering Latent Knowledge in Language Models Without Supervision”.

gekaklam, Walter Laurito, Kaarel and Kay Kozaronek

25 Jan 2023 19:03 UTC

48 points

6 comments 12 min read LW link

Spreading messages to help with the most important century

HoldenKarnofsky 25 Jan 2023 18:20 UTC

75 points

4 comments 18 min read LW link

(www.cold-takes.com)

My Model Of EA Burnout

LoganStrohl 25 Jan 2023 17:52 UTC

255 points

50 comments 5 min read LW link 1 review

Thoughts on the impact of RLHF research

paulfchristiano 25 Jan 2023 17:23 UTC

250 points

102 comments 9 min read LW link

[Question] Could AI be used to engineer a sociopolitical situation where humans can solve the problems surrounding AGI?

hollowing 25 Jan 2023 17:17 UTC

1 point

6 comments 1 min read LW link

Progress links and tweets, 2023-01-25

jasoncrawford 25 Jan 2023 16:12 UTC

8 points

0 comments 1 min read LW link

(rootsofprogress.org)