Page 2
AI Safety in a Vulnerable World: Requesting Feedback on Preliminary Thoughts · Jordan Arel · Dec 6, 2022, 10:35 PM · 4 points · 2 comments · 3 min read · LW link
ChatGPT and the Human Race · Ben Reilly · Dec 6, 2022, 9:38 PM · 6 points · 1 comment · 3 min read · LW link
[Question] How do finite factored sets compare with phase space? · Alex_Altair · Dec 6, 2022, 8:05 PM · 15 points · 1 comment · 1 min read · LW link
Mesa-Optimizers via Grokking · orthonormal · Dec 6, 2022, 8:05 PM · 36 points · 4 comments · 6 min read · LW link
Using GPT-Eliezer against ChatGPT Jailbreaking · Stuart_Armstrong and rgorman · Dec 6, 2022, 7:54 PM · 170 points · 85 comments · 9 min read · LW link
The Parable of the Crimp · Phosphorous · Dec 6, 2022, 6:41 PM · 11 points · 3 comments · 3 min read · LW link
The Categorical Imperative Obscures · Gordon Seidoh Worley · Dec 6, 2022, 5:48 PM · 17 points · 17 comments · 2 min read · LW link
MIRI’s “Death with Dignity” in 60 seconds. · Cleo Nardo · Dec 6, 2022, 5:18 PM · 58 points · 4 comments · 1 min read · LW link
Things roll downhill · awenonian · Dec 6, 2022, 3:27 PM · 19 points · 0 comments · 1 min read · LW link
EA & LW Forums Weekly Summary (28th Nov – 4th Dec 22’) · Zoe Williams · Dec 6, 2022, 9:38 AM · 10 points · 1 comment · LW link
Take 5: Another problem for natural abstractions is laziness. · Charlie Steiner · Dec 6, 2022, 7:00 AM · 31 points · 4 comments · 3 min read · LW link
Verification Is Not Easier Than Generation In General · johnswentworth · Dec 6, 2022, 5:20 AM · 73 points · 27 comments · 1 min read · LW link
Shh, don’t tell the AI it’s likely to be evil · naterush · Dec 6, 2022, 3:35 AM · 19 points · 9 comments · 1 min read · LW link
[Question] What are the major underlying divisions in AI safety? · Chris_Leong · Dec 6, 2022, 3:28 AM · 5 points · 2 comments · 1 min read · LW link
[Link] Why I’m optimistic about OpenAI’s alignment approach · janleike · Dec 5, 2022, 10:51 PM · 98 points · 15 comments · 1 min read · LW link · (aligned.substack.com)
The No Free Lunch theorem for dummies · Steven Byrnes · Dec 5, 2022, 9:46 PM · 37 points · 16 comments · 3 min read · LW link
ChatGPT and Ideological Turing Test · Viliam · Dec 5, 2022, 9:45 PM · 42 points · 1 comment · 1 min read · LW link
ChatGPT on Spielberg’s A.I. and AI Alignment · Bill Benzon · Dec 5, 2022, 9:10 PM · 5 points · 0 comments · 4 min read · LW link
Updating my AI timelines · Matthew Barnett · Dec 5, 2022, 8:46 PM · 145 points · 50 comments · 2 min read · LW link
Steering Behaviour: Testing for (Non-)Myopia in Language Models · Evan R. Murphy and Megan Kinniment · Dec 5, 2022, 8:28 PM · 40 points · 19 comments · 10 min read · LW link
College Admissions as a Brutal One-Shot Game · devansh · Dec 5, 2022, 8:05 PM · 8 points · 26 comments · 2 min read · LW link
Analysis of AI Safety surveys for field-building insights · Ash Jafari · Dec 5, 2022, 7:21 PM · 11 points · 2 comments · 5 min read · LW link
Testing Ways to Bypass ChatGPT’s Safety Features · Robert_AIZI · Dec 5, 2022, 6:50 PM · 7 points · 4 comments · 5 min read · LW link · (aizi.substack.com)
Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities · jacquesthibs · Dec 5, 2022, 4:09 PM · 28 points · 6 comments · 8 min read · LW link
Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence · Ronny Fernandez · Dec 5, 2022, 3:19 PM · 19 points · 5 comments · 7 min read · LW link
[Question] How should I judge the impact of giving $5k to a family of three kids and two mentally ill parents? · Blake · Dec 5, 2022, 1:42 PM · 10 points · 10 comments · 1 min read · LW link
Is the “Valley of Confused Abstractions” real? · jacquesthibs · Dec 5, 2022, 1:36 PM · 20 points · 11 comments · 2 min read · LW link
Take 4: One problem with natural abstractions is there’s too many of them. · Charlie Steiner · Dec 5, 2022, 10:39 AM · 37 points · 4 comments · 1 min read · LW link
[Question] What are some good Lesswrong-related accounts or hashtags on Mastodon that I should follow? · SpectrumDT · Dec 5, 2022, 9:42 AM · 2 points · 0 comments · 1 min read · LW link
[Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone? · Optimization Process · Dec 5, 2022, 9:12 AM · 72 points · 54 comments · 1 min read · LW link
Monthly Shorts 11/22 · Celer · Dec 5, 2022, 7:30 AM · 8 points · 0 comments · 3 min read · LW link · (keller.substack.com)
A ChatGPT story about ChatGPT doom · SurfingOrca · Dec 5, 2022, 5:40 AM · 6 points · 2 comments · 4 min read · LW link
A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability · Yitz · Dec 5, 2022, 5:33 AM · 26 points · 0 comments · 4 min read · LW link
Nook Nature · Duncan Sabien (Inactive) · Dec 5, 2022, 4:10 AM · 54 points · 18 comments · 10 min read · LW link
Probably good projects for the AI safety ecosystem · Ryan Kidd · Dec 5, 2022, 2:26 AM · 78 points · 40 comments · 2 min read · LW link
Historical Notes on Charitable Funds · jefftk · Dec 4, 2022, 11:30 PM · 28 points · 0 comments · 3 min read · LW link · (www.jefftk.com)
AGI as a Black Swan Event · Stephen McAleese · Dec 4, 2022, 11:00 PM · 8 points · 8 comments · 7 min read · LW link
South Bay ACX/LW Pre-Holiday Get-Together · IS · Dec 4, 2022, 10:57 PM UTC · 10 points · 0 comments · 1 min read · LW link
ChatGPT is settling the Chinese Room argument · averros · Dec 4, 2022, 8:25 PM UTC · −7 points · 7 comments · 1 min read · LW link
Race to the Top: Benchmarks for AI Safety · Isabella Duan · Dec 4, 2022, 6:48 PM UTC · 29 points · 6 comments · 1 min read · LW link
Open & Welcome Thread—December 2022 · niplav · Dec 4, 2022, 3:06 PM UTC · 8 points · 22 comments · 1 min read · LW link
AI can exploit safety plans posted on the Internet · Peter S. Park · Dec 4, 2022, 12:17 PM UTC · −15 points · 4 comments · LW link
ChatGPT seems overconfident to me · qbolec · Dec 4, 2022, 8:03 AM UTC · 19 points · 3 comments · 16 min read · LW link
Could an AI be Religious? · mk54 · Dec 4, 2022, 5:00 AM UTC · −12 points · 14 comments · 1 min read · LW link
Can GPT-3 Write Contra Dances? · jefftk · Dec 4, 2022, 3:00 AM UTC · 6 points · 4 comments · 10 min read · LW link · (www.jefftk.com)
Take 3: No indescribable heavenworlds. · Charlie Steiner · Dec 4, 2022, 2:48 AM UTC · 23 points · 12 comments · 2 min read · LW link
Summary of a new study on out-group hate (and how to fix it) · DirectedEvolution · Dec 4, 2022, 1:53 AM UTC · 60 points · 30 comments · 3 min read · LW link · (www.pnas.org)
[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)? · nahoj · Dec 3, 2022, 8:32 PM UTC · 1 point · 8 comments · 1 min read · LW link
Logical induction for software engineers · Alex Flint · Dec 3, 2022, 7:55 PM UTC · 163 points · 8 comments · 27 min read · LW link · 1 review
Utilitarianism is the only option · aelwood · Dec 3, 2022, 5:14 PM UTC · −13 points · 7 comments · LW link