Remote AI Alignment Overhang?

tryactions 19 Feb 2023 22:30 UTC

37 points

5 comments 4 min read LW link

A Neural Network undergoing Gradient-based Training as a Complex System

carboniferous_umbraculum 19 Feb 2023 22:08 UTC

22 points

1 comment 19 min read LW link

Another Way to Be Okay

Gretta Duleba 19 Feb 2023 20:49 UTC

105 points

15 comments 6 min read LW link

A Way To Be Okay

Duncan Sabien (Deactivated) 19 Feb 2023 20:27 UTC

108 points

38 comments 10 min read LW link 1 review

Exploring Lily’s world with ChatGPT [things an AI won’t do]

Bill Benzon 19 Feb 2023 16:39 UTC

5 points

0 comments 20 min read LW link

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper 19 Feb 2023 15:25 UTC

30 points

5 comments 4 min read LW link

Does novel understanding imply novel agency / values?

TsviBT 19 Feb 2023 14:41 UTC

18 points

0 comments 7 min read LW link

There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs

Taran 19 Feb 2023 12:25 UTC

124 points

34 comments 4 min read LW link

Navigating public AI x-risk hype while pursuing technical solutions

Dan Braun 19 Feb 2023 12:22 UTC

18 points

0 comments 2 min read LW link

Somewhat against “just update all the way”

tailcalled 19 Feb 2023 10:49 UTC

29 points

10 comments 2 min read LW link

Human beats SOTA Go AI by learning an adversarial policy

Vanessa Kosoy 19 Feb 2023 9:38 UTC

58 points

32 comments 1 min read LW link

(goattack.far.ai)

Degamification

Nate Showell 19 Feb 2023 5:35 UTC

23 points

2 comments 2 min read LW link

Stop posting prompt injections on Twitter and calling it “misalignment”

lc 19 Feb 2023 2:21 UTC

144 points

9 comments 1 min read LW link

AGI in sight: our look at the game board

Andrea_Miotti and Gabriel Alfour 18 Feb 2023 22:17 UTC

226 points

135 comments 6 min read LW link

(andreamiotti.substack.com)

We should be signal-boosting anti Bing chat content

mbrooks 18 Feb 2023 18:52 UTC

−4 points

13 comments 2 min read LW link

Can talk, can think, can suffer.

Ilio 18 Feb 2023 18:43 UTC

1 point

8 comments 3 min read LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout 18 Feb 2023 18:41 UTC

172 points

10 comments 2 min read LW link

(arxiv.org)

Near-Term Risks of an Obedient Artificial Intelligence

ymeskhout 18 Feb 2023 18:30 UTC

20 points

1 comment 6 min read LW link

EIS VII: A Challenge for Mechanists

scasper 18 Feb 2023 18:27 UTC

36 points

4 comments 3 min read LW link

Reading Speed Exists!

Johannes C. Mayer 18 Feb 2023 15:30 UTC

12 points

9 comments 1 min read LW link

The Practitioner’s Path 2.0: the Meditative Archetype

Evenflair 18 Feb 2023 15:23 UTC

14 points

1 comment 2 min read LW link

(guildoftherose.org)

Should we cry “wolf”?

Tapatakt 18 Feb 2023 11:24 UTC

24 points

5 comments 1 min read LW link

[Question] Name of the fallacy of assuming an extreme value (e.g. 0) with the illusion of ‘avoiding to have to make an assumption’?

FlorianH 18 Feb 2023 8:11 UTC

4 points

1 comment 1 min read LW link

I Think We’re Approaching The Bitter Lesson’s Asymptote

SomeoneYouOnceKnew 18 Feb 2023 5:33 UTC

−3 points

9 comments 5 min read LW link

Bus-Only Bus Lane Enforcement

jefftk 18 Feb 2023 2:50 UTC

19 points

15 comments 1 min read LW link

(www.jefftk.com)

Run Head on Towards the Falling Tears

Johannes C. Mayer 18 Feb 2023 1:33 UTC

6 points

0 comments 2 min read LW link

Two problems with ‘Simulators’ as a frame

ryan_greenblatt 17 Feb 2023 23:34 UTC

81 points

13 comments 5 min read LW link

GPT-4 Predictions

Stephen McAleese 17 Feb 2023 23:20 UTC

109 points

27 comments 11 min read LW link

On Board Vision, Hollow Words, and the End of the World

Marcello 17 Feb 2023 23:18 UTC

52 points

27 comments 5 min read LW link

PICT: A Zero-Shot Prompt Template to Automate Evaluation

Quentin FEUILLADE--MONTIXI 17 Feb 2023 23:16 UTC

17 points

1 comment 11 min read LW link

Hunch seeds: Info bio

the gears to ascension 17 Feb 2023 21:25 UTC

12 points

0 comments 9 min read LW link

Why Do We Believe

Screwtape 17 Feb 2023 20:58 UTC

9 points

3 comments 3 min read LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz 17 Feb 2023 20:50 UTC

63 points

28 comments 1 min read LW link

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper 17 Feb 2023 20:48 UTC

49 points

9 comments 12 min read LW link

Tinker Bell Theory and LLMs

Fergus Fettes 17 Feb 2023 20:23 UTC

1 point

11 comments 1 min read LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver 17 Feb 2023 20:11 UTC

125 points

12 comments 2 min read LW link

Microsoft and OpenAI, stop telling chatbots to roleplay as AI

hold_my_fish 17 Feb 2023 19:55 UTC

49 points

10 comments 1 min read LW link

A warm-up for the AI governance project

jacek 17 Feb 2023 18:06 UTC

10 points

2 comments 3 min read LW link

Link Post > Blog Post

party girl 17 Feb 2023 17:59 UTC

4 points

6 comments 1 min read LW link

(onthespectrumontheguestlist.substack.com)

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck 17 Feb 2023 17:26 UTC

127 points

11 comments 7 min read LW link

[Question] Should we be kind and polite to emerging AIs?

David Gross 17 Feb 2023 16:58 UTC

9 points

13 comments 1 min read LW link

Follow-up Posting on Cyborg Psychologist

Hopkins Stanley 17 Feb 2023 16:56 UTC

0 points

2 comments 1 min read LW link

(www.lesswrong.com)

A “slow takeoff” might still look fast

MichaelDickens 17 Feb 2023 16:51 UTC

5 points

3 comments 1 min read LW link

AI Safety Info Distillation Fellowship

Robert Miles and mwatkins 17 Feb 2023 16:16 UTC

47 points

3 comments 3 min read LW link

Nozick’s Dilemma: A Critique of Game Theory

Edward P. Könings 17 Feb 2023 16:11 UTC

10 points

1 comment 13 min read LW link

[Question] Are LLMs sufficient for AI takeoff?

rpglover64 17 Feb 2023 15:46 UTC

8 points

2 comments 1 min read LW link

Sydney’s Secret: A Short Story by Bing Chat

fela 17 Feb 2023 13:31 UTC

36 points

1 comment 5 min read LW link

Automating Consistency

Hoagy 17 Feb 2023 13:24 UTC

10 points

0 comments 1 min read LW link

Human decision processes are not well factored

remember and Gabriel Alfour 17 Feb 2023 13:11 UTC

33 points

3 comments 2 min read LW link

2023 ACX Predictions: Buy/Sell/Hold

Zvi 17 Feb 2023 13:10 UTC

25 points

3 comments 20 min read LW link

(thezvi.wordpress.com)