Announcing Athena—Women in AI Alignment Research

Claire Short

7 Nov 2023 21:46 UTC

80 points

2 comments · 3 min read · LW link

Vote on Interesting Disagreements

Ben Pace

7 Nov 2023 21:35 UTC

159 points

129 comments · 1 min read · LW link

What is democracy for?

Johnstone

7 Nov 2023 18:17 UTC

−5 points

10 comments · 7 min read · LW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

Soroush Pour, rusheb, Quentin FEUILLADE--MONTIXI, Arush and scasper

7 Nov 2023 17:59 UTC

36 points

2 comments · 2 min read · LW link

(arxiv.org)

Implementing Decision Theory

justinpombrio

7 Nov 2023 17:55 UTC

22 points

12 comments · 3 min read · LW link

Mirror, Mirror on the Wall: How Do Forecasters Fare by Their Own Call?

nikos

7 Nov 2023 17:39 UTC

14 points

5 comments · 14 min read · LW link

Symbiotic self-alignment of AIs.

Spiritus Dei

7 Nov 2023 17:18 UTC

1 point

0 comments · 3 min read · LW link

AMA: Earning to Give

jefftk

7 Nov 2023 16:20 UTC

53 points

8 comments · 1 min read · LW link

(www.jefftk.com)

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

Quentin FEUILLADE--MONTIXI and Pierre Peigné

7 Nov 2023 16:12 UTC

52 points

20 comments · 6 min read · LW link

Preface to the Sequence on LLM Psychology

Quentin FEUILLADE--MONTIXI

7 Nov 2023 16:12 UTC

33 points

0 comments · 2 min read · LW link

What I’ve been reading, November 2023

jasoncrawford

7 Nov 2023 13:37 UTC

23 points

1 comment · 5 min read · LW link

(rootsofprogress.org)

AI Alignment [Progress] this Week (11/05/2023)

Logan Zoellner

7 Nov 2023 13:26 UTC

24 points

0 comments · 4 min read · LW link

(midwitalignment.substack.com)

On the UK Summit

Zvi

7 Nov 2023 13:10 UTC

74 points

6 comments · 30 min read · LW link

(thezvi.wordpress.com)

Box inversion revisited

Jan_Kulveit

7 Nov 2023 11:09 UTC

40 points

3 comments · 8 min read · LW link

AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougall

7 Nov 2023 9:43 UTC

56 points

0 comments · 1 min read · LW link

The Perils of Professionalism

Screwtape

7 Nov 2023 0:07 UTC

43 points

1 comment · 10 min read · LW link

How to (hopefully ethically) make money off of AGI

habryka, Zvi, Cosmos and NoahK

6 Nov 2023 23:35 UTC

142 points

88 comments · 32 min read · LW link · 1 review

cost estimation for 2 grid energy storage systems

bhauth

6 Nov 2023 23:32 UTC

16 points

12 comments · 7 min read · LW link

(www.bhauth.com)

A bet on critical periods in neural networks

kave and Garrett Baker

6 Nov 2023 23:21 UTC

24 points

1 comment · 6 min read · LW link

Job listing: Communications Generalist / Project Manager

Gretta Duleba

6 Nov 2023 20:21 UTC

49 points

7 comments · 1 min read · LW link

Askesis: a model of the cerebellum

MadHatter

6 Nov 2023 20:19 UTC

7 points

2 comments · 1 min read · LW link

(github.com)

LQPR: An Algorithm for Reinforcement Learning with Provable Safety Guarantees

MadHatter

6 Nov 2023 20:17 UTC

6 points

0 comments · 1 min read · LW link

(github.com)

ACX Meetup Leipzig

Roman Leipe

6 Nov 2023 18:33 UTC

1 point

0 comments · 1 min read · LW link

[Question] Does bulemia work?

lc

6 Nov 2023 17:58 UTC

6 points

18 comments · 1 min read · LW link

Why building ventures in AI Safety is particularly challenging

Heramb

6 Nov 2023 16:27 UTC

1 point

0 comments · 1 min read · LW link

(forum.effectivealtruism.org)

What is true is already so. Owning up to it doesn’t make it worse.

RamblinDash

6 Nov 2023 15:49 UTC

20 points

2 comments · 1 min read · LW link

An illustrative model of backfire risks from pausing AI research

Maxime Riché

6 Nov 2023 14:30 UTC

33 points

3 comments · 11 min read · LW link

Proposal for improving state of alignment research

Iknownothing

6 Nov 2023 13:55 UTC

2 points

0 comments · 1 min read · LW link

Are language models good at making predictions?

dynomight

6 Nov 2023 13:10 UTC

76 points

14 comments · 4 min read · LW link

(dynomight.net)

Tips, tricks, lessons and thoughts on hosting hackathons

gergogaspar

6 Nov 2023 11:03 UTC

3 points

0 comments · 11 min read · LW link

Announcing TAIS 2024

Blaine

6 Nov 2023 8:38 UTC

23 points

0 comments · 1 min read · LW link

(tais2024.cc)

Taboo Wall

Screwtape

6 Nov 2023 3:51 UTC

19 points

0 comments · 3 min read · LW link

When and why should you use the Kelly criterion?

Garrett Baker, philh and River

5 Nov 2023 23:26 UTC

27 points

25 comments · 16 min read · LW link

On Overhangs and Technological Change

Roko

5 Nov 2023 22:58 UTC

50 points

19 comments · 2 min read · LW link

xAI announces Grok, beats GPT-3.5

nikola

5 Nov 2023 22:11 UTC

10 points

6 comments · 1 min read · LW link

(x.ai)

Disentangling four motivations for acting in accordance with UDT

Julian Stastny

5 Nov 2023 21:26 UTC

33 points

3 comments · 7 min read · LW link

AI as Super-Demagogue

RationalDino

5 Nov 2023 21:21 UTC

0 points

11 comments · 9 min read · LW link

EA orgs’ legal structure inhibits risk taking and information sharing on the margin

Elizabeth

5 Nov 2023 19:13 UTC

136 points

17 comments · 4 min read · LW link

Eric Schmidt on recursive self-improvement

nikola

5 Nov 2023 19:05 UTC

24 points

3 comments · 1 min read · LW link

(www.youtube.com)

Pivotal Acts might Not be what You Think they are

Johannes C. Mayer

5 Nov 2023 17:23 UTC

41 points

13 comments · 3 min read · LW link

The Assumed Intent Bias

silentbob

5 Nov 2023 16:28 UTC

51 points

13 comments · 6 min read · LW link

Go flash blinking lights at printed text right now

lemonhope

5 Nov 2023 7:29 UTC

15 points

9 comments · 1 min read · LW link

Life of GPT

Odd anon

5 Nov 2023 4:55 UTC

6 points

2 comments · 5 min read · LW link

Lightning Talks

Screwtape

5 Nov 2023 3:27 UTC

6 points

3 comments · 4 min read · LW link

Utility is not the selection target

tailcalled

4 Nov 2023 22:48 UTC

24 points

1 comment · 1 min read · LW link

Stuxnet, not Skynet: Humanity’s disempowerment by AI

Roko

4 Nov 2023 22:23 UTC

107 points

24 comments · 6 min read · LW link

The 6D effect: When companies take risks, one email can be very powerful.

scasper

4 Nov 2023 20:08 UTC

275 points

42 comments · 3 min read · LW link

Genetic fitness is a measure of selection strength, not the selection target

Kaj_Sotala

4 Nov 2023 19:02 UTC

56 points

43 comments · 18 min read · LW link

The Soul Key

Richard_Ngo

4 Nov 2023 17:51 UTC

97 points

9 comments · 8 min read · LW link

(www.narrativeark.xyz)

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea

4 Nov 2023 17:34 UTC

27 points

0 comments · 1 min read · LW link

(arxiv.org)