Activation adding experiments with FLAN-T5

Nina Panickssery

13 Jul 2023 23:32 UTC

21 points

5 comments · 7 min read · LW link

[Question] What criterion would you use to select companies likely to cause AI doom?

momom2

13 Jul 2023 20:31 UTC

8 points

4 comments · 1 min read · LW link

Newcomb II: Newer and Comb-ier

Nathaniel Monson

13 Jul 2023 18:49 UTC

0 points

11 comments · 3 min read · LW link

Jailbreaking GPT-4's code interpreter

nikola

13 Jul 2023 18:43 UTC

160 points

22 comments · 7 min read · LW link

An attempt to steelman OpenAI’s alignment plan

Nathan Helm-Burger

13 Jul 2023 18:25 UTC

22 points

0 comments · 4 min read · LW link

Instrumental Convergence to Complexity Preservation

Macro Flaneur

13 Jul 2023 17:40 UTC

2 points

2 comments · 3 min read · LW link

Unabridged History of Global Parenting

CrimsonChin

13 Jul 2023 16:49 UTC

0 points

2 comments · 7 min read · LW link

The Goddess of Everything Else—The Animation

Writer

13 Jul 2023 16:26 UTC

142 points

4 comments · 1 min read · LW link

(youtu.be)

Winners of AI Alignment Awards Research Contest

Akash and OliviaJ

13 Jul 2023 16:14 UTC

115 points

4 comments · 12 min read · LW link

(alignmentawards.com)

Accidentally Load Bearing

jefftk

13 Jul 2023 16:10 UTC

280 points

17 comments · 1 min read · LW link · 1 review

(www.jefftk.com)

AI #20: Code Interpreter and Claude 2.0 for Everyone

Zvi

13 Jul 2023 14:00 UTC

60 points

9 comments · 56 min read · LW link

(thezvi.wordpress.com)

[Question] How can I get help becoming a better rationalist?

TeaTieAndHat

13 Jul 2023 13:41 UTC

31 points

19 comments · 1 min read · LW link

i love eating trash

Ace Delgado

13 Jul 2023 11:23 UTC

−15 points

0 comments · 1 min read · LW link

Elon Musk announces xAI

Jan_Kulveit

13 Jul 2023 9:01 UTC

75 points

35 comments · 1 min read · LW link

(www.ft.com)

The intelligence-sentience orthogonality thesis

Ben Smith

13 Jul 2023 6:55 UTC

18 points

9 comments · 9 min read · LW link

Alignment Megaprojects: You’re Not Even Trying to Have Ideas

Nicholas / Heather Kross

12 Jul 2023 23:39 UTC

55 points

30 comments · 2 min read · LW link

Eric Michaud on the Quantization Model of Neural Scaling, Interpretability and Grokking

Michaël Trazzi

12 Jul 2023 22:45 UTC

10 points

0 comments · 2 min read · LW link

(theinsideview.ai)

[Question] Are there any good, easy-to-understand examples of cases where statistical causal network discovery worked well in practice?

tailcalled

12 Jul 2023 22:08 UTC

42 points

6 comments · 1 min read · LW link

The Opt-In Revolution — My vision of a positive future with ASI (An experiment with LLM storytelling)

Tachikoma

12 Jul 2023 21:08 UTC

2 points

0 comments · 2 min read · LW link

[Question] What does the launch of x.ai mean for AI Safety?

Chris_Leong

12 Jul 2023 19:42 UTC

35 points

3 comments · 1 min read · LW link

Towards Developmental Interpretability

Jesse Hoogland, Alexander Gietelink Oldenziel, Daniel Murfet and Stan van Wingerden

12 Jul 2023 19:33 UTC

180 points

10 comments · 9 min read · LW link · 1 review

Flowchart: How might rogue AIs defeat all humans?

Aryeh Englander

12 Jul 2023 19:23 UTC

12 points

0 comments · 1 min read · LW link

A review of Principia Qualia

jessicata

12 Jul 2023 18:38 UTC

56 points

8 comments · 10 min read · LW link

(unstablerontology.substack.com)

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel

12 Jul 2023 17:47 UTC

9 points

15 comments · 5 min read · LW link

Goal-Direction for Simulated Agents

Raymond D

12 Jul 2023 17:06 UTC

33 points

2 comments · 6 min read · LW link

AISN#14: OpenAI’s ‘Superalignment’ team, Musk’s xAI launches, and developments in military AI use

Dan H

12 Jul 2023 16:58 UTC

16 points

0 comments · 1 min read · LW link

Report on modeling evidential cooperation in large worlds

Johannes Treutlein

12 Jul 2023 16:37 UTC

45 points

3 comments · 1 min read · LW link

(arxiv.org)

Compression of morbidity

DirectedEvolution

12 Jul 2023 15:26 UTC

12 points

0 comments · 3 min read · LW link

An Overview of the AI Safety Funding Situation

Stephen McAleese

12 Jul 2023 14:54 UTC

67 points

9 comments · 1 min read · LW link

[Question] What is some unnecessarily obscure jargon that people here tend to use?

jchan

12 Jul 2023 13:52 UTC

17 points

5 comments · 1 min read · LW link

Housing and Transit Roundup #5

Zvi

12 Jul 2023 13:10 UTC

25 points

1 comment · 20 min read · LW link

(thezvi.wordpress.com)

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin

12 Jul 2023 12:12 UTC

105 points

13 comments · 4 min read · LW link

Lightweight minimal speech recognition?

jefftk

12 Jul 2023 12:00 UTC

9 points

6 comments · 1 min read · LW link

(www.jefftk.com)

Aging and the geroscience hypothesis

DirectedEvolution

12 Jul 2023 7:16 UTC

54 points

14 comments · 5 min read · LW link

Popularizing vibes vs. models

DirectedEvolution

12 Jul 2023 5:44 UTC

19 points

0 comments · 2 min read · LW link

Announcing the AI Fables Writing Contest!

DaystarEld

12 Jul 2023 3:04 UTC

36 points

3 comments · 1 min read · LW link

Why it’s necessary to shoot yourself in the foot

Jacob G-W

11 Jul 2023 21:17 UTC

39 points

7 comments · 2 min read · LW link

(g-w1.github.io)

How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond.

Christopher King

11 Jul 2023 19:27 UTC

17 points

11 comments · 2 min read · LW link

[Question] Do we automatically accept propositions?

Aaron Graifman

11 Jul 2023 17:45 UTC

10 points

5 comments · 1 min read · LW link

fMRI LIKE APPROACH TO AI ALIGNMENT / DECEPTIVE BEHAVIOUR

Escaque 66

11 Jul 2023 17:17 UTC

−1 points

3 comments · 2 min read · LW link

Introducing Fatebook: the fastest way to make and track predictions

Adam B and Sage Future

11 Jul 2023 15:28 UTC

128 points

36 comments · 1 min read · LW link

(fatebook.io)

My Weirdest Experience

Bridgett Kay

11 Jul 2023 14:44 UTC

37 points

19 comments · 1 min read · LW link

(dxmrevealed.wordpress.com)

Announcing The Roots of Progress Blog-Building Intensive

jasoncrawford

11 Jul 2023 14:04 UTC

10 points

0 comments · 1 min read · LW link

(rootsofprogress.org)

OpenAI Launches Superalignment Taskforce

Zvi

11 Jul 2023 13:00 UTC

149 points

40 comments · 49 min read · LW link

(thezvi.wordpress.com)

Critiquing Risks From Learned Optimization, and Avoiding Cached Theories

ProofBySonnet

11 Jul 2023 11:38 UTC

1 point

0 comments · 6 min read · LW link

[UPDATE: deadline extended to July 24!] New wind in rationality’s sails: Applications for Epistea Residency 2023 are now open

Jana Meixnerová and Irena Kotíková

11 Jul 2023 11:02 UTC

80 points

7 comments · 3 min read · LW link

Two Hot Takes about Quine

Charlie Steiner

11 Jul 2023 6:42 UTC

15 points

0 comments · 2 min read · LW link

Disincentivizing deception in mesa optimizers with Model Tampering

martinkunev

11 Jul 2023 0:44 UTC

3 points

0 comments · 2 min read · LW link

Drawn Out: a story

Richard_Ngo

11 Jul 2023 0:08 UTC

77 points

2 comments · 8 min read · LW link

Definitions are about efficiency and consistency with common language.

Nacruno96

10 Jul 2023 23:46 UTC

1 point

0 comments · 4 min read · LW link