Focus on existential risk is a distraction from the real issues. A false fallacy

Nik Samoylov

30 Oct 2023 23:42 UTC

−19 points

11 comments, 2 min read, LW link

Will releasing the weights of large language models grant widespread access to pandemic agents?

jefftk

30 Oct 2023 18:22 UTC

46 points

25 comments, 1 min read, LW link

(arxiv.org)

[Linkpost] Two major announcements in AI governance today

Angélina

30 Oct 2023 17:28 UTC

1 point

1 comment, 1 min read, LW link

(www.whitehouse.gov)

Grokking Beyond Neural Networks

Jack Miller

30 Oct 2023 17:28 UTC

10 points

0 comments, 2 min read, LW link

(arxiv.org)

Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden

30 Oct 2023 17:27 UTC

5 points

2 comments, 6 min read, LW link

(matthewwearden.co.uk)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Zeming Wei

30 Oct 2023 17:22 UTC

3 points

1 comment, 1 min read, LW link

5 Reasons Why Governments/Militaries Already Want AI for Information Warfare

trevor

30 Oct 2023 16:30 UTC

32 points

0 comments, 10 min read, LW link

[Linkpost] Biden-Harris Executive Order on AI

beren

30 Oct 2023 15:20 UTC

3 points

0 comments, 1 min read, LW link

AI Alignment [progress] this Week (10/29/2023)

Logan Zoellner

30 Oct 2023 15:02 UTC

15 points

4 comments, 6 min read, LW link

(midwitalignment.substack.com)

Improving the Welfare of AIs: A Nearcasted Proposal

ryan_greenblatt

30 Oct 2023 14:51 UTC

104 points

5 comments, 20 min read, LW link

President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence

Tristan Williams

30 Oct 2023 11:15 UTC

171 points

39 comments, 1 min read, LW link

(www.whitehouse.gov)

GPT-2 XL’s capacity for coherence and ontology clustering

MiguelDev

30 Oct 2023 9:24 UTC

6 points

2 comments, 41 min read, LW link

Charbel-Raphaël and Lucius discuss interpretability

Mateusz Bagiński, Charbel-Raphaël and Lucius Bushnaq

30 Oct 2023 5:50 UTC

110 points

7 comments, 21 min read, LW link

Multi-Winner 3-2-1 Voting

Yoav Ravid

30 Oct 2023 3:31 UTC

14 points

6 comments, 3 min read, LW link

math terminology as convolution

bhauth

30 Oct 2023 1:05 UTC

34 points

1 comment, 4 min read, LW link

(www.bhauth.com)

Grokking, memorization, and generalization — a discussion

Kaarel and Dmitry Vaintrob

29 Oct 2023 23:17 UTC

75 points

11 comments, 23 min read, LW link

Comp Sci in 2027 (Short story by Eliezer Yudkowsky)

sudo

29 Oct 2023 23:09 UTC

156 points

22 comments, 10 min read, LW link

(nitter.net)

Mathematically-Defined Optimization Captures A Lot of Useful Information

J Bostock

29 Oct 2023 17:17 UTC

19 points

0 comments, 2 min read, LW link

Clarifying the free energy principle (with quotes)

Ryo

29 Oct 2023 16:03 UTC

8 points

0 comments, 9 min read, LW link

A new intro to Quantum Physics, with the math fixed

titotal

29 Oct 2023 15:11 UTC

113 points

23 comments, 17 min read, LW link

(titotal.substack.com)

My idea of sacredness, divinity, and religion

Kaj_Sotala

29 Oct 2023 12:50 UTC

40 points

10 comments, 4 min read, LW link

(kajsotala.fi)

The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate

Hauke Hillebrandt

29 Oct 2023 8:38 UTC

−1 points

0 comments, 1 min read, LW link

What’s up with “Responsible Scaling Policies”?

habryka and ryan_greenblatt

29 Oct 2023 4:17 UTC

99 points

8 comments, 20 min read, LW link

Experiments as a Third Alternative

Adam Zerner

29 Oct 2023 0:39 UTC

48 points

21 comments, 5 min read, LW link

Comparing representation vectors between llama 2 base and chat

Nina Panickssery

28 Oct 2023 22:54 UTC

36 points

5 comments, 2 min read, LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver

28 Oct 2023 21:06 UTC

46 points

4 comments, 3 min read, LW link

Book Review: Orality and Literacy: The Technologizing of the Word

Fergus Fettes

28 Oct 2023 20:12 UTC

13 points

0 comments, 16 min read, LW link

Regrant up to $600,000 to AI safety projects with GiveWiki

Dawn Drescher

28 Oct 2023 19:56 UTC

33 points

1 comment, 1 min read, LW link

Shane Legg interview on alignment

Seth Herd

28 Oct 2023 19:28 UTC

66 points

20 comments, 2 min read, LW link

(www.youtube.com)

AI Existential Safety Fellowships

mmfli

28 Oct 2023 18:07 UTC

5 points

0 comments, 1 min read, LW link

AI Safety Hub Serbia Official Opening

DusanDNesic and Tanja T

28 Oct 2023 17:03 UTC

55 points

0 comments, 3 min read, LW link

(forum.effectivealtruism.org)

Managing AI Risks in an Era of Rapid Progress

Algon

28 Oct 2023 15:48 UTC

30 points

3 comments, 11 min read, LW link

(managing-ai-risks.com)

[Question] ELI5 Why isn’t alignment easier as models get stronger?

Logan Zoellner

28 Oct 2023 14:34 UTC

3 points

9 comments, 1 min read, LW link

Truthseeking, EA, Simulacra levels, and other stuff

Elizabeth and Vaniver

27 Oct 2023 23:56 UTC

44 points

12 comments, 9 min read, LW link

[Question] Do you believe “E=mc^2” is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)?

l8c

27 Oct 2023 22:46 UTC

10 points

14 comments, 1 min read, LW link

Value systematization: how values become coherent (and misaligned)

Richard_Ngo

27 Oct 2023 19:06 UTC

102 points

48 comments, 13 min read, LW link

Techno-humanism is techno-optimism for the 21st century

Richard_Ngo

27 Oct 2023 18:37 UTC

88 points

5 comments, 14 min read, LW link

(www.mindthefuture.info)

Sanctuary for Humans

Nikola Jurkovic

27 Oct 2023 18:08 UTC

21 points

9 comments, 1 min read, LW link

Wireheading and misalignment by composition on NetHack

pierlucadoro

27 Oct 2023 17:43 UTC

34 points

4 comments, 4 min read, LW link

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky

27 Oct 2023 15:19 UTC

200 points

33 comments, 8 min read, LW link

Aspiration-based Q-Learning

Clément Dumas and Jobst Heitzig

27 Oct 2023 14:42 UTC

38 points

5 comments, 11 min read, LW link

Linkpost: Rishi Sunak’s Speech on AI (26th October)

bideup

27 Oct 2023 11:57 UTC

85 points

8 comments, 7 min read, LW link

(www.gov.uk)

ASPR & WARP: Rationality Camps for Teens in Taiwan and Oxford

Anna Gajdova

27 Oct 2023 8:40 UTC

18 points

0 comments, 1 min read, LW link

[Question] To what extent is the UK Government’s recent AI Safety push entirely due to Rishi Sunak?

Stephen Fowler

27 Oct 2023 3:29 UTC

23 points

4 comments, 1 min read, LW link

Bayesian Punishment

Rob Lucas

27 Oct 2023 3:24 UTC

1 point

1 comment, 6 min read, LW link

Online Dialogues Party — Sunday 5th November

Ben Pace

27 Oct 2023 2:41 UTC

28 points

1 comment, 1 min read, LW link

OpenAI’s new Preparedness team is hiring

leopold

26 Oct 2023 20:42 UTC

60 points

2 comments, 1 min read, LW link

Fake Deeply

Zack_M_Davis

26 Oct 2023 19:55 UTC

33 points

7 comments, 1 min read, LW link

(unremediatedgender.space)

Symbol/Referent Confusions in Language Model Alignment Experiments

johnswentworth

26 Oct 2023 19:49 UTC

94 points

44 comments, 6 min read, LW link

Unsupervised Methods for Concept Discovery in AlphaZero

aogara

26 Oct 2023 19:05 UTC

9 points

0 comments, 1 min read, LW link

(arxiv.org)