1 Nov 2023 21:31 UTC

26 points

16 comments · 29 min read · LW link

My thoughts on the social response to AI risk

Matthew Barnett

1 Nov 2023 21:17 UTC

157 points

37 comments · 10 min read · LW link

Reactions to the Executive Order

Zvi

1 Nov 2023 20:40 UTC

77 points

4 comments · 29 min read · LW link

(thezvi.wordpress.com)

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds

1 Nov 2023 18:10 UTC

85 points

1 comment · 4 min read · LW link

(www.anthropic.com)

Book Review: Determined by Sapolsky

Kailuo Wang

1 Nov 2023 17:37 UTC

1 point

0 comments · 7 min read · LW link

AI Alignment: A Comprehensive Survey

Stephen McAleer

1 Nov 2023 17:35 UTC

15 points

1 comment · 1 min read · LW link

(arxiv.org)

A list of all the deadlines in Biden’s Executive Order on AI

Valentin Baltadzhiev

1 Nov 2023 17:14 UTC

26 points

2 comments · 11 min read · LW link

2023 LessWrong Community Census, Request for Comments

Screwtape

1 Nov 2023 16:32 UTC

43 points

37 comments · 2 min read · LW link

[Question] Snapshot of narratives and frames against regulating AI

Jan_Kulveit

1 Nov 2023 16:30 UTC

36 points

19 comments · 3 min read · LW link

Commensal Institutions

Sable

1 Nov 2023 16:01 UTC

8 points

12 comments · 4 min read · LW link

(affablyevil.substack.com)

ChatGPT’s Ontological Landscape

Bill Benzon

1 Nov 2023 15:12 UTC

7 points

0 comments · 4 min read · LW link

On the Executive Order

Zvi

1 Nov 2023 14:20 UTC

100 points

4 comments · 30 min read · LW link

(thezvi.wordpress.com)

Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost]

Akash

1 Nov 2023 13:28 UTC

44 points

4 comments · 1 min read · LW link

(www.ft.com)

[Question] Forecasting Questions: What do you want to predict on AI?

Nathan Young

1 Nov 2023 13:17 UTC

7 points

2 comments · 1 min read · LW link

Mission Impossible: Dead Reckoning Part 1 AI Takeaways

Zvi

1 Nov 2023 12:52 UTC

47 points

13 comments · 6 min read · LW link

Robustness of Contrast-Consistent Search to Adversarial Prompting

Nandi, i, Jamie Wright, Seamus_F and hugofry

1 Nov 2023 12:46 UTC

18 points

1 comment · 7 min read · LW link

The Bletchley Declaration on AI Safety

Hauke Hillebrandt

1 Nov 2023 11:44 UTC

17 points

0 comments · 1 min read · LW link

(www.gov.uk)

Bay Winter Solstice 2023: Song & speech auditions

tcheasdfjkl

1 Nov 2023 4:17 UTC

17 points

2 comments · 1 min read · LW link

On Having No Clue

Chris_Leong

1 Nov 2023 1:36 UTC

20 points

11 comments · 1 min read · LW link

Balancing Security Mindset with Collaborative Research: A Proposal

MadHatter

1 Nov 2023 0:46 UTC

9 points

3 comments · 4 min read · LW link

Computational Approaches to Pathogen Detection

jefftk

1 Nov 2023 0:30 UTC

32 points

5 comments · 5 min read · LW link

(www.jefftk.com)

Thoughts on the AI Safety Summit company policy requests and responses

So8res

31 Oct 2023 23:54 UTC

169 points

14 comments · 10 min read · LW link

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

aogara and Dan H

31 Oct 2023 19:34 UTC

35 points

1 comment · 6 min read · LW link

(newsletter.safe.ai)

If AIs become self-aware, what religion will they have?

mnvr

31 Oct 2023 17:29 UTC

−17 points

3 comments · 4 min read · LW link

Self-Blinded L-Theanine RCT

niplav

31 Oct 2023 15:24 UTC

53 points

12 comments · 3 min read · LW link

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training

Charbel-Raphaël

31 Oct 2023 14:34 UTC

17 points

0 comments · 19 min read · LW link

Preventing Language Models from hiding their reasoning

Fabien Roger and ryan_greenblatt

31 Oct 2023 14:34 UTC

113 points

14 comments · 12 min read · LW link

AI Safety 101 - Chapter 5.1 - Debate

Charbel-Raphaël

31 Oct 2023 14:29 UTC

15 points

0 comments · 13 min read · LW link

M&A in AI

Hauke Hillebrandt

31 Oct 2023 12:20 UTC

2 points

0 comments · 1 min read · LW link

Urging an International AI Treaty: An Open Letter

Olli Järviniemi

31 Oct 2023 11:26 UTC

48 points

2 comments · 1 min read · LW link

(aitreaty.org)

[Closed] Agent Foundations track in MATS

Vanessa Kosoy

31 Oct 2023 8:12 UTC

54 points

1 comment · 1 min read · LW link

(www.matsprogram.org)

Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI

jsteinhardt

31 Oct 2023 5:10 UTC

40 points

0 comments · 12 min read · LW link

(bounded-regret.ghost.io)

Focus on existential risk is a distraction from the real issues. A false fallacy

Nik Samoylov

30 Oct 2023 23:42 UTC

−19 points

11 comments · 2 min read · LW link

Will releasing the weights of large language models grant widespread access to pandemic agents?

jefftk

30 Oct 2023 18:22 UTC

46 points

25 comments · 1 min read · LW link

(arxiv.org)

[Linkpost] Two major announcements in AI governance today

Angélina

30 Oct 2023 17:28 UTC

1 point

1 comment · 1 min read · LW link

(www.whitehouse.gov)

Grokking Beyond Neural Networks

Jack Miller

30 Oct 2023 17:28 UTC

10 points

0 comments · 2 min read · LW link

(arxiv.org)

Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden

30 Oct 2023 17:27 UTC

5 points

2 comments · 6 min read · LW link

(matthewwearden.co.uk)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Zeming Wei

30 Oct 2023 17:22 UTC

3 points

1 comment · 1 min read · LW link

5 Reasons Why Governments/Militaries Already Want AI for Information Warfare

trevor

30 Oct 2023 16:30 UTC

32 points

0 comments · 10 min read · LW link

[Linkpost] Biden-Harris Executive Order on AI

beren

30 Oct 2023 15:20 UTC

3 points

0 comments · 1 min read · LW link

AI Alignment [progress] this Week (10/29/2023)

Logan Zoellner

30 Oct 2023 15:02 UTC

15 points

4 comments · 6 min read · LW link

(midwitalignment.substack.com)

Improving the Welfare of AIs: A Nearcasted Proposal

ryan_greenblatt

30 Oct 2023 14:51 UTC

104 points

5 comments · 20 min read · LW link

President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence

Tristan Williams

30 Oct 2023 11:15 UTC

171 points

39 comments · 1 min read · LW link

(www.whitehouse.gov)

GPT-2 XL’s capacity for coherence and ontology clustering

MiguelDev

30 Oct 2023 9:24 UTC

6 points

2 comments · 41 min read · LW link

Charbel-Raphaël and Lucius discuss interpretability

Mateusz Bagiński, Charbel-Raphaël and Lucius Bushnaq

30 Oct 2023 5:50 UTC

110 points

7 comments · 21 min read · LW link

Multi-Winner 3-2-1 Voting

Yoav Ravid

30 Oct 2023 3:31 UTC

14 points

6 comments · 3 min read · LW link

math terminology as convolution

bhauth

30 Oct 2023 1:05 UTC

34 points

1 comment · 4 min read · LW link

(www.bhauth.com)

Grokking, memorization, and generalization — a discussion

Kaarel and Dmitry Vaintrob

29 Oct 2023 23:17 UTC

75 points

11 comments · 23 min read · LW link

Comp Sci in 2027 (Short story by Eliezer Yudkowsky)

sudo

29 Oct 2023 23:09 UTC

156 points

22 comments · 10 min read · LW link

(nitter.net)

Mathematically-Defined Optimization Captures A Lot of Useful Information

J Bostock

29 Oct 2023 17:17 UTC

19 points

0 comments · 2 min read · LW link