Thoughts on the AI Safety Summit company policy requests and responses

So8res 31 Oct 2023 23:54 UTC

169 points

14 comments · 10 min read · LW link

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

aogara and Dan H

31 Oct 2023 19:34 UTC

35 points

1 comment · 6 min read · LW link

(newsletter.safe.ai)

If AIs become self-aware, what religion will they have?

mnvr 31 Oct 2023 17:29 UTC

−17 points

3 comments · 4 min read · LW link

Self-Blinded L-Theanine RCT

niplav 31 Oct 2023 15:24 UTC

53 points

12 comments · 3 min read · LW link

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training

Charbel-Raphaël 31 Oct 2023 14:34 UTC

17 points

0 comments · 19 min read · LW link

Preventing Language Models from hiding their reasoning

Fabien Roger and ryan_greenblatt

31 Oct 2023 14:34 UTC

113 points

14 comments · 12 min read · LW link

AI Safety 101 - Chapter 5.1 - Debate

Charbel-Raphaël 31 Oct 2023 14:29 UTC

15 points

0 comments · 13 min read · LW link

M&A in AI

Hauke Hillebrandt 31 Oct 2023 12:20 UTC

2 points

0 comments · 1 min read · LW link

Urging an International AI Treaty: An Open Letter

Olli Järviniemi 31 Oct 2023 11:26 UTC

48 points

2 comments · 1 min read · LW link

(aitreaty.org)

[Closed] Agent Foundations track in MATS

Vanessa Kosoy 31 Oct 2023 8:12 UTC

54 points

1 comment · 1 min read · LW link

(www.matsprogram.org)

Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI

jsteinhardt 31 Oct 2023 5:10 UTC

40 points

0 comments · 12 min read · LW link

(bounded-regret.ghost.io)

Focus on existential risk is a distraction from the real issues. A false fallacy

Nik Samoylov 30 Oct 2023 23:42 UTC

−19 points

11 comments · 2 min read · LW link

Will releasing the weights of large language models grant widespread access to pandemic agents?

jefftk 30 Oct 2023 18:22 UTC

46 points

25 comments · 1 min read · LW link

(arxiv.org)

[Linkpost] Two major announcements in AI governance today

Angélina 30 Oct 2023 17:28 UTC

1 point

1 comment · 1 min read · LW link

(www.whitehouse.gov)

Grokking Beyond Neural Networks

Jack Miller 30 Oct 2023 17:28 UTC

10 points

0 comments · 2 min read · LW link

(arxiv.org)

Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden 30 Oct 2023 17:27 UTC

5 points

2 comments · 6 min read · LW link

(matthewwearden.co.uk)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Zeming Wei 30 Oct 2023 17:22 UTC

3 points

1 comment · 1 min read · LW link

5 Reasons Why Governments/Militaries Already Want AI for Information Warfare

trevor 30 Oct 2023 16:30 UTC

32 points

0 comments · 10 min read · LW link

[Linkpost] Biden-Harris Executive Order on AI

beren 30 Oct 2023 15:20 UTC

3 points

0 comments · 1 min read · LW link

AI Alignment [progress] this Week (10/29/2023)

Logan Zoellner 30 Oct 2023 15:02 UTC

15 points

4 comments · 6 min read · LW link

(midwitalignment.substack.com)

Improving the Welfare of AIs: A Nearcasted Proposal

ryan_greenblatt 30 Oct 2023 14:51 UTC

98 points

5 comments · 20 min read · LW link

President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence

Tristan Williams 30 Oct 2023 11:15 UTC

171 points

39 comments · 1 min read · LW link

(www.whitehouse.gov)

GPT-2 XL’s capacity for coherence and ontology clustering

MiguelDev 30 Oct 2023 9:24 UTC

6 points

2 comments · 41 min read · LW link

Charbel-Raphaël and Lucius discuss interpretability

Mateusz Bagiński, Charbel-Raphaël and Lucius Bushnaq

30 Oct 2023 5:50 UTC

107 points

7 comments · 21 min read · LW link

Multi-Winner 3-2-1 Voting

Yoav Ravid 30 Oct 2023 3:31 UTC

14 points

6 comments · 3 min read · LW link

math terminology as convolution

bhauth 30 Oct 2023 1:05 UTC

34 points

1 comment · 4 min read · LW link

(www.bhauth.com)

Grokking, memorization, and generalization — a discussion

Kaarel and Dmitry Vaintrob

29 Oct 2023 23:17 UTC

75 points

11 comments · 23 min read · LW link

Comp Sci in 2027 (Short story by Eliezer Yudkowsky)

sudo 29 Oct 2023 23:09 UTC

154 points

22 comments · 10 min read · LW link

(nitter.net)

Mathematically-Defined Optimization Captures A Lot of Useful Information

J Bostock 29 Oct 2023 17:17 UTC

19 points

0 comments · 2 min read · LW link

Clarifying the free energy principle (with quotes)

Ryo 29 Oct 2023 16:03 UTC

8 points

0 comments · 9 min read · LW link

A new intro to Quantum Physics, with the math fixed

titotal 29 Oct 2023 15:11 UTC

113 points

23 comments · 17 min read · LW link

(titotal.substack.com)

My idea of sacredness, divinity, and religion

Kaj_Sotala 29 Oct 2023 12:50 UTC

40 points

10 comments · 4 min read · LW link

(kajsotala.fi)

The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate

Hauke Hillebrandt 29 Oct 2023 8:38 UTC

−1 points

0 comments · 1 min read · LW link

What’s up with “Responsible Scaling Policies”?

habryka and ryan_greenblatt

29 Oct 2023 4:17 UTC

99 points

8 comments · 20 min read · LW link

Experiments as a Third Alternative

Adam Zerner 29 Oct 2023 0:39 UTC

48 points

21 comments · 5 min read · LW link

Comparing representation vectors between llama 2 base and chat

Nina Panickssery 28 Oct 2023 22:54 UTC

36 points

5 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver 28 Oct 2023 21:06 UTC

46 points

4 comments · 3 min read · LW link

Book Review: Orality and Literacy: The Technologizing of the Word

Fergus Fettes 28 Oct 2023 20:12 UTC

13 points

0 comments · 16 min read · LW link

Regrant up to $600,000 to AI safety projects with GiveWiki

Dawn Drescher 28 Oct 2023 19:56 UTC

33 points

1 comment · 1 min read · LW link

Shane Legg interview on alignment

Seth Herd 28 Oct 2023 19:28 UTC

66 points

20 comments · 2 min read · LW link

(www.youtube.com)

AI Existential Safety Fellowships

mmfli 28 Oct 2023 18:07 UTC

5 points

0 comments · 1 min read · LW link

AI Safety Hub Serbia Official Opening

DusanDNesic and Tanja T

28 Oct 2023 17:03 UTC

55 points

0 comments · 3 min read · LW link

(forum.effectivealtruism.org)

Managing AI Risks in an Era of Rapid Progress

Algon 28 Oct 2023 15:48 UTC

30 points

3 comments · 11 min read · LW link

(managing-ai-risks.com)

[Question] ELI5 Why isn’t alignment easier as models get stronger?

Logan Zoellner 28 Oct 2023 14:34 UTC

3 points

9 comments · 1 min read · LW link

Truthseeking, EA, Simulacra levels, and other stuff

Elizabeth and Vaniver

27 Oct 2023 23:56 UTC

44 points

12 comments · 9 min read · LW link

[Question] Do you believe “E=mc^2” is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)?

l8c 27 Oct 2023 22:46 UTC

10 points

14 comments · 1 min read · LW link

Value systematization: how values become coherent (and misaligned)

Richard_Ngo 27 Oct 2023 19:06 UTC

102 points

48 comments · 13 min read · LW link

Techno-humanism is techno-optimism for the 21st century

Richard_Ngo 27 Oct 2023 18:37 UTC

88 points

5 comments · 14 min read · LW link

(www.mindthefuture.info)

Sanctuary for Humans

nikola 27 Oct 2023 18:08 UTC

21 points

9 comments · 1 min read · LW link

Wireheading and misalignment by composition on NetHack

pierlucadoro 27 Oct 2023 17:43 UTC

34 points

4 comments · 4 min read · LW link