Newsletters

Tag

QAPR 4: Inductive biases

Quintin PopeOct 10, 2022, 10:08 PM
67 points
2 comments18 min readLW link

Forecasting Newsletter. June 2020.

NunoSempereJul 1, 2020, 9:46 AM
27 points
0 comments8 min readLW link

[MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming

Feb 20, 2023, 3:54 PM
20 points
0 comments4 min readLW link
(newsletter.mlsafety.org)

Quintin’s alignment papers roundup - week 1

Quintin PopeSep 10, 2022, 6:39 AM
120 points
6 comments9 min readLW link

[AN #115]: AI safety research problems in the AI-GA framework

Rohin ShahSep 2, 2020, 5:10 PM
19 points
16 comments6 min readLW link
(mailchi.mp)

[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment

Rohin ShahJun 3, 2020, 5:20 PM
38 points
6 comments10 min readLW link
(mailchi.mp)

Hiatus: EA and LW post summaries

Zoe WilliamsMay 17, 2023, 5:17 PM
14 points
0 comments1 min readLW link

Progress links and tweets, 2023-05-16

jasoncrawfordMay 16, 2023, 8:54 PM
14 points
0 comments1 min readLW link
(rootsofprogress.org)

The Paris AI Anti-Safety Summit

ZviFeb 12, 2025, 2:00 PM
129 points
21 comments21 min readLW link
(thezvi.wordpress.com)

EA & LW Forum Weekly Summary (20th − 26th March 2023)

Zoe WilliamsMar 27, 2023, 8:46 PM
4 points
0 comments1 min readLW link

AI #58: Stargate AGI

ZviApr 4, 2024, 1:10 PM
49 points
9 comments60 min readLW link
(thezvi.wordpress.com)

AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI

Dan HMay 23, 2023, 9:47 PM
25 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #53: One More Leap

ZviFeb 29, 2024, 4:10 PM
45 points
0 comments38 min readLW link
(thezvi.wordpress.com)

[AN #112]: Engineering a Safer World

Rohin ShahAug 13, 2020, 5:20 PM
26 points
2 comments12 min readLW link
(mailchi.mp)

AI #6: Agents of Change

ZviApr 6, 2023, 2:00 PM
79 points
13 comments47 min readLW link
(thezvi.wordpress.com)

AI Safety Newsletter #1 [CAIS Linkpost]

Apr 10, 2023, 8:18 PM
45 points
0 comments4 min readLW link
(newsletter.safe.ai)

[MLSN #9] Verifying large training runs, security risks from LLM access to APIs, why natural selection may favor AIs over humans

Apr 11, 2023, 4:03 PM
11 points
0 comments6 min readLW link
(newsletter.mlsafety.org)

AI #7: Free Agency

ZviApr 13, 2023, 4:20 PM
33 points
12 comments47 min readLW link
(thezvi.wordpress.com)

Navigating AI Risks (NAIR) #1: Slowing Down AI

simeon_cApr 14, 2023, 2:35 PM
11 points
3 comments1 min readLW link
(navigatingairisks.substack.com)

AI Impacts Quarterly Newsletter, Jan-Mar 2023

HarlanApr 17, 2023, 10:10 PM
5 points
0 comments3 min readLW link
(blog.aiimpacts.org)

AI #13: Potential Algorithmic Improvements

ZviMay 25, 2023, 3:40 PM
45 points
4 comments67 min readLW link
(thezvi.wordpress.com)

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Dan HMay 30, 2023, 11:52 AM
20 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #14: A Very Good Sentence

ZviJun 1, 2023, 9:30 PM
118 points
30 comments65 min readLW link
(thezvi.wordpress.com)

AI #59: Model Updates

ZviApr 11, 2024, 2:20 PM
30 points
2 comments63 min readLW link
(thezvi.wordpress.com)

AISN #9: Statement on Extinction Risks, Competitive Pressures, and When Will AI Reach Human-Level?

Dan HJun 6, 2023, 4:10 PM
12 points
0 comments7 min readLW link
(newsletter.safe.ai)

AI #15: The Principle of Charity

ZviJun 8, 2023, 12:10 PM
73 points
16 comments44 min readLW link
(thezvi.wordpress.com)

AI #16: AI in the UK

ZviJun 15, 2023, 1:20 PM
46 points
20 comments54 min readLW link
(thezvi.wordpress.com)

Summaries of top forum posts (17th − 23rd April 2023)

Zoe WilliamsApr 24, 2023, 4:13 AM
18 points
0 comments1 min readLW link

AI #17: The Litany

ZviJun 22, 2023, 2:30 PM
95 points
34 comments56 min readLW link
(thezvi.wordpress.com)

AI Safety Newsletter #3: AI policy proposals and a new challenger approaches

ozhangApr 25, 2023, 4:15 PM
33 points
0 comments1 min readLW link

AI #18: The Great Debate Debate

ZviJun 29, 2023, 4:20 PM
47 points
9 comments52 min readLW link
(thezvi.wordpress.com)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan ZoellnerOct 23, 2023, 8:32 PM
22 points
0 comments6 min readLW link
(midwitalignment.substack.com)

AISN #12: Policy Proposals from NTIA’s Request for Comment and Reconsidering Instrumental Convergence

Dan HJun 27, 2023, 5:20 PM
6 points
0 comments1 min readLW link

AI #19: Hofstadter, Sutskever, Leike

ZviJul 6, 2023, 12:50 PM
60 points
16 comments40 min readLW link
(thezvi.wordpress.com)

Monthly Roundup #8: July 2023

ZviJul 3, 2023, 1:20 PM
40 points
4 comments46 min readLW link
(thezvi.wordpress.com)

AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave

Dan HJul 5, 2023, 3:33 PM
13 points
0 comments1 min readLW link

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

Dan HOct 31, 2023, 7:34 PM
35 points
1 comment6 min readLW link
(newsletter.safe.ai)

AI #9: The Merge and the Million Tokens

ZviApr 27, 2023, 2:20 PM
36 points
8 comments53 min readLW link
(thezvi.wordpress.com)

Medical Roundup #3

ZviJul 9, 2024, 1:10 PM
39 points
4 comments19 min readLW link
(thezvi.wordpress.com)

AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI

Nov 15, 2023, 4:07 PM
13 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #72: Denying the Future

ZviJul 11, 2024, 3:00 PM
45 points
8 comments41 min readLW link
(thezvi.wordpress.com)

AI #103: Show Me the Money

ZviFeb 13, 2025, 3:20 PM
30 points
9 comments58 min readLW link
(thezvi.wordpress.com)

AISN #33: Reassessing AI and Biorisk Plus, Consolidation in the Corporate AI Landscape, and National Investments in AI

Apr 12, 2024, 4:10 PM
13 points
0 comments9 min readLW link
(newsletter.safe.ai)

Llama Llama-3-405B?

ZviJul 24, 2024, 7:40 PM
51 points
9 comments30 min readLW link
(thezvi.wordpress.com)

AI #41: Bring in the Other Gemini

ZviDec 7, 2023, 3:10 PM
46 points
16 comments52 min readLW link
(thezvi.wordpress.com)

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar

Dec 7, 2023, 3:59 PM
13 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #74: GPT-4o Mini Me and Llama 3

ZviJul 25, 2024, 1:50 PM
30 points
6 comments36 min readLW link
(thezvi.wordpress.com)

AISN#14: OpenAI’s ‘Superalignment’ team, Musk’s xAI launches, and developments in military AI use

Dan HJul 12, 2023, 4:58 PM
16 points
0 comments1 min readLW link

Alignment Newsletter #36

Rohin ShahDec 12, 2018, 1:10 AM
21 points
0 comments11 min readLW link
(mailchi.mp)

Alignment Newsletter #47

Rohin ShahMar 4, 2019, 4:30 AM
18 points
0 comments8 min readLW link
(mailchi.mp)

AI #20: Code Interpreter and Claude 2.0 for Everyone

ZviJul 13, 2023, 2:00 PM
60 points
9 comments56 min readLW link
(thezvi.wordpress.com)

AISN #29: Progress on the EU AI Act Plus, the NY Times sues OpenAI for Copyright Infringement, and Congressional Questions about Research Standards in AI Safety

Jan 4, 2024, 4:09 PM
8 points
0 comments6 min readLW link
(newsletter.safe.ai)

AISN#15: China and the US take action to regulate AI, results from a tournament forecasting AI risk, updates on xAI’s plan, and Meta releases its open-source and commercially available Llama 2

Jul 19, 2023, 1:01 PM
16 points
0 comments6 min readLW link
(newsletter.safe.ai)

AISN #28: Center for AI Safety 2023 Year in Review

Dan HDec 23, 2023, 9:31 PM
30 points
1 comment5 min readLW link
(newsletter.safe.ai)

AI #95: o1 Joins the API

ZviDec 19, 2024, 3:10 PM
58 points
1 comment41 min readLW link
(thezvi.wordpress.com)

[AN #118]: Risks, solutions, and prioritization in a world with many AI systems

Rohin ShahSep 23, 2020, 6:20 PM
15 points
6 comments10 min readLW link
(mailchi.mp)

Progress links and tweets, 2023-07-20: “A goddess enthroned on a car”

jasoncrawfordJul 20, 2023, 6:28 PM
12 points
4 comments2 min readLW link
(rootsofprogress.org)

AI #22: Into the Weeds

ZviJul 27, 2023, 5:40 PM
49 points
8 comments84 min readLW link
(thezvi.wordpress.com)

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Jul 25, 2023, 4:58 PM
6 points
0 comments6 min readLW link
(newsletter.safe.ai)

AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight

Dan HAug 1, 2023, 3:40 PM
8 points
0 comments8 min readLW link
(newsletter.safe.ai)

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Aug 1, 2023, 3:39 PM
3 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #23: Fundamental Problems with RLHF

ZviAug 3, 2023, 12:50 PM
59 points
9 comments41 min readLW link
(thezvi.wordpress.com)

AI #24: Week of the Podcast

ZviAug 10, 2023, 3:00 PM
49 points
5 comments44 min readLW link
(thezvi.wordpress.com)

AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety

Dan HAug 8, 2023, 3:52 PM
13 points
0 comments1 min readLW link
(newsletter.safe.ai)

AI #75: Math is Easier

ZviAug 1, 2024, 1:40 PM
46 points
25 comments72 min readLW link
(thezvi.wordpress.com)

Progress links digest, 2023-08-09: US adds new nuclear, Katalin Karikó interview, and more

jasoncrawfordAug 9, 2023, 7:22 PM
18 points
0 comments3 min readLW link
(rootsofprogress.org)

AI #76: Six Shorts Stories About OpenAI

ZviAug 8, 2024, 1:50 PM
53 points
10 comments48 min readLW link
(thezvi.wordpress.com)

Startup Roundup #2

ZviAug 6, 2024, 1:30 PM
45 points
0 comments32 min readLW link
(thezvi.wordpress.com)

AI #83: The Mask Comes Off

ZviSep 26, 2024, 12:00 PM
82 points
20 comments36 min readLW link
(thezvi.wordpress.com)

Monthly Roundup #25: December 2024

ZviDec 23, 2024, 2:20 PM
18 points
3 comments26 min readLW link
(thezvi.wordpress.com)

AISN #34: New Military AI Systems Plus, AI Labs Fail to Uphold Voluntary Commitments to UK AI Safety Institute, and New AI Policy Proposals in the US Senate

May 2, 2024, 4:12 PM
6 points
0 comments8 min readLW link
(newsletter.safe.ai)

AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs Plus, Forecasting the Future with LLMs, and Regulatory Markets

Mar 7, 2024, 4:39 PM
8 points
0 comments8 min readLW link
(newsletter.safe.ai)

AI #86: Just Think of the Potential

ZviOct 17, 2024, 3:10 PM
58 points
8 comments57 min readLW link
(thezvi.wordpress.com)

Housing Roundup #10

ZviOct 29, 2024, 1:50 PM
32 points
2 comments32 min readLW link
(thezvi.wordpress.com)

AI #26: Fine Tuning Time

ZviAug 24, 2023, 3:30 PM
49 points
6 comments33 min readLW link
(thezvi.wordpress.com)

AI #87: Staying in Character

ZviOct 29, 2024, 7:10 AM
57 points
3 comments33 min readLW link
(thezvi.wordpress.com)

Occupational Licensing Roundup #1

ZviOct 30, 2024, 11:00 AM
65 points
11 comments11 min readLW link
(thezvi.wordpress.com)

October 2024 Progress in Guaranteed Safe AI

QuinnOct 28, 2024, 11:34 PM
7 points
0 comments1 min readLW link
(gsai.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjoOct 31, 2024, 12:09 AM
3 points
0 comments9 min readLW link
(aisafetyfrontier.substack.com)

AI #88: Thanks for the Memos

ZviOct 31, 2024, 3:00 PM
46 points
5 comments77 min readLW link
(thezvi.wordpress.com)

AI #105: Hey There Alexa

ZviFeb 27, 2025, 2:30 PM
31 points
3 comments40 min readLW link
(thezvi.wordpress.com)

AI #89: Trump Card

ZviNov 7, 2024, 4:30 PM
42 points
12 comments42 min readLW link
(thezvi.wordpress.com)

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data

May 16, 2024, 2:29 PM
2 points
3 comments6 min readLW link
(newsletter.safe.ai)

On Dwarksh’s Podcast with Leopold Aschenbrenner

ZviJun 10, 2024, 12:40 PM
101 points
7 comments59 min readLW link
(thezvi.wordpress.com)

AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering

Dan HOct 4, 2023, 5:37 PM
15 points
2 comments5 min readLW link
(newsletter.safe.ai)

Sentinel minutes #10/2025: Trump tariffs, US/China tensions, Claude code reward hacking.

NunoSempereMar 10, 2025, 7:00 PM
25 points
0 comments10 min readLW link
(blog.sentinel-team.org)

Childhood and Education #8: Dealing with the Internet

ZviJan 6, 2025, 2:00 PM
37 points
7 comments13 min readLW link
(thezvi.wordpress.com)

[AN #129]: Explaining double descent by measuring bias and variance

Rohin ShahDec 16, 2020, 6:10 PM
14 points
1 comment7 min readLW link
(mailchi.mp)

AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI

Oct 18, 2023, 5:06 PM
14 points
0 comments6 min readLW link
(newsletter.safe.ai)

OpenAI #10: Reflections

ZviJan 7, 2025, 5:00 PM
149 points
7 comments11 min readLW link
(thezvi.wordpress.com)

AI #98: World Ends With Six Word Story

ZviJan 9, 2025, 4:30 PM
36 points
2 comments38 min readLW link
(thezvi.wordpress.com)

[AN #145]: Our three year anniversary!

Rohin ShahApr 9, 2021, 5:48 PM
19 points
0 comments8 min readLW link
(mailchi.mp)

Forecasting Newsletter: April 2021

NunoSempereMay 1, 2021, 4:07 PM
9 points
0 comments10 min readLW link

[AN #166]: Is it crazy to claim we’re in the most important century?

Rohin ShahOct 8, 2021, 5:30 PM
52 points
5 comments8 min readLW link
(mailchi.mp)

[AN #167]: Concrete ML safety problems and their relevance to x-risk

Rohin ShahOct 20, 2021, 5:10 PM
21 points
4 comments9 min readLW link
(mailchi.mp)

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin ShahDec 8, 2021, 6:10 PM
21 points
1 comment7 min readLW link
(mailchi.mp)

Forecasting Newsletter: January 2022

NunoSempereFeb 3, 2022, 7:22 PM
17 points
0 comments6 min readLW link

Forecasting Newsletter: February 2022

NunoSempereMar 5, 2022, 7:30 PM
36 points
0 comments9 min readLW link

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

ZviJan 10, 2025, 1:50 PM
44 points
7 comments27 min readLW link
(thezvi.wordpress.com)

[AN #173] Recent language model results from DeepMind

Rohin ShahJul 21, 2022, 2:30 AM
37 points
9 comments8 min readLW link
(mailchi.mp)

EA & LW Forums Weekly Summary (21 Aug − 27 Aug 22′)

Zoe WilliamsAug 30, 2022, 1:42 AM
57 points
4 comments12 min readLW link

AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjoJan 11, 2025, 10:54 PM
7 points
2 comments7 min readLW link
(aisafetyfrontier.substack.com)

EA & LW Forums Weekly Summary (5 − 11 Sep 22′)

Zoe WilliamsSep 12, 2022, 11:24 PM
24 points
0 comments13 min readLW link

Quintin’s alignment papers roundup - week 2

Quintin PopeSep 19, 2022, 1:41 PM
67 points
2 comments10 min readLW link

QAPR 3: interpretability-guided training of neural nets

Quintin PopeSep 28, 2022, 4:02 PM
58 points
2 comments10 min readLW link

AI #99: Farewell to Biden

ZviJan 16, 2025, 2:20 PM
54 points
5 comments58 min readLW link
(thezvi.wordpress.com)

[MLSN #6]: Transparency survey, provable robustness, ML models that predict the future

Dan HOct 12, 2022, 8:56 PM
27 points
0 comments6 min readLW link

EA & LW Forums Weekly Summary (10 − 16 Oct 22′)

Zoe WilliamsOct 17, 2022, 10:51 PM
12 points
4 comments1 min readLW link

Meta Pivots on Content Moderation

ZviJan 17, 2025, 2:20 PM
47 points
3 comments10 min readLW link
(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (17 − 23 Oct 22′)

Zoe WilliamsOct 25, 2022, 2:57 AM
10 points
0 comments1 min readLW link

On DeepSeek’s r1

ZviJan 22, 2025, 7:50 PM
55 points
2 comments35 min readLW link
(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (24 − 30th Oct 22′)

Zoe WilliamsNov 1, 2022, 2:58 AM
13 points
1 comment1 min readLW link

EA & LW Forums Weekly Summary (31st Oct − 6th Nov 22′)

Zoe WilliamsNov 8, 2022, 3:58 AM
12 points
1 comment1 min readLW link

EA & LW Forums Weekly Summary (7th Nov − 13th Nov 22′)

Zoe WilliamsNov 16, 2022, 3:04 AM
19 points
0 comments1 min readLW link

AI #100: Meet the New Boss

ZviJan 23, 2025, 3:40 PM
50 points
4 comments69 min readLW link
(thezvi.wordpress.com)

[Question] What AI newsletters or substacks about AI do you recommend?

wunanNov 25, 2022, 7:29 PM
6 points
1 comment1 min readLW link

EA & LW Forums Weekly Summary (14th Nov − 27th Nov 22′)

Zoe WilliamsNov 29, 2022, 11:00 PM
21 points
1 comment1 min readLW link

AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes

Jan 24, 2024, 7:38 PM
27 points
1 comment6 min readLW link
(newsletter.safe.ai)

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

Dec 9, 2022, 10:38 AM
19 points
0 comments4 min readLW link
(newsletter.apartresearch.com)

EA & LW Forums Weekly Summary (5th Dec − 11th Dec 22′)

Zoe WilliamsDec 13, 2022, 2:53 AM
7 points
0 comments1 min readLW link

Stargate AI-1

ZviJan 24, 2025, 3:20 PM
85 points
1 comment18 min readLW link
(thezvi.wordpress.com)

DeepSeek: Lemon, It’s Wednesday

ZviJan 29, 2025, 3:00 PM
33 points
0 comments33 min readLW link
(thezvi.wordpress.com)

Operator

ZviJan 28, 2025, 8:00 PM
35 points
1 comment11 min readLW link
(thezvi.wordpress.com)

DeepSeek Panic at the App Store

ZviJan 28, 2025, 7:30 PM
51 points
14 comments33 min readLW link
(thezvi.wordpress.com)

AI #101: The Shallow End

ZviJan 30, 2025, 2:50 PM
39 points
1 comment59 min readLW link
(thezvi.wordpress.com)

EA & LW Forum Summaries (9th Jan to 15th Jan 23′)

Zoe WilliamsJan 18, 2023, 7:29 AM
17 points
0 comments1 min readLW link

o3-mini Early Days

ZviFeb 3, 2025, 2:20 PM
45 points
0 comments15 min readLW link
(thezvi.wordpress.com)

EA & LW Forum Weekly Summary (16th − 22nd Jan ’23)

Zoe WilliamsJan 23, 2023, 3:46 AM
13 points
0 comments1 min readLW link

We’re in Deep Research

ZviFeb 4, 2025, 5:20 PM
45 points
2 comments20 min readLW link
(thezvi.wordpress.com)

EA & LW Forum Weekly Summary (23rd − 29th Jan ’23)

Zoe WilliamsJan 31, 2023, 12:36 AM
12 points
0 comments1 min readLW link

Summaries of top forum posts (24th − 30th April 2023)

Zoe WilliamsMay 2, 2023, 2:30 AM
12 points
1 comment1 min readLW link

EA & LW Forum Weekly Summary (30th Jan − 5th Feb 2023)

Zoe WilliamsFeb 7, 2023, 2:13 AM
3 points
3 comments1 min readLW link

AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks

May 2, 2023, 6:41 PM
32 points
0 comments5 min readLW link
(newsletter.safe.ai)

AISN #31: A New AI Policy Bill in California Plus, Precedents for AI Governance and The EU AI Office

Dan HFeb 21, 2024, 9:58 PM
17 points
0 comments6 min readLW link
(newsletter.safe.ai)

EA & LW Forum Weekly Summary (27th Feb − 5th Mar 2023)

Zoe WilliamsMar 6, 2023, 3:18 AM
12 points
0 comments1 min readLW link

AI #102: Made in America

ZviFeb 6, 2025, 2:20 PM
26 points
17 comments67 min readLW link
(thezvi.wordpress.com)

EA & LW Forum Weekly Summary (6th − 12th March 2023)

Zoe WilliamsMar 14, 2023, 3:01 AM
7 points
0 comments1 min readLW link

AI #10: Code Interpreter and Geoff Hinton

ZviMay 4, 2023, 2:00 PM
80 points
7 comments78 min readLW link
(thezvi.wordpress.com)

AI Safety − 7 months of discussion in 17 minutes

Zoe WilliamsMar 15, 2023, 11:41 PM
25 points
0 comments1 min readLW link

Forecasting newsletter #2/2025: Forecasting meetup network

NunoSempereFeb 9, 2025, 6:07 PM
13 points
0 comments4 min readLW link
(forecasting.substack.com)

EA & LW Forum Weekly Summary (13th − 19th March 2023)

Zoe WilliamsMar 20, 2023, 4:18 AM
13 points
0 comments1 min readLW link

AI Safety at the Frontier: Paper Highlights, January ’25

gasteigerjoFeb 11, 2025, 4:14 PM
7 points
0 comments8 min readLW link
(aisafetyfrontier.substack.com)

Summaries of top forum posts (1st to 7th May 2023)

Zoe WilliamsMay 9, 2023, 9:30 AM
21 points
0 comments1 min readLW link

AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models

Dan HMay 9, 2023, 3:26 PM
28 points
1 comment4 min readLW link
(newsletter.safe.ai)

AI #11: In Search of a Moat

ZviMay 11, 2023, 3:40 PM
67 points
28 comments81 min readLW link
(thezvi.wordpress.com)

AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control

Dan HMay 16, 2023, 3:14 PM
31 points
0 comments6 min readLW link
(newsletter.safe.ai)

March gwern.net link roundup

gwernApr 20, 2018, 7:09 PM
10 points
1 comment1 min readLW link
(www.gwern.net)

Announcing Rational Newsletter

Alexey LapitskyApr 1, 2018, 2:37 PM
10 points
9 comments1 min readLW link

Recent updates to gwern.net (2013-2014)

gwernJul 8, 2014, 1:44 AM
38 points
32 comments4 min readLW link

Announcing LessWrong Digest

Evan_GaensbauerFeb 23, 2015, 10:41 AM
35 points
18 comments1 min readLW link

July 2020 gwern.net newsletter

gwernAug 20, 2020, 4:39 PM
29 points
0 comments1 min readLW link
(www.gwern.net)

[AN #113]: Checking the ethical intuitions of large language models

Rohin ShahAug 19, 2020, 5:10 PM
23 points
0 comments9 min readLW link
(mailchi.mp)

Progress Studies Fellowship looking for members

jay ramJul 6, 2023, 5:41 PM
3 points
0 comments1 min readLW link

July 2019 gwern.net newsletter

gwernAug 1, 2019, 4:19 PM
23 points
0 comments1 min readLW link
(www.gwern.net)

June 2020 gwern.net newsletter

gwernJul 2, 2020, 2:19 PM
16 points
0 comments1 min readLW link
(www.gwern.net)

May Gwern.net newsletter (w/GPT-3 commentary)

gwernJun 2, 2020, 3:40 PM
32 points
7 comments1 min readLW link
(www.gwern.net)

April 2020 gwern.net newsletter

gwernMay 1, 2020, 8:47 PM
11 points
0 comments1 min readLW link
(www.gwern.net)

March 2020 gwern.net newsletter

gwernApr 3, 2020, 2:16 AM
13 points
1 comment1 min readLW link
(www.gwern.net)

February 2020 gwern.net newsletter

gwernMar 4, 2020, 7:05 PM
15 points
0 comments1 min readLW link
(www.gwern.net)

January 2020 gwern.net newsletter

gwernJan 31, 2020, 6:04 PM
19 points
0 comments1 min readLW link
(www.gwern.net)

July gwern.net newsletter

gwernAug 2, 2018, 1:42 PM
24 points
0 comments1 min readLW link
(www.gwern.net)

[AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents

Rohin ShahAug 26, 2020, 5:20 PM
21 points
3 comments8 min readLW link
(mailchi.mp)

Bi-Weekly Rational Feed

sapphireJun 24, 2017, 12:07 AM
35 points
3 comments12 min readLW link

September 2019 gwern.net newsletter

gwernOct 4, 2019, 4:44 PM
21 points
0 comments1 min readLW link
(www.gwern.net)

Russian x-risks newsletter Summer 2020

avturchinSep 1, 2020, 2:06 PM
22 points
6 comments1 min readLW link

Forecasting Newsletter: August 2020.

NunoSempereSep 1, 2020, 11:38 AM
16 points
1 comment6 min readLW link

August 2020 gwern.net newsletter

gwernSep 1, 2020, 9:04 PM
25 points
4 comments1 min readLW link
(www.gwern.net)

AI Impacts Quarterly Newsletter, Apr-Jun 2023

Jul 18, 2023, 5:14 PM
6 points
0 comments3 min readLW link
(blog.aiimpacts.org)

Russian x-risks newsletter #2, fall 2019

avturchinDec 3, 2019, 4:54 PM
22 points
0 comments3 min readLW link

[AN #116]: How to make explanations of neurons compositional

Rohin ShahSep 9, 2020, 5:20 PM
21 points
2 comments9 min readLW link
(mailchi.mp)

Manifund: What we’re funding (weeks 2-4)

Austin ChenAug 4, 2023, 4:00 PM
44 points
2 comments1 min readLW link
(manifund.substack.com)

AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge

Dan HAug 15, 2023, 4:10 PM
21 points
0 comments5 min readLW link
(newsletter.safe.ai)

Dec 2019 gwern.net newsletter

gwernJan 4, 2020, 8:48 PM
17 points
2 comments1 min readLW link
(www.gwern.net)

Recent updates to gwern.net (2014-2015)

gwernNov 2, 2015, 12:06 AM
34 points
3 comments3 min readLW link

[AN #123]: Inferring what is valuable in order to align recommender systems

Rohin ShahOct 28, 2020, 5:00 PM
20 points
1 comment8 min readLW link
(mailchi.mp)

September 2020 gwern.net newsletter

gwernOct 26, 2020, 1:38 PM
17 points
1 comment1 min readLW link
(www.gwern.net)

May gwern.net newsletter

gwernJun 1, 2019, 5:25 PM
17 points
0 comments1 min readLW link
(www.gwern.net)

March 2019 gwern.net newsletter

gwernApr 2, 2019, 2:17 PM
19 points
9 comments1 min readLW link
(www.gwern.net)

December gwern.net newsletter

gwernJan 2, 2019, 3:13 PM
20 points
0 comments1 min readLW link
(www.gwern.net)

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

Dan HAug 29, 2023, 3:07 PM
12 points
0 comments8 min readLW link
(newsletter.safe.ai)

Forecasting Newsletter: October 2020.

NunoSempereNov 1, 2020, 1:09 PM
11 points
0 comments4 min readLW link

AI #27: Portents of Gemini

ZviAug 31, 2023, 12:40 PM
54 points
37 comments47 min readLW link
(thezvi.wordpress.com)

January 2019 gwern.net newsletter

gwernFeb 4, 2019, 3:53 PM
15 points
0 comments1 min readLW link
(www.gwern.net)

Bi-weekly Rational Feed

sapphireAug 8, 2017, 1:56 PM
29 points
4 comments13 min readLW link

AISN #21: Google DeepMind’s GPT-4 Competitor, Military Investments in Autonomous Drones, The UK AI Safety Summit, and Case Studies in AI Policy

Dan HSep 5, 2023, 3:03 PM
15 points
0 comments5 min readLW link
(newsletter.safe.ai)

[AN #125]: Neural network scaling laws across multiple modalities

Rohin ShahNov 11, 2020, 6:20 PM
25 points
7 comments9 min readLW link
(mailchi.mp)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

Sep 13, 2023, 6:03 PM
15 points
1 comment5 min readLW link
(newsletter.mlsafety.org)

AISN #22: The Landscape of US AI Legislation - Hearings, Frameworks, Bills, and Laws

Dan HSep 19, 2023, 2:44 PM
20 points
0 comments5 min readLW link
(newsletter.safe.ai)

[AN #127]: Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment

Rohin ShahDec 2, 2020, 6:20 PM
53 points
0 comments13 min readLW link
(mailchi.mp)

November 2020 gwern.net newsletter

gwernDec 3, 2020, 10:47 PM
14 points
5 comments1 min readLW link
(www.gwern.net)

[AN #133]: Building machines that can cooperate (with humans, institutions, or other machines)

Rohin ShahJan 13, 2021, 6:10 PM
14 points
0 comments9 min readLW link
(mailchi.mp)

[AN #136]: How well will GPT-N perform on downstream tasks?

Rohin ShahFeb 3, 2021, 6:10 PM
21 points
2 comments9 min readLW link
(mailchi.mp)

[AN #172] Sorry for the long hiatus!

Rohin ShahJul 5, 2022, 6:20 AM
54 points
0 comments3 min readLW link
(mailchi.mp)

EA & LW Forums Weekly Summary (28 Aug − 3 Sep 22’)

Zoe WilliamsSep 6, 2022, 11:06 AM
51 points
2 comments14 min readLW link

EA & LW Forums Weekly Summary (26 Sep − 9 Oct 22′)

Zoe WilliamsOct 10, 2022, 11:58 PM
13 points
2 comments1 min readLW link

Newsletter for Alignment Research: The ML Safety Updates

Esben KranOct 22, 2022, 4:17 PM
25 points
0 comments1 min readLW link

NeurIPS Safety & ChatGPT. MLAISU W48

Dec 2, 2022, 3:50 PM
3 points
0 comments4 min readLW link
(newsletter.apartresearch.com)

Will Machines Ever Rule the World? MLAISU W50

Esben KranDec 16, 2022, 11:03 AM
12 points
7 comments4 min readLW link
(newsletter.apartresearch.com)

EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)

Zoe WilliamsDec 20, 2022, 9:49 AM
10 points
0 comments1 min readLW link

AI improving AI [MLAISU W01!]

Esben KranJan 6, 2023, 11:13 AM
5 points
0 comments4 min readLW link
(newsletter.apartresearch.com)

[MLSN #7]: an example of an emergent internal optimizer

Jan 9, 2023, 7:39 PM
28 points
0 comments6 min readLW link

Robustness & Evolution [MLAISU W02]

Esben KranJan 13, 2023, 3:47 PM
10 points
0 comments3 min readLW link
(newsletter.apartresearch.com)

Generalizability & Hope for AI [MLAISU W03]

Esben KranJan 20, 2023, 10:06 AM
5 points
2 comments2 min readLW link
(newsletter.apartresearch.com)

November 2018 gwern.net newsletter

gwernDec 1, 2018, 1:57 PM
35 points
0 comments1 min readLW link
(www.gwern.net)

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?

Aug 21, 2024, 6:09 PM
11 points
0 comments6 min readLW link
(newsletter.safe.ai)

MIRI’s April 2024 Newsletter

HarlanApr 12, 2024, 11:38 PM
95 points
0 comments3 min readLW link
(intelligence.org)

AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks

Jun 5, 2024, 5:45 PM
9 points
0 comments5 min readLW link
(newsletter.safe.ai)

Weekly newsletter for AI safety events and training programs

Bryce RobertsonMay 3, 2024, 12:33 AM
29 points
0 comments1 min readLW link

MIRI’s June 2024 Newsletter

HarlanJun 14, 2024, 11:02 PM
74 points
20 comments2 min readLW link
(intelligence.org)

AI Safety Newsletter #37: US Launches Antitrust Investigations Plus, recent criticisms of OpenAI and Anthropic, and a summary of Situational Awareness

Jun 18, 2024, 6:07 PM
8 points
0 comments5 min readLW link
(newsletter.safe.ai)

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

Sep 11, 2024, 7:14 PM
5 points
1 comment5 min readLW link
(newsletter.safe.ai)

AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry

Jul 9, 2024, 7:28 PM
5 points
0 comments5 min readLW link
(newsletter.safe.ai)

MIRI’s July 2024 newsletter

HarlanJul 15, 2024, 9:28 PM
25 points
2 comments1 min readLW link
(intelligence.org)

AISN #45: Center for AI Safety 2024 Year in Review

Dec 19, 2024, 6:15 PM
13 points
0 comments4 min readLW link
(newsletter.safe.ai)

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering

Jul 29, 2024, 5:50 PM
17 points
1 comment6 min readLW link
(newsletter.safe.ai)

AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary

Oct 1, 2024, 8:35 PM
8 points
0 comments6 min readLW link
(newsletter.safe.ai)

Launching Adjacent News

Lucas KohorstOct 16, 2024, 5:58 PM
24 points
0 comments4 min readLW link

AISN #44: The Trump Circle on AI Safety Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems

Nov 19, 2024, 4:36 PM
9 points
0 comments5 min readLW link
(newsletter.safe.ai)

AISN #49: Superintelligence Strategy

Mar 6, 2025, 5:46 PM
6 points
1 comment5 min readLW link
(newsletter.safe.ai)

AISN #46: The Transition

Jan 23, 2025, 6:09 PM
8 points
0 comments5 min readLW link
(newsletter.safe.ai)

AISN #47: Reasoning Models

Feb 6, 2025, 6:52 PM
3 points
0 comments4 min readLW link
(newsletter.safe.ai)

AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media

Apr 18, 2023, 6:44 PM
30 points
0 comments4 min readLW link
(newsletter.safe.ai)

What I’ve been reading, November 2023

jasoncrawfordNov 7, 2023, 1:37 PM
23 points
1 comment5 min readLW link
(rootsofprogress.org)

OpenAI: Facts from a Weekend

ZviNov 20, 2023, 3:30 PM
271 points
165 comments9 min readLW link
(thezvi.wordpress.com)

Forecasting Newsletter: April 2020

NunoSempereApr 30, 2020, 4:41 PM
22 points
3 comments6 min readLW link

Forecasting Newsletter: May 2020.

NunoSempereMay 31, 2020, 12:35 PM
9 points
1 comment20 min readLW link

Null-boxing Newcomb’s Problem

YitzJul 13, 2020, 4:32 PM
33 points
9 comments4 min readLW link

May gwern.net newsletter

gwernJun 1, 2018, 2:47 PM
24 points
3 comments1 min readLW link
(www.gwern.net)

Rationality Feed: Last Month’s Best Posts

sapphireFeb 12, 2018, 1:18 PM
23 points
1 comment3 min readLW link

Alignment Newsletter #13: 07/02/18

Rohin ShahJul 2, 2018, 4:10 PM
70 points
12 comments8 min readLW link
(mailchi.mp)

Alignment Newsletter #16: 07/23/18

Rohin ShahJul 23, 2018, 4:20 PM
42 points
0 comments12 min readLW link
(mailchi.mp)

Alignment Newsletter #15: 07/16/18

Rohin ShahJul 16, 2018, 4:10 PM
42 points
0 comments15 min readLW link
(mailchi.mp)

[AN #58] Mesa optimization: what it is, and why we should care

Rohin ShahJun 24, 2019, 4:10 PM
55 points
10 comments8 min readLW link
(mailchi.mp)

Rationality Feed: Last Month’s Best Posts

sapphireMar 21, 2018, 2:12 PM
20 points
2 comments2 min readLW link

[AN #59] How arguments for AI risk have changed over time

Rohin ShahJul 8, 2019, 5:20 PM
43 points
4 comments7 min readLW link
(mailchi.mp)

The Alignment Newsletter #1: 04/09/18

Rohin ShahApr 9, 2018, 4:00 PM
12 points
3 comments4 min readLW link

The Alignment Newsletter #2: 04/16/18

Rohin ShahApr 16, 2018, 4:00 PM
8 points
0 comments5 min readLW link

The Alignment Newsletter #3: 04/23/18

Rohin ShahApr 23, 2018, 4:00 PM
9 points
0 comments6 min readLW link

The Alignment Newsletter #4: 04/30/18

Rohin ShahApr 30, 2018, 4:00 PM
8 points
0 comments3 min readLW link

The Alignment Newsletter #5: 05/07/18

Rohin ShahMay 7, 2018, 4:00 PM
8 points
0 comments7 min readLW link

The Alignment Newsletter #6: 05/14/18

Rohin ShahMay 14, 2018, 4:00 PM
8 points
0 comments2 min readLW link

The Alignment Newsletter #7: 05/21/18

Rohin ShahMay 21, 2018, 4:00 PM
8 points
0 comments5 min readLW link

The Alignment Newsletter #8: 05/28/18

Rohin ShahMay 28, 2018, 4:00 PM
8 points
0 comments6 min readLW link

The Alignment Newsletter #9: 06/04/18

Rohin ShahJun 4, 2018, 4:00 PM
8 points
0 comments2 min readLW link

The Alignment Newsletter #10: 06/11/18

Rohin ShahJun 11, 2018, 4:00 PM
16 points
0 comments9 min readLW link

The Alignment Newsletter #11: 06/18/18

Rohin ShahJun 18, 2018, 4:00 PM
8 points
0 comments10 min readLW link

The Alignment Newsletter #12: 06/25/18

Rohin ShahJun 25, 2018, 4:00 PM
15 points
0 comments3 min readLW link

Alignment Newsletter #14

Rohin ShahJul 9, 2018, 4:20 PM
14 points
0 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #17

Rohin ShahJul 30, 2018, 4:10 PM
32 points
0 comments13 min readLW link
(mailchi.mp)

Alignment Newsletter #18

Rohin ShahAug 6, 2018, 4:00 PM
17 points
0 comments10 min readLW link
(mailchi.mp)

Alignment Newsletter #19

Rohin ShahAug 14, 2018, 2:10 AM
18 points
0 comments13 min readLW link
(mailchi.mp)

Alignment Newsletter #20

Rohin ShahAug 20, 2018, 4:00 PM
12 points
2 comments6 min readLW link
(mailchi.mp)

Alignment Newsletter #21

Rohin ShahAug 27, 2018, 4:20 PM
25 points
0 comments7 min readLW link
(mailchi.mp)

Alignment Newsletter #22

Rohin ShahSep 3, 2018, 4:10 PM
18 points
0 comments6 min readLW link
(mailchi.mp)

Alignment Newsletter #23

Rohin ShahSep 10, 2018, 5:10 PM
16 points
0 comments7 min readLW link
(mailchi.mp)

Alignment Newsletter #24

Rohin ShahSep 17, 2018, 4:20 PM
10 points
6 comments12 min readLW link
(mailchi.mp)

Alignment Newsletter #25

Rohin ShahSep 24, 2018, 4:10 PM
18 points
3 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #26

Rohin ShahOct 2, 2018, 4:10 PM
13 points
0 comments7 min readLW link
(mailchi.mp)

Alignment Newsletter #27

Rohin ShahOct 9, 2018, 1:10 AM
16 points
0 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #28

Rohin ShahOct 15, 2018, 9:20 PM
11 points
0 comments8 min readLW link
(mailchi.mp)

Alignment Newsletter #29

Rohin ShahOct 22, 2018, 4:20 PM
15 points
0 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #30

Rohin ShahOct 29, 2018, 4:10 PM
29 points
2 comments6 min readLW link
(mailchi.mp)

Alignment Newsletter #31

Rohin ShahNov 5, 2018, 11:50 PM
17 points
0 comments12 min readLW link
(mailchi.mp)

Alignment Newsletter #32

Rohin ShahNov 12, 2018, 5:20 PM
18 points
0 comments12 min readLW link
(mailchi.mp)

Alignment Newsletter #33

Rohin ShahNov 19, 2018, 5:20 PM
23 points
0 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #34

Rohin ShahNov 26, 2018, 11:10 PM
24 points
0 comments10 min readLW link
(mailchi.mp)

Alignment Newsletter #35

Rohin ShahDec 4, 2018, 1:10 AM
15 points
0 comments6 min readLW link
(mailchi.mp)

Alignment Newsletter #37

Rohin ShahDec 17, 2018, 7:10 PM
25 points
4 comments10 min readLW link
(mailchi.mp)

Alignment Newsletter #38

Rohin ShahDec 25, 2018, 4:10 PM
9 points
0 comments8 min readLW link
(mailchi.mp)

Alignment Newsletter #39

Rohin ShahJan 1, 2019, 8:10 AM
32 points
2 comments5 min readLW link
(mailchi.mp)

Alignment Newsletter #40

Rohin ShahJan 8, 2019, 8:10 PM
21 points
2 comments5 min readLW link
(mailchi.mp)

Alignment Newsletter #41

Rohin ShahJan 17, 2019, 8:10 AM
22 points
6 comments10 min readLW link
(mailchi.mp)

Alignment Newsletter #42

Rohin ShahJan 22, 2019, 2:00 AM
20 points
1 comment10 min readLW link
(mailchi.mp)

Alignment Newsletter #43

Rohin ShahJan 29, 2019, 9:10 PM
14 points
2 comments13 min readLW link
(mailchi.mp)

Alignment Newsletter #44

Rohin ShahFeb 6, 2019, 8:30 AM
18 points
0 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #45

Rohin ShahFeb 14, 2019, 2:10 AM
25 points
2 comments8 min readLW link
(mailchi.mp)

Alignment Newsletter #46

Rohin ShahFeb 22, 2019, 12:10 AM
12 points
0 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #48

Rohin ShahMar 11, 2019, 9:10 PM
29 points
14 comments9 min readLW link
(mailchi.mp)

Alignment Newsletter #49

Rohin ShahMar 20, 2019, 4:20 AM
23 points
1 comment11 min readLW link
(mailchi.mp)

Alignment Newsletter #50

Rohin ShahMar 28, 2019, 6:10 PM
15 points
2 comments10 min readLW link
(mailchi.mp)

Alignment Newsletter #51

Rohin ShahApr 3, 2019, 4:10 AM
25 points
2 comments15 min readLW link
(mailchi.mp)

Alignment Newsletter #52

Rohin ShahApr 6, 2019, 1:20 AM
19 points
1 comment8 min readLW link
(mailchi.mp)

Alignment Newsletter One Year Retrospective

Rohin ShahApr 10, 2019, 6:58 AM
94 points
31 comments21 min readLW link

Alignment Newsletter #53

Rohin ShahApr 18, 2019, 5:20 PM
20 points
0 comments8 min readLW link
(mailchi.mp)

[AN #54] Boxing a finite-horizon AI system to keep it unambitious

Rohin ShahApr 28, 2019, 5:20 AM
20 points
0 comments8 min readLW link
(mailchi.mp)

[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI

Rohin ShahMay 5, 2019, 2:20 AM
17 points
2 comments8 min readLW link
(mailchi.mp)

[AN #56] Should ML researchers stop running experiments before making hypotheses?

Rohin ShahMay 21, 2019, 2:20 AM
21 points
8 comments9 min readLW link
(mailchi.mp)

[AN #57] Why we should focus on robustness in AI safety, and the analogous problems in programming

Rohin ShahJun 5, 2019, 11:20 PM
26 points
15 comments7 min readLW link
(mailchi.mp)

[AN #60] A new AI challenge: Minecraft agents that assist human players in creative mode

Rohin ShahJul 22, 2019, 5:00 PM
23 points
6 comments9 min readLW link
(mailchi.mp)

[AN #61] AI policy and governance, from two people in the field

Rohin ShahAug 5, 2019, 5:00 PM
12 points
2 comments9 min readLW link
(mailchi.mp)

[AN #62] Are adversarial examples caused by real but imperceptible features?

Rohin ShahAug 22, 2019, 5:10 PM
28 points
10 comments9 min readLW link
(mailchi.mp)

[AN #63] How architecture search, meta learning, and environment design could lead to general intelligence

Rohin ShahSep 10, 2019, 7:10 PM
21 points
12 comments8 min readLW link
(mailchi.mp)

[AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning

Rohin ShahSep 16, 2019, 5:10 PM
11 points
8 comments7 min readLW link
(mailchi.mp)

[AN #65]: Learning useful skills by watching humans “play”

Rohin ShahSep 23, 2019, 5:30 PM
11 points
0 comments9 min readLW link
(mailchi.mp)

[AN #66]: Decomposing robustness into capability robustness and alignment robustness

Rohin ShahSep 30, 2019, 6:00 PM
12 points
1 comment7 min readLW link
(mailchi.mp)

[AN #67]: Creating environments in which to study inner alignment failures

Rohin ShahOct 7, 2019, 5:10 PM
17 points
0 comments8 min readLW link
(mailchi.mp)

[AN #68]: The attainable utility theory of impact

Rohin ShahOct 14, 2019, 5:00 PM
17 points
0 comments8 min readLW link
(mailchi.mp)

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

Rohin ShahOct 19, 2019, 12:30 AM
60 points
12 comments15 min readLW link
(mailchi.mp)

[AN #70]: Agents that help humans who are still learning about their own preferences

Rohin ShahOct 23, 2019, 5:10 PM
16 points
0 comments9 min readLW link
(mailchi.mp)

[AN #71]: Avoiding reward tampering through current-RF optimization

Rohin ShahOct 30, 2019, 5:10 PM
12 points
0 comments7 min readLW link
(mailchi.mp)

[AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety

Rohin ShahNov 6, 2019, 6:10 PM
26 points
4 comments10 min readLW link
(mailchi.mp)

[AN #73]: Detecting catastrophic failures by learning how agents tend to break

Rohin ShahNov 13, 2019, 6:10 PM
11 points
0 comments7 min readLW link
(mailchi.mp)

[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts

Rohin ShahNov 20, 2019, 6:20 PM
19 points
0 comments7 min readLW link
(mailchi.mp)

[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee

Rohin ShahNov 27, 2019, 6:10 PM
38 points
1 comment10 min readLW link
(mailchi.mp)

[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations

Rohin ShahDec 4, 2019, 6:10 PM
14 points
6 comments9 min readLW link
(mailchi.mp)

[AN #77]: Double descent: a unification of statistical theory and modern ML practice

Rohin ShahDec 18, 2019, 6:30 PM
21 points
4 comments14 min readLW link
(mailchi.mp)

[AN #78] Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison

Rohin ShahDec 26, 2019, 1:10 AM
26 points
10 comments9 min readLW link
(mailchi.mp)

[AN #79]: Recursive reward modeling as an alignment technique integrated with deep RL

Rohin ShahJan 1, 2020, 6:00 PM
13 points
0 comments12 min readLW link
(mailchi.mp)

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

Rohin ShahJan 2, 2020, 6:20 PM
36 points
95 comments10 min readLW link
(mailchi.mp)

[AN #81]: Universality as a potential solution to conceptual difficulties in intent alignment

Rohin ShahJan 8, 2020, 6:00 PM
32 points
4 comments11 min readLW link
(mailchi.mp)

[AN #82]: How OpenAI Five distributed their training computation

Rohin ShahJan 15, 2020, 6:20 PM
19 points
0 comments8 min readLW link
(mailchi.mp)

[AN #83]: Sample-efficient deep learning with ReMixMatch

Rohin ShahJan 22, 2020, 6:10 PM
15 points
4 comments11 min readLW link
(mailchi.mp)

[AN #84] Reviewing AI alignment work in 2018-19

Rohin ShahJan 29, 2020, 6:30 PM
23 points
0 comments6 min readLW link
(mailchi.mp)

[AN #85]: The normative questions we should be asking for AI alignment, and a surprisingly good chatbot

Rohin ShahFeb 5, 2020, 6:20 PM
14 points
2 comments7 min readLW link
(mailchi.mp)

[AN #86]: Improving debate and factored cognition through human experiments

Rohin ShahFeb 12, 2020, 6:10 PM
15 points
0 comments9 min readLW link
(mailchi.mp)

[AN #87]: What might happen as deep learning scales even further?

Rohin ShahFeb 19, 2020, 6:20 PM
28 points
0 comments4 min readLW link
(mailchi.mp)

[AN #88]: How the principal-agent literature relates to AI risk

Rohin ShahFeb 27, 2020, 9:10 AM
18 points
0 comments9 min readLW link
(mailchi.mp)

[AN #89]: A unifying formalism for preference learning algorithms

Rohin ShahMar 4, 2020, 6:20 PM
16 points
0 comments9 min readLW link
(mailchi.mp)

[AN #90]: How search landscapes can contain self-reinforcing feedback loops

Rohin ShahMar 11, 2020, 5:30 PM
11 points
6 comments8 min readLW link
(mailchi.mp)

[AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement

Rohin ShahMar 18, 2020, 5:10 PM
15 points
10 comments13 min readLW link
(mailchi.mp)

[AN #92]: Learning good representations with contrastive predictive coding

Rohin ShahMar 25, 2020, 5:20 PM
18 points
1 comment10 min readLW link
(mailchi.mp)

[AN #93]: The Precipice we’re standing at, and how we can back away from it

Rohin ShahApr 1, 2020, 5:10 PM
24 points
0 comments7 min readLW link
(mailchi.mp)

[AN #94]: AI alignment as translation between humans and machines

Rohin ShahApr 8, 2020, 5:10 PM
11 points
0 comments7 min readLW link
(mailchi.mp)

[AN #95]: A framework for thinking about how to make AI go well

Rohin ShahApr 15, 2020, 5:10 PM
20 points
2 comments10 min readLW link
(mailchi.mp)

[AN #96]: Buck and I discuss/argue about AI Alignment

Rohin ShahApr 22, 2020, 5:20 PM
17 points
4 comments10 min readLW link
(mailchi.mp)

[AN #97]: Are there historical examples of large, robust discontinuities?

Rohin ShahApr 29, 2020, 5:30 PM
15 points
0 comments10 min readLW link
(mailchi.mp)

[AN #98]: Understanding neural net training by seeing which gradients were helpful

Rohin ShahMay 6, 2020, 5:10 PM
22 points
3 comments9 min readLW link
(mailchi.mp)

[AN #99]: Doubling times for the efficiency of AI algorithms

Rohin ShahMay 13, 2020, 5:20 PM
29 points
0 comments10 min readLW link
(mailchi.mp)

[AN #100]: What might go wrong if you learn a reward function while acting

Rohin ShahMay 20, 2020, 5:30 PM
33 points
2 comments12 min readLW link
(mailchi.mp)

[AN #101]: Why we should rigorously measure and forecast AI progress

Rohin ShahMay 27, 2020, 5:20 PM
15 points
0 comments10 min readLW link
(mailchi.mp)

[AN #103]: ARCHES: an agenda for existential safety, and combining natural language with deep RL

Rohin ShahJun 10, 2020, 5:20 PM
29 points
0 comments10 min readLW link
(mailchi.mp)

[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID

Rohin ShahJun 18, 2020, 5:10 PM
19 points
5 comments8 min readLW link
(mailchi.mp)

[AN #105]: The economic trajectory of humanity, and what we might mean by optimization

Rohin ShahJun 24, 2020, 5:30 PM
24 points
3 comments11 min readLW link
(mailchi.mp)

[AN #106]: Evaluating generalization ability of learned reward models

Rohin ShahJul 1, 2020, 5:20 PM
14 points
2 comments11 min readLW link
(mailchi.mp)

[AN #107]: The convergent instrumental subgoals of goal-directed agents

Rohin ShahJul 16, 2020, 6:47 AM
13 points
1 comment8 min readLW link
(mailchi.mp)

[AN #108]: Why we should scrutinize arguments for AI risk

Rohin ShahJul 16, 2020, 6:47 AM
19 points
6 comments12 min readLW link
(mailchi.mp)

[AN #109]: Teaching neural nets to generalize the way humans would

Rohin ShahJul 22, 2020, 5:10 PM
17 points
3 comments9 min readLW link
(mailchi.mp)

[AN #110]: Learning features from human feedback to enable reward learning

Rohin ShahJul 29, 2020, 5:20 PM
13 points
2 comments10 min readLW link
(mailchi.mp)

Regulate or Compete? The China Factor in U.S. AI Policy (NAIR #2)

charles_mMay 5, 2023, 5:43 PM
2 points
1 comment7 min readLW link
(navigatingairisks.substack.com)

Russian x-risks newsletter, summer 2019

avturchinSep 7, 2019, 9:50 AM
39 points
5 comments4 min readLW link

Rational Feed: Last Month’s Best Posts

sapphireMay 2, 2018, 6:19 PM
16 points
0 comments2 min readLW link

Forecasting Newsletter: July 2020.

NunoSempereAug 1, 2020, 5:08 PM
21 points
4 comments22 min readLW link

June gwern.net newsletter

gwernJul 4, 2018, 10:59 PM
34 points
0 comments1 min readLW link
(www.gwern.net)

[AN #111]: The Circuits hypotheses for deep learning

Rohin ShahAug 5, 2020, 5:40 PM
23 points
0 comments9 min readLW link
(mailchi.mp)

Call for contributors to the Alignment Newsletter

Rohin ShahAug 21, 2019, 6:21 PM
39 points
0 comments4 min readLW link

June 2019 gwern.net newsletter

gwernJul 1, 2019, 2:35 PM
29 points
0 comments1 min readLW link
(www.gwern.net)

Recent updates to gwern.net (2015-2016)

gwernAug 26, 2016, 7:22 PM
42 points
6 comments1 min readLW link

Recent updates to gwern.net (2011)

gwernNov 26, 2011, 1:58 AM
45 points
18 comments1 min readLW link

October gwern.net links

gwernNov 1, 2018, 1:11 AM
29 points
8 comments1 min readLW link
(www.gwern.net)