Scott Aaronson on “Reform AI Alignment”

Shmi · Nov 20, 2022, 10:20 PM
39 points
17 comments · 1 min read · LW link
(scottaaronson.blog)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Nov 25, 2022, 2:36 PM
39 points
9 comments · 6 min read · LW link
(vkrakovna.wordpress.com)

A caveat to the Orthogonality Thesis

Wuschel Schulz · Nov 9, 2022, 3:06 PM
38 points
10 comments · 2 min read · LW link

How do I start a programming career in the West?

Lao Mein · Nov 25, 2022, 6:37 AM
38 points
7 comments · 2 min read · LW link

[Question] Is there any discussion on avoiding being Dutch-booked or otherwise taken advantage of one’s bounded rationality by refusing to engage?

Shmi · Nov 7, 2022, 2:36 AM
38 points
29 comments · 1 min read · LW link

Choosing the right dish

Adam Zerner · Nov 19, 2022, 1:38 AM
38 points
7 comments · 8 min read · LW link

Internal communication framework

Nov 15, 2022, 12:41 PM
38 points
13 comments · 12 min read · LW link

If Professional Investors Missed This...

jefftk · Nov 16, 2022, 3:00 PM
37 points
18 comments · 3 min read · LW link
(www.jefftk.com)

Discussing how to align Transformative AI if it’s developed very soon

elifland · Nov 28, 2022, 4:17 PM
37 points
2 comments · 28 min read · LW link

Simulators, constraints, and goal agnosticism: porbynotes vol. 1

porby · Nov 23, 2022, 4:22 AM
37 points
2 comments · 35 min read · LW link

Feeling Old: Leaving your 20s in the 2020s

squidious · Nov 22, 2022, 10:50 PM
37 points
3 comments · 1 min read · LW link
(opalsandbonobos.blogspot.com)

Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas

Orpheus16 · Nov 25, 2022, 8:47 PM
37 points
2 comments · 9 min read · LW link

Housing and Transit Thoughts #1

Zvi · Nov 2, 2022, 12:10 PM
35 points
5 comments · 16 min read · LW link
(thezvi.wordpress.com)

User-Controlled Algorithmic Feeds

jefftk · Nov 12, 2022, 3:20 PM
35 points
7 comments · 2 min read · LW link
(www.jefftk.com)

Some research ideas in forecasting

Jsevillamol · Nov 15, 2022, 7:47 PM
35 points
2 comments · LW link

[Hebbian Natural Abstractions] Introduction

Nov 21, 2022, 8:34 PM
34 points
3 comments · 4 min read · LW link
(www.snellessen.com)

Value Formation: An Overarching Model

Thane Ruthenis · Nov 15, 2022, 5:16 PM
34 points
20 comments · 34 min read · LW link

Solstice 2022 Roundup

dspeyer · Nov 12, 2022, 9:26 PM
34 points
12 comments · 1 min read · LW link

Ways to buy time

Nov 12, 2022, 7:31 PM
34 points
23 comments · 12 min read · LW link

People care about each other even though they have imperfect motivational pointers?

TurnTrout · Nov 8, 2022, 6:15 PM
33 points
25 comments · 7 min read · LW link

Auditing games for high-level interpretability

Paul Colognese · Nov 1, 2022, 10:44 AM
33 points
1 comment · 7 min read · LW link

Thinking About Mastodon

jefftk · Nov 7, 2022, 7:40 PM
33 points
17 comments · 1 min read · LW link
(www.jefftk.com)

Covid 11/17/22: Slow Recovery

Zvi · Nov 17, 2022, 2:50 PM
33 points
3 comments · 4 min read · LW link
(thezvi.wordpress.com)

Weekly Roundup #5

Zvi · Nov 11, 2022, 4:20 PM
33 points
0 comments · 6 min read · LW link
(thezvi.wordpress.com)

Why bet Kelly?

AlexMennen · Nov 15, 2022, 6:12 PM
32 points
14 comments · 5 min read · LW link

When should we be surprised that an invention took “so long”?

jasoncrawford · Nov 16, 2022, 8:04 PM
32 points
11 comments · 4 min read · LW link
(rootsofprogress.org)

Charging for the Dharma

jchan · Nov 11, 2022, 2:02 PM
32 points
18 comments · 5 min read · LW link

Review: LOVE in a simbox

PeterMcCluskey · Nov 27, 2022, 5:41 PM
32 points
4 comments · 9 min read · LW link
(bayesianinvestor.com)

Make the Drought Evaporate!

AnthonyRepetto · Nov 19, 2022, 11:41 PM
32 points
25 comments · 3 min read · LW link

Unpacking “Shard Theory” as Hunch, Question, Theory, and Insight

Jacy Reese Anthis · Nov 16, 2022, 1:54 PM
31 points
9 comments · 2 min read · LW link

Adversarial Policies Beat Professional-Level Go AIs

sanxiyn · Nov 3, 2022, 1:27 PM
31 points
35 comments · 1 min read · LW link
(goattack.alignmentfund.org)

Covid 11/10/22: Into the Background

Zvi · Nov 10, 2022, 1:40 PM
31 points
5 comments · 4 min read · LW link
(thezvi.wordpress.com)

The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter

mako yass · Nov 3, 2022, 6:47 AM
30 points
13 comments · 10 min read · LW link

A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)

Neel Nanda · Nov 7, 2022, 10:39 PM
30 points
15 comments · 3 min read · LW link
(youtu.be)

Gliders in Language Models

Alexandre Variengien · Nov 25, 2022, 12:38 AM
30 points
11 comments · 10 min read · LW link

What videos should Rational Animations make?

Writer · Nov 26, 2022, 8:28 PM
30 points
24 comments · LW link

ML Safety Scholars Summer 2022 Retrospective

TW123 · Nov 1, 2022, 3:09 AM
29 points
0 comments · LW link

You won’t solve alignment without agent foundations

Mikhail Samin · Nov 6, 2022, 8:07 AM
29 points
3 comments · 8 min read · LW link

Response

Jarred Filmer · Nov 6, 2022, 1:03 AM
29 points
2 comments · 12 min read · LW link

The economy as an analogy for advanced AI systems

Nov 15, 2022, 11:16 AM
28 points
0 comments · 5 min read · LW link

Toy Models and Tegum Products

Adam Jermyn · Nov 4, 2022, 6:51 PM
28 points
7 comments · 5 min read · LW link

Good Futures Initiative: Winter Project Internship

Aris · Nov 27, 2022, 11:41 PM
28 points
4 comments · 4 min read · LW link

Mechanistic Interpretability as Reverse Engineering (follow-up to “cars and elephants”)

David Scott Krueger (formerly: capybaralet) · Nov 3, 2022, 11:19 PM
28 points
3 comments · 1 min read · LW link

A short critique of Vanessa Kosoy’s PreDCA

Martín Soto · Nov 13, 2022, 4:00 PM
28 points
8 comments · 4 min read · LW link

Semi-conductor/AI Stock Discussion.

sapphire · Nov 25, 2022, 11:35 PM
28 points
25 comments · 1 min read · LW link

Estimating the probability that FTX Future Fund grant money gets clawed back

spencerg · Nov 14, 2022, 3:33 AM
28 points
6 comments · LW link

Open Letter Against Reckless Nuclear Escalation and Use

Max Tegmark · Nov 3, 2022, 5:34 AM
27 points
25 comments · 1 min read · LW link

Inverse scaling can become U-shaped

Edouard Harris · Nov 8, 2022, 7:04 PM
27 points
15 comments · 1 min read · LW link
(arxiv.org)

The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)

Jessica Rumbelow · Nov 17, 2022, 11:06 AM
27 points
2 comments · 2 min read · LW link

LLMs may capture key components of human agency

catubc · Nov 17, 2022, 8:14 PM
27 points
0 comments · 4 min read · LW link