Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
101 points
9 comments · 9 min read · LW link · 2 reviews
(arxiv.org)

Resources that (I think) new alignment researchers should know about

Akash · 28 Oct 2022 22:13 UTC
69 points
9 comments · 4 min read · LW link

How often does One Person succeed?

Mayank Modi · 28 Oct 2022 19:32 UTC
1 point
3 comments · 1 min read · LW link

aisafety.community—A living document of AI safety communities

28 Oct 2022 17:50 UTC
58 points
23 comments · 1 min read · LW link

Rapid Test Throat Swabbing?

jefftk · 28 Oct 2022 16:30 UTC
18 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Join the interpretability research hackathon

Esben Kran · 28 Oct 2022 16:26 UTC
15 points
0 comments · 1 min read · LW link

Syncretism

Annapurna · 28 Oct 2022 16:08 UTC
16 points
4 comments · 1 min read · LW link
(jorgevelez.substack.com)

Pondering computation in the real world

Adam Shai · 28 Oct 2022 15:57 UTC
24 points
13 comments · 5 min read · LW link

Ukraine and the Crimea Question

ChristianKl · 28 Oct 2022 12:26 UTC
−2 points
153 comments · 11 min read · LW link

New book on s-risks

Tobias_Baumann · 28 Oct 2022 9:36 UTC
68 points
1 comment · 1 min read · LW link

Cryptic symbols

Adam Scherlis · 28 Oct 2022 6:44 UTC
6 points
17 comments · 1 min read · LW link
(adam.scherlis.com)

All life’s helpers’ beliefs

Tehdastehdas · 28 Oct 2022 5:47 UTC
−12 points
1 comment · 5 min read · LW link

Prizes for ML Safety Benchmark Ideas

joshc · 28 Oct 2022 2:51 UTC
36 points
4 comments · 1 min read · LW link

Worldview iPeople—Future Fund’s AI Worldview Prize

Toni MUENDEL · 28 Oct 2022 1:53 UTC
−22 points
4 comments · 9 min read · LW link

Anatomy of change

Jose Miguel Cruz y Celis · 28 Oct 2022 1:21 UTC
1 point
0 comments · 1 min read · LW link

Nash equilibria of symmetric zero-sum games

Ege Erdil · 27 Oct 2022 23:50 UTC
14 points
0 comments · 14 min read · LW link

[Question] Good psychology books/books that contain good psychological models?

shuffled-cantaloupe · 27 Oct 2022 23:04 UTC
1 point
1 comment · 1 min read · LW link

Podcast: The Left and Effective Altruism with Habiba Islam

garrison · 27 Oct 2022 17:41 UTC
2 points
2 comments · 1 min read · LW link

Lessons from ‘Famine, Affluence, and Morality’ and its reflection on today.

Mayank Modi · 27 Oct 2022 17:20 UTC
4 points
0 comments · 1 min read · LW link

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89 · 27 Oct 2022 14:41 UTC
12 points
20 comments · 1 min read · LW link

Historicism in the math-adjacent sciences

mrcbarbier · 27 Oct 2022 14:38 UTC
3 points
0 comments · 5 min read · LW link

How Risky Is Trick-or-Treating?

jefftk · 27 Oct 2022 14:10 UTC
58 points
18 comments · 2 min read · LW link
(www.jefftk.com)

Covid 10/27/22: Another Origin Story

Zvi · 27 Oct 2022 13:40 UTC
32 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

[Question] Why are probabilities represented as real numbers instead of rational numbers?

Yaakov T · 27 Oct 2022 11:23 UTC
5 points
9 comments · 1 min read · LW link

Five Areas I Wish EAs Gave More Focus

Prometheus · 27 Oct 2022 6:13 UTC
13 points
18 comments · 1 min read · LW link

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

27 Oct 2022 1:32 UTC
135 points
14 comments · 12 min read · LW link

[Question] Quantum Suicide and Aumann’s Agreement Theorem

Isaac King · 27 Oct 2022 1:32 UTC
14 points
20 comments · 1 min read · LW link

Reslab Request for Information: EA hardware projects

Joel Becker · 26 Oct 2022 21:13 UTC
10 points
0 comments · 1 min read · LW link

A list of Petrov buttons

philh · 26 Oct 2022 20:50 UTC
19 points
8 comments · 5 min read · LW link
(reasonableapproximation.net)

The Game of Antonyms

Faustify · 26 Oct 2022 19:26 UTC
4 points
6 comments · 8 min read · LW link

Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

LawrenceC · 26 Oct 2022 18:45 UTC
29 points
5 comments · 1 min read · LW link
(arxiv.org)

[Question] How to become more articulate?

just_browsing · 26 Oct 2022 14:43 UTC
17 points
14 comments · 1 min read · LW link

Open Bands: Leading Rhythm

jefftk · 26 Oct 2022 14:30 UTC
10 points
0 comments · 4 min read · LW link
(www.jefftk.com)

Signals of war in August 2021

yieldthought · 26 Oct 2022 8:11 UTC
70 points
16 comments · 2 min read · LW link

Trigger-based rapid checklists

VipulNaik · 26 Oct 2022 4:05 UTC
44 points
0 comments · 9 min read · LW link

Why some people believe in AGI, but I don’t.

cveres · 26 Oct 2022 3:09 UTC
−15 points
6 comments · 1 min read · LW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay · 26 Oct 2022 1:24 UTC
1 point
10 comments · 3 min read · LW link

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

25 Oct 2022 20:48 UTC
14 points
2 comments · 4 min read · LW link

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda · 25 Oct 2022 20:24 UTC
52 points
7 comments · 1 min read · LW link
(www.youtube.com)

Nothing.

rogersbacon · 25 Oct 2022 16:33 UTC
−10 points
4 comments · 6 min read · LW link
(www.secretorum.life)

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann · 25 Oct 2022 16:29 UTC
21 points
1 comment · 5 min read · LW link

Consider Applying to the Future Fellowship at MIT

jefftk · 25 Oct 2022 15:40 UTC
29 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Beyond Kolmogorov and Shannon

25 Oct 2022 15:13 UTC
63 points
20 comments · 5 min read · LW link

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes · 25 Oct 2022 14:47 UTC
199 points
47 comments · 30 min read · LW link · 1 review

Refine: what helped me write more?

Alexander Gietelink Oldenziel · 25 Oct 2022 14:44 UTC
12 points
0 comments · 2 min read · LW link

Logical Decision Theories: Our final failsafe?

Noosphere89 · 25 Oct 2022 12:51 UTC
−7 points
8 comments · 1 min read · LW link
(www.lesswrong.com)

What will the scaled up GATO look like? (Updated with questions)

Amal · 25 Oct 2022 12:44 UTC
34 points
22 comments · 1 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson · 25 Oct 2022 3:54 UTC
15 points
3 comments · 1 min read · LW link

Furry Rationalists & Effective Anthropomorphism both exist

agentydragon · 25 Oct 2022 3:37 UTC
42 points
3 comments · 1 min read · LW link

EA & LW Forums Weekly Summary (17 − 23 Oct 22′)

Zoe Williams · 25 Oct 2022 2:57 UTC
10 points
0 comments · 1 min read · LW link