Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

Oct 28, 2022, 11:55 PM
101 points
9 comments, 9 min read, 2 reviews
(arxiv.org)

Resources that (I think) new alignment researchers should know about

Akash Oct 28, 2022, 10:13 PM
69 points
9 comments, 4 min read

How often does One Person succeed?

Mayank Modi Oct 28, 2022, 7:32 PM
1 point
3 comments, 1 min read

aisafety.community - A living document of AI safety communities

Oct 28, 2022, 5:50 PM
58 points
23 comments, 1 min read

Rapid Test Throat Swabbing?

jefftk Oct 28, 2022, 4:30 PM
18 points
2 comments, 1 min read
(www.jefftk.com)

Join the interpretability research hackathon

Esben Kran Oct 28, 2022, 4:26 PM
15 points
0 comments, 1 min read

Syncretism

Annapurna Oct 28, 2022, 4:08 PM
16 points
4 comments, 1 min read
(jorgevelez.substack.com)

Pondering computation in the real world

Adam Shai Oct 28, 2022, 3:57 PM
24 points
13 comments, 5 min read

Ukraine and the Crimea Question

ChristianKl Oct 28, 2022, 12:26 PM
−2 points
153 comments, 11 min read

New book on s-risks

Tobias_Baumann Oct 28, 2022, 9:36 AM
68 points
1 comment, 1 min read

Cryptic symbols

Adam Scherlis Oct 28, 2022, 6:44 AM
6 points
17 comments, 1 min read
(adam.scherlis.com)

All life’s helpers’ beliefs

Tehdastehdas Oct 28, 2022, 5:47 AM
−12 points
1 comment, 5 min read

Prizes for ML Safety Benchmark Ideas

joshc Oct 28, 2022, 2:51 AM
36 points
4 comments, 1 min read

Worldview iPeople - Future Fund’s AI Worldview Prize

Toni MUENDEL Oct 28, 2022, 1:53 AM
−22 points
4 comments, 9 min read

Anatomy of change

Jose Miguel Cruz y Celis Oct 28, 2022, 1:21 AM
1 point
0 comments, 1 min read

Nash equilibria of symmetric zero-sum games

Ege Erdil Oct 27, 2022, 11:50 PM
14 points
0 comments, 14 min read

[Question] Good psychology books/books that contain good psychological models?

shuffled-cantaloupe Oct 27, 2022, 11:04 PM
1 point
1 comment, 1 min read

Podcast: The Left and Effective Altruism with Habiba Islam

garrison Oct 27, 2022, 5:41 PM
2 points
2 comments, 1 min read

Lessons from ‘Famine, Affluence, and Morality’ and its reflection on today.

Mayank Modi Oct 27, 2022, 5:20 PM
4 points
0 comments, 1 min read

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89 Oct 27, 2022, 2:41 PM
12 points
20 comments, 1 min read

Historicism in the math-adjacent sciences

mrcbarbier Oct 27, 2022, 2:38 PM
3 points
0 comments, 5 min read

How Risky Is Trick-or-Treating?

jefftk Oct 27, 2022, 2:10 PM
58 points
18 comments, 2 min read
(www.jefftk.com)

Covid 10/27/22: Another Origin Story

Zvi Oct 27, 2022, 1:40 PM
32 points
1 comment, 13 min read
(thezvi.wordpress.com)

[Question] Why are probabilities represented as real numbers instead of rational numbers?

Yaakov T Oct 27, 2022, 11:23 AM
5 points
9 comments, 1 min read

Five Areas I Wish EAs Gave More Focus

Prometheus Oct 27, 2022, 6:13 AM
13 points
18 comments, 1 min read

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

Oct 27, 2022, 1:32 AM
135 points
14 comments, 12 min read

[Question] Quantum Suicide and Aumann’s Agreement Theorem

Isaac King Oct 27, 2022, 1:32 AM
14 points
20 comments, 1 min read

Reslab Request for Information: EA hardware projects

Joel Becker Oct 26, 2022, 9:13 PM
10 points
0 comments, 1 min read

A list of Petrov buttons

philh Oct 26, 2022, 8:50 PM
19 points
8 comments, 5 min read
(reasonableapproximation.net)

The Game of Antonyms

Faustify Oct 26, 2022, 7:26 PM
4 points
6 comments, 8 min read

Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

LawrenceC Oct 26, 2022, 6:45 PM
29 points
5 comments, 1 min read
(arxiv.org)

[Question] How to become more articulate?

just_browsing Oct 26, 2022, 2:43 PM
18 points
14 comments, 1 min read

Open Bands: Leading Rhythm

jefftk Oct 26, 2022, 2:30 PM
10 points
0 comments, 4 min read
(www.jefftk.com)

Signals of war in August 2021

yieldthought Oct 26, 2022, 8:11 AM
70 points
16 comments, 2 min read

Trigger-based rapid checklists

VipulNaik Oct 26, 2022, 4:05 AM
44 points
0 comments, 9 min read

Why some people believe in AGI, but I don’t.

cveres Oct 26, 2022, 3:09 AM
−15 points
6 comments, 1 min read

Intent alignment should not be the goal for AGI x-risk reduction

John Nay Oct 26, 2022, 1:24 AM
1 point
10 comments, 3 min read

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

Oct 25, 2022, 8:48 PM
14 points
2 comments, 4 min read

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda Oct 25, 2022, 8:24 PM
52 points
7 comments, 1 min read
(www.youtube.com)

Nothing.

rogersbacon Oct 25, 2022, 4:33 PM
−10 points
4 comments, 6 min read
(www.secretorum.life)

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann Oct 25, 2022, 4:29 PM
21 points
1 comment, 5 min read

Consider Applying to the Future Fellowship at MIT

jefftk Oct 25, 2022, 3:40 PM
29 points
0 comments, 1 min read
(www.jefftk.com)

Beyond Kolmogorov and Shannon

Oct 25, 2022, 3:13 PM
63 points
22 comments, 5 min read

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes Oct 25, 2022, 2:47 PM
207 points
48 comments, 30 min read, 1 review

Refine: what helped me write more?

Alexander Gietelink Oldenziel Oct 25, 2022, 2:44 PM
12 points
0 comments, 2 min read

Logical Decision Theories: Our final failsafe?

Noosphere89 Oct 25, 2022, 12:51 PM
−7 points
8 comments, 1 min read
(www.lesswrong.com)

What will the scaled up GATO look like? (Updated with questions)

Amal Oct 25, 2022, 12:44 PM
34 points
22 comments, 1 min read

Mechanism Design for AI Safety - Reading Group Curriculum

Rubi J. Hudson Oct 25, 2022, 3:54 AM
15 points
3 comments, 1 min read

Furry Rationalists & Effective Anthropomorphism both exist

agentydragon Oct 25, 2022, 3:37 AM
42 points
3 comments, 1 min read

EA & LW Forums Weekly Summary (17 - 23 Oct 22')

Zoe Williams Oct 25, 2022, 2:57 AM
10 points
0 comments, 1 min read