Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

Oct 25, 2022, 8:48 PM
14 points
2 comments · 4 min read · LW link

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda · Oct 25, 2022, 8:24 PM
52 points
7 comments · 1 min read · LW link
(www.youtube.com)

Nothing.

rogersbacon · Oct 25, 2022, 4:33 PM
−10 points
4 comments · 6 min read · LW link
(www.secretorum.life)

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann · Oct 25, 2022, 4:29 PM
21 points
1 comment · 5 min read · LW link

Consider Applying to the Future Fellowship at MIT

jefftk · Oct 25, 2022, 3:40 PM
29 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Beyond Kolmogorov and Shannon

Oct 25, 2022, 3:13 PM
63 points
22 comments · 5 min read · LW link

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes · Oct 25, 2022, 2:47 PM
207 points
48 comments · 30 min read · LW link · 1 review

Refine: what helped me write more?

Alexander Gietelink Oldenziel · Oct 25, 2022, 2:44 PM
12 points
0 comments · 2 min read · LW link

Logical Decision Theories: Our final failsafe?

Noosphere89 · Oct 25, 2022, 12:51 PM
−7 points
8 comments · 1 min read · LW link
(www.lesswrong.com)

What will the scaled up GATO look like? (Updated with questions)

Amal · Oct 25, 2022, 12:44 PM
34 points
22 comments · 1 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson · Oct 25, 2022, 3:54 AM
15 points
3 comments · 1 min read · LW link

Furry Rationalists & Effective Anthropomorphism both exist

agentydragon · Oct 25, 2022, 3:37 AM
42 points
3 comments · 1 min read · LW link

EA & LW Forums Weekly Summary (17 - 23 Oct 22')

Zoe Williams · Oct 25, 2022, 2:57 AM
10 points
0 comments · 1 min read · LW link

Dance Weekends: Tests not Masks

jefftk · Oct 25, 2022, 2:10 AM
12 points
0 comments · 2 min read · LW link
(www.jefftk.com)

[Question] What is good Cyber Security Advice?

Gunnar_Zarncke · Oct 24, 2022, 11:27 PM
30 points
12 comments · 2 min read · LW link

Connections between Mind-Body Problem & Civilizations

oblivion · Oct 24, 2022, 9:55 PM
−3 points
1 comment · 1 min read · LW link

[Question] Rationalism and money

David K · Oct 24, 2022, 9:22 PM
−5 points
2 comments · 1 min read · LW link

[Question] Game semantics

David K · Oct 24, 2022, 9:22 PM
2 points
2 comments · 1 min read · LW link

A Good Future (rough draft)

Michael Soareverix · Oct 24, 2022, 8:45 PM
10 points
5 comments · 3 min read · LW link

A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda · Oct 24, 2022, 8:45 PM
64 points
12 comments · 3 min read · LW link
(neelnanda.io)

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris · Oct 24, 2022, 8:03 PM
29 points
0 comments · 1 min read · LW link
(github.com)

Consider trying Vivek Hebbar’s alignment exercises

Akash · Oct 24, 2022, 7:46 PM
38 points
1 comment · 4 min read · LW link

[Question] Education not meant for mass-consumption

Tolo · Oct 24, 2022, 7:45 PM
7 points
5 comments · 2 min read · LW link

Realizations in Regards to Masculinity

nmc · Oct 24, 2022, 7:42 PM
−2 points
2 comments · 2 min read · LW link

The Futility of Religion

nmc · Oct 24, 2022, 7:42 PM
−1 points
5 comments · 3 min read · LW link

The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook · Oct 24, 2022, 5:42 PM
62 points
0 comments · 1 min read · LW link

AGI in our lifetimes is wishful thinking

niknoble · Oct 24, 2022, 11:53 AM
0 points
25 comments · 8 min read · LW link

DeepMind on Stratego, an imperfect information game

sanxiyn · Oct 24, 2022, 5:57 AM
15 points
9 comments · 1 min read · LW link
(arxiv.org)

[Question] TOMT: Post from 1-2 years ago talking about a paper on social networks

Simon Berens · Oct 24, 2022, 1:29 AM
5 points
1 comment · 1 min read · LW link

AI researchers announce NeuroAI agenda

Cameron Berg · Oct 24, 2022, 12:14 AM
37 points
12 comments · 6 min read · LW link
(arxiv.org)

Empowerment is (almost) All We Need

jacob_cannell · Oct 23, 2022, 9:48 PM
61 points
44 comments · 17 min read · LW link

“Originality is nothing but judicious imitation”—Voltaire

Vestozia · Oct 23, 2022, 7:00 PM
0 points
0 comments · 13 min read · LW link

Mid-Peninsula ACX/LW Meetup [CANCELLED]

moshezadka · Oct 23, 2022, 5:37 PM
1 point
0 comments · 1 min read · LW link

I am a Memoryless System

Nicholas / Heather Kross · Oct 23, 2022, 5:34 PM
25 points
2 comments · 9 min read · LW link
(www.thinkingmuchbetter.com)

Accountability Buddies: Why you might want one.

Samuel Nellessen · Oct 23, 2022, 4:25 PM
10 points
3 comments · 1 min read · LW link

How to get past Haidt’s elephant and listen

Astynax · Oct 23, 2022, 4:06 PM
13 points
4 comments · 2 min read · LW link

Writing Russian and Ukrainian words in Latin script

Viliam · Oct 23, 2022, 3:25 PM
19 points
22 comments · 6 min read · LW link

[Question] Have you noticed any ways that rationalists differ? [Brainstorming session]

tailcalled · Oct 23, 2022, 11:32 AM
23 points
22 comments · 1 min read · LW link

Mnestics

Jarred Filmer · Oct 23, 2022, 12:30 AM
120 points
6 comments · 4 min read · LW link

Telic intuitions across the sciences

mrcbarbier · Oct 22, 2022, 9:31 PM
4 points
0 comments · 17 min read · LW link

A basic lexicon of telic concepts

mrcbarbier · Oct 22, 2022, 9:28 PM
2 points
0 comments · 3 min read · LW link

Do we have the right kind of math for roles, goals and meaning?

mrcbarbier · Oct 22, 2022, 9:28 PM
13 points
5 comments · 7 min read · LW link

[Question] The Last Year - is there an existing novel about the last year before AI doom?

Luca Petrolati · Oct 22, 2022, 8:44 PM
4 points
4 comments · 1 min read · LW link

The highest-probability outcome can be out of distribution

tailcalled · Oct 22, 2022, 8:00 PM
14 points
5 comments · 1 min read · LW link

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran · Oct 22, 2022, 4:17 PM
25 points
0 comments · 1 min read · LW link

Crypto loves impact markets: Notes from Schelling Point Bogotá

Rachel Shu · Oct 22, 2022, 3:58 PM
17 points
2 comments · 1 min read · LW link

[Question] When trying to define general intelligence is ability to achieve goals the best metric?

jmh · Oct 22, 2022, 3:09 AM
5 points
0 comments · 1 min read · LW link

[Question] Simple question about corrigibility and values in AI.

jmh · Oct 22, 2022, 2:59 AM
6 points
1 comment · 1 min read · LW link

Moorean Statements

David Udell · Oct 22, 2022, 12:50 AM
11 points
11 comments · 1 min read · LW link

Wisdom Cannot Be Unzipped

Sable · Oct 22, 2022, 12:28 AM
74 points
17 comments · 7 min read · LW link · 1 review
(affablyevil.substack.com)