What sorts of systems can be deceptive?

Andrei Alexandru · Oct 31, 2022, 10:00 PM
16 points
0 comments · 7 min read

“Cars and Elephants”: a handwavy argument/analogy against mechanistic interpretability

David Scott Krueger (formerly: capybaralet) · Oct 31, 2022, 9:26 PM
48 points
25 comments · 2 min read

Superintelligent AI is necessary for an amazing future, but far from sufficient

So8res · Oct 31, 2022, 9:16 PM
132 points
48 comments · 34 min read

Sanity-checking in an age of hyperbole

Ciprian Elliu Ivanof · Oct 31, 2022, 8:04 PM
2 points
4 comments · 2 min read

Why Aren’t There More Schelling Holidays?

johnswentworth · Oct 31, 2022, 7:31 PM
63 points
21 comments · 1 min read

The circular problem of epistemic irresponsibility

Roman Leventov · Oct 31, 2022, 5:23 PM
5 points
2 comments · 8 min read

AI as a Civilizational Risk Part 3/6: Anti-economy and Signal Pollution

PashaKamyshev · Oct 31, 2022, 5:03 PM
7 points
4 comments · 14 min read

Average utilitarianism is non-local

Yair Halberstadt · Oct 31, 2022, 4:36 PM
29 points
13 comments · 1 min read

Marvel Snap: Phase 1

Zvi · Oct 31, 2022, 3:20 PM
23 points
1 comment · 14 min read
(thezvi.wordpress.com)

Boundaries vs Frames

Scott Garrabrant · Oct 31, 2022, 3:14 PM
58 points
10 comments · 7 min read

Embedding safety in ML development

zeshen · Oct 31, 2022, 12:27 PM
24 points
1 comment · 18 min read

[Book] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Esben Kran · Oct 31, 2022, 11:38 AM
20 points
1 comment · 1 min read
(christophm.github.io)

My (naive) take on Risks from Learned Optimization

Artyom Karpov · Oct 31, 2022, 10:59 AM
7 points
0 comments · 5 min read

Tactical Nuclear Weapons Aren’t Cost-Effective Compared to Precision Artillery

Lao Mein · Oct 31, 2022, 4:33 AM
28 points
7 comments · 3 min read

Gandalf or Saruman? A Soldier in Scout’s Clothing

DirectedEvolution · Oct 31, 2022, 2:40 AM
41 points
1 comment · 4 min read

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes · Oct 30, 2022, 7:15 PM
26 points
1 comment · 1 min read
(braininspired.co)

“Normal” is the equilibrium state of past optimization processes

Alex_Altair · Oct 30, 2022, 7:03 PM
81 points
5 comments · 5 min read

AI as a Civilizational Risk Part 2/6: Behavioral Modification

PashaKamyshev · Oct 30, 2022, 4:57 PM
9 points
0 comments · 10 min read

Instrumental ignoring AI, Dumb but not useless.

Donald Hobson · Oct 30, 2022, 4:55 PM
7 points
6 comments · 2 min read

Weekly Roundup #3

Zvi · Oct 30, 2022, 12:20 PM
23 points
5 comments · 15 min read
(thezvi.wordpress.com)

Quickly refactoring the U.S. Constitution

lc · Oct 30, 2022, 7:17 AM
7 points
25 comments · 4 min read

«Boundaries», Part 3a: Defining boundaries as directed Markov blankets

Andrew_Critch · Oct 30, 2022, 6:31 AM
90 points
20 comments · 15 min read

Am I secretly excited for AI getting weird?

porby · Oct 29, 2022, 10:16 PM
116 points
4 comments · 4 min read

AI as a Civilizational Risk Part 1/6: Historical Priors

PashaKamyshev · Oct 29, 2022, 9:59 PM
2 points
2 comments · 7 min read

Don’t expect your life partner to be better than your exes in more than one way: a mathematical model

mdd · Oct 29, 2022, 6:47 PM
7 points
1 comment · 9 min read

The Social Recession: By the Numbers

antonomon · Oct 29, 2022, 6:45 PM
165 points
29 comments · 8 min read
(novum.substack.com)

Electric Kettle vs Stove

jefftk · Oct 29, 2022, 12:50 PM
18 points
7 comments · 1 min read
(www.jefftk.com)

Quantum Immortality, foiled

Ben · Oct 29, 2022, 11:00 AM
27 points
4 comments · 2 min read

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

Oct 28, 2022, 11:55 PM
101 points
9 comments · 9 min read · 2 reviews
(arxiv.org)

Resources that (I think) new alignment researchers should know about

Akash · Oct 28, 2022, 10:13 PM
69 points
9 comments · 4 min read

How often does One Person succeed?

Mayank Modi · Oct 28, 2022, 7:32 PM
1 point
3 comments · 1 min read

aisafety.community—A living document of AI safety communities

Oct 28, 2022, 5:50 PM
58 points
23 comments · 1 min read

Rapid Test Throat Swabbing?

jefftk · Oct 28, 2022, 4:30 PM
18 points
2 comments · 1 min read
(www.jefftk.com)

Join the interpretability research hackathon

Esben Kran · Oct 28, 2022, 4:26 PM
15 points
0 comments · 1 min read

Syncretism

Annapurna · Oct 28, 2022, 4:08 PM
16 points
4 comments · 1 min read
(jorgevelez.substack.com)

Pondering computation in the real world

Adam Shai · Oct 28, 2022, 3:57 PM
24 points
13 comments · 5 min read

Ukraine and the Crimea Question

ChristianKl · Oct 28, 2022, 12:26 PM
−2 points
153 comments · 11 min read

New book on s-risks

Tobias_Baumann · Oct 28, 2022, 9:36 AM
68 points
1 comment · 1 min read

Cryptic symbols

Adam Scherlis · Oct 28, 2022, 6:44 AM
6 points
17 comments · 1 min read
(adam.scherlis.com)

All life’s helpers’ beliefs

Tehdastehdas · Oct 28, 2022, 5:47 AM
−12 points
1 comment · 5 min read

Prizes for ML Safety Benchmark Ideas

joshc · Oct 28, 2022, 2:51 AM
36 points
5 comments · 1 min read

Worldview iPeople—Future Fund’s AI Worldview Prize

Toni MUENDEL · Oct 28, 2022, 1:53 AM
−22 points
4 comments · 9 min read

Anatomy of change

Jose Miguel Cruz y Celis · Oct 28, 2022, 1:21 AM
1 point
0 comments · 1 min read

Nash equilibria of symmetric zero-sum games

Ege Erdil · Oct 27, 2022, 11:50 PM
14 points
0 comments · 14 min read

[Question] Good psychology books/books that contain good psychological models?

shuffled-cantaloupe · Oct 27, 2022, 11:04 PM
1 point
1 comment · 1 min read

Podcast: The Left and Effective Altruism with Habiba Islam

garrison · Oct 27, 2022, 5:41 PM
2 points
2 comments · 1 min read

Lessons from ‘Famine, Affluence, and Morality’ and its reflection on today.

Mayank Modi · Oct 27, 2022, 5:20 PM
4 points
0 comments · 1 min read

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89 · Oct 27, 2022, 2:41 PM
12 points
20 comments · 1 min read

Historicism in the math-adjacent sciences

mrcbarbier · Oct 27, 2022, 2:38 PM
3 points
0 comments · 5 min read

How Risky Is Trick-or-Treating?

jefftk · Oct 27, 2022, 2:10 PM
58 points
18 comments · 2 min read
(www.jefftk.com)