Is checking that a state of the world is not dystopian easier than constructing a non-dystopian state?

No77e · Dec 27, 2022, 8:57 PM
5 points
3 comments · 1 min read · LW link

Crypto-currency as pro-alignment mechanism

False Name · Dec 27, 2022, 5:45 PM
−10 points
2 comments · 2 min read · LW link

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · Dec 27, 2022, 5:27 PM
50 points
0 comments · 4 min read · LW link
(aizi.substack.com)

Things that can kill you quickly: What everyone should know about first aid

jasoncrawford · Dec 27, 2022, 4:23 PM
166 points
21 comments · 2 min read · LW link · 1 review
(jasoncrawford.org)

[Question] Why The Focus on Expected Utility Maximisers?

DragonGod · Dec 27, 2022, 3:49 PM
118 points
84 comments · 3 min read · LW link

Presumptive Listening: sticking to familiar concepts and missing the outer reasoning paths

Remmelt · Dec 27, 2022, 3:40 PM
−16 points
8 comments · 2 min read · LW link
(mflb.com)

Mere exposure effect: Bias in Evaluating AGI X-Risks

Dec 27, 2022, 2:05 PM
0 points
2 comments · 1 min read · LW link

Housing and Transportation Roundup #2

Zvi · Dec 27, 2022, 1:10 PM
25 points
0 comments · 12 min read · LW link
(thezvi.wordpress.com)

[Question] Are tulpas moral patients?

ChristianKl · Dec 27, 2022, 11:30 AM
16 points
28 comments · 1 min read · LW link

Reflections on my 5-month alignment upskilling grant

Jay Bailey · Dec 27, 2022, 10:51 AM
82 points
4 comments · 8 min read · LW link

Institutions Cannot Restrain Dark-Triad AI Exploitation

Dec 27, 2022, 10:34 AM
5 points
0 comments · 5 min read · LW link
(mflb.com)

Introduction: Bias in Evaluating AGI X-Risks

Dec 27, 2022, 10:27 AM
1 point
0 comments · 3 min read · LW link

MDPs and the Bellman Equation, Intuitively Explained

Jack O'Brien · Dec 27, 2022, 5:50 AM
11 points
3 comments · 14 min read · LW link

How ‘Human-Human’ dynamics give way to ‘Human-AI’ and then ‘AI-AI’ dynamics

Dec 27, 2022, 3:16 AM
−2 points
5 comments · 2 min read · LW link
(mflb.com)

Nine Points of Collective Insanity

Dec 27, 2022, 3:14 AM
−2 points
3 comments · 1 min read · LW link
(mflb.com)

Fractional Resignation

jefftk · Dec 27, 2022, 2:30 AM
19 points
6 comments · 1 min read · LW link
(www.jefftk.com)

[Question] What policies have most thoroughly crippled (otherwise-promising) industries or technologies?

benwr · Dec 27, 2022, 2:25 AM
40 points
4 comments · 1 min read · LW link

Recent advances in Natural Language Processing—Some Woolly speculations (2019 essay on semantics and language models)

philosophybear · Dec 27, 2022, 2:11 AM
1 point
0 comments · 7 min read · LW link

Against Agents as an Approach to Aligned Transformative AI

DragonGod · Dec 27, 2022, 12:47 AM
12 points
9 comments · 2 min read · LW link

Can we efficiently distinguish different mechanisms?

paulfchristiano · Dec 27, 2022, 12:20 AM
88 points
30 comments · 16 min read · LW link
(ai-alignment.com)

Air-gapping evaluation and support

Ryan Kidd · Dec 26, 2022, 10:52 PM
53 points
1 comment · 2 min read · LW link

Slightly against aligning with neo-luddites

Matthew Barnett · Dec 26, 2022, 10:46 PM
104 points
31 comments · 4 min read · LW link

Avoiding perpetual risk from TAI

scasper · Dec 26, 2022, 10:34 PM
15 points
6 comments · 5 min read · LW link

Announcing: The Independent AI Safety Registry

Shoshannah Tekofsky · Dec 26, 2022, 9:22 PM
53 points
9 comments · 1 min read · LW link

Are men harder to help?

braces · Dec 26, 2022, 9:11 PM
35 points
1 comment · 2 min read · LW link

[Question] How much should I update on the fact that my dentist is named Dennis?

MichaelDickens · Dec 26, 2022, 7:11 PM
2 points
3 comments · 1 min read · LW link

Theodicy and the simulation hypothesis, or: The problem of simulator evil

philosophybear · Dec 26, 2022, 6:55 PM
12 points
12 comments · 19 min read · LW link
(philosophybear.substack.com)

Safety of Self-Assembled Neuromorphic Hardware

Can · Dec 26, 2022, 6:51 PM
16 points
2 comments · 10 min read · LW link
(forum.effectivealtruism.org)

Coherent extrapolated dreaming

Alex Flint · Dec 26, 2022, 5:29 PM
38 points
10 comments · 17 min read · LW link

An overview of some promising work by junior alignment researchers

Akash · Dec 26, 2022, 5:23 PM
34 points
0 comments · 4 min read · LW link

Solstice song: Here Lies the Dragon

jchan · Dec 26, 2022, 4:08 PM
8 points
1 comment · 2 min read · LW link

The Usefulness Paradigm

Aprillion · Dec 26, 2022, 1:23 PM
4 points
4 comments · 1 min read · LW link

Looking Back on Posts From 2022

Zvi · Dec 26, 2022, 1:20 PM
50 points
8 comments · 17 min read · LW link
(thezvi.wordpress.com)

Analogies between Software Reverse Engineering and Mechanistic Interpretability

Dec 26, 2022, 12:26 PM
34 points
6 comments · 11 min read · LW link
(www.neelnanda.io)

Mlyyrczo

lsusr · Dec 26, 2022, 7:58 AM
41 points
14 comments · 3 min read · LW link

Causal abstractions vs infradistributions

Pablo Villalobos · Dec 26, 2022, 12:21 AM
24 points
0 comments · 6 min read · LW link

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda · Dec 25, 2022, 10:21 PM
57 points
7 comments · 12 min read · LW link
(www.neelnanda.io)

It’s time to worry about online privacy again

Malmesbury · Dec 25, 2022, 9:05 PM
67 points
23 comments · 6 min read · LW link

[Hebbian Natural Abstractions] Mathematical Foundations

Dec 25, 2022, 8:58 PM
15 points
2 comments · 6 min read · LW link
(www.snellessen.com)

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve · Dec 25, 2022, 8:14 PM
3 points
6 comments · 1 min read · LW link

YCombinator fraud rates

Xodarap · Dec 25, 2022, 7:21 PM
56 points
3 comments · 1 min read · LW link

How evolutionary lineages of LLMs can plan their own future and act on these plans

Roman Leventov · Dec 25, 2022, 6:11 PM
39 points
16 comments · 8 min read · LW link

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Thane Ruthenis · Dec 25, 2022, 4:50 PM
32 points
38 comments · 9 min read · LW link

ChatGPT is our Wright Brothers moment

Ron J · Dec 25, 2022, 4:26 PM
10 points
9 comments · 1 min read · LW link

The Meditation on Winter

Raemon · Dec 25, 2022, 4:12 PM
59 points
3 comments · 3 min read · LW link

I’ve updated towards AI boxing being surprisingly easy

Noosphere89 · Dec 25, 2022, 3:40 PM
8 points
20 comments · 2 min read · LW link

Take 14: Corrigibility isn’t that great.

Charlie Steiner · Dec 25, 2022, 1:04 PM
15 points
3 comments · 3 min read · LW link

Simplified Level Up

jefftk · Dec 25, 2022, 1:00 PM
12 points
16 comments · 2 min read · LW link
(www.jefftk.com)

Hyperfinite graphs ~ manifolds

Alok Singh · Dec 25, 2022, 12:24 PM
11 points
5 comments · 2 min read · LW link

Inconsistent math is great

Alok Singh · Dec 25, 2022, 3:20 AM
1 point
2 comments · 1 min read · LW link