Interpreting OpenAI’s Whisper

EllenaR, 24 Sep 2023 17:53 UTC
114 points
13 comments, 7 min read, LW link

Contradiction Appeal Bias

onur, 24 Sep 2023 17:03 UTC
3 points
2 comments, 1 min read, LW link

RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!

Singularian2501, 24 Sep 2023 16:48 UTC
5 points
0 comments, 1 min read, LW link

Honor System for Vaccination?

jefftk, 24 Sep 2023 11:50 UTC
17 points
22 comments, 1 min read, LW link
(www.jefftk.com)

Far-Future Commitments as a Policy Consensus Strategy

FCCC, 24 Sep 2023 6:34 UTC
7 points
40 comments, 1 min read, LW link

Five neglected work areas that could reduce AI risk

24 Sep 2023 2:03 UTC
17 points
5 comments, 9 min read, LW link

[Question] Are the other Rationality: A-Z sequences coming out as books?

caffeinated_dissonance, 24 Sep 2023 0:38 UTC
7 points
3 comments, 1 min read, LW link

The Dick Kick’em Paradox

Augs SMSHacks, 23 Sep 2023 22:22 UTC
−5 points
21 comments, 1 min read, LW link

I designed an AI safety course (for a philosophy department)

Eleni Angelou, 23 Sep 2023 22:03 UTC
37 points
15 comments, 2 min read, LW link

Paper: LLMs trained on “A is B” fail to learn “B is A”

23 Sep 2023 19:55 UTC
120 points
74 comments, 4 min read, LW link
(arxiv.org)

Sparse Coding, for Mechanistic Interpretability and Activation Engineering

David Udell, 23 Sep 2023 19:16 UTC
42 points
7 comments, 34 min read, LW link

[Question] Places to meet interesting middle-aged men?

anon_girl, 23 Sep 2023 19:06 UTC
18 points
7 comments, 1 min read, LW link

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné, 23 Sep 2023 16:21 UTC
30 points
8 comments, 5 min read, LW link

A quick remark on so-called “hallucinations” in LLMs and humans

Bill Benzon, 23 Sep 2023 12:17 UTC
4 points
4 comments, 1 min read, LW link

Hand-writing MathML

jefftk, 23 Sep 2023 11:20 UTC
16 points
40 comments, 1 min read, LW link
(www.jefftk.com)

Musk, Starlink, and Crimea

Nicholas / Heather Kross, 23 Sep 2023 2:35 UTC
−13 points
0 comments, 5 min read, LW link

[Linkpost/Video] All The Times We Nearly Blew Up The World

Jacob G-W, 23 Sep 2023 1:18 UTC
6 points
1 comment, 1 min read, LW link
(www.youtube.com)

Luck based medicine: inositol for anxiety and brain fog

Elizabeth, 22 Sep 2023 20:10 UTC
40 points
5 comments, 3 min read, LW link
(acesounderglass.com)

If influence functions are not approximating leave-one-out, how are they supposed to help?

Fabien Roger, 22 Sep 2023 14:23 UTC
66 points
5 comments, 3 min read, LW link

Modeling p(doom) with TrojanGDP

K. Liam Smith, 22 Sep 2023 14:19 UTC
−2 points
2 comments, 13 min read, LW link

Let’s talk about Impostor syndrome in AI safety

Igor Ivanov, 22 Sep 2023 13:51 UTC
29 points
4 comments, 3 min read, LW link

Fund Transit With Development

jefftk, 22 Sep 2023 11:10 UTC
47 points
22 comments, 3 min read, LW link
(www.jefftk.com)

Atoms to Agents Proto-Lectures

johnswentworth, 22 Sep 2023 6:22 UTC
93 points
14 comments, 2 min read, LW link
(www.youtube.com)

Would You Work Harder In The Least Convenient Possible World?

Firinn, 22 Sep 2023 5:17 UTC
87 points
94 comments, 9 min read, LW link

Contra Kevin Dorst’s Rational Polarization

azsantosk, 22 Sep 2023 4:28 UTC
8 points
2 comments, 9 min read, LW link

ACX Boston - Petrov Day 2023

duck_master, 22 Sep 2023 1:13 UTC
2 points
0 comments, 1 min read, LW link

What social science research do you want to see reanalyzed?

Michael Wiebe, 22 Sep 2023 0:03 UTC
14 points
9 comments, 1 min read, LW link

Immortality or death by AGI

ImmortalityOrDeathByAGI, 21 Sep 2023 23:59 UTC
47 points
30 comments, 4 min read, LW link
(forum.effectivealtruism.org)

Neel Nanda on the Mechanistic Interpretability Researcher Mindset

Michaël Trazzi, 21 Sep 2023 19:47 UTC
37 points
1 comment, 3 min read, LW link
(theinsideview.ai)

Require AGI to be Explainable

PeterMcCluskey, 21 Sep 2023 16:11 UTC
5 points
0 comments, 6 min read, LW link
(bayesianinvestor.com)

Update to “Dominant Assurance Contract Platform”

moyamo, 21 Sep 2023 16:09 UTC
32 points
1 comment, 1 min read, LW link

Sparse Autoencoders: Future Work

21 Sep 2023 15:30 UTC
35 points
5 comments, 6 min read, LW link

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
158 points
8 comments, 5 min read, LW link

There should be more AI safety orgs

Marius Hobbhahn, 21 Sep 2023 14:53 UTC
181 points
25 comments, 17 min read, LW link

Ward 5: Jack Perenick and Naima Sait

jefftk, 21 Sep 2023 13:00 UTC
24 points
0 comments, 1 min read, LW link
(www.jefftk.com)

AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo

Zvi, 21 Sep 2023 12:00 UTC
75 points
8 comments, 47 min read, LW link
(thezvi.wordpress.com)

[Question] How are rationalists or orgs blocked, that you can see?

Nathan Young, 21 Sep 2023 2:37 UTC
7 points
2 comments, 1 min read, LW link

Vision Weekend US Edition

Allison Duettmann, 20 Sep 2023 21:28 UTC
4 points
0 comments, 1 min read, LW link

Foresight Vision Weekend Europe Edition

Allison Duettmann, 20 Sep 2023 21:25 UTC
3 points
0 comments, 1 min read, LW link

Notes on ChatGPT’s “memory” for strings and for events

Bill Benzon, 20 Sep 2023 18:12 UTC
3 points
0 comments, 10 min read, LW link

Belief and the Truth

Sam I am, 20 Sep 2023 17:38 UTC
2 points
14 comments, 5 min read, LW link
(open.substack.com)

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

20 Sep 2023 15:23 UTC
58 points
9 comments, 1 min read, LW link
(arxiv.org)

Interpretability Externalities Case Study - Hungry Hungry Hippos

Magdalena Wache, 20 Sep 2023 14:42 UTC
64 points
22 comments, 2 min read, LW link

An Elementary Introduction to Infra-Bayesianism

CharlesRW, 20 Sep 2023 14:29 UTC
16 points
0 comments, 1 min read, LW link

Weekly Incidence Including Delay

jefftk, 20 Sep 2023 14:00 UTC
11 points
0 comments, 2 min read, LW link
(www.jefftk.com)

[Question] The stereotype of male classical music lovers being gay

BB6, 20 Sep 2023 13:23 UTC
11 points
6 comments, 1 min read, LW link

Housing Roundup #6

Zvi, 20 Sep 2023 13:10 UTC
27 points
8 comments, 14 min read, LW link
(thezvi.wordpress.com)

Careless talk on US-China AI competition? (and criticism of CAIS coverage)

Oliver Sourbut, 20 Sep 2023 12:46 UTC
3 points
0 comments, 10 min read, LW link
(www.oliversourbut.net)

A New Bayesian Decision Theory

Pareto Optimal, 20 Sep 2023 9:36 UTC
−6 points
0 comments, 1 min read, LW link
(paretooptimal.substack.com)

Protest against Meta’s irreversible proliferation (Sept 29, San Francisco)

Holly_Elmore, 19 Sep 2023 23:40 UTC
54 points
33 comments, 1 min read, LW link