Jacob on the Precipice

Richard_Ngo · 26 Sep 2023 21:16 UTC
42 points
8 comments · 11 min read · LW link
(narrativeark.substack.com)

Text Posts from the Kids Group: 2022

jefftk · 26 Sep 2023 20:40 UTC
33 points
2 comments · 7 min read · LW link
(www.jefftk.com)

GPT-4 for personal productivity: online distraction blocker

Sergii · 26 Sep 2023 17:41 UTC
64 points
12 comments · 2 min read · LW link
(grgv.xyz)

ARENA 2.0 - Impact Report

CallumMcDougall · 26 Sep 2023 17:13 UTC
35 points
5 comments · 13 min read · LW link

Mechanistic Interpretability Reading group

26 Sep 2023 16:26 UTC
15 points
0 comments · 1 min read · LW link

Announcing the CNN Interpretability Competition

scasper · 26 Sep 2023 16:21 UTC
22 points
0 comments · 4 min read · LW link

Making AIs less likely to be spiteful

26 Sep 2023 14:12 UTC
116 points
4 comments · 10 min read · LW link

[Linkpost] Mark Zuckerberg confronted about Meta’s Llama 2 AI’s ability to give users detailed guidance on making anthrax - Business Insider

mic · 26 Sep 2023 12:05 UTC
18 points
11 comments · 2 min read · LW link
(www.businessinsider.com)

Enforcing Far-Future Contracts for Governments

FCCC · 26 Sep 2023 4:26 UTC
−7 points
49 comments · 3 min read · LW link

Carioca Petrov Day

Giskard · 26 Sep 2023 0:30 UTC
1 point
0 comments · 1 min read · LW link

[Question] A few Alignment questions: utility optimizers, SLT, sharp left turn and identifiability

Igor Timofeev · 26 Sep 2023 0:27 UTC
6 points
1 comment · 2 min read · LW link

Impact stories for model internals: an exercise for interpretability researchers

jenny · 25 Sep 2023 23:15 UTC
29 points
3 comments · 7 min read · LW link

Autonomic Sanity

Sable · 25 Sep 2023 22:37 UTC
20 points
9 comments · 4 min read · LW link
(affablyevil.substack.com)

[Question] What is wrong with this “utility switch button problem” approach?

Donald Hobson · 25 Sep 2023 21:36 UTC
14 points
3 comments · 1 min read · LW link

You should just smile at strangers a lot

chaosmage · 25 Sep 2023 20:12 UTC
13 points
10 comments · 1 min read · LW link

The King and the Golem

Richard_Ngo · 25 Sep 2023 19:51 UTC
187 points
18 comments · 5 min read · LW link · 1 review
(narrativeark.substack.com)

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

25 Sep 2023 18:55 UTC
3 points
2 comments · 3 min read · LW link
(www.sentienceinstitute.org)

Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI!

Zhijing Jin · 25 Sep 2023 18:42 UTC
5 points
2 comments · 2 min read · LW link

Evaluating hidden directions on the utility dataset: classification, steering and removal

25 Sep 2023 17:19 UTC
25 points
3 comments · 7 min read · LW link

Linkpost: A model of biases as arising from meta-beliefs

JuanGarcia · 25 Sep 2023 17:14 UTC
5 points
0 comments · 1 min read · LW link

[Question] What causes a decision theory to be used?

Dagon · 25 Sep 2023 16:33 UTC
8 points
2 comments · 1 min read · LW link

Understanding strategic deception and deceptive alignment

25 Sep 2023 16:27 UTC
64 points
16 comments · 7 min read · LW link
(www.apolloresearch.ai)

The Merits of Contrarianism & Why I hate Chatbots. [My Experience with the Ideological Turing Test @ a Less Wrong meetup]

Amina V. · 25 Sep 2023 16:13 UTC
4 points
1 comment · 1 min read · LW link
(bimbollectual.com)

Inside Views, Impostor Syndrome, and the Great LARP

johnswentworth · 25 Sep 2023 16:08 UTC
331 points
53 comments · 5 min read · LW link

“X distracts from Y” as a thinly-disguised fight over group status / politics

Steven Byrnes · 25 Sep 2023 15:18 UTC
108 points
14 comments · 8 min read · LW link

Amazon to invest up to $4 billion in Anthropic

Davis_Kingsley · 25 Sep 2023 14:55 UTC
44 points
8 comments · 1 min read · LW link
(twitter.com)

Should Effective Altruists be Valuists instead of utilitarians?

25 Sep 2023 14:03 UTC
1 point
3 comments · 6 min read · LW link

Feedly Breaks MathML

jefftk · 25 Sep 2023 13:40 UTC
15 points
3 comments · 1 min read · LW link
(www.jefftk.com)

[Question] How have you become more hard-working?

Chi Nguyen · 25 Sep 2023 12:37 UTC
80 points
42 comments · 1 min read · LW link

Automating Intelligence: A Cursory Glance at How AutoML Brings Precision to AI Development

RoscoHunter · 25 Sep 2023 9:39 UTC
3 points
0 comments · 3 min read · LW link

Interpreting OpenAI’s Whisper

EllenaR · 24 Sep 2023 17:53 UTC
114 points
13 comments · 7 min read · LW link

Contradiction Appeal Bias

onur · 24 Sep 2023 17:03 UTC
3 points
2 comments · 1 min read · LW link

RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!

Singularian2501 · 24 Sep 2023 16:48 UTC
5 points
0 comments · 1 min read · LW link

Honor System for Vaccination?

jefftk · 24 Sep 2023 11:50 UTC
17 points
22 comments · 1 min read · LW link
(www.jefftk.com)

Far-Future Commitments as a Policy Consensus Strategy

FCCC · 24 Sep 2023 6:34 UTC
7 points
40 comments · 1 min read · LW link

Five neglected work areas that could reduce AI risk

24 Sep 2023 2:03 UTC
17 points
5 comments · 9 min read · LW link

[Question] Are the other Rationality: A-Z sequences coming out as books?

caffeinated_dissonance · 24 Sep 2023 0:38 UTC
7 points
3 comments · 1 min read · LW link

The Dick Kick’em Paradox

Augs SMSHacks · 23 Sep 2023 22:22 UTC
−5 points
21 comments · 1 min read · LW link

I designed an AI safety course (for a philosophy department)

Eleni Angelou · 23 Sep 2023 22:03 UTC
37 points
15 comments · 2 min read · LW link

Paper: LLMs trained on “A is B” fail to learn “B is A”

23 Sep 2023 19:55 UTC
120 points
74 comments · 4 min read · LW link
(arxiv.org)

Sparse Coding, for Mechanistic Interpretability and Activation Engineering

David Udell · 23 Sep 2023 19:16 UTC
42 points
7 comments · 34 min read · LW link

[Question] Places to meet interesting middle-aged men?

anon_girl · 23 Sep 2023 19:06 UTC
18 points
7 comments · 1 min read · LW link

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné · 23 Sep 2023 16:21 UTC
30 points
8 comments · 5 min read · LW link

A quick remark on so-called “hallucinations” in LLMs and humans

Bill Benzon · 23 Sep 2023 12:17 UTC
4 points
4 comments · 1 min read · LW link

Hand-writing MathML

jefftk · 23 Sep 2023 11:20 UTC
16 points
40 comments · 1 min read · LW link
(www.jefftk.com)

Musk, Starlink, and Crimea

Nicholas / Heather Kross · 23 Sep 2023 2:35 UTC
−13 points
0 comments · 5 min read · LW link

[Linkpost/Video] All The Times We Nearly Blew Up The World

Jacob G-W · 23 Sep 2023 1:18 UTC
6 points
1 comment · 1 min read · LW link
(www.youtube.com)

Luck based medicine: inositol for anxiety and brain fog

Elizabeth · 22 Sep 2023 20:10 UTC
40 points
5 comments · 3 min read · LW link
(acesounderglass.com)

If influence functions are not approximating leave-one-out, how are they supposed to help?

Fabien Roger · 22 Sep 2023 14:23 UTC
66 points
5 comments · 3 min read · LW link

Modeling p(doom) with TrojanGDP

K. Liam Smith · 22 Sep 2023 14:19 UTC
−2 points
2 comments · 13 min read · LW link