Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

Neel Nanda, 29 Aug 2023 22:07 UTC
36 points
1 comment, 1 min read
(www.youtube.com)

An OV-Coherent Toy Model of Attention Head Superposition

29 Aug 2023 19:44 UTC
26 points
2 comments, 6 min read

The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)

moyamo, 29 Aug 2023 18:28 UTC
78 points
71 comments, 15 min read

The Epistemic Authority of Deep Learning Pioneers

Dylan Bowman, 29 Aug 2023 18:14 UTC
8 points
2 comments, 3 min read

Democratic Fine-Tuning

Joe Edelman, 29 Aug 2023 18:13 UTC
22 points
2 comments, 1 min read
(open.substack.com)

Should rationalists (be seen to) win?

Will_Pearson, 29 Aug 2023 18:13 UTC
6 points
7 comments, 1 min read

Frankfurt meetup

sultan, 29 Aug 2023 18:10 UTC
2 points
0 comments, 1 min read

Istanbul meetup

sultan, 29 Aug 2023 18:10 UTC
2 points
0 comments, 1 min read

Broken Benchmark: MMLU

awg, 29 Aug 2023 18:09 UTC
24 points
5 comments, 1 min read
(www.youtube.com)

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

29 Aug 2023 15:07 UTC
12 points
0 comments, 8 min read
(newsletter.safe.ai)

Loft Bed Fan Guard

jefftk, 29 Aug 2023 13:30 UTC
16 points
3 comments, 1 min read
(www.jefftk.com)

Dating Roundup #1: This is Why You’re Single

Zvi, 29 Aug 2023 12:50 UTC
86 points
28 comments, 38 min read
(thezvi.wordpress.com)

Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]

Bill Benzon, 29 Aug 2023 11:33 UTC
4 points
0 comments, 5 min read

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy, 29 Aug 2023 10:56 UTC
63 points
13 comments, 1 min read
(www.youtube.com)

Newcomb Variant

lsusr, 29 Aug 2023 7:02 UTC
25 points
22 comments, 1 min read

[Question] Incentives affecting alignment-researcher encouragement

Nicholas / Heather Kross, 29 Aug 2023 5:11 UTC
28 points
3 comments, 1 min read

Anyone want to debate publicly about FDT?

omnizoid, 29 Aug 2023 3:45 UTC
13 points
31 comments, 1 min read

AI Deception: A Survey of Examples, Risks, and Potential Solutions

29 Aug 2023 1:29 UTC
53 points
3 comments, 10 min read

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

29 Aug 2023 1:04 UTC
77 points
4 comments, 1 min read

OpenAI API base models are not sycophantic, at any size

nostalgebraist, 29 Aug 2023 0:58 UTC
182 points
20 comments, 2 min read
(colab.research.google.com)

Paradigms and Theory Choice in AI: Adaptivity, Economy and Control

particlemania, 28 Aug 2023 22:19 UTC
4 points
0 comments, 16 min read

[Question] Humanities In A Post-Conscious AI World?

Netcentrica, 28 Aug 2023 21:59 UTC
1 point
1 comment, 2 min read

Introducing the Center for AI Policy (& we’re hiring!)

Thomas Larsen, 28 Aug 2023 21:17 UTC
123 points
50 comments, 2 min read
(www.aipolicy.us)

[Question] 45% to 55% vs. 90% to 100%

yhoiseth, 28 Aug 2023 19:15 UTC
5 points
8 comments, 4 min read

Information warfare historically revolved around human conduits

trevor, 28 Aug 2023 18:54 UTC
37 points
7 comments, 3 min read

The Evidence for Question Decomposition is Weak

niplav, 28 Aug 2023 15:46 UTC
22 points
6 comments, 5 min read

ACX Meetup Anywhere, Bratislava, Slovakia

David Varga, 28 Aug 2023 15:40 UTC
1 point
0 comments, 1 min read

The Anthropic Principle Tells Us That AGI Will Not Be Conscious

nem, 28 Aug 2023 15:25 UTC
−4 points
8 comments, 1 min read

No More Freezer Pucks

jefftk, 28 Aug 2023 15:20 UTC
10 points
7 comments, 1 min read
(www.jefftk.com)

The mind as a polyviscous fluid

Bill Benzon, 28 Aug 2023 14:38 UTC
8 points
0 comments, 3 min read

[Question] Who can most reduce X-Risk?

sudhanshu_kasewa, 28 Aug 2023 14:38 UTC
1 point
12 comments, 1 min read

Drinks at a bar

yakimoff, 28 Aug 2023 3:13 UTC
2 points
0 comments, 1 min read

Dear Self; we need to talk about ambition

Elizabeth, 27 Aug 2023 23:10 UTC
260 points
27 comments, 8 min read, 2 reviews
(acesounderglass.com)

AI pause/governance advocacy might be net-negative, especially without a focus on explaining x-risk

Mikhail Samin, 27 Aug 2023 23:05 UTC
82 points
9 comments, 6 min read

Will issues are quite nearly skill issues

dkl9, 27 Aug 2023 16:42 UTC
1 point
1 comment, 3 min read
(dkl9.net)

Xanadu, GPT, and Beyond: An adventure of the mind

Bill Benzon, 27 Aug 2023 16:19 UTC
2 points
0 comments, 5 min read

High level overview on how to go about estimating “p(doom)” or the like

Aryeh Englander, 27 Aug 2023 16:01 UTC
16 points
0 comments, 5 min read

Trying a Wet Suit

jefftk, 27 Aug 2023 15:00 UTC
33 points
5 comments, 1 min read
(www.jefftk.com)

Apply to a small iteration of MLAB in Oxford

27 Aug 2023 14:54 UTC
2 points
0 comments, 1 min read

Apply to a small iteration of MLAB to be run in Oxford

27 Aug 2023 14:21 UTC
12 points
0 comments, 1 min read

The Game of Dominance

Karl von Wendt, 27 Aug 2023 11:04 UTC
24 points
15 comments, 6 min read

Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong

omnizoid, 27 Aug 2023 1:06 UTC
−25 points
97 comments, 36 min read

Mesa-Optimization: Explain it like I’m 10 Edition

brook, 26 Aug 2023 23:04 UTC
20 points
1 comment, 6 min read

Aumann-agreement is common

tailcalled, 26 Aug 2023 20:22 UTC
64 points
33 comments, 7 min read, 1 review

Abuja, Nigeria – ACX Meetups Everywhere Fall 2023

Olaoluwa Akinloluwa, 26 Aug 2023 17:58 UTC
1 point
0 comments, 1 min read

Calgary, Alberta, Canada – ACX Meetups Everywhere Fall 2023

David Piepgrass, 26 Aug 2023 17:58 UTC
1 point
0 comments, 1 min read

Stockholm, Sweden – ACX Meetups Everywhere Fall 2023

Jonatan Westholm, 26 Aug 2023 17:58 UTC
1 point
0 comments, 1 min read

College Station, Texas, USA – ACX Meetups Everywhere Fall 2023

frost, 26 Aug 2023 17:58 UTC
1 point
0 comments, 1 min read

Playa Del Carmen, Mexico – ACX Meetups Everywhere Fall 2023

andcut, 26 Aug 2023 17:58 UTC
1 point
0 comments, 1 min read

Philadelphia, Pennsylvania, USA – ACX Meetups Everywhere Fall 2023

Philadelphia Rationality, 26 Aug 2023 17:58 UTC
1 point
0 comments, 1 min read