Labelling, Variables, and In-Context Learning in Llama2

Joshua Penman · 3 Aug 2024 19:36 UTC
6 points
0 comments · 1 min read · LW link
(colab.research.google.com)

[Question] Dan Hendrycks and EA

jeffreycaruso · 3 Aug 2024 13:33 UTC
−4 points
4 comments · 1 min read · LW link

[Question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?

Dalcy · 3 Aug 2024 12:39 UTC
27 points
1 comment · 1 min read · LW link

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.

Jessica Rumbelow · 3 Aug 2024 12:07 UTC
40 points
2 comments · 4 min read · LW link

Cooperation and Alignment in Delegation Games: You Need Both!

3 Aug 2024 10:16 UTC
8 points
0 comments · 14 min read · LW link
(www.oliversourbut.net)

SRE’s review of Democracy

Martin Sustrik · 3 Aug 2024 7:20 UTC
48 points
2 comments · 3 min read · LW link
(250bpm.substack.com)

The Case Against Libertarianism

Zero Contradictions · 3 Aug 2024 5:05 UTC
−3 points
1 comment · 1 min read · LW link
(zerocontradictions.net)

We Don’t Just Let People Die—So What Next?

James Stephen Brown · 3 Aug 2024 1:04 UTC
11 points
8 comments · 10 min read · LW link

The EA case for Trump

Judd Rosenblatt · 3 Aug 2024 1:00 UTC
9 points
1 comment · 1 min read · LW link
(www.secondbest.ca)

I didn’t think I’d take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!

mako yass · 2 Aug 2024 22:35 UTC
24 points
2 comments · 5 min read · LW link

Evaluating Sparse Autoencoders with Board Game Models

2 Aug 2024 19:50 UTC
38 points
1 comment · 9 min read · LW link

The Bitter Lesson for AI Safety Research

2 Aug 2024 18:39 UTC
57 points
5 comments · 3 min read · LW link

Ethical Deception: Should AI Ever Lie?

Jason Reid · 2 Aug 2024 17:53 UTC
5 points
2 comments · 7 min read · LW link

[Question] Request for AI risk quotes, especially around speed, large impacts and black boxes

Nathan Young · 2 Aug 2024 17:49 UTC
6 points
0 comments · 1 min read · LW link

A Simple Toy Coherence Theorem

2 Aug 2024 17:47 UTC
74 points
19 comments · 7 min read · LW link

All the Following are Distinct

Gianluca Calcagni · 2 Aug 2024 16:35 UTC
16 points
3 comments · 8 min read · LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith · 2 Aug 2024 14:33 UTC
218 points
17 comments · 17 min read · LW link

The Residual Expansion: A Framework for thinking about Transformer Circuits

Daniel Tan · 2 Aug 2024 11:04 UTC
16 points
13 comments · 3 min read · LW link

An information-theoretic study of lying in LLMs

2 Aug 2024 10:06 UTC
16 points
0 comments · 4 min read · LW link

How I Wrought a Lesser Scribing Artifact (You Can, Too!)

Lorxus · 2 Aug 2024 3:35 UTC
12 points
0 comments · 5 min read · LW link

The Rise and Stagnation of Modernity

Zero Contradictions · 2 Aug 2024 3:31 UTC
1 point
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Lessons from the FDA for AI

Remmelt · 2 Aug 2024 0:52 UTC
1 point
4 comments · 1 min read · LW link
(ainowinstitute.org)

AI Rights for Human Safety

Simon Goldstein · 1 Aug 2024 23:01 UTC
45 points
6 comments · 1 min read · LW link
(papers.ssrn.com)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders

Gytis Daujotas · 1 Aug 2024 21:08 UTC
44 points
6 comments · 7 min read · LW link

Optimizing Repeated Correlations

SatvikBeri · 1 Aug 2024 17:33 UTC
26 points
1 comment · 1 min read · LW link

The need for multi-agent experiments

Martín Soto · 1 Aug 2024 17:14 UTC
43 points
3 comments · 9 min read · LW link

Dragon Agnosticism

jefftk · 1 Aug 2024 17:00 UTC
99 points
73 comments · 2 min read · LW link
(www.jefftk.com)

Morristown ACX Meetup

mbrooks · 1 Aug 2024 16:29 UTC
2 points
1 comment · 1 min read · LW link

Some comments on intelligence

Viliam · 1 Aug 2024 15:17 UTC
30 points
5 comments · 3 min read · LW link

[Question] [Thought Experiment] Given a button to terminate all humanity, would you press it?

lorepieri · 1 Aug 2024 15:10 UTC
−2 points
9 comments · 1 min read · LW link

Are unpaid UN internships a good idea?

Cipolla · 1 Aug 2024 15:06 UTC
1 point
7 comments · 4 min read · LW link

AI #75: Math is Easier

Zvi · 1 Aug 2024 13:40 UTC
46 points
25 comments · 72 min read · LW link
(thezvi.wordpress.com)

Temporary Cognitive Hyperparameter Alteration

Jonathan Moregård · 1 Aug 2024 10:27 UTC
9 points
0 comments · 3 min read · LW link
(honestliving.substack.com)

Technology and Progress

Zero Contradictions · 1 Aug 2024 4:49 UTC
1 point
0 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Do Prediction Markets Work?

Benjamin_Sturisky · 1 Aug 2024 2:31 UTC
1 point
0 comments · 4 min read · LW link

2/3 Aussie & NZ AI Safety folk often or sometimes feel lonely or disconnected (and 16 other barriers to impact)

yanni kyriacos · 1 Aug 2024 1:15 UTC
12 points
0 comments · 8 min read · LW link

[Question] Can UBI overcome inflation and rent seeking?

Gordon Seidoh Worley · 1 Aug 2024 0:13 UTC
5 points
34 comments · 1 min read · LW link

Recommendation: reports on the search for missing hiker Bill Ewasko

eukaryote · 31 Jul 2024 22:15 UTC
168 points
28 comments · 14 min read · LW link
(eukaryotewritesblog.com)

Economics101 predicted the failure of special card payments for refugees, 3 months later whole of Germany wants to adopt it

Yanling Guo · 31 Jul 2024 21:09 UTC
2 points
1 comment · 2 min read · LW link

Ambiguity in Prediction Market Resolution is Still Harmful

aphyer · 31 Jul 2024 20:32 UTC
43 points
17 comments · 3 min read · LW link

AI labs can boost external safety research

Zach Stein-Perlman · 31 Jul 2024 19:30 UTC
31 points
1 comment · 1 min read · LW link

Women in AI Safety London Meetup

njg · 31 Jul 2024 18:13 UTC
1 point
0 comments · 1 min read · LW link

Constructing Neural Network Parameters with Downstream Trainability

ch271828n · 31 Jul 2024 18:13 UTC
1 point
0 comments · 1 min read · LW link
(github.com)

Want to work on US emerging tech policy? Consider the Horizon Fellowship.

Elika · 31 Jul 2024 18:12 UTC
4 points
0 comments · 1 min read · LW link

[Question] What are your cruxes for imprecise probabilities / decision rules?

Anthony DiGiovanni · 31 Jul 2024 15:42 UTC
36 points
29 comments · 1 min read · LW link

The new UK government’s stance on AI safety

Elliot Mckernon · 31 Jul 2024 15:23 UTC
17 points
0 comments · 4 min read · LW link

Solutions to problems with Bayesianism

B Jacobs · 31 Jul 2024 14:18 UTC
6 points
0 comments · 21 min read · LW link
(bobjacobs.substack.com)

Cat Sustenance Fortification

jefftk · 31 Jul 2024 2:30 UTC
14 points
7 comments · 1 min read · LW link
(www.jefftk.com)

Twitter thread on open-source AI

Richard_Ngo · 31 Jul 2024 0:26 UTC
33 points
6 comments · 2 min read · LW link
(x.com)

Twitter thread on AI takeover scenarios

Richard_Ngo · 31 Jul 2024 0:24 UTC
37 points
0 comments · 2 min read · LW link
(x.com)