Near-mode thinking on AI

Olli Järviniemi
4 Aug 2024 20:47 UTC
127 points
8 comments, 5 min read, LW link

Watermarks: Signing, Branding, and Boobytrapping

Shankar Sivarajan
4 Aug 2024 20:41 UTC
1 point
0 comments, 1 min read, LW link

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality

Wynn Walker
4 Aug 2024 18:49 UTC
6 points
0 comments, 5 min read, LW link

We’re not as 3-Dimensional as We Think

silentbob
4 Aug 2024 14:39 UTC
36 points
16 comments, 5 min read, LW link

You don’t know how bad most things are nor precisely how they’re bad.

Solenoid_Entity
4 Aug 2024 14:12 UTC
313 points
48 comments, 5 min read, LW link

Can We Predict Persuasiveness Better Than Anthropic?

Lennart Finke
4 Aug 2024 14:05 UTC
22 points
5 comments, 4 min read, LW link

[Question] What should we do about COVID in 2024?

ChristianKl
4 Aug 2024 10:57 UTC
20 points
2 comments, 1 min read, LW link

Tokenized SAEs: Infusing per-token biases.

4 Aug 2024 9:17 UTC
19 points
20 comments, 15 min read, LW link

Thoughts On Democracy

Zero Contradictions
4 Aug 2024 6:02 UTC
2 points
0 comments, 1 min read, LW link
(zerocontradictions.net)

AI Alignment through Comparative Advantage

artemiocobb
4 Aug 2024 0:32 UTC
−2 points
4 comments, 3 min read, LW link

Labelling, Variables, and In-Context Learning in Llama2

Joshua Penman
3 Aug 2024 19:36 UTC
6 points
0 comments, 1 min read, LW link
(colab.research.google.com)

[Question] Dan Hendrycks and EA

jeffreycaruso
3 Aug 2024 13:33 UTC
−4 points
4 comments, 1 min read, LW link

[Question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?

Dalcy
3 Aug 2024 12:39 UTC
27 points
1 comment, 1 min read, LW link

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.

Jessica Rumbelow
3 Aug 2024 12:07 UTC
40 points
2 comments, 4 min read, LW link

Cooperation and Alignment in Delegation Games: You Need Both!

3 Aug 2024 10:16 UTC
8 points
0 comments, 14 min read, LW link
(www.oliversourbut.net)

SRE’s review of Democracy

Martin Sustrik
3 Aug 2024 7:20 UTC
48 points
2 comments, 3 min read, LW link
(250bpm.substack.com)

The Case Against Libertarianism

Zero Contradictions
3 Aug 2024 5:05 UTC
−3 points
1 comment, 1 min read, LW link
(zerocontradictions.net)

We Don’t Just Let People Die—So What Next?

James Stephen Brown
3 Aug 2024 1:04 UTC
11 points
8 comments, 10 min read, LW link

The EA case for Trump

Judd Rosenblatt
3 Aug 2024 1:00 UTC
9 points
1 comment, 1 min read, LW link
(www.secondbest.ca)

I didn’t think I’d take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!

mako yass
2 Aug 2024 22:35 UTC
24 points
2 comments, 5 min read, LW link

Evaluating Sparse Autoencoders with Board Game Models

2 Aug 2024 19:50 UTC
38 points
1 comment, 9 min read, LW link

The Bitter Lesson for AI Safety Research

2 Aug 2024 18:39 UTC
57 points
5 comments, 3 min read, LW link

Ethical Deception: Should AI Ever Lie?

Jason Reid
2 Aug 2024 17:53 UTC
5 points
2 comments, 7 min read, LW link

[Question] Request for AI risk quotes, especially around speed, large impacts and black boxes

Nathan Young
2 Aug 2024 17:49 UTC
6 points
0 comments, 1 min read, LW link

A Simple Toy Coherence Theorem

2 Aug 2024 17:47 UTC
74 points
19 comments, 7 min read, LW link

All the Following are Distinct

Gianluca Calcagni
2 Aug 2024 16:35 UTC
16 points
3 comments, 8 min read, LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith
2 Aug 2024 14:33 UTC
218 points
17 comments, 17 min read, LW link

The Residual Expansion: A Framework for thinking about Transformer Circuits

Daniel Tan
2 Aug 2024 11:04 UTC
16 points
13 comments, 3 min read, LW link

An information-theoretic study of lying in LLMs

2 Aug 2024 10:06 UTC
16 points
0 comments, 4 min read, LW link

How I Wrought a Lesser Scribing Artifact (You Can, Too!)

Lorxus
2 Aug 2024 3:35 UTC
12 points
0 comments, 5 min read, LW link

The Rise and Stagnation of Modernity

Zero Contradictions
2 Aug 2024 3:31 UTC
1 point
0 comments, 1 min read, LW link
(thewaywardaxolotl.blogspot.com)

Lessons from the FDA for AI

Remmelt
2 Aug 2024 0:52 UTC
1 point
4 comments, 1 min read, LW link
(ainowinstitute.org)

AI Rights for Human Safety

Simon Goldstein
1 Aug 2024 23:01 UTC
45 points
6 comments, 1 min read, LW link
(papers.ssrn.com)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders

Gytis Daujotas
1 Aug 2024 21:08 UTC
44 points
6 comments, 7 min read, LW link

Optimizing Repeated Correlations

SatvikBeri
1 Aug 2024 17:33 UTC
26 points
1 comment, 1 min read, LW link

The need for multi-agent experiments

Martín Soto
1 Aug 2024 17:14 UTC
43 points
3 comments, 9 min read, LW link

Dragon Agnosticism

jefftk
1 Aug 2024 17:00 UTC
99 points
73 comments, 2 min read, LW link
(www.jefftk.com)

Morristown ACX Meetup

mbrooks
1 Aug 2024 16:29 UTC
2 points
1 comment, 1 min read, LW link

Some comments on intelligence

Viliam
1 Aug 2024 15:17 UTC
30 points
5 comments, 3 min read, LW link

[Question] [Thought Experiment] Given a button to terminate all humanity, would you press it?

lorepieri
1 Aug 2024 15:10 UTC
−2 points
9 comments, 1 min read, LW link

Are unpaid UN internships a good idea?

Cipolla
1 Aug 2024 15:06 UTC
1 point
7 comments, 4 min read, LW link

AI #75: Math is Easier

Zvi
1 Aug 2024 13:40 UTC
46 points
25 comments, 72 min read, LW link
(thezvi.wordpress.com)

Temporary Cognitive Hyperparameter Alteration

Jonathan Moregård
1 Aug 2024 10:27 UTC
9 points
0 comments, 3 min read, LW link
(honestliving.substack.com)

Technology and Progress

Zero Contradictions
1 Aug 2024 4:49 UTC
1 point
0 comments, 1 min read, LW link
(thewaywardaxolotl.blogspot.com)

Do Prediction Markets Work?

Benjamin_Sturisky
1 Aug 2024 2:31 UTC
1 point
0 comments, 4 min read, LW link

2/3 Aussie & NZ AI Safety folk often or sometimes feel lonely or disconnected (and 16 other barriers to impact)

yanni kyriacos
1 Aug 2024 1:15 UTC
12 points
0 comments, 8 min read, LW link

[Question] Can UBI overcome inflation and rent seeking?

Gordon Seidoh Worley
1 Aug 2024 0:13 UTC
5 points
34 comments, 1 min read, LW link

Recommendation: reports on the search for missing hiker Bill Ewasko

eukaryote
31 Jul 2024 22:15 UTC
168 points
28 comments, 14 min read, LW link
(eukaryotewritesblog.com)

Economics101 predicted the failure of special card payments for refugees, 3 months later whole of Germany wants to adopt it

Yanling Guo
31 Jul 2024 21:09 UTC
2 points
1 comment, 2 min read, LW link

Ambiguity in Prediction Market Resolution is Still Harmful

aphyer
31 Jul 2024 20:32 UTC
43 points
17 comments, 3 min read, LW link