D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

aphyer · 17 Jun 2024 21:29 UTC
51 points
11 comments · 6 min read

Questionable Narratives of “Situational Awareness”

fergusq · 17 Jun 2024 21:01 UTC
0 points
1 comment · 1 min read
(forum.effectivealtruism.org)

ZuVillage Georgia – Mission Statement

Burns · 17 Jun 2024 19:53 UTC
3 points
3 comments · 9 min read

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt · 17 Jun 2024 18:44 UTC
262 points
50 comments · 13 min read

Sycophancy to subterfuge: Investigating reward tampering in large language models

17 Jun 2024 18:41 UTC
161 points
22 comments · 8 min read
(arxiv.org)

Labor Participation is a High-Priority AI Alignment Risk

alex · 17 Jun 2024 18:09 UTC
4 points
0 comments · 17 min read

Towards a Less Bullshit Model of Semantics

17 Jun 2024 15:51 UTC
94 points
44 comments · 21 min read

Analysing Adversarial Attacks with Linear Probing

17 Jun 2024 14:16 UTC
9 points
0 comments · 8 min read

What’s the future of AI hardware?

Itay Dreyfus · 17 Jun 2024 13:05 UTC
2 points
0 comments · 8 min read
(productidentity.co)

OpenAI #8: The Right to Warn

Zvi · 17 Jun 2024 12:00 UTC
97 points
8 comments · 34 min read
(thezvi.wordpress.com)

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability

ntt123 · 17 Jun 2024 11:46 UTC
5 points
4 comments · 6 min read
(neuralblog.github.io)

Weak AGIs Kill Us First

yrimon · 17 Jun 2024 11:13 UTC
15 points
4 comments · 9 min read

[Linkpost] Guardian article covering Lightcone Infrastructure, Manifest and CFAR ties to FTX

ROM · 17 Jun 2024 10:05 UTC
8 points
9 comments · 1 min read
(www.theguardian.com)

Fat Tails Discourage Compromise

niplav · 17 Jun 2024 9:39 UTC
53 points
5 comments · 1 min read

Our Intuitions About The Criminal Justice System Are Screwed Up

omnizoid · 17 Jun 2024 6:22 UTC
14 points
14 comments · 4 min read

A Case for Cooperation: Dependence in the Prisoner’s Dilemma

grantstenger · 17 Jun 2024 1:10 UTC
9 points
2 comments · 23 min read

Degeneracies are sticky for SGD

16 Jun 2024 21:19 UTC
56 points
1 comment · 16 min read

YM’s Shortform

YM · 16 Jun 2024 20:57 UTC
3 points
1 comment · 1 min read

“Is-Ought” is Fraught

MiSteR Kittty · 16 Jun 2024 17:27 UTC
−5 points
2 comments · 1 min read

The type of AI humanity has chosen to create so far is unsafe, for soft social reasons and not technical ones.

l8c · 16 Jun 2024 13:31 UTC
−6 points
2 comments · 1 min read

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Henry Cai · 16 Jun 2024 13:01 UTC
7 points
0 comments · 7 min read
(arxiv.org)

CIV: a story

Richard_Ngo · 15 Jun 2024 22:36 UTC
98 points
6 comments · 9 min read
(www.narrativeark.xyz)

Yann LeCun: We only design machines that minimize costs [therefore they are safe]

tailcalled · 15 Jun 2024 17:25 UTC
19 points
8 comments · 1 min read
(twitter.com)

(Appetitive, Consummatory) ≈ (RL, reflex)

Steven Byrnes · 15 Jun 2024 15:57 UTC
38 points
1 comment · 3 min read

Two LessWrong speed friending experiments

15 Jun 2024 10:52 UTC
52 points
3 comments · 4 min read

Claude’s dark spiritual AI futurism

jessicata · 15 Jun 2024 0:57 UTC
22 points
7 comments · 43 min read
(unstableontology.com)

[Question] When is “unfalsifiable implies false” incorrect?

VojtaKovarik · 15 Jun 2024 0:28 UTC
3 points
11 comments · 1 min read

MIRI’s June 2024 Newsletter

Harlan · 14 Jun 2024 23:02 UTC
74 points
20 comments · 2 min read
(intelligence.org)

Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis

Giulio · 14 Jun 2024 19:35 UTC
6 points
0 comments · 8 min read
(www.giuliostarace.com)

Shard Theory—is it true for humans?

Rishika · 14 Jun 2024 19:21 UTC
71 points
7 comments · 15 min read

When fine-tuning fails to elicit GPT-3.5’s chess abilities

Theodore Chapman · 14 Jun 2024 18:50 UTC
42 points
3 comments · 9 min read

Results from the AI x Democracy Research Sprint

14 Jun 2024 16:40 UTC
13 points
0 comments · 6 min read

Rational Animations’ intro to mechanistic interpretability

Writer · 14 Jun 2024 16:10 UTC
45 points
1 comment · 11 min read
(youtu.be)

Why keep a diary, and why wish for large language models

DanielFilan · 14 Jun 2024 16:10 UTC
9 points
1 comment · 2 min read
(danielfilan.com)

The Leopold Model: Analysis and Reactions

Zvi · 14 Jun 2024 15:10 UTC
108 points
19 comments · 57 min read
(thezvi.wordpress.com)

[Question] Thoughts on Francois Chollet’s belief that LLMs are far away from AGI?

O O · 14 Jun 2024 6:32 UTC
26 points
17 comments · 1 min read

Research Report: Alternative sparsity methods for sparse autoencoders with OthelloGPT.

Andrew Quaisley · 14 Jun 2024 0:57 UTC
17 points
5 comments · 12 min read

Slowed ASI—a possible technical strategy for alignment

Lester Leong · 14 Jun 2024 0:57 UTC
5 points
2 comments · 3 min read

Conceptual Typography “spells it out”

milanrosko · 14 Jun 2024 0:39 UTC
15 points
0 comments · 1 min read

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Andrew_Critch · 14 Jun 2024 0:16 UTC
347 points
38 comments · 4 min read

OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors

Joel Burget · 13 Jun 2024 21:28 UTC
35 points
10 comments · 1 min read
(openai.com)

AI #68: Remarkably Reasonable Reactions

Zvi · 13 Jun 2024 16:30 UTC
46 points
11 comments · 50 min read
(thezvi.wordpress.com)

Four Futures For Cognitive Labor

Maxwell Tabarrok · 13 Jun 2024 12:56 UTC
14 points
10 comments · 4 min read
(www.maximum-progress.com)

Underrated Proverbs

Arjun Panickssery · 13 Jun 2024 12:30 UTC
10 points
9 comments · 1 min read
(arjunpanickssery.substack.com)

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

13 Jun 2024 10:04 UTC
84 points
10 comments · 2 min read
(arxiv.org)

Probably Not a Ghost Story

George Ingebretsen · 12 Jun 2024 22:55 UTC
27 points
4 comments · 3 min read

AiPhone

Zvi · 12 Jun 2024 22:20 UTC
63 points
4 comments · 14 min read
(thezvi.wordpress.com)

microwave drilling is impractical

bhauth · 12 Jun 2024 22:16 UTC
58 points
15 comments · 4 min read
(www.bhauth.com)

Phonosemantic Duplication

bitcoinssg · 12 Jun 2024 20:19 UTC
5 points
0 comments · 1 min read

My AI Model Delta Compared To Christiano

johnswentworth · 12 Jun 2024 18:19 UTC
191 points
73 comments · 4 min read