1. The CAST Strategy

Max Harms · 7 Jun 2024 22:29 UTC
46 points
19 comments · 38 min read · LW link

0. CAST: Corrigibility as Singular Target

Max Harms · 7 Jun 2024 22:29 UTC
144 points
12 comments · 8 min read · LW link

What is space? What is time?

Tahp · 7 Jun 2024 22:15 UTC
8 points
3 comments · 7 min read · LW link

[Question] Question about Lewis’ counterfactual theory of causation

jbkjr · 7 Jun 2024 20:15 UTC
12 points
7 comments · 1 min read · LW link

Relationships among words, metalingual definition, and interpretability

Bill Benzon · 7 Jun 2024 19:18 UTC
2 points
0 comments · 5 min read · LW link

Let’s Talk About Emergence

jacobhaimes · 7 Jun 2024 19:18 UTC
4 points
0 comments · 7 min read · LW link
(www.odysseaninstitute.org)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues

aphyer · 7 Jun 2024 19:02 UTC
42 points
16 comments · 3 min read · LW link

Natural Latents Are Not Robust To Tiny Mixtures

7 Jun 2024 18:53 UTC
61 points
8 comments · 5 min read · LW link

Situational Awareness Summarized—Part 2

Joe Rogero · 7 Jun 2024 17:20 UTC
10 points
2 comments · 4 min read · LW link

Frida van Lisa, a short story about adversarial AI attacks on humans

arisAlexis · 7 Jun 2024 13:22 UTC
2 points
0 comments · 18 min read · LW link

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper

Zvi · 7 Jun 2024 11:40 UTC
91 points
10 comments · 37 min read · LW link
(thezvi.wordpress.com)

LessWrong/ACX meetup Transilvanya tour—Cluj Napoca

Marius Adrian Nicoară · 7 Jun 2024 5:45 UTC
1 point
1 comment · 1 min read · LW link

Is Claude a mystic?

jessicata · 7 Jun 2024 4:27 UTC
60 points
23 comments · 13 min read · LW link
(unstablerontology.substack.com)

Offering Completion

jefftk · 7 Jun 2024 1:40 UTC
29 points
6 comments · 1 min read · LW link
(www.jefftk.com)

A Case for Superhuman Governance, using AI

ozziegooen · 7 Jun 2024 0:10 UTC
30 points
0 comments · 1 min read · LW link

Memorizing weak examples can elicit strong behavior out of password-locked models

6 Jun 2024 23:54 UTC
58 points
5 comments · 7 min read · LW link

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger · 6 Jun 2024 22:57 UTC
194 points
27 comments · 3 min read · LW link

Scaling and evaluating sparse autoencoders

leogao · 6 Jun 2024 22:50 UTC
106 points
6 comments · 1 min read · LW link

Humming is not a free $100 bill

Elizabeth · 6 Jun 2024 20:10 UTC
183 points
6 comments · 3 min read · LW link
(acesounderglass.com)

There Are No Primordial Definitions of Man/Woman

ymeskhout · 6 Jun 2024 19:30 UTC
11 points
0 comments · 4 min read · LW link
(ymeskhout.substack.com)

Situational Awareness Summarized—Part 1

Joe Rogero · 6 Jun 2024 18:59 UTC
20 points
0 comments · 5 min read · LW link

[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”

David Scott Krueger (formerly: capybaralet) · 6 Jun 2024 18:55 UTC
70 points
2 comments · 6 min read · LW link
(llm-safety-challenges.github.io)

AI #67: Brief Strange Trip

Zvi · 6 Jun 2024 18:50 UTC
49 points
6 comments · 40 min read · LW link
(thezvi.wordpress.com)

The Human Biological Advantage Over AI

Wstewart · 6 Jun 2024 18:18 UTC
−13 points
2 comments · 1 min read · LW link

An evaluation of Helen Toner’s interview on the TED AI Show

PeterH · 6 Jun 2024 17:39 UTC
24 points
2 comments · 30 min read · LW link

The Impossibility of a Rational Intelligence Optimizer

Nicolas Villarreal · 6 Jun 2024 16:14 UTC
−9 points
5 comments · 14 min read · LW link

Immunization against harmful fine-tuning attacks

6 Jun 2024 15:17 UTC
4 points
0 comments · 12 min read · LW link

SB 1047 Is Weakened

Zvi · 6 Jun 2024 13:40 UTC
67 points
4 comments · 9 min read · LW link
(thezvi.wordpress.com)

Weeping Agents

pleiotroth · 6 Jun 2024 12:18 UTC
24 points
2 comments · 3 min read · LW link

Podcast: Center for AI Policy, on AI risk and listening to AI researchers

KatjaGrace · 6 Jun 2024 3:30 UTC
9 points
0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Calculating Natural Latents via Resampling

6 Jun 2024 0:37 UTC
55 points
4 comments · 10 min read · LW link

SAEs Discover Meaningful Features in the IOI Task

5 Jun 2024 23:48 UTC
15 points
2 comments · 10 min read · LW link

Let’s Design A School, Part 2.4 School as Education—The Curriculum (Phase 3, Specific)

Sable · 5 Jun 2024 21:40 UTC
19 points
2 comments · 12 min read · LW link
(affablyevil.substack.com)

METR is hiring ML Research Engineers and Scientists

Xodarap · 5 Jun 2024 21:27 UTC
5 points
0 comments · 1 min read · LW link
(metr.org)

Book review: The Quincunx

cousin_it · 5 Jun 2024 21:13 UTC
41 points
12 comments · 2 min read · LW link

[Question] How should I think about my career?

Chico · 5 Jun 2024 18:11 UTC
3 points
2 comments · 1 min read · LW link

AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks

5 Jun 2024 17:45 UTC
9 points
0 comments · 5 min read · LW link
(newsletter.safe.ai)

GPT2, Five Years On

Joel Burget · 5 Jun 2024 17:44 UTC
34 points
0 comments · 3 min read · LW link
(importai.substack.com)

[Question] Who wants to be invited to the LW Metamodern dialogue?

hunterglenn · 5 Jun 2024 16:39 UTC
−3 points
1 comment · 1 min read · LW link

Nonreactivity: a simple model of meditation

cesiumquail · 5 Jun 2024 16:26 UTC
21 points
4 comments · 6 min read · LW link

graphpatch: a Python Library for Activation Patching

Occam's Laser · 5 Jun 2024 15:08 UTC
13 points
2 comments · 1 min read · LW link

Startup Stock Options: the Shortest Complete Guide for Employees

Boris T · 5 Jun 2024 15:03 UTC
17 points
2 comments · 1 min read · LW link
(borisagain.substack.com)

Aggregative Principles of Social Justice

Cleo Nardo · 5 Jun 2024 13:44 UTC
29 points
10 comments · 37 min read · LW link

What and how much makes a difference?

Marius Adrian Nicoară · 5 Jun 2024 10:30 UTC
7 points
0 comments · 2 min read · LW link

Announcing ILIAD — Theoretical AI Alignment Conference

5 Jun 2024 9:37 UTC
162 points
18 comments · 2 min read · LW link

Second-Order Rationality, System Rationality, and a feature suggestion for LessWrong

Mati_Roy · 5 Jun 2024 7:20 UTC
13 points
2 comments · 8 min read · LW link

Former OpenAI Superalignment Researcher: Superintelligence by 2030

Julian Bradshaw · 5 Jun 2024 3:35 UTC
69 points
30 comments · 1 min read · LW link
(situational-awareness.ai)

On “first critical tries” in AI alignment

Joe Carlsmith · 5 Jun 2024 0:19 UTC
54 points
8 comments · 14 min read · LW link

Takeoff speeds presentation at Anthropic

Tom Davidson · 4 Jun 2024 22:46 UTC
92 points
0 comments · 25 min read · LW link

A Reflection on Richard Hamming’s “You and Your Research”: Striving for Greatness

aysajan · 4 Jun 2024 20:07 UTC
8 points
5 comments · 21 min read · LW link
(www.aysajaneziz.com)