Soviet comedy film recommendations

Nina Panickssery · 9 Jun 2024 23:40 UTC
42 points
11 comments · 2 min read · LW link
(open.substack.com)

The Data Wall is Important

JustisMills · 9 Jun 2024 22:54 UTC
40 points
20 comments · 2 min read · LW link
(justismills.substack.com)

Two Family Dance Flyers

jefftk · 9 Jun 2024 20:50 UTC
13 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] What happens to existing life sentences under LEV?

O O · 9 Jun 2024 17:49 UTC
5 points
7 comments · 1 min read · LW link

3b. Formal (Faux) Corrigibility

Max Harms · 9 Jun 2024 17:18 UTC
21 points
13 comments · 17 min read · LW link

3a. Towards Formal Corrigibility

Max Harms · 9 Jun 2024 16:53 UTC
22 points
2 comments · 19 min read · LW link

Introducing SARA: a new activation steering technique

Alejandro Tlaie · 9 Jun 2024 15:33 UTC
17 points
7 comments · 6 min read · LW link

“What the hell is a representation, anyway?” | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents

IwanWilliams · 9 Jun 2024 14:19 UTC
9 points
1 comment · 4 min read · LW link

Exploring Llama-3-8B MLP Neurons

ntt123 · 9 Jun 2024 14:19 UTC
10 points
0 comments · 4 min read · LW link
(neuralblog.github.io)

Demystifying “Alignment” through a Comic

milanrosko · 9 Jun 2024 8:24 UTC
106 points
19 comments · 1 min read · LW link

Dumbing down

Martin Sustrik · 9 Jun 2024 6:50 UTC
70 points
0 comments · 4 min read · LW link

What if a tech company forced you to move to NYC?

KatjaGrace · 9 Jun 2024 6:30 UTC
56 points
22 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

[Question] What should I do? (long term plan about starting an AI lab)

not_a_cat · 9 Jun 2024 0:45 UTC
2 points
1 comment · 2 min read · LW link

Searching for the Root of the Tree of Evil

Ivan Vendrov · 8 Jun 2024 17:05 UTC
36 points
14 comments · 5 min read · LW link
(nothinghuman.substack.com)

2. Corrigibility Intuition

Max Harms · 8 Jun 2024 15:52 UTC
65 points
10 comments · 33 min read · LW link

Two easy things that maybe Just Work to improve AI discourse

jacobjacob · 8 Jun 2024 15:51 UTC
190 points
35 comments · 2 min read · LW link

I made an AI safety fellowship. What I wish I knew.

Ruben Castaing · 8 Jun 2024 15:23 UTC
12 points
0 comments · 2 min read · LW link

Alignment Gaps

kcyras · 8 Jun 2024 15:23 UTC
11 points
4 comments · 8 min read · LW link

The Slack Double Crux, or how to negotiate with yourself

Thac0 · 8 Jun 2024 15:22 UTC
6 points
2 comments · 4 min read · LW link

The Perils of Popularity: A Critical Examination of LessWrong’s Rational Discourse

BubbaJoeLouis · 8 Jun 2024 15:22 UTC
−24 points
3 comments · 2 min read · LW link

Status quo bias is usually justified

Amadeus Pagel · 8 Jun 2024 14:54 UTC
10 points
3 comments · 1 min read · LW link
(amadeuspagel.substack.com)

Closed-Source Evaluations

Jono · 8 Jun 2024 14:18 UTC
15 points
4 comments · 1 min read · LW link

Access to powerful AI might make computer security radically easier

Buck · 8 Jun 2024 6:00 UTC
97 points
14 comments · 6 min read · LW link

[Question] Why don’t we just get rid of all the bioethicists?

Sable · 8 Jun 2024 3:48 UTC
13 points
0 comments · 1 min read · LW link

Sev, Sevteen, Sevty, Sevth

jefftk · 8 Jun 2024 2:30 UTC
17 points
9 comments · 1 min read · LW link
(www.jefftk.com)

1. The CAST Strategy

Max Harms · 7 Jun 2024 22:29 UTC
46 points
19 comments · 38 min read · LW link

0. CAST: Corrigibility as Singular Target

Max Harms · 7 Jun 2024 22:29 UTC
144 points
12 comments · 8 min read · LW link

What is space? What is time?

Tahp · 7 Jun 2024 22:15 UTC
8 points
3 comments · 7 min read · LW link

[Question] Question about Lewis’ counterfactual theory of causation

jbkjr · 7 Jun 2024 20:15 UTC
12 points
7 comments · 1 min read · LW link

Relationships among words, metalingual definition, and interpretability

Bill Benzon · 7 Jun 2024 19:18 UTC
2 points
0 comments · 5 min read · LW link

Let’s Talk About Emergence

jacobhaimes · 7 Jun 2024 19:18 UTC
4 points
0 comments · 7 min read · LW link
(www.odysseaninstitute.org)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues

aphyer · 7 Jun 2024 19:02 UTC
42 points
16 comments · 3 min read · LW link

Natural Latents Are Not Robust To Tiny Mixtures

7 Jun 2024 18:53 UTC
61 points
8 comments · 5 min read · LW link

Situational Awareness Summarized - Part 2

Joe Rogero · 7 Jun 2024 17:20 UTC
11 points
2 comments · 4 min read · LW link

Frida van Lisa, a short story about adversarial AI attacks on humans

arisAlexis · 7 Jun 2024 13:22 UTC
2 points
0 comments · 18 min read · LW link

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper

Zvi · 7 Jun 2024 11:40 UTC
91 points
10 comments · 37 min read · LW link
(thezvi.wordpress.com)

LessWrong/ACX meetup Transilvanya tour - Cluj Napoca

Marius Adrian Nicoară · 7 Jun 2024 5:45 UTC
1 point
1 comment · 1 min read · LW link

Is Claude a mystic?

jessicata · 7 Jun 2024 4:27 UTC
60 points
23 comments · 13 min read · LW link
(unstablerontology.substack.com)

Offering Completion

jefftk · 7 Jun 2024 1:40 UTC
29 points
6 comments · 1 min read · LW link
(www.jefftk.com)

A Case for Superhuman Governance, using AI

ozziegooen · 7 Jun 2024 0:10 UTC
30 points
0 comments · 1 min read · LW link

Memorizing weak examples can elicit strong behavior out of password-locked models

6 Jun 2024 23:54 UTC
58 points
5 comments · 7 min read · LW link

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger · 6 Jun 2024 22:57 UTC
194 points
27 comments · 3 min read · LW link

Scaling and evaluating sparse autoencoders

leogao · 6 Jun 2024 22:50 UTC
106 points
6 comments · 1 min read · LW link

Humming is not a free $100 bill

Elizabeth · 6 Jun 2024 20:10 UTC
183 points
6 comments · 3 min read · LW link
(acesounderglass.com)

There Are No Primordial Definitions of Man/Woman

ymeskhout · 6 Jun 2024 19:30 UTC
11 points
0 comments · 4 min read · LW link
(ymeskhout.substack.com)

Situational Awareness Summarized - Part 1

Joe Rogero · 6 Jun 2024 18:59 UTC
20 points
0 comments · 5 min read · LW link

[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”

David Scott Krueger (formerly: capybaralet) · 6 Jun 2024 18:55 UTC
70 points
2 comments · 6 min read · LW link
(llm-safety-challenges.github.io)

AI #67: Brief Strange Trip

Zvi · 6 Jun 2024 18:50 UTC
49 points
6 comments · 40 min read · LW link
(thezvi.wordpress.com)

The Human Biological Advantage Over AI

Wstewart · 6 Jun 2024 18:18 UTC
−13 points
2 comments · 1 min read · LW link

An evaluation of Helen Toner’s interview on the TED AI Show

PeterH · 6 Jun 2024 17:39 UTC
24 points
2 comments · 30 min read · LW link