[New Feature] Your Subscribed Feed

11 Jun 2024 22:45 UTC
69 points
9 comments · 4 min read

Open Thread Summer 2024

habryka · 11 Jun 2024 20:57 UTC
22 points
99 comments · 1 min read

Can efficiency-adjustable reporting thresholds close a loophole in Biden’s executive order on AI?

Jemal Young · 11 Jun 2024 20:56 UTC
4 points
1 comment · 2 min read

“Full Automation” is a Slippery Metric

ozziegooen · 11 Jun 2024 19:56 UTC
30 points
1 comment · 1 min read

AI takeoff and nuclear war

owencb · 11 Jun 2024 19:36 UTC
78 points
6 comments · 11 min read
(strangecities.substack.com)

[Question] What do people think about the polymarket Eth Etf resolution?

edge_retainer · 11 Jun 2024 18:34 UTC
1 point
0 comments · 1 min read

Let’s Design A School, Part 3.1: Bringing it all together with the Sieve Model

Sable · 11 Jun 2024 17:03 UTC
13 points
2 comments · 7 min read
(affablyevil.substack.com)

How to eliminate cut?

jessicata · 11 Jun 2024 15:54 UTC
21 points
0 comments · 14 min read
(unstableontology.com)

my favourite Scott Sumner blog posts

DMMF · 11 Jun 2024 14:40 UTC
26 points
0 comments · 3 min read
(danfrank.ca)

[Question] Is anyone developing optimisation-robust interpretability methods?

Jono · 11 Jun 2024 13:14 UTC
6 points
0 comments · 1 min read

Keep the Grass Guessing

JackOfAllTrades · 11 Jun 2024 7:29 UTC
4 points
0 comments · 2 min read

“Metastrategic Brainstorming”, a core building-block skill

Raemon · 11 Jun 2024 4:27 UTC
59 points
5 comments · 6 min read

AI Debate Stability: Addressing Self-Defeating Responses

Annie Sorkin · 11 Jun 2024 3:03 UTC
9 points
0 comments · 3 min read

Corrigibility could make things worse

ThomasCederborg · 11 Jun 2024 0:55 UTC
9 points
6 comments · 6 min read

Emotional issues often have an immediate payoff

Chipmonk · 10 Jun 2024 23:39 UTC
26 points
2 comments · 4 min read
(chrislakin.blog)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking

tailcalled · 10 Jun 2024 21:20 UTC
29 points
13 comments · 2 min read

Plop! Goes the Concept

Jonathan Moregård · 10 Jun 2024 19:23 UTC
6 points
0 comments · 8 min read
(honestliving.substack.com)

What can we learn from orcas?

Jonasb · 10 Jun 2024 18:01 UTC
1 point
0 comments · 8 min read
(www.denominations.io)

How to build a data center, by Construction Physics

TheManxLoiner · 10 Jun 2024 17:38 UTC
2 points
0 comments · 1 min read
(www.construction-physics.com)

Observations for doing debate with models behind APIs

PoD123 · 10 Jun 2024 16:22 UTC
3 points
0 comments · 3 min read

My AI Model Delta Compared To Yudkowsky

johnswentworth · 10 Jun 2024 16:12 UTC
277 points
102 comments · 4 min read

[Question] Good ways to monetarily profit from the increasing demand for power?

Matt Goldenberg · 10 Jun 2024 15:29 UTC
12 points
5 comments · 1 min read

The Evolution towards the Blank Slate

Arturo Macias · 10 Jun 2024 15:20 UTC
−7 points
0 comments · 3 min read

10 Public “I was wrong” Admissions by Scientists and Intellectuals

Hashem ElAssad · 10 Jun 2024 14:19 UTC
0 points
3 comments · 1 min read

[Valence series] 4. Valence & Liking / Admiring

Steven Byrnes · 10 Jun 2024 14:19 UTC
46 points
12 comments · 14 min read

5. Open Corrigibility Questions

Max Harms · 10 Jun 2024 14:09 UTC
29 points
0 comments · 7 min read

4. Existing Writing on Corrigibility

Max Harms · 10 Jun 2024 14:08 UTC
47 points
15 comments · 106 min read

On Dwarkesh’s Podcast with Leopold Aschenbrenner

Zvi · 10 Jun 2024 12:40 UTC
101 points
7 comments · 59 min read
(thezvi.wordpress.com)

Summary of Situational Awareness—The Decade Ahead

Oscar · 10 Jun 2024 8:44 UTC
6 points
2 comments · 1 min read
(forum.effectivealtruism.org)

Why I don’t believe in the placebo effect

transhumanist_atom_understander · 10 Jun 2024 2:37 UTC
131 points
22 comments · 9 min read

Soviet comedy film recommendations

Nina Panickssery · 9 Jun 2024 23:40 UTC
42 points
11 comments · 2 min read
(open.substack.com)

The Data Wall is Important

JustisMills · 9 Jun 2024 22:54 UTC
40 points
20 comments · 2 min read
(justismills.substack.com)

Two Family Dance Flyers

jefftk · 9 Jun 2024 20:50 UTC
13 points
0 comments · 1 min read
(www.jefftk.com)

[Question] What happens to existing life sentences under LEV?

O O · 9 Jun 2024 17:49 UTC
5 points
7 comments · 1 min read

3b. Formal (Faux) Corrigibility

Max Harms · 9 Jun 2024 17:18 UTC
21 points
13 comments · 17 min read

3a. Towards Formal Corrigibility

Max Harms · 9 Jun 2024 16:53 UTC
22 points
2 comments · 19 min read

Introducing SARA: a new activation steering technique

Alejandro Tlaie · 9 Jun 2024 15:33 UTC
17 points
7 comments · 6 min read

“What the hell is a representation, anyway?” | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents

IwanWilliams · 9 Jun 2024 14:19 UTC
9 points
1 comment · 4 min read

Exploring Llama-3-8B MLP Neurons

ntt123 · 9 Jun 2024 14:19 UTC
10 points
0 comments · 4 min read
(neuralblog.github.io)

Demystifying “Alignment” through a Comic

milanrosko · 9 Jun 2024 8:24 UTC
106 points
19 comments · 1 min read

Dumbing down

Martin Sustrik · 9 Jun 2024 6:50 UTC
70 points
0 comments · 4 min read

What if a tech company forced you to move to NYC?

KatjaGrace · 9 Jun 2024 6:30 UTC
56 points
22 comments · 1 min read
(worldspiritsockpuppet.com)

[Question] What should I do? (long term plan about starting an AI lab)

not_a_cat · 9 Jun 2024 0:45 UTC
2 points
1 comment · 2 min read

Searching for the Root of the Tree of Evil

Ivan Vendrov · 8 Jun 2024 17:05 UTC
36 points
14 comments · 5 min read
(nothinghuman.substack.com)

2. Corrigibility Intuition

Max Harms · 8 Jun 2024 15:52 UTC
65 points
10 comments · 33 min read

Two easy things that maybe Just Work to improve AI discourse

jacobjacob · 8 Jun 2024 15:51 UTC
190 points
35 comments · 2 min read

I made an AI safety fellowship. What I wish I knew.

Ruben Castaing · 8 Jun 2024 15:23 UTC
12 points
0 comments · 2 min read

Alignment Gaps

kcyras · 8 Jun 2024 15:23 UTC
10 points
3 comments · 8 min read

The Slack Double Crux, or how to negotiate with yourself

Thac0 · 8 Jun 2024 15:22 UTC
6 points
2 comments · 4 min read

The Perils of Popularity: A Critical Examination of LessWrong’s Rational Discourse

BubbaJoeLouis · 8 Jun 2024 15:22 UTC
−24 points
3 comments · 2 min read