Talk: AI safety fieldbuilding at MATS

Ryan Kidd · 23 Jun 2024 23:06 UTC
26 points
2 comments · 10 min read · LW link

AI Labs Wouldn’t be Convicted of Treason or Sedition

Matthew Khoriaty · 23 Jun 2024 21:34 UTC
9 points
2 comments · 3 min read · LW link

Control Vectors as Dispositional Traits

Gianluca Calcagni · 23 Jun 2024 21:34 UTC
9 points
0 comments · 11 min read · LW link

“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Claude 2024 [humor]

gwern · 23 Jun 2024 21:18 UTC
22 points
6 comments · 1 min read · LW link
(gwern.net)

[Question] How are you preparing for the possibility of an AI bust?

Nate Showell · 23 Jun 2024 19:13 UTC
26 points
16 comments · 1 min read · LW link

A simple text status can change something

nextcaller · 23 Jun 2024 18:48 UTC
5 points
0 comments · 2 min read · LW link

35 Interactive Learning Modules Relevant to EAs / Effective Altruism (that are all free)

spencerg · 23 Jun 2024 17:57 UTC
5 points
0 comments · 1 min read · LW link

Podcasts: AGI Show, Consistently Candid, London Futurists

KatjaGrace · 23 Jun 2024 13:50 UTC
16 points
0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Text Posts from the Kids Group: 2019

jefftk · 23 Jun 2024 13:20 UTC
23 points
0 comments · 18 min read · LW link
(www.jefftk.com)

Population ethics and the value of variety

cousin_it · 23 Jun 2024 10:42 UTC
24 points
11 comments · 2 min read · LW link

[Question] Karma votes: blind to or accounting for score?

cata · 22 Jun 2024 21:40 UTC
19 points
4 comments · 1 min read · LW link

[Question] Should effective altruism be more “cool”?

jaredmantell · 22 Jun 2024 20:42 UTC
3 points
3 comments · 1 min read · LW link

Meta Alignment: Communication Wack-a-Mole

Bridgett Kay · 22 Jun 2024 20:12 UTC
16 points
2 comments · 5 min read · LW link
(dxmrevealed.wordpress.com)

AI as a computing platform: what to expect

Jonasb · 22 Jun 2024 19:55 UTC
−3 points
0 comments · 7 min read · LW link
(www.denominations.io)

Expected number of tries

adios · 22 Jun 2024 19:22 UTC
6 points
0 comments · 2 min read · LW link

Applying Force to the Wrong End of a Causal Chain

silentbob · 22 Jun 2024 18:06 UTC
40 points
0 comments · 9 min read · LW link

Bed Time Quests & Dinner Games for 3-5 year olds

22 Jun 2024 7:53 UTC
51 points
0 comments · 1 min read · LW link
(kidquest.substack.com)

Appraising aggregativism and utilitarianism

Cleo Nardo · 21 Jun 2024 23:10 UTC
27 points
10 comments · 19 min read · LW link

Best-of-n with misaligned reward models for Math reasoning

Fabien Roger · 21 Jun 2024 22:53 UTC
24 points
0 comments · 3 min read · LW link

No really, the Sticker Shortcut fallacy is indeed a fallacy

ymeskhout · 21 Jun 2024 22:27 UTC
11 points
2 comments · 5 min read · LW link
(www.ymeskhout.com)

Sarajevo 1914: Black Swan Questions

JohnBuridan · 21 Jun 2024 21:27 UTC
8 points
0 comments · 2 min read · LW link

Yudkowsky is too optimistic about how AI will treat humans.

ProfessorFalken · 21 Jun 2024 19:01 UTC
0 points
1 comment · 1 min read · LW link

Juneberry Puffs

jefftk · 21 Jun 2024 18:50 UTC
15 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Let’s Design a School, Part 3.2 Costs

Sable · 21 Jun 2024 17:58 UTC
8 points
0 comments · 5 min read · LW link
(affablyevil.substack.com)

2022 AI Alignment Course: 5→37% working on AI safety

Dewi · 21 Jun 2024 17:45 UTC
7 points
3 comments · 3 min read · LW link

Some Thoughts on AI Alignment: Using AI to Control AI

eigenvalue · 21 Jun 2024 17:44 UTC
1 point
1 comment · 1 min read · LW link
(github.com)

What distinguishes “early”, “mid” and “end” games?

Raemon · 21 Jun 2024 17:41 UTC
47 points
22 comments · 1 min read · LW link

Nuclear War, Map and Territory, Values | Guild of the Rose Newsletter, May 2024

moridinamael · 21 Jun 2024 17:39 UTC
18 points
0 comments · 4 min read · LW link
(guildoftherose.org)

AI governance needs a theory of victory

21 Jun 2024 16:15 UTC
34 points
6 comments · 1 min read · LW link
(www.convergenceanalysis.org)

Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data

21 Jun 2024 15:54 UTC
160 points
13 comments · 8 min read · LW link
(arxiv.org)

On OpenAI’s Model Spec

Zvi · 21 Jun 2024 13:00 UTC
46 points
3 comments · 30 min read · LW link
(thezvi.wordpress.com)

Attention Output SAEs Improve Circuit Analysis

21 Jun 2024 12:56 UTC
33 points
1 comment · 19 min read · LW link

“Newton’s laws” of finance

pchvykov · 21 Jun 2024 9:41 UTC
9 points
3 comments · 10 min read · LW link

Capitalising On Trust—A Simulation

James Stephen Brown · 21 Jun 2024 4:43 UTC
2 points
0 comments · 1 min read · LW link
(nonzerosum.games)

“… than average” is (almost) meaningless

jwfiredragon · 21 Jun 2024 4:42 UTC
16 points
6 comments · 3 min read · LW link

The Kernel of Meaning in Property Rights

Abhimanyu Pallavi Sudhir · 21 Jun 2024 1:12 UTC
7 points
6 comments · 2 min read · LW link

Enriched tab is now the default LW Frontpage experience for logged-in users

21 Jun 2024 0:09 UTC
46 points
27 comments · 3 min read · LW link

Debate, Oracles, and Obfuscated Arguments

20 Jun 2024 23:14 UTC
40 points
2 comments · 21 min read · LW link

Evaporation of improvements

Viliam · 20 Jun 2024 18:34 UTC
28 points
27 comments · 2 min read · LW link

Interpreting and Steering Features in Images

Gytis Daujotas · 20 Jun 2024 18:33 UTC
65 points
6 comments · 5 min read · LW link

Claude 3.5 Sonnet

Zach Stein-Perlman · 20 Jun 2024 18:00 UTC
75 points
41 comments · 1 min read · LW link
(www.anthropic.com)

[Question] What is going to happen in a case of an AGI era where humans are out of the game?

Cipolla · 20 Jun 2024 17:44 UTC
−2 points
1 comment · 1 min read · LW link

Jailbreak steering generalization

20 Jun 2024 17:25 UTC
41 points
4 comments · 2 min read · LW link
(arxiv.org)

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky · 20 Jun 2024 13:33 UTC
42 points
0 comments · 1 min read · LW link

AI #69: Nice

Zvi · 20 Jun 2024 12:40 UTC
65 points
9 comments · 51 min read · LW link
(thezvi.wordpress.com)

Niche product design

Itay Dreyfus · 20 Jun 2024 6:34 UTC
2 points
1 comment · 3 min read · LW link
(productidentity.co)

Data on AI

20 Jun 2024 6:31 UTC
1 point
0 comments · 1 min read · LW link
(epochai.org)

Actually, Power Plants May Be an AI Training Bottleneck.

Lao Mein · 20 Jun 2024 4:41 UTC
83 points
13 comments · 2 min read · LW link

Proposing the Post-Singularity Symbiotic Researches

Hiroshi Yamakawa · 20 Jun 2024 4:05 UTC
5 points
0 comments · 12 min read · LW link

Week One of Studying Transformers Architecture

JustisMills · 20 Jun 2024 3:47 UTC
3 points
0 comments · 15 min read · LW link
(justismills.substack.com)