SolidGoldMagikarp (plus, prompt generation)

5 Feb 2023 22:02 UTC
676 points
205 comments · 12 min read · LW link

Focus on the places where you feel shocked everyone’s dropping the ball

So8res · 2 Feb 2023 0:27 UTC
421 points
60 comments · 4 min read · LW link

Bing Chat is blatantly, aggressively misaligned

evhub · 15 Feb 2023 5:29 UTC
400 points
180 comments · 2 min read · LW link

Noting an error in Inadequate Equilibria

Matthew Barnett · 8 Feb 2023 1:33 UTC
361 points
56 comments · 2 min read · LW link

Please don’t throw your mind away

TsviBT · 15 Feb 2023 21:41 UTC
341 points
44 comments · 18 min read · LW link

Cyborgism

10 Feb 2023 14:47 UTC
336 points
46 comments · 35 min read · LW link

Childhoods of exceptional people

Henrik Karlsson · 6 Feb 2023 17:27 UTC
330 points
62 comments · 15 min read · LW link
(escapingflatland.substack.com)

Fucking Goddamn Basics of Rationalist Discourse

LoganStrohl · 4 Feb 2023 1:47 UTC
319 points
100 comments · 1 min read · LW link

I hired 5 people to sit behind me and make me productive for a month

Simon Berens · 5 Feb 2023 1:19 UTC
246 points
83 comments · 10 min read · LW link
(www.simonberens.com)

You Don’t Exist, Duncan

Duncan Sabien (Deactivated) · 2 Feb 2023 8:37 UTC
244 points
107 comments · 9 min read · LW link

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
225 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Elements of Rationalist Discourse

Rob Bensinger · 12 Feb 2023 7:58 UTC
223 points
48 comments · 3 min read · LW link

Cognitive Emulation: A Naive AI Safety Proposal

25 Feb 2023 19:35 UTC
194 points
46 comments · 4 min read · LW link

AI alignment researchers don’t (seem to) stack

So8res · 21 Feb 2023 0:48 UTC
191 points
40 comments · 3 min read · LW link

EigenKarma: trust at scale

Henrik Karlsson · 8 Feb 2023 18:52 UTC
186 points
52 comments · 5 min read · LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout · 18 Feb 2023 18:41 UTC
172 points
10 comments · 2 min read · LW link
(arxiv.org)

Why Are Bacteria So Simple?

aysja · 6 Feb 2023 3:00 UTC
171 points
33 comments · 10 min read · LW link

AI #1: Sydney and Bing

Zvi · 21 Feb 2023 14:00 UTC
171 points
44 comments · 61 min read · LW link
(thezvi.wordpress.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · 15 Feb 2023 1:56 UTC
166 points
31 comments · 4 min read · LW link

[Link] A community alert about Ziz

DanielFilan · 24 Feb 2023 0:06 UTC
163 points
126 comments · 2 min read · LW link
(medium.com)

Big Mac Subsidy?

jefftk · 23 Feb 2023 4:00 UTC
156 points
25 comments · 2 min read · LW link
(www.jefftk.com)

There are no coherence theorems

20 Feb 2023 21:25 UTC
145 points
124 comments · 19 min read · LW link

Stop posting prompt injections on Twitter and calling it “misalignment”

lc · 19 Feb 2023 2:21 UTC
144 points
9 comments · 1 min read · LW link

We Found An Neuron in GPT-2

11 Feb 2023 18:27 UTC
143 points
23 comments · 7 min read · LW link
(clementneo.com)

Anomalous tokens reveal the original identities of Instruct models

9 Feb 2023 1:30 UTC
139 points
16 comments · 9 min read · LW link
(generative.ink)

Full Transcript: Eliezer Yudkowsky on the Bankless podcast

23 Feb 2023 12:34 UTC
138 points
89 comments · 75 min read · LW link

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis · 26 Feb 2023 5:58 UTC
136 points
152 comments · 9 min read · LW link

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
134 points
19 comments · 11 min read · LW link

Modal Fixpoint Cooperation without Löb’s Theorem

Andrew_Critch · 5 Feb 2023 0:58 UTC
133 points
32 comments · 3 min read · LW link

Hashing out long-standing disagreements seems low-value to me

So8res · 16 Feb 2023 6:20 UTC
133 points
34 comments · 4 min read · LW link

Evaluations (of new AI Safety researchers) can be noisy

LawrenceC · 5 Feb 2023 4:15 UTC
132 points
10 comments · 16 min read · LW link

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck · 17 Feb 2023 17:26 UTC
127 points
11 comments · 7 min read · LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver · 17 Feb 2023 20:11 UTC
125 points
12 comments · 2 min read · LW link

There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs

Taran · 19 Feb 2023 12:25 UTC
124 points
34 comments · 4 min read · LW link

In Defense of Chatbot Romance

Kaj_Sotala · 11 Feb 2023 14:30 UTC
123 points
52 comments · 11 min read · LW link
(kajsotala.fi)

A proposed method for forecasting transformative AI

Matthew Barnett · 10 Feb 2023 19:34 UTC
121 points
21 comments · 10 min read · LW link

GPT-175bee

8 Feb 2023 18:58 UTC
121 points
14 comments · 1 min read · LW link

On Investigating Conspiracy Theories

Zvi · 20 Feb 2023 12:50 UTC
116 points
38 comments · 5 min read · LW link
(thezvi.wordpress.com)

Bing chat is the AI fire alarm

Ratios · 17 Feb 2023 6:51 UTC
115 points
63 comments · 3 min read · LW link

The Open Agency Model

Eric Drexler · 22 Feb 2023 10:35 UTC
114 points
18 comments · 4 min read · LW link

The public supports regulating AI for safety

Zach Stein-Perlman · 17 Feb 2023 4:10 UTC
114 points
9 comments · 1 min read · LW link
(aiimpacts.org)

SolidGoldMagikarp II: technical details and more recent findings

6 Feb 2023 19:09 UTC
111 points
45 comments · 13 min read · LW link

GPT-4 Predictions

Stephen McAleese · 17 Feb 2023 23:20 UTC
109 points
27 comments · 11 min read · LW link

A Way To Be Okay

Duncan Sabien (Deactivated) · 19 Feb 2023 20:27 UTC
108 points
37 comments · 10 min read · LW link

Cyborg Periods: There will be multiple AI transitions

22 Feb 2023 16:09 UTC
108 points
9 comments · 6 min read · LW link

Conflict Theory of Bounded Distrust

Zack_M_Davis · 12 Feb 2023 5:30 UTC
107 points
29 comments · 3 min read · LW link

I don’t think MIRI “gave up”

Raemon · 3 Feb 2023 0:26 UTC
106 points
64 comments · 4 min read · LW link

Another Way to Be Okay

Gretta Duleba · 19 Feb 2023 20:49 UTC
105 points
15 comments · 6 min read · LW link

Sam Altman: “Planning for AGI and beyond”

LawrenceC · 24 Feb 2023 20:28 UTC
104 points
54 comments · 6 min read · LW link
(openai.com)

H5N1

Zvi · 13 Feb 2023 12:50 UTC
101 points
1 comment · 9 min read · LW link
(thezvi.wordpress.com)