SolidGoldMagikarp (plus, prompt generation)

Feb 5, 2023, 10:02 PM
682 points
206 comments · 12 min read · LW link · 1 review

Focus on the places where you feel shocked everyone’s dropping the ball

So8res · Feb 2, 2023, 12:27 AM
459 points
63 comments · 4 min read · LW link · 3 reviews

Bing Chat is blatantly, aggressively misaligned

evhub · Feb 15, 2023, 5:29 AM
403 points
181 comments · 2 min read · LW link · 1 review

Please don’t throw your mind away

TsviBT · Feb 15, 2023, 9:41 PM
374 points
49 comments · 18 min read · LW link · 1 review

Noting an error in Inadequate Equilibria

Matthew Barnett · Feb 8, 2023, 1:33 AM
366 points
60 comments · 2 min read · LW link · 2 reviews

Fucking Goddamn Basics of Rationalist Discourse

LoganStrohl · Feb 4, 2023, 1:47 AM
346 points
103 comments · 1 min read · LW link · 3 reviews

Childhoods of exceptional people

Henrik Karlsson · Feb 6, 2023, 5:27 PM
344 points
63 comments · 15 min read · LW link · 1 review
(escapingflatland.substack.com)

Cyborgism

Feb 10, 2023, 2:47 PM
341 points
46 comments · 35 min read · LW link · 2 reviews

You Don’t Exist, Duncan

Duncan Sabien (Deactivated) · Feb 2, 2023, 8:37 AM
252 points
107 comments · 9 min read · LW link

I hired 5 people to sit behind me and make me productive for a month

Simon Berens · Feb 5, 2023, 1:19 AM
249 points
83 comments · 10 min read · LW link
(www.simonberens.com)

AGI in sight: our look at the game board

Feb 18, 2023, 10:17 PM
227 points
135 comments · 6 min read · LW link
(andreamiotti.substack.com)

Elements of Rationalist Discourse

Rob Bensinger · Feb 12, 2023, 7:58 AM
224 points
49 comments · 3 min read · LW link · 1 review

Cognitive Emulation: A Naive AI Safety Proposal

Feb 25, 2023, 7:35 PM
195 points
46 comments · 4 min read · LW link

AI alignment researchers don’t (seem to) stack

So8res · Feb 21, 2023, 12:48 AM
193 points
40 comments · 3 min read · LW link

EigenKarma: trust at scale

Henrik Karlsson · Feb 8, 2023, 6:52 PM
186 points
52 comments · 5 min read · LW link

[Link] A community alert about Ziz

DanielFilan · Feb 24, 2023, 12:06 AM
180 points
166 comments · 2 min read · LW link · 4 reviews
(medium.com)

Why Are Bacteria So Simple?

aysja · Feb 6, 2023, 3:00 AM
172 points
33 comments · 10 min read · LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout · Feb 18, 2023, 6:41 PM
172 points
10 comments · 2 min read · LW link
(arxiv.org)

AI #1: Sydney and Bing

Zvi · Feb 21, 2023, 2:00 PM
171 points
45 comments · 61 min read · LW link · 1 review
(thezvi.wordpress.com)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · Feb 15, 2023, 1:56 AM
166 points
31 comments · 4 min read · LW link

Big Mac Subsidy?

jefftk · Feb 23, 2023, 4:00 AM
158 points
25 comments · 2 min read · LW link
(www.jefftk.com)

There are no coherence theorems

Feb 20, 2023, 9:25 PM
149 points
130 comments · 19 min read · LW link · 1 review

Stop posting prompt injections on Twitter and calling it “misalignment”

lc · Feb 19, 2023, 2:21 AM
144 points
9 comments · 1 min read · LW link

We Found An Neuron in GPT-2

Feb 11, 2023, 6:27 PM
143 points
23 comments · 7 min read · LW link
(clementneo.com)

Hashing out long-standing disagreements seems low-value to me

So8res · Feb 16, 2023, 6:20 AM
141 points
34 comments · 4 min read · LW link

Anomalous tokens reveal the original identities of Instruct models

Feb 9, 2023, 1:30 AM
140 points
16 comments · 9 min read · LW link
(generative.ink)

Full Transcript: Eliezer Yudkowsky on the Bankless podcast

Feb 23, 2023, 12:34 PM
138 points
89 comments · 75 min read · LW link

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis · Feb 26, 2023, 5:58 AM
136 points
153 comments · 9 min read · LW link · 1 review

Pretraining Language Models with Human Preferences

Feb 21, 2023, 5:57 PM
135 points
20 comments · 11 min read · LW link · 2 reviews

Modal Fixpoint Cooperation without Löb’s Theorem

Andrew_Critch · Feb 5, 2023, 12:58 AM
134 points
34 comments · 3 min read · LW link · 1 review

Evaluations (of new AI Safety researchers) can be noisy

LawrenceC · Feb 5, 2023, 4:15 AM
132 points
11 comments · 16 min read · LW link · 1 review

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck · Feb 17, 2023, 5:26 PM
127 points
11 comments · 7 min read · LW link

There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs

Taran · Feb 19, 2023, 12:25 PM
125 points
34 comments · 4 min read · LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver · Feb 17, 2023, 8:11 PM
125 points
12 comments · 2 min read · LW link

In Defense of Chatbot Romance

Kaj_Sotala · Feb 11, 2023, 2:30 PM
124 points
53 comments · 11 min read · LW link
(kajsotala.fi)

GPT-175bee

Feb 8, 2023, 6:58 PM
122 points
14 comments · 1 min read · LW link

A proposed method for forecasting transformative AI

Matthew Barnett · Feb 10, 2023, 7:34 PM
121 points
21 comments · 10 min read · LW link

On Investigating Conspiracy Theories

Zvi · Feb 20, 2023, 12:50 PM
116 points
38 comments · 5 min read · LW link
(thezvi.wordpress.com)

Bing chat is the AI fire alarm

Ratios · Feb 17, 2023, 6:51 AM
115 points
63 comments · 3 min read · LW link

The Open Agency Model

Eric Drexler · Feb 22, 2023, 10:35 AM
114 points
18 comments · 4 min read · LW link

The public supports regulating AI for safety

Zach Stein-Perlman · Feb 17, 2023, 4:10 AM
114 points
9 comments · 1 min read · LW link
(aiimpacts.org)

SolidGoldMagikarp II: technical details and more recent findings

Feb 6, 2023, 7:09 PM
113 points
45 comments · 13 min read · LW link

Conflict Theory of Bounded Distrust

Zack_M_Davis · Feb 12, 2023, 5:30 AM
112 points
33 comments · 3 min read · LW link · 1 review

GPT-4 Predictions

Stephen McAleese · Feb 17, 2023, 11:20 PM
110 points
27 comments · 11 min read · LW link

A Way To Be Okay

Duncan Sabien (Deactivated) · Feb 19, 2023, 8:27 PM
109 points
38 comments · 10 min read · LW link · 1 review

Cyborg Periods: There will be multiple AI transitions

Feb 22, 2023, 4:09 PM
108 points
9 comments · 6 min read · LW link

Another Way to Be Okay

Gretta Duleba · Feb 19, 2023, 8:49 PM
107 points
15 comments · 6 min read · LW link

I don’t think MIRI “gave up”

Raemon · Feb 3, 2023, 12:26 AM
106 points
64 comments · 4 min read · LW link

Sam Altman: “Planning for AGI and beyond”

LawrenceC · Feb 24, 2023, 8:28 PM
104 points
54 comments · 6 min read · LW link
(openai.com)

H5N1

Zvi · Feb 13, 2023, 12:50 PM
102 points
1 comment · 9 min read · LW link
(thezvi.wordpress.com)