Launching Lightspeed Grants (Apply by July 6th)

habryka · Jun 7, 2023, 2:53 AM
211 points
41 comments · 5 min read · LW link

Actually, Othello-GPT Has A Linear Emergent World Representation

Neel Nanda · Mar 29, 2023, 10:13 PM
211 points
26 comments · 19 min read · LW link
(neelnanda.io)

Thoughts on sharing information about language model capabilities

paulfchristiano · Jul 31, 2023, 4:04 PM
210 points
44 comments · 11 min read · LW link · 1 review

Labs should be explicit about why they are building AGI

peterbarnett · Oct 17, 2023, 9:09 PM
210 points
18 comments · 1 min read · LW link · 1 review

The Lighthaven Campus is open for bookings

habryka · Sep 30, 2023, 1:08 AM
209 points
18 comments · 5 min read · LW link
(www.lighthaven.space)

Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

1a3orn · Apr 4, 2023, 5:39 PM
208 points
38 comments · 5 min read · LW link · 1 review

Evolution provides no evidence for the sharp left turn

Quintin Pope · Apr 11, 2023, 6:43 PM
206 points
65 comments · 15 min read · LW link · 1 review

My current LK99 questions

Eliezer Yudkowsky · Aug 1, 2023, 10:48 PM
206 points
38 comments · 5 min read · LW link

Feedbackloop-first Rationality

Raemon · Aug 7, 2023, 5:58 PM
205 points
69 comments · 8 min read · LW link · 2 reviews

Lightcone Infrastructure/LessWrong is looking for funding

habryka · Jun 14, 2023, 4:45 AM
205 points
39 comments · 1 min read · LW link

If interpretability research goes well, it may get dangerous

So8res · Apr 3, 2023, 9:48 PM
202 points
11 comments · 2 min read · LW link

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · Oct 27, 2023, 3:19 PM
200 points
33 comments · 8 min read · LW link

My tentative best guess on how EAs and Rationalists sometimes turn crazy

habryka · Jun 21, 2023, 4:11 AM
199 points
110 comments · 8 min read · LW link

GPT-4 Plugs In

Zvi · Mar 27, 2023, 12:10 PM
198 points
47 comments · 6 min read · LW link
(thezvi.wordpress.com)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

So8res · Nov 24, 2023, 5:37 PM
197 points
84 comments · 5 min read · LW link · 1 review

Thoughts on “AI is easy to control” by Pope & Belrose

Steven Byrnes · Dec 1, 2023, 5:30 PM
197 points
63 comments · 14 min read · LW link · 1 review

My “2.9 trauma limit”

Raemon · Jul 1, 2023, 7:32 PM
196 points
31 comments · 7 min read · LW link

Comp Sci in 2027 (Short story by Eliezer Yudkowsky)

sudo · Oct 29, 2023, 11:09 PM
196 points
24 comments · 10 min read · LW link · 1 review
(nitter.net)

Thinking By The Clock

Screwtape · Nov 8, 2023, 7:40 AM
196 points
29 comments · 8 min read · LW link · 1 review

Acausal normalcy

Andrew_Critch · Mar 3, 2023, 11:34 PM
195 points
36 comments · 8 min read · LW link · 1 review

Killing Socrates

Duncan Sabien (Deactivated) · Apr 11, 2023, 10:28 AM
195 points
146 comments · 8 min read · LW link · 1 review

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

likenneth · Jun 11, 2023, 5:38 AM
195 points
4 comments · 1 min read · LW link
(arxiv.org)

Cognitive Emulation: A Naive AI Safety Proposal

Feb 25, 2023, 7:35 PM
195 points
46 comments · 4 min read · LW link

Is being sexy for your homies?

Valentine · Dec 13, 2023, 8:37 PM
193 points
100 comments · 14 min read · LW link · 2 reviews

Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk

1a3orn · Nov 2, 2023, 6:20 PM
193 points
79 comments · 23 min read · LW link

AI as a science, and three obstacles to alignment strategies

So8res · Oct 25, 2023, 9:00 PM
193 points
80 comments · 11 min read · LW link

AI alignment researchers don’t (seem to) stack

So8res · Feb 21, 2023, 12:48 AM
193 points
40 comments · 3 min read · LW link

The ‘ petertodd’ phenomenon

mwatkins · Apr 15, 2023, 12:59 AM
192 points
50 comments · 38 min read · LW link · 1 review

Towards Developmental Interpretability

Jul 12, 2023, 7:33 PM
192 points
10 comments · 9 min read · LW link · 1 review

Sam Altman fired from OpenAI

LawrenceC · Nov 17, 2023, 8:42 PM
192 points
75 comments · 1 min read · LW link
(openai.com)

“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity

Thane Ruthenis · Dec 16, 2023, 8:08 PM
191 points
34 comments · 5 min read · LW link

Grant applications and grand narratives

Elizabeth · Jul 2, 2023, 12:16 AM
191 points
22 comments · 6 min read · LW link

Twiblings, four-parent babies and other reproductive technology

GeneSmith · May 20, 2023, 5:11 PM
191 points
33 comments · 6 min read · LW link

Cryonics and Regret

MvB · Jul 24, 2023, 9:16 AM
190 points
35 comments · 2 min read · LW link · 1 review

Evaluating the historical value misspecification argument

Matthew Barnett · Oct 5, 2023, 6:34 PM
190 points
162 comments · 7 min read · LW link · 3 reviews

Transcript and Brief Response to Twitter Conversation between Yann LeCunn and Eliezer Yudkowsky

Zvi · Apr 26, 2023, 1:10 PM
190 points
51 comments · 10 min read · LW link
(thezvi.wordpress.com)

The King and the Golem

Richard_Ngo · Sep 25, 2023, 7:51 PM
190 points
19 comments · 5 min read · LW link · 1 review
(narrativeark.substack.com)

The basic reasons I expect AGI ruin

Rob Bensinger · Apr 18, 2023, 3:37 AM
189 points
73 comments · 14 min read · LW link

The other side of the tidal wave

KatjaGrace · Nov 3, 2023, 5:40 AM
189 points
86 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

Bird Concept · Sep 1, 2023, 4:03 AM
188 points
26 comments · 24 min read · LW link · 1 review

What a compute-centric framework says about AI takeoff speeds

Tom Davidson · Jan 23, 2023, 4:02 AM
188 points
30 comments · 16 min read · LW link · 1 review

Effective Aspersions: How the Nonlinear Investigation Went Wrong

TracingWoodgrains · Dec 19, 2023, 12:00 PM
188 points
172 comments · LW link · 2 reviews

Announcing Timaeus

Oct 22, 2023, 11:59 AM
188 points
15 comments · 4 min read · LW link

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Sep 28, 2023, 6:53 PM
187 points
39 comments · 3 min read · LW link · 1 review

EigenKarma: trust at scale

Henrik Karlsson · Feb 8, 2023, 6:52 PM
186 points
52 comments · 5 min read · LW link

Another medical miracle

Dentin · Jun 25, 2023, 8:43 PM
186 points
48 comments · 3 min read · LW link

What will GPT-2030 look like?

jsteinhardt · Jun 7, 2023, 11:40 PM
185 points
43 comments · 23 min read · LW link
(bounded-regret.ghost.io)

Large Language Models will be Great for Censorship

Ethan Edwards · Aug 21, 2023, 7:03 PM
185 points
14 comments · 8 min read · LW link
(ethanedwards.substack.com)

Why Not Just… Build Weak AI Tools For AI Alignment Research?

johnswentworth · Mar 5, 2023, 12:12 AM
184 points
18 comments · 6 min read · LW link

OpenAI API base models are not sycophantic, at any size

nostalgebraist · Aug 29, 2023, 12:58 AM
183 points
20 comments · 2 min read · LW link
(colab.research.google.com)