Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible

12 Dec 2023 18:14 UTC
451 points
168 comments · 33 min read · LW link

Speaking to Congressional staffers about AI risk

4 Dec 2023 23:08 UTC
297 points
23 comments · 16 min read · LW link

Constellations are Younger than Continents

Jeffrey Heninger · 19 Dec 2023 6:12 UTC
260 points
22 comments · 2 min read · LW link

AI Control: Improving Safety Despite Intentional Subversion

13 Dec 2023 15:51 UTC
215 points
14 comments · 10 min read · LW link

Thoughts on “AI is easy to control” by Pope & Belrose

Steven Byrnes · 1 Dec 2023 17:30 UTC
195 points
56 comments · 13 min read · LW link

“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity

Thane Ruthenis · 16 Dec 2023 20:08 UTC
180 points
34 comments · 5 min read · LW link

re: Yudkowsky on biological materials

bhauth · 11 Dec 2023 13:28 UTC
179 points
30 comments · 5 min read · LW link

Effective Aspersions: How the Nonlinear Investigation Went Wrong

TracingWoodgrains · 19 Dec 2023 12:00 UTC
175 points
170 comments · 1 min read · LW link

Critical review of Christiano’s disagreements with Yudkowsky

Vanessa Kosoy · 27 Dec 2023 16:02 UTC
172 points
40 comments · 15 min read · LW link

2023 Unofficial LessWrong Census/Survey

Screwtape · 2 Dec 2023 4:41 UTC
169 points
81 comments · 1 min read · LW link

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

18 Dec 2023 20:35 UTC
166 points
21 comments · 12 min read · LW link

Is being sexy for your homies?

Valentine · 13 Dec 2023 20:37 UTC
163 points
92 comments · 14 min read · LW link

How useful is mechanistic interpretability?

1 Dec 2023 2:54 UTC
163 points
54 comments · 25 min read · LW link

The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.

BobBurgers · 12 Dec 2023 2:42 UTC
161 points
34 comments · 5 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis · 21 Dec 2023 20:02 UTC
158 points
42 comments · 1 min read · LW link

Succession

Richard_Ngo · 20 Dec 2023 19:25 UTC
158 points
48 comments · 11 min read · LW link
(www.narrativeark.xyz)

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

18 Dec 2023 11:58 UTC
147 points
21 comments · 10 min read · LW link

The Plan - 2023 Version

johnswentworth · 29 Dec 2023 23:34 UTC
146 points
40 comments · 31 min read · LW link

AI Views Snapshots

Rob Bensinger · 13 Dec 2023 0:45 UTC
142 points
61 comments · 1 min read · LW link

The Dark Arts

19 Dec 2023 4:41 UTC
135 points
49 comments · 9 min read · LW link

Bayesian Injustice

Kevin Dorst · 14 Dec 2023 15:44 UTC
124 points
10 comments · 6 min read · LW link
(kevindorst.substack.com)

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper · 5 Dec 2023 16:48 UTC
122 points
29 comments · 13 min read · LW link

Natural Latents: The Math

27 Dec 2023 19:03 UTC
120 points
37 comments · 12 min read · LW link

The LessWrong 2022 Review

habryka · 5 Dec 2023 4:00 UTC
115 points
43 comments · 4 min read · LW link

Current AIs Provide Nearly No Data Relevant to AGI Alignment

Thane Ruthenis · 15 Dec 2023 20:16 UTC
114 points
155 comments · 8 min read · LW link

Mapping the semantic void: Strange goings-on in GPT embedding spaces

mwatkins · 14 Dec 2023 13:10 UTC
114 points
31 comments · 14 min read · LW link

What I Would Do If I Were Working On AI Governance

johnswentworth · 8 Dec 2023 6:43 UTC
109 points
32 comments · 10 min read · LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

23 Dec 2023 2:44 UTC
108 points
8 comments · 22 min read · LW link

“AI Alignment” is a Dangerously Overloaded Term

Roko · 15 Dec 2023 14:34 UTC
108 points
100 comments · 3 min read · LW link

[Question] How do you feel about LessWrong these days? [Open feedback thread]

jacobjacob · 5 Dec 2023 20:54 UTC
107 points
281 comments · 1 min read · LW link

On the future of language models

owencb · 20 Dec 2023 16:58 UTC
105 points
17 comments · 1 min read · LW link

Nonlinear’s Evidence: Debunking False and Misleading Claims

KatWoods · 12 Dec 2023 13:16 UTC
104 points
171 comments · 1 min read · LW link

The Witness

Richard_Ngo · 3 Dec 2023 22:27 UTC
104 points
5 comments · 14 min read · LW link
(www.narrativeark.xyz)

[Valence series] 1. Introduction

Steven Byrnes · 4 Dec 2023 15:40 UTC
98 points
14 comments · 16 min read · LW link

Meaning & Agency

abramdemski · 19 Dec 2023 22:27 UTC
91 points
17 comments · 14 min read · LW link

Prediction Markets aren’t Magic

SimonM · 21 Dec 2023 12:54 UTC
90 points
29 comments · 3 min read · LW link

Based Beff Jezos and the Accelerationists

Zvi · 6 Dec 2023 16:00 UTC
89 points
29 comments · 12 min read · LW link
(thezvi.wordpress.com)

A Crisper Explanation of Simulacrum Levels

Thane Ruthenis · 23 Dec 2023 22:13 UTC
86 points
13 comments · 13 min read · LW link

[Valence series] 2. Valence & Normativity

Steven Byrnes · 7 Dec 2023 16:43 UTC
86 points
5 comments · 28 min read · LW link

A Universal Emergent Decomposition of Retrieval Tasks in Language Models

19 Dec 2023 11:52 UTC
84 points
3 comments · 10 min read · LW link
(arxiv.org)

Some for-profit AI alignment org ideas

Eric Ho · 14 Dec 2023 14:23 UTC
84 points
19 comments · 9 min read · LW link

Nietzsche’s Morality in Plain English

Arjun Panickssery · 4 Dec 2023 0:57 UTC
84 points
13 comments · 4 min read · LW link
(arjunpanickssery.substack.com)

Refusal mechanisms: initial experiments with Llama-2-7b-chat

8 Dec 2023 17:08 UTC
81 points
7 comments · 7 min read · LW link

Studying The Alien Mind

5 Dec 2023 17:27 UTC
80 points
10 comments · 15 min read · LW link

EU policymakers reach an agreement on the AI Act

tlevin · 15 Dec 2023 6:02 UTC
78 points
7 comments · 7 min read · LW link

Send us example gnarly bugs

10 Dec 2023 5:23 UTC
77 points
10 comments · 2 min read · LW link

OpenAI: Leaks Confirm the Story

Zvi · 12 Dec 2023 14:00 UTC
77 points
9 comments · 16 min read · LW link
(thezvi.wordpress.com)

MATS Summer 2023 Retrospective

1 Dec 2023 23:29 UTC
77 points
34 comments · 26 min read · LW link

[Valence series] 3. Valence & Beliefs

Steven Byrnes · 11 Dec 2023 20:21 UTC
75 points
11 comments · 21 min read · LW link

The Offense-Defense Balance Rarely Changes

Maxwell Tabarrok · 9 Dec 2023 15:21 UTC
75 points
23 comments · 3 min read · LW link
(maximumprogress.substack.com)