Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern
Jul 3, 2023, 12:48 AM
426 points
54 comments
7 min read
LW link
(www.youtube.com)

Alignment Grantmaking is Funding-Limited Right Now

johnswentworth
Jul 19, 2023, 4:49 PM
312 points
68 comments
1 min read
LW link

Accidentally Load Bearing

jefftk
Jul 13, 2023, 4:10 PM
287 points
18 comments
1 min read
LW link
1 review
(www.jefftk.com)

Yes, It’s Subjective, But Why All The Crabs?

johnswentworth
Jul 28, 2023, 7:35 PM
250 points
15 comments
6 min read
LW link

Cultivating a state of mind where new ideas are born

Henrik Karlsson
Jul 27, 2023, 9:16 AM
238 points
21 comments
14 min read
LW link
2 reviews
(www.henrikkarlsson.xyz)

Self-driving car bets

paulfchristiano
Jul 29, 2023, 6:10 PM
236 points
44 comments
5 min read
LW link
(sideways-view.com)

Ways I Expect AI Regulation To Increase Extinction Risk

1a3orn
Jul 4, 2023, 5:32 PM
225 points
32 comments
7 min read
LW link

Consciousness as a conflationary alliance term for intrinsically valued internal experiences

Andrew_Critch
Jul 10, 2023, 8:09 AM
214 points
54 comments
11 min read
LW link
2 reviews

My “2.9 trauma limit”

Raemon
Jul 1, 2023, 7:32 PM
196 points
31 comments
7 min read
LW link

Towards Developmental Interpretability

Jul 12, 2023, 7:33 PM
192 points
10 comments
9 min read
LW link
1 review

Grant applications and grand narratives

Elizabeth
Jul 2, 2023, 12:16 AM
191 points
22 comments
6 min read
LW link

Cryonics and Regret

MvB
Jul 24, 2023, 9:16 AM
190 points
35 comments
2 min read
LW link
1 review

[Linkpost] Introducing Superalignment

beren
Jul 5, 2023, 6:23 PM
175 points
69 comments
1 min read
LW link
(openai.com)

Rationality !== Winning

Raemon
Jul 24, 2023, 2:53 AM
169 points
51 comments
4 min read
LW link

When can we trust model evaluations?

evhub
Jul 28, 2023, 7:42 PM
166 points
10 comments
10 min read
LW link
1 review

Why it’s so hard to talk about Consciousness

Rafael Harth
Jul 2, 2023, 3:56 PM
166 points
213 comments
9 min read
LW link
3 reviews

Jailbreaking GPT-4’s code interpreter

Nikola Jurkovic
Jul 13, 2023, 6:43 PM
160 points
22 comments
7 min read
LW link

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel
Jul 24, 2023, 11:30 AM
149 points
12 comments
7 min read
LW link

OpenAI Launches Superalignment Taskforce

Zvi
Jul 11, 2023, 1:00 PM
149 points
40 comments
49 min read
LW link
(thezvi.wordpress.com)

The Goddess of Everything Else - The Animation

Writer
Jul 13, 2023, 4:26 PM
142 points
4 comments
1 min read
LW link
(youtu.be)

The Seeker’s Game – Vignettes from the Bay

Yulia
Jul 9, 2023, 7:32 PM
141 points
19 comments
16 min read
LW link

Going Crazy and Getting Better Again

Evenstar
Jul 2, 2023, 6:55 PM
139 points
13 comments
7 min read
LW link
1 review

Ten Levels of AI Alignment Difficulty

Sammy Martin
Jul 3, 2023, 8:20 PM
138 points
24 comments
12 min read
LW link
1 review

Neuronpedia

Johnny Lin
Jul 26, 2023, 4:29 PM
135 points
51 comments
2 min read
LW link
(neuronpedia.org)

How LLMs are and are not myopic

janus
Jul 25, 2023, 2:19 AM
135 points
16 comments
8 min read
LW link

Views on when AGI comes and on strategy to reduce existential risk

TsviBT
Jul 8, 2023, 9:00 AM
133 points
61 comments
14 min read
LW link
1 review

Introducing Fatebook: the fastest way to make and track predictions

Jul 11, 2023, 3:28 PM
132 points
41 comments
1 min read
LW link
2 reviews
(fatebook.io)

Even Superhuman Go AIs Have Surprising Failure Modes

Jul 20, 2023, 5:31 PM
130 points
22 comments
10 min read
LW link
(far.ai)

Reducing sycophancy and improving honesty via activation steering

Nina Panickssery
Jul 28, 2023, 2:46 AM
122 points
18 comments
9 min read
LW link
1 review

Why was the AI Alignment community so unprepared for this moment?

Ras1513
Jul 15, 2023, 12:26 AM
121 points
65 comments
2 min read
LW link

“Reframing Superintelligence” + LLMs + 4 years

Eric Drexler
Jul 10, 2023, 1:42 PM
118 points
9 comments
12 min read
LW link

Introducing bayescalc.io

Adele Lopez
Jul 7, 2023, 4:11 PM
115 points
29 comments
1 min read
LW link
(bayescalc.io)

Winners of AI Alignment Awards Research Contest

Jul 13, 2023, 4:14 PM
115 points
4 comments
12 min read
LW link
(alignmentawards.com)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope
Jul 23, 2023, 8:14 PM
114 points
15 comments
9 min read
LW link

Measuring and Improving the Faithfulness of Model-Generated Reasoning

Jul 18, 2023, 4:36 PM
111 points
15 comments
6 min read
LW link
1 review

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin
Jul 12, 2023, 12:12 PM
105 points
13 comments
4 min read
LW link

Consider Joining the UK Foundation Model Taskforce

Zvi
Jul 10, 2023, 1:50 PM
105 points
12 comments
1 min read
LW link
(thezvi.wordpress.com)

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti
Jul 21, 2023, 3:23 PM
105 points
4 comments
5 min read
LW link
(www.conjecture.dev)

Anthropic Observations

Zvi
Jul 25, 2023, 12:50 PM
104 points
1 comment
10 min read
LW link
(thezvi.wordpress.com)

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

Jul 26, 2023, 5:02 PM
99 points
19 comments
1 min read
LW link
1 review

Fixed Point: a love story

Richard_Ngo
Jul 8, 2023, 1:56 PM
98 points
2 comments
7 min read
LW link

When Someone Tells You They’re Lying, Believe Them

ymeskhout
Jul 14, 2023, 12:31 AM
95 points
3 comments
3 min read
LW link

“Justice, Cherryl.”

Zack_M_Davis
Jul 23, 2023, 4:16 PM
91 points
21 comments
9 min read
LW link
1 review

BCIs and the ecosystem of modular minds

beren
Jul 21, 2023, 3:58 PM
88 points
14 comments
11 min read
LW link

Apollo Neuro Results

Elizabeth
Jul 30, 2023, 6:40 PM
85 points
17 comments
3 min read
LW link
(acesounderglass.com)

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?

lukemarks
Jul 8, 2023, 11:42 AM
84 points
28 comments
2 min read
LW link

Underwater Torture Chambers: The Horror Of Fish Farming

omnizoid
Jul 26, 2023, 12:27 AM
83 points
50 comments
10 min read
LW link
1 review

A $10k retroactive grant for VaccinateCA

Austin Chen
Jul 27, 2023, 6:14 PM
82 points
0 comments
LW link
(manifund.org)

Sapient Algorithms

Valentine
Jul 17, 2023, 4:30 PM
82 points
15 comments
5 min read
LW link

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad
Jul 22, 2023, 6:09 PM
80 points
2 comments
2 min read
LW link