Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern · 3 Jul 2023 0:48 UTC
425 points
54 comments · 7 min read · LW link
(www.youtube.com)

Alignment Grantmaking is Funding-Limited Right Now

johnswentworth · 19 Jul 2023 16:49 UTC
312 points
68 comments · 1 min read · LW link

Accidentally Load Bearing

jefftk · 13 Jul 2023 16:10 UTC
280 points
17 comments · 1 min read · LW link · 1 review
(www.jefftk.com)

Yes, It’s Subjective, But Why All The Crabs?

johnswentworth · 28 Jul 2023 19:35 UTC
248 points
15 comments · 6 min read · LW link

Self-driving car bets

paulfchristiano · 29 Jul 2023 18:10 UTC
234 points
43 comments · 5 min read · LW link
(sideways-view.com)

Ways I Expect AI Regulation To Increase Extinction Risk

1a3orn · 4 Jul 2023 17:32 UTC
227 points
32 comments · 7 min read · LW link

Cultivating a state of mind where new ideas are born

Henrik Karlsson · 27 Jul 2023 9:16 UTC
225 points
20 comments · 14 min read · LW link · 1 review
(www.henrikkarlsson.xyz)

Consciousness as a conflationary alliance term for intrinsically valued internal experiences

Andrew_Critch · 10 Jul 2023 8:09 UTC
201 points
51 comments · 11 min read · LW link

My “2.9 trauma limit”

Raemon · 1 Jul 2023 19:32 UTC
193 points
31 comments · 7 min read · LW link

Grant applications and grand narratives

Elizabeth · 2 Jul 2023 0:16 UTC
191 points
22 comments · 6 min read · LW link

Cryonics and Regret

MvB · 24 Jul 2023 9:16 UTC
187 points
35 comments · 2 min read · LW link · 1 review

Towards Developmental Interpretability

12 Jul 2023 19:33 UTC
180 points
10 comments · 9 min read · LW link · 1 review

[Linkpost] Introducing Superalignment

beren · 5 Jul 2023 18:23 UTC
175 points
69 comments · 1 min read · LW link
(openai.com)

Rationality !== Winning

Raemon · 24 Jul 2023 2:53 UTC
163 points
51 comments · 4 min read · LW link

When can we trust model evaluations?

evhub · 28 Jul 2023 19:42 UTC
160 points
10 comments · 10 min read · LW link · 1 review

Jailbreaking GPT-4’s code interpreter

nikola · 13 Jul 2023 18:43 UTC
160 points
22 comments · 7 min read · LW link

OpenAI Launches Superalignment Taskforce

Zvi · 11 Jul 2023 13:00 UTC
149 points
40 comments · 49 min read · LW link
(thezvi.wordpress.com)

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel · 24 Jul 2023 11:30 UTC
145 points
12 comments · 7 min read · LW link

The Goddess of Everything Else—The Animation

Writer · 13 Jul 2023 16:26 UTC
142 points
4 comments · 1 min read · LW link
(youtu.be)

Going Crazy and Getting Better Again

Evenstar · 2 Jul 2023 18:55 UTC
139 points
13 comments · 7 min read · LW link · 1 review

The Seeker’s Game – Vignettes from the Bay

Yulia · 9 Jul 2023 19:32 UTC
137 points
19 comments · 16 min read · LW link

Neuronpedia

Johnny Lin · 26 Jul 2023 16:29 UTC
135 points
51 comments · 2 min read · LW link
(neuronpedia.org)

How LLMs are and are not myopic

janus · 25 Jul 2023 2:19 UTC
134 points
16 comments · 8 min read · LW link

Why it’s so hard to talk about Consciousness

Rafael Harth · 2 Jul 2023 15:56 UTC
131 points
159 comments · 9 min read · LW link · 1 review

Even Superhuman Go AIs Have Surprising Failure Modes

20 Jul 2023 17:31 UTC
129 points
22 comments · 10 min read · LW link
(far.ai)

Introducing Fatebook: the fastest way to make and track predictions

11 Jul 2023 15:28 UTC
128 points
36 comments · 1 min read · LW link
(fatebook.io)

Reducing sycophancy and improving honesty via activation steering

Nina Panickssery · 28 Jul 2023 2:46 UTC
122 points
17 comments · 9 min read · LW link

Why was the AI Alignment community so unprepared for this moment?

Ras1513 · 15 Jul 2023 0:26 UTC
121 points
65 comments · 2 min read · LW link

Ten Levels of AI Alignment Difficulty

Sammy Martin · 3 Jul 2023 20:20 UTC
121 points
14 comments · 12 min read · LW link

“Reframing Superintelligence” + LLMs + 4 years

Eric Drexler · 10 Jul 2023 13:42 UTC
117 points
9 comments · 12 min read · LW link

Winners of AI Alignment Awards Research Contest

13 Jul 2023 16:14 UTC
115 points
4 comments · 12 min read · LW link
(alignmentawards.com)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope · 23 Jul 2023 20:14 UTC
114 points
15 comments · 9 min read · LW link

Introducing bayescalc.io

Adele Lopez · 7 Jul 2023 16:11 UTC
114 points
29 comments · 1 min read · LW link
(bayescalc.io)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
111 points
14 comments · 6 min read · LW link

Consider Joining the UK Foundation Model Taskforce

Zvi · 10 Jul 2023 13:50 UTC
105 points
12 comments · 1 min read · LW link
(thezvi.wordpress.com)

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin · 12 Jul 2023 12:12 UTC
105 points
13 comments · 4 min read · LW link

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti · 21 Jul 2023 15:23 UTC
105 points
4 comments · 5 min read · LW link
(www.conjecture.dev)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read · LW link
(thezvi.wordpress.com)

Views on when AGI comes and on strategy to reduce existential risk

TsviBT · 8 Jul 2023 9:00 UTC
103 points
33 comments · 14 min read · LW link

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

26 Jul 2023 17:02 UTC
96 points
19 comments · 1 min read · LW link · 1 review

Fixed Point: a love story

Richard_Ngo · 8 Jul 2023 13:56 UTC
95 points
2 comments · 7 min read · LW link

When Someone Tells You They’re Lying, Believe Them

ymeskhout · 14 Jul 2023 0:31 UTC
95 points
3 comments · 3 min read · LW link

BCIs and the ecosystem of modular minds

beren · 21 Jul 2023 15:58 UTC
88 points
14 comments · 11 min read · LW link

“Justice, Cherryl.”

Zack_M_Davis · 23 Jul 2023 16:16 UTC
85 points
21 comments · 9 min read · LW link · 1 review

Apollo Neuro Results

Elizabeth · 30 Jul 2023 18:40 UTC
85 points
17 comments · 3 min read · LW link
(acesounderglass.com)

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?

lukemarks · 8 Jul 2023 11:42 UTC
84 points
28 comments · 2 min read · LW link

A $10k retroactive grant for VaccinateCA

Austin Chen · 27 Jul 2023 18:14 UTC
82 points
0 comments · 1 min read · LW link
(manifund.org)

Underwater Torture Chambers: The Horror Of Fish Farming

omnizoid · 26 Jul 2023 0:27 UTC
81 points
50 comments · 10 min read · LW link · 1 review

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad · 22 Jul 2023 18:09 UTC
80 points
2 comments · 2 min read · LW link

[UPDATE: deadline extended to July 24!] New wind in rationality’s sails: Applications for Epistea Residency 2023 are now open

11 Jul 2023 11:02 UTC
80 points
7 comments · 3 min read · LW link