Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI 9 May 2023 21:54 UTC
9 points
0 comments 9 min read LW link
(aizi.substack.com)

Stopping dangerous AI: Ideal US behavior

Zach Stein-Perlman 9 May 2023 21:00 UTC
17 points
0 comments 3 min read LW link

Stopping dangerous AI: Ideal lab behavior

Zach Stein-Perlman 9 May 2023 21:00 UTC
8 points
0 comments 2 min read LW link

Progress links and tweets, 2023-05-09

jasoncrawford 9 May 2023 20:22 UTC
14 points
0 comments 2 min read LW link
(rootsofprogress.org)

[Question] Have you heard about MIT’s “liquid neural networks”? What do you think about them?

Ppau 9 May 2023 20:16 UTC
35 points
14 comments 1 min read LW link

Respect for Boundaries as non-arbirtrary coordination norms

Jonas Hallgren 9 May 2023 19:42 UTC
9 points
3 comments 7 min read LW link

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

9 May 2023 19:41 UTC
119 points
1 comment 10 min read LW link

Forecasting as a tool for teaching the general public to make better judgements?

Dominik Hajduk | České priority 9 May 2023 17:35 UTC
3 points
0 comments 3 min read LW link

Language models can explain neurons in language models

nz 9 May 2023 17:29 UTC
23 points
0 comments 1 min read LW link
(openai.com)

Asimov on building robots without the First Law

rossry 9 May 2023 16:44 UTC
4 points
1 comment 2 min read LW link

Making Up Baby Signs

jefftk 9 May 2023 16:40 UTC
44 points
6 comments 2 min read LW link
(www.jefftk.com)

Exciting New Interpretability Paper!

research_prime_space 9 May 2023 16:39 UTC
12 points
1 comment 1 min read LW link

Result Of The Bounty/Contest To Explain Infra-Bayes In The Language Of Game Theory

johnswentworth 9 May 2023 16:35 UTC
79 points
0 comments 1 min read LW link

The Bleak Harmony of Diets and Survival: A Glimpse into Nature’s Unforgiving Balance

bardstale 9 May 2023 16:08 UTC
−16 points
0 comments 1 min read LW link

Entropic Abyss

bardstale 9 May 2023 15:59 UTC
−12 points
0 comments 2 min read LW link

AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models

9 May 2023 15:26 UTC
28 points
1 comment 4 min read LW link
(newsletter.safe.ai)

A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens

Martin Fell 9 May 2023 14:36 UTC
26 points
9 comments 6 min read LW link

How to Interpret Prediction Market Prices as Probabilities

SimonM 9 May 2023 14:12 UTC
14 points
1 comment 4 min read LW link

Stampy’s AI Safety Info—New Distillations #2 [April 2023]

markov 9 May 2023 13:31 UTC
25 points
1 comment 1 min read LW link
(aisafety.info)

Quote quiz answer

jasoncrawford 9 May 2023 13:27 UTC
19 points
0 comments 4 min read LW link
(rootsofprogress.org)

[Question] Does reversible computation let you compute the complexity class PSPACE as efficiently as normal computers compute the complexity class P?

Noosphere89 9 May 2023 13:18 UTC
6 points
14 comments 1 min read LW link

EconTalk podcast: “Eliezer Yudkowsky on the Dangers of AI”

TekhneMakre 9 May 2023 11:14 UTC
15 points
1 comment 1 min read LW link
(www.econtalk.org)

Most people should probably feel safe most of the time

Kaj_Sotala 9 May 2023 9:35 UTC
95 points
28 comments 10 min read LW link

Summaries of top forum posts (1st to 7th May 2023)

Zoe Williams 9 May 2023 9:30 UTC
21 points
0 comments 1 min read LW link

Focusing on longevity research as a way to avoid the AI apocalypse

Random Trader 9 May 2023 4:47 UTC
14 points
2 comments 2 min read LW link

When is Goodhart catastrophic?

9 May 2023 3:59 UTC
179 points
28 comments 8 min read LW link

Chilean AIS Hackathon Retrospective

agucova 9 May 2023 1:34 UTC
9 points
0 comments 1 min read LW link

Announcing “Key Phenomena in AI Risk” (facilitated reading group)

9 May 2023 0:31 UTC
65 points
4 comments 2 min read LW link

Yoshua Bengio argues for tool-AI and to ban “executive-AI”

habryka 9 May 2023 0:13 UTC
53 points
15 comments 7 min read LW link
(yoshuabengio.org)

South Bay ACX/LW Meetup

IS 8 May 2023 23:55 UTC
2 points
0 comments 1 min read LW link

H-JEPA might be technically alignable in a modified form

Roman Leventov 8 May 2023 23:04 UTC
12 points
2 comments 7 min read LW link

All AGI Safety questions welcome (especially basic ones) [May 2023]

steven0461 8 May 2023 22:30 UTC
33 points
44 comments 2 min read LW link

Predictable updating about AI risk

Joe Carlsmith 8 May 2023 21:53 UTC
289 points
25 comments 36 min read LW link 1 review

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov 8 May 2023 21:26 UTC
18 points
2 comments 7 min read LW link
(yoshuabengio.org)

Are healthy choices effective for improving live expectancy anymore?

Christopher King 8 May 2023 21:25 UTC
6 points
4 comments 1 min read LW link

LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem

Steven Byrnes 8 May 2023 19:35 UTC
137 points
37 comments 15 min read LW link

Product Endorsement: Apollo Neuro

Elizabeth 8 May 2023 19:00 UTC
46 points
28 comments 5 min read LW link
(acesounderglass.com)

Acausal trade naturally results in the Nash bargaining solution

Christopher King 8 May 2023 18:13 UTC
3 points
0 comments 4 min read LW link

Inference Speed is Not Unbounded

OneManyNone 8 May 2023 16:24 UTC
35 points
32 comments 16 min read LW link

[Crosspost] Unveiling the American Public Opinion on AI Moratorium and Government Intervention: The Impact of Media Exposure

otto.barten 8 May 2023 14:09 UTC
7 points
0 comments 6 min read LW link
(forum.effectivealtruism.org)

Thriving in the Weird Times: Preparing for the 100X Economy

8 May 2023 13:44 UTC
23 points
16 comments 2 min read LW link

Housing and Transit Roundup #4

Zvi 8 May 2023 13:30 UTC
25 points
0 comments 11 min read LW link
(thezvi.wordpress.com)

Dance Profit Sharing

jefftk 8 May 2023 13:10 UTC
11 points
3 comments 2 min read LW link
(www.jefftk.com)

How “AGI” could end up being many different specialized AI’s stitched together

titotal 8 May 2023 12:32 UTC
9 points
2 comments 1 min read LW link

What does it take to ban a thing?

qbolec 8 May 2023 11:00 UTC
66 points
18 comments 5 min read LW link

Solomonoff’s solipsism

Mergimio H. Doefevmil 8 May 2023 6:55 UTC
−13 points
9 comments 1 min read LW link

A technical note on bilinear layers for interpretability

Lee Sharkey 8 May 2023 6:06 UTC
58 points
0 comments 1 min read LW link
(arxiv.org)

[Question] Is EDT correct? Does “EDT” == “logical EDT” == “logical CDT”?

Vivek Hebbar 8 May 2023 2:07 UTC
13 points
2 comments 1 min read LW link

LLM cognition is probably not human-like

Max H 8 May 2023 1:22 UTC
26 points
15 comments 7 min read LW link

[Question] If alignment problem was unsolvable, would that avoid doom?

Kinrany 7 May 2023 22:13 UTC
3 points
3 comments 1 min read LW link