Two problems with ‘Simulators’ as a frame

ryan_greenblatt · 17 Feb 2023 23:34 UTC
81 points
13 comments · 5 min read · LW link

GPT-4 Predictions

Stephen McAleese · 17 Feb 2023 23:20 UTC
109 points
27 comments · 11 min read · LW link

On Board Vision, Hollow Words, and the End of the World

Marcello · 17 Feb 2023 23:18 UTC
52 points
27 comments · 5 min read · LW link

PICT: A Zero-Shot Prompt Template to Automate Evaluation

Quentin FEUILLADE--MONTIXI · 17 Feb 2023 23:16 UTC
17 points
1 comment · 11 min read · LW link

Hunch seeds: Info bio

the gears to ascension · 17 Feb 2023 21:25 UTC
12 points
0 comments · 9 min read · LW link

Why Do We Believe

Screwtape · 17 Feb 2023 20:58 UTC
9 points
3 comments · 3 min read · LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz · 17 Feb 2023 20:50 UTC
63 points
28 comments · 1 min read · LW link

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper · 17 Feb 2023 20:48 UTC
49 points
9 comments · 12 min read · LW link

Tinker Bell Theory and LLMs

Fergus Fettes · 17 Feb 2023 20:23 UTC
1 point
11 comments · 1 min read · LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver · 17 Feb 2023 20:11 UTC
125 points
12 comments · 2 min read · LW link

Microsoft and OpenAI, stop telling chatbots to roleplay as AI

hold_my_fish · 17 Feb 2023 19:55 UTC
49 points
10 comments · 1 min read · LW link

A warm-up for the AI governance project

jacek · 17 Feb 2023 18:06 UTC
10 points
2 comments · 3 min read · LW link

Link Post > Blog Post

party girl · 17 Feb 2023 17:59 UTC
4 points
6 comments · 1 min read · LW link
(onthespectrumontheguestlist.substack.com)

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck · 17 Feb 2023 17:26 UTC
127 points
11 comments · 7 min read · LW link

[Question] Should we be kind and polite to emerging AIs?

David Gross · 17 Feb 2023 16:58 UTC
9 points
13 comments · 1 min read · LW link

Follow-up Posting on Cyborg Psychologist

Hopkins Stanley · 17 Feb 2023 16:56 UTC
0 points
2 comments · 1 min read · LW link
(www.lesswrong.com)

A “slow takeoff” might still look fast

MichaelDickens · 17 Feb 2023 16:51 UTC
5 points
3 comments · 1 min read · LW link

AI Safety Info Distillation Fellowship

17 Feb 2023 16:16 UTC
47 points
3 comments · 3 min read · LW link

Nozick’s Dilemma: A Critique of Game Theory

Edward P. Könings · 17 Feb 2023 16:11 UTC
10 points
1 comment · 13 min read · LW link

[Question] Are LLMs sufficient for AI takeoff?

rpglover64 · 17 Feb 2023 15:46 UTC
8 points
2 comments · 1 min read · LW link

Sydney’s Secret: A Short Story by Bing Chat

fela · 17 Feb 2023 13:31 UTC
36 points
1 comment · 5 min read · LW link

Automating Consistency

Hoagy · 17 Feb 2023 13:24 UTC
10 points
0 comments · 1 min read · LW link

Human decision processes are not well factored

17 Feb 2023 13:11 UTC
33 points
3 comments · 2 min read · LW link

2023 ACX Predictions: Buy/Sell/Hold

Zvi · 17 Feb 2023 13:10 UTC
25 points
3 comments · 20 min read · LW link
(thezvi.wordpress.com)

Bing chat is the AI fire alarm

Ratios · 17 Feb 2023 6:51 UTC
115 points
63 comments · 3 min read · LW link

Seeing more whole

Joe Carlsmith · 17 Feb 2023 5:12 UTC
30 points
1 comment · 26 min read · LW link

Powerful mesa-optimisation is already here

Roman Leventov · 17 Feb 2023 4:59 UTC
35 points
1 comment · 2 min read · LW link
(arxiv.org)

Self-Reference Breaks the Orthogonality Thesis

lsusr · 17 Feb 2023 4:11 UTC
40 points
35 comments · 2 min read · LW link

The public supports regulating AI for safety

Zach Stein-Perlman · 17 Feb 2023 4:10 UTC
114 points
9 comments · 1 min read · LW link
(aiimpacts.org)

Bring “Ban faster SIMD semiconductors” into the Overton window

worried-techno-optimist · 17 Feb 2023 3:27 UTC
−7 points
1 comment · 2 min read · LW link

Republishing an old essay in light of current news on Bing’s AI: “Regarding Blake Lemoine’s claim that LaMDA is ‘sentient’, he might be right (sorta), but perhaps not for the reasons he thinks”

philosophybear · 17 Feb 2023 3:27 UTC
3 points
0 comments · 5 min read · LW link
(philosophybear.substack.com)

How should AI systems behave, and who should decide? [OpenAI blog]

ShardPhoenix · 17 Feb 2023 1:05 UTC
22 points
2 comments · 1 min read · LW link
(openai.com)

The Ethics of ACI

Akira Pyinya · 16 Feb 2023 23:51 UTC
−8 points
0 comments · 3 min read · LW link

NYT: A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

trevor · 16 Feb 2023 22:57 UTC
53 points
5 comments · 7 min read · LW link
(www.nytimes.com)

[Question] What is a world-model?

Adam Shai · 16 Feb 2023 22:39 UTC
14 points
2 comments · 1 min read · LW link

Probability Theory: The Logic of Science, Jaynes

David Udell · 16 Feb 2023 21:57 UTC
29 points
0 comments · 18 min read · LW link

[Question] Is AGI communist?

MP · 16 Feb 2023 21:28 UTC
−10 points
3 comments · 1 min read · LW link

[Question] Is “goal-content integrity” still a problem?

G · 16 Feb 2023 20:46 UTC
−4 points
1 comment · 1 min read · LW link
(www.reddit.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC · 16 Feb 2023 19:47 UTC
65 points
9 comments · 1 min read · LW link
(arxiv.org)

Non-Uni­tary Quan­tum Logic—SERI MATS Re­search Sprint

Yegreg · 16 Feb 2023 19:31 UTC
27 points
0 comments · 7 min read · LW link

[Question] Looking for a post about vibing and banter

Introspective · 16 Feb 2023 19:28 UTC
1 point
1 comment · 1 min read · LW link

EIS V: Blind Spots In AI Safety Interpretability Research

scasper · 16 Feb 2023 19:09 UTC
54 points
24 comments · 10 min read · LW link

Why should ethical anti-realists do ethics?

Joe Carlsmith · 16 Feb 2023 16:27 UTC
38 points
7 comments · 27 min read · LW link

[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century?

Noosphere89 · 16 Feb 2023 15:25 UTC
56 points
66 comments · 1 min read · LW link

Covid 2/16/23: It All Seems Rather Quaint

Zvi · 16 Feb 2023 15:10 UTC
25 points
2 comments · 5 min read · LW link
(thezvi.wordpress.com)

Visualise your own probability of an AI catastrophe: an interactive Sankey plot

MNoetel · 16 Feb 2023 12:03 UTC
1 point
2 comments · 1 min read · LW link

A poem co-written by ChatGPT

Sherrinford · 16 Feb 2023 10:17 UTC
13 points
0 comments · 7 min read · LW link

Cambridge LW Rationality Practice: Being Specific

16 Feb 2023 6:37 UTC
2 points
0 comments · 1 min read · LW link

Hashing out long-standing disagreements seems low-value to me

So8res · 16 Feb 2023 6:20 UTC
133 points
34 comments · 4 min read · LW link

(Naïve) microeconomics of bundling goods

rossry · 16 Feb 2023 5:39 UTC
24 points
2 comments · 5 min read · LW link