In defense of probably wrong mechanistic models

evhub · Dec 6, 2022, 11:24 PM
55 points
10 comments · 2 min read · LW link

AI Safety in a Vulnerable World: Requesting Feedback on Preliminary Thoughts

Jordan Arel · Dec 6, 2022, 10:35 PM
4 points
2 comments · 3 min read · LW link

ChatGPT and the Human Race

Ben Reilly · Dec 6, 2022, 9:38 PM
6 points
1 comment · 3 min read · LW link

[Question] How do finite factored sets compare with phase space?

Alex_Altair · Dec 6, 2022, 8:05 PM
15 points
1 comment · 1 min read · LW link

Mesa-Optimizers via Grokking

orthonormal · Dec 6, 2022, 8:05 PM
36 points
4 comments · 6 min read · LW link

Using GPT-Eliezer against ChatGPT Jailbreaking

Dec 6, 2022, 7:54 PM
170 points
85 comments · 9 min read · LW link

The Parable of the Crimp

Phosphorous · Dec 6, 2022, 6:41 PM
11 points
3 comments · 3 min read · LW link

The Categorical Imperative Obscures

Gordon Seidoh Worley · Dec 6, 2022, 5:48 PM
17 points
17 comments · 2 min read · LW link

MIRI’s “Death with Dignity” in 60 seconds.

Cleo Nardo · Dec 6, 2022, 5:18 PM
58 points
4 comments · 1 min read · LW link

Things roll downhill

awenonian · Dec 6, 2022, 3:27 PM
19 points
0 comments · 1 min read · LW link

EA & LW Forums Weekly Summary (28th Nov − 4th Dec 22')

Zoe Williams · Dec 6, 2022, 9:38 AM
10 points
1 comment · 1 min read · LW link

Take 5: Another problem for natural abstractions is laziness.

Charlie Steiner · Dec 6, 2022, 7:00 AM
31 points
4 comments · 3 min read · LW link

Verification Is Not Easier Than Generation In General

johnswentworth · Dec 6, 2022, 5:20 AM
69 points
27 comments · 1 min read · LW link

Shh, don’t tell the AI it’s likely to be evil

naterush · Dec 6, 2022, 3:35 AM
19 points
9 comments · 1 min read · LW link

[Question] What are the major underlying divisions in AI safety?

Chris_Leong · Dec 6, 2022, 3:28 AM
5 points
2 comments · 1 min read · LW link

[Link] Why I’m optimistic about OpenAI’s alignment approach

janleike · Dec 5, 2022, 10:51 PM
98 points
15 comments · 1 min read · LW link
(aligned.substack.com)

The No Free Lunch theorem for dummies

Steven Byrnes · Dec 5, 2022, 9:46 PM
37 points
16 comments · 3 min read · LW link

ChatGPT and Ideological Turing Test

Viliam · Dec 5, 2022, 9:45 PM
42 points
1 comment · 1 min read · LW link

ChatGPT on Spielberg’s A.I. and AI Alignment

Bill Benzon · Dec 5, 2022, 9:10 PM
5 points
0 comments · 4 min read · LW link

Updating my AI timelines

Matthew Barnett · Dec 5, 2022, 8:46 PM
145 points
50 comments · 2 min read · LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

Dec 5, 2022, 8:28 PM
40 points
19 comments · 10 min read · LW link

College Admissions as a Brutal One-Shot Game

devansh · Dec 5, 2022, 8:05 PM
8 points
26 comments · 2 min read · LW link

Analysis of AI Safety surveys for field-building insights

Ash Jafari · Dec 5, 2022, 7:21 PM
11 points
2 comments · 5 min read · LW link

Testing Ways to Bypass ChatGPT’s Safety Features

Robert_AIZI · Dec 5, 2022, 6:50 PM
7 points
4 comments · 5 min read · LW link
(aizi.substack.com)

Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities

jacquesthibs · Dec 5, 2022, 4:09 PM
28 points
6 comments · 8 min read · LW link

Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence

Ronny Fernandez · Dec 5, 2022, 3:19 PM
19 points
5 comments · 7 min read · LW link

[Question] How should I judge the impact of giving $5k to a family of three kids and two mentally ill parents?

Blake · Dec 5, 2022, 1:42 PM
10 points
10 comments · 1 min read · LW link

Is the “Valley of Confused Abstractions” real?

jacquesthibs · Dec 5, 2022, 1:36 PM
20 points
11 comments · 2 min read · LW link

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner · Dec 5, 2022, 10:39 AM
37 points
4 comments · 1 min read · LW link

[Question] What are some good Lesswrong-related accounts or hashtags on Mastodon that I should follow?

SpectrumDT · Dec 5, 2022, 9:42 AM
2 points
0 comments · 1 min read · LW link

[Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone?

Optimization Process · Dec 5, 2022, 9:12 AM
72 points
54 comments · 1 min read · LW link

Monthly Shorts 11/22

Celer · Dec 5, 2022, 7:30 AM
8 points
0 comments · 3 min read · LW link
(keller.substack.com)

A ChatGPT story about ChatGPT doom

SurfingOrca · Dec 5, 2022, 5:40 AM
6 points
2 comments · 4 min read · LW link

A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability

Yitz · Dec 5, 2022, 5:33 AM
26 points
0 comments · 4 min read · LW link

Nook Nature

Duncan Sabien (Deactivated) · Dec 5, 2022, 4:10 AM
52 points
18 comments · 10 min read · LW link

Probably good projects for the AI safety ecosystem

Ryan Kidd · Dec 5, 2022, 2:26 AM
78 points
40 comments · 2 min read · LW link

Historical Notes on Charitable Funds

jefftk · Dec 4, 2022, 11:30 PM
28 points
0 comments · 3 min read · LW link
(www.jefftk.com)

AGI as a Black Swan Event

Stephen McAleese · Dec 4, 2022, 11:00 PM
8 points
8 comments · 7 min read · LW link

South Bay ACX/LW Pre-Holiday Get-Together

IS · Dec 4, 2022, 10:57 PM
10 points
0 comments · 1 min read · LW link

ChatGPT is settling the Chinese Room argument

averros · Dec 4, 2022, 8:25 PM
−7 points
7 comments · 1 min read · LW link

Race to the Top: Benchmarks for AI Safety

Isabella Duan · Dec 4, 2022, 6:48 PM
29 points
6 comments · 1 min read · LW link

Open & Welcome Thread—December 2022

niplav · Dec 4, 2022, 3:06 PM
8 points
22 comments · 1 min read · LW link

AI can exploit safety plans posted on the Internet

Peter S. Park · Dec 4, 2022, 12:17 PM
−15 points
4 comments · 1 min read · LW link

ChatGPT seems overconfident to me

qbolec · Dec 4, 2022, 8:03 AM
19 points
3 comments · 16 min read · LW link

Could an AI be Religious?

mk54 · Dec 4, 2022, 5:00 AM
−12 points
14 comments · 1 min read · LW link

Can GPT-3 Write Contra Dances?

jefftk · Dec 4, 2022, 3:00 AM
6 points
4 comments · 10 min read · LW link
(www.jefftk.com)

Take 3: No indescribable heavenworlds.

Charlie Steiner · Dec 4, 2022, 2:48 AM
23 points
12 comments · 2 min read · LW link

Summary of a new study on out-group hate (and how to fix it)

DirectedEvolution · Dec 4, 2022, 1:53 AM
60 points
30 comments · 3 min read · LW link
(www.pnas.org)

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj · Dec 3, 2022, 8:32 PM
1 point
8 comments · 1 min read · LW link

Logical induction for software engineers

Alex Flint · Dec 3, 2022, 7:55 PM
161 points
8 comments · 27 min read · LW link · 1 review