MATS Summer 2023 Retrospective

1 Dec 2023 23:29 UTC
77 points
34 comments 26 min read LW link

Complex systems research as a field (and its relevance to AI Alignment)

1 Dec 2023 22:10 UTC
64 points
11 comments 19 min read LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?

tailcalled 1 Dec 2023 22:01 UTC
24 points
6 comments 1 min read LW link

Benchmarking Bowtie2 Threading

jefftk 1 Dec 2023 20:20 UTC
9 points
0 comments 1 min read LW link
(www.jefftk.com)

Please Bet On My Quantified Self Decision Markets

niplav 1 Dec 2023 20:07 UTC
36 points
6 comments 6 min read LW link

Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video]

Writer 1 Dec 2023 19:30 UTC
19 points
0 comments 5 min read LW link
(youtu.be)

Carving up problems at their joints

Jakub Smékal 1 Dec 2023 18:48 UTC
1 point
0 comments 2 min read LW link
(jakubsmekal.com)

Queuing theory: Benefits of operating at 60% capacity

ampdot 1 Dec 2023 18:48 UTC
40 points
4 comments 1 min read LW link
(less.works)

Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)

ampdot 1 Dec 2023 18:48 UTC
14 points
0 comments 1 min read LW link
(airtable.com)

Kolmogorov Complexity Lays Bare the Soul

jakej 1 Dec 2023 18:29 UTC
5 points
8 comments 2 min read LW link

Thoughts on “AI is easy to control” by Pope & Belrose

Steven Byrnes 1 Dec 2023 17:30 UTC
197 points
64 comments 14 min read LW link 1 review

Why Did NEPA Peak in 2016?

Maxwell Tabarrok 1 Dec 2023 16:18 UTC
10 points
0 comments 3 min read LW link
(maximumprogress.substack.com)

Worlds where I wouldn’t worry about AI risk

adekcz 1 Dec 2023 16:06 UTC
2 points
0 comments 4 min read LW link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)

Joe Carlsmith 1 Dec 2023 14:51 UTC
10 points
1 comment 7 min read LW link

Reality is whatever you can get away with.

sometimesperson 1 Dec 2023 7:50 UTC
−5 points
0 comments 1 min read LW link

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev 1 Dec 2023 5:18 UTC
7 points
0 comments 29 min read LW link

[Question] Is OpenAI losing money on each request?

thenoviceoof 1 Dec 2023 3:27 UTC
8 points
8 comments 5 min read LW link

How useful is mechanistic interpretability?

1 Dec 2023 2:54 UTC
165 points
54 comments 25 min read LW link

FixDT

abramdemski 30 Nov 2023 21:57 UTC
59 points
15 comments 14 min read LW link 1 review

Generalization, from thermodynamics to statistical physics

Jesse Hoogland 30 Nov 2023 21:28 UTC
63 points
9 comments 28 min read LW link

What’s next for the field of Agent Foundations?

30 Nov 2023 17:55 UTC
59 points
23 comments 10 min read LW link

A Proposed Cure for Alzheimer’s Disease???

MadHatter 30 Nov 2023 17:37 UTC
4 points
30 comments 2 min read LW link

AI #40: A Vision from Vitalik

Zvi 30 Nov 2023 17:30 UTC
53 points
12 comments 42 min read LW link
(thezvi.wordpress.com)

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)

Joe Carlsmith 30 Nov 2023 16:43 UTC
8 points
0 comments 6 min read LW link

A Formula for Violence (and Its Antidote)

MadHatter 30 Nov 2023 16:04 UTC
−22 points
6 comments 1 min read LW link
(blog.simpleheart.org)

Enkrateia: a safe model-based reinforcement learning algorithm

MadHatter 30 Nov 2023 15:51 UTC
−15 points
4 comments 2 min read LW link
(github.com)

Normative Ethics vs Utilitarianism

Logan Zoellner 30 Nov 2023 15:36 UTC
6 points
0 comments 2 min read LW link
(midwitalignment.substack.com)

Information-Theoretic Boxing of Superintelligences

30 Nov 2023 14:31 UTC
30 points
0 comments 7 min read LW link

OpenAI: Altman Returns

Zvi 30 Nov 2023 14:10 UTC
66 points
12 comments 11 min read LW link
(thezvi.wordpress.com)

[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit

carboniferous_umbraculum 30 Nov 2023 14:01 UTC
9 points
0 comments 1 min read LW link
(drive.google.com)

[Question] Buy Nothing Day is a great idea with a terrible app— why has nobody built a killer app for crowdsourced ‘effective communism’ yet?

lillybaeum 30 Nov 2023 13:47 UTC
8 points
17 comments 1 min read LW link

[Question] Comprehensible Input is the only way people learn languages—is it the only way people *learn*?

lillybaeum 30 Nov 2023 13:31 UTC
8 points
2 comments 3 min read LW link

Some Intuitions for the Ethicophysics

30 Nov 2023 6:47 UTC
2 points
4 comments 8 min read LW link

The Alignment Agenda THEY Don’t Want You to Know About

MadHatter 30 Nov 2023 4:29 UTC
−18 points
16 comments 1 min read LW link

Cis fragility

[deactivated] 30 Nov 2023 4:14 UTC
−51 points
9 comments 3 min read LW link

Homework Answer: Glicko Ratings for War

MadHatter 30 Nov 2023 4:08 UTC
−43 points
1 comment 77 min read LW link
(gist.github.com)

[Question] Feature Request for LessWrong

MadHatter 30 Nov 2023 3:19 UTC
11 points
8 comments 1 min read LW link

My Alignment Research Agenda (“the Ethicophysics”)

MadHatter 30 Nov 2023 2:57 UTC
−13 points
0 comments 1 min read LW link

[Question] Stupid Question: Why am I getting consistently downvoted?

MadHatter 30 Nov 2023 0:21 UTC
28 points
132 comments 1 min read LW link

Inositol Non-Results

Elizabeth 29 Nov 2023 21:40 UTC
20 points
2 comments 1 min read LW link
(acesounderglass.com)

Losing Metaphors: Zip and Paste

jefftk 29 Nov 2023 20:31 UTC
26 points
6 comments 1 min read LW link
(www.jefftk.com)

Preserving our heritage: Building a movement and a knowledge ark for current and future generations

rnk8 29 Nov 2023 19:20 UTC
0 points
5 comments 12 min read LW link

AGI Alignment is Absurd

Youssef Mohamed 29 Nov 2023 19:11 UTC
−9 points
4 comments 3 min read LW link

The origins of the steam engine: An essay with interactive animated diagrams

jasoncrawford 29 Nov 2023 18:30 UTC
30 points
1 comment 1 min read LW link
(rootsofprogress.org)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5

VipulNaik 29 Nov 2023 18:11 UTC
33 points
16 comments 14 min read LW link

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)

Joe Carlsmith 29 Nov 2023 16:32 UTC
29 points
1 comment 11 min read LW link

Lying Alignment Chart

Zack_M_Davis 29 Nov 2023 16:15 UTC
77 points
17 comments 1 min read LW link

Rethink Priorities: Seeking Expressions of Interest for Special Projects Next Year

kierangreig 29 Nov 2023 13:59 UTC
4 points
0 comments 5 min read LW link

[Question] Thoughts on teletransportation with copies?

titotal 29 Nov 2023 12:56 UTC
15 points
13 comments 1 min read LW link

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall 29 Nov 2023 12:56 UTC
74 points
9 comments 4 min read LW link