A few thoughts on my self-study for alignment research

Thomas Kehrenberg · Dec 30, 2022, 10:05 PM
6 points
0 comments · 2 min read · LW link

Christmas Microscopy

jefftk · Dec 30, 2022, 9:10 PM
27 points
0 comments · 1 min read · LW link
(www.jefftk.com)

What “upside” of AI?

False Name · Dec 30, 2022, 8:58 PM
0 points
5 comments · 4 min read · LW link

Evidence on recursive self-improvement from current ML

beren · Dec 30, 2022, 8:53 PM
31 points
12 comments · 6 min read · LW link

[Question] Is ChatGPT TAI?

Amal · Dec 30, 2022, 7:44 PM
14 points
5 comments · 1 min read · LW link

My thoughts on OpenAI’s alignment plan

Orpheus16 · Dec 30, 2022, 7:33 PM
55 points
3 comments · 20 min read · LW link

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

Akira Pyinya · Dec 30, 2022, 7:05 PM
10 points
4 comments · 14 min read · LW link

10 Years of LessWrong

SebastianG · Dec 30, 2022, 5:15 PM
73 points
2 comments · 4 min read · LW link

Chatbots as a Publication Format

derek shiller · Dec 30, 2022, 2:11 PM
6 points
6 comments · 4 min read · LW link

Human sexuality as an interesting case study of alignment

beren · Dec 30, 2022, 1:37 PM
39 points
26 comments · 3 min read · LW link

The Twitter Files: Covid Edition

Zvi · Dec 30, 2022, 1:30 PM
32 points
2 comments · 10 min read · LW link
(thezvi.wordpress.com)

Worldly Positions archive, briefly with private drafts

KatjaGrace · Dec 30, 2022, 12:20 PM
11 points
0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Models Don’t “Get Reward”

Sam Ringer · Dec 30, 2022, 10:37 AM
316 points
62 comments · 5 min read · LW link · 1 review

The hyperfinite timeline

Alok Singh · Dec 30, 2022, 9:30 AM
3 points
6 comments · 1 min read · LW link
(alok.github.io)

Reactive devaluation: Bias in Evaluating AGI X-Risks

Dec 30, 2022, 9:02 AM
−15 points
9 comments · 1 min read · LW link

Things I carry almost every day, as of late December 2022

DanielFilan · Dec 30, 2022, 7:40 AM
38 points
9 comments · 5 min read · LW link
(danielfilan.com)

More ways to spot abysses

KatjaGrace · Dec 30, 2022, 6:30 AM
21 points
1 comment · 1 min read · LW link
(worldspiritsockpuppet.com)

Language models are nearly AGIs but we don’t notice it because we keep shifting the bar

philosophybear · Dec 30, 2022, 5:15 AM
105 points
13 comments · 7 min read · LW link

Progress links and tweets, 2022-12-29

jasoncrawford · Dec 30, 2022, 4:54 AM
12 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

Announcing The Filan Cabinet

DanielFilan · Dec 30, 2022, 3:10 AM
21 points
2 comments · 1 min read · LW link
(danielfilan.com)

[Question] Effective Evil Causes?

Ulisse Mini · Dec 30, 2022, 2:56 AM
−12 points
2 comments · 1 min read · LW link

But is it really in Rome? An investigation of the ROME model editing technique

jacquesthibs · Dec 30, 2022, 2:40 AM
104 points
2 comments · 18 min read · LW link

A Year of AI Increasing AI Progress

TW123 · Dec 30, 2022, 2:09 AM
148 points
3 comments · 2 min read · LW link

Why not spend more time looking at human alignment?

ajc586 · Dec 30, 2022, 12:22 AM
11 points
3 comments · 1 min read · LW link

Why and how to write things on the Internet

benkuhn · Dec 29, 2022, 10:40 PM
20 points
2 comments · 15 min read · LW link
(www.benkuhn.net)

Friendly and Unfriendly AGI are Indistinguishable

ErgoEcho · Dec 29, 2022, 10:13 PM
−4 points
4 comments · 4 min read · LW link
(neologos.co)

200 COP in MI: Looking for Circuits in the Wild

Neel Nanda · Dec 29, 2022, 8:59 PM
16 points
5 comments · 13 min read · LW link

Thoughts on the implications of GPT-3, two years ago and NOW [here be dragons, we’re swimming, flying and talking with them]

Bill Benzon · Dec 29, 2022, 8:05 PM
0 points
0 comments · 5 min read · LW link

Covid 12/29/22: Next Up is XBB.1.5

Zvi · Dec 29, 2022, 6:20 PM
33 points
4 comments · 10 min read · LW link
(thezvi.wordpress.com)

Entrepreneurship ETG Might Be Better Than 80k Thought

Xodarap · Dec 29, 2022, 5:51 PM
33 points
0 comments · LW link

Internal Interfaces Are a High-Priority Interpretability Target

Thane Ruthenis · Dec 29, 2022, 5:49 PM
26 points
6 comments · 7 min read · LW link

CFP for Rebellion and Disobedience in AI workshop

Ram Rachum · Dec 29, 2022, 4:08 PM
15 points
0 comments · 1 min read · LW link

My scorched-earth policy on New Year’s resolutions

PatrickDFarley · Dec 29, 2022, 2:45 PM
29 points
2 comments · 4 min read · LW link

Don’t feed the void. She is fat enough!

Johannes C. Mayer · Dec 29, 2022, 2:18 PM
11 points
0 comments · 1 min read · LW link

[Question] Is there any unified resource on Eliezer’s fatigue?

Johannes C. Mayer · Dec 29, 2022, 2:04 PM
9 points
2 comments · 1 min read · LW link

Logical Probability of Goldbach’s Conjecture: Provable Rule or Coincidence?

avturchin · Dec 29, 2022, 1:37 PM
5 points
15 comments · 8 min read · LW link

Where do you get your capabilities from?

tailcalled · Dec 29, 2022, 11:39 AM
37 points
27 comments · 6 min read · LW link

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones · Dec 29, 2022, 11:30 AM
5 points
1 comment · 4 min read · LW link
(shape-of-code.com)

Infinite necklace: the line as a circle

Alok Singh · Dec 29, 2022, 10:41 AM
5 points
2 comments · 1 min read · LW link

Privacy Tradeoffs

jefftk · Dec 29, 2022, 3:40 AM
13 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world

philosophybear · Dec 29, 2022, 3:26 AM
21 points
43 comments · 8 min read · LW link

Large Language Models Suggest a Path to Ems

anithite · Dec 29, 2022, 2:20 AM
17 points
2 comments · 5 min read · LW link

[Question] Book recommendations for the history of ML?

Eleni Angelou · Dec 28, 2022, 11:50 PM
2 points
2 comments · 1 min read · LW link

Rock-Paper-Scissors Can Be Weird

winwonce · Dec 28, 2022, 11:12 PM
14 points
3 comments · 1 min read · LW link

200 COP in MI: The Case for Analysing Toy Language Models

Neel Nanda · Dec 28, 2022, 9:07 PM
40 points
3 comments · 7 min read · LW link

200 Concrete Open Problems in Mechanistic Interpretability: Introduction

Neel Nanda · Dec 28, 2022, 9:06 PM
106 points
0 comments · 10 min read · LW link

Effective ways to find love?

anonymoususer · Dec 28, 2022, 8:46 PM
9 points
6 comments · 1 min read · LW link

Classical logic based on propositions-as-subsingleton-types

Thomas Kehrenberg · Dec 28, 2022, 8:16 PM
5 points
0 comments · 16 min read · LW link

In Defense of Wrapper-Minds

Thane Ruthenis · Dec 28, 2022, 6:28 PM
24 points
38 comments · 3 min read · LW link

[Question] What is the best way to approach Expected Value calculations when payoffs are highly skewed?

jmh · Dec 28, 2022, 2:42 PM
8 points
16 comments · 1 min read · LW link