[Question] Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018?

Chris_Leong · Dec 14, 2022, 11:28 PM
8 points
0 comments · 1 min read · LW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch · Dec 14, 2022, 10:34 PM
72 points
7 comments · 13 min read · LW link

Aligning alignment with performance

Marv K · Dec 14, 2022, 10:19 PM
2 points
0 comments · 2 min read · LW link

Contrary to List of Lethality’s point 22, alignment’s door number 2

False Name · Dec 14, 2022, 10:01 PM
−2 points
5 comments · 22 min read · LW link

Kolmogorov Complexity and Simulation Hypothesis

False Name · Dec 14, 2022, 10:01 PM
−3 points
0 comments · 7 min read · LW link

[Question] Stanley Meyer’s water fuel cell

mikbp · Dec 14, 2022, 9:19 PM
2 points
6 comments · 1 min read · LW link

[Question] Is the AI timeline too short to have children?

Yoreth · Dec 14, 2022, 6:32 PM
38 points
20 comments · 1 min read · LW link

Predicting GPU performance

Dec 14, 2022, 4:27 PM
60 points
26 comments · 1 min read · LW link
(epochai.org)

[Incomplete] What is Computation Anyway?

DragonGod · Dec 14, 2022, 4:17 PM
16 points
1 comment · 13 min read · LW link
(arxiv.org)

Chair Hanging Peg

jefftk · Dec 14, 2022, 3:30 PM
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

My AGI safety research—2022 review, ’23 plans

Steven Byrnes · Dec 14, 2022, 3:15 PM
51 points
10 comments · 7 min read · LW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

Dec 14, 2022, 2:33 PM
29 points
5 comments · 11 min read · LW link

Key Mostly Outward-Facing Facts From the Story of VaccinateCA

Zvi · Dec 14, 2022, 1:30 PM
61 points
2 comments · 23 min read · LW link
(thezvi.wordpress.com)

Discovering Latent Knowledge in Language Models Without Supervision

Xodarap · Dec 14, 2022, 12:32 PM
45 points
1 comment · 1 min read · LW link
(arxiv.org)

[Question] COVID China Personal Advice (No mRNA vax, possible hospital overload, bug-chasing edition)

Lao Mein · Dec 14, 2022, 10:31 AM
20 points
11 comments · 1 min read · LW link

Beyond a better world

Davidmanheim · Dec 14, 2022, 10:18 AM
14 points
7 comments · 4 min read · LW link
(progressforum.org)

Proof as mere strong evidence

adamShimi · Dec 14, 2022, 8:56 AM
28 points
16 comments · 2 min read · LW link
(epistemologicalvigilance.substack.com)

Trying to disambiguate different questions about whether RLHF is “good”

Buck · Dec 14, 2022, 4:03 AM
106 points
47 comments · 7 min read · LW link · 1 review

[Question] How can one literally buy time (from x-risk) with money?

Alex_Altair · Dec 13, 2022, 7:24 PM
24 points
3 comments · 1 min read · LW link

[Question] Best introductory overviews of AGI safety?

JakubK · Dec 13, 2022, 7:01 PM
21 points
9 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Applications open for AGI Safety Fundamentals: Alignment Course

Richard_Ngo · Dec 13, 2022, 6:31 PM
49 points
0 comments · 2 min read · LW link

What Does It Mean to Align AI With Human Values?

Algon · Dec 13, 2022, 4:56 PM
8 points
3 comments · 1 min read · LW link
(www.quantamagazine.org)

It Takes Two Paracetamol?

Eli_ · Dec 13, 2022, 4:29 PM
33 points
10 comments · 2 min read · LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Dec 13, 2022, 3:41 PM
150 points
23 comments · 22 min read · LW link · 2 reviews

[Question] Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi · Dec 13, 2022, 3:41 PM
18 points
7 comments · 1 min read · LW link

Existential AI Safety is NOT separate from near-term applications

scasper · Dec 13, 2022, 2:47 PM
37 points
17 comments · 3 min read · LW link

What is the correlation between upvoting and benefit to readers of LW?

banev · Dec 13, 2022, 2:26 PM
7 points
15 comments · 1 min read · LW link

Limits of Superintelligence

Aleksei Petrenko · Dec 13, 2022, 12:19 PM
1 point
5 comments · 1 min read · LW link

Bay 2022 Solstice

Raemon · Dec 13, 2022, 8:58 AM
17 points
0 comments · 1 min read · LW link

Last day to nominate things for the Review. Also, 2019 books still exist.

Raemon · Dec 13, 2022, 8:53 AM
15 points
0 comments · 1 min read · LW link

AI alignment is distinct from its near-term applications

paulfchristiano · Dec 13, 2022, 7:10 AM
255 points
21 comments · 2 min read · LW link
(ai-alignment.com)

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner · Dec 13, 2022, 7:04 AM
37 points
3 comments · 2 min read · LW link

[Question] Are lawsuits against AGI companies extending AGI timelines?

SlowingAGI · Dec 13, 2022, 6:00 AM
1 point
1 comment · 1 min read · LW link

EA & LW Forums Weekly Summary (5th Dec – 11th Dec ’22)

Zoe Williams · Dec 13, 2022, 2:53 AM
7 points
0 comments · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · Dec 13, 2022, 2:17 AM
10 points
5 comments · 45 min read · LW link

Revisiting algorithmic progress

Dec 13, 2022, 1:39 AM
95 points
15 comments · 2 min read · LW link · 1 review
(arxiv.org)

An exploration of GPT-2’s embedding weights

Adam Scherlis · Dec 13, 2022, 12:46 AM
44 points
4 comments · 10 min read · LW link

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus16 · Dec 12, 2022, 10:36 PM
20 points
0 comments · 2 min read · LW link

Concept extrapolation for hypothesis generation

Dec 12, 2022, 10:09 PM
20 points
2 comments · 3 min read · LW link

Let’s go meta: Grammatical knowledge and self-referential sentences [ChatGPT]

Bill Benzon · Dec 12, 2022, 9:50 PM
5 points
0 comments · 9 min read · LW link

D&D.Sci December 2022 Evaluation and Ruleset

abstractapplic · Dec 12, 2022, 9:21 PM
17 points
8 comments · 2 min read · LW link

Log-odds are better than Probabilities

Robert_AIZI · Dec 12, 2022, 8:10 PM
22 points
4 comments · 4 min read · LW link
(aizi.substack.com)

Bengaluru LW/ACX Social Meetup—December 2022

faiz · Dec 12, 2022, 7:30 PM
4 points
0 comments · 1 min read · LW link

Psychological Disorders and Problems

Dec 12, 2022, 6:15 PM
39 points
6 comments · 1 min read · LW link

Confusing the goal and the path

adamShimi · Dec 12, 2022, 4:42 PM
44 points
7 comments · 1 min read · LW link
(epistemologicalvigilance.substack.com)

Meaningful things are those the universe possesses a semantics for

Abhimanyu Pallavi Sudhir · Dec 12, 2022, 4:03 PM
16 points
14 comments · 14 min read · LW link

Tradeoffs in complexity, abstraction, and generality

Dec 12, 2022, 3:55 PM
32 points
0 comments · 2 min read · LW link

Green Line Extension Opening Dates

jefftk · Dec 12, 2022, 2:40 PM
12 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Join the AI Testing Hackathon this Friday

Esben Kran · Dec 12, 2022, 2:24 PM
10 points
0 comments · LW link

Side-channels: input versus output

davidad · Dec 12, 2022, 12:32 PM
44 points
16 comments · 2 min read · LW link