[Linkpost] Jan Leike on three kinds of alignment taxes

Akash · 6 Jan 2023 23:57 UTC
27 points
2 comments · 3 min read · LW link
(aligned.substack.com)

The Limit of Language Models

DragonGod · 6 Jan 2023 23:53 UTC
44 points
26 comments · 4 min read · LW link

Why didn’t we get the four-hour workday?

jasoncrawford · 6 Jan 2023 21:29 UTC
139 points
34 comments · 6 min read · LW link
(rootsofprogress.org)

AI security might be helpful for AI alignment

Igor Ivanov · 6 Jan 2023 20:16 UTC
35 points
1 comment · 2 min read · LW link

Categorizing failures as “outer” or “inner” misalignment is often confused

Rohin Shah · 6 Jan 2023 15:48 UTC
93 points
21 comments · 8 min read · LW link

Definitions of “objective” should be Probable and Predictive

Rohin Shah · 6 Jan 2023 15:40 UTC
43 points
27 comments · 12 min read · LW link

200 COP in MI: Techniques, Tooling and Automation

Neel Nanda · 6 Jan 2023 15:08 UTC
13 points
0 comments · 15 min read · LW link

Ball Square Station and Ridership Maximization

jefftk · 6 Jan 2023 13:20 UTC
13 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Childhood Roundup #1

Zvi · 6 Jan 2023 13:00 UTC
83 points
27 comments · 8 min read · LW link
(thezvi.wordpress.com)

AI improving AI [MLAISU W01!]

Esben Kran · 6 Jan 2023 11:13 UTC
5 points
0 comments · 4 min read · LW link
(newsletter.apartresearch.com)

AI Safety Camp, Virtual Edition 2023

Linda Linsefors · 6 Jan 2023 11:09 UTC
40 points
10 comments · 3 min read · LW link
(aisafety.camp)

Kakistocuriosity

LVSN · 6 Jan 2023 7:38 UTC
7 points
3 comments · 1 min read · LW link

AI Safety Camp: Machine Learning for Scientific Discovery

Eleni Angelou · 6 Jan 2023 3:21 UTC
3 points
0 comments · 1 min read · LW link

Metaculus Year in Review: 2022

ChristianWilliams · 6 Jan 2023 1:23 UTC
6 points
0 comments · 1 min read · LW link

UDASSA

Jacob Falkovich · 6 Jan 2023 1:07 UTC
21 points
8 comments · 10 min read · LW link

The Involuntary Pacifists

Capybasilisk · 6 Jan 2023 0:28 UTC
11 points
3 comments · 2 min read · LW link

Get an Electric Toothbrush.

Cervera · 5 Jan 2023 21:08 UTC
21 points
4 comments · 1 min read · LW link

Discursive Competence in ChatGPT, Part 1: Talking with Dragons

Bill Benzon · 5 Jan 2023 21:01 UTC
2 points
0 comments · 6 min read · LW link

Transformative AI issues (not just misalignment): an overview

HoldenKarnofsky · 5 Jan 2023 20:20 UTC
34 points
6 comments · 18 min read · LW link
(www.cold-takes.com)

How to slow down scientific progress, according to Leo Szilard

jasoncrawford · 5 Jan 2023 18:26 UTC
134 points
18 comments · 2 min read · LW link
(rootsofprogress.org)

Paper: Superposition, Memorization, and Double Descent (Anthropic)

LawrenceC · 5 Jan 2023 17:54 UTC
53 points
11 comments · 1 min read · LW link
(transformer-circuits.pub)

Collapse Might Not Be Desirable

Dzoldzaya · 5 Jan 2023 17:29 UTC
−2 points
9 comments · 2 min read · LW link

Singapore—Small casual dinner in Chinatown #6

Joe Rocca · 5 Jan 2023 17:00 UTC
2 points
1 comment · 1 min read · LW link

[Question] Image generation and alignment

rpglover64 · 5 Jan 2023 16:05 UTC
3 points
3 comments · 1 min read · LW link

[Question] Machine Learning vs Differential Privacy

Ilio · 5 Jan 2023 15:14 UTC
10 points
10 comments · 1 min read · LW link

Covid 1/5/23: Various XBB Takes

Zvi · 5 Jan 2023 14:20 UTC
21 points
18 comments · 15 min read · LW link
(thezvi.wordpress.com)

Running by Default

jefftk · 5 Jan 2023 13:50 UTC
111 points
40 comments · 1 min read · LW link
(www.jefftk.com)

PSA: reward is part of the habit loop too

Alok Singh · 5 Jan 2023 11:00 UTC
22 points
2 comments · 1 min read · LW link
(alok.github.io)

Infohazards vs Fork Hazards

jimrandomh · 5 Jan 2023 9:45 UTC
68 points
16 comments · 1 min read · LW link

Monthly Shorts 12/22

Celer · 5 Jan 2023 7:20 UTC
5 points
2 comments · 1 min read · LW link
(keller.substack.com)

The 2021 Review Phase

Raemon · 5 Jan 2023 7:12 UTC
34 points
7 comments · 3 min read · LW link

Illusion of truth effect and Ambiguity effect: Bias in Evaluating AGI X-Risks

Remmelt · 5 Jan 2023 4:05 UTC
−13 points
2 comments · 1 min read · LW link

When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️

Jeffrey Ladish · 5 Jan 2023 1:21 UTC
25 points
10 comments · 2 min read · LW link

Why I’m joining Anthropic

evhub · 5 Jan 2023 1:12 UTC
121 points
4 comments · 1 min read · LW link

Contra Common Knowledge

abramdemski · 4 Jan 2023 22:50 UTC
52 points
31 comments · 16 min read · LW link

Additional space complexity isn’t always a useful metric

Brendan Long · 4 Jan 2023 21:53 UTC
4 points
3 comments · 3 min read · LW link
(www.brendanlong.com)

List of links for getting into AI safety

zef · 4 Jan 2023 19:45 UTC
6 points
0 comments · 1 min read · LW link

Opening Facebook Links Externally

jefftk · 4 Jan 2023 19:00 UTC
12 points
3 comments · 1 min read · LW link
(www.jefftk.com)

Conversational canyons

Henrik Karlsson · 4 Jan 2023 18:55 UTC
59 points
4 comments · 7 min read · LW link
(escapingflatland.substack.com)

Progress links and tweets, 2023-01-04

jasoncrawford · 4 Jan 2023 18:23 UTC
15 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

200 COP in MI: Analysing Training Dynamics

Neel Nanda · 4 Jan 2023 16:08 UTC
16 points
0 comments · 14 min read · LW link

What’s up with ChatGPT and the Turing Test?

4 Jan 2023 15:37 UTC
13 points
19 comments · 3 min read · LW link

2022 was the year AGI arrived (Just don’t call it that)

Logan Zoellner · 4 Jan 2023 15:19 UTC
102 points
60 comments · 3 min read · LW link

From Simon’s ant to machine learning, a parable

Bill Benzon · 4 Jan 2023 14:37 UTC
6 points
5 comments · 2 min read · LW link

Basic Facts about Language Model Internals

4 Jan 2023 13:01 UTC
130 points
19 comments · 9 min read · LW link

Ritual as the only tool for overwriting values and goals

mrcbarbier · 4 Jan 2023 11:11 UTC
40 points
24 comments · 32 min read · LW link

Normalcy bias and Base rate neglect: Bias in Evaluating AGI X-Risks

Remmelt · 4 Jan 2023 3:16 UTC
−16 points
0 comments · 1 min read · LW link

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC · 4 Jan 2023 0:07 UTC
19 points
0 comments · 8 min read · LW link

What makes a probability question “well-defined”? (Part II: Bertrand’s Paradox)

Noah Topper · 3 Jan 2023 22:39 UTC
7 points
3 comments · 9 min read · LW link
(naivebayes.substack.com)

“AI” is an indexical

TW123 · 3 Jan 2023 22:00 UTC
10 points
0 comments · 6 min read · LW link
(aiwatchtower.substack.com)