[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”?

TurnTrout · Aug 11, 2022, 10:22 PM
87 points
43 comments · 2 min read

Encultured AI Pre-planning, Part 2: Providing a Service

Aug 11, 2022, 8:11 PM
33 points
4 comments · 3 min read

My summary of the alignment problem

Peter Hroššo · Aug 11, 2022, 7:42 PM
15 points
3 comments · 2 min read
(threadreaderapp.com)

Language models seem to be much better than humans at next-token prediction

Aug 11, 2022, 5:45 PM
182 points
60 comments · 13 min read · 1 review

Introducing Pastcasting: A tool for forecasting practice

Sage Future · Aug 11, 2022, 5:38 PM
95 points
10 comments · 2 min read · 2 reviews

Pendulums, Policy-Level Decisionmaking, Saving State

CFAR!Duncan · Aug 11, 2022, 4:47 PM
30 points
3 comments · 8 min read

Covid 8/11/22: The End Is Never The End

Zvi · Aug 11, 2022, 4:20 PM
28 points
11 comments · 16 min read
(thezvi.wordpress.com)

Singapore - Small casual dinner in Chinatown #4

Joe Rocca · Aug 11, 2022, 12:30 PM
3 points
3 comments · 1 min read

Thoughts on the good regulator theorem

JonasMoss · Aug 11, 2022, 12:08 PM
12 points
0 comments · 4 min read

How and why to turn everything into audio

Aug 11, 2022, 8:55 AM
55 points
20 comments · 5 min read

Shard Theory: An Overview

David Udell · Aug 11, 2022, 5:44 AM
166 points
34 comments · 10 min read

[Question] Do advancements in Decision Theory point towards moral absolutism?

Nathan1123 · Aug 11, 2022, 12:59 AM
0 points
4 comments · 4 min read

The alignment problem from a deep learning perspective

Richard_Ngo · Aug 10, 2022, 10:46 PM
107 points
15 comments · 27 min read · 1 review

How much alignment data will we need in the long run?

Jacob_Hilton · Aug 10, 2022, 9:39 PM
37 points
15 comments · 4 min read

On Ego, Reincarnation, Consciousness and The Universe

qmaury · Aug 10, 2022, 8:21 PM
−3 points
6 comments · 5 min read

Formalizing Alignment

Marv K · Aug 10, 2022, 6:50 PM
4 points
0 comments · 2 min read

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

Aug 10, 2022, 6:14 PM
28 points
30 comments · 11 min read

Emergent Abilities of Large Language Models [Linkpost]

aog · Aug 10, 2022, 6:02 PM
25 points
2 comments · 1 min read
(arxiv.org)

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth · Aug 10, 2022, 4:08 PM
209 points
34 comments · 3 min read · 1 review

Using GPT-3 to augment human intelligence

Henrik Karlsson · Aug 10, 2022, 3:54 PM
52 points
8 comments · 18 min read
(escapingflatland.substack.com)

ACX meetup [August]

sallatik · Aug 10, 2022, 9:54 AM
1 point
1 comment · 1 min read

Dissent Collusion

Screwtape · Aug 10, 2022, 2:43 AM
30 points
7 comments · 3 min read

The Medium Is The Bandage

party girl · Aug 10, 2022, 1:45 AM
11 points
0 comments · 10 min read

[Question] Why is increasing public awareness of AI safety not a priority?

FinalFormal2 · Aug 10, 2022, 1:28 AM
−5 points
14 comments · 1 min read

Manifold x CSPI $25k Forecasting Tournament

David Chee · Aug 9, 2022, 9:13 PM
5 points
0 comments · 1 min read
(www.cspicenter.com)

Proposal: Consider not using distance-direction-dimension words in abstract discussions

moridinamael · Aug 9, 2022, 8:44 PM
46 points
18 comments · 5 min read

[Question] How would two superintelligent AIs interact, if they are unaligned with each other?

Nathan1123 · Aug 9, 2022, 6:58 PM
4 points
6 comments · 1 min read

Disagreements about Alignment: Why, and how, we should try to solve them

ojorgensen · Aug 9, 2022, 6:49 PM
11 points
2 comments · 16 min read

Progress links and tweets, 2022-08-09

jasoncrawford · Aug 9, 2022, 5:35 PM
11 points
3 comments · 1 min read
(rootsofprogress.org)

[Question] Is it possible to find venture capital for AI research org with strong safety focus?

AnonResearch · Aug 9, 2022, 4:12 PM
6 points
1 comment · 1 min read

[Question] Many Gods refutation and Instrumental Goals. (Proper one)

aditya malik · Aug 9, 2022, 11:59 AM
0 points
15 comments · 1 min read

Content generation. Where do we draw the line?

Q Home · Aug 9, 2022, 10:51 AM
6 points
7 comments · 2 min read

[Question] What are some alternatives to Shapley values which drop additivity?

eapi · Aug 9, 2022, 9:16 AM
11 points
6 comments · 1 min read
(math.stackexchange.com)

Radio Bostrom: Audio narrations of papers by Nick Bostrom

PeterH · Aug 9, 2022, 8:56 AM
12 points
0 comments · 2 min read
(forum.effectivealtruism.org)

Team Shard Status Report

David Udell · Aug 9, 2022, 5:33 AM
38 points
8 comments · 3 min read

Announcing: Mechanism Design for AI Safety - Reading Group

Rubi J. Hudson · Aug 9, 2022, 4:21 AM
18 points
3 comments · 4 min read

[Question] What are some Works that might be useful but are difficult, so forgotten?

TekhneMakre · Aug 9, 2022, 2:22 AM
10 points
5 comments · 1 min read

Project proposal: Testing the IBP definition of agent

Aug 9, 2022, 1:09 AM
21 points
4 comments · 2 min read

How (not) to choose a research project

Aug 9, 2022, 12:26 AM
79 points
11 comments · 7 min read

[Question] Are ya winning, son?

Nathan1123 · Aug 9, 2022, 12:06 AM
14 points
13 comments · 2 min read

General alignment properties

TurnTrout · Aug 8, 2022, 11:40 PM
51 points
2 comments · 1 min read

Experiment: Be my math tutor?

sudo · Aug 8, 2022, 10:50 PM
12 points
5 comments · 1 min read

Encultured AI, Part 1 Appendix: Relevant Research Examples

Aug 8, 2022, 10:44 PM
11 points
1 comment · 7 min read

Encultured AI Pre-planning, Part 1: Enabling New Benchmarks

Aug 8, 2022, 10:44 PM
63 points
2 comments · 6 min read

Broad Basins and Data Compression

Aug 8, 2022, 8:33 PM
33 points
6 comments · 7 min read

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth · Aug 8, 2022, 6:05 PM
143 points
13 comments · 3 min read

LW Meetup @ DEFCON (Las Vegas) - 5-7pm Thu. Aug. 11 at Forum Food Court (Caesars)

jchan · Aug 8, 2022, 2:57 PM
6 points
0 comments · 1 min read

A sufficiently paranoid paperclip maximizer

RomanS · Aug 8, 2022, 11:17 AM
18 points
10 comments · 2 min read

[Question] Instrumental Goals and Many Gods Refutation

aditya malik · Aug 8, 2022, 10:46 AM
−10 points
4 comments · 1 min read

Area under the curve, Eat Dirt, Broccoli Errors, Copernicus & Chaos

CFAR!Duncan · Aug 8, 2022, 8:17 AM
41 points
0 comments · 7 min read