AI

Core Tag

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring that powerful AI systems are aligned with human values and interests. The central concern is that a sufficiently powerful AI, if not designed and implemented with sufficient understanding, would optimize for something unintended by its creators and could pose an existential threat to the future of humanity. This is known as the AI alignment problem.

Common terms in this space include superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything in the broad cluster of work on understanding AI and its future impact on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you’re trying to get the AI to do something like cure Alzheimer’s disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you’re trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you’re trying to have the AI do that thing rather than make a lot of paperclips.

See also General Intelligence.

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Goodhart’s Law
Goal-Directedness
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb’s Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Simulator Theory
Sharp Left Turn
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Utility Functions
Whole Brain Emulation

Engineering Alignment

Agent Foundations
AI-assisted Alignment
AI Boxing (Containment)
Debate (AI safety technique)
Eliciting Latent Knowledge
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Interpretability (ML & AI)
Value Learning

Organizations

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MATS Program
MIRI
OpenAI
Ought
Redwood Research

Strategy

AI Alignment Fieldbuilding
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Risk Skepticism
AI Safety Public Materials
AI Services (CAIS)
AI Success Models
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials
AI Capabilities
Compute
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas

An overview of 11 pro­pos­als for build­ing safe ad­vanced AI

evhubMay 29, 2020, 8:38 PM
194 points
36 comments38 min readLW link2 reviews

There’s No Fire Alarm for Ar­tifi­cial Gen­eral Intelligence

Eliezer YudkowskyOct 13, 2017, 9:38 PM
124 points
71 comments25 min readLW link

Su­per­in­tel­li­gence FAQ

Scott AlexanderSep 20, 2016, 7:00 PM
92 points
16 comments27 min readLW link

Risks from Learned Op­ti­miza­tion: Introduction

May 31, 2019, 11:44 PM
166 points
42 comments12 min readLW link3 reviews

Embed­ded Agents

Oct 29, 2018, 7:53 PM
198 points
41 comments1 min readLW link2 reviews

What failure looks like

paulfchristianoMar 17, 2019, 8:18 PM
319 points
49 comments8 min readLW link2 reviews

The Rocket Align­ment Problem

Eliezer YudkowskyOct 4, 2018, 12:38 AM
198 points
42 comments15 min readLW link2 reviews

Challenges to Chris­ti­ano’s ca­pa­bil­ity am­plifi­ca­tion proposal

Eliezer YudkowskyMay 19, 2018, 6:18 PM
115 points
54 comments23 min readLW link1 review

Embed­ded Agency (full-text ver­sion)

Nov 15, 2018, 7:49 PM
143 points
15 comments54 min readLW link

A space of pro­pos­als for build­ing safe ad­vanced AI

Richard_NgoJul 10, 2020, 4:58 PM
55 points
4 comments4 min readLW link

Biol­ogy-In­spired AGI Timelines: The Trick That Never Works

Eliezer YudkowskyDec 1, 2021, 10:35 PM
181 points
143 comments65 min readLW link

PreDCA: vanessa kosoy’s al­ign­ment protocol

Tamsin LeakeAug 20, 2022, 10:03 AM
46 points
8 comments7 min readLW link
(carado.moe)

larger lan­guage mod­els may dis­ap­point you [or, an eter­nally un­finished draft]

nostalgebraistNov 26, 2021, 11:08 PM
237 points
29 comments31 min readLW link1 review

Deep­mind’s Go­pher—more pow­er­ful than GPT-3

hathDec 8, 2021, 5:06 PM
86 points
27 comments1 min readLW link
(deepmind.com)

Pro­ject pro­posal: Test­ing the IBP defi­ni­tion of agent

Aug 9, 2022, 1:09 AM
21 points
4 comments2 min readLW link

Good­hart Taxonomy

Scott GarrabrantDec 30, 2017, 4:38 PM
180 points
33 comments10 min readLW link

AI Align­ment 2018-19 Review

Rohin ShahJan 28, 2020, 2:19 AM
125 points
6 comments35 min readLW link

Some AI re­search ar­eas and their rele­vance to ex­is­ten­tial safety

Andrew_CritchNov 19, 2020, 3:18 AM
199 points
40 comments50 min readLW link2 reviews

Mo­ravec’s Para­dox Comes From The Availa­bil­ity Heuristic

james.lucassenOct 20, 2021, 6:23 AM
32 points
2 comments2 min readLW link
(jlucassen.com)

In­fer­ence cost limits the im­pact of ever larger models

SoerenMindOct 23, 2021, 10:51 AM
36 points
28 comments2 min readLW link

[Linkpost] Chi­nese gov­ern­ment’s guidelines on AI

RomanSDec 10, 2021, 9:10 PM
61 points
14 comments1 min readLW link

That Alien Message

Eliezer YudkowskyMay 22, 2008, 5:55 AM
304 points
173 comments10 min readLW link

Episte­molog­i­cal Fram­ing for AI Align­ment Research

adamShimiMar 8, 2021, 10:05 PM
53 points
7 comments9 min readLW link

Effi­cien­tZero: hu­man ALE sam­ple-effi­ciency w/​MuZero+self-supervised

gwernNov 2, 2021, 2:32 AM
134 points
52 comments1 min readLW link
(arxiv.org)

Dis­cus­sion with Eliezer Yud­kowsky on AGI interventions

Nov 11, 2021, 3:01 AM
325 points
257 comments34 min readLW link

Shul­man and Yud­kowsky on AI progress

Dec 3, 2021, 8:05 PM
90 points
16 comments20 min readLW link

Fu­ture ML Sys­tems Will Be Qual­i­ta­tively Different

jsteinhardtJan 11, 2022, 7:50 PM
113 points
10 comments5 min readLW link
(bounded-regret.ghost.io)

[Linkpost] Tro­janNet: Embed­ding Hid­den Tro­jan Horse Models in Neu­ral Networks

Gunnar_ZarnckeFeb 11, 2022, 1:17 AM
13 points
1 comment1 min readLW link

Briefly think­ing through some analogs of debate

Eli TyreSep 11, 2022, 12:02 PM
20 points
3 comments4 min readLW link

Ro­bust­ness to Scale

Scott GarrabrantFeb 21, 2018, 10:55 PM
109 points
22 comments2 min readLW link1 review

Chris Olah’s views on AGI safety

evhubNov 1, 2019, 8:13 PM
197 points
38 comments12 min readLW link2 reviews

[AN #96]: Buck and I dis­cuss/​ar­gue about AI Alignment

Rohin ShahApr 22, 2020, 5:20 PM
17 points
4 comments10 min readLW link
(mailchi.mp)

Matt Botv­inick on the spon­ta­neous emer­gence of learn­ing algorithms

Adam SchollAug 12, 2020, 7:47 AM
147 points
87 comments5 min readLW link

A de­scrip­tive, not pre­scrip­tive, overview of cur­rent AI Align­ment Research

Jun 6, 2022, 9:59 PM
126 points
21 comments7 min readLW link

Co­her­ence ar­gu­ments do not en­tail goal-di­rected behavior

Rohin ShahDec 3, 2018, 3:26 AM
101 points
69 comments7 min readLW link3 reviews

Align­ment By Default

johnswentworthAug 12, 2020, 6:54 PM
153 points
92 comments11 min readLW link2 reviews

Book re­view: “A Thou­sand Brains” by Jeff Hawkins

Steven ByrnesMar 4, 2021, 5:10 AM
110 points
18 comments19 min readLW link

Model­ling Trans­for­ma­tive AI Risks (MTAIR) Pro­ject: Introduction

Aug 16, 2021, 7:12 AM
89 points
0 comments9 min readLW link

In­fra-Bayesian phys­i­cal­ism: a for­mal the­ory of nat­u­ral­ized induction

Vanessa KosoyNov 30, 2021, 10:25 PM
98 points
20 comments42 min readLW link1 review

What an ac­tu­ally pes­simistic con­tain­ment strat­egy looks like

lcApr 5, 2022, 12:19 AM
554 points
136 comments6 min readLW link

Why I think strong gen­eral AI is com­ing soon

porbySep 28, 2022, 5:40 AM
269 points
126 comments34 min readLW link

AlphaGo Zero and the Foom Debate

Eliezer YudkowskyOct 21, 2017, 2:18 AM
89 points
17 comments3 min readLW link

Trade­off be­tween de­sir­able prop­er­ties for baseline choices in im­pact measures

VikaJul 4, 2020, 11:56 AM
37 points
24 comments5 min readLW link

Com­pe­ti­tion: Am­plify Ro­hin’s Pre­dic­tion on AGI re­searchers & Safety Concerns

stuhlmuellerJul 21, 2020, 8:06 PM
80 points
40 comments3 min readLW link

the scal­ing “in­con­sis­tency”: openAI’s new insight

nostalgebraistNov 7, 2020, 7:40 AM
146 points
14 comments9 min readLW link
(nostalgebraist.tumblr.com)

2019 Re­view Rewrite: Seek­ing Power is Often Ro­bustly In­stru­men­tal in MDPs

TurnTroutDec 23, 2020, 5:16 PM
35 points
0 comments4 min readLW link
(www.lesswrong.com)

Boot­strapped Alignment

Gordon Seidoh WorleyFeb 27, 2021, 3:46 PM
19 points
12 comments2 min readLW link

Mul­ti­modal Neu­rons in Ar­tifi­cial Neu­ral Networks

Kaj_SotalaMar 5, 2021, 9:01 AM
57 points
2 comments2 min readLW link
(distill.pub)

Re­view of “Fun with +12 OOMs of Com­pute”

Mar 28, 2021, 2:55 PM
60 points
20 comments8 min readLW link

Draft re­port on ex­is­ten­tial risk from power-seek­ing AI

Joe CarlsmithApr 28, 2021, 9:41 PM
80 points
23 comments1 min readLW link

Rogue AGI Em­bod­ies Valuable In­tel­lec­tual Property

Jun 3, 2021, 8:37 PM
70 points
9 comments3 min readLW link

Deep­Mind: Gen­er­ally ca­pa­ble agents emerge from open-ended play

Daniel KokotajloJul 27, 2021, 2:19 PM
247 points
53 comments2 min readLW link
(deepmind.com)

Analo­gies and Gen­eral Pri­ors on Intelligence

Aug 20, 2021, 9:03 PM
57 points
12 comments14 min readLW link

We’re already in AI takeoff

ValentineMar 8, 2022, 11:09 PM
120 points
115 comments7 min readLW link

It Looks Like You’re Try­ing To Take Over The World

gwernMar 9, 2022, 4:35 PM
386 points
125 comments1 min readLW link
(www.gwern.net)

In­ter­pretabil­ity’s Align­ment-Solv­ing Po­ten­tial: Anal­y­sis of 7 Scenarios

Evan R. MurphyMay 12, 2022, 8:01 PM
45 points
0 comments59 min readLW link

Why all the fuss about re­cur­sive self-im­prove­ment?

So8resJun 12, 2022, 8:53 PM
150 points
62 comments7 min readLW link

AI Safety bounty for prac­ti­cal ho­mo­mor­phic encryption

acylhalideAug 19, 2022, 12:27 PM
29 points
9 comments4 min readLW link

Paper: Dis­cov­er­ing novel al­gorithms with AlphaTen­sor [Deep­mind]

LawrenceCOct 5, 2022, 4:20 PM
80 points
18 comments1 min readLW link
(www.deepmind.com)

The Teacup Test

lsusrOct 8, 2022, 4:25 AM
71 points
28 comments2 min readLW link

Dis­con­tin­u­ous progress in his­tory: an update

KatjaGraceApr 14, 2020, 12:00 AM
179 points
25 comments31 min readLW link1 review
(aiimpacts.org)

Repli­ca­tion Dy­nam­ics Bridge to RL in Ther­mo­dy­namic Limit

Past AccountMay 18, 2020, 1:02 AM
6 points
1 comment2 min readLW link

The ground of optimization

Alex FlintJun 20, 2020, 12:38 AM
218 points
74 comments27 min readLW link1 review

Model­ling Con­tin­u­ous Progress

Sammy MartinJun 23, 2020, 6:06 PM
29 points
3 comments7 min readLW link

Refram­ing Su­per­in­tel­li­gence: Com­pre­hen­sive AI Ser­vices as Gen­eral Intelligence

Rohin ShahJan 8, 2019, 7:12 AM
118 points
75 comments5 min readLW link2 reviews
(www.fhi.ox.ac.uk)

Clas­sifi­ca­tion of AI al­ign­ment re­search: de­con­fu­sion, “good enough” non-su­per­in­tel­li­gent AI al­ign­ment, su­per­in­tel­li­gent AI alignment

philip_bJul 14, 2020, 10:48 PM
35 points
25 comments3 min readLW link

Col­lec­tion of GPT-3 results

Kaj_SotalaJul 18, 2020, 8:04 PM
89 points
24 comments1 min readLW link
(twitter.com)

Hiring en­g­ineers and re­searchers to help al­ign GPT-3

paulfchristianoOct 1, 2020, 6:54 PM
206 points
14 comments3 min readLW link

The date of AI Takeover is not the day the AI takes over

Daniel KokotajloOct 22, 2020, 10:41 AM
116 points
32 comments2 min readLW link1 review

[Question] What could one do with truly un­limited com­pu­ta­tional power?

YitzNov 11, 2020, 10:03 AM
30 points
22 comments2 min readLW link

AGI Predictions

Nov 21, 2020, 3:46 AM
110 points
36 comments4 min readLW link

[Question] What are the best prece­dents for in­dus­tries failing to in­vest in valuable AI re­search?

Daniel KokotajloDec 14, 2020, 11:57 PM
18 points
17 comments1 min readLW link

Ex­trap­o­lat­ing GPT-N performance

Lukas FinnvedenDec 18, 2020, 9:41 PM
103 points
31 comments25 min readLW link1 review

De­bate up­date: Obfus­cated ar­gu­ments problem

Beth BarnesDec 23, 2020, 3:24 AM
125 points
21 comments16 min readLW link

Liter­a­ture Re­view on Goal-Directedness

Jan 18, 2021, 11:15 AM
69 points
21 comments31 min readLW link

[Question] How will OpenAI + GitHub’s Copi­lot af­fect pro­gram­ming?

smountjoyJun 29, 2021, 4:42 PM
55 points
23 comments1 min readLW link

Model­ing Risks From Learned Optimization

Ben CottierOct 12, 2021, 8:54 PM
44 points
0 comments12 min readLW link

Truth­ful AI: Devel­op­ing and gov­ern­ing AI that does not lie

Oct 18, 2021, 6:37 PM
81 points
9 comments10 min readLW link

Effi­cien­tZero: How It Works

1a3ornNov 26, 2021, 3:17 PM
273 points
42 comments29 min readLW link

The­o­ret­i­cal Neu­ro­science For Align­ment Theory

Cameron BergDec 7, 2021, 9:50 PM
62 points
19 comments23 min readLW link

Magna Alta Doctrina

jacob_cannellDec 11, 2021, 9:54 PM
37 points
7 comments28 min readLW link

DL to­wards the un­al­igned Re­cur­sive Self-Op­ti­miza­tion attractor

jacob_cannellDec 18, 2021, 2:15 AM
32 points
22 comments4 min readLW link

Reg­u­lariza­tion Causes Mo­du­lar­ity Causes Generalization

dkirmaniJan 1, 2022, 11:34 PM
49 points
7 comments3 min readLW link

Is Gen­eral In­tel­li­gence “Com­pact”?

DragonGodJul 4, 2022, 1:27 PM
21 points
6 comments22 min readLW link

The Tree of Life: Stan­ford AI Align­ment The­ory of Change

Gabe MJul 2, 2022, 6:36 PM
22 points
0 comments14 min readLW link

Shard The­ory: An Overview

David UdellAug 11, 2022, 5:44 AM
135 points
34 comments10 min readLW link

How evolu­tion suc­ceeds and fails at value alignment

OcracokeAug 21, 2022, 7:14 AM
21 points
2 comments4 min readLW link

An Un­trol­lable Math­e­mat­i­cian Illustrated

abramdemskiMar 20, 2018, 12:00 AM
155 points
38 comments1 min readLW link1 review

Con­di­tions for Mesa-Optimization

Jun 1, 2019, 8:52 PM
75 points
48 comments12 min readLW link

Thoughts on Hu­man Models

Feb 21, 2019, 9:10 AM
124 points
32 comments10 min readLW link1 review

In­ner al­ign­ment in the brain

Steven ByrnesApr 22, 2020, 1:14 PM
76 points
16 comments16 min readLW link

Prob­lem re­lax­ation as a tactic

TurnTroutApr 22, 2020, 11:44 PM
113 points
8 comments7 min readLW link

[Question] How should po­ten­tial AI al­ign­ment re­searchers gauge whether the field is right for them?

TurnTroutMay 6, 2020, 12:24 PM
20 points
5 comments1 min readLW link

Speci­fi­ca­tion gam­ing: the flip side of AI ingenuity

May 6, 2020, 11:51 PM
46 points
8 comments6 min readLW link

Les­sons from Isaac: Pit­falls of Reason

adamShimiMay 8, 2020, 8:44 PM
9 points
0 comments8 min readLW link

Cor­rigi­bil­ity as out­side view

TurnTroutMay 8, 2020, 9:56 PM
36 points
11 comments4 min readLW link

[Question] How to choose a PhD with AI Safety in mind

kwiat.devMay 15, 2020, 10:19 PM
9 points
1 comment1 min readLW link

Re­ward func­tions and up­dat­ing as­sump­tions can hide a mul­ti­tude of sins

Stuart_ArmstrongMay 18, 2020, 3:18 PM
16 points
2 comments9 min readLW link

Pos­si­ble take­aways from the coro­n­avirus pan­demic for slow AI takeoff

VikaMay 31, 2020, 5:51 PM
135 points
36 comments3 min readLW link1 review

Fo­cus: you are al­lowed to be bad at ac­com­plish­ing your goals

adamShimiJun 3, 2020, 9:04 PM
19 points
17 comments3 min readLW link

Re­ply to Paul Chris­ti­ano on Inac­cessible Information

Alex FlintJun 5, 2020, 9:10 AM
77 points
15 comments6 min readLW link

Our take on CHAI’s re­search agenda in un­der 1500 words

Alex FlintJun 17, 2020, 12:24 PM
112 points
19 comments5 min readLW link

[Question] Ques­tion on GPT-3 Ex­cel Demo

Zhitao HouJun 22, 2020, 8:31 PM
0 points
2 comments1 min readLW link

Dy­namic in­con­sis­tency of the in­ac­tion and ini­tial state baseline

Stuart_ArmstrongJul 7, 2020, 12:02 PM
30 points
8 comments2 min readLW link

Cortés, Pizarro, and Afonso as Prece­dents for Takeover

Daniel KokotajloMar 1, 2020, 3:49 AM
145 points
75 comments11 min readLW link1 review

[Question] What prob­lem would you like to see Re­in­force­ment Learn­ing ap­plied to?

Julian SchrittwieserJul 8, 2020, 2:40 AM
43 points
4 comments1 min readLW link

My cur­rent frame­work for think­ing about AGI timelines

zhukeepaMar 30, 2020, 1:23 AM
107 points
5 comments3 min readLW link

[Question] To what ex­tent is GPT-3 ca­pa­ble of rea­son­ing?

TurnTroutJul 20, 2020, 5:10 PM
70 points
74 comments16 min readLW link

Repli­cat­ing the repli­ca­tion crisis with GPT-3?

skybrianJul 22, 2020, 9:20 PM
29 points
10 comments1 min readLW link

Can you get AGI from a Trans­former?

Steven ByrnesJul 23, 2020, 3:27 PM
114 points
39 comments12 min readLW link

Writ­ing with GPT-3

Jacob FalkovichJul 24, 2020, 3:22 PM
42 points
0 comments4 min readLW link

In­ner Align­ment: Ex­plain like I’m 12 Edition

Rafael HarthAug 1, 2020, 3:24 PM
175 points
46 comments13 min readLW link2 reviews

Devel­op­men­tal Stages of GPTs

orthonormalJul 26, 2020, 10:03 PM
140 points
74 comments7 min readLW link1 review

Gen­er­al­iz­ing the Power-Seek­ing Theorems

TurnTroutJul 27, 2020, 12:28 AM
40 points
6 comments4 min readLW link

Are we in an AI over­hang?

Andy JonesJul 27, 2020, 12:48 PM
255 points
109 comments4 min readLW link

[Question] What spe­cific dan­gers arise when ask­ing GPT-N to write an Align­ment Fo­rum post?

Matthew BarnettJul 28, 2020, 2:56 AM
44 points
14 comments1 min readLW link

[Question] Prob­a­bil­ity that other ar­chi­tec­tures will scale as well as Trans­form­ers?

Daniel KokotajloJul 28, 2020, 7:36 PM
22 points
4 comments1 min readLW link

What a 20-year-lead in mil­i­tary tech might look like

Daniel KokotajloJul 29, 2020, 8:10 PM
68 points
44 comments16 min readLW link

[Question] What if memes are com­mon in highly ca­pa­ble minds?

Daniel KokotajloJul 30, 2020, 8:45 PM
36 points
15 comments2 min readLW link

Three men­tal images from think­ing about AGI de­bate & corrigibility

Steven ByrnesAug 3, 2020, 2:29 PM
55 points
35 comments4 min readLW link

Solv­ing Key Align­ment Prob­lems Group

Logan RiggsAug 3, 2020, 7:30 PM
19 points
7 comments2 min readLW link

How eas­ily can we sep­a­rate a friendly AI in de­sign space from one which would bring about a hy­per­ex­is­ten­tial catas­tro­phe?

AnirandisSep 10, 2020, 12:40 AM
19 points
20 comments2 min readLW link

My com­pu­ta­tional frame­work for the brain

Steven ByrnesSep 14, 2020, 2:19 PM
144 points
26 comments13 min readLW link1 review

[Question] Where is hu­man level on text pre­dic­tion? (GPTs task)

Daniel KokotajloSep 20, 2020, 9:00 AM
27 points
19 comments1 min readLW link

Needed: AI in­fo­haz­ard policy

Vanessa KosoySep 21, 2020, 3:26 PM
61 points
17 comments2 min readLW link

The Col­lid­ing Ex­po­nen­tials of AI

VermillionOct 14, 2020, 11:31 PM
27 points
16 comments5 min readLW link

“Lit­tle glimpses of em­pa­thy” as the foun­da­tion for so­cial emotions

Steven ByrnesOct 22, 2020, 11:02 AM
31 points
1 comment5 min readLW link

In­tro­duc­tion to Carte­sian Frames

Scott GarrabrantOct 22, 2020, 1:00 PM
145 points
29 comments22 min readLW link1 review

“Carte­sian Frames” Talk #2 this Sun­day at 2pm (PT)

Rob BensingerOct 28, 2020, 1:59 PM
30 points
0 comments1 min readLW link

Does SGD Pro­duce De­cep­tive Align­ment?

Mark XuNov 6, 2020, 11:48 PM
85 points
6 comments16 min readLW link

[Question] How can I bet on short timelines?

Daniel KokotajloNov 7, 2020, 12:44 PM
43 points
16 comments2 min readLW link

Non-Ob­struc­tion: A Sim­ple Con­cept Mo­ti­vat­ing Corrigibility

TurnTroutNov 21, 2020, 7:35 PM
67 points
19 comments19 min readLW link

Carte­sian Frames Definitions

Rob BensingerNov 8, 2020, 12:44 PM
25 points
0 comments4 min readLW link

Com­mu­ni­ca­tion Prior as Align­ment Strategy

johnswentworthNov 12, 2020, 10:06 PM
40 points
8 comments6 min readLW link

How Rood­man’s GWP model trans­lates to TAI timelines

Daniel KokotajloNov 16, 2020, 2:05 PM
22 points
5 comments3 min readLW link

Normativity

abramdemskiNov 18, 2020, 4:52 PM
46 points
11 comments9 min readLW link

In­ner Align­ment in Salt-Starved Rats

Steven ByrnesNov 19, 2020, 2:40 AM
136 points
39 comments11 min readLW link2 reviews

Con­tin­u­ing the take­offs debate

Richard_NgoNov 23, 2020, 3:58 PM
67 points
13 comments9 min readLW link

The next AI win­ter will be due to en­ergy costs

hippkeNov 24, 2020, 4:53 PM
57 points
7 comments2 min readLW link

Re­cur­sive Quan­tiliz­ers II

abramdemskiDec 2, 2020, 3:26 PM
30 points
15 comments13 min readLW link

Su­per­vised learn­ing in the brain, part 4: com­pres­sion /​ filtering

Steven ByrnesDec 5, 2020, 5:06 PM
12 points
0 comments5 min readLW link

Con­ser­vatism in neo­cor­tex-like AGIs

Steven ByrnesDec 8, 2020, 4:37 PM
22 points
5 comments8 min readLW link

Avoid­ing Side Effects in Com­plex Environments

Dec 12, 2020, 12:34 AM
62 points
9 comments2 min readLW link
(avoiding-side-effects.github.io)

The Power of Annealing

meanderingmooseDec 14, 2020, 11:02 AM
25 points
6 comments5 min readLW link

[link] The AI Gir­lfriend Se­duc­ing China’s Lonely Men

Kaj_SotalaDec 14, 2020, 8:18 PM
34 points
11 comments1 min readLW link
(www.sixthtone.com)

Oper­a­tional­iz­ing com­pat­i­bil­ity with strat­egy-stealing

evhubDec 24, 2020, 10:36 PM
41 points
6 comments4 min readLW link

De­fus­ing AGI Danger

Mark XuDec 24, 2020, 10:58 PM
48 points
9 comments9 min readLW link

Multi-di­men­sional re­wards for AGI in­ter­pretabil­ity and control

Steven ByrnesJan 4, 2021, 3:08 AM
19 points
8 comments10 min readLW link

DALL-E by OpenAI

Daniel KokotajloJan 5, 2021, 8:05 PM
97 points
22 comments1 min readLW link

Re­view of ‘But ex­actly how com­plex and frag­ile?’

TurnTroutJan 6, 2021, 6:39 PM
55 points
0 comments8 min readLW link

The Case for a Jour­nal of AI Alignment

adamShimiJan 9, 2021, 6:13 PM
45 points
32 comments4 min readLW link

Trans­parency and AGI safety

jylin04Jan 11, 2021, 6:51 PM
52 points
12 comments30 min readLW link

Birds, Brains, Planes, and AI: Against Ap­peals to the Com­plex­ity/​Mys­te­ri­ous­ness/​Effi­ciency of the Brain

Daniel KokotajloJan 18, 2021, 12:08 PM
184 points
85 comments14 min readLW link1 review

In­fra-Bayesi­anism Unwrapped

adamShimiJan 20, 2021, 1:35 PM
41 points
0 comments24 min readLW link

Op­ti­mal play in hu­man-judged De­bate usu­ally won’t an­swer your question

Joe CollmanJan 27, 2021, 7:34 AM
33 points
12 comments12 min readLW link

Creat­ing AGI Safety Interlocks

Koen.HoltmanFeb 5, 2021, 12:01 PM
7 points
4 comments8 min readLW link

Timeline of AI safety

riceissaFeb 7, 2021, 10:29 PM
63 points
6 comments2 min readLW link
(timelines.issarice.com)

Tour­ne­sol, YouTube and AI Risk

adamShimiFeb 12, 2021, 6:56 PM
36 points
13 comments4 min readLW link

In­ter­net En­cy­clo­pe­dia of Philos­o­phy on Ethics of Ar­tifi­cial Intelligence

Kaj_SotalaFeb 20, 2021, 1:54 PM
15 points
1 comment4 min readLW link
(iep.utm.edu)

Be­hav­ioral Suffi­cient Statis­tics for Goal-Directedness

adamShimiMar 11, 2021, 3:01 PM
21 points
12 comments9 min readLW link

A sim­ple way to make GPT-3 fol­low instructions

Quintin PopeMar 8, 2021, 2:57 AM
11 points
5 comments4 min readLW link

Towards a Mechanis­tic Un­der­stand­ing of Goal-Directedness

Mark XuMar 9, 2021, 8:17 PM
45 points
1 comment5 min readLW link

AXRP Epi­sode 5 - In­fra-Bayesi­anism with Vanessa Kosoy

DanielFilanMar 10, 2021, 4:30 AM
33 points
12 comments35 min readLW link

Com­ments on “The Sin­gu­lar­ity is Nowhere Near”

Steven ByrnesMar 16, 2021, 11:59 PM
50 points
6 comments8 min readLW link

Is RL in­volved in sen­sory pro­cess­ing?

Steven ByrnesMar 18, 2021, 1:57 PM
21 points
21 comments5 min readLW link

Against evolu­tion as an anal­ogy for how hu­mans will cre­ate AGI

Steven ByrnesMar 23, 2021, 12:29 PM
44 points
25 comments25 min readLW link

My AGI Threat Model: Misal­igned Model-Based RL Agent

Steven ByrnesMar 25, 2021, 1:45 PM
66 points
40 comments16 min readLW link

Co­her­ence ar­gu­ments im­ply a force for goal-di­rected behavior

KatjaGraceMar 26, 2021, 4:10 PM
88 points
27 comments14 min readLW link
(aiimpacts.org)

Trans­parency Trichotomy

Mark XuMar 28, 2021, 8:26 PM
25 points
2 comments7 min readLW link

Hard­ware is already ready for the sin­gu­lar­ity. Al­gorithm knowl­edge is the only bar­rier.

Andrew VlahosMar 30, 2021, 10:48 PM
16 points
3 comments3 min readLW link

Ben Go­ertzel’s “Kinds of Minds”

JoshuaFoxApr 11, 2021, 12:41 PM
12 points
4 comments1 min readLW link

Up­dat­ing the Lot­tery Ticket Hypothesis

johnswentworthApr 18, 2021, 9:45 PM
73 points
41 comments2 min readLW link

Three rea­sons to ex­pect long AI timelines

Matthew BarnettApr 22, 2021, 6:44 PM
68 points
29 comments11 min readLW link
(matthewbarnett.substack.com)

Be­ware over-use of the agent model

Alex FlintApr 25, 2021, 10:19 PM
28 points
10 comments5 min readLW link1 review

Agents Over Carte­sian World Models

Apr 27, 2021, 2:06 AM
62 points
3 comments27 min readLW link

Less Real­is­tic Tales of Doom

Mark XuMay 6, 2021, 11:01 PM
110 points
13 comments4 min readLW link

Challenge: know ev­ery­thing that the best go bot knows about go

DanielFilanMay 11, 2021, 5:10 AM
48 points
93 comments2 min readLW link
(danielfilan.com)

For­mal In­ner Align­ment, Prospectus

abramdemskiMay 12, 2021, 7:57 PM
91 points
57 comments16 min readLW link

Agency in Con­way’s Game of Life

Alex FlintMay 13, 2021, 1:07 AM
97 points
81 comments9 min readLW link1 review

Knowl­edge Neu­rons in Pre­trained Transformers

evhubMay 17, 2021, 10:54 PM
98 points
7 comments2 min readLW link
(arxiv.org)

De­cou­pling de­liber­a­tion from competition

paulfchristianoMay 25, 2021, 6:50 PM
72 points
16 comments9 min readLW link
(ai-alignment.com)

Power dy­nam­ics as a blind spot or blurry spot in our col­lec­tive world-mod­el­ing, es­pe­cially around AI

Andrew_CritchJun 1, 2021, 6:45 PM
176 points
26 comments6 min readLW link

Game-the­o­retic Align­ment in terms of At­tain­able Utility

Jun 8, 2021, 12:36 PM
20 points
2 comments9 min readLW link

Beijing Academy of Ar­tifi­cial In­tel­li­gence an­nounces 1,75 trillion pa­ram­e­ters model, Wu Dao 2.0

OzyrusJun 3, 2021, 12:07 PM
23 points
9 comments1 min readLW link
(www.engadget.com)

An In­tu­itive Guide to Garrabrant Induction

Mark XuJun 3, 2021, 10:21 PM
115 points
18 comments24 min readLW link

Con­ser­va­tive Agency with Mul­ti­ple Stakeholders

TurnTroutJun 8, 2021, 12:30 AM
31 points
0 comments3 min readLW link

Sup­ple­ment to “Big pic­ture of pha­sic dopamine”

Steven ByrnesJun 8, 2021, 1:08 PM
13 points
2 comments9 min readLW link

Look­ing Deeper at Deconfusion

adamShimiJun 13, 2021, 9:29 PM
57 points
13 comments15 min readLW link

[Question] Open prob­lem: how can we quan­tify player al­ign­ment in 2x2 nor­mal-form games?

TurnTroutJun 16, 2021, 2:09 AM
23 points
59 comments1 min readLW link

Re­ward Is Not Enough

Steven ByrnesJun 16, 2021, 1:52 PM
105 points
18 comments10 min readLW link

En­vi­ron­men­tal Struc­ture Can Cause In­stru­men­tal Convergence

TurnTroutJun 22, 2021, 10:26 PM
71 points
44 comments16 min readLW link
(arxiv.org)

AXRP Epi­sode 9 - Finite Fac­tored Sets with Scott Garrabrant

DanielFilanJun 24, 2021, 10:10 PM
56 points
2 comments58 min readLW link

Mus­ings on gen­eral sys­tems alignment

Alex FlintJun 30, 2021, 6:16 PM
31 points
11 comments3 min readLW link

Thoughts on safety in pre­dic­tive learning

Steven ByrnesJun 30, 2021, 7:17 PM
18 points
17 comments19 min readLW link

The More Power At Stake, The Stronger In­stru­men­tal Con­ver­gence Gets For Op­ti­mal Policies

TurnTroutJul 11, 2021, 5:36 PM
45 points
7 comments6 min readLW link

A world in which the al­ign­ment prob­lem seems lower-stakes

TurnTroutJul 8, 2021, 2:31 AM
19 points
17 comments2 min readLW link

Frac­tional progress es­ti­mates for AI timelines and im­plied re­source requirements

Jul 15, 2021, 6:43 PM
55 points
6 comments7 min readLW link

Ex­per­i­men­ta­tion with AI-gen­er­ated images (VQGAN+CLIP) | So­larpunk air­ships flee­ing a dragon

Kaj_SotalaJul 15, 2021, 11:00 AM
44 points
4 comments2 min readLW link
(kajsotala.fi)

Seek­ing Power is Con­ver­gently In­stru­men­tal in a Broad Class of Environments

TurnTroutAug 8, 2021, 2:02 AM
41 points
15 comments8 min readLW link

LCDT, A My­opic De­ci­sion Theory

Aug 3, 2021, 10:41 PM
50 points
51 comments15 min readLW link

When Most VNM-Co­her­ent Prefer­ence Order­ings Have Con­ver­gent In­stru­men­tal Incentives

TurnTroutAug 9, 2021, 5:22 PM
52 points
4 comments5 min readLW link

Two AI-risk-re­lated game de­sign ideas

Daniel KokotajloAug 5, 2021, 1:36 PM
47 points
9 comments5 min readLW link

Re­search agenda update

Steven ByrnesAug 6, 2021, 7:24 PM
54 points
40 comments7 min readLW link

What 2026 looks like

Daniel KokotajloAug 6, 2021, 4:14 PM
371 points
109 comments16 min readLW link1 review

Satis­ficers Tend To Seek Power: In­stru­men­tal Con­ver­gence Via Retargetability

TurnTroutNov 18, 2021, 1:54 AM
69 points
8 comments17 min readLW link
(www.overleaf.com)

Dopamine-su­per­vised learn­ing in mam­mals & fruit flies

Steven ByrnesAug 10, 2021, 4:13 PM
16 points
6 comments8 min readLW link

Free course re­view — Reli­able and In­ter­pretable Ar­tifi­cial In­tel­li­gence (ETH Zurich)

Jan CzechowskiAug 10, 2021, 4:36 PM
7 points
0 comments3 min readLW link

Tech­ni­cal Pre­dic­tions Re­lated to AI Safety

lsusrAug 13, 2021, 12:29 AM
28 points
12 comments8 min readLW link

Provide feed­back on Open Philan­thropy’s AI al­ign­ment RFP

Aug 20, 2021, 7:52 PM
56 points
6 comments1 min readLW link

AI Safety Papers: An App for the TAI Safety Database

ozziegooenAug 21, 2021, 2:02 AM
74 points
13 comments2 min readLW link

Ran­dal Koene on brain un­der­stand­ing be­fore whole brain emulation

Steven ByrnesAug 23, 2021, 8:59 PM
36 points
12 comments3 min readLW link

MIRI/​OP ex­change about de­ci­sion theory

Rob BensingerAug 25, 2021, 10:44 PM
47 points
7 comments10 min readLW link

Good­hart Ethology

Charlie SteinerSep 17, 2021, 5:31 PM
18 points
4 comments14 min readLW link

[Question] What are good al­ign­ment con­fer­ence pa­pers?

adamShimiAug 28, 2021, 1:35 PM
12 points
2 comments1 min readLW link

Brain-Com­puter In­ter­faces and AI Alignment

niplavAug 28, 2021, 7:48 PM
31 points
6 comments11 min readLW link

Su­per­in­tel­li­gent In­tro­spec­tion: A Counter-ar­gu­ment to the Orthog­o­nal­ity Thesis

DirectedEvolutionAug 29, 2021, 4:53 AM
3 points
18 comments4 min readLW link

Align­ment Re­search = Con­cep­tual Align­ment Re­search + Ap­plied Align­ment Research

adamShimiAug 30, 2021, 9:13 PM
37 points
14 comments5 min readLW link

AXRP Epi­sode 11 - At­tain­able Utility and Power with Alex Turner

DanielFilanSep 25, 2021, 9:10 PM
19 points
5 comments52 min readLW link

Is progress in ML-as­sisted the­o­rem-prov­ing benefi­cial?

mako yassSep 28, 2021, 1:54 AM
10 points
3 comments1 min readLW link

Take­off Speeds and Discontinuities

Sep 30, 2021, 1:50 PM
62 points
1 comment15 min readLW link

My take on Vanessa Kosoy’s take on AGI safety

Steven ByrnesSep 30, 2021, 12:23 PM
84 points
10 comments31 min readLW link

[Pre­dic­tion] We are in an Al­gorith­mic Overhang

lsusrSep 29, 2021, 11:40 PM
31 points
14 comments1 min readLW link

In­ter­view with Skynet

lsusrSep 30, 2021, 2:20 AM
49 points
1 comment2 min readLW link

AI learns be­trayal and how to avoid it

Stuart_ArmstrongSep 30, 2021, 9:39 AM
30 points
4 comments2 min readLW link

The Dark Side of Cog­ni­tion Hypothesis

Cameron BergOct 3, 2021, 8:10 PM
19 points
1 comment16 min readLW link

[Question] How to think about and deal with OpenAI

Rafael HarthOct 9, 2021, 1:10 PM
107 points
71 comments1 min readLW link

NVIDIA and Microsoft re­leases 530B pa­ram­e­ter trans­former model, Me­ga­tron-Tur­ing NLG

OzyrusOct 11, 2021, 3:28 PM
51 points
36 comments1 min readLW link
(developer.nvidia.com)

Post­mod­ern Warfare

lsusrOct 25, 2021, 9:02 AM
61 points
25 comments2 min readLW link

A very crude de­cep­tion eval is already passed

Beth BarnesOct 29, 2021, 5:57 PM
105 points
8 comments2 min readLW link

Study Guide

johnswentworthNov 6, 2021, 1:23 AM
220 points
41 comments16 min readLW link

Re: At­tempted Gears Anal­y­sis of AGI In­ter­ven­tion Dis­cus­sion With Eliezer

lsusrNov 15, 2021, 10:02 AM
20 points
8 comments15 min readLW link

Ngo and Yud­kowsky on al­ign­ment difficulty

Nov 15, 2021, 8:31 PM
235 points
143 comments99 min readLW link

Cor­rigi­bil­ity Can Be VNM-Incoherent

TurnTroutNov 20, 2021, 12:30 AM
64 points
24 comments7 min readLW link

Visi­ble Thoughts Pro­ject and Bounty Announcement

So8resNov 30, 2021, 12:19 AM
245 points
104 comments13 min readLW link

In­ter­pret­ing Yud­kowsky on Deep vs Shal­low Knowledge

adamShimiDec 5, 2021, 5:32 PM
100 points
32 comments24 min readLW link

Are there al­ter­na­tive to solv­ing value trans­fer and ex­trap­o­la­tion?

Stuart_ArmstrongDec 6, 2021, 6:53 PM
19 points
7 comments5 min readLW link

Con­sid­er­a­tions on in­ter­ac­tion be­tween AI and ex­pected value of the fu­ture

Beth BarnesDec 7, 2021, 2:46 AM
64 points
28 comments4 min readLW link

Some thoughts on why ad­ver­sar­ial train­ing might be useful

Beth BarnesDec 8, 2021, 1:28 AM
9 points
5 comments3 min readLW link

The Plan

johnswentworthDec 10, 2021, 11:41 PM
235 points
77 comments14 min readLW link

Moore’s Law, AI, and the pace of progress

VeedracDec 11, 2021, 3:02 AM
120 points
39 comments24 min readLW link

Sum­mary of the Acausal At­tack Is­sue for AIXI

DiffractorDec 13, 2021, 8:16 AM
14 points
6 comments4 min readLW link

Con­se­quen­tial­ism & corrigibility

Steven ByrnesDec 14, 2021, 1:23 PM
60 points
27 comments7 min readLW link

Should we rely on the speed prior for safety?

Marc CarauleanuDec 14, 2021, 8:45 PM
14 points
6 comments5 min readLW link

The Case for Rad­i­cal Op­ti­mism about Interpretability

Quintin PopeDec 16, 2021, 11:38 PM
57 points
16 comments8 min readLW link1 review

Re­searcher in­cen­tives cause smoother progress on bench­marks

ryan_greenblattDec 21, 2021, 4:13 AM
20 points
4 comments1 min readLW link

Self-Or­ganised Neu­ral Net­works: A sim­ple, nat­u­ral and effi­cient way to intelligence

D𝜋Jan 1, 2022, 11:24 PM
41 points
51 comments44 min readLW link

Prizes for ELK proposals

paulfchristianoJan 3, 2022, 8:23 PM
141 points
156 comments7 min readLW link

D𝜋′s Spik­ing Network

lsusrJan 4, 2022, 4:08 AM
50 points
37 comments4 min readLW link

More Is Differ­ent for AI

jsteinhardtJan 4, 2022, 7:30 PM
137 points
22 comments3 min readLW link
(bounded-regret.ghost.io)

In­stru­men­tal Con­ver­gence For Real­is­tic Agent Objectives

TurnTroutJan 22, 2022, 12:41 AM
35 points
9 comments9 min readLW link

What’s Up With Con­fus­ingly Per­va­sive Con­se­quen­tial­ism?

RaemonJan 20, 2022, 7:22 PM
169 points
88 comments4 min readLW link

[In­tro to brain-like-AGI safety] 1. What’s the prob­lem & Why work on it now?

Steven ByrnesJan 26, 2022, 3:23 PM
119 points
19 comments23 min readLW link

Ar­gu­ments about Highly Reli­able Agent De­signs as a Use­ful Path to Ar­tifi­cial In­tel­li­gence Safety

Jan 27, 2022, 1:13 PM
27 points
0 comments1 min readLW link
(arxiv.org)

Com­pet­i­tive pro­gram­ming with AlphaCode

AlgonFeb 2, 2022, 4:49 PM
58 points
37 comments15 min readLW link
(deepmind.com)

Thoughts on AGI safety from the top

jylin04Feb 2, 2022, 8:06 PM
35 points
3 comments32 min readLW link

Paradigm-build­ing from first prin­ci­ples: Effec­tive al­tru­ism, AGI, and alignment

Cameron BergFeb 8, 2022, 4:12 PM
24 points
5 comments14 min readLW link

[In­tro to brain-like-AGI safety] 3. Two sub­sys­tems: Learn­ing & Steering

Steven ByrnesFeb 9, 2022, 1:09 PM
59 points
3 comments24 min readLW link

[In­tro to brain-like-AGI safety] 4. The “short-term pre­dic­tor”

Steven ByrnesFeb 16, 2022, 1:12 PM
51 points
11 comments13 min readLW link

ELK Pro­posal: Think­ing Via A Hu­man Imitator

TurnTroutFeb 22, 2022, 1:52 AM
28 points
6 comments11 min readLW link

Why I’m co-found­ing Aligned AI

Stuart_ArmstrongFeb 17, 2022, 7:55 PM
93 points
54 comments3 min readLW link

Im­pli­ca­tions of au­to­mated on­tol­ogy identification

Feb 18, 2022, 3:30 AM
67 points
29 comments23 min readLW link

Align­ment re­search exercises

Richard_NgoFeb 21, 2022, 8:24 PM
146 points
17 comments8 min readLW link

[In­tro to brain-like-AGI safety] 5. The “long-term pre­dic­tor”, and TD learning

Steven ByrnesFeb 23, 2022, 2:44 PM
41 points
25 comments21 min readLW link

How do new mod­els from OpenAI, Deep­Mind and An­thropic perform on Truth­fulQA?

Owain_EvansFeb 26, 2022, 12:46 PM
42 points
3 comments11 min readLW link

Es­ti­mat­ing Brain-Equiv­a­lent Com­pute from Image Recog­ni­tion Al­gorithms

Gunnar_ZarnckeFeb 27, 2022, 2:45 AM
14 points
4 comments2 min readLW link

[Link] Aligned AI AMA

Stuart_ArmstrongMar 1, 2022, 12:01 PM
18 points
0 comments1 min readLW link

[In­tro to brain-like-AGI safety] 6. Big pic­ture of mo­ti­va­tion, de­ci­sion-mak­ing, and RL

Steven ByrnesMar 2, 2022, 3:26 PM
41 points
13 comments16 min readLW link

[Question] Would (my­opic) gen­eral pub­lic good pro­duc­ers sig­nifi­cantly ac­cel­er­ate the de­vel­op­ment of AGI?

mako yassMar 2, 2022, 11:47 PM
25 points
10 comments1 min readLW link

[In­tro to brain-like-AGI safety] 7. From hard­coded drives to fore­sighted plans: A worked example

Steven ByrnesMar 9, 2022, 2:28 PM
56 points
0 comments9 min readLW link

[In­tro to brain-like-AGI safety] 9. Take­aways from neuro 2/​2: On AGI motivation

Steven ByrnesMar 23, 2022, 12:48 PM
31 points
6 comments23 min readLW link

Hu­mans pre­tend­ing to be robots pre­tend­ing to be human

Richard_KennawayMar 28, 2022, 3:13 PM
27 points
15 comments1 min readLW link

[In­tro to brain-like-AGI safety] 10. The al­ign­ment problem

Steven ByrnesMar 30, 2022, 1:24 PM
34 points
4 comments21 min readLW link

AXRP Epi­sode 13 - First Prin­ci­ples of AGI Safety with Richard Ngo

DanielFilanMar 31, 2022, 5:20 AM
24 points
1 comment48 min readLW link

Un­con­trol­lable Su­per-Pow­er­ful Explosives

Sammy MartinApr 2, 2022, 8:13 PM
53 points
12 comments5 min readLW link

The case for Do­ing Some­thing Else (if Align­ment is doomed)

Rafael HarthApr 5, 2022, 5:52 PM
81 points
14 comments2 min readLW link

[In­tro to brain-like-AGI safety] 11. Safety ≠ al­ign­ment (but they’re close!)

Steven ByrnesApr 6, 2022, 1:39 PM
25 points
1 comment10 min readLW link

Strate­gic Con­sid­er­a­tions Re­gard­ing Autis­tic/​Literal AI

Chris_LeongApr 6, 2022, 2:57 PM
−1 points
2 comments2 min readLW link

DALL·E 2 by OpenAI

P.Apr 6, 2022, 2:17 PM
44 points
51 comments1 min readLW link
(openai.com)

How to train your trans­former

p.b.Apr 7, 2022, 9:34 AM
6 points
0 comments8 min readLW link

AMA Con­jec­ture, A New Align­ment Startup

adamShimiApr 9, 2022, 9:43 AM
46 points
42 comments1 min readLW link

Worse than an un­al­igned AGI

ShmiApr 10, 2022, 3:35 AM
−1 points
12 comments1 min readLW link

[Question] Did OpenAI let GPT out of the box?

ChristianKlApr 16, 2022, 2:56 PM
4 points
12 comments1 min readLW link

In­stru­men­tal Con­ver­gence To Offer Hope?

michael_mjdApr 22, 2022, 1:56 AM
12 points
7 comments3 min readLW link

[In­tro to brain-like-AGI safety] 13. Sym­bol ground­ing & hu­man so­cial instincts

Steven ByrnesApr 27, 2022, 1:30 PM
54 points
13 comments14 min readLW link

[In­tro to brain-like-AGI safety] 14. Con­trol­led AGI

Steven ByrnesMay 11, 2022, 1:17 PM
26 points
25 comments18 min readLW link

[Question] What’s keep­ing con­cerned ca­pa­bil­ities gain re­searchers from leav­ing the field?

sovranMay 12, 2022, 12:16 PM
19 points
4 comments1 min readLW link

Read­ing the ethi­cists: A re­view of ar­ti­cles on AI in the jour­nal Science and Eng­ineer­ing Ethics

Charlie SteinerMay 18, 2022, 8:52 PM
50 points
8 comments14 min readLW link

Con­fused why a “ca­pa­bil­ities re­search is good for al­ign­ment progress” po­si­tion isn’t dis­cussed more

Kaj_SotalaJun 2, 2022, 9:41 PM
132 points
26 comments4 min readLW link

I’m try­ing out “as­ter­oid mind­set”

Alex_AltairJun 3, 2022, 1:35 PM
85 points
5 comments4 min readLW link

An­nounc­ing the Align­ment of Com­plex Sys­tems Re­search Group

Jun 4, 2022, 4:10 AM
79 points
18 comments5 min readLW link

AGI Ruin: A List of Lethalities

Eliezer YudkowskyJun 5, 2022, 10:05 PM
725 points
653 comments30 min readLW link

Yes, AI re­search will be sub­stan­tially cur­tailed if a lab causes a ma­jor disaster

lcJun 14, 2022, 10:17 PM
96 points
35 comments2 min readLW link

Lamda is not an LLM

KevinJun 19, 2022, 11:13 AM
7 points
10 comments1 min readLW link
(www.wired.com)

Google’s new text-to-image model—Parti, a demon­stra­tion of scal­ing benefits

KaydenJun 22, 2022, 8:00 PM
32 points
4 comments1 min readLW link

[Link] OpenAI: Learn­ing to Play Minecraft with Video PreTrain­ing (VPT)

Aryeh EnglanderJun 23, 2022, 4:29 PM
53 points
3 comments1 min readLW link

An­nounc­ing Epoch: A re­search or­ga­ni­za­tion in­ves­ti­gat­ing the road to Trans­for­ma­tive AI

Jun 27, 2022, 1:55 PM
95 points
2 comments2 min readLW link
(epochai.org)

Paper: Fore­cast­ing world events with neu­ral nets

Jul 1, 2022, 7:40 PM
39 points
3 comments4 min readLW link

Naive Hy­pothe­ses on AI Alignment

Shoshannah TekofskyJul 2, 2022, 7:03 PM
89 points
29 comments5 min readLW link

Hu­mans provide an un­tapped wealth of ev­i­dence about alignment

Jul 14, 2022, 2:31 AM
175 points
92 comments10 min readLW link

Ex­am­ples of AI In­creas­ing AI Progress

TW123Jul 17, 2022, 8:06 PM
104 points
14 comments1 min readLW link

Fore­cast­ing ML Bench­marks in 2023

jsteinhardtJul 18, 2022, 2:50 AM
36 points
19 comments12 min readLW link
(bounded-regret.ghost.io)

Ro­bust­ness to Scal­ing Down: More Im­por­tant Than I Thought

adamShimiJul 23, 2022, 11:40 AM
37 points
5 comments3 min readLW link

Com­par­ing Four Ap­proaches to In­ner Alignment

Lucas TeixeiraJul 29, 2022, 9:06 PM
33 points
1 comment9 min readLW link

Where are the red lines for AI?

Karl von WendtAug 5, 2022, 9:34 AM
23 points
8 comments6 min readLW link

Jack Clark on the re­al­ities of AI policy

Kaj_SotalaAug 7, 2022, 8:44 AM
66 points
3 comments3 min readLW link
(threadreaderapp.com)

GD’s Im­plicit Bias on Separable Data

Xander DaviesOct 17, 2022, 4:13 AM
23 points
0 comments7 min readLW link

AI Trans­parency: Why it’s crit­i­cal and how to ob­tain it.

Zohar JacksonAug 14, 2022, 10:31 AM
6 points
1 comment5 min readLW link

Brain-like AGI pro­ject “ain­telope”

Gunnar_ZarnckeAug 14, 2022, 4:33 PM
48 points
2 comments1 min readLW link

A Mechanis­tic In­ter­pretabil­ity Anal­y­sis of Grokking

Aug 15, 2022, 2:41 AM
338 points
39 comments42 min readLW link
(colab.research.google.com)

What if we ap­proach AI safety like a tech­ni­cal en­g­ineer­ing safety problem

zeshenAug 20, 2022, 10:29 AM
30 points
5 comments7 min readLW link

AI art isn’t “about to shake things up”. It’s already here.

Davis_KingsleyAug 22, 2022, 11:17 AM
65 points
19 comments3 min readLW link

Some con­cep­tual al­ign­ment re­search projects

Richard_NgoAug 25, 2022, 10:51 PM
168 points
14 comments3 min readLW link

Lev­el­ling Up in AI Safety Re­search Engineering

Gabe MSep 2, 2022, 4:59 AM
40 points
7 comments17 min readLW link

The shard the­ory of hu­man values

Sep 4, 2022, 4:28 AM
202 points
57 comments24 min readLW link

Quintin’s al­ign­ment pa­pers roundup—week 1

Quintin PopeSep 10, 2022, 6:39 AM
119 points
5 comments9 min readLW link

LOVE in a sim­box is all you need

jacob_cannellSep 28, 2022, 6:25 PM
59 points
69 comments44 min readLW link

A shot at the di­a­mond-al­ign­ment problem

TurnTroutOct 6, 2022, 6:29 PM
77 points
53 comments15 min readLW link

More ex­am­ples of goal misgeneralization

Oct 7, 2022, 2:38 PM
51 points
8 comments2 min readLW link
(deepmindsafetyresearch.medium.com)

[Cross­post] AlphaTen­sor, Taste, and the Scal­a­bil­ity of AI

jamierumbelowOct 9, 2022, 7:42 PM
16 points
4 comments1 min readLW link
(jamieonsoftware.com)

QAPR 4: In­duc­tive biases

Quintin PopeOct 10, 2022, 10:08 PM
63 points
2 comments18 min readLW link

In­finite Pos­si­bil­ity Space and the Shut­down Problem

magfrumpOct 18, 2022, 5:37 AM
6 points
0 comments2 min readLW link
(www.magfrump.net)

Cruxes in Katja Grace’s Counterarguments

azsantoskOct 16, 2022, 8:44 AM
16 points
0 comments7 min readLW link

Deep­Mind on Strat­ego, an im­perfect in­for­ma­tion game

sanxiynOct 24, 2022, 5:57 AM
15 points
9 comments1 min readLW link
(arxiv.org)

An­nounc­ing: What Fu­ture World? - Grow­ing the AI Gover­nance Community

DavidCorfieldNov 2, 2022, 1:24 AM
1 point
0 comments1 min readLW link

Poster Ses­sion on AI Safety

Neil CrawfordNov 12, 2022, 3:50 AM
7 points
6 comments1 min readLW link

AI will change the world, but won’t take it over by play­ing “3-di­men­sional chess”.

Nov 22, 2022, 6:57 PM
103 points
86 comments24 min readLW link

A challenge for AGI or­ga­ni­za­tions, and a challenge for readers

Dec 1, 2022, 11:11 PM
265 points
30 comments2 min readLW link

Towards Hodge-podge Alignment

Cleo NardoDec 19, 2022, 8:12 PM
65 points
26 comments9 min readLW link

[AN #94]: AI al­ign­ment as trans­la­tion be­tween hu­mans and machines

Rohin ShahApr 8, 2020, 5:10 PM
11 points
0 comments7 min readLW link
(mailchi.mp)

[Question] What are the rel­a­tive speeds of AI ca­pa­bil­ities and AI safety?

NunoSempereApr 24, 2020, 6:21 PM
8 points
2 comments1 min readLW link

Seek­ing Power is Often Con­ver­gently In­stru­men­tal in MDPs

Dec 5, 2019, 2:33 AM
153 points
38 comments16 min readLW link2 reviews
(arxiv.org)

“Don’t even think about hell”

emmabMay 2, 2020, 8:06 AM
6 points
2 comments1 min readLW link

[Question] AI Box­ing for Hard­ware-bound agents (aka the China al­ign­ment prob­lem)

Logan ZoellnerMay 8, 2020, 3:50 PM
11 points
27 comments10 min readLW link

Point­ing to a Flower

johnswentworthMay 18, 2020, 6:54 PM
59 points
18 comments9 min readLW link

Learn­ing and ma­nipu­lat­ing learning

Stuart_ArmstrongMay 19, 2020, 1:02 PM
39 points
5 comments10 min readLW link

[Question] Why aren’t we test­ing gen­eral in­tel­li­gence dis­tri­bu­tion?

B JacobsMay 26, 2020, 4:07 PM
25 points
7 comments1 min readLW link

OpenAI an­nounces GPT-3

gwernMay 29, 2020, 1:49 AM
67 points
23 comments1 min readLW link
(arxiv.org)

GPT-3: a dis­ap­point­ing paper

nostalgebraistMay 29, 2020, 7:06 PM
65 points
44 comments8 min readLW link1 review

In­tro­duc­tion to Ex­is­ten­tial Risks from Ar­tifi­cial In­tel­li­gence, for an EA audience

JoshuaFoxJun 2, 2020, 8:30 AM
10 points
1 comment1 min readLW link

Prepar­ing for “The Talk” with AI projects

Daniel KokotajloJun 13, 2020, 11:01 PM
64 points
16 comments3 min readLW link

[Question] What are the high-level ap­proaches to AI al­ign­ment?

Gordon Seidoh WorleyJun 16, 2020, 5:10 PM
12 points
13 comments1 min readLW link

Re­sults of $1,000 Or­a­cle con­test!

Stuart_ArmstrongJun 17, 2020, 5:44 PM
58 points
2 comments1 min readLW link

[Question] Like­li­hood of hy­per­ex­is­ten­tial catas­tro­phe from a bug?

AnirandisJun 18, 2020, 4:23 PM
13 points
27 comments1 min readLW link

AI Benefits Post 1: In­tro­duc­ing “AI Benefits”

CullenJun 22, 2020, 4:59 PM
11 points
3 comments3 min readLW link

Goals and short descriptions

Michele CampoloJul 2, 2020, 5:41 PM
14 points
8 comments5 min readLW link

Re­search ideas to study hu­mans with AI Safety in mind

Riccardo VolpatoJul 3, 2020, 4:01 PM
23 points
2 comments5 min readLW link

AI Benefits Post 3: Direct and Indi­rect Ap­proaches to AI Benefits

CullenJul 6, 2020, 6:48 PM
8 points
0 comments2 min readLW link

An­titrust-Com­pli­ant AI In­dus­try Self-Regulation

CullenJul 7, 2020, 8:53 PM
9 points
3 comments1 min readLW link
(cullenokeefe.com)

Should AI Be Open?

Scott AlexanderDec 17, 2015, 8:25 AM
20 points
3 comments13 min readLW link

Meta Pro­gram­ming GPT: A route to Su­per­in­tel­li­gence?

dmteaJul 11, 2020, 2:51 PM
10 points
7 comments4 min readLW link

The Dilemma of Worse Than Death Scenarios

arkaeikJul 10, 2018, 9:18 AM
5 points
18 comments4 min readLW link

[Question] What are the mostly likely ways AGI will emerge?

Craig QuiterJul 14, 2020, 12:58 AM
3 points
7 comments1 min readLW link

AI Benefits Post 4: Out­stand­ing Ques­tions on Select­ing Benefits

CullenJul 14, 2020, 5:26 PM
4 points
4 comments5 min readLW link

Solv­ing Math Prob­lems by Relay

Jul 17, 2020, 3:32 PM
98 points
26 comments7 min readLW link

AI Benefits Post 5: Out­stand­ing Ques­tions on Govern­ing Benefits

CullenJul 21, 2020, 4:46 PM
4 points
0 comments4 min readLW link

[Question] Why is pseudo-al­ign­ment “worse” than other ways ML can fail to gen­er­al­ize?

nostalgebraistJul 18, 2020, 10:54 PM
45 points
10 comments2 min readLW link

[Question] “Do Noth­ing” util­ity func­tion, 3½ years later?

niplavJul 20, 2020, 11:09 AM
5 points
3 comments1 min readLW link

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin ShahJan 2, 2020, 6:20 PM
34 points
94 comments10 min readLW link
(mailchi.mp)

Ac­cess to AI: a hu­man right?

dmteaJul 25, 2020, 9:38 AM
5 points
3 comments2 min readLW link

The Rise of Com­mon­sense Reasoning

DragonGodJul 27, 2020, 7:01 PM
8 points
0 comments1 min readLW link
(www.reddit.com)

AI and Efficiency

DragonGodJul 27, 2020, 8:58 PM
9 points
1 comment1 min readLW link
(openai.com)

FHI Re­port: How Will Na­tional Se­cu­rity Con­sid­er­a­tions Affect An­titrust De­ci­sions in AI? An Ex­am­i­na­tion of His­tor­i­cal Precedents

CullenJul 28, 2020, 6:34 PM
2 points
0 comments1 min readLW link
(www.fhi.ox.ac.uk)

The “best pre­dic­tor is mal­i­cious op­ti­miser” problem

Donald HobsonJul 29, 2020, 11:49 AM
14 points
10 comments2 min readLW link

Suffi­ciently Ad­vanced Lan­guage Models Can Do Re­in­force­ment Learning

Past AccountAug 2, 2020, 3:32 PM
21 points
7 comments7 min readLW link

[Question] What are the most im­por­tant pa­pers/​post/​re­sources to read to un­der­stand more of GPT-3?

adamShimiAug 2, 2020, 8:53 PM
22 points
4 comments1 min readLW link

[Question] What should an Ein­stein-like figure in Ma­chine Learn­ing do?

RaziedAug 5, 2020, 11:52 PM
3 points
3 comments1 min readLW link

Book re­view: Ar­chi­tects of In­tel­li­gence by Martin Ford (2018)

OferAug 11, 2020, 5:30 PM
15 points
0 comments2 min readLW link

[Question] Will OpenAI’s work un­in­ten­tion­ally in­crease ex­is­ten­tial risks re­lated to AI?

adamShimiAug 11, 2020, 6:16 PM
50 points
56 comments1 min readLW link

Blog post: A tale of two re­search communities

Aryeh EnglanderAug 12, 2020, 8:41 PM
14 points
0 comments4 min readLW link

Map­ping Out Alignment

Aug 15, 2020, 1:02 AM
42 points
0 comments5 min readLW link

My Un­der­stand­ing of Paul Chris­ti­ano’s Iter­ated Am­plifi­ca­tion AI Safety Re­search Agenda

Chi NguyenAug 15, 2020, 8:02 PM
119 points
21 comments39 min readLW link

GPT-3, be­lief, and consistency

skybrianAug 16, 2020, 11:12 PM
18 points
7 comments2 min readLW link

[Question] What pre­cisely do we mean by AI al­ign­ment?

Gordon Seidoh WorleyDec 9, 2018, 2:23 AM
27 points
8 comments1 min readLW link

Thoughts on the Fea­si­bil­ity of Pro­saic AGI Align­ment?

iamthouthouartiAug 21, 2020, 11:25 PM
8 points
10 comments1 min readLW link

[Question] Fore­cast­ing Thread: AI Timelines

Aug 22, 2020, 2:33 AM
133 points
95 comments2 min readLW link

Learn­ing hu­man prefer­ences: black-box, white-box, and struc­tured white-box access

Stuart_ArmstrongAug 24, 2020, 11:42 AM
25 points
9 comments6 min readLW link

Proofs Sec­tion 2.3 (Up­dates, De­ci­sion The­ory)

DiffractorAug 27, 2020, 7:49 AM
7 points
0 comments31 min readLW link

Proofs Sec­tion 2.2 (Iso­mor­phism to Ex­pec­ta­tions)

DiffractorAug 27, 2020, 7:52 AM
7 points
0 comments46 min readLW link

Proofs Sec­tion 2.1 (The­o­rem 1, Lem­mas)

DiffractorAug 27, 2020, 7:54 AM
7 points
0 comments36 min readLW link

Proofs Sec­tion 1.1 (Ini­tial re­sults to LF-du­al­ity)

DiffractorAug 27, 2020, 7:59 AM
7 points
0 comments20 min readLW link

Proofs Sec­tion 1.2 (Mix­tures, Up­dates, Push­for­wards)

DiffractorAug 27, 2020, 7:57 AM
7 points
0 comments14 min readLW link

Ba­sic In­framea­sure Theory

DiffractorAug 27, 2020, 8:02 AM
35 points
16 comments25 min readLW link

Belief Func­tions And De­ci­sion Theory

DiffractorAug 27, 2020, 8:00 AM
15 points
8 comments39 min readLW link

Tech­ni­cal model re­fine­ment formalism

Stuart_ArmstrongAug 27, 2020, 11:54 AM
19 points
0 comments6 min readLW link

Pong from pix­els with­out read­ing “Pong from Pix­els”

Ian McKenzieAug 29, 2020, 5:26 PM
15 points
1 comment7 min readLW link

Reflec­tions on AI Timelines Fore­cast­ing Thread

AmandangoSep 1, 2020, 1:42 AM
53 points
7 comments5 min readLW link

on “learn­ing to sum­ma­rize”

nostalgebraistSep 12, 2020, 3:20 AM
25 points
13 comments8 min readLW link
(nostalgebraist.tumblr.com)

[Question] The uni­ver­sal­ity of com­pu­ta­tion and mind de­sign space

alanfSep 12, 2020, 2:58 PM
1 point
7 comments1 min readLW link

Clar­ify­ing “What failure looks like”

Sam ClarkeSep 20, 2020, 8:40 PM
95 points
14 comments17 min readLW link

Hu­man Bi­ases that Ob­scure AI Progress

Danielle EnsignSep 25, 2020, 12:24 AM
42 points
2 comments4 min readLW link

[Question] Com­pe­tence vs Alignment

kwiat.devSep 30, 2020, 9:03 PM
6 points
4 comments1 min readLW link

AGI safety from first prin­ci­ples: Alignment

Richard_NgoOct 1, 2020, 3:13 AM
56 points
2 comments13 min readLW link

[Question] GPT-3 + GAN

stick109Oct 17, 2020, 7:58 AM
4 points
4 comments1 min readLW link

Book Re­view: Re­in­force­ment Learn­ing by Sut­ton and Barto

billmeiOct 20, 2020, 7:40 PM
52 points
3 comments10 min readLW link

GPT-X, Paper­clip Max­i­mizer? An­a­lyz­ing AGI and Fi­nal Goals

meanderingmooseOct 22, 2020, 2:33 PM
8 points
1 comment6 min readLW link

Con­tain­ing the AI… In­side a Si­mu­lated Reality

HumaneAutomationOct 31, 2020, 4:16 PM
1 point
9 comments2 min readLW link

Why those who care about catas­trophic and ex­is­ten­tial risk should care about au­tonomous weapons

aaguirreNov 11, 2020, 3:22 PM
60 points
20 comments19 min readLW link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AINov 14, 2020, 3:51 PM
32 points
8 comments1 min readLW link

Should we post­pone AGI un­til we reach safety?

otto.bartenNov 18, 2020, 3:43 PM
27 points
36 comments3 min readLW link

Com­mit­ment and cred­i­bil­ity in mul­ti­po­lar AI scenarios

anni_leskelaDec 4, 2020, 6:48 PM
25 points
3 comments18 min readLW link

[Question] AI Win­ter Is Com­ing—How to profit from it?

maximkazhenkovDec 5, 2020, 8:23 PM
10 points
7 comments1 min readLW link

An­nounc­ing the Tech­ni­cal AI Safety Podcast

QuinnDec 7, 2020, 6:51 PM
42 points
6 comments2 min readLW link
(technical-ai-safety.libsyn.com)

All GPT skills are translation

p.b.Dec 13, 2020, 8:06 PM
4 points
0 comments2 min readLW link

[Question] Judg­ing AGI Output

cy6erlionDec 14, 2020, 12:43 PM
3 points
0 comments2 min readLW link

Risk Map of AI Systems

Dec 15, 2020, 9:16 AM
25 points
3 comments8 min readLW link

AI Align­ment, Philo­soph­i­cal Plu­ral­ism, and the Rele­vance of Non-Western Philosophy

xuanJan 1, 2021, 12:08 AM
30 points
21 comments20 min readLW link

Are we all mis­al­igned?

Mateusz MazurkiewiczJan 3, 2021, 2:42 AM
11 points
0 comments5 min readLW link

[Question] What do we *re­ally* ex­pect from a well-al­igned AI?

Jan BetleyJan 4, 2021, 8:57 PM
8 points
10 comments1 min readLW link

Eight claims about multi-agent AGI safety

Richard_NgoJan 7, 2021, 1:34 PM
73 points
18 comments5 min readLW link

Imi­ta­tive Gen­er­al­i­sa­tion (AKA ‘Learn­ing the Prior’)

Beth BarnesJan 10, 2021, 12:30 AM
92 points
14 comments12 min readLW link

Pre­dic­tion can be Outer Aligned at Optimum

Lukas FinnvedenJan 10, 2021, 6:48 PM
15 points
12 comments11 min readLW link

[Question] Poll: Which vari­ables are most strate­gi­cally rele­vant?

Jan 22, 2021, 5:17 PM
32 points
34 comments1 min readLW link

AISU 2021

Linda LinseforsJan 30, 2021, 5:40 PM
28 points
2 comments1 min readLW link

Deep­mind has made a gen­eral in­duc­tor (“Mak­ing sense of sen­sory in­put”)

mako yassFeb 2, 2021, 2:54 AM
48 points
10 comments1 min readLW link
(www.sciencedirect.com)

Coun­ter­fac­tual Plan­ning in AGI Systems

Koen.HoltmanFeb 3, 2021, 1:54 PM
7 points
0 comments5 min readLW link

[AN #136]: How well will GPT-N perform on down­stream tasks?

Rohin ShahFeb 3, 2021, 6:10 PM
21 points
2 comments9 min readLW link
(mailchi.mp)

For­mal Solu­tion to the In­ner Align­ment Problem

michaelcohenFeb 18, 2021, 2:51 PM
47 points
123 comments2 min readLW link

TASP Ep 3 - Op­ti­mal Poli­cies Tend to Seek Power

QuinnMar 11, 2021, 1:44 AM
24 points
0 comments1 min readLW link
(technical-ai-safety.libsyn.com)

Phy­lac­tery De­ci­sion Theory

BunthutApr 2, 2021, 8:55 PM
14 points
6 comments2 min readLW link

Pre­dic­tive Cod­ing has been Unified with Backpropagation

lsusrApr 2, 2021, 9:42 PM
166 points
44 comments2 min readLW link

[Question] What if we could use the theory of Mechanism Design from Game Theory as a medium to achieve AI Alignment?

farari7Apr 4, 2021, 12:56 PM
4 points
0 comments1 min readLW link

Another (outer) al­ign­ment failure story

paulfchristianoApr 7, 2021, 8:12 PM
210 points
38 comments12 min readLW link

A Sys­tem For Evolv­ing In­creas­ingly Gen­eral Ar­tifi­cial In­tel­li­gence From Cur­rent Technologies

Tsang Chung ShuApr 8, 2021, 9:37 PM
1 point
3 comments11 min readLW link

April 2021 Deep Dive: Trans­form­ers and GPT-3

adamShimiMay 1, 2021, 11:18 AM
30 points
6 comments7 min readLW link

[Question] [time­boxed ex­er­cise] write me your model of AI hu­man-ex­is­ten­tial safety and the al­ign­ment prob­lems in 15 minutes

QuinnMay 4, 2021, 7:10 PM
6 points
2 comments1 min readLW link

Mostly ques­tions about Dumb AI Kernels

HorizonHeldMay 12, 2021, 10:00 PM
1 point
1 comment9 min readLW link

Thoughts on Iter­ated Distil­la­tion and Amplification

WaddingtonMay 11, 2021, 9:32 PM
9 points
2 comments20 min readLW link

How do we build or­gani­sa­tions that want to build safe AI?

sxaeMay 12, 2021, 3:08 PM
4 points
4 comments9 min readLW link

[Question] Who has ar­gued in de­tail that a cur­rent AI sys­tem is phe­nom­e­nally con­scious?

RobboMay 14, 2021, 10:03 PM
3 points
2 comments1 min readLW link

How I Learned to Stop Wor­ry­ing and Love MUM

WaddingtonMay 20, 2021, 7:57 AM
2 points
0 comments3 min readLW link

AI Safety Re­search Pro­ject Ideas

Owain_EvansMay 21, 2021, 1:39 PM
58 points
2 comments3 min readLW link

[Question] How does one use set theory for the alignment problem?

Valentin2026May 29, 2021, 12:28 AM
8 points
6 comments1 min readLW link

Reflec­tion of Hier­ar­chi­cal Re­la­tion­ship via Nuanced Con­di­tion­ing of Game The­ory Ap­proach for AI Devel­op­ment and Utilization

Kyoung-cheol KimJun 4, 2021, 7:20 AM
2 points
2 comments9 min readLW link

Re­view of “Learn­ing Nor­ma­tivity: A Re­search Agenda”

Jun 6, 2021, 1:33 PM
34 points
0 comments6 min readLW link

Hard­ware for Trans­for­ma­tive AI

MrThinkJun 22, 2021, 6:13 PM
17 points
7 comments2 min readLW link

Alex Turner’s Re­search, Com­pre­hen­sive In­for­ma­tion Gathering

adamShimiJun 23, 2021, 9:44 AM
15 points
3 comments3 min readLW link

Dis­cus­sion: Ob­jec­tive Ro­bust­ness and In­ner Align­ment Terminology

Jun 23, 2021, 11:25 PM
70 points
7 comments9 min readLW link

The Lan­guage of Bird

johnswentworthJun 27, 2021, 4:44 AM
44 points
9 comments2 min readLW link

[Question] What are some claims or opinions about multi-multi del­e­ga­tion you’ve seen in the meme­plex that you think de­serve scrutiny?

QuinnJun 27, 2021, 5:44 PM
17 points
6 comments2 min readLW link

An ex­am­i­na­tion of Me­tac­u­lus’ re­solved AI pre­dic­tions and their im­pli­ca­tions for AI timelines

CharlesDJul 20, 2021, 9:08 AM
28 points
0 comments7 min readLW link

[Question] How should my timelines in­fluence my ca­reer choice?

Tom LieberumAug 3, 2021, 10:14 AM
13 points
10 comments1 min readLW link

What is the prob­lem?

Carlos RamirezAug 11, 2021, 10:33 PM
7 points
0 comments6 min readLW link

OpenAI Codex: First Impressions

specbugAug 13, 2021, 4:52 PM
49 points
8 comments4 min readLW link
(sixeleven.in)

[Question] 1h-vol­un­teers needed for a small AI Safety-re­lated re­search pro­ject

PabloAMCAug 16, 2021, 5:53 PM
2 points
0 comments1 min readLW link

Ex­trac­tion of hu­man prefer­ences 👨→🤖

arunraja-hubAug 24, 2021, 4:34 PM
18 points
2 comments5 min readLW link

Call for re­search on eval­u­at­ing al­ign­ment (fund­ing + ad­vice available)

Beth BarnesAug 31, 2021, 11:28 PM
105 points
11 comments5 min readLW link

Ob­sta­cles to gra­di­ent hacking

leogaoSep 5, 2021, 10:42 PM
21 points
11 comments4 min readLW link

[Question] Con­di­tional on the first AGI be­ing al­igned cor­rectly, is a good out­come even still likely?

iamthouthouartiSep 6, 2021, 5:30 PM
2 points
1 comment1 min readLW link

Dist­in­guish­ing AI takeover scenarios

Sep 8, 2021, 4:19 PM
67 points
11 comments14 min readLW link

Paths To High-Level Ma­chine Intelligence

Daniel_EthSep 10, 2021, 1:21 PM
67 points
8 comments33 min readLW link

How truth­ful is GPT-3? A bench­mark for lan­guage models

Owain_EvansSep 16, 2021, 10:09 AM
56 points
24 comments6 min readLW link

In­ves­ti­gat­ing AI Takeover Scenarios

Sammy MartinSep 17, 2021, 6:47 PM
27 points
1 comment27 min readLW link

A sufficiently paranoid non-Friendly AGI might self-modify to become Friendly

RomanSSep 22, 2021, 6:29 AM
5 points
2 comments1 min readLW link

Towards De­con­fus­ing Gra­di­ent Hacking

leogaoOct 24, 2021, 12:43 AM
25 points
1 comment12 min readLW link

A brief re­view of the rea­sons multi-ob­jec­tive RL could be im­por­tant in AI Safety Research

Ben SmithSep 29, 2021, 5:09 PM
27 points
8 comments10 min readLW link

Meta learn­ing to gra­di­ent hack

Quintin PopeOct 1, 2021, 7:25 PM
54 points
11 comments3 min readLW link

Pro­posal: Scal­ing laws for RL generalization

axiomanOct 1, 2021, 9:32 PM
14 points
10 comments11 min readLW link

A Frame­work of Pre­dic­tion Technologies

isaduanOct 3, 2021, 10:26 AM
8 points
2 comments9 min readLW link

AI Pre­dic­tion Ser­vices and Risks of War

isaduanOct 3, 2021, 10:26 AM
3 points
2 comments10 min readLW link

Pos­si­ble Wor­lds af­ter Pre­dic­tion Take-off

isaduanOct 3, 2021, 10:26 AM
5 points
0 comments4 min readLW link

[Pro­posal] Method of lo­cat­ing use­ful sub­nets in large models

Quintin PopeOct 13, 2021, 8:52 PM
9 points
0 comments2 min readLW link

Com­men­tary on “AGI Safety From First Prin­ci­ples by Richard Ngo, Septem­ber 2020”

Robert KralischOct 14, 2021, 3:11 PM
3 points
0 comments20 min readLW link

The AGI needs to be honest

rokosbasiliskOct 16, 2021, 7:24 PM
2 points
12 comments2 min readLW link

“Re­dun­dant” AI Alignment

Mckay JensenOct 16, 2021, 9:32 PM
12 points
3 comments1 min readLW link
(quevivasbien.github.io)

[MLSN #1]: ICLR Safety Paper Roundup

Dan_HOct 18, 2021, 3:19 PM
59 points
1 comment2 min readLW link

AMA on Truth­ful AI: Owen Cot­ton-Bar­ratt, Owain Evans & co-authors

Owain_EvansOct 22, 2021, 4:23 PM
31 points
15 comments1 min readLW link

Hegel vs. GPT-3

BezziOct 27, 2021, 5:55 AM
9 points
21 comments2 min readLW link

Google an­nounces Path­ways: new gen­er­a­tion mul­ti­task AI Architecture

OzyrusOct 29, 2021, 11:55 AM
6 points
1 comment1 min readLW link
(blog.google)

What is the most evil AI that we could build, to­day?

ThomasJNov 1, 2021, 7:58 PM
−2 points
14 comments1 min readLW link

Why we need proso­cial agents

Akbir KhanNov 2, 2021, 3:19 PM
6 points
0 comments2 min readLW link

Pos­si­ble re­search di­rec­tions to im­prove the mechanis­tic ex­pla­na­tion of neu­ral networks

delton137Nov 9, 2021, 2:36 AM
29 points
8 comments9 min readLW link

What are red flags for Neu­ral Net­work suffer­ing?

Marius HobbhahnNov 8, 2021, 12:51 PM
26 points
15 comments12 min readLW link

Us­ing Brain-Com­puter In­ter­faces to get more data for AI alignment

RobboNov 7, 2021, 12:00 AM
35 points
10 comments7 min readLW link

Hard­code the AGI to need our ap­proval in­definitely?

MichaelStJulesNov 11, 2021, 7:04 AM
2 points
2 comments1 min readLW link

Stop but­ton: to­wards a causal solution

tailcalledNov 12, 2021, 7:09 PM
23 points
37 comments9 min readLW link

A FLI post­doc­toral grant ap­pli­ca­tion: AI al­ign­ment via causal anal­y­sis and de­sign of agents

PabloAMCNov 13, 2021, 1:44 AM
4 points
0 comments7 min readLW link

What would we do if al­ign­ment were fu­tile?

Grant DemareeNov 14, 2021, 8:09 AM
73 points
43 comments3 min readLW link

At­tempted Gears Anal­y­sis of AGI In­ter­ven­tion Dis­cus­sion With Eliezer

ZviNov 15, 2021, 3:50 AM
204 points
48 comments16 min readLW link
(thezvi.wordpress.com)

A pos­i­tive case for how we might suc­ceed at pro­saic AI alignment

evhubNov 16, 2021, 1:49 AM
78 points
47 comments6 min readLW link

Su­per in­tel­li­gent AIs that don’t re­quire alignment

Yair HalberstadtNov 16, 2021, 7:55 PM
10 points
2 comments6 min readLW link

Some real ex­am­ples of gra­di­ent hacking

Oliver SourbutNov 22, 2021, 12:11 AM
15 points
8 comments2 min readLW link

[linkpost] Ac­qui­si­tion of Chess Knowl­edge in AlphaZero

Quintin PopeNov 23, 2021, 7:55 AM
8 points
1 comment1 min readLW link

AI Tracker: mon­i­tor­ing cur­rent and near-fu­ture risks from su­per­scale models

Nov 23, 2021, 7:16 PM
64 points
13 comments3 min readLW link
(aitracker.org)

AI Safety Needs Great Engineers

Andy JonesNov 23, 2021, 3:40 PM
78 points
45 comments4 min readLW link

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

Madhulika SrikumarNov 24, 2021, 8:27 AM
6 points
0 comments1 min readLW link

How to mea­sure FLOP/​s for Neu­ral Net­works em­piri­cally?

Marius HobbhahnNov 29, 2021, 3:18 PM
16 points
5 comments7 min readLW link

AI Gover­nance Fun­da­men­tals—Cur­ricu­lum and Application

MauNov 30, 2021, 2:19 AM
17 points
0 comments16 min readLW link

Be­hav­ior Clon­ing is Miscalibrated

leogaoDec 5, 2021, 1:36 AM
53 points
3 comments3 min readLW link

ML Align­ment The­ory Pro­gram un­der Evan Hubinger

Dec 6, 2021, 12:03 AM
82 points
3 comments2 min readLW link

In­for­ma­tion bot­tle­neck for coun­ter­fac­tual corrigibility

tailcalledDec 6, 2021, 5:11 PM
8 points
1 comment7 min readLW link

Model­ing Failure Modes of High-Level Ma­chine Intelligence

Dec 6, 2021, 1:54 PM
54 points
1 comment12 min readLW link

Find­ing the mul­ti­ple ground truths of CoinRun and image classification

Stuart_ArmstrongDec 8, 2021, 6:13 PM
15 points
3 comments2 min readLW link

[Question] What al­ign­ment-re­lated con­cepts should be bet­ter known in the broader ML com­mu­nity?

Lauro LangoscoDec 9, 2021, 8:44 PM
6 points
4 comments1 min readLW link

Un­der­stand­ing Gra­di­ent Hacking

peterbarnettDec 10, 2021, 3:58 PM
30 points
5 comments30 min readLW link

What’s the back­ward-for­ward FLOP ra­tio for Neu­ral Net­works?

Dec 13, 2021, 8:54 AM
17 points
8 comments10 min readLW link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel NandaDec 15, 2021, 11:44 PM
111 points
9 comments15 min readLW link

Disen­tan­gling Per­spec­tives On Strat­egy-Steal­ing in AI Safety

shawnghuDec 18, 2021, 8:13 PM
20 points
1 comment11 min readLW link

De­mand­ing and De­sign­ing Aligned Cog­ni­tive Architectures

Koen.HoltmanDec 21, 2021, 5:32 PM
8 points
5 comments5 min readLW link

Po­ten­tial gears level ex­pla­na­tions of smooth progress

ryan_greenblattDec 22, 2021, 6:05 PM
4 points
2 comments2 min readLW link

Trans­former Circuits

evhubDec 22, 2021, 9:09 PM
142 points
4 comments3 min readLW link
(transformer-circuits.pub)

Gra­di­ent Hack­ing via Schel­ling Goals

Adam ScherlisDec 28, 2021, 8:38 PM
33 points
4 comments4 min readLW link

Reader-gen­er­ated Essays

Henrik KarlssonJan 3, 2022, 8:56 AM
17 points
0 comments6 min readLW link
(escapingflatland.substack.com)

Brain Effi­ciency: Much More than You Wanted to Know

jacob_cannellJan 6, 2022, 3:38 AM
195 points
87 comments28 min readLW link

Un­der­stand­ing the two-head strat­egy for teach­ing ML to an­swer ques­tions honestly

Adam ScherlisJan 11, 2022, 11:24 PM
28 points
1 comment10 min readLW link

Plan B in AI Safety approach

avturchinJan 13, 2022, 12:03 PM
33 points
9 comments2 min readLW link

Truth­ful LMs as a warm-up for al­igned AGI

Jacob_HiltonJan 17, 2022, 4:49 PM
65 points
14 comments13 min readLW link

How I’m think­ing about GPT-N

delton137Jan 17, 2022, 5:11 PM
46 points
21 comments18 min readLW link

Align­ment Prob­lems All the Way Down

peterbarnettJan 22, 2022, 12:19 AM
26 points
7 comments10 min readLW link

[Question] How fea­si­ble/​costly would it be to train a very large AI model on dis­tributed clusters of GPUs?

AnonymousJan 25, 2022, 7:20 PM
7 points
4 comments1 min readLW link

Causal­ity, Trans­for­ma­tive AI and al­ign­ment—part I

Marius HobbhahnJan 27, 2022, 4:18 PM
13 points
11 comments8 min readLW link

2+2: On­tolog­i­cal Framework

LyrialtusFeb 1, 2022, 1:07 AM
−15 points
2 comments12 min readLW link

QNR prospects are im­por­tant for AI al­ign­ment research

Eric DrexlerFeb 3, 2022, 3:20 PM
82 points
10 comments11 min readLW link

Paradigm-build­ing: Introduction

Cameron BergFeb 8, 2022, 12:06 AM
25 points
0 comments2 min readLW link

Paradigm-build­ing: The hi­er­ar­chi­cal ques­tion framework

Cameron BergFeb 9, 2022, 4:47 PM
11 points
16 comments3 min readLW link

Ques­tion 1: Pre­dicted ar­chi­tec­ture of AGI learn­ing al­gorithm(s)

Cameron BergFeb 10, 2022, 5:22 PM
12 points
1 comment7 min readLW link

Ques­tion 2: Pre­dicted bad out­comes of AGI learn­ing architecture

Cameron BergFeb 11, 2022, 10:23 PM
5 points
1 comment10 min readLW link

Ques­tion 3: Con­trol pro­pos­als for min­i­miz­ing bad outcomes

Cameron BergFeb 12, 2022, 7:13 PM
5 points
1 comment7 min readLW link

Ques­tion 4: Im­ple­ment­ing the con­trol proposals

Cameron BergFeb 13, 2022, 5:12 PM
6 points
2 comments5 min readLW link

Ques­tion 5: The timeline hyperparameter

Cameron BergFeb 14, 2022, 4:38 PM
5 points
3 comments7 min readLW link

Paradigm-build­ing: Con­clu­sion and prac­ti­cal takeaways

Cameron BergFeb 15, 2022, 4:11 PM
2 points
1 comment2 min readLW link

How com­plex are my­opic imi­ta­tors?

Vivek HebbarFeb 8, 2022, 12:00 PM
23 points
1 comment15 min readLW link

Me­tac­u­lus launches con­test for es­says with quan­ti­ta­tive pre­dic­tions about AI

Feb 8, 2022, 4:07 PM
25 points
2 comments1 min readLW link
(www.metaculus.com)

Hy­poth­e­sis: gra­di­ent de­scent prefers gen­eral circuits

Quintin PopeFeb 8, 2022, 9:12 PM
40 points
26 comments11 min readLW link

Com­pute Trends Across Three eras of Ma­chine Learning

Feb 16, 2022, 2:18 PM
91 points
13 comments2 min readLW link

[Question] Is the com­pe­ti­tion/​co­op­er­a­tion be­tween sym­bolic AI and statis­ti­cal AI (ML) about his­tor­i­cal ap­proach to re­search /​ en­g­ineer­ing, or is it more fun­da­men­tally about what in­tel­li­gent agents “are”?

Edward HammondFeb 17, 2022, 11:11 PM
1 point
1 comment2 min readLW link

HCH and Ad­ver­sar­ial Questions

David UdellFeb 19, 2022, 12:52 AM
15 points
7 comments26 min readLW link

Thoughts on Danger­ous Learned Optimization

peterbarnettFeb 19, 2022, 10:46 AM
4 points
2 comments4 min readLW link

Rel­a­tivized Defi­ni­tions as a Method to Sidestep the Löbian Obstacle

homotowatFeb 27, 2022, 6:37 AM
27 points
4 comments7 min readLW link

What we know about ma­chine learn­ing’s repli­ca­tion crisis

Younes KamelMar 5, 2022, 11:55 PM
35 points
4 comments6 min readLW link
(youneskamel.substack.com)

Pro­ject­ing com­pute trends in Ma­chine Learning

Mar 7, 2022, 3:32 PM
59 points
5 comments6 min readLW link

[Sur­vey] Ex­pec­ta­tions of a Post-ASI Order

Lone PineMar 9, 2022, 7:17 PM
5 points
0 comments1 min readLW link

A Longlist of The­o­ries of Im­pact for Interpretability

Neel NandaMar 11, 2022, 2:55 PM
106 points
29 comments5 min readLW link

New GPT3 Im­pres­sive Ca­pa­bil­ities—In­struc­tGPT3 [1/​2]

simeon_cMar 13, 2022, 10:58 AM
71 points
10 comments7 min readLW link

Phase tran­si­tions and AGI

Mar 17, 2022, 5:22 PM
44 points
19 comments9 min readLW link
(www.metaculus.com)

Can we simu­late hu­man evolu­tion to cre­ate a some­what al­igned AGI?

Thomas KwaMar 28, 2022, 10:55 PM
21 points
7 comments7 min readLW link

Pro­ject In­tro: Selec­tion The­o­rems for Modularity

Apr 4, 2022, 12:59 PM
69 points
20 comments16 min readLW link

My agenda for re­search into trans­former ca­pa­bil­ities—Introduction

p.b.Apr 5, 2022, 9:23 PM
11 points
1 comment3 min readLW link

Re­search agenda: Can trans­form­ers do sys­tem 2 think­ing?

p.b.Apr 6, 2022, 1:31 PM
20 points
0 comments2 min readLW link

PaLM in “Ex­trap­o­lat­ing GPT-N perfor­mance”

Lukas FinnvedenApr 6, 2022, 1:05 PM
80 points
19 comments2 min readLW link

Re­search agenda—Build­ing a multi-modal chess-lan­guage model

p.b.Apr 7, 2022, 12:25 PM
8 points
2 comments2 min readLW link

Is GPT3 a Good Ra­tion­al­ist? - In­struc­tGPT3 [2/​2]

simeon_cApr 7, 2022, 1:46 PM
11 points
0 comments7 min readLW link

Play­ing with DALL·E 2

Dave OrrApr 7, 2022, 6:49 PM
165 points
116 comments6 min readLW link

Progress Re­port 4: logit lens redux

Nathan Helm-BurgerApr 8, 2022, 6:35 PM
3 points
0 comments2 min readLW link

Hyper­bolic takeoff

Ege ErdilApr 9, 2022, 3:57 PM
17 points
8 comments10 min readLW link
(www.metaculus.com)

Elicit: Lan­guage Models as Re­search Assistants

Apr 9, 2022, 2:56 PM
70 points
7 comments13 min readLW link

Is it time to start think­ing about what AI Friendli­ness means?

Victor NovikovApr 11, 2022, 9:32 AM
18 points
6 comments3 min readLW link

What more com­pute does for brain-like mod­els: re­sponse to Rohin

Nathan Helm-BurgerApr 13, 2022, 3:40 AM
22 points
14 comments11 min readLW link

Align­ment and Deep Learning

AiyenApr 17, 2022, 12:02 AM
44 points
35 comments8 min readLW link

[$20K in Prizes] AI Safety Ar­gu­ments Competition

Apr 26, 2022, 4:13 PM
74 points
543 comments3 min readLW link

SERI ML Align­ment The­ory Schol­ars Pro­gram 2022

Apr 27, 2022, 12:43 AM
56 points
6 comments3 min readLW link

[Question] What is a train­ing “step” vs. “epi­sode” in ma­chine learn­ing?

Evan R. MurphyApr 28, 2022, 9:53 PM
9 points
4 comments1 min readLW link

Prize for Align­ment Re­search Tasks

Apr 29, 2022, 8:57 AM
63 points
36 comments10 min readLW link

Quick Thoughts on A.I. Governance

Nicholas / Heather KrossApr 30, 2022, 2:49 PM
66 points
8 comments2 min readLW link
(www.thinkingmuchbetter.com)

What DALL-E 2 can and can­not do

Swimmer963 (Miranda Dixon-Luinenburg) May 1, 2022, 11:51 PM
351 points
305 comments9 min readLW link

Open Prob­lems in Nega­tive Side Effect Minimization

May 6, 2022, 9:37 AM
12 points
7 comments17 min readLW link

[Linkpost] diffu­sion mag­ne­tizes man­i­folds (DALL-E 2 in­tu­ition build­ing)

Paul BricmanMay 7, 2022, 11:01 AM
1 point
0 comments1 min readLW link
(paulbricman.com)

Up­dat­ing Utility Functions

May 9, 2022, 9:44 AM
36 points
7 comments8 min readLW link

Con­di­tions for math­e­mat­i­cal equiv­alence of Stochas­tic Gra­di­ent Des­cent and Nat­u­ral Selection

Oliver SourbutMay 9, 2022, 9:38 PM
54 points
12 comments10 min readLW link

AI safety should be made more ac­cessible us­ing non text-based media

MassimogMay 10, 2022, 3:14 AM
2 points
4 comments4 min readLW link

The limits of AI safety via debate

Marius HobbhahnMay 10, 2022, 1:33 PM
28 points
7 comments10 min readLW link

In­tro­duc­tion to the se­quence: In­ter­pretabil­ity Re­search for the Most Im­por­tant Century

Evan R. MurphyMay 12, 2022, 7:59 PM
16 points
0 comments8 min readLW link

Gato as the Dawn of Early AGI

David UdellMay 15, 2022, 6:52 AM
84 points
29 comments12 min readLW link

Is AI Progress Im­pos­si­ble To Pre­dict?

alyssavanceMay 15, 2022, 6:30 PM
276 points
38 comments2 min readLW link

Deep­Mind’s gen­er­al­ist AI, Gato: A non-tech­ni­cal explainer

May 16, 2022, 9:21 PM
57 points
6 comments6 min readLW link

Gato’s Gen­er­al­i­sa­tion: Pre­dic­tions and Ex­per­i­ments I’d Like to See

Oliver SourbutMay 18, 2022, 7:15 AM
43 points
3 comments10 min readLW link

Un­der­stand­ing Gato’s Su­per­vised Re­in­force­ment Learning

lorepieriMay 18, 2022, 11:08 AM
3 points
5 comments1 min readLW link
(lorenzopieri.com)

A Story of AI Risk: In­struc­tGPT-N

peterbarnettMay 26, 2022, 11:22 PM
24 points
0 comments8 min readLW link

[Linkpost] A Chi­nese AI op­ti­mized for killing

RomanSJun 3, 2022, 9:17 AM
−2 points
4 comments1 min readLW link

Give the AI safe tools

Adam JermynJun 3, 2022, 5:04 PM
3 points
0 comments4 min readLW link

Towards a For­mal­i­sa­tion of Re­turns on Cog­ni­tive Rein­vest­ment (Part 1)

DragonGodJun 4, 2022, 6:42 PM
17 points
8 comments13 min readLW link

Give the model a model-builder

Adam JermynJun 6, 2022, 12:21 PM
3 points
0 comments5 min readLW link

AGI Safety FAQ /​ all-dumb-ques­tions-al­lowed thread

Aryeh EnglanderJun 7, 2022, 5:47 AM
221 points
515 comments4 min readLW link

Em­bod­i­ment is Indis­pens­able for AGI

P. G. Keerthana GopalakrishnanJun 7, 2022, 9:31 PM
6 points
1 comment6 min readLW link
(keerthanapg.com)

You Only Get One Shot: an In­tu­ition Pump for Embed­ded Agency

Oliver SourbutJun 9, 2022, 9:38 PM
22 points
4 comments2 min readLW link

Sum­mary of “AGI Ruin: A List of Lethal­ities”

Stephen McAleeseJun 10, 2022, 10:35 PM
32 points
2 comments8 min readLW link

Poorly-Aimed Death Rays

Thane RuthenisJun 11, 2022, 6:29 PM
43 points
5 comments4 min readLW link

ELK Pro­posal—Make the Re­porter care about the Pre­dic­tor’s beliefs

Jun 11, 2022, 10:53 PM
8 points
0 comments6 min readLW link

Grokking “Semi-in­for­ma­tive pri­ors over AI timelines”

anson.hoJun 12, 2022, 10:17 PM
15 points
7 comments14 min readLW link

[Question] Favourite new AI pro­duc­tivity tools?

Gabe MJun 15, 2022, 1:08 AM
14 points
5 comments1 min readLW link

Con­tra Hofs­tadter on GPT-3 Nonsense

ricticJun 15, 2022, 9:53 PM
235 points
22 comments2 min readLW link

[Question] What if LaMDA is in­deed sen­tient /​ self-aware /​ worth hav­ing rights?

RomanSJun 16, 2022, 9:10 AM
22 points
13 comments1 min readLW link

Ten ex­per­i­ments in mod­u­lar­ity, which we’d like you to run!

Jun 16, 2022, 9:17 AM
59 points
2 comments9 min readLW link

Align­ment re­search for “meta” purposes

acylhalideJun 16, 2022, 2:03 PM
15 points
0 comments1 min readLW link

[Question] AI mis­al­ign­ment risk from GPT-like sys­tems?

fiso64Jun 19, 2022, 5:35 PM
10 points
8 comments1 min readLW link

Half-baked al­ign­ment idea: train­ing to generalize

Aaron BergmanJun 19, 2022, 8:16 PM
7 points
2 comments4 min readLW link

Get­ting from an un­al­igned AGI to an al­igned AGI?

Tor Økland BarstadJun 21, 2022, 12:36 PM
9 points
7 comments9 min readLW link

Miti­gat­ing the dam­age from un­al­igned ASI by co­op­er­at­ing with aliens that don’t ex­ist yet

MSRayneJun 21, 2022, 4:12 PM
−8 points
7 comments6 min readLW link

AI Train­ing Should Allow Opt-Out

alyssavanceJun 23, 2022, 1:33 AM
76 points
13 comments6 min readLW link

Up­dated Defer­ence is not a strong ar­gu­ment against the util­ity un­cer­tainty ap­proach to alignment

Ivan VendrovJun 24, 2022, 7:32 PM
20 points
8 comments4 min readLW link

SunPJ in Alenia

FlorianHJun 25, 2022, 7:39 PM
7 points
19 comments8 min readLW link
(plausiblestuff.com)

Con­di­tion­ing Gen­er­a­tive Models

Adam JermynJun 25, 2022, 10:15 PM
22 points
18 comments10 min readLW link

Train­ing Trace Pri­ors and Speed Priors

Adam JermynJun 26, 2022, 6:07 PM
17 points
0 comments3 min readLW link

De­liber­a­tion Every­where: Sim­ple Examples

Oliver SourbutJun 27, 2022, 5:26 PM
14 points
0 comments15 min readLW link

De­liber­a­tion, Re­ac­tions, and Con­trol: Ten­ta­tive Defi­ni­tions and a Res­tate­ment of In­stru­men­tal Convergence

Oliver SourbutJun 27, 2022, 5:25 PM
10 points
0 comments11 min readLW link

For­mal Philos­o­phy and Align­ment Pos­si­ble Projects

Daniel HerrmannJun 30, 2022, 10:42 AM
33 points
5 comments8 min readLW link

Refram­ing the AI Risk

Thane RuthenisJul 1, 2022, 6:44 PM
26 points
7 comments6 min readLW link

Trends in GPU price-performance

Jul 1, 2022, 3:51 PM
85 points
10 comments1 min readLW link
(epochai.org)

Fol­low along with Columbia EA’s Ad­vanced AI Safety Fel­low­ship!

RohanSJul 2, 2022, 5:45 PM
3 points
0 comments2 min readLW link
(forum.effectivealtruism.org)

Can we achieve AGI Align­ment by bal­anc­ing mul­ti­ple hu­man ob­jec­tives?

Ben SmithJul 3, 2022, 2:51 AM
11 points
1 comment4 min readLW link

We Need a Con­soli­dated List of Bad AI Align­ment Solutions

DoubleJul 4, 2022, 6:54 AM
9 points
14 comments1 min readLW link

A com­pressed take on re­cent disagreements

kmanJul 4, 2022, 4:39 AM
33 points
9 comments1 min readLW link

My Most Likely Rea­son to Die Young is AI X-Risk

AISafetyIsNotLongtermistJul 4, 2022, 5:08 PM
61 points
24 comments4 min readLW link
(forum.effectivealtruism.org)

The cu­ri­ous case of Pretty Good hu­man in­ner/​outer alignment

PavleMihaJul 5, 2022, 7:04 PM
41 points
45 comments4 min readLW link

In­tro­duc­ing the Fund for Align­ment Re­search (We’re Hiring!)

Jul 6, 2022, 2:07 AM
59 points
0 comments4 min readLW link

Outer vs in­ner mis­al­ign­ment: three framings

Richard_NgoJul 6, 2022, 7:46 PM
43 points
4 comments9 min readLW link

Re­sponse to Blake Richards: AGI, gen­er­al­ity, al­ign­ment, & loss functions

Steven ByrnesJul 12, 2022, 1:56 PM
59 points
9 comments15 min readLW link

Goal Align­ment Is Ro­bust To the Sharp Left Turn

Thane RuthenisJul 13, 2022, 8:23 PM
45 points
15 comments4 min readLW link

De­cep­tion?! I ain’t got time for that!

Paul CologneseJul 18, 2022, 12:06 AM
50 points
5 comments13 min readLW link

Four ques­tions I ask AI safety researchers

Orpheus16Jul 17, 2022, 5:25 PM
17 points
0 comments1 min readLW link

A dis­til­la­tion of Evan Hub­inger’s train­ing sto­ries (for SERI MATS)

Daphne_WJul 18, 2022, 3:38 AM
15 points
1 comment10 min readLW link

Con­di­tion­ing Gen­er­a­tive Models for Alignment

JozdienJul 18, 2022, 7:11 AM
40 points
8 comments22 min readLW link

In­for­ma­tion the­o­retic model anal­y­sis may not lend much in­sight, but we may have been do­ing them wrong!

Garrett BakerJul 24, 2022, 12:42 AM
7 points
0 comments10 min readLW link

How to Diver­sify Con­cep­tual Align­ment: the Model Be­hind Refine

adamShimiJul 20, 2022, 10:44 AM
78 points
11 comments8 min readLW link

Our Ex­ist­ing Solu­tions to AGI Align­ment (semi-safe)

Michael SoareverixJul 21, 2022, 7:00 PM
12 points
1 comment3 min readLW link

Re­ward is not the op­ti­miza­tion target

TurnTroutJul 25, 2022, 12:03 AM
252 points
97 comments10 min readLW link

What En­vi­ron­ment Prop­er­ties Select Agents For World-Model­ing?

Thane RuthenisJul 23, 2022, 7:27 PM
24 points
1 comment12 min readLW link

AGI Safety Needs Peo­ple With All Skil­lsets!

Severin T. SeehrichJul 25, 2022, 1:32 PM
28 points
0 comments2 min readLW link

Con­jec­ture: In­ter­nal In­fo­haz­ard Policy

Jul 29, 2022, 7:07 PM
119 points
6 comments19 min readLW link

Hu­mans Reflect­ing on HRH

leogaoJul 29, 2022, 9:56 PM
20 points
4 comments2 min readLW link

[Question] Would “Man­hat­tan Pro­ject” style be benefi­cial or dele­te­ri­ous for AI Align­ment?

Valentin2026Aug 4, 2022, 7:12 PM
5 points
1 comment1 min readLW link

Con­ver­gence Towards World-Models: A Gears-Level Model

Thane RuthenisAug 4, 2022, 11:31 PM
37 points
1 comment13 min readLW link

How To Go From In­ter­pretabil­ity To Align­ment: Just Re­tar­get The Search

johnswentworthAug 10, 2022, 4:08 PM
143 points
30 comments3 min readLW link

For­mal­iz­ing Alignment

Marv KAug 10, 2022, 6:50 PM
3 points
0 comments2 min readLW link

My sum­mary of the al­ign­ment problem

Peter HroššoAug 11, 2022, 7:42 PM
16 points
3 comments2 min readLW link
(threadreaderapp.com)

Ar­tifi­cial in­tel­li­gence wireheading

Big TonyAug 12, 2022, 3:06 AM
3 points
2 comments1 min readLW link

In­fant AI Scenario

Nathan1123Aug 12, 2022, 9:20 PM
1 point
0 comments3 min readLW link

Gra­di­ent de­scent doesn’t se­lect for in­ner search

Ivan VendrovAug 13, 2022, 4:15 AM
36 points
23 comments4 min readLW link

No short­cuts to knowl­edge: Why AI needs to ease up on scal­ing and learn how to code

YldedlyAug 15, 2022, 8:42 AM
4 points
0 comments1 min readLW link
(deoxyribose.github.io)

Mesa-op­ti­miza­tion for goals defined only within a train­ing en­vi­ron­ment is dangerous

Rubi J. HudsonAug 17, 2022, 3:56 AM
6 points
2 comments4 min readLW link

The longest train­ing run

Aug 17, 2022, 5:18 PM
68 points
11 comments9 min readLW link
(epochai.org)

Matt Ygle­sias on AI Policy

Grant DemareeAug 17, 2022, 11:57 PM
25 points
1 comment1 min readLW link
(www.slowboring.com)

Epistemic Arte­facts of (con­cep­tual) AI al­ign­ment research

Aug 19, 2022, 5:18 PM
30 points
1 comment5 min readLW link

A Bite Sized In­tro­duc­tion to ELK

Luk27182Sep 17, 2022, 12:28 AM
5 points
0 comments6 min readLW link

Bench­mark­ing Pro­pos­als on Risk Scenarios

Paul BricmanAug 20, 2022, 10:01 AM
25 points
2 comments14 min readLW link

The ‘Bit­ter Les­son’ is Wrong

deepthoughtlifeAug 20, 2022, 4:15 PM
−9 points
14 comments2 min readLW link

My Plan to Build Aligned Superintelligence

apollonianbluesAug 21, 2022, 1:16 PM
18 points
7 comments8 min readLW link

Beliefs and Disagree­ments about Au­tomat­ing Align­ment Research

Ian McKenzieAug 24, 2022, 6:37 PM
92 points
4 comments7 min readLW link

Google AI in­te­grates PaLM with robotics: SayCan up­date [Linkpost]

Evan R. MurphyAug 24, 2022, 8:54 PM
25 points
0 comments1 min readLW link
(sites.research.google)

The Shard The­ory Align­ment Scheme

David UdellAug 25, 2022, 4:52 AM
47 points
33 comments2 min readLW link

[Question] What would you ex­pect a mas­sive mul­ti­modal on­line fed­er­ated learner to be ca­pa­ble of?

Aryeh EnglanderAug 27, 2022, 5:31 PM
13 points
4 comments1 min readLW link

(My un­der­stand­ing of) What Every­one in Tech­ni­cal Align­ment is Do­ing and Why

Aug 29, 2022, 1:23 AM
345 points
83 comments38 min readLW link

Break­ing down the train­ing/​de­ploy­ment dichotomy

Erik JennerAug 28, 2022, 9:45 PM
29 points
4 comments3 min readLW link

Strat­egy For Con­di­tion­ing Gen­er­a­tive Models

Sep 1, 2022, 4:34 AM
28 points
4 comments18 min readLW link

Gra­di­ent Hacker De­sign Prin­ci­ples From Biology

johnswentworthSep 1, 2022, 7:03 PM
52 points
13 comments3 min readLW link

No, hu­man brains are not (much) more effi­cient than computers

Jesse HooglandSep 6, 2022, 1:53 PM
19 points
16 comments4 min readLW link
(www.jessehoogland.com)

Can “Re­ward Eco­nomics” solve AI Align­ment?

Q HomeSep 7, 2022, 7:58 AM
3 points
15 comments18 min readLW link

Gen­er­a­tors Of Disagree­ment With AI Alignment

George3d6Sep 7, 2022, 6:15 PM
26 points
9 comments9 min readLW link
(www.epistem.ink)

Search­ing for Mo­du­lar­ity in Large Lan­guage Models

Sep 8, 2022, 2:25 AM
43 points
3 comments14 min readLW link

We may be able to see sharp left turns coming

Sep 3, 2022, 2:55 AM
50 points
26 comments1 min readLW link

Gate­keeper Vic­tory: AI Box Reflection

Sep 9, 2022, 9:38 PM
4 points
5 comments9 min readLW link

Can you force a neu­ral net­work to keep gen­er­al­iz­ing?

Q HomeSep 12, 2022, 10:14 AM
2 points
10 comments5 min readLW link

Align­ment via proso­cial brain algorithms

Cameron BergSep 12, 2022, 1:48 PM
42 points
28 comments6 min readLW link

[Linkpost] A sur­vey on over 300 works about in­ter­pretabil­ity in deep networks

scasperSep 12, 2022, 7:07 PM
96 points
7 comments2 min readLW link
(arxiv.org)

Try­ing to find the un­der­ly­ing struc­ture of com­pu­ta­tional systems

Matthias G. MayerSep 13, 2022, 9:16 PM
17 points
9 comments4 min readLW link

[Question] Are Speed Su­per­in­tel­li­gences Fea­si­ble for Modern ML Tech­niques?

DragonGodSep 14, 2022, 12:59 PM
8 points
5 comments1 min readLW link

The Defen­der’s Ad­van­tage of Interpretability

Marius HobbhahnSep 14, 2022, 2:05 PM
41 points
4 comments6 min readLW link

When does tech­ni­cal work to re­duce AGI con­flict make a differ­ence?: Introduction

Sep 14, 2022, 7:38 PM
42 points
3 comments6 min readLW link

ACT-1: Trans­former for Actions

Daniel KokotajloSep 14, 2022, 7:09 PM
52 points
4 comments1 min readLW link
(www.adept.ai)

[Question] Fore­cast­ing thread: How does AI risk level vary based on timelines?

eliflandSep 14, 2022, 11:56 PM
33 points
7 comments1 min readLW link

Gen­eral ad­vice for tran­si­tion­ing into The­o­ret­i­cal AI Safety

Martín SotoSep 15, 2022, 5:23 AM
9 points
0 comments10 min readLW link

Why de­cep­tive al­ign­ment mat­ters for AGI safety

Marius HobbhahnSep 15, 2022, 1:38 PM
48 points
12 comments13 min readLW link

Un­der­stand­ing Con­jec­ture: Notes from Con­nor Leahy interview

Orpheus16Sep 15, 2022, 6:37 PM
103 points
24 comments15 min readLW link

or­der­ing ca­pa­bil­ity thresholds

Tamsin LeakeSep 16, 2022, 4:36 PM
27 points
0 comments4 min readLW link
(carado.moe)

Levels of goals and alignment

zeshenSep 16, 2022, 4:44 PM
27 points
4 comments6 min readLW link

Katja Grace on Slow­ing Down AI, AI Ex­pert Sur­veys And Es­ti­mat­ing AI Risk

Michaël TrazziSep 16, 2022, 5:45 PM
40 points
2 comments3 min readLW link
(theinsideview.ai)

Sum­maries: Align­ment Fun­da­men­tals Curriculum

Leon LangSep 18, 2022, 1:08 PM
43 points
3 comments1 min readLW link
(docs.google.com)

Lev­er­ag­ing Le­gal In­for­mat­ics to Align AI

John NaySep 18, 2022, 8:39 PM
11 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

Align­ment Org Cheat Sheet

Sep 20, 2022, 5:36 PM
63 points
6 comments4 min readLW link

Public-fac­ing Cen­sor­ship Is Safety Theater, Caus­ing Rep­u­ta­tional Da­m­age

YitzSep 23, 2022, 5:08 AM
144 points
42 comments6 min readLW link

Nearcast-based “de­ploy­ment prob­lem” analysis

HoldenKarnofskySep 21, 2022, 6:52 PM
78 points
2 comments26 min readLW link

Math­e­mat­i­cal Cir­cuits in Neu­ral Networks

Sean OsierSep 22, 2022, 3:48 AM
34 points
4 comments1 min readLW link
(www.youtube.com)

Un­der­stand­ing In­fra-Bayesi­anism: A Begin­ner-Friendly Video Series

Sep 22, 2022, 1:25 PM
114 points
6 comments2 min readLW link

In­ter­lude: But Who Op­ti­mizes The Op­ti­mizer?

Paul BricmanSep 23, 2022, 3:30 PM
15 points
0 comments10 min readLW link

[Question] What Do AI Safety Pitches Not Get About Your Field?

ArisSep 22, 2022, 9:27 PM
28 points
3 comments1 min readLW link

Let’s Com­pare Notes

Shoshannah TekofskySep 22, 2022, 8:47 PM
17 points
3 comments6 min readLW link

Brain-over-body bi­ases, and the em­bod­ied value prob­lem in AI alignment

geoffreymillerSep 24, 2022, 10:24 PM
10 points
6 comments25 min readLW link

Brief Notes on Transformers

Adam JermynSep 26, 2022, 2:46 PM
32 points
2 comments2 min readLW link

You are Un­der­es­ti­mat­ing The Like­li­hood That Con­ver­gent In­stru­men­tal Sub­goals Lead to Aligned AGI

Mark NeyerSep 26, 2022, 2:22 PM
3 points
6 comments3 min readLW link

7 traps that (we think) new al­ign­ment re­searchers of­ten fall into

Sep 27, 2022, 11:13 PM
157 points
10 comments4 min readLW link

Threat-Re­sis­tant Bar­gain­ing Me­ga­post: In­tro­duc­ing the ROSE Value

DiffractorSep 28, 2022, 1:20 AM
89 points
11 comments53 min readLW link

Failure modes in a shard the­ory al­ign­ment plan

Thomas KwaSep 27, 2022, 10:34 PM
24 points
2 comments7 min readLW link

QAPR 3: in­ter­pretabil­ity-guided train­ing of neu­ral nets

Quintin PopeSep 28, 2022, 4:02 PM
47 points
2 comments10 min readLW link

[Question] What’s the ac­tual ev­i­dence that AI mar­ket­ing tools are chang­ing prefer­ences in a way that makes them eas­ier to pre­dict?

EmrikOct 1, 2022, 3:21 PM
10 points
7 comments1 min readLW link

[Question] Any fur­ther work on AI Safety Suc­cess Sto­ries?

KriegerOct 2, 2022, 9:53 AM
7 points
6 comments1 min readLW link

AI Timelines via Cu­mu­la­tive Op­ti­miza­tion Power: Less Long, More Short

jacob_cannellOct 6, 2022, 12:21 AM
111 points
32 comments6 min readLW link

con­fu­sion about al­ign­ment requirements

Tamsin LeakeOct 6, 2022, 10:32 AM
28 points
10 comments3 min readLW link
(carado.moe)

Good on­tolo­gies in­duce com­mu­ta­tive diagrams

Erik JennerOct 9, 2022, 12:06 AM
40 points
5 comments14 min readLW link

Un­con­trol­lable AI as an Ex­is­ten­tial Risk

Karl von WendtOct 9, 2022, 10:36 AM
19 points
0 comments20 min readLW link

Ob­jects in Mir­ror Are Closer Than They Ap­pear...

VestoziaOct 11, 2022, 4:34 AM
2 points
7 comments9 min readLW link

Misal­ign­ment Harms Can Be Caused by Low In­tel­li­gence Systems

DialecticEelOct 11, 2022, 1:39 PM
11 points
3 comments1 min readLW link

Build­ing a trans­former from scratch—AI safety up-skil­ling challenge

Marius HobbhahnOct 12, 2022, 3:40 PM
42 points
1 comment5 min readLW link

Help out Red­wood Re­search’s in­ter­pretabil­ity team by find­ing heuris­tics im­ple­mented by GPT-2 small

Oct 12, 2022, 9:25 PM
49 points
11 comments4 min readLW link

Science of Deep Learn­ing—a tech­ni­cal agenda

Marius HobbhahnOct 18, 2022, 2:54 PM
35 points
7 comments4 min readLW link

Re­sponse to Katja Grace’s AI x-risk counterarguments

Oct 19, 2022, 1:17 AM
75 points
18 comments15 min readLW link

[Question] What Does AI Align­ment Suc­cess Look Like?

ShmiOct 20, 2022, 12:32 AM
23 points
7 comments1 min readLW link

AI Re­search Pro­gram Pre­dic­tion Markets

tailcalledOct 20, 2022, 1:42 PM
38 points
10 comments1 min readLW link

Learn­ing so­cietal val­ues from law as part of an AGI al­ign­ment strategy

John NayOct 21, 2022, 2:03 AM
3 points
18 comments54 min readLW link

Im­proved Se­cu­rity to Prevent Hacker-AI and Digi­tal Ghosts

Erland WittkotterOct 21, 2022, 10:11 AM
4 points
3 comments12 min readLW link

What will the scaled up GATO look like? (Up­dated with ques­tions)

Amal Oct 25, 2022, 12:44 PM
33 points
20 comments1 min readLW link

In­tent al­ign­ment should not be the goal for AGI x-risk reduction

John NayOct 26, 2022, 1:24 AM
−6 points
10 comments3 min readLW link

Re­sources that (I think) new al­ign­ment re­searchers should know about

Orpheus16Oct 28, 2022, 10:13 PM
69 points
8 comments4 min readLW link

Boundaries vs Frames

Scott GarrabrantOct 31, 2022, 3:14 PM
47 points
7 comments7 min readLW link

Ad­ver­sar­ial Poli­cies Beat Pro­fes­sional-Level Go AIs

sanxiynNov 3, 2022, 1:27 PM
31 points
35 comments1 min readLW link
(goattack.alignmentfund.org)

The Sin­gu­lar Value De­com­po­si­tions of Trans­former Weight Ma­tri­ces are Highly Interpretable

Nov 28, 2022, 12:54 PM
159 points
27 comments31 min readLW link

Sim­ple Way to Prevent Power-Seek­ing AI

research_prime_spaceDec 7, 2022, 12:26 AM
7 points
1 comment1 min readLW link

You can still fetch the coffee to­day if you’re dead tomorrow

davidadDec 9, 2022, 2:06 PM
58 points
15 comments5 min readLW link

Ex­tract­ing and Eval­u­at­ing Causal Direc­tion in LLMs’ Activations

Dec 14, 2022, 2:33 PM
22 points
2 comments11 min readLW link

Real­ism about rationality

Richard_NgoSep 16, 2018, 10:46 AM
180 points
145 comments4 min readLW link3 reviews
(thinkingcomplete.blogspot.com)

De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More

Ben PaceOct 4, 2019, 4:08 AM
205 points
60 comments15 min readLW link2 reviews

The Parable of Pre­dict-O-Matic

abramdemskiOct 15, 2019, 12:49 AM
291 points
42 comments14 min readLW link2 reviews

2018 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

LarksDec 18, 2018, 4:46 AM
190 points
26 comments62 min readLW link1 review

An Ortho­dox Case Against Utility Functions

abramdemskiApr 7, 2020, 7:18 PM
128 points
53 comments8 min readLW link2 reviews

“How con­ser­va­tive” should the par­tial max­imisers be?

Stuart_ArmstrongApr 13, 2020, 3:50 PM
30 points
8 comments2 min readLW link

[AN #95]: A frame­work for think­ing about how to make AI go well

Rohin ShahApr 15, 2020, 5:10 PM
20 points
2 comments10 min readLW link
(mailchi.mp)

AI Align­ment Pod­cast: An Overview of Tech­ni­cal AI Align­ment in 2018 and 2019 with Buck Sh­legeris and Ro­hin Shah

Palus AstraApr 16, 2020, 12:50 AM
58 points
27 comments89 min readLW link

Open ques­tion: are min­i­mal cir­cuits dae­mon-free?

paulfchristianoMay 5, 2018, 10:40 PM
81 points
70 comments2 min readLW link1 review

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

Richard_NgoJan 21, 2019, 12:41 PM
129 points
23 comments8 min readLW link

In­te­grat­ing Hid­den Vari­ables Im­proves Approximation

johnswentworthApr 16, 2020, 9:43 PM
15 points
4 comments1 min readLW link

AI Ser­vices as a Re­search Paradigm

VojtaKovarikApr 20, 2020, 1:00 PM
30 points
12 comments4 min readLW link
(docs.google.com)

Databases of hu­man be­havi­our and prefer­ences?

Stuart_ArmstrongApr 21, 2020, 6:06 PM
10 points
9 comments1 min readLW link

Critch on ca­reer ad­vice for ju­nior AI-x-risk-con­cerned researchers

Rob BensingerMay 12, 2018, 2:13 AM
117 points
25 comments4 min readLW link

Refram­ing Impact

TurnTroutSep 20, 2019, 7:03 PM
90 points
15 comments3 min readLW link1 review

De­scrip­tion vs simu­lated prediction

Richard Korzekwa Apr 22, 2020, 4:40 PM
26 points
0 comments5 min readLW link
(aiimpacts.org)

Deep­Mind team on speci­fi­ca­tion gaming

JoshuaFoxApr 23, 2020, 8:01 AM
30 points
2 comments1 min readLW link
(deepmind.com)

[Question] Does Agent-like Be­hav­ior Im­ply Agent-like Ar­chi­tec­ture?

Scott GarrabrantAug 23, 2019, 2:01 AM
54 points
7 comments1 min readLW link

Risks from Learned Op­ti­miza­tion: Con­clu­sion and Re­lated Work

Jun 7, 2019, 7:53 PM
78 points
4 comments6 min readLW link

De­cep­tive Alignment

Jun 5, 2019, 8:16 PM
97 points
11 comments17 min readLW link

The In­ner Align­ment Problem

Jun 4, 2019, 1:20 AM
99 points
17 comments13 min readLW link

How the MtG Color Wheel Ex­plains AI Safety

Scott GarrabrantFeb 15, 2019, 11:42 PM
57 points
4 comments6 min readLW link

[Question] How does Gra­di­ent Des­cent In­ter­act with Good­hart?

Scott GarrabrantFeb 2, 2019, 12:14 AM
68 points
19 comments4 min readLW link

For­mal Open Prob­lem in De­ci­sion Theory

Scott GarrabrantNov 29, 2018, 3:25 AM
35 points
11 comments4 min readLW link

The Ubiquitous Con­verse Law­vere Problem

Scott GarrabrantNov 29, 2018, 3:16 AM
21 points
0 comments2 min readLW link

Embed­ded Curiosities

Nov 8, 2018, 2:19 PM
88 points
1 comment2 min readLW link

Sub­sys­tem Alignment

Nov 6, 2018, 4:16 PM
100 points
12 comments1 min readLW link

Ro­bust Delegation

Nov 4, 2018, 4:38 PM
110 points
10 comments1 min readLW link

Embed­ded World-Models

Nov 2, 2018, 4:07 PM
87 points
16 comments1 min readLW link

De­ci­sion Theory

Oct 31, 2018, 6:41 PM
114 points
46 comments1 min readLW link

(A → B) → A

Scott GarrabrantSep 11, 2018, 10:38 PM
62 points
11 comments2 min readLW link

His­tory of the Devel­op­ment of Log­i­cal Induction

Scott GarrabrantAug 29, 2018, 3:15 AM
89 points
4 comments5 min readLW link

Op­ti­miza­tion Amplifies

Scott GarrabrantJun 27, 2018, 1:51 AM
98 points
12 comments4 min readLW link

What makes coun­ter­fac­tu­als com­pa­rable?

Chris_LeongApr 24, 2020, 10:47 PM
11 points
6 comments3 min readLW link

New Paper Ex­pand­ing on the Good­hart Taxonomy

Scott GarrabrantMar 14, 2018, 9:01 AM
17 points
4 comments1 min readLW link
(arxiv.org)

Sources of in­tu­itions and data on AGI

Scott GarrabrantJan 31, 2018, 11:30 PM
84 points
26 comments3 min readLW link

Corrigibility

paulfchristianoNov 27, 2018, 9:50 PM
52 points
7 comments6 min readLW link

AI pre­dic­tion case study 5: Omo­hun­dro’s AI drives

Stuart_ArmstrongMar 15, 2013, 9:09 AM
10 points
5 comments8 min readLW link

Toy model: con­ver­gent in­stru­men­tal goals

Stuart_ArmstrongFeb 25, 2016, 2:03 PM
15 points
2 comments4 min readLW link

AI-cre­ated pseudo-deontology

Stuart_ArmstrongFeb 12, 2015, 9:11 PM
10 points
35 comments1 min readLW link

Eth­i­cal Injunctions

Eliezer YudkowskyOct 20, 2008, 11:00 PM
66 points
76 comments9 min readLW link

Mo­ti­vat­ing Ab­strac­tion-First De­ci­sion Theory

johnswentworthApr 29, 2020, 5:47 PM
42 points
16 comments5 min readLW link

[AN #97]: Are there his­tor­i­cal ex­am­ples of large, ro­bust dis­con­ti­nu­ities?

Rohin ShahApr 29, 2020, 5:30 PM
15 points
0 comments10 min readLW link
(mailchi.mp)

My Up­dat­ing Thoughts on AI policy

Ben PaceMar 1, 2020, 7:06 AM
20 points
1 comment9 min readLW link

Use­ful Does Not Mean Secure

Ben PaceNov 30, 2019, 2:05 AM
46 points
12 comments11 min readLW link

[Question] What is the al­ter­na­tive to in­tent al­ign­ment called?

Richard_NgoApr 30, 2020, 2:16 AM
12 points
6 comments1 min readLW link

Op­ti­mis­ing So­ciety to Con­strain Risk of War from an Ar­tifi­cial Su­per­in­tel­li­gence

JohnCDraperApr 30, 2020, 10:47 AM
3 points
1 comment51 min readLW link

Stan­ford En­cy­clo­pe­dia of Philos­o­phy on AI ethics and superintelligence

Kaj_SotalaMay 2, 2020, 7:35 AM
43 points
19 comments7 min readLW link
(plato.stanford.edu)

[Question] How does iter­ated am­plifi­ca­tion ex­ceed hu­man abil­ities?

riceissaMay 2, 2020, 11:44 PM
19 points
9 comments2 min readLW link

How uniform is the neo­cor­tex?

zhukeepaMay 4, 2020, 2:16 AM
78 points
23 comments11 min readLW link1 review

Scott Garrabrant’s prob­lem on re­cov­er­ing Brouwer as a corol­lary of Lawvere

RupertMay 4, 2020, 10:01 AM
26 points
2 comments2 min readLW link

“AI and Effi­ciency”, OA (44✕ im­prove­ment in CNNs since 2012)

gwernMay 5, 2020, 4:32 PM
47 points
0 comments1 min readLW link
(openai.com)

Com­pet­i­tive safety via gra­dated curricula

Richard_NgoMay 5, 2020, 6:11 PM
38 points
5 comments5 min readLW link

Model­ing nat­u­ral­ized de­ci­sion prob­lems in lin­ear logic

jessicataMay 6, 2020, 12:15 AM
14 points
2 comments6 min readLW link
(unstableontology.com)

[AN #98]: Un­der­stand­ing neu­ral net train­ing by see­ing which gra­di­ents were helpful

Rohin ShahMay 6, 2020, 5:10 PM
22 points
3 comments9 min readLW link
(mailchi.mp)

[Question] Is AI safety re­search less par­alleliz­able than AI re­search?

Mati_RoyMay 10, 2020, 8:43 PM
9 points
5 comments1 min readLW link

Thoughts on im­ple­ment­ing cor­rigible ro­bust alignment

Steven ByrnesNov 26, 2019, 2:06 PM
26 points
2 comments6 min readLW link

Wire­head­ing is in the eye of the beholder

Stuart_ArmstrongJan 30, 2019, 6:23 PM
26 points
10 comments1 min readLW link

Wire­head­ing as a po­ten­tial prob­lem with the new im­pact measure

Stuart_ArmstrongSep 25, 2018, 2:15 PM
25 points
20 comments4 min readLW link

Wire­head­ing and discontinuity

Michele CampoloFeb 18, 2020, 10:49 AM
21 points
4 comments3 min readLW link

[AN #99]: Dou­bling times for the effi­ciency of AI algorithms

Rohin ShahMay 13, 2020, 5:20 PM
29 points
0 comments10 min readLW link
(mailchi.mp)

How should AIs up­date a prior over hu­man prefer­ences?

Stuart_ArmstrongMay 15, 2020, 1:14 PM
17 points
9 comments2 min readLW link

Con­jec­ture Workshop

johnswentworthMay 15, 2020, 10:41 PM
34 points
2 comments2 min readLW link

Multi-agent safety

Richard_NgoMay 16, 2020, 1:59 AM
31 points
8 comments5 min readLW link

The Mechanis­tic and Nor­ma­tive Struc­ture of Agency

Gordon Seidoh WorleyMay 18, 2020, 4:03 PM
15 points
4 comments1 min readLW link
(philpapers.org)

“Star­wink” by Alicorn

Zack_M_DavisMay 18, 2020, 8:17 AM
44 points
1 comment1 min readLW link
(alicorn.elcenia.com)

[AN #100]: What might go wrong if you learn a re­ward func­tion while acting

Rohin ShahMay 20, 2020, 5:30 PM
33 points
2 comments12 min readLW link
(mailchi.mp)

Prob­a­bil­ities, weights, sums: pretty much the same for re­ward functions

Stuart_ArmstrongMay 20, 2020, 3:19 PM
11 points
1 comment2 min readLW link

[Question] Source code size vs learned model size in ML and in hu­mans?

riceissaMay 20, 2020, 8:47 AM
11 points
6 comments1 min readLW link

Com­par­ing re­ward learn­ing/​re­ward tam­per­ing formalisms

Stuart_ArmstrongMay 21, 2020, 12:03 PM
9 points
3 comments3 min readLW link

AGIs as collectives

Richard_NgoMay 22, 2020, 8:36 PM
22 points
23 comments4 min readLW link

[AN #101]: Why we should rigor­ously mea­sure and fore­cast AI progress

Rohin ShahMay 27, 2020, 5:20 PM
15 points
0 comments10 min readLW link
(mailchi.mp)

AI Safety Dis­cus­sion Days

Linda LinseforsMay 27, 2020, 4:54 PM
13 points
1 comment3 min readLW link

Build­ing brain-in­spired AGI is in­finitely eas­ier than un­der­stand­ing the brain

Steven ByrnesJun 2, 2020, 2:13 PM
51 points
14 comments7 min readLW link

Spar­sity and in­ter­pretabil­ity?

Jun 1, 2020, 1:25 PM
41 points
3 comments7 min readLW link

GPT-3: A Summary

leogaoJun 2, 2020, 6:14 PM
20 points
0 comments1 min readLW link
(leogao.dev)

Inac­cessible information

paulfchristianoJun 3, 2020, 5:10 AM
84 points
17 comments14 min readLW link2 reviews
(ai-alignment.com)

[AN #102]: Meta learn­ing by GPT-3, and a list of full pro­pos­als for AI alignment

Rohin ShahJun 3, 2020, 5:20 PM
38 points
6 comments10 min readLW link
(mailchi.mp)

Feed­back is cen­tral to agency

Alex FlintJun 1, 2020, 12:56 PM
28 points
1 comment3 min readLW link

Think­ing About Su­per-Hu­man AI: An Ex­am­i­na­tion of Likely Paths and Ul­ti­mate Constitution

meanderingmooseJun 4, 2020, 11:22 PM
−3 points
0 comments7 min readLW link

Emer­gence and Con­trol: An ex­am­i­na­tion of our abil­ity to gov­ern the be­hav­ior of in­tel­li­gent systems

meanderingmooseJun 5, 2020, 5:10 PM
1 point
0 comments6 min readLW link

GAN Discrim­i­na­tors Don’t Gen­er­al­ize?

tryactionsJun 8, 2020, 8:36 PM
18 points
7 comments2 min readLW link

More on dis­am­biguat­ing “dis­con­ti­nu­ity”

Aryeh EnglanderJun 9, 2020, 3:16 PM
16 points
1 comment3 min readLW link

[AN #103]: ARCHES: an agenda for ex­is­ten­tial safety, and com­bin­ing nat­u­ral lan­guage with deep RL

Rohin ShahJun 10, 2020, 5:20 PM
27 points
1 comment10 min readLW link
(mailchi.mp)

Dutch-Book­ing CDT: Re­vised Argument

abramdemskiOct 27, 2020, 4:31 AM
50 points
22 comments16 min readLW link

[Question] List of pub­lic pre­dic­tions of what GPT-X can or can’t do?

Daniel KokotajloJun 14, 2020, 2:25 PM
20 points
9 comments1 min readLW link

Achiev­ing AI al­ign­ment through de­liber­ate un­cer­tainty in mul­ti­a­gent systems

Florian DietzJun 15, 2020, 12:19 PM
3 points
10 comments7 min readLW link

Su­per­ex­po­nen­tial His­toric Growth, by David Roodman

Ben PaceJun 15, 2020, 9:49 PM
43 points
6 comments5 min readLW link
(www.openphilanthropy.org)

Re­lat­ing HCH and Log­i­cal Induction

abramdemskiJun 16, 2020, 10:08 PM
47 points
4 comments5 min readLW link

Image GPT

Daniel KokotajloJun 18, 2020, 11:41 AM
29 points
27 comments1 min readLW link
(openai.com)

[AN #104]: The per­ils of in­ac­cessible in­for­ma­tion, and what we can learn about AI al­ign­ment from COVID

Rohin ShahJun 18, 2020, 5:10 PM
19 points
5 comments8 min readLW link
(mailchi.mp)

[Question] If AI is based on GPT, how to en­sure its safety?

avturchinJun 18, 2020, 8:33 PM
20 points
11 comments1 min readLW link

What’s Your Cog­ni­tive Al­gorithm?

RaemonJun 18, 2020, 10:16 PM
71 points
23 comments13 min readLW link

Rele­vant pre-AGI possibilities

Daniel KokotajloJun 20, 2020, 10:52 AM
38 points
7 comments19 min readLW link
(aiimpacts.org)

Plau­si­ble cases for HRAD work, and lo­cat­ing the crux in the “re­al­ism about ra­tio­nal­ity” debate

riceissaJun 22, 2020, 1:10 AM
85 points
15 comments10 min readLW link

The In­dex­ing Problem

johnswentworthJun 22, 2020, 7:11 PM
35 points
2 comments4 min readLW link

[Question] Re­quest­ing feed­back/​ad­vice: what Type The­ory to study for AI safety?

rvnntJun 23, 2020, 5:03 PM
7 points
4 comments3 min readLW link

Lo­cal­ity of goals

adamShimiJun 22, 2020, 9:56 PM
16 points
8 comments6 min readLW link

[Question] What is “In­stru­men­tal Cor­rigi­bil­ity”?

joebernsteinJun 23, 2020, 8:24 PM
4 points
1 comment1 min readLW link

Models, myths, dreams, and Cheshire cat grins

Stuart_ArmstrongJun 24, 2020, 10:50 AM
21 points
7 comments2 min readLW link

[AN #105]: The eco­nomic tra­jec­tory of hu­man­ity, and what we might mean by optimization

Rohin ShahJun 24, 2020, 5:30 PM
24 points
3 comments11 min readLW link
(mailchi.mp)

There’s an Awe­some AI Ethics List and it’s a lit­tle thin

AABoylesJun 25, 2020, 1:43 PM
13 points
1 comment1 min readLW link
(github.com)

GPT-3 Fic­tion Samples

gwernJun 25, 2020, 4:12 PM
63 points
18 comments1 min readLW link
(www.gwern.net)

Walk­through: The Trans­former Ar­chi­tec­ture [Part 1/​2]

Matthew BarnettJul 30, 2019, 1:54 PM
35 points
0 comments6 min readLW link

Ro­bust­ness as a Path to AI Alignment

abramdemskiOct 10, 2017, 8:14 AM
45 points
9 comments9 min readLW link

Rad­i­cal Prob­a­bil­ism [Tran­script]

Jun 26, 2020, 10:14 PM
46 points
12 comments6 min readLW link

AI safety via mar­ket making

evhubJun 26, 2020, 11:07 PM
55 points
45 comments11 min readLW link

[Question] Have gen­eral de­com­posers been for­mal­ized?

QuinnJun 27, 2020, 6:09 PM
8 points
5 comments1 min readLW link

Gary Mar­cus vs Cor­ti­cal Uniformity

Steven ByrnesJun 28, 2020, 6:18 PM
18 points
0 comments8 min readLW link

Web AI dis­cus­sion Groups

Donald HobsonJun 30, 2020, 11:22 AM
11 points
0 comments2 min readLW link

Com­par­ing AI Align­ment Ap­proaches to Min­i­mize False Pos­i­tive Risk

Gordon Seidoh WorleyJun 30, 2020, 7:34 PM
5 points
0 comments9 min readLW link

AvE: As­sis­tance via Empowerment

FactorialCodeJun 30, 2020, 10:07 PM
12 points
1 comment1 min readLW link
(arxiv.org)

Evan Hub­inger on In­ner Align­ment, Outer Align­ment, and Pro­pos­als for Build­ing Safe Ad­vanced AI

Palus AstraJul 1, 2020, 5:30 PM
35 points
4 comments67 min readLW link

[AN #106]: Eval­u­at­ing gen­er­al­iza­tion abil­ity of learned re­ward models

Rohin ShahJul 1, 2020, 5:20 PM
14 points
2 comments11 min readLW link
(mailchi.mp)

The “AI De­bate” Debate

michaelcohenJul 2, 2020, 10:16 AM
20 points
20 comments3 min readLW link

Idea: Imi­ta­tion/​Value Learn­ing AIXI

Past AccountJul 3, 2020, 5:10 PM
3 points
6 comments1 min readLW link

Split­ting De­bate up into Two Subsystems

NandiJul 3, 2020, 8:11 PM
13 points
5 comments4 min readLW link

AI Un­safety via Non-Zero-Sum Debate

VojtaKovarikJul 3, 2020, 10:03 PM
25 points
10 comments5 min readLW link

Clas­sify­ing games like the Pri­soner’s Dilemma

philhJul 4, 2020, 5:10 PM
100 points
28 comments6 min readLW link1 review
(reasonableapproximation.net)

AI-Feyn­man as a bench­mark for what we should be aiming for

Faustus2Jul 4, 2020, 9:24 AM
8 points
1 comment2 min readLW link

Learn­ing the prior

paulfchristianoJul 5, 2020, 9:00 PM
79 points
29 comments8 min readLW link
(ai-alignment.com)

Bet­ter pri­ors as a safety problem

paulfchristianoJul 5, 2020, 9:20 PM
64 points
7 comments5 min readLW link
(ai-alignment.com)

[Question] How far is AGI?

Roko JelavićJul 5, 2020, 5:58 PM
6 points
5 comments1 min readLW link

Clas­sify­ing speci­fi­ca­tion prob­lems as var­i­ants of Good­hart’s Law

VikaAug 19, 2019, 8:40 PM
70 points
5 comments5 min readLW link1 review

New safety re­search agenda: scal­able agent al­ign­ment via re­ward modeling

VikaNov 20, 2018, 5:29 PM
34 points
13 comments1 min readLW link
(medium.com)

De­sign­ing agent in­cen­tives to avoid side effects

Mar 11, 2019, 8:55 PM
29 points
0 comments2 min readLW link
(medium.com)

Dis­cus­sion on the ma­chine learn­ing ap­proach to AI safety

VikaNov 1, 2018, 8:54 PM
26 points
3 comments4 min readLW link

Speci­fi­ca­tion gam­ing ex­am­ples in AI

VikaApr 3, 2018, 12:30 PM
43 points
9 comments1 min readLW link2 reviews

[Question] (an­swered: yes) Has any­one writ­ten up a con­sid­er­a­tion of Downs’s “Para­dox of Vot­ing” from the per­spec­tive of MIRI-ish de­ci­sion the­o­ries (UDT, FDT, or even just EDT)?

Jameson QuinnJul 6, 2020, 6:26 PM
10 points
24 comments1 min readLW link

New Deep­Mind AI Safety Re­search Blog

VikaSep 27, 2018, 4:28 PM
43 points
0 comments1 min readLW link
(medium.com)

Contest: $1,000 for good questions to ask an Oracle AI

Stuart_ArmstrongJul 31, 2019, 6:48 PM
57 points
156 comments3 min readLW link

De­con­fus­ing Hu­man Values Re­search Agenda v1

Gordon Seidoh WorleyMar 23, 2020, 4:25 PM
27 points
12 comments4 min readLW link

[Question] How “hon­est” is GPT-3?

abramdemskiJul 8, 2020, 7:38 PM
72 points
18 comments5 min readLW link

What does it mean to ap­ply de­ci­sion the­ory?

abramdemskiJul 8, 2020, 8:31 PM
51 points
5 comments8 min readLW link

AI Re­search Con­sid­er­a­tions for Hu­man Ex­is­ten­tial Safety (ARCHES)

habrykaJul 9, 2020, 2:49 AM
60 points
8 comments1 min readLW link
(arxiv.org)

The Un­rea­son­able Effec­tive­ness of Deep Learning

Richard_NgoSep 30, 2018, 3:48 PM
85 points
5 comments13 min readLW link
(thinkingcomplete.blogspot.com)

mAIry’s room: AI rea­son­ing to solve philo­soph­i­cal problems

Stuart_ArmstrongMar 5, 2019, 8:24 PM
92 points
41 comments6 min readLW link2 reviews

Failures of an em­bod­ied AIXI

So8resJun 15, 2014, 6:29 PM
48 points
46 comments12 min readLW link

The Prob­lem with AIXI

Rob BensingerMar 18, 2014, 1:55 AM
43 points
78 comments23 min readLW link

Ver­sions of AIXI can be ar­bi­trar­ily stupid

Stuart_ArmstrongAug 10, 2015, 1:23 PM
29 points
59 comments1 min readLW link

Reflec­tive AIXI and Anthropics

DiffractorSep 24, 2018, 2:15 AM
17 points
13 comments8 min readLW link

AIXI and Ex­is­ten­tial Despair

paulfchristianoDec 8, 2011, 8:03 PM
23 points
38 comments6 min readLW link

How to make AIXI-tl in­ca­pable of learning

itaibn0Jan 27, 2014, 12:05 AM
7 points
5 comments2 min readLW link

Help re­quest: What is the Kol­mogorov com­plex­ity of com­putable ap­prox­i­ma­tions to AIXI?

AnnaSalamonDec 5, 2010, 10:23 AM
9 points
9 comments1 min readLW link

“AIXIjs: A Soft­ware Demo for Gen­eral Re­in­force­ment Learn­ing”, As­lanides 2017

gwernMay 29, 2017, 9:09 PM
7 points
1 comment1 min readLW link
(arxiv.org)

Can AIXI be trained to do any­thing a hu­man can?

Stuart_ArmstrongOct 20, 2014, 1:12 PM
5 points
9 comments2 min readLW link

Shap­ing eco­nomic in­cen­tives for col­lab­o­ra­tive AGI

Kaj_SotalaJun 29, 2018, 4:26 PM
45 points
15 comments4 min readLW link

Is the Star Trek Fed­er­a­tion re­ally in­ca­pable of build­ing AI?

Kaj_SotalaMar 18, 2018, 10:30 AM
19 points
4 comments2 min readLW link
(kajsotala.fi)

Some con­cep­tual high­lights from “Disjunc­tive Sce­nar­ios of Catas­trophic AI Risk”

Kaj_SotalaFeb 12, 2018, 12:30 PM
33 points
4 comments6 min readLW link
(kajsotala.fi)

Mis­con­cep­tions about con­tin­u­ous takeoff

Matthew BarnettOct 8, 2019, 9:31 PM
79 points
38 comments4 min readLW link

Dist­in­guish­ing defi­ni­tions of takeoff

Matthew BarnettFeb 14, 2020, 12:16 AM
60 points
6 comments6 min readLW link

Book re­view: Ar­tifi­cial In­tel­li­gence Safety and Security

PeterMcCluskeyDec 8, 2018, 3:47 AM
27 points
3 comments8 min readLW link
(www.bayesianinvestor.com)

Why AI may not foom

John_MaxwellMar 24, 2013, 8:11 AM
29 points
81 comments12 min readLW link

Hu­mans Who Are Not Con­cen­trat­ing Are Not Gen­eral Intelligences

sarahconstantinFeb 25, 2019, 8:40 PM
181 points
35 comments6 min readLW link1 review
(srconstantin.wordpress.com)

The Hacker Learns to Trust

Ben PaceJun 22, 2019, 12:27 AM
80 points
18 comments8 min readLW link
(medium.com)

Book Re­view: Hu­man Compatible

Scott AlexanderJan 31, 2020, 5:20 AM
77 points
6 comments16 min readLW link
(slatestarcodex.com)

SSC Jour­nal Club: AI Timelines

Scott AlexanderJun 8, 2017, 7:00 PM
12 points
15 comments8 min readLW link

Ar­gu­ments against my­opic training

Richard_NgoJul 9, 2020, 4:07 PM
56 points
39 comments12 min readLW link

On mo­ti­va­tions for MIRI’s highly re­li­able agent de­sign research

jessicataJan 29, 2017, 7:34 PM
27 points
1 comment5 min readLW link

Why is the im­pact penalty time-in­con­sis­tent?

Stuart_ArmstrongJul 9, 2020, 5:26 PM
16 points
1 comment2 min readLW link

My cur­rent take on the Paul-MIRI dis­agree­ment on al­ignabil­ity of messy AI

jessicataJan 29, 2017, 8:52 PM
21 points
0 comments10 min readLW link

Ben Go­ertzel: The Sin­gu­lar­ity In­sti­tute’s Scary Idea (and Why I Don’t Buy It)

Paul CrowleyOct 30, 2010, 9:31 AM
42 points
442 comments1 min readLW link

An An­a­lytic Per­spec­tive on AI Alignment

DanielFilanMar 1, 2020, 4:10 AM
54 points
45 comments8 min readLW link
(danielfilan.com)

Mechanis­tic Trans­parency for Ma­chine Learning

DanielFilanJul 11, 2018, 12:34 AM
54 points
9 comments4 min readLW link

A model I use when mak­ing plans to re­duce AI x-risk

Ben PaceJan 19, 2018, 12:21 AM
69 points
41 comments6 min readLW link

AI Re­searchers On AI Risk

Scott AlexanderMay 22, 2015, 11:16 AM
18 points
0 comments16 min readLW link

Mini ad­vent cal­en­dar of Xrisks: Ar­tifi­cial Intelligence

Stuart_ArmstrongDec 7, 2012, 11:26 AM
5 points
5 comments1 min readLW link

For FAI: Is “Molec­u­lar Nan­otech­nol­ogy” putting our best foot for­ward?

leplenJun 22, 2013, 4:44 AM
79 points
118 comments3 min readLW link

UFAI can­not be the Great Filter

ThrasymachusDec 22, 2012, 11:26 AM
59 points
92 comments3 min readLW link

Don’t Fear The Filter

Scott AlexanderMay 29, 2014, 12:45 AM
11 points
18 comments6 min readLW link

The Great Filter is early, or AI is hard

Stuart_ArmstrongAug 29, 2014, 4:17 PM
32 points
76 comments1 min readLW link

Talk: Key Is­sues In Near-Term AI Safety Research

Aryeh EnglanderJul 10, 2020, 6:36 PM
22 points
1 comment1 min readLW link

Mesa-Op­ti­miz­ers vs “Steered Op­ti­miz­ers”

Steven ByrnesJul 10, 2020, 4:49 PM
45 points
7 comments8 min readLW link

AlphaS­tar: Im­pres­sive for RL progress, not for AGI progress

orthonormalNov 2, 2019, 1:50 AM
113 points
58 comments2 min readLW link1 review

The Catas­trophic Con­ver­gence Conjecture

TurnTroutFeb 14, 2020, 9:16 PM
44 points
15 comments8 min readLW link

[Question] How well can the GPT ar­chi­tec­ture solve the par­ity task?

FactorialCodeJul 11, 2020, 7:02 PM
19 points
3 comments1 min readLW link

Sun­day July 12 — talks by Scott Garrabrant, Alexflint, alexei, Stu­art_Armstrong

Jul 8, 2020, 12:27 AM
19 points
2 comments1 min readLW link

[Link] Word-vec­tor based DL sys­tem achieves hu­man par­ity in ver­bal IQ tests

jacob_cannellJun 13, 2015, 11:38 PM
17 points
8 comments1 min readLW link

The Power of Intelligence

Eliezer YudkowskyJan 1, 2007, 8:00 PM
66 points
4 comments4 min readLW link

Com­ments on CAIS

Richard_NgoJan 12, 2019, 3:20 PM
76 points
14 comments7 min readLW link

[Question] What are CAIS’ bold­est near/​medium-term pre­dic­tions?

Bird ConceptMar 28, 2019, 1:14 PM
31 points
17 comments1 min readLW link

Drexler on AI Risk

PeterMcCluskeyFeb 1, 2019, 5:11 AM
34 points
10 comments9 min readLW link
(www.bayesianinvestor.com)

Six AI Risk/​Strat­egy Ideas

Wei DaiAug 27, 2019, 12:40 AM
64 points
18 comments4 min readLW link1 review

New re­port: In­tel­li­gence Ex­plo­sion Microeconomics

Eliezer YudkowskyApr 29, 2013, 11:14 PM
72 points
251 comments3 min readLW link

Book re­view: Hu­man Compatible

PeterMcCluskeyJan 19, 2020, 3:32 AM
37 points
2 comments5 min readLW link
(www.bayesianinvestor.com)

Thoughts on “Hu­man-Com­pat­i­ble”

TurnTroutOct 10, 2019, 5:24 AM
63 points
35 comments5 min readLW link

Book Re­view: The AI Does Not Hate You

PeterMcCluskeyOct 28, 2019, 5:45 PM
26 points
0 comments5 min readLW link
(www.bayesianinvestor.com)

[Link] Book Re­view: ‘The AI Does Not Hate You’ by Tom Chivers (Scott Aaron­son)

eigenOct 7, 2019, 6:16 PM
19 points
0 comments1 min readLW link

Book Re­view: Life 3.0: Be­ing Hu­man in the Age of Ar­tifi­cial Intelligence

J Thomas MorosJan 18, 2018, 5:18 PM
8 points
0 comments1 min readLW link
(ferocioustruth.com)

Book Re­view: Weapons of Math Destruction

ZviJun 4, 2017, 9:20 PM
1 point
0 comments16 min readLW link

DARPA Digi­tal Tu­tor: Four Months to To­tal Tech­ni­cal Ex­per­tise?

SebastianG Jul 6, 2020, 11:34 PM
200 points
19 comments7 min readLW link

Paper: Su­per­in­tel­li­gence as a Cause or Cure for Risks of Astro­nom­i­cal Suffering

Kaj_SotalaJan 3, 2018, 2:39 PM
1 point
6 comments1 min readLW link
(www.informatica.si)

Prevent­ing s-risks via in­dex­i­cal un­cer­tainty, acausal trade and dom­i­na­tion in the multiverse

avturchinSep 27, 2018, 10:09 AM
11 points
6 comments4 min readLW link

Pre­face to CLR’s Re­search Agenda on Co­op­er­a­tion, Con­flict, and TAI

JesseCliftonDec 13, 2019, 9:02 PM
59 points
10 comments2 min readLW link

Sec­tions 1 & 2: In­tro­duc­tion, Strat­egy and Governance

JesseCliftonDec 17, 2019, 9:27 PM
34 points
5 comments14 min readLW link

Sec­tions 3 & 4: Cred­i­bil­ity, Peace­ful Bar­gain­ing Mechanisms

JesseCliftonDec 17, 2019, 9:46 PM
19 points
2 comments12 min readLW link

Sec­tions 5 & 6: Con­tem­po­rary Ar­chi­tec­tures, Hu­mans in the Loop

JesseCliftonDec 20, 2019, 3:52 AM
27 points
4 comments10 min readLW link

Sec­tion 7: Foun­da­tions of Ra­tional Agency

JesseCliftonDec 22, 2019, 2:05 AM
14 points
4 comments8 min readLW link

What counts as defec­tion?

TurnTroutJul 12, 2020, 10:03 PM
81 points
21 comments5 min readLW link1 review

The Com­mit­ment Races problem

Daniel KokotajloAug 23, 2019, 1:58 AM
122 points
39 comments5 min readLW link

Align­ment Newslet­ter #36

Rohin ShahDec 12, 2018, 1:10 AM
21 points
0 comments11 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #47

Rohin ShahMar 4, 2019, 4:30 AM
18 points
0 comments8 min readLW link
(mailchi.mp)

Un­der­stand­ing “Deep Dou­ble Des­cent”

evhubDec 6, 2019, 12:00 AM
135 points
51 comments5 min readLW link4 reviews

[LINK] Strong AI Startup Raises $15M

olalondeAug 21, 2012, 8:47 PM
24 points
13 comments1 min readLW link

An­nounc­ing the AI Align­ment Prize

cousin_itNov 3, 2017, 3:47 PM
95 points
78 comments1 min readLW link

I’m leav­ing AI al­ign­ment – you bet­ter stay

rmoehnMar 12, 2020, 5:58 AM
150 points
19 comments5 min readLW link

New pa­per: AGI Agent Safety by Iter­a­tively Im­prov­ing the Utility Function

Koen.HoltmanJul 15, 2020, 2:05 PM
21 points
2 comments6 min readLW link

[Question] How should AI de­bate be judged?

abramdemskiJul 15, 2020, 10:20 PM
49 points
27 comments6 min readLW link

Align­ment pro­pos­als and com­plex­ity classes

evhubJul 16, 2020, 12:27 AM
33 points
26 comments13 min readLW link

[AN #107]: The con­ver­gent in­stru­men­tal sub­goals of goal-di­rected agents

Rohin ShahJul 16, 2020, 6:47 AM
13 points
1 comment8 min readLW link
(mailchi.mp)

[AN #108]: Why we should scru­ti­nize ar­gu­ments for AI risk

Rohin ShahJul 16, 2020, 6:47 AM
19 points
6 comments12 min readLW link
(mailchi.mp)

En­vi­ron­ments as a bot­tle­neck in AGI development

Richard_NgoJul 17, 2020, 5:02 AM
36 points
19 comments6 min readLW link

[Question] Can an agent use in­ter­ac­tive proofs to check the al­ign­ment of successors?

PabloAMCJul 17, 2020, 7:07 PM
7 points
2 comments1 min readLW link

Les­sons on AI Takeover from the conquistadors

Jul 17, 2020, 10:35 PM
58 points
30 comments5 min readLW link

What Would I Do? Self-pre­dic­tion in Sim­ple Algorithms

Scott GarrabrantJul 20, 2020, 4:27 AM
54 points
13 comments5 min readLW link

Wri­teup: Progress on AI Safety via Debate

Feb 5, 2020, 9:04 PM
94 points
18 comments33 min readLW link

Oper­a­tional­iz­ing Interpretability

lifelonglearnerJul 20, 2020, 5:22 AM
20 points
0 comments4 min readLW link

Learn­ing Values in Practice

Stuart_ArmstrongJul 20, 2020, 6:38 PM
24 points
0 comments5 min readLW link

Par­allels Between AI Safety by De­bate and Ev­i­dence Law

CullenJul 20, 2020, 10:52 PM
10 points
1 comment2 min readLW link
(cullenokeefe.com)

The Redis­cov­ery of In­te­ri­or­ity in Ma­chine Learning

DanBJul 21, 2020, 5:02 AM
5 points
4 comments1 min readLW link
(danburfoot.net)

The “AI Dun­geons” Dragon Model is heav­ily path de­pen­dent (test­ing GPT-3 on ethics)

Rafael HarthJul 21, 2020, 12:14 PM
44 points
9 comments6 min readLW link

How good is hu­man­ity at co­or­di­na­tion?

BuckJul 21, 2020, 8:01 PM
78 points
44 comments3 min readLW link

Align­ment As A Bot­tle­neck To Use­ful­ness Of GPT-3

johnswentworthJul 21, 2020, 8:02 PM
111 points
57 comments3 min readLW link

$1000 bounty for OpenAI to show whether GPT3 was “de­liber­ately” pre­tend­ing to be stupi­der than it is

Bird ConceptJul 21, 2020, 6:42 PM
59 points
40 comments2 min readLW link
(twitter.com)

[Preprint] The Com­pu­ta­tional Limits of Deep Learning

Gordon Seidoh WorleyJul 21, 2020, 9:25 PM
9 points
2 comments1 min readLW link
(arxiv.org)

[AN #109]: Teach­ing neu­ral nets to gen­er­al­ize the way hu­mans would

Rohin ShahJul 22, 2020, 5:10 PM
17 points
3 comments9 min readLW link
(mailchi.mp)

Re­search agenda for AI safety and a bet­ter civilization

agilecavemanJul 22, 2020, 6:35 AM
12 points
2 comments16 min readLW link

Weak HCH ac­cesses EXP

evhubJul 22, 2020, 10:36 PM
14 points
0 comments3 min readLW link

GPT-3 Gems

TurnTroutJul 23, 2020, 12:46 AM
33 points
10 comments48 min readLW link

Op­ti­miz­ing ar­bi­trary ex­pres­sions with a lin­ear num­ber of queries to a Log­i­cal In­duc­tion Or­a­cle (Car­toon Guide)

Donald HobsonJul 23, 2020, 9:37 PM
3 points
2 comments2 min readLW link

[Question] Con­struct a port­fo­lio to profit from AI progress.

sapphireJul 25, 2020, 8:18 AM
29 points
13 comments1 min readLW link

Think­ing soberly about the con­text and con­se­quences of Friendly AI

Mitchell_PorterOct 16, 2012, 4:33 AM
21 points
39 comments1 min readLW link

Goal re­ten­tion dis­cus­sion with Eliezer

Max TegmarkSep 4, 2014, 10:23 PM
93 points
26 comments6 min readLW link

[Question] Where do peo­ple dis­cuss do­ing things with GPT-3?

skybrianJul 26, 2020, 2:31 PM
2 points
7 comments1 min readLW link

You Can Prob­a­bly Am­plify GPT3 Directly

Past AccountJul 26, 2020, 9:58 PM
34 points
14 comments6 min readLW link

[up­dated] how does gpt2's train­ing cor­pus cap­ture in­ter­net dis­cus­sion? not well

nostalgebraistJul 27, 2020, 10:30 PM
25 points
3 comments2 min readLW link
(nostalgebraist.tumblr.com)

Agen­tic Lan­guage Model Memes

FactorialCodeAug 1, 2020, 6:03 PM
16 points
1 comment2 min readLW link

A com­mu­nity-cu­rated repos­i­tory of in­ter­est­ing GPT-3 stuff

Rudi CJul 28, 2020, 2:16 PM
8 points
0 comments1 min readLW link
(github.com)

[Question] Does the lot­tery ticket hy­poth­e­sis sug­gest the scal­ing hy­poth­e­sis?

Daniel KokotajloJul 28, 2020, 7:52 PM
14 points
17 comments1 min readLW link

[Question] To what ex­tent are the scal­ing prop­er­ties of Trans­former net­works ex­cep­tional?

abramdemskiJul 28, 2020, 8:06 PM
30 points
1 comment1 min readLW link

[Question] What hap­pens to var­i­ance as neu­ral net­work train­ing is scaled? What does it im­ply about “lot­tery tick­ets”?

abramdemskiJul 28, 2020, 8:22 PM
25 points
4 comments1 min readLW link

[Question] How will in­ter­net fo­rums like LW be able to defend against GPT-style spam?

ChristianKlJul 28, 2020, 8:12 PM
14 points
18 comments1 min readLW link

Pre­dic­tions for GPT-N

hippkeJul 29, 2020, 1:16 AM
36 points
31 comments1 min readLW link

An­nounce­ment: AI al­ign­ment prize win­ners and next round

cousin_itJan 15, 2018, 2:33 PM
80 points
68 comments2 min readLW link

Jeff Hawk­ins on neu­ro­mor­phic AGI within 20 years

Steven ByrnesJul 15, 2019, 7:16 PM
167 points
24 comments12 min readLW link

Cas­cades, Cy­cles, In­sight...

Eliezer YudkowskyNov 24, 2008, 9:33 AM
31 points
31 comments8 min readLW link

...Re­cur­sion, Magic

Eliezer YudkowskyNov 25, 2008, 9:10 AM
27 points
28 comments5 min readLW link

Refer­ences & Re­sources for LessWrong

XiXiDuOct 10, 2010, 2:54 PM
153 points
106 comments20 min readLW link

[Question] A game de­signed to beat AI?

Long tryMar 17, 2020, 3:51 AM
13 points
29 comments1 min readLW link

Truly Part Of You

Eliezer YudkowskyNov 21, 2007, 2:18 AM
149 points
59 comments4 min readLW link

[AN #110]: Learn­ing fea­tures from hu­man feed­back to en­able re­ward learning

Rohin ShahJul 29, 2020, 5:20 PM
13 points
2 comments10 min readLW link
(mailchi.mp)

Struc­tured Tasks for Lan­guage Models

Past AccountJul 29, 2020, 2:17 PM
5 points
0 comments1 min readLW link

En­gag­ing Se­ri­ously with Short Timelines

sapphireJul 29, 2020, 7:21 PM
43 points
23 comments3 min readLW link

What Failure Looks Like: Distill­ing the Discussion

Ben PaceJul 29, 2020, 9:49 PM
79 points
14 comments7 min readLW link

Learn­ing the prior and generalization

evhubJul 29, 2020, 10:49 PM
34 points
16 comments4 min readLW link

[Question] Is the work on AI al­ign­ment rele­vant to GPT?

Richard_KennawayJul 30, 2020, 12:23 PM
20 points
5 comments1 min readLW link

Ver­ifi­ca­tion and Transparency

DanielFilanAug 8, 2019, 1:50 AM
34 points
6 comments2 min readLW link
(danielfilan.com)

Robin Han­son on Lump­iness of AI Services

DanielFilanFeb 17, 2019, 11:08 PM
15 points
2 comments2 min readLW link
(www.overcomingbias.com)

One Way to Think About ML Transparency

Matthew BarnettSep 2, 2019, 11:27 PM
26 points
28 comments5 min readLW link

What is In­ter­pretabil­ity?

Mar 17, 2020, 8:23 PM
34 points
0 comments11 min readLW link

Re­laxed ad­ver­sar­ial train­ing for in­ner alignment

evhubSep 10, 2019, 11:03 PM
61 points
28 comments1 min readLW link

Con­clu­sion to ‘Refram­ing Im­pact’

TurnTroutFeb 28, 2020, 4:05 PM
39 points
17 comments2 min readLW link

Bayesian Evolv­ing-to-Extinction

abramdemskiFeb 14, 2020, 11:55 PM
38 points
13 comments5 min readLW link

Do Suffi­ciently Ad­vanced Agents Use Logic?

abramdemskiSep 13, 2019, 7:53 PM
41 points
11 comments9 min readLW link

World State is the Wrong Ab­strac­tion for Impact

TurnTroutOct 1, 2019, 9:03 PM
62 points
19 comments2 min readLW link

At­tain­able Utility Preser­va­tion: Concepts

TurnTroutFeb 17, 2020, 5:20 AM
38 points
20 comments1 min readLW link

At­tain­able Utility Preser­va­tion: Em­piri­cal Results

Feb 22, 2020, 12:38 AM
61 points
8 comments10 min readLW link1 review

How Low Should Fruit Hang Be­fore We Pick It?

TurnTroutFeb 25, 2020, 2:08 AM
28 points
9 comments12 min readLW link

At­tain­able Utility Preser­va­tion: Scal­ing to Superhuman

TurnTroutFeb 27, 2020, 12:52 AM
28 points
21 comments8 min readLW link

Rea­sons for Ex­cite­ment about Im­pact of Im­pact Mea­sure Research

TurnTroutFeb 27, 2020, 9:42 PM
33 points
8 comments4 min readLW link

Power as Easily Ex­ploitable Opportunities

TurnTroutAug 1, 2020, 2:14 AM
30 points
5 comments6 min readLW link

[Question] Would AGIs par­ent young AGIs?

Vishrut AryaAug 2, 2020, 12:57 AM
3 points
6 comments1 min readLW link

If I were a well-in­ten­tioned AI… I: Image classifier

Stuart_ArmstrongFeb 26, 2020, 12:39 PM
35 points
4 comments5 min readLW link

Non-Con­se­quen­tial­ist Co­op­er­a­tion?

abramdemskiJan 11, 2019, 9:15 AM
48 points
15 comments7 min readLW link

Cu­ri­os­ity Killed the Cat and the Asymp­tot­i­cally Op­ti­mal Agent

michaelcohenFeb 20, 2020, 5:28 PM
27 points
15 comments1 min readLW link

If I were a well-in­ten­tioned AI… IV: Mesa-optimising

Stuart_ArmstrongMar 2, 2020, 12:16 PM
26 points
2 comments6 min readLW link

Re­sponse to Oren Etz­ioni’s “How to know if ar­tifi­cial in­tel­li­gence is about to de­stroy civ­i­liza­tion”

Daniel KokotajloFeb 27, 2020, 6:10 PM
27 points
5 comments8 min readLW link

Clar­ify­ing Power-Seek­ing and In­stru­men­tal Convergence

TurnTroutDec 20, 2019, 7:59 PM
42 points
7 comments3 min readLW link

How im­por­tant are MDPs for AGI (Safety)?

michaelcohenMar 26, 2020, 8:32 PM
14 points
8 comments2 min readLW link

Syn­the­siz­ing am­plifi­ca­tion and debate

evhubFeb 5, 2020, 10:53 PM
33 points
10 comments4 min readLW link

is gpt-3 few-shot ready for real ap­pli­ca­tions?

nostalgebraistAug 3, 2020, 7:50 PM
31 points
5 comments9 min readLW link
(nostalgebraist.tumblr.com)

In­ter­pretabil­ity in ML: A Broad Overview

lifelonglearnerAug 4, 2020, 7:03 PM
52 points
5 comments15 min readLW link

In­finite Data/​Com­pute Ar­gu­ments in Alignment

johnswentworthAug 4, 2020, 8:21 PM
49 points
6 comments2 min readLW link

Four Ways An Im­pact Mea­sure Could Help Alignment

Matthew BarnettAug 8, 2019, 12:10 AM
21 points
1 comment8 min readLW link

Un­der­stand­ing Re­cent Im­pact Measures

Matthew BarnettAug 7, 2019, 4:57 AM
16 points
6 comments7 min readLW link

A Sur­vey of Early Im­pact Measures

Matthew BarnettAug 6, 2019, 1:22 AM
23 points
0 comments8 min readLW link

Op­ti­miza­tion Reg­u­lariza­tion through Time Penalty

Linda LinseforsJan 1, 2019, 1:05 PM
11 points
4 comments3 min readLW link

Stable Poin­t­ers to Value III: Re­cur­sive Quantilization

abramdemskiJul 21, 2018, 8:06 AM
19 points
4 comments4 min readLW link

Thoughts on Quantilizers

Stuart_ArmstrongJun 2, 2017, 4:24 PM
2 points
0 comments2 min readLW link

Quan­tiliz­ers max­i­mize ex­pected util­ity sub­ject to a con­ser­va­tive cost constraint

jessicataSep 28, 2015, 2:17 AM
25 points
0 comments5 min readLW link

Quan­tilal con­trol for finite MDPs

Vanessa KosoyApr 12, 2018, 9:21 AM
14 points
0 comments13 min readLW link

The limits of corrigibility

Stuart_ArmstrongApr 10, 2018, 10:49 AM
27 points
9 comments4 min readLW link

Align­ment Newslet­ter #16: 07/​23/​18

Rohin ShahJul 23, 2018, 4:20 PM
42 points
0 comments12 min readLW link
(mailchi.mp)

Mea­sur­ing hard­ware overhang

hippkeAug 5, 2020, 7:59 PM
106 points
14 comments4 min readLW link

[AN #111]: The Cir­cuits hy­pothe­ses for deep learning

Rohin ShahAug 5, 2020, 5:40 PM
23 points
0 comments9 min readLW link
(mailchi.mp)

Self-Fulfilling Prophe­cies Aren’t Always About Self-Awareness

John_MaxwellNov 18, 2019, 11:11 PM
14 points
7 comments4 min readLW link

The Good­hart Game

John_MaxwellNov 18, 2019, 11:22 PM
13 points
5 comments5 min readLW link

Why don’t sin­gu­lar­i­tar­i­ans bet on the cre­ation of AGI by buy­ing stocks?

John_MaxwellMar 11, 2020, 4:27 PM
43 points
20 comments4 min readLW link

The Dual­ist Pre­dict-O-Matic ($100 prize)

John_MaxwellOct 17, 2019, 6:45 AM
16 points
35 comments5 min readLW link

[Question] What AI safety prob­lems need solv­ing for safe AI re­search as­sis­tants?

John_MaxwellNov 5, 2019, 2:09 AM
14 points
13 comments1 min readLW link

Refin­ing the Evolu­tion­ary Anal­ogy to AI

lberglundAug 7, 2020, 11:13 PM
9 points
2 comments4 min readLW link

The Fu­sion Power Gen­er­a­tor Scenario

johnswentworthAug 8, 2020, 6:31 PM
136 points
29 comments3 min readLW link

[Question] How much is known about the “in­fer­ence rules” of log­i­cal in­duc­tion?

Eigil RischelAug 8, 2020, 10:45 AM
11 points
7 comments1 min readLW link

If I were a well-in­ten­tioned AI… II: Act­ing in a world

Stuart_ArmstrongFeb 27, 2020, 11:58 AM
20 points
0 comments3 min readLW link

If I were a well-in­ten­tioned AI… III: Ex­tremal Goodhart

Stuart_ArmstrongFeb 28, 2020, 11:24 AM
22 points
0 comments5 min readLW link

Towards a For­mal­i­sa­tion of Log­i­cal Counterfactuals

BunthutAug 8, 2020, 10:14 PM
6 points
2 comments2 min readLW link

[Question] 10/​50/​90% chance of GPT-N Trans­for­ma­tive AI?

human_generated_textAug 9, 2020, 12:10 AM
24 points
8 comments1 min readLW link

[Question] Can we ex­pect more value from AI al­ign­ment than from an ASI with the goal of run­ning al­ter­nate tra­jec­to­ries of our uni­verse?

Maxime RichéAug 9, 2020, 5:17 PM
2 points
5 comments1 min readLW link

In defense of Or­a­cle (“Tool”) AI research

Steven ByrnesAug 7, 2019, 7:14 PM
21 points
11 comments4 min readLW link

How GPT-N will es­cape from its AI-box

hippkeAug 12, 2020, 7:34 PM
7 points
9 comments1 min readLW link

Strong im­pli­ca­tion of prefer­ence uncertainty

Stuart_ArmstrongAug 12, 2020, 7:02 PM
20 points
3 comments2 min readLW link

[AN #112]: Eng­ineer­ing a Safer World

Rohin ShahAug 13, 2020, 5:20 PM
25 points
2 comments12 min readLW link
(mailchi.mp)

Room and Board for Peo­ple Self-Learn­ing ML or Do­ing In­de­pen­dent ML Research

SamuelKnocheAug 14, 2020, 5:19 PM
7 points
1 comment1 min readLW link

Talk and Q&A—Dan Hendrycks—Paper: Align­ing AI With Shared Hu­man Values. On Dis­cord at Aug 28, 2020 8:00-10:00 AM GMT+8.

wassnameAug 14, 2020, 11:57 PM
1 point
0 comments1 min readLW link

Search ver­sus design

Alex FlintAug 16, 2020, 4:53 PM
89 points
41 comments36 min readLW link1 review

Work on Se­cu­rity In­stead of Friendli­ness?

Wei DaiJul 21, 2012, 6:28 PM
51 points
107 comments2 min readLW link

Goal-Direct­ed­ness: What Suc­cess Looks Like

adamShimiAug 16, 2020, 6:33 PM
9 points
0 comments2 min readLW link

[Question] A way to beat su­per­ra­tional/​EDT agents?

Abhimanyu Pallavi SudhirAug 17, 2020, 2:33 PM
5 points
13 comments1 min readLW link

Learn­ing hu­man prefer­ences: op­ti­mistic and pes­simistic scenarios

Stuart_ArmstrongAug 18, 2020, 1:05 PM
27 points
6 comments6 min readLW link

Mesa-Search vs Mesa-Control

abramdemskiAug 18, 2020, 6:51 PM
54 points
45 comments7 min readLW link

Why we want un­bi­ased learn­ing processes

Stuart_ArmstrongFeb 20, 2018, 2:48 PM
13 points
3 comments3 min readLW link

In­tu­itive ex­am­ples of re­ward func­tion learn­ing?

Stuart_ArmstrongMar 6, 2018, 4:54 PM
7 points
3 comments2 min readLW link

Open-Cat­e­gory Classification

TurnTroutMar 28, 2018, 2:49 PM
13 points
6 comments10 min readLW link

Look­ing for ad­ver­sar­ial col­lab­o­ra­tors to test our De­bate protocol

Beth BarnesAug 19, 2020, 3:15 AM
52 points
5 comments1 min readLW link

Walk­through of ‘For­mal­iz­ing Con­ver­gent In­stru­men­tal Goals’

TurnTroutFeb 26, 2018, 2:20 AM
10 points
2 comments10 min readLW link

Am­bi­guity Detection

TurnTroutMar 1, 2018, 4:23 AM
11 points
9 comments4 min readLW link

Pe­nal­iz­ing Im­pact via At­tain­able Utility Preservation

TurnTroutDec 28, 2018, 9:46 PM
24 points
0 comments3 min readLW link
(arxiv.org)

What You See Isn’t Always What You Want

TurnTroutSep 13, 2019, 4:17 AM
30 points
12 comments3 min readLW link

[Question] In­stru­men­tal Oc­cam?

abramdemskiJan 31, 2020, 7:27 PM
30 points
15 comments1 min readLW link

Com­pact vs. Wide Models

VaniverJul 16, 2018, 4:09 AM
31 points
5 comments3 min readLW link

Alex Ir­pan: “My AI Timelines Have Sped Up”

VaniverAug 19, 2020, 4:23 PM
43 points
20 comments1 min readLW link
(www.alexirpan.com)

[AN #113]: Check­ing the eth­i­cal in­tu­itions of large lan­guage models

Rohin ShahAug 19, 2020, 5:10 PM
23 points
0 comments9 min readLW link
(mailchi.mp)

AI safety as feather­less bipeds *with broad flat nails*

Stuart_ArmstrongAug 19, 2020, 10:22 AM
37 points
1 comment1 min readLW link

Time Magaz­ine has an ar­ti­cle about the Sin­gu­lar­ity...

RaemonFeb 11, 2011, 2:20 AM
40 points
13 comments1 min readLW link

How rapidly are GPUs im­prov­ing in price perfor­mance?

gallabytesNov 25, 2018, 7:54 PM
31 points
9 comments1 min readLW link
(mediangroup.org)

Our val­ues are un­der­defined, change­able, and manipulable

Stuart_ArmstrongNov 2, 2017, 11:09 AM
25 points
6 comments3 min readLW link

[Question] What fund­ing sources ex­ist for tech­ni­cal AI safety re­search?

johnswentworthOct 1, 2019, 3:30 PM
26 points
5 comments1 min readLW link

Hu­mans can drive cars

ApprenticeJan 30, 2014, 11:55 AM
53 points
89 comments2 min readLW link

A Less Wrong sin­gu­lar­ity ar­ti­cle?

Kaj_SotalaNov 17, 2009, 2:15 PM
31 points
215 comments1 min readLW link

The Bayesian Tyrant

abramdemskiAug 20, 2020, 12:08 AM
132 points
20 comments6 min readLW link1 review

Con­cept Safety: Pro­duc­ing similar AI-hu­man con­cept spaces

Kaj_SotalaApr 14, 2015, 8:39 PM
50 points
45 comments8 min readLW link

[LINK] What should a rea­son­able per­son be­lieve about the Sin­gu­lar­ity?

Kaj_SotalaJan 13, 2011, 9:32 AM
38 points
14 comments2 min readLW link

The many ways AIs be­have badly

Stuart_ArmstrongApr 24, 2018, 11:40 AM
10 points
3 comments2 min readLW link

July 2020 gw­ern.net newsletter

gwernAug 20, 2020, 4:39 PM
29 points
0 comments1 min readLW link
(www.gwern.net)

Do what we mean vs. do what we say

Rohin ShahAug 30, 2018, 10:03 PM
34 points
14 comments1 min readLW link

[Question] What’s a De­com­pos­able Align­ment Topic?

Logan RiggsAug 21, 2020, 10:57 PM
26 points
16 comments1 min readLW link

Tools ver­sus agents

Stuart_ArmstrongMay 16, 2012, 1:00 PM
47 points
39 comments5 min readLW link

An un­al­igned benchmark

paulfchristianoNov 17, 2018, 3:51 PM
31 points
0 comments9 min readLW link

Fol­low­ing hu­man norms

Rohin ShahJan 20, 2019, 11:59 PM
30 points
10 comments5 min readLW link

nos­talge­braist: Re­cur­sive Good­hart’s Law

Kaj_SotalaAug 26, 2020, 11:07 AM
53 points
27 comments1 min readLW link
(nostalgebraist.tumblr.com)

[AN #114]: The­ory-in­spired safety solu­tions for pow­er­ful Bayesian RL agents

Rohin ShahAug 26, 2020, 5:20 PM
21 points
3 comments8 min readLW link
(mailchi.mp)

[Question] How hard would it be to change GPT-3 in a way that al­lows au­dio?

ChristianKlAug 28, 2020, 2:42 PM
8 points
5 comments1 min readLW link

Safe Scram­bling?

HoagyAug 29, 2020, 2:31 PM
3 points
1 comment2 min readLW link

(Hu­mor) AI Align­ment Crit­i­cal Failure Table

Kaj_SotalaAug 31, 2020, 7:51 PM
24 points
2 comments1 min readLW link
(sl4.org)

What is am­bi­tious value learn­ing?

Rohin ShahNov 1, 2018, 4:20 PM
49 points
28 comments2 min readLW link

The easy goal in­fer­ence prob­lem is still hard

paulfchristianoNov 3, 2018, 2:41 PM
50 points
19 comments4 min readLW link

[AN #115]: AI safety re­search prob­lems in the AI-GA framework

Rohin ShahSep 2, 2020, 5:10 PM
19 points
16 comments6 min readLW link
(mailchi.mp)

Emo­tional valence vs RL re­ward: a video game analogy

Steven ByrnesSep 3, 2020, 3:28 PM
12 points
6 comments4 min readLW link

Us­ing GPT-N to Solve In­ter­pretabil­ity of Neu­ral Net­works: A Re­search Agenda

Sep 3, 2020, 6:27 PM
67 points
12 comments2 min readLW link

“Learn­ing to Sum­ma­rize with Hu­man Feed­back”—OpenAI

[deleted]Sep 7, 2020, 5:59 PM
57 points
3 comments1 min readLW link

[AN #116]: How to make ex­pla­na­tions of neu­rons compositional

Rohin ShahSep 9, 2020, 5:20 PM
21 points
2 comments9 min readLW link
(mailchi.mp)

Safer sand­box­ing via col­lec­tive separation

Richard_NgoSep 9, 2020, 7:49 PM
24 points
6 comments4 min readLW link

[Question] Do mesa-op­ti­mizer risk ar­gu­ments rely on the train-test paradigm?

Ben CottierSep 10, 2020, 3:36 PM
12 points
7 comments1 min readLW link

Safety via se­lec­tion for obedience

Richard_NgoSep 10, 2020, 10:04 AM
31 points
1 comment5 min readLW link

How Much Com­pu­ta­tional Power Does It Take to Match the Hu­man Brain?

habrykaSep 12, 2020, 6:38 AM
44 points
1 comment1 min readLW link
(www.openphilanthropy.org)

De­ci­sion The­ory is multifaceted

Michele CampoloSep 13, 2020, 10:30 PM
7 points
12 comments8 min readLW link

AI Safety Dis­cus­sion Day

Linda LinseforsSep 15, 2020, 2:40 PM
20 points
0 comments1 min readLW link

[AN #117]: How neu­ral nets would fare un­der the TEVV framework

Rohin ShahSep 16, 2020, 5:20 PM
27 points
0 comments7 min readLW link
(mailchi.mp)

Ap­ply­ing the Coun­ter­fac­tual Pri­soner’s Dilemma to Log­i­cal Uncertainty

Chris_LeongSep 16, 2020, 10:34 AM
9 points
5 comments2 min readLW link

Ar­tifi­cial In­tel­li­gence: A Modern Ap­proach (4th edi­tion) on the Align­ment Problem

Zack_M_DavisSep 17, 2020, 2:23 AM
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

The “Backchain­ing to Lo­cal Search” Tech­nique in AI Alignment

adamShimiSep 18, 2020, 3:05 PM
28 points
1 comment2 min readLW link

Draft re­port on AI timelines

Ajeya CotraSep 18, 2020, 11:47 PM
207 points
56 comments1 min readLW link1 review

Why GPT wants to mesa-op­ti­mize & how we might change this

John_MaxwellSep 19, 2020, 1:48 PM
55 points
32 comments9 min readLW link

My (Mis)Ad­ven­tures With Al­gorith­mic Ma­chine Learning

AHartNtknSep 20, 2020, 5:31 AM
16 points
4 comments41 min readLW link

[Question] What AI com­pa­nies would be most likely to have a pos­i­tive long-term im­pact on the world as a re­sult of in­vest­ing in them?

MikkWSep 21, 2020, 11:41 PM
8 points
2 comments2 min readLW link

An­thro­po­mor­phi­sa­tion vs value learn­ing: type 1 vs type 2 errors

Stuart_ArmstrongSep 22, 2020, 10:46 AM
16 points
10 comments1 min readLW link

AI Ad­van­tages [Gems from the Wiki]

Sep 22, 2020, 10:44 PM
22 points
7 comments2 min readLW link
(www.lesswrong.com)

A long re­ply to Ben Garfinkel on Scru­ti­niz­ing Clas­sic AI Risk Arguments

Søren ElverlinSep 27, 2020, 5:51 PM
17 points
6 comments1 min readLW link

De­hu­man­i­sa­tion *er­rors*

Stuart_ArmstrongSep 23, 2020, 9:51 AM
13 points
0 comments1 min readLW link

[AN #118]: Risks, solu­tions, and pri­ori­ti­za­tion in a world with many AI systems

Rohin ShahSep 23, 2020, 6:20 PM
15 points
6 comments10 min readLW link
(mailchi.mp)

[Question] David Deutsch on Univer­sal Ex­plain­ers and AI

alanfSep 24, 2020, 7:50 AM
3 points
8 comments2 min readLW link

KL Diver­gence as Code Patch­ing Efficiency

Past AccountSep 27, 2020, 4:06 PM
17 points
0 comments8 min readLW link

[Question] What to do with imi­ta­tion hu­mans, other than ask­ing them what the right thing to do is?

Charlie SteinerSep 27, 2020, 9:51 PM
10 points
6 comments1 min readLW link

[Question] What De­ci­sion The­ory is Im­plied By Pre­dic­tive Pro­cess­ing?

johnswentworthSep 28, 2020, 5:20 PM
55 points
17 comments1 min readLW link

AGI safety from first prin­ci­ples: Superintelligence

Richard_NgoSep 28, 2020, 7:53 PM
80 points
6 comments9 min readLW link

AGI safety from first prin­ci­ples: Introduction

Richard_NgoSep 28, 2020, 7:53 PM
109 points
18 comments2 min readLW link1 review

[Question] Ex­am­ples of self-gov­er­nance to re­duce tech­nol­ogy risk?

JiaSep 29, 2020, 7:31 PM
10 points
4 comments1 min readLW link

AGI safety from first prin­ci­ples: Goals and Agency

Richard_NgoSep 29, 2020, 7:06 PM
70 points
15 comments15 min readLW link

“Un­su­per­vised” trans­la­tion as an (in­tent) al­ign­ment problem

paulfchristianoSep 30, 2020, 12:50 AM
61 points
15 comments4 min readLW link
(ai-alignment.com)

[AN #119]: AI safety when agents are shaped by en­vi­ron­ments, not rewards

Rohin ShahSep 30, 2020, 5:10 PM
11 points
0 comments11 min readLW link
(mailchi.mp)

AGI safety from first prin­ci­ples: Control

Richard_NgoOct 2, 2020, 9:51 PM
61 points
4 comments9 min readLW link

AI race con­sid­er­a­tions in a re­port by the U.S. House Com­mit­tee on Armed Services

NunoSempereOct 4, 2020, 12:11 PM
42 points
4 comments13 min readLW link

[Question] Is there any work on in­cor­po­rat­ing aleatoric un­cer­tainty and/​or in­her­ent ran­dom­ness into AIXI?

David Scott Krueger (formerly: capybaralet)Oct 4, 2020, 8:10 AM
9 points
7 comments1 min readLW link

AGI safety from first prin­ci­ples: Conclusion

Richard_NgoOct 4, 2020, 11:06 PM
65 points
4 comments3 min readLW link

Univer­sal Eudaimonia

hg00Oct 5, 2020, 1:45 PM
19 points
6 comments2 min readLW link

The Align­ment Prob­lem: Ma­chine Learn­ing and Hu­man Values

Rohin ShahOct 6, 2020, 5:41 PM
120 points
7 comments6 min readLW link1 review
(www.amazon.com)

[AN #120]: Trac­ing the in­tel­lec­tual roots of AI and AI alignment

Rohin ShahOct 7, 2020, 5:10 PM
13 points
4 comments10 min readLW link
(mailchi.mp)

[Question] Brain­storm­ing pos­i­tive vi­sions of AI

jungofthewonOct 7, 2020, 4:09 PM
52 points
25 comments1 min readLW link

[Question] How can an AI demon­strate purely through chat that it is an AI, and not a hu­man?

hugh.mannOct 7, 2020, 5:53 PM
3 points
4 comments1 min readLW link

[Question] Why isn’t JS a pop­u­lar lan­guage for deep learn­ing?

Will ClarkOct 8, 2020, 2:36 PM
12 points
21 comments1 min readLW link

[Question] If GPT-6 is hu­man-level AGI but costs $200 per page of out­put, what would hap­pen?

Daniel KokotajloOct 9, 2020, 12:00 PM
28 points
30 comments1 min readLW link

[Question] Shouldn’t there be a Chi­nese trans­la­tion of Hu­man Com­pat­i­ble?

mako yassOct 9, 2020, 8:47 AM
18 points
13 comments1 min readLW link

Ideal­ized Fac­tored Cognition

Rafael HarthNov 30, 2020, 6:49 PM
34 points
6 comments11 min readLW link

[Question] Re­views of the book ‘The Align­ment Prob­lem’

Mati_RoyOct 11, 2020, 7:41 AM
8 points
3 comments1 min readLW link

[Question] Re­views of TV show NeXt (about AI safety)

Mati_RoyOct 11, 2020, 4:31 AM
25 points
4 comments1 min readLW link

The Achilles Heel Hy­poth­e­sis for AI

scasperOct 13, 2020, 2:35 PM
20 points
6 comments1 min readLW link

Toy Prob­lem: De­tec­tive Story Alignment

johnswentworthOct 13, 2020, 9:02 PM
34 points
4 comments2 min readLW link

[Question] Does any­one worry about A.I. fo­rums like this where they re­in­force each other’s bi­ases/​ are led by big tech?

misabella16Oct 13, 2020, 3:14 PM
4 points
3 comments1 min readLW link

[AN #121]: Fore­cast­ing trans­for­ma­tive AI timelines us­ing biolog­i­cal anchors

Rohin ShahOct 14, 2020, 5:20 PM
27 points
5 comments14 min readLW link
(mailchi.mp)

Gra­di­ent hacking

evhubOct 16, 2019, 12:53 AM
99 points
39 comments3 min readLW link2 reviews

Im­pact mea­sure­ment and value-neu­tral­ity verification

evhubOct 15, 2019, 12:06 AM
31 points
13 comments6 min readLW link

Outer al­ign­ment and imi­ta­tive amplification

evhubJan 10, 2020, 12:26 AM
24 points
11 comments9 min readLW link

Safe ex­plo­ra­tion and corrigibility

evhubDec 28, 2019, 11:12 PM
17 points
4 comments4 min readLW link

[Question] What are some non-purely-sam­pling ways to do deep RL?

evhubDec 5, 2019, 12:09 AM
15 points
9 comments2 min readLW link

More vari­a­tions on pseudo-alignment

evhubNov 4, 2019, 11:24 PM
26 points
8 comments3 min readLW link

Towards an em­piri­cal in­ves­ti­ga­tion of in­ner alignment

evhubSep 23, 2019, 8:43 PM
44 points
9 comments6 min readLW link

Are min­i­mal cir­cuits de­cep­tive?

evhubSep 7, 2019, 6:11 PM
66 points
11 comments8 min readLW link

Con­crete ex­per­i­ments in in­ner alignment

evhubSep 6, 2019, 10:16 PM
63 points
12 comments6 min readLW link

Towards a mechanis­tic un­der­stand­ing of corrigibility

evhubAug 22, 2019, 11:20 PM
44 points
26 comments6 min readLW link

A Con­crete Pro­posal for Ad­ver­sar­ial IDA

evhubMar 26, 2019, 7:50 PM
16 points
5 comments5 min readLW link

Nuances with as­crip­tion universality

evhubFeb 12, 2019, 11:38 PM
20 points
1 comment2 min readLW link

Box in­ver­sion hypothesis

Jan KulveitOct 20, 2020, 4:20 PM
59 points
4 comments3 min readLW link

[Question] Has any­one re­searched speci­fi­ca­tion gam­ing with biolog­i­cal an­i­mals?

David Scott Krueger (formerly: capybaralet)Oct 21, 2020, 12:20 AM
9 points
3 comments1 min readLW link

Sun­day Oc­to­ber 25, 12:00PM (PT) — Scott Garrabrant on “Carte­sian Frames”

Ben PaceOct 21, 2020, 3:27 AM
48 points
3 comments2 min readLW link

[Question] Could we use recom­mender sys­tems to figure out hu­man val­ues?

Olga BabeevaOct 20, 2020, 9:35 PM
7 points
2 comments1 min readLW link

[Question] When was the term “AI al­ign­ment” coined?

David Scott Krueger (formerly: capybaralet)Oct 21, 2020, 6:27 PM
11 points
8 comments1 min readLW link

[AN #122]: Ar­gu­ing for AGI-driven ex­is­ten­tial risk from first principles

Rohin ShahOct 21, 2020, 5:10 PM
28 points
0 comments9 min readLW link
(mailchi.mp)

[Question] What’s the differ­ence be­tween GAI and a gov­ern­ment?

DirectedEvolutionOct 21, 2020, 11:04 PM
11 points
5 comments1 min readLW link

Mo­ral AI: Options

ManfredJul 11, 2015, 9:46 PM
14 points
6 comments4 min readLW link

Can few-shot learn­ing teach AI right from wrong?

Charlie SteinerJul 20, 2018, 7:45 AM
13 points
3 comments6 min readLW link

Some Com­ments on Stu­art Arm­strong’s “Re­search Agenda v0.9”

Charlie SteinerJul 8, 2019, 7:03 PM
21 points
12 comments4 min readLW link

The Ar­tifi­cial In­ten­tional Stance

Charlie SteinerJul 27, 2019, 7:00 AM
12 points
0 comments4 min readLW link

What’s the dream for giv­ing nat­u­ral lan­guage com­mands to AI?

Charlie SteinerOct 8, 2019, 1:42 PM
8 points
8 comments7 min readLW link

Su­per­vised learn­ing of out­puts in the brain

Steven ByrnesOct 26, 2020, 2:32 PM
27 points
9 comments10 min readLW link

Hu­mans are stun­ningly ra­tio­nal and stun­ningly irrational

Stuart_ArmstrongOct 23, 2020, 2:13 PM
21 points
4 comments2 min readLW link

Re­ply to Je­bari and Lund­borg on Ar­tifi­cial Superintelligence

Richard_NgoOct 25, 2020, 1:50 PM
31 points
4 comments5 min readLW link
(thinkingcomplete.blogspot.com)

Ad­di­tive Oper­a­tions on Carte­sian Frames

Scott GarrabrantOct 26, 2020, 3:12 PM
61 points
6 comments11 min readLW link

Se­cu­rity Mind­set and Take­off Speeds

DanielFilanOct 27, 2020, 3:20 AM
54 points
23 comments8 min readLW link
(danielfilan.com)

Biex­ten­sional Equivalence

Scott GarrabrantOct 28, 2020, 2:07 PM
43 points
13 comments10 min readLW link

Draft pa­pers for REALab and De­cou­pled Ap­proval on tampering

Oct 28, 2020, 4:01 PM
47 points
2 comments1 min readLW link

[AN #123]: In­fer­ring what is valuable in or­der to al­ign recom­mender systems

Rohin ShahOct 28, 2020, 5:00 PM
20 points
1 comment8 min readLW link
(mailchi.mp)

“Scal­ing Laws for Au­tore­gres­sive Gen­er­a­tive Model­ing”, Henighan et al 2020 {OA}

gwernOct 29, 2020, 1:45 AM
26 points
11 comments1 min readLW link
(arxiv.org)

Con­trol­lables and Ob­serv­ables, Revisited

Scott GarrabrantOct 29, 2020, 4:38 PM
34 points
5 comments8 min readLW link

AI risk hub in Sin­ga­pore?

Daniel KokotajloOct 29, 2020, 11:45 AM
57 points
18 comments4 min readLW link

Func­tors and Coarse Worlds

Scott GarrabrantOct 30, 2020, 3:19 PM
50 points
4 comments8 min readLW link

[Question] Re­sponses to Chris­ti­ano on take­off speeds?

Richard_NgoOct 30, 2020, 3:16 PM
29 points
8 comments1 min readLW link

/​r/​MLS­cal­ing: new sub­red­dit for NN scal­ing re­search/​discussion

gwernOct 30, 2020, 8:50 PM
20 points
0 comments1 min readLW link
(www.reddit.com)

“In­ner Align­ment Failures” Which Are Ac­tu­ally Outer Align­ment Failures

johnswentworthOct 31, 2020, 8:18 PM
61 points
38 comments5 min readLW link

Au­to­mated in­tel­li­gence is not AI

KatjaGraceNov 1, 2020, 11:30 PM
54 points
10 comments2 min readLW link
(meteuphoric.com)

Con­fu­ci­anism in AI Alignment

johnswentworthNov 2, 2020, 9:16 PM
33 points
28 comments6 min readLW link

[AN #124]: Prov­ably safe ex­plo­ra­tion through shielding

Rohin ShahNov 4, 2020, 6:20 PM
13 points
0 comments9 min readLW link
(mailchi.mp)

Defin­ing ca­pa­bil­ity and al­ign­ment in gra­di­ent descent

Edouard HarrisNov 5, 2020, 2:36 PM
22 points
6 comments10 min readLW link

Sub-Sums and Sub-Tensors

Scott GarrabrantNov 5, 2020, 6:06 PM
34 points
4 comments8 min readLW link

Mul­ti­plica­tive Oper­a­tions on Carte­sian Frames

Scott GarrabrantNov 3, 2020, 7:27 PM
34 points
23 comments12 min readLW link

Subagents of Carte­sian Frames

Scott GarrabrantNov 2, 2020, 10:02 PM
48 points
5 comments8 min readLW link

[Question] What con­sid­er­a­tions in­fluence whether I have more in­fluence over short or long timelines?

Daniel KokotajloNov 5, 2020, 7:56 PM
27 points
30 comments1 min readLW link

Ad­di­tive and Mul­ti­plica­tive Subagents

Scott GarrabrantNov 6, 2020, 2:26 PM
20 points
7 comments12 min readLW link

Com­mit­ting, As­sum­ing, Ex­ter­nal­iz­ing, and Internalizing

Scott GarrabrantNov 9, 2020, 4:59 PM
31 points
25 comments10 min readLW link

Build­ing AGI Us­ing Lan­guage Models

leogaoNov 9, 2020, 4:33 PM
11 points
1 comment1 min readLW link
(leogao.dev)

Why You Should Care About Goal-Directedness

adamShimiNov 9, 2020, 12:48 PM
37 points
15 comments9 min readLW link

Clar­ify­ing in­ner al­ign­ment terminology

evhubNov 9, 2020, 8:40 PM
98 points
17 comments3 min readLW link1 review

Eight Defi­ni­tions of Observability

Scott GarrabrantNov 10, 2020, 11:37 PM
34 points
26 comments12 min readLW link

[AN #125]: Neu­ral net­work scal­ing laws across mul­ti­ple modalities

Rohin ShahNov 11, 2020, 6:20 PM
25 points
7 comments9 min readLW link
(mailchi.mp)

Time in Carte­sian Frames

Scott GarrabrantNov 11, 2020, 8:25 PM
48 points
16 comments7 min readLW link

Learn­ing Nor­ma­tivity: A Re­search Agenda

abramdemskiNov 11, 2020, 9:59 PM
76 points
18 comments19 min readLW link

[Question] Any work on hon­ey­pots (to de­tect treach­er­ous turn at­tempts)?

David Scott Krueger (formerly: capybaralet)Nov 12, 2020, 5:41 AM
17 points
4 comments1 min readLW link

Misal­ign­ment and mi­suse: whose val­ues are man­i­fest?

KatjaGraceNov 13, 2020, 10:10 AM
42 points
7 comments2 min readLW link
(meteuphoric.com)

A Self-Embed­ded Prob­a­bil­is­tic Model

johnswentworthNov 13, 2020, 8:36 PM
30 points
2 comments5 min readLW link

TU Darm­stadt, Com­puter Science Master’s with a fo­cus on Ma­chine Learning

Master Programs ML/AINov 14, 2020, 3:50 PM
6 points
0 comments8 min readLW link

EPF Lau­sanne, ML re­lated MSc programs

Master Programs ML/AINov 14, 2020, 3:51 PM
3 points
0 comments4 min readLW link

ETH Zurich, ML re­lated MSc programs

Master Programs ML/AINov 14, 2020, 3:49 PM
3 points
0 comments10 min readLW link

Univer­sity of Oxford, Master’s Statis­ti­cal Science

Master Programs ML/AINov 14, 2020, 3:51 PM
3 points
0 comments3 min readLW link

Univer­sity of Ed­in­burgh, Master’s Ar­tifi­cial Intelligence

Master Programs ML/AINov 14, 2020, 3:49 PM
4 points
0 comments12 min readLW link

Univer­sity of Am­s­ter­dam (UvA), Master’s Ar­tifi­cial Intelligence

Master Programs ML/AINov 14, 2020, 3:49 PM
16 points
6 comments21 min readLW link

Univer­sity of Tübin­gen, Master’s Ma­chine Learning

Master Programs ML/AINov 14, 2020, 3:50 PM
14 points
0 comments7 min readLW link

A guide to Iter­ated Am­plifi­ca­tion & Debate

Rafael HarthNov 15, 2020, 5:14 PM
68 points
10 comments15 min readLW link

Solomonoff In­duc­tion and Sleep­ing Beauty

ikeNov 17, 2020, 2:28 AM
7 points
0 comments2 min readLW link

The Poin­t­ers Prob­lem: Hu­man Values Are A Func­tion Of Hu­mans’ La­tent Variables

johnswentworthNov 18, 2020, 5:47 PM
104 points
43 comments11 min readLW link2 reviews

The ethics of AI for the Rout­ledge En­cy­clo­pe­dia of Philosophy

Stuart_ArmstrongNov 18, 2020, 5:55 PM
45 points
8 comments1 min readLW link

Per­sua­sion Tools: AI takeover with­out AGI or agency?

Daniel KokotajloNov 20, 2020, 4:54 PM
74 points
24 comments11 min readLW link1 review

UDT might not pay a Coun­ter­fac­tual Mugger

winwonceNov 21, 2020, 11:27 PM
5 points
18 comments2 min readLW link

Chang­ing the AI race pay­off matrix

GurkenglasNov 22, 2020, 10:25 PM
7 points
2 comments1 min readLW link

Syn­tax, se­man­tics, and sym­bol ground­ing, simplified

Stuart_ArmstrongNov 23, 2020, 4:12 PM
30 points
4 comments9 min readLW link

Com­men­tary on AGI Safety from First Principles

Richard_NgoNov 23, 2020, 9:37 PM
80 points
4 comments54 min readLW link

[Question] Cri­tiques of the Agent Foun­da­tions agenda?

JsevillamolNov 24, 2020, 4:11 PM
16 points
3 comments1 min readLW link

[Question] How should OpenAI com­mu­ni­cate about the com­mer­cial perfor­mances of the GPT-3 API?

Maxime RichéNov 24, 2020, 8:34 AM
2 points
0 comments1 min readLW link

[AN #126]: Avoid­ing wire­head­ing by de­cou­pling ac­tion feed­back from ac­tion effects

Rohin ShahNov 26, 2020, 11:20 PM
24 points
1 comment10 min readLW link
(mailchi.mp)

[Question] Is this a good way to bet on short timelines?

Daniel KokotajloNov 28, 2020, 12:51 PM
16 points
8 comments1 min readLW link

Pre­face to the Se­quence on Fac­tored Cognition

Rafael HarthNov 30, 2020, 6:49 PM
35 points
7 comments2 min readLW link

[Linkpost] AlphaFold: a solu­tion to a 50-year-old grand challenge in biology

adamShimiNov 30, 2020, 5:33 PM
54 points
22 comments1 min readLW link
(deepmind.com)

What is “pro­tein fold­ing”? A brief explanation

jasoncrawfordDec 1, 2020, 2:46 AM
69 points
9 comments4 min readLW link
(rootsofprogress.org)

[Question] In a mul­ti­po­lar sce­nario, how do peo­ple ex­pect sys­tems to be trained to in­ter­act with sys­tems de­vel­oped by other labs?

JesseCliftonDec 1, 2020, 8:04 PM
11 points
6 comments1 min readLW link

[AN #127]: Re­think­ing agency: Carte­sian frames as a for­mal­iza­tion of ways to carve up the world into an agent and its environment

Rohin ShahDec 2, 2020, 6:20 PM
46 points
0 comments13 min readLW link
(mailchi.mp)

Beyond 175 billion pa­ram­e­ters: Can we an­ti­ci­pate fu­ture GPT-X Ca­pa­bil­ities?

bakztfutureDec 4, 2020, 11:42 PM
−1 points
1 comment2 min readLW link

Thoughts on Robin Han­son’s AI Im­pacts interview

Steven ByrnesNov 24, 2019, 1:40 AM
25 points
3 comments7 min readLW link

[RXN#7] Rus­sian x-risks newslet­ter fall 2020

avturchinDec 5, 2020, 4:28 PM
12 points
0 comments3 min readLW link

The AI Safety Game (UPDATED)

Daniel KokotajloDec 5, 2020, 10:27 AM
44 points
9 comments3 min readLW link

Values Form a Shift­ing Land­scape (and why you might care)

VojtaKovarikDec 5, 2020, 11:56 PM
28 points
6 comments4 min readLW link

AI Prob­lems Shared by Non-AI Systems

VojtaKovarikDec 5, 2020, 10:15 PM
7 points
2 comments4 min readLW link

Chance that “AI safety ba­si­cally [doesn’t need] to be solved, we’ll just solve it by de­fault un­less we’re com­pletely com­pletely care­less”

Dec 8, 2020, 9:08 PM
27 points
0 comments5 min readLW link

Min­i­mal Maps, Semi-De­ci­sions, and Neu­ral Representations

Past AccountDec 6, 2020, 3:15 PM
30 points
2 comments4 min readLW link

Launch­ing the Fore­cast­ing AI Progress Tournament

TamayDec 7, 2020, 2:08 PM
20 points
0 comments1 min readLW link
(www.metaculus.com)

[AN #128]: Pri­ori­tiz­ing re­search on AI ex­is­ten­tial safety based on its ap­pli­ca­tion to gov­er­nance demands

Rohin ShahDec 9, 2020, 6:20 PM
16 points
2 comments10 min readLW link
(mailchi.mp)

Sum­mary of AI Re­search Con­sid­er­a­tions for Hu­man Ex­is­ten­tial Safety (ARCHES)

peterbarnettDec 9, 2020, 11:28 PM
10 points
0 comments13 min readLW link

Clar­ify­ing Fac­tored Cognition

Rafael HarthDec 13, 2020, 8:02 PM
23 points
2 comments3 min readLW link

Ho­mo­gene­ity vs. het­ero­gene­ity in AI take­off scenarios

evhubDec 16, 2020, 1:37 AM
95 points
48 comments4 min readLW link

LBIT Proofs 8: Propo­si­tions 53-58

DiffractorDec 16, 2020, 3:29 AM
7 points
0 comments18 min readLW link

LBIT Proofs 6: Propo­si­tions 39-47

DiffractorDec 16, 2020, 3:33 AM
7 points
0 comments23 min readLW link

LBIT Proofs 5: Propo­si­tions 29-38

DiffractorDec 16, 2020, 3:35 AM
7 points
0 comments21 min readLW link

LBIT Proofs 3: Propo­si­tions 19-22

DiffractorDec 16, 2020, 3:40 AM
7 points
0 comments17 min readLW link

LBIT Proofs 2: Propo­si­tions 10-18

DiffractorDec 16, 2020, 3:45 AM
7 points
0 comments20 min readLW link

LBIT Proofs 1: Propo­si­tions 1-9

DiffractorDec 16, 2020, 3:48 AM
7 points
0 comments25 min readLW link

LBIT Proofs 4: Propo­si­tions 22-28

DiffractorDec 16, 2020, 3:38 AM
7 points
0 comments17 min readLW link

LBIT Proofs 7: Propo­si­tions 48-52

DiffractorDec 16, 2020, 3:31 AM
7 points
0 comments20 min readLW link

Less Ba­sic In­framea­sure Theory

DiffractorDec 16, 2020, 3:52 AM
22 points
1 comment61 min readLW link

[AN #129]: Ex­plain­ing dou­ble de­scent by mea­sur­ing bias and variance

Rohin ShahDec 16, 2020, 6:10 PM
14 points
1 comment7 min readLW link
(mailchi.mp)

Ma­chine learn­ing could be fun­da­men­tally unexplainable

George3d6Dec 16, 2020, 1:32 PM
26 points
15 comments15 min readLW link
(cerebralab.com)

Beta test GPT-3 based re­search assistant

jungofthewonDec 16, 2020, 1:42 PM
34 points
2 comments1 min readLW link

[Question] How long till In­verse AlphaFold?

Daniel KokotajloDec 17, 2020, 7:56 PM
41 points
18 comments1 min readLW link

Hier­ar­chi­cal plan­ning: con­text agents

Charlie SteinerDec 19, 2020, 11:24 AM
21 points
6 comments9 min readLW link

[Question] Is there a com­mu­nity al­igned with the idea of cre­at­ing species of AGI sys­tems for them to be­come our suc­ces­sors?

iamhefestoDec 20, 2020, 7:06 PM
−2 points
7 comments1 min readLW link

Intuition

Rafael HarthDec 20, 2020, 9:49 PM
26 points
1 comment6 min readLW link

2020 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

LarksDec 21, 2020, 3:27 PM
137 points
14 comments68 min readLW link

TAI Safety Biblio­graphic Database

JessRiedelDec 22, 2020, 5:42 PM
70 points
10 comments17 min readLW link

An­nounc­ing AXRP, the AI X-risk Re­search Podcast

DanielFilanDec 23, 2020, 8:00 PM
54 points
6 comments1 min readLW link
(danielfilan.com)

[AN #130]: A new AI x-risk pod­cast, and re­views of the field

Rohin ShahDec 24, 2020, 6:20 PM
8 points
0 comments7 min readLW link
(mailchi.mp)

Can we model tech­nolog­i­cal sin­gu­lar­ity as the phase tran­si­tion?

Valentin2026Dec 26, 2020, 3:20 AM
4 points
3 comments4 min readLW link

AGI Align­ment Should Solve Cor­po­rate Alignment

magfrumpDec 27, 2020, 2:23 AM
19 points
6 comments6 min readLW link

Against GDP as a met­ric for timelines and take­off speeds

Daniel KokotajloDec 29, 2020, 5:42 PM
131 points
16 comments14 min readLW link1 review

AXRP Epi­sode 3 - Ne­go­tiable Re­in­force­ment Learn­ing with An­drew Critch

DanielFilanDec 29, 2020, 8:45 PM
26 points
0 comments27 min readLW link

AXRP Epi­sode 1 - Ad­ver­sar­ial Poli­cies with Adam Gleave

DanielFilanDec 29, 2020, 8:41 PM
12 points
5 comments33 min readLW link

AXRP Epi­sode 2 - Learn­ing Hu­man Bi­ases with Ro­hin Shah

DanielFilanDec 29, 2020, 8:43 PM
13 points
0 comments35 min readLW link

Dario Amodei leaves OpenAI

Daniel KokotajloDec 29, 2020, 7:31 PM
69 points
12 comments1 min readLW link

[Question] What Are Some Alter­na­tive Ap­proaches to Un­der­stand­ing Agency/​In­tel­li­gence?

intersticeDec 29, 2020, 11:21 PM
15 points
12 comments1 min readLW link

Why Neu­ral Net­works Gen­er­al­ise, and Why They Are (Kind of) Bayesian

Joar SkalseDec 29, 2020, 1:33 PM
67 points
58 comments1 min readLW link1 review

De­bate Minus Fac­tored Cognition

abramdemskiDec 29, 2020, 10:59 PM
37 points
42 comments11 min readLW link

[AN #131]: For­mal­iz­ing the ar­gu­ment of ig­nored at­tributes in a util­ity function

Rohin ShahDec 31, 2020, 6:20 PM
13 points
4 comments9 min readLW link
(mailchi.mp)

Reflec­tions on Larks’ 2020 AI al­ign­ment liter­a­ture review

Alex FlintJan 1, 2021, 10:53 PM
79 points
8 comments6 min readLW link

Men­tal sub­agent im­pli­ca­tions for AI Safety

moridinamaelJan 3, 2021, 6:59 PM
11 points
0 comments3 min readLW link

The Na­tional Defense Autho­riza­tion Act Con­tains AI Provisions

ryan_bJan 5, 2021, 3:51 PM
30 points
24 comments1 min readLW link

The Poin­t­ers Prob­lem: Clar­ifi­ca­tions/​Variations

abramdemskiJan 5, 2021, 5:29 PM
50 points
14 comments18 min readLW link

[AN #132]: Com­plex and sub­tly in­cor­rect ar­gu­ments as an ob­sta­cle to debate

Rohin ShahJan 6, 2021, 6:20 PM
19 points
1 comment19 min readLW link
(mailchi.mp)

Out-of-body rea­son­ing (OOBR)

Jon ZeroJan 9, 2021, 4:10 PM
5 points
0 comments4 min readLW link

Re­view of Soft Take­off Can Still Lead to DSA

Daniel KokotajloJan 10, 2021, 6:10 PM
75 points
15 comments6 min readLW link

Re­view of ‘De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More’

TurnTroutJan 12, 2021, 3:57 AM
40 points
1 comment2 min readLW link

[AN #133]: Build­ing ma­chines that can co­op­er­ate (with hu­mans, in­sti­tu­tions, or other ma­chines)

Rohin ShahJan 13, 2021, 6:10 PM
14 points
0 comments9 min readLW link
(mailchi.mp)

An Ex­plo­ra­tory Toy AI Take­off Model

niplavJan 13, 2021, 6:13 PM
10 points
3 comments12 min readLW link

Some re­cent sur­vey pa­pers on (mostly near-term) AI safety, se­cu­rity, and assurance

Aryeh EnglanderJan 13, 2021, 9:50 PM
11 points
0 comments3 min readLW link

Thoughts on Ia­son Gabriel’s Ar­tifi­cial In­tel­li­gence, Values, and Alignment

Alex FlintJan 14, 2021, 12:58 PM
35 points
14 comments4 min readLW link

Why I’m ex­cited about Debate

Richard_NgoJan 15, 2021, 11:37 PM
73 points
12 comments7 min readLW link

Ex­cerpt from Ar­bital Solomonoff in­duc­tion dialogue

Richard_NgoJan 17, 2021, 3:49 AM
36 points
6 comments5 min readLW link
(arbital.com)

Short sum­mary of mAIry’s room

Stuart_ArmstrongJan 18, 2021, 6:11 PM
26 points
2 comments4 min readLW link

DALL-E does sym­bol grounding

p.b.Jan 17, 2021, 9:20 PM
6 points
0 comments1 min readLW link

Some thoughts on risks from nar­row, non-agen­tic AI

Richard_NgoJan 19, 2021, 12:04 AM
35 points
21 comments16 min readLW link

Against the Back­ward Ap­proach to Goal-Directedness

adamShimiJan 19, 2021, 6:46 PM
19 points
6 comments4 min readLW link

[AN #134]: Un­der­speci­fi­ca­tion as a cause of frag­ility to dis­tri­bu­tion shift

Rohin ShahJan 21, 2021, 6:10 PM
13 points
0 comments7 min readLW link
(mailchi.mp)

Coun­ter­fac­tual con­trol incentives

Stuart_ArmstrongJan 21, 2021, 4:54 PM
21 points
10 comments9 min readLW link

Policy re­stric­tions and Se­cret keep­ing AI

Donald HobsonJan 24, 2021, 8:59 PM
6 points
3 comments3 min readLW link

FC fi­nal: Can Fac­tored Cog­ni­tion schemes scale?

Rafael HarthJan 24, 2021, 10:18 PM
15 points
0 comments17 min readLW link

[AN #135]: Five prop­er­ties of goal-di­rected systems

Rohin ShahJan 27, 2021, 6:10 PM
33 points
0 comments8 min readLW link
(mailchi.mp)

AMA on EA Fo­rum: Ajeya Co­tra, re­searcher at Open Phil

Ajeya CotraJan 29, 2021, 11:05 PM
23 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Play with neu­ral net

KatjaGraceJan 30, 2021, 10:50 AM
17 points
0 comments1 min readLW link
(worldspiritsockpuppet.com)

A Cri­tique of Non-Obstruction

Joe CollmanFeb 3, 2021, 8:45 AM
13 points
10 comments4 min readLW link

Dist­in­guish­ing claims about train­ing vs deployment

Richard_NgoFeb 3, 2021, 11:30 AM
61 points
30 comments9 min readLW link

Graph­i­cal World Models, Coun­ter­fac­tu­als, and Ma­chine Learn­ing Agents

Koen.HoltmanFeb 17, 2021, 11:07 AM
6 points
2 comments10 min readLW link

OpenAI: “Scal­ing Laws for Trans­fer”, Her­nan­dez et al.

Lukas FinnvedenFeb 4, 2021, 12:49 PM
13 points
3 comments1 min readLW link
(arxiv.org)

Evolu­tions Build­ing Evolu­tions: Lay­ers of Gen­er­ate and Test

plexFeb 5, 2021, 6:21 PM
11 points
1 comment6 min readLW link

Episte­mol­ogy of HCH

adamShimiFeb 9, 2021, 11:46 AM
16 points
2 comments10 min readLW link

[Question] Math­e­mat­i­cal Models of Progress?

abramdemskiFeb 16, 2021, 12:21 AM
28 points
8 comments2 min readLW link

[Question] Sugges­tions of posts on the AF to review

adamShimiFeb 16, 2021, 12:40 PM
56 points
20 comments1 min readLW link

Disen­tan­gling Cor­rigi­bil­ity: 2015-2021

Koen.HoltmanFeb 16, 2021, 6:01 PM
17 points
20 comments9 min readLW link

Carte­sian frames as gen­er­al­ised models

Stuart_ArmstrongFeb 16, 2021, 4:09 PM
20 points
0 comments5 min readLW link

[AN #138]: Why AI gov­er­nance should find prob­lems rather than just solv­ing them

Rohin ShahFeb 17, 2021, 6:50 PM
12 points
0 comments9 min readLW link
(mailchi.mp)

Safely con­trol­ling the AGI agent re­ward function

Koen.HoltmanFeb 17, 2021, 2:47 PM
7 points
0 comments5 min readLW link

AXRP Epi­sode 4 - Risks from Learned Op­ti­miza­tion with Evan Hubinger

DanielFilanFeb 18, 2021, 12:03 AM
41 points
10 comments86 min readLW link

Utility Max­i­miza­tion = De­scrip­tion Length Minimization

johnswentworthFeb 18, 2021, 6:04 PM
183 points
40 comments5 min readLW link

Google’s Eth­i­cal AI team and AI Safety

magfrumpFeb 20, 2021, 9:42 AM
12 points
16 comments7 min readLW link

AI Safety Begin­ners Meetup (Euro­pean Time)

Linda LinseforsFeb 20, 2021, 1:20 PM
8 points
2 comments1 min readLW link

Min­i­mal Map Constraints

Past AccountFeb 21, 2021, 5:49 PM
6 points
0 comments3 min readLW link

[AN #139]: How the sim­plic­ity of re­al­ity ex­plains the suc­cess of neu­ral nets

Rohin ShahFeb 24, 2021, 6:30 PM
26 points
6 comments12 min readLW link
(mailchi.mp)

My Thoughts on the Ap­per­cep­tion Engine

J BostockFeb 25, 2021, 7:43 PM
4 points
1 comment3 min readLW link

The Case for Pri­vacy Optimism

bmgarfinkelMar 10, 2020, 8:30 PM
43 points
1 comment32 min readLW link
(benmgarfinkel.wordpress.com)

[Question] How might cryp­tocur­ren­cies af­fect AGI timelines?

Dawn DrescherFeb 28, 2021, 7:16 PM
13 points
40 comments2 min readLW link

Fun with +12 OOMs of Compute

Daniel KokotajloMar 1, 2021, 1:30 PM
212 points
78 comments12 min readLW link1 review

Links for Feb 2021

ikeMar 1, 2021, 5:13 AM
6 points
0 comments6 min readLW link
(misinfounderload.substack.com)

In­tro­duc­tion to Re­in­force­ment Learning

Dr. BirdbrainFeb 28, 2021, 11:03 PM
4 points
1 comment3 min readLW link

Cu­ri­os­ity about Align­ing Values

esweetMar 3, 2021, 12:22 AM
3 points
7 comments1 min readLW link

How does bee learn­ing com­pare with ma­chine learn­ing?

eleniMar 4, 2021, 1:59 AM
62 points
15 comments24 min readLW link

Some re­cent in­ter­views with AI/​math lu­mi­nar­ies.

fowlertmMar 4, 2021, 1:26 AM
2 points
0 comments1 min readLW link

A Semitech­ni­cal In­tro­duc­tory Dialogue on Solomonoff Induction

Eliezer YudkowskyMar 4, 2021, 5:27 PM
127 points
34 comments54 min readLW link

Con­nect­ing the good reg­u­la­tor the­o­rem with se­man­tics and sym­bol grounding

Stuart_ArmstrongMar 4, 2021, 2:35 PM
11 points
0 comments2 min readLW link

[AN #140]: The­o­ret­i­cal mod­els that pre­dict scal­ing laws

Rohin ShahMar 4, 2021, 6:10 PM
45 points
0 comments10 min readLW link
(mailchi.mp)

Take­aways from the In­tel­li­gence Ris­ing RPG

Mar 5, 2021, 10:27 AM
50 points
8 comments12 min readLW link

GPT-3 and the fu­ture of knowl­edge work

fowlertmMar 5, 2021, 5:40 PM
16 points
0 comments2 min readLW link

The case for al­ign­ing nar­rowly su­per­hu­man models

Ajeya CotraMar 5, 2021, 10:29 PM
187 points
74 comments38 min readLW link

MIRI com­ments on Co­tra’s “Case for Align­ing Nar­rowly Su­per­hu­man Models”

Rob BensingerMar 5, 2021, 11:43 PM
136 points
13 comments26 min readLW link

[Question] What are the biggest cur­rent im­pacts of AI?

Sam ClarkeMar 7, 2021, 9:44 PM
15 points
5 comments1 min readLW link

CLR’s re­cent work on multi-agent systems

JesseCliftonMar 9, 2021, 2:28 AM
54 points
1 comment13 min readLW link

De-con­fus­ing my­self about Pas­cal’s Mug­ging and New­comb’s Problem

DirectedEvolutionMar 9, 2021, 8:45 PM
7 points
1 comment3 min readLW link

Open Prob­lems with Myopia

Mar 10, 2021, 6:38 PM
57 points
16 comments8 min readLW link

[AN #141]: The case for prac­tic­ing al­ign­ment work on GPT-3 and other large models

Rohin ShahMar 10, 2021, 6:30 PM
27 points
4 comments8 min readLW link
(mailchi.mp)

[Link] Whit­tle­stone et al., The So­cietal Im­pli­ca­tions of Deep Re­in­force­ment Learning

Aryeh EnglanderMar 10, 2021, 6:13 PM
11 points
1 comment1 min readLW link
(jair.org)

Four Mo­ti­va­tions for Learn­ing Normativity

abramdemskiMar 11, 2021, 8:13 PM
42 points
7 comments5 min readLW link

[Question] What’s a good way to test ba­sic ma­chine learn­ing code?

KennyMar 11, 2021, 9:27 PM
5 points
9 comments1 min readLW link

[Video] In­tel­li­gence and Stu­pidity: The Orthog­o­nal­ity Thesis

plexMar 13, 2021, 12:32 AM
5 points
1 comment1 min readLW link
(www.youtube.com)

AI x-risk re­duc­tion: why I chose academia over industry

David Scott Krueger (formerly: capybaralet)Mar 14, 2021, 5:25 PM
56 points
14 comments3 min readLW link

[Question] Par­tial-Con­scious­ness as se­man­tic/​sym­bolic rep­re­sen­ta­tional lan­guage model trained on NN

Joe KwonMar 16, 2021, 6:51 PM
2 points
3 comments1 min readLW link

[AN #142]: The quest to un­der­stand a net­work well enough to reim­ple­ment it by hand

Rohin ShahMar 17, 2021, 5:10 PM
34 points
4 comments8 min readLW link
(mailchi.mp)

In­ter­mit­tent Distil­la­tions #1

Mark XuMar 17, 2021, 5:15 AM
25 points
1 comment10 min readLW link

HCH Spec­u­la­tion Post #2A

Charlie SteinerMar 17, 2021, 1:26 PM
42 points
7 comments9 min readLW link

The Age of Imag­i­na­tive Machines

Yuli_BanMar 18, 2021, 12:35 AM
10 points
1 comment11 min readLW link

Gen­er­al­iz­ing POWER to multi-agent games

Mar 22, 2021, 2:41 AM
52 points
17 comments7 min readLW link

My re­search methodology

paulfchristianoMar 22, 2021, 9:20 PM
148 points
36 comments16 min readLW link
(ai-alignment.com)

“In­fra-Bayesi­anism with Vanessa Kosoy” – Watch/​Dis­cuss Party

Ben PaceMar 22, 2021, 11:44 PM
27 points
45 comments1 min readLW link

Prefer­ences and bi­ases, the in­for­ma­tion argument

Stuart_ArmstrongMar 23, 2021, 12:44 PM
14 points
5 comments1 min readLW link

[AN #143]: How to make em­bed­ded agents that rea­son prob­a­bil­is­ti­cally about their environments

Rohin ShahMar 24, 2021, 5:20 PM
13 points
3 comments8 min readLW link
(mailchi.mp)

Toy model of prefer­ence, bias, and ex­tra information

Stuart_ArmstrongMar 24, 2021, 10:14 AM
9 points
0 comments4 min readLW link

On lan­guage mod­el­ing and fu­ture ab­stract rea­son­ing research

alexlyzhovMar 25, 2021, 5:43 PM
3 points
1 comment1 min readLW link
(docs.google.com)

In­framea­sures and Do­main Theory

DiffractorMar 28, 2021, 9:19 AM
27 points
3 comments33 min readLW link

In­fra-Do­main Proofs 2

DiffractorMar 28, 2021, 9:15 AM
13 points
0 comments21 min readLW link

In­fra-Do­main proofs 1

DiffractorMar 28, 2021, 9:16 AM
13 points
0 comments23 min readLW link

Sce­nar­ios and Warn­ing Signs for Ajeya’s Ag­gres­sive, Con­ser­va­tive, and Best Guess AI Timelines

Kevin LiuMar 29, 2021, 1:38 AM
25 points
1 comment9 min readLW link
(kliu.io)

[Question] How do we pre­pare for fi­nal crunch time?

Eli TyreMar 30, 2021, 5:47 AM
116 points
30 comments8 min readLW link1 review

[Question] TAI?

Logan ZoellnerMar 30, 2021, 12:41 PM
12 points
8 comments1 min readLW link

A use for Clas­si­cal AI—Ex­pert Systems

GlpusnaMar 31, 2021, 2:37 AM
1 point
2 comments2 min readLW link

What Mul­tipo­lar Failure Looks Like, and Ro­bust Agent-Ag­nos­tic Pro­cesses (RAAPs)

Andrew_CritchMar 31, 2021, 11:50 PM
203 points
60 comments22 min readLW link

AI and the Prob­a­bil­ity of Conflict

tonyoconnorApr 1, 2021, 7:00 AM
8 points
10 comments8 min readLW link

“AI and Com­pute” trend isn’t pre­dic­tive of what is happening

alexlyzhovApr 2, 2021, 12:44 AM
133 points
15 comments1 min readLW link

[AN #144]: How lan­guage mod­els can also be fine­tuned for non-lan­guage tasks

Rohin ShahApr 2, 2021, 5:20 PM
19 points
0 comments6 min readLW link
(mailchi.mp)

2012 Robin Han­son com­ment on “In­tel­li­gence Ex­plo­sion: Ev­i­dence and Im­port”

Rob BensingerApr 2, 2021, 4:26 PM
28 points
4 comments3 min readLW link

My take on Michael Littman on “The HCI of HAI”

Alex FlintApr 2, 2021, 7:51 PM
59 points
4 comments7 min readLW link

[Question] How do scal­ing laws work for fine-tun­ing?

Daniel KokotajloApr 4, 2021, 12:18 PM
24 points
10 comments1 min readLW link

Avert­ing suffer­ing with sen­tience throt­tlers (pro­posal)

QuinnApr 5, 2021, 10:54 AM
8 points
7 comments3 min readLW link

Reflec­tive Bayesianism

abramdemskiApr 6, 2021, 7:48 PM
58 points
27 comments13 min readLW link

[Question] What will GPT-4 be in­ca­pable of?

Michaël TrazziApr 6, 2021, 7:57 PM
34 points
32 comments1 min readLW link

I Trained a Neu­ral Net­work to Play Helltaker

lsusrApr 7, 2021, 8:24 AM
29 points
5 comments3 min readLW link

[AN #145]: Our three year an­niver­sary!

Rohin ShahApr 9, 2021, 5:48 PM
19 points
0 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter Three Year Retrospective

Rohin ShahApr 7, 2021, 2:39 PM
55 points
0 comments5 min readLW link

Which coun­ter­fac­tu­als should an AI fol­low?

Stuart_ArmstrongApr 7, 2021, 4:47 PM
19 points
5 comments7 min readLW link

Solv­ing the whole AGI con­trol prob­lem, ver­sion 0.0001

Steven ByrnesApr 8, 2021, 3:14 PM
60 points
7 comments26 min readLW link

The Ja­panese Quiz: a Thought Ex­per­i­ment of Statis­ti­cal Epistemology

DanBApr 8, 2021, 5:37 PM
11 points
0 comments9 min readLW link

A pos­si­ble prefer­ence algorithm

Stuart_ArmstrongApr 8, 2021, 6:25 PM
22 points
0 comments4 min readLW link

If you don’t de­sign for ex­trap­o­la­tion, you’ll ex­trap­o­late poorly—pos­si­bly fatally

Stuart_ArmstrongApr 8, 2021, 6:10 PM
17 points
0 comments4 min readLW link

AXRP Epi­sode 6 - De­bate and Imi­ta­tive Gen­er­al­iza­tion with Beth Barnes

DanielFilanApr 8, 2021, 9:20 PM
24 points
3 comments59 min readLW link

My Cur­rent Take on Counterfactuals

abramdemskiApr 9, 2021, 5:51 PM
53 points
57 comments25 min readLW link

Opinions on In­ter­pretable Ma­chine Learn­ing and 70 Sum­maries of Re­cent Papers

Apr 9, 2021, 7:19 PM
139 points
16 comments102 min readLW link

Why un­rig­gable *al­most* im­plies uninfluenceable

Stuart_ArmstrongApr 9, 2021, 5:07 PM
11 points
0 comments4 min readLW link

In­ter­mit­tent Distil­la­tions #2

Mark XuApr 14, 2021, 6:47 AM
32 points
4 comments9 min readLW link

Test Cases for Im­pact Reg­u­lari­sa­tion Methods

DanielFilanFeb 6, 2019, 9:50 PM
58 points
5 comments12 min readLW link
(danielfilan.com)

Su­per­ra­tional Agents Kelly Bet In­fluence!

abramdemskiApr 16, 2021, 10:08 PM
41 points
5 comments5 min readLW link

Defin­ing “op­ti­mizer”

ChantielApr 17, 2021, 3:38 PM
9 points
6 comments1 min readLW link

Alex Flint on “A soft­ware en­g­ineer’s per­spec­tive on log­i­cal in­duc­tion”

RaemonApr 17, 2021, 6:56 AM
21 points
8 comments1 min readLW link

[Question] Pa­ram­e­ter count of ML sys­tems through time?

JsevillamolApr 19, 2021, 12:54 PM
31 points
4 comments1 min readLW link

Gra­da­tions of In­ner Align­ment Obstacles

abramdemskiApr 20, 2021, 10:18 PM
80 points
22 comments9 min readLW link

Where are in­ten­tions to be found?

Alex FlintApr 21, 2021, 12:51 AM
44 points
12 comments9 min readLW link

[AN #147]: An overview of the in­ter­pretabil­ity landscape

Rohin ShahApr 21, 2021, 5:10 PM
14 points
2 comments7 min readLW link
(mailchi.mp)

NTK/​GP Models of Neu­ral Nets Can’t Learn Features

intersticeApr 22, 2021, 3:01 AM
31 points
33 comments3 min readLW link

[Question] Is there any­thing that can stop AGI de­vel­op­ment in the near term?

Wulky WilkinsenApr 22, 2021, 8:37 PM
5 points
5 comments1 min readLW link

Prob­a­bil­ity the­ory and log­i­cal in­duc­tion as lenses

Alex FlintApr 23, 2021, 2:41 AM
43 points
7 comments6 min readLW link

Nat­u­ral­ism and AI alignment

Michele CampoloApr 24, 2021, 4:16 PM
5 points
12 comments8 min readLW link

Mal­i­cious non-state ac­tors and AI safety

ketiApr 25, 2021, 3:21 AM
2 points
13 comments2 min readLW link

An­nounc­ing the Align­ment Re­search Center

paulfchristianoApr 26, 2021, 11:30 PM
177 points
6 comments1 min readLW link
(ai-alignment.com)

[Linkpost] Treach­er­ous turns in the wild

Mark XuApr 26, 2021, 10:51 PM
31 points
6 comments1 min readLW link
(lukemuehlhauser.com)

FAQ: Ad­vice for AI Align­ment Researchers

Rohin ShahApr 26, 2021, 6:59 PM
67 points
2 comments1 min readLW link
(rohinshah.com)

Pit­falls of the agent model

Alex FlintApr 27, 2021, 10:19 PM
19 points
4 comments20 min readLW link

[AN #148]: An­a­lyz­ing gen­er­al­iza­tion across more axes than just ac­cu­racy or loss

Rohin ShahApr 28, 2021, 6:30 PM
24 points
5 comments11 min readLW link
(mailchi.mp)

AMA: Paul Chris­ti­ano, al­ign­ment researcher

paulfchristianoApr 28, 2021, 6:55 PM
117 points
198 comments1 min readLW link

25 Min Talk on Me­taEth­i­cal.AI with Ques­tions from Stu­art Armstrong

June KuApr 29, 2021, 3:38 PM
21 points
7 comments1 min readLW link

Low-stakes alignment

paulfchristianoApr 30, 2021, 12:10 AM
70 points
9 comments7 min readLW link1 review
(ai-alignment.com)

[Weekly Event] Align­ment Re­searcher Coffee Time (in Walled Gar­den)

adamShimiMay 2, 2021, 12:59 PM
37 points
0 comments1 min readLW link

Pars­ing Abram on Gra­da­tions of In­ner Align­ment Obstacles

Alex FlintMay 4, 2021, 5:44 PM
22 points
4 comments6 min readLW link

Mun­dane solu­tions to ex­otic problems

paulfchristianoMay 4, 2021, 6:20 PM
56 points
8 comments5 min readLW link
(ai-alignment.com)

April 15, 2040

NisanMay 4, 2021, 9:18 PM
97 points
19 comments2 min readLW link

[AN #149]: The newslet­ter’s ed­i­to­rial policy

Rohin ShahMay 5, 2021, 5:10 PM
19 points
3 comments8 min readLW link
(mailchi.mp)

Pars­ing Chris Min­gard on Neu­ral Networks

Alex FlintMay 6, 2021, 10:16 PM
67 points
27 comments6 min readLW link

Life and ex­pand­ing steer­able consequences

Alex FlintMay 7, 2021, 6:33 PM
46 points
3 comments4 min readLW link

Do­main The­ory and the Pri­soner’s Dilemma: FairBot

GurkenglasMay 7, 2021, 7:33 AM
14 points
5 comments2 min readLW link

Pre-Train­ing + Fine-Tun­ing Fa­vors Deception

Mark XuMay 8, 2021, 6:36 PM
27 points
2 comments3 min readLW link

[Event] Weekly Align­ment Re­search Coffee Time (05/​10)

adamShimiMay 9, 2021, 11:05 AM
16 points
2 comments1 min readLW link

[Question] Is driv­ing worth the risk?

Adam ZernerMay 11, 2021, 5:04 AM
26 points
29 comments7 min readLW link

Yam­polskiy on AI Risk Skepticism

Gordon Seidoh WorleyMay 11, 2021, 2:50 PM
15 points
5 comments1 min readLW link
(www.researchgate.net)

Hu­man pri­ors, fea­tures and mod­els, lan­guages, and Solomonoff induction

Stuart_ArmstrongMay 10, 2021, 10:55 AM
16 points
2 comments4 min readLW link

[AN #150]: The sub­types of Co­op­er­a­tive AI research

Rohin ShahMay 12, 2021, 5:20 PM
15 points
0 comments6 min readLW link
(mailchi.mp)

Un­der­stand­ing the Lot­tery Ticket Hy­poth­e­sis

Alex FlintMay 14, 2021, 12:25 AM
50 points
9 comments8 min readLW link

Con­cern­ing not get­ting lost

Alex FlintMay 14, 2021, 7:38 PM
50 points
9 comments4 min readLW link

[Event] Weekly Align­ment Re­search Coffee Time (05/​17)

adamShimiMay 15, 2021, 10:07 PM
7 points
0 comments1 min readLW link

Op­ti­miz­ers: To Define or not to Define

J BostockMay 16, 2021, 7:55 PM
4 points
0 comments4 min readLW link

In­ter­mit­tent Distil­la­tions #3

Mark XuMay 15, 2021, 7:13 AM
19 points
1 comment11 min readLW link

AXRP Epi­sode 7 - Side Effects with Vic­to­ria Krakovna

DanielFilanMay 14, 2021, 3:50 AM
34 points
6 comments43 min readLW link

Sav­ing Time

Scott GarrabrantMay 18, 2021, 8:11 PM
131 points
19 comments4 min readLW link

[Question] Are there any meth­ods for NNs or other ML sys­tems to get in­for­ma­tion from knock­out-like or as­say-like ex­per­i­ments?

J BostockMay 18, 2021, 9:33 PM
2 points
1 comment1 min readLW link

SGD’s Bias

johnswentworthMay 18, 2021, 11:19 PM
60 points
16 comments3 min readLW link

This Sun­day, 12PM PT: Scott Garrabrant on “Finite Fac­tored Sets”

RaemonMay 19, 2021, 1:48 AM
33 points
4 comments1 min readLW link

[AN #151]: How spar­sity in the fi­nal layer makes a neu­ral net debuggable

Rohin ShahMay 19, 2021, 5:20 PM
19 points
0 comments6 min readLW link
(mailchi.mp)

The Vari­a­tional Char­ac­ter­i­za­tion of KL-Diver­gence, Er­ror Catas­tro­phes, and Generalization

Past AccountMay 20, 2021, 8:57 PM
38 points
5 comments3 min readLW link

Or­a­cles, In­form­ers, and Controllers

ozziegooenMay 25, 2021, 2:16 PM
15 points
2 comments3 min readLW link

Knowl­edge is not just map/​ter­ri­tory resemblance

Alex FlintMay 25, 2021, 5:58 PM
28 points
4 comments3 min readLW link

MDP mod­els are de­ter­mined by the agent ar­chi­tec­ture and the en­vi­ron­men­tal dynamics

TurnTroutMay 26, 2021, 12:14 AM
23 points
34 comments3 min readLW link

[Question] List of good AI safety pro­ject ideas?

Aryeh EnglanderMay 26, 2021, 10:36 PM
24 points
8 comments1 min readLW link

AXRP Epi­sode 7.5 - Fore­cast­ing Trans­for­ma­tive AI from Biolog­i­cal An­chors with Ajeya Cotra

DanielFilanMay 28, 2021, 12:20 AM
24 points
1 comment67 min readLW link

Pre­dict re­sponses to the “ex­is­ten­tial risk from AI” survey

Rob BensingerMay 28, 2021, 1:32 AM
44 points
6 comments2 min readLW link

Teach­ing ML to an­swer ques­tions hon­estly in­stead of pre­dict­ing hu­man answers

paulfchristianoMay 28, 2021, 5:30 PM
53 points
18 comments16 min readLW link
(ai-alignment.com)

The blue-min­imis­ing robot and model splintering

Stuart_ArmstrongMay 28, 2021, 3:09 PM
13 points
4 comments3 min readLW link1 review

[Question] Use of GPT-3 for iden­ti­fy­ing Phish­ing and other email based at­tacks?

jmhMay 29, 2021, 5:11 PM
6 points
0 comments1 min readLW link

[Event] Weekly Align­ment Re­search Coffee Time

adamShimiMay 29, 2021, 1:26 PM
12 points
5 comments1 min readLW link

What is the most effec­tive way to donate to AGI XRisk miti­ga­tion?

JoshuaFoxMay 30, 2021, 11:08 AM
44 points
11 comments1 min readLW link

“Ex­is­ten­tial risk from AI” sur­vey results

Rob BensingerJun 1, 2021, 8:02 PM
56 points
8 comments11 min readLW link

April 2021 Gw­ern.net newsletter

gwernJun 3, 2021, 3:13 PM
20 points
0 comments1 min readLW link
(www.gwern.net)

The un­der­ly­ing model of a morphism

Stuart_ArmstrongJun 4, 2021, 10:29 PM
10 points
0 comments5 min readLW link

We need a stan­dard set of com­mu­nity ad­vice for how to fi­nan­cially pre­pare for AGI

GeneSmithJun 7, 2021, 7:24 AM
50 points
53 comments5 min readLW link

Some AI Gover­nance Re­search Ideas

Jun 7, 2021, 2:40 PM
29 points
2 comments2 min readLW link

Big pic­ture of pha­sic dopamine

Steven ByrnesJun 8, 2021, 1:07 PM
59 points
18 comments36 min readLW link

Bayeswatch 6: Mechwarrior

lsusrJun 7, 2021, 8:20 PM
47 points
8 comments2 min readLW link

Spec­u­la­tions against GPT-n writ­ing al­ign­ment papers

Donald HobsonJun 7, 2021, 9:13 PM
31 points
6 comments2 min readLW link

The re­verse Good­hart problem

Stuart_ArmstrongJun 8, 2021, 3:48 PM
16 points
22 comments1 min readLW link

Against intelligence

George3d6Jun 8, 2021, 1:03 PM
12 points
17 comments10 min readLW link
(cerebralab.com)

Danger­ous op­ti­mi­sa­tion in­cludes var­i­ance minimisation

Stuart_ArmstrongJun 8, 2021, 11:34 AM
32 points
5 comments2 min readLW link

Sur­vey on AI ex­is­ten­tial risk scenarios

Jun 8, 2021, 5:12 PM
60 points
11 comments7 min readLW link

AXRP Epi­sode 8 - As­sis­tance Games with Dy­lan Had­field-Menell

DanielFilanJun 8, 2021, 11:20 PM
22 points
1 comment71 min readLW link

“De­ci­sion Trans­former” (Tool AIs are se­cret Agent AIs)

gwernJun 9, 2021, 1:06 AM
37 points
4 comments1 min readLW link
(sites.google.com)

Evan Hub­inger on Ho­mo­gene­ity in Take­off Speeds, Learned Op­ti­miza­tion and Interpretability

Michaël TrazziJun 8, 2021, 7:20 PM
28 points
0 comments55 min readLW link

A naive al­ign­ment strat­egy and op­ti­mism about generalization

paulfchristianoJun 10, 2021, 12:10 AM
44 points
4 comments3 min readLW link
(ai-alignment.com)

Knowl­edge is not just mu­tual information

Alex FlintJun 10, 2021, 1:01 AM
27 points
6 comments4 min readLW link

The Ap­pren­tice Experiment

johnswentworthJun 10, 2021, 3:29 AM
148 points
11 comments4 min readLW link

[Question] ML is now au­tomat­ing parts of chip R&D. How big a deal is this?

Daniel KokotajloJun 10, 2021, 9:51 AM
45 points
17 comments1 min readLW link

Oh No My AI (Filk)

Gordon Seidoh WorleyJun 11, 2021, 3:05 PM
42 points
7 comments1 min readLW link

May 2021 Gw­ern.net newsletter

gwernJun 11, 2021, 2:13 PM
31 points
0 comments1 min readLW link
(www.gwern.net)

[Question] What other prob­lems would a suc­cess­ful AI safety al­gorithm solve?

DirectedEvolutionJun 13, 2021, 9:07 PM
12 points
4 comments1 min readLW link

Avoid­ing the in­stru­men­tal policy by hid­ing in­for­ma­tion about humans

paulfchristianoJun 13, 2021, 8:00 PM
31 points
2 comments2 min readLW link

An­swer­ing ques­tions hon­estly given world-model mismatches

paulfchristianoJun 13, 2021, 6:00 PM
34 points
2 comments16 min readLW link
(ai-alignment.com)

Vignettes Work­shop (AI Im­pacts)

Daniel KokotajloJun 15, 2021, 12:05 PM
47 points
3 comments1 min readLW link

Three Paths to Ex­is­ten­tial Risk from AI

harsimonyJun 16, 2021, 1:37 AM
1 point
2 comments1 min readLW link
(harsimony.wordpress.com)

[AN #152]: How we’ve over­es­ti­mated few-shot learn­ing capabilities

Rohin ShahJun 16, 2021, 5:20 PM
22 points
6 comments8 min readLW link
(mailchi.mp)

AI-Based Code Gen­er­a­tion Us­ing GPT-J-6B

Tomás B.Jun 16, 2021, 3:05 PM
21 points
15 comments1 min readLW link
(minimaxir.com)

In­suffi­cient Values

Jun 16, 2021, 2:33 PM
29 points
15 comments5 min readLW link

[Question] Pros and cons of work­ing on near-term tech­ni­cal AI safety and assurance

Aryeh EnglanderJun 17, 2021, 8:17 PM
11 points
1 comment2 min readLW link

Non-poi­sonous cake: an­thropic up­dates are normal

Stuart_ArmstrongJun 18, 2021, 2:51 PM
27 points
11 comments2 min readLW link

Knowl­edge is not just pre­cip­i­ta­tion of action

Alex FlintJun 18, 2021, 11:26 PM
21 points
6 comments7 min readLW link

I’m no longer sure that I buy dutch book ar­gu­ments and this makes me skep­ti­cal of the “util­ity func­tion” abstraction

Eli TyreJun 22, 2021, 3:53 AM
45 points
29 comments4 min readLW link

Fre­quent ar­gu­ments about alignment

John SchulmanJun 23, 2021, 12:46 AM
95 points
16 comments5 min readLW link

Em­piri­cal Ob­ser­va­tions of Ob­jec­tive Ro­bust­ness Failures

Jun 23, 2021, 11:23 PM
63 points
5 comments9 min readLW link

[AN #153]: Ex­per­i­ments that demon­strate failures of ob­jec­tive robustness

Rohin ShahJun 26, 2021, 5:10 PM
25 points
1 comment8 min readLW link
(mailchi.mp)

An­throp­ics and Embed­ded Agency

dadadarrenJun 26, 2021, 1:45 AM
7 points
2 comments2 min readLW link

Deep limi­ta­tions? Ex­am­in­ing ex­pert dis­agree­ment over deep learning

Richard_NgoJun 27, 2021, 12:55 AM
17 points
5 comments1 min readLW link
(link.springer.com)

Finite Fac­tored Sets: LW tran­script with run­ning commentary

Jun 27, 2021, 4:02 PM
30 points
0 comments51 min readLW link

Brute force search­ing for alignment

Donald HobsonJun 27, 2021, 9:54 PM
23 points
3 comments2 min readLW link

How teams went about their re­search at AI Safety Camp edi­tion 5

RemmeltJun 28, 2021, 3:15 PM
24 points
0 comments6 min readLW link

Search by abstraction

p.b.Jun 29, 2021, 8:56 PM
4 points
0 comments1 min readLW link

[Question] Is there a “co­her­ent de­ci­sions im­ply con­sis­tent util­ities”-style ar­gu­ment for non-lex­i­co­graphic prefer­ences?

TetraspaceJun 29, 2021, 7:14 PM
3 points
20 comments1 min readLW link

Try­ing to ap­prox­i­mate Statis­ti­cal Models as Scor­ing Tables

JsevillamolJun 29, 2021, 5:20 PM
18 points
2 comments9 min readLW link

Do in­co­her­ent en­tities have stronger rea­son to be­come more co­her­ent than less?

KatjaGraceJun 30, 2021, 5:50 AM
46 points
5 comments4 min readLW link
(worldspiritsockpuppet.com)

[AN #154]: What eco­nomic growth the­ory has to say about trans­for­ma­tive AI

Rohin ShahJun 30, 2021, 5:20 PM
12 points
0 comments9 min readLW link
(mailchi.mp)

Progress on Causal In­fluence Diagrams

tom4everittJun 30, 2021, 3:34 PM
71 points
6 comments9 min readLW link

Could Ad­vanced AI Drive Ex­plo­sive Eco­nomic Growth?

Matthew BarnettJun 30, 2021, 10:17 PM
15 points
4 comments2 min readLW link
(www.openphilanthropy.org)

Ex­per­i­men­tally eval­u­at­ing whether hon­esty generalizes

paulfchristianoJul 1, 2021, 5:47 PM
99 points
23 comments9 min readLW link

Should VS Would and New­comb’s Paradox

dadadarrenJul 3, 2021, 11:45 PM
5 points
36 comments2 min readLW link

Mauhn Re­leases AI Safety Documentation

Berg SeverensJul 3, 2021, 9:23 PM
4 points
0 comments1 min readLW link

An­thropic Effects in Es­ti­mat­ing Evolu­tion Difficulty

Mark XuJul 5, 2021, 4:02 AM
12 points
2 comments3 min readLW link

A sim­ple ex­am­ple of con­di­tional or­thog­o­nal­ity in finite fac­tored sets

DanielFilanJul 6, 2021, 12:36 AM
43 points
3 comments5 min readLW link
(danielfilan.com)

[Question] Is keep­ing AI “in the box” dur­ing train­ing enough?

tgbJul 6, 2021, 3:17 PM
7 points
10 comments1 min readLW link

A sec­ond ex­am­ple of con­di­tional or­thog­o­nal­ity in finite fac­tored sets

DanielFilanJul 7, 2021, 1:40 AM
46 points
0 comments2 min readLW link
(danielfilan.com)

Agency and the un­re­li­able au­tonomous car

Alex FlintJul 7, 2021, 2:58 PM
29 points
24 comments10 min readLW link

How much chess en­g­ine progress is about adapt­ing to big­ger com­put­ers?

paulfchristianoJul 7, 2021, 10:35 PM
114 points
23 comments6 min readLW link

BASALT: A Bench­mark for Learn­ing from Hu­man Feedback

Rohin ShahJul 8, 2021, 5:40 PM
56 points
20 comments2 min readLW link
(bair.berkeley.edu)

[AN #155]: A Minecraft bench­mark for al­gorithms that learn with­out re­ward functions

Rohin ShahJul 8, 2021, 5:20 PM
21 points
5 comments7 min readLW link
(mailchi.mp)

Look­ing for Col­lab­o­ra­tors for an AGI Re­search Project

Rafael CosmanJul 8, 2021, 5:01 PM
3 points
5 comments3 min readLW link

Jack­pot! An AI Vignette

Ben GoldhaberJul 8, 2021, 8:32 PM
13 points
0 comments2 min readLW link

In­ter­mit­tent Distil­la­tions #4: Semi­con­duc­tors, Eco­nomics, In­tel­li­gence, and Tech­nolog­i­cal Progress.

Mark XuJul 8, 2021, 10:14 PM
81 points
9 comments10 min readLW link

Finite Fac­tored Sets: Con­di­tional Orthogonality

Scott GarrabrantJul 9, 2021, 6:01 AM
27 points
2 comments7 min readLW link

The ac­cu­mu­la­tion of knowl­edge: liter­a­ture review

Alex FlintJul 10, 2021, 6:36 PM
29 points
3 comments7 min readLW link

The in­escapa­bil­ity of knowledge

Alex FlintJul 11, 2021, 10:59 PM
28 points
17 comments5 min readLW link

[Link] Musk’s non-miss­ing mood

jimrandomhJul 12, 2021, 10:09 PM
70 points
21 comments1 min readLW link
(lukemuehlhauser.com)

[Question] What will the twen­ties look like if AGI is 30 years away?

Daniel KokotajloJul 13, 2021, 8:14 AM
29 points
18 comments1 min readLW link

An­swer­ing ques­tions hon­estly in­stead of pre­dict­ing hu­man an­swers: lots of prob­lems and some solutions

evhubJul 13, 2021, 6:49 PM
53 points
25 comments31 min readLW link

Model-based RL, De­sires, Brains, Wireheading

Steven ByrnesJul 14, 2021, 3:11 PM
17 points
1 comment13 min readLW link

A closer look at chess scal­ings (into the past)

hippkeJul 15, 2021, 8:13 AM
49 points
14 comments4 min readLW link

AlphaFold 2 pa­per re­leased: “Highly ac­cu­rate pro­tein struc­ture pre­dic­tion with AlphaFold”, Jumper et al 2021

gwernJul 15, 2021, 7:27 PM
39 points
10 comments1 min readLW link
(www.nature.com)

Bench­mark­ing an old chess en­g­ine on new hardware

hippkeJul 16, 2021, 7:58 AM
71 points
3 comments5 min readLW link

[AN #156]: The scal­ing hy­poth­e­sis: a plan for build­ing AGI

Rohin ShahJul 16, 2021, 5:10 PM
44 points
20 comments8 min readLW link
(mailchi.mp)

Bayesi­anism ver­sus con­ser­vatism ver­sus Goodhart

Stuart_ArmstrongJul 16, 2021, 11:39 PM
15 points
1 comment6 min readLW link

(2009) Shane Legg—Fund­ing safe AGI

Tomás B.Jul 17, 2021, 4:46 PM
36 points
2 comments1 min readLW link
(www.vetta.org)

[Question] Equiv­a­lent of In­for­ma­tion The­ory but for Com­pu­ta­tion?

J BostockJul 17, 2021, 9:38 AM
5 points
27 comments1 min readLW link

A Models-cen­tric Ap­proach to Cor­rigible Alignment

J BostockJul 17, 2021, 5:27 PM
2 points
0 comments6 min readLW link

A model of de­ci­sion-mak­ing in the brain (the short ver­sion)

Steven ByrnesJul 18, 2021, 2:39 PM
20 points
0 comments3 min readLW link

[Question] Any tax­onomies of con­scious ex­pe­rience?

JohnDavidBustardJul 18, 2021, 6:28 PM
7 points
10 comments1 min readLW link

[Question] Work on Bayesian fit­ting of AI trends of perfor­mance?

JsevillamolJul 19, 2021, 6:45 PM
3 points
0 comments1 min readLW link

Some thoughts on David Rood­man’s GWP model and its re­la­tion to AI timelines

Tom DavidsonJul 19, 2021, 10:59 PM
30 points
1 comment8 min readLW link

In search of benev­olence (or: what should you get Clippy for Christ­mas?)

Joe CarlsmithJul 20, 2021, 1:12 AM
20 points
0 comments33 min readLW link

En­tropic bound­ary con­di­tions to­wards safe ar­tifi­cial superintelligence

Santiago Nunez-CorralesJul 20, 2021, 10:15 PM
3 points
0 comments2 min readLW link
(www.tandfonline.com)

Re­ward splin­ter­ing for AI design

Stuart_ArmstrongJul 21, 2021, 4:13 PM
30 points
1 comment8 min readLW link

Re-Define In­tent Align­ment?

abramdemskiJul 22, 2021, 7:00 PM
27 points
33 comments4 min readLW link

[AN #157]: Mea­sur­ing mis­al­ign­ment in the tech­nol­ogy un­der­ly­ing Copilot

Rohin ShahJul 23, 2021, 5:20 PM
28 points
18 comments7 min readLW link
(mailchi.mp)

Ex­am­ples of hu­man-level AI run­ning un­al­igned.

df fdJul 23, 2021, 8:49 AM
−3 points
0 comments2 min readLW link
(sortale.substack.com)

AXRP Epi­sode 10 - AI’s Fu­ture and Im­pacts with Katja Grace

DanielFilanJul 23, 2021, 10:10 PM
34 points
2 comments76 min readLW link

Wanted: Foom-scared al­ign­ment re­search partner

Icarus GallagherJul 26, 2021, 7:23 PM
40 points
5 comments1 min readLW link

Re­fac­tor­ing Align­ment (at­tempt #2)

abramdemskiJul 26, 2021, 8:12 PM
46 points
17 comments8 min readLW link

[Question] How much com­pute was used to train Deep­Mind’s gen­er­ally ca­pa­ble agents?

Daniel KokotajloJul 29, 2021, 11:34 AM
32 points
11 comments1 min readLW link

[Question] Did they or didn’t they learn tool use?

Daniel KokotajloJul 29, 2021, 1:26 PM
16 points
8 comments1 min readLW link

[AN #158]: Should we be op­ti­mistic about gen­er­al­iza­tion?

Rohin ShahJul 29, 2021, 5:20 PM
19 points
0 comments8 min readLW link
(mailchi.mp)

[Question] Very Un­nat­u­ral Tasks?

OrfeasJul 31, 2021, 9:22 PM
4 points
5 comments1 min readLW link

[Question] Is iter­ated am­plifi­ca­tion re­ally more pow­er­ful than imi­ta­tion?

ChantielAug 2, 2021, 11:20 PM
5 points
0 comments2 min readLW link

What does GPT-3 un­der­stand? Sym­bol ground­ing and Chi­nese rooms

Stuart_ArmstrongAug 3, 2021, 1:14 PM
40 points
15 comments12 min readLW link

Garrabrant and Shah on hu­man mod­el­ing in AGI

Rob BensingerAug 4, 2021, 4:35 AM
57 points
10 comments47 min readLW link

Value load­ing in the hu­man brain: a worked example

Steven ByrnesAug 4, 2021, 5:20 PM
45 points
2 comments8 min readLW link

[AN #159]: Build­ing agents that know how to ex­per­i­ment, by train­ing on pro­ce­du­rally gen­er­ated games

Rohin ShahAug 4, 2021, 5:10 PM
18 points
4 comments14 min readLW link
(mailchi.mp)

[Question] How many pa­ram­e­ters do self-driv­ing-car neu­ral nets have?

Daniel KokotajloAug 6, 2021, 11:24 AM
9 points
3 comments1 min readLW link

Rage Against The MOOChine

BoraskoAug 7, 2021, 5:57 PM
20 points
12 comments7 min readLW link

Ap­pli­ca­tions for De­con­fus­ing Goal-Directedness

adamShimiAug 8, 2021, 1:05 PM
36 points
3 comments5 min readLW link1 review

In­stru­men­tal Con­ver­gence: Power as Rademacher Complexity

Past AccountAug 12, 2021, 4:02 PM
6 points
0 comments3 min readLW link

A new defi­ni­tion of “op­ti­mizer”

ChantielAug 9, 2021, 1:42 PM
5 points
0 comments7 min readLW link

Goal-Direct­ed­ness and Be­hav­ior, Redux

adamShimiAug 9, 2021, 2:26 PM
14 points
4 comments2 min readLW link

Au­tomat­ing Au­dit­ing: An am­bi­tious con­crete tech­ni­cal re­search proposal

evhubAug 11, 2021, 8:32 PM
77 points
9 comments14 min readLW link1 review

Some crite­ria for sand­wich­ing projects

dmzAug 12, 2021, 3:40 AM
18 points
1 comment4 min readLW link

Power-seek­ing for suc­ces­sive choices

adamShimiAug 12, 2021, 8:37 PM
11 points
9 comments4 min readLW link

[AN #160]: Build­ing AIs that learn and think like people

Rohin ShahAug 13, 2021, 5:10 PM
28 points
6 comments10 min readLW link
(mailchi.mp)

[Question] How would the Scal­ing Hy­poth­e­sis change things?

Aryeh EnglanderAug 13, 2021, 3:42 PM
4 points
4 comments1 min readLW link

A re­view of “Agents and De­vices”

adamShimiAug 13, 2021, 8:42 AM
10 points
0 comments4 min readLW link

Ap­proaches to gra­di­ent hacking

adamShimiAug 14, 2021, 3:16 PM
16 points
8 comments8 min readLW link

[Question] What are some open ex­po­si­tion prob­lems in AI?

Sai Sasank YAug 16, 2021, 3:05 PM
4 points
2 comments1 min readLW link

Think­ing about AI re­la­tion­ally

TekhneMakreAug 16, 2021, 10:03 PM
5 points
0 comments2 min readLW link

Finite Fac­tored Sets: Polyno­mi­als and Probability

Scott GarrabrantAug 17, 2021, 9:53 PM
21 points
2 comments8 min readLW link

How Deep­Mind’s Gen­er­ally Ca­pable Agents Were Trained

1a3ornAug 20, 2021, 6:52 PM
87 points
6 comments19 min readLW link

[AN #161]: Creat­ing gen­er­al­iz­able re­ward func­tions for mul­ti­ple tasks by learn­ing a model of func­tional similarity

Rohin ShahAug 20, 2021, 5:20 PM
15 points
0 comments9 min readLW link
(mailchi.mp)

Im­pli­ca­tion of AI timelines on plan­ning and solutions

JJ HepburnAug 21, 2021, 5:12 AM
18 points
5 comments2 min readLW link

Au­tore­gres­sive Propaganda

lsusrAug 22, 2021, 2:18 AM
25 points
3 comments3 min readLW link

AI Risk for Epistemic Minimalists

Alex FlintAug 22, 2021, 3:39 PM
57 points
12 comments13 min readLW link1 review

The Codex Skep­tic FAQ

Michaël TrazziAug 24, 2021, 4:01 PM
49 points
24 comments2 min readLW link

How to turn money into AI safety?

Charlie SteinerAug 25, 2021, 10:49 AM
66 points
26 comments8 min readLW link

In­tro­duc­tion to Re­duc­ing Goodhart

Charlie SteinerAug 26, 2021, 6:38 PM
40 points
10 comments4 min readLW link

Could you have stopped Ch­er­nobyl?

Carlos RamirezAug 27, 2021, 1:48 AM
29 points
17 comments8 min readLW link

[AN #162]: Foun­da­tion mod­els: a paradigm shift within AI

Rohin ShahAug 27, 2021, 5:20 PM
21 points
0 comments8 min readLW link
(mailchi.mp)

A short in­tro­duc­tion to ma­chine learning

Richard_NgoAug 30, 2021, 2:31 PM
67 points
0 comments8 min readLW link

[Question] What could small scale dis­asters from AI look like?

CharlesDAug 31, 2021, 3:52 PM
14 points
8 comments1 min readLW link

NIST AI Risk Man­age­ment Frame­work re­quest for in­for­ma­tion (RFI)

Aryeh EnglanderSep 1, 2021, 12:15 AM
15 points
0 comments2 min readLW link

Re­ward splin­ter­ing as re­verse of interpretability

Stuart_ArmstrongAug 31, 2021, 10:27 PM
10 points
0 comments1 min readLW link

What are bi­ases, any­way? Mul­ti­ple type signatures

Stuart_ArmstrongAug 31, 2021, 9:16 PM
11 points
0 comments3 min readLW link

Finite Fac­tored Sets: Applications

Scott GarrabrantAug 31, 2021, 9:19 PM
27 points
1 comment10 min readLW link

Finite Fac­tored Sets: In­fer­ring Time

Scott GarrabrantAug 31, 2021, 9:18 PM
17 points
5 comments4 min readLW link

US Mili­tary Global In­for­ma­tion Dom­i­nance Experiments

NunoSempereSep 1, 2021, 1:34 PM
25 points
0 comments4 min readLW link
(www.defense.gov)

Com­pe­tent Preferences

Charlie SteinerSep 2, 2021, 2:26 PM
27 points
2 comments6 min readLW link

For­mal­iz­ing Ob­jec­tions against Sur­ro­gate Goals

VojtaKovarikSep 2, 2021, 4:24 PM
5 points
23 comments20 min readLW link

[Question] Is there a name for the the­ory that “There will be fast take­off in real-world ca­pa­bil­ities be­cause al­most ev­ery­thing is AGI-com­plete”?

David Scott Krueger (formerly: capybaralet)Sep 2, 2021, 11:00 PM
31 points
8 comments1 min readLW link

Thoughts on gra­di­ent hacking

Richard_NgoSep 3, 2021, 1:02 PM
33 points
12 comments4 min readLW link

Why the tech­nolog­i­cal sin­gu­lar­ity by AGI may never happen

hippkeSep 3, 2021, 2:19 PM
5 points
14 comments1 min readLW link

All Pos­si­ble Views About Hu­man­ity’s Fu­ture Are Wild

HoldenKarnofskySep 3, 2021, 8:19 PM
140 points
40 comments8 min readLW link1 review

The Most Im­por­tant Cen­tury: Se­quence Introduction

HoldenKarnofskySep 3, 2021, 8:19 PM
68 points
5 comments4 min readLW link1 review

[Question] Are there sub­stan­tial re­search efforts to­wards al­ign­ing nar­row AIs?

RossinSep 4, 2021, 6:40 PM
11 points
4 comments2 min readLW link

Multi-Agent In­verse Re­in­force­ment Learn­ing: Subop­ti­mal De­mon­stra­tions and Alter­na­tive Solu­tion Concepts

sage_bergersonSep 7, 2021, 4:11 PM
5 points
0 comments1 min readLW link

Bayeswatch 7: Wildfire

lsusrSep 8, 2021, 5:35 AM
47 points
6 comments3 min readLW link

[AN #163]: Us­ing finite fac­tored sets for causal and tem­po­ral inference

Rohin ShahSep 8, 2021, 5:20 PM
38 points
0 comments10 min readLW link
(mailchi.mp)

Gra­di­ent de­scent is not just more effi­cient ge­netic algorithms

leogaoSep 8, 2021, 4:23 PM
54 points
14 comments1 min readLW link

Sam Alt­man Q&A Notes—Aftermath

p.b.Sep 8, 2021, 8:20 AM
45 points
35 comments2 min readLW link

[Question] Does blockchain tech­nol­ogy offer po­ten­tial solu­tions to some AI al­ign­ment prob­lems?

pilordSep 9, 2021, 4:51 PM
−4 points
8 comments2 min readLW link

Countably Fac­tored Spaces

DiffractorSep 9, 2021, 4:24 AM
47 points
3 comments18 min readLW link

The al­ign­ment prob­lem in differ­ent ca­pa­bil­ity regimes

BuckSep 9, 2021, 7:46 PM
87 points
12 comments5 min readLW link

GPT-X, DALL-E, and our Mul­ti­modal Fu­ture [video se­ries]

bakztfutureSep 9, 2021, 11:05 PM
0 points
1 comment1 min readLW link
(youtube.com)

Bayeswatch 8: Antimatter

lsusrSep 10, 2021, 5:01 AM
29 points
6 comments3 min readLW link

Mea­sure­ment, Op­ti­miza­tion, and Take-off Speed

jsteinhardtSep 10, 2021, 7:30 PM
47 points
4 comments13 min readLW link

Bayeswatch 9: Zombies

lsusrSep 11, 2021, 5:57 AM
41 points
15 comments3 min readLW link

[Question] Is MIRI’s read­ing list up to date?

Aryeh EnglanderSep 11, 2021, 6:56 PM
25 points
5 comments1 min readLW link

Soldiers, Scouts, and Al­ba­trosses.

JanSep 12, 2021, 10:36 AM
5 points
0 comments1 min readLW link
(universalprior.substack.com)

GPT-Aug­mented Blogging

lsusrSep 14, 2021, 11:55 AM
52 points
18 comments13 min readLW link

[AN #164]: How well can lan­guage mod­els write code?

Rohin ShahSep 15, 2021, 5:20 PM
13 points
7 comments9 min readLW link
(mailchi.mp)

I wanted to in­ter­view Eliezer Yud­kowsky but he’s busy so I simu­lated him instead

lsusrSep 16, 2021, 7:34 AM
110 points
33 comments5 min readLW link

Eco­nomic AI Safety

jsteinhardtSep 16, 2021, 8:50 PM
35 points
3 comments5 min readLW link

Jit­ters No Ev­i­dence of Stu­pidity in RL

1a3ornSep 16, 2021, 10:43 PM
82 points
18 comments3 min readLW link

Im­mo­bile AI makes a move: anti-wire­head­ing, on­tol­ogy change, and model splintering

Stuart_ArmstrongSep 17, 2021, 3:24 PM
32 points
3 comments2 min readLW link

Great Power Conflict

Zach Stein-PerlmanSep 17, 2021, 3:00 PM
11 points
7 comments4 min readLW link

The the­ory-prac­tice gap

BuckSep 17, 2021, 10:51 PM
133 points
14 comments6 min readLW link

[Book Re­view] “The Align­ment Prob­lem” by Brian Christian

lsusrSep 20, 2021, 6:36 AM
70 points
16 comments6 min readLW link

AI, learn to be con­ser­va­tive, then learn to be less so: re­duc­ing side-effects, learn­ing pre­served fea­tures, and go­ing be­yond conservatism

Stuart_ArmstrongSep 20, 2021, 11:56 AM
14 points
4 comments3 min readLW link

Sig­moids be­hav­ing badly: arXiv paper

Stuart_ArmstrongSep 20, 2021, 10:29 AM
24 points
1 comment1 min readLW link

[Question] How much should you be will­ing to pay for an AGI?

Logan ZoellnerSep 20, 2021, 11:51 AM
11 points
5 comments1 min readLW link

An­nounc­ing the Vi­talik Bu­terin Fel­low­ships in AI Ex­is­ten­tial Safety!

DanielFilanSep 21, 2021, 12:33 AM
64 points
2 comments1 min readLW link
(grants.futureoflife.org)

Red­wood Re­search’s cur­rent project

BuckSep 21, 2021, 11:30 PM
143 points
29 comments15 min readLW link

[Question] What are good mod­els of col­lu­sion in AI?

EconomicModelSep 22, 2021, 3:16 PM
7 points
1 comment1 min readLW link

[AN #165]: When large mod­els are more likely to lie

Rohin ShahSep 22, 2021, 5:30 PM
23 points
0 comments8 min readLW link
(mailchi.mp)

Neu­ral net /​ de­ci­sion tree hy­brids: a po­ten­tial path to­ward bridg­ing the in­ter­pretabil­ity gap

Nathan Helm-BurgerSep 23, 2021, 12:38 AM
21 points
2 comments12 min readLW link

What is Com­pute? - Trans­for­ma­tive AI and Com­pute [1/​4]

lennartSep 23, 2021, 4:25 PM
24 points
8 comments19 min readLW link

Fore­cast­ing Trans­for­ma­tive AI, Part 1: What Kind of AI?

HoldenKarnofskySep 24, 2021, 12:46 AM
17 points
17 comments9 min readLW link

Path­ways: Google’s AGI

Lê Nguyên HoangSep 25, 2021, 7:02 AM
44 points
5 comments1 min readLW link

Cog­ni­tive Bi­ases in Large Lan­guage Models

JanSep 25, 2021, 8:59 PM
17 points
3 comments12 min readLW link
(universalprior.substack.com)

Trans­for­ma­tive AI and Com­pute [Sum­mary]

lennartSep 26, 2021, 11:41 AM
13 points
0 comments9 min readLW link

Beyond fire alarms: free­ing the groupstruck

KatjaGraceSep 26, 2021, 9:30 AM
81 points
15 comments54 min readLW link
(worldspiritsockpuppet.com)

[Question] Any write­ups on GPT agency?

OzyrusSep 26, 2021, 10:55 PM
4 points
6 comments1 min readLW link

AI take­off story: a con­tinu­a­tion of progress by other means

Edouard HarrisSep 27, 2021, 3:55 PM
75 points
13 comments10 min readLW link

A Con­fused Chemist’s Re­view of AlphaFold 2

J BostockSep 27, 2021, 11:10 AM
23 points
4 comments5 min readLW link

[Question] Col­lec­tion of ar­gu­ments to ex­pect (outer and in­ner) al­ign­ment failure?

Sam ClarkeSep 28, 2021, 4:55 PM
20 points
10 comments1 min readLW link

Brain-in­spired AGI and the “life­time an­chor”

Steven ByrnesSep 29, 2021, 1:09 PM
64 points
16 comments13 min readLW link

[Question] What Heuris­tics Do You Use to Think About Align­ment Topics?

Logan RiggsSep 29, 2021, 2:31 AM
5 points
3 comments1 min readLW link

Bayeswatch 10: Spyware

lsusrSep 29, 2021, 7:01 AM
97 points
7 comments4 min readLW link

Un­solved ML Safety Problems

jsteinhardtSep 29, 2021, 4:00 PM
58 points
2 comments3 min readLW link
(bounded-regret.ghost.io)

Some Ex­ist­ing Selec­tion Theorems

johnswentworthSep 30, 2021, 4:13 PM
48 points
2 comments4 min readLW link

Fore­cast­ing Com­pute—Trans­for­ma­tive AI and Com­pute [2/​4]

lennartOct 2, 2021, 3:54 PM
17 points
0 comments19 min readLW link

Nu­clear Es­pi­onage and AI Governance

GuiveOct 4, 2021, 11:04 PM
26 points
5 comments24 min readLW link

Model­ling and Un­der­stand­ing SGD

J BostockOct 5, 2021, 1:41 PM
8 points
0 comments3 min readLW link

Force neu­ral nets to use mod­els, then de­tect these

Stuart_ArmstrongOct 5, 2021, 11:31 AM
17 points
8 comments2 min readLW link

[Question] Is GPT-3 already sam­ple-effi­cient?

Daniel KokotajloOct 6, 2021, 1:38 PM
36 points
32 comments1 min readLW link

Prefer­ences from (real and hy­po­thet­i­cal) psy­chol­ogy papers

Stuart_ArmstrongOct 6, 2021, 9:06 AM
15 points
0 comments2 min readLW link

Au­to­mated Fact Check­ing: A Look at the Field

HoagyOct 6, 2021, 11:52 PM
12 points
0 comments8 min readLW link

Safety-ca­pa­bil­ities trade­off di­als are in­evitable in AGI

Steven ByrnesOct 7, 2021, 7:03 PM
57 points
4 comments3 min readLW link

Bayeswatch 11: Parabellum

lsusrOct 9, 2021, 7:08 AM
32 points
12 comments2 min readLW link

Steel­man ar­gu­ments against the idea that AGI is in­evitable and will ar­rive soon

RomanSOct 9, 2021, 6:22 AM
19 points
13 comments4 min readLW link

In­tel­li­gence or Evolu­tion?

Ramana KumarOct 9, 2021, 5:14 PM
50 points
15 comments3 min readLW link

Bayeswatch 12: The Sin­gu­lar­ity War

lsusrOct 10, 2021, 1:04 AM
32 points
6 comments2 min readLW link

The Ex­trap­o­la­tion Problem

lsusrOct 10, 2021, 5:11 AM
25 points
8 comments2 min readLW link

The eval­u­a­tion func­tion of an AI is not its aim

Yair HalberstadtOct 10, 2021, 2:52 PM
13 points
5 comments3 min readLW link

On Solv­ing Prob­lems Be­fore They Ap­pear: The Weird Episte­molo­gies of Alignment

adamShimiOct 11, 2021, 8:20 AM
97 points
11 comments15 min readLW link

Bayeswatch 13: Spaceship

lsusrOct 12, 2021, 9:35 PM
51 points
4 comments1 min readLW link

Com­pute Gover­nance and Con­clu­sions—Trans­for­ma­tive AI and Com­pute [3/​4]

lennartOct 14, 2021, 8:23 AM
13 points
0 comments5 min readLW link

Clas­si­cal sym­bol ground­ing and causal graphs

Stuart_ArmstrongOct 14, 2021, 6:04 PM
22 points
2 comments5 min readLW link

NLP Po­si­tion Paper: When Com­bat­ting Hype, Pro­ceed with Caution

Sam BowmanOct 15, 2021, 8:57 PM
46 points
15 comments1 min readLW link

[Question] Memetic haz­ards of AGI ar­chi­tec­ture posts

OzyrusOct 16, 2021, 4:10 PM
9 points
12 comments1 min readLW link

[Pre­dic­tion] We are in an Al­gorith­mic Over­hang, Part 2

lsusrOct 17, 2021, 7:48 AM
20 points
29 comments2 min readLW link

Epistemic Strate­gies of Selec­tion Theorems

adamShimiOct 18, 2021, 8:57 AM
32 points
1 comment12 min readLW link

On The Risks of Emer­gent Be­hav­ior in Foun­da­tion Models

jsteinhardtOct 18, 2021, 8:00 PM
30 points
0 comments3 min readLW link
(bounded-regret.ghost.io)

Beyond the hu­man train­ing dis­tri­bu­tion: would the AI CEO cre­ate al­most-ille­gal ted­dies?

Stuart_ArmstrongOct 18, 2021, 9:10 PM
36 points
2 comments3 min readLW link

[AN #167]: Con­crete ML safety prob­lems and their rele­vance to x-risk

Rohin ShahOct 20, 2021, 5:10 PM
19 points
4 comments9 min readLW link
(mailchi.mp)

Bor­ing ma­chine learn­ing is where it’s at

George3d6Oct 20, 2021, 11:23 AM
28 points
16 comments3 min readLW link
(cerebralab.com)

AGI Safety Fun­da­men­tals cur­ricu­lum and application

Richard_NgoOct 20, 2021, 9:44 PM
67 points
0 comments8 min readLW link
(docs.google.com)

Epistemic Strate­gies of Safety-Ca­pa­bil­ities Tradeoffs

adamShimiOct 22, 2021, 8:22 AM
5 points
0 comments6 min readLW link

Gen­eral al­ign­ment plus hu­man val­ues, or al­ign­ment via hu­man val­ues?

Stuart_ArmstrongOct 22, 2021, 10:11 AM
45 points
27 comments3 min readLW link

Naive self-su­per­vised ap­proaches to truth­ful AI

ryan_greenblattOct 23, 2021, 1:03 PM
9 points
4 comments2 min readLW link

My ML Scal­ing bibliography

gwernOct 23, 2021, 2:41 PM
35 points
9 comments1 min readLW link
(www.gwern.net)

Selfish­ness, prefer­ence falsifi­ca­tion, and AI alignment

jessicataOct 28, 2021, 12:16 AM
52 points
29 comments13 min readLW link
(unstableontology.com)

[AN #168]: Four tech­ni­cal top­ics for which Open Phil is so­lic­it­ing grant proposals

Rohin ShahOct 28, 2021, 5:20 PM
15 points
0 comments9 min readLW link
(mailchi.mp)

Fore­cast­ing progress in lan­guage models

Oct 28, 2021, 8:40 PM
54 points
5 comments11 min readLW link
(www.metaculus.com)

Re­quest for pro­pos­als for pro­jects in AI al­ign­ment that work with deep learn­ing systems

Oct 29, 2021, 7:26 AM
87 points
0 comments5 min readLW link

Interpretability

Oct 29, 2021, 7:28 AM
59 points
13 comments12 min readLW link

Truth­ful and hon­est AI

Oct 29, 2021, 7:28 AM
41 points
1 comment13 min readLW link

Mea­sur­ing and fore­cast­ing risks

Oct 29, 2021, 7:27 AM
20 points
0 comments12 min readLW link

Tech­niques for en­hanc­ing hu­man feedback

Oct 29, 2021, 7:27 AM
22 points
0 comments2 min readLW link

Stu­art Rus­sell and Me­lanie Mitchell on Munk Debates

Alex FlintOct 29, 2021, 7:13 PM
29 points
3 comments3 min readLW link

True Sto­ries of Al­gorith­mic Improvement

johnswentworthOct 29, 2021, 8:57 PM
91 points
7 comments5 min readLW link

Must true AI sleep?

YimbyGeorgeOct 30, 2021, 4:47 PM
0 points
1 comment1 min readLW link

Nate Soares on the Ul­ti­mate New­comb’s Problem

Rob BensingerOct 31, 2021, 7:42 PM
56 points
20 comments1 min readLW link

Models Model­ing Models

Charlie SteinerNov 2, 2021, 7:08 AM
20 points
5 comments10 min readLW link

[Question] What’s the differ­ence be­tween newer Atari-play­ing AI and the older Deep­mind one (from 2014)?

RaemonNov 2, 2021, 11:36 PM
27 points
8 comments1 min readLW link

Ap­ply to the ML for Align­ment Boot­camp (MLAB) in Berkeley [Jan 3 - Jan 22]

Nov 3, 2021, 6:22 PM
95 points
4 comments1 min readLW link

[Ex­ter­nal Event] 2022 IEEE In­ter­na­tional Con­fer­ence on As­sured Au­ton­omy (ICAA) - sub­mis­sion dead­line extended

Aryeh EnglanderNov 5, 2021, 3:29 PM
13 points
0 comments3 min readLW link

Y2K: Suc­cess­ful Prac­tice for AI Alignment

DarmaniNov 5, 2021, 6:09 AM
47 points
5 comments6 min readLW link

Some Re­marks on Reg­u­la­tor The­o­rems No One Asked For

Past AccountNov 5, 2021, 7:33 PM
19 points
1 comment4 min readLW link

How should we com­pare neu­ral net­work rep­re­sen­ta­tions?

jsteinhardtNov 5, 2021, 10:10 PM
24 points
0 comments3 min readLW link
(bounded-regret.ghost.io)

Drug ad­dicts and de­cep­tively al­igned agents—a com­par­a­tive analysis

JanNov 5, 2021, 9:42 PM
41 points
2 comments12 min readLW link
(universalprior.substack.com)

Com­ments on OpenPhil’s In­ter­pretabil­ity RFP

paulfchristianoNov 5, 2021, 10:36 PM
84 points
5 comments7 min readLW link

How do we be­come con­fi­dent in the safety of a ma­chine learn­ing sys­tem?

evhubNov 8, 2021, 10:49 PM
92 points
2 comments32 min readLW link

[Question] What ex­actly is GPT-3′s base ob­jec­tive?

Daniel KokotajloNov 10, 2021, 12:57 AM
60 points
15 comments2 min readLW link

Re­lax­ation-Based Search, From Every­day Life To Un­fa­mil­iar Territory

johnswentworthNov 10, 2021, 9:47 PM
57 points
3 comments8 min readLW link

Us­ing blin­ders to help you see things for what they are

Adam ZernerNov 11, 2021, 7:07 AM
13 points
2 comments2 min readLW link

AGI is at least as far away as Nu­clear Fu­sion.

Logan ZoellnerNov 11, 2021, 9:33 PM
0 points
8 comments1 min readLW link

Mea­sur­ing and Fore­cast­ing Risks from AI

jsteinhardtNov 12, 2021, 2:30 AM
24 points
0 comments3 min readLW link
(bounded-regret.ghost.io)

Why I’m ex­cited about Red­wood Re­search’s cur­rent project

paulfchristianoNov 12, 2021, 7:26 PM
112 points
6 comments7 min readLW link

A Defense of Func­tional De­ci­sion Theory

HeighnNov 12, 2021, 8:59 PM
21 points
120 comments10 min readLW link

Com­ments on Car­l­smith’s “Is power-seek­ing AI an ex­is­ten­tial risk?”

So8resNov 13, 2021, 4:29 AM
137 points
13 comments40 min readLW link

[Question] What’s the like­li­hood of only sub ex­po­nen­tial growth for AGI?

M. Y. ZuoNov 13, 2021, 10:46 PM
5 points
22 comments1 min readLW link

My cur­rent un­cer­tain­ties re­gard­ing AI, al­ign­ment, and the end of the world

dominicqNov 14, 2021, 2:08 PM
2 points
3 comments2 min readLW link

My un­der­stand­ing of the al­ign­ment problem

danieldeweyNov 15, 2021, 6:13 PM
43 points
3 comments3 min readLW link

“Sum­ma­riz­ing Books with Hu­man Feed­back” (re­cur­sive GPT-3)

gwernNov 15, 2021, 5:41 PM
24 points
4 comments1 min readLW link
(openai.com)

Quan­tilizer ≡ Op­ti­mizer with a Bounded Amount of Output

itaibn0Nov 16, 2021, 1:03 AM
10 points
4 comments2 min readLW link

Two Stupid AI Align­ment Ideas

aphyerNov 16, 2021, 4:13 PM
24 points
3 comments4 min readLW link

[Question] What are the mu­tual benefits of AGI-hu­man col­lab­o­ra­tion that would oth­er­wise be un­ob­tain­able?

M. Y. ZuoNov 17, 2021, 3:09 AM
1 point
4 comments1 min readLW link

Ap­pli­ca­tions for AI Safety Camp 2022 Now Open!

adamShimiNov 17, 2021, 9:42 PM
47 points
3 comments1 min readLW link

Ngo and Yud­kowsky on AI ca­pa­bil­ity gains

Nov 18, 2021, 10:19 PM
129 points
61 comments39 min readLW link

“Ac­qui­si­tion of Chess Knowl­edge in AlphaZero”: prob­ing AZ over time

jsdNov 18, 2021, 11:24 PM
11 points
9 comments1 min readLW link
(arxiv.org)

How To Get Into In­de­pen­dent Re­search On Align­ment/​Agency

johnswentworthNov 19, 2021, 12:00 AM
314 points
33 comments13 min readLW link

Good­hart: Endgame

Charlie SteinerNov 19, 2021, 1:26 AM
23 points
3 comments8 min readLW link

More de­tailed pro­posal for mea­sur­ing al­ign­ment of cur­rent models

Beth BarnesNov 20, 2021, 12:03 AM
31 points
0 comments8 min readLW link

From lan­guage to ethics by au­to­mated reasoning

Michele CampoloNov 21, 2021, 3:16 PM
4 points
4 comments6 min readLW link

Mo­rally un­der­defined situ­a­tions can be deadly

Stuart_ArmstrongNov 22, 2021, 2:48 PM
17 points
8 comments2 min readLW link

Yud­kowsky and Chris­ti­ano dis­cuss “Take­off Speeds”

Eliezer YudkowskyNov 22, 2021, 7:35 PM
191 points
181 comments60 min readLW link1 review

Po­ten­tial Align­ment men­tal tool: Keep­ing track of the types

Donald HobsonNov 22, 2021, 8:05 PM
28 points
1 comment2 min readLW link

For­mal­iz­ing Policy-Mod­ifi­ca­tion Corrigibility

TurnTroutDec 3, 2021, 1:31 AM
23 points
6 comments6 min readLW link

[AN #169]: Col­lab­o­rat­ing with hu­mans with­out hu­man data

Rohin ShahNov 24, 2021, 6:30 PM
33 points
0 comments8 min readLW link
(mailchi.mp)

Chris­ti­ano, Co­tra, and Yud­kowsky on AI progress

Nov 25, 2021, 4:45 PM
117 points
95 comments68 min readLW link

Lat­a­cora might be of in­ter­est to some AI Safety organizations

NunoSempereNov 25, 2021, 11:57 PM
14 points
10 comments1 min readLW link
(www.latacora.com)

Solve Cor­rigi­bil­ity Week

Logan RiggsNov 28, 2021, 5:00 PM
39 points
21 comments1 min readLW link

TTS au­dio of “Ngo and Yud­kowsky on al­ign­ment difficulty”

Quintin PopeNov 28, 2021, 6:11 PM
4 points
3 comments1 min readLW link

Red­wood Re­search is hiring for sev­eral roles

Nov 29, 2021, 12:16 AM
44 points
0 comments1 min readLW link

Com­pute Re­search Ques­tions and Met­rics—Trans­for­ma­tive AI and Com­pute [4/​4]

lennartNov 28, 2021, 10:49 PM
6 points
0 comments16 min readLW link

Com­ments on Allan Dafoe on AI Governance

Alex FlintNov 29, 2021, 4:16 PM
13 points
0 comments7 min readLW link

Soares, Tal­linn, and Yud­kowsky dis­cuss AGI cognition

Nov 29, 2021, 7:26 PM
118 points
35 comments40 min readLW link

Self-study­ing to de­velop an in­side-view model of AI al­ign­ment; co-studiers wel­come!

Vael GatesNov 30, 2021, 9:25 AM
13 points
0 comments4 min readLW link

Ma­chine Agents, Hy­brid Su­per­in­tel­li­gences, and The Loss of Hu­man Con­trol (Chap­ter 1)

Justin BullockNov 30, 2021, 5:35 PM
4 points
0 comments8 min readLW link

AXRP Epi­sode 12 - AI Ex­is­ten­tial Risk with Paul Christiano

DanielFilanDec 2, 2021, 2:20 AM
36 points
0 comments125 min readLW link

Mo­ral­ity is Scary

Wei DaiDec 2, 2021, 6:35 AM
175 points
125 comments4 min readLW link

Syd­ney AI Safety Fellowship

Chris_LeongDec 2, 2021, 7:34 AM
22 points
0 comments2 min readLW link

$100/​$50 re­wards for good references

Stuart_ArmstrongDec 3, 2021, 4:55 PM
20 points
5 comments1 min readLW link

[Question] Does the Struc­ture of an al­gorithm mat­ter for AI Risk and/​or con­scious­ness?

Logan ZoellnerDec 3, 2021, 6:31 PM
7 points
5 comments1 min readLW link

[Linkpost] A Gen­eral Lan­guage As­sis­tant as a Lab­o­ra­tory for Alignment

Quintin PopeDec 3, 2021, 7:42 PM
37 points
2 comments2 min readLW link

Agency: What it is and why it matters

Daniel KokotajloDec 4, 2021, 9:32 PM
25 points
2 comments2 min readLW link

[Question] Are limited-hori­zon agents a good heuris­tic for the off-switch prob­lem?

Yonadav ShavitDec 5, 2021, 7:27 PM
5 points
19 comments1 min readLW link

In­tro­duc­tion to in­ac­cessible information

Ryan KiddDec 9, 2021, 1:28 AM
27 points
6 comments8 min readLW link

More Chris­ti­ano, Co­tra, and Yud­kowsky on AI progress

Dec 6, 2021, 8:33 PM
85 points
30 comments40 min readLW link

Ex­ter­mi­nat­ing hu­mans might be on the to-do list of a Friendly AI

RomanSDec 7, 2021, 2:15 PM
5 points
8 comments2 min readLW link

In­ter­views on Im­prov­ing the AI Safety Pipeline

Chris_LeongDec 7, 2021, 12:03 PM
55 points
16 comments17 min readLW link

Let’s buy out Cyc, for use in AGI in­ter­pretabil­ity sys­tems?

Steven ByrnesDec 7, 2021, 8:46 PM
47 points
10 comments2 min readLW link

[AN #170]: An­a­lyz­ing the ar­gu­ment for risk from power-seek­ing AI

Rohin ShahDec 8, 2021, 6:10 PM
21 points
1 comment7 min readLW link
(mailchi.mp)

[MLSN #2]: Ad­ver­sar­ial Training

Dan_HDec 9, 2021, 5:16 PM
26 points
0 comments3 min readLW link

Su­per­vised learn­ing and self-mod­el­ing: What’s “su­per­hu­man?”

Charlie SteinerDec 9, 2021, 12:44 PM
12 points
1 comment8 min readLW link

Some ab­stract, non-tech­ni­cal rea­sons to be non-max­i­mally-pes­simistic about AI alignment

Rob BensingerDec 12, 2021, 2:08 AM
66 points
37 comments7 min readLW link

Trans­form­ing my­opic op­ti­miza­tion to or­di­nary op­ti­miza­tion—Do we want to seek con­ver­gence for my­opic op­ti­miza­tion prob­lems?

tailcalledDec 11, 2021, 8:38 PM
12 points
1 comment5 min readLW link

Red­wood’s Tech­nique-Fo­cused Epistemic Strategy

adamShimiDec 12, 2021, 4:36 PM
48 points
1 comment7 min readLW link

[Question] [Re­solved] Who else prefers “AI al­ign­ment” to “AI safety?”

Evan_GaensbauerDec 13, 2021, 12:35 AM
5 points
8 comments1 min readLW link

Hard-Cod­ing Neu­ral Computation

MadHatterDec 13, 2021, 4:35 AM
32 points
8 comments27 min readLW link

Solv­ing In­ter­pretabil­ity Week

Logan RiggsDec 13, 2021, 5:09 PM
11 points
5 comments1 min readLW link

Un­der­stand­ing and con­trol­ling auto-in­duced dis­tri­bu­tional shift

L Rudolf LDec 13, 2021, 2:59 PM
26 points
3 comments16 min readLW link

Lan­guage Model Align­ment Re­search Internships

Ethan PerezDec 13, 2021, 7:53 PM
68 points
1 comment1 min readLW link

En­abling More Feed­back for AI Safety Researchers

frances_lorenzDec 13, 2021, 8:10 PM
17 points
0 comments3 min readLW link

ARC’s first tech­ni­cal re­port: Elic­it­ing La­tent Knowledge

Dec 14, 2021, 8:09 PM
212 points
88 comments1 min readLW link
(docs.google.com)

In­ter­lude: Agents as Automobiles

Daniel KokotajloDec 14, 2021, 6:49 PM
25 points
6 comments5 min readLW link

ARC is hiring!

Dec 14, 2021, 8:09 PM
62 points
2 comments1 min readLW link

Ngo’s view on al­ign­ment difficulty

Dec 14, 2021, 9:34 PM
63 points
7 comments17 min readLW link

The Nat­u­ral Ab­strac­tion Hy­poth­e­sis: Im­pli­ca­tions and Evidence

CallumMcDougallDec 14, 2021, 11:14 PM
30 points
8 comments19 min readLW link

Elic­i­ta­tion for Model­ing Trans­for­ma­tive AI Risks

DavidmanheimDec 16, 2021, 3:24 PM
30 points
2 comments9 min readLW link

Some mo­ti­va­tions to gra­di­ent hack

peterbarnettDec 17, 2021, 3:06 AM
8 points
0 comments6 min readLW link

In­tro­duc­ing the Prin­ci­ples of In­tel­li­gent Be­havi­our in Biolog­i­cal and So­cial Sys­tems (PIBBSS) Fellowship

adamShimiDec 18, 2021, 3:23 PM
51 points
4 comments10 min readLW link

[Question] Im­por­tant ML sys­tems from be­fore 2012?

JsevillamolDec 18, 2021, 12:12 PM
12 points
5 comments1 min readLW link

[Ex­tended Dead­line: Jan 23rd] An­nounc­ing the PIBBSS Sum­mer Re­search Fellowship

Nora_AmmannDec 18, 2021, 4:56 PM
6 points
1 comment1 min readLW link

Ex­plor­ing De­ci­sion The­o­ries With Coun­ter­fac­tu­als and Dy­namic Agent Self-Pointers

JoshuaOSHickmanDec 18, 2021, 9:50 PM
2 points
0 comments4 min readLW link

Don’t In­fluence the In­fluencers!

lhcDec 19, 2021, 9:02 AM
14 points
2 comments10 min readLW link

SGD Un­der­stood through Prob­a­bil­ity Current

J BostockDec 19, 2021, 11:26 PM
23 points
1 comment5 min readLW link

Worst-case think­ing in AI alignment

BuckDec 23, 2021, 1:29 AM
139 points
15 comments6 min readLW link

2021 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

LarksDec 23, 2021, 2:06 PM
164 points
26 comments73 min readLW link

Re­ply to Eliezer on Biolog­i­cal Anchors

HoldenKarnofskyDec 23, 2021, 4:15 PM
146 points
46 comments15 min readLW link

Risks from AI persuasion

Beth BarnesDec 24, 2021, 1:48 AM
68 points
15 comments31 min readLW link

Un­der­stand­ing the ten­sor product for­mu­la­tion in Trans­former Circuits

Tom LieberumDec 24, 2021, 6:05 PM
16 points
2 comments3 min readLW link

Mechanis­tic In­ter­pretabil­ity for the MLP Lay­ers (rough early thoughts)

MadHatterDec 24, 2021, 7:24 AM
11 points
2 comments1 min readLW link
(www.youtube.com)

My Overview of the AI Align­ment Land­scape: Threat Models

Neel NandaDec 25, 2021, 11:07 PM
50 points
4 comments28 min readLW link

Re­in­force­ment Learn­ing Study Group

Kay KozaronekDec 26, 2021, 11:11 PM
20 points
9 comments1 min readLW link

AI Fire Alarm Scenarios

PeterMcCluskeyDec 28, 2021, 2:20 AM
10 points
0 comments6 min readLW link
(www.bayesianinvestor.com)

Re­v­erse-en­g­ineer­ing us­ing interpretability

Beth BarnesDec 29, 2021, 11:21 PM
21 points
1 comment5 min readLW link

Coun­terex­am­ples to some ELK proposals

paulfchristianoDec 31, 2021, 5:05 PM
50 points
10 comments7 min readLW link

We Choose To Align AI

johnswentworthJan 1, 2022, 8:06 PM
259 points
15 comments3 min readLW link

Why don’t we just, like, try and build safe AGI?

SunJan 1, 2022, 11:24 PM
0 points
4 comments1 min readLW link

[Question] Tag for AI al­ign­ment?

Alex_AltairJan 2, 2022, 6:55 PM
7 points
6 comments1 min readLW link

How an alien the­ory of mind might be unlearnable

Stuart_ArmstrongJan 3, 2022, 11:16 AM
26 points
35 comments5 min readLW link

Shad­ows Of The Com­ing Race (1879)

CapybasiliskJan 3, 2022, 3:55 PM
49 points
4 comments7 min readLW link

Ap­ply for re­search in­tern­ships at ARC!

paulfchristianoJan 3, 2022, 8:26 PM
61 points
0 comments1 min readLW link

Promis­ing posts on AF that have fallen through the cracks

Evan R. MurphyJan 4, 2022, 3:39 PM
33 points
6 comments2 min readLW link

You can’t un­der­stand hu­man agency with­out un­der­stand­ing amoeba agency

ShmiJan 6, 2022, 4:42 AM
19 points
36 comments1 min readLW link

Satisf-AI: A Route to Re­duc­ing Risks From AI

harsimonyJan 6, 2022, 2:34 AM
4 points
1 comment4 min readLW link
(harsimony.wordpress.com)

Im­por­tance of fore­sight eval­u­a­tions within ELK

Jonathan UesatoJan 6, 2022, 3:34 PM
25 points
1 comment10 min readLW link

Goal-di­rect­ed­ness: my baseline beliefs

Morgan_RogersJan 8, 2022, 1:09 PM
21 points
3 comments3 min readLW link

The Un­rea­son­able Fea­si­bil­ity Of Play­ing Chess Un­der The Influence

JanJan 12, 2022, 11:09 PM
29 points
17 comments13 min readLW link
(universalprior.substack.com)

New year, new re­search agenda post

Charlie SteinerJan 12, 2022, 5:58 PM
29 points
4 comments16 min readLW link

Value ex­trap­o­la­tion par­tially re­solves sym­bol grounding

Stuart_ArmstrongJan 12, 2022, 4:30 PM
24 points
10 comments1 min readLW link

2020 Re­view Article

VaniverJan 14, 2022, 4:58 AM
74 points
3 comments7 min readLW link

The Greedy Doc­tor Prob­lem… turns out to be rele­vant to the ELK prob­lem?

JanJan 14, 2022, 11:58 AM
33 points
10 comments14 min readLW link
(universalprior.substack.com)

PIBBSS Fel­low­ship: Bounty for Refer­rals & Dead­line Extension

Anna GajdovaJan 17, 2022, 4:23 PM
7 points
0 comments1 min readLW link

Differ­ent way clas­sifiers can be diverse

Stuart_ArmstrongJan 17, 2022, 4:30 PM
10 points
5 comments2 min readLW link

Scalar re­ward is not enough for al­igned AGI

Peter VamplewJan 17, 2022, 9:02 PM
15 points
3 comments11 min readLW link

Challenges with Break­ing into MIRI-Style Research

Chris_LeongJan 17, 2022, 9:23 AM
72 points
15 comments3 min readLW link

Thought Ex­per­i­ments Provide a Third Anchor

jsteinhardtJan 18, 2022, 4:00 PM
44 points
20 comments4 min readLW link
(bounded-regret.ghost.io)

An­chor Weights for ML

jsteinhardtJan 20, 2022, 4:20 PM
17 points
2 comments2 min readLW link
(bounded-regret.ghost.io)

Es­ti­mat­ing train­ing com­pute of Deep Learn­ing models

Jan 20, 2022, 4:12 PM
37 points
4 comments1 min readLW link

Shar­ing Pow­er­ful AI Models

apcJan 21, 2022, 11:57 AM
6 points
4 comments1 min readLW link

[AN #171]: Disagree­ments be­tween al­ign­ment “op­ti­mists” and “pes­simists”

Rohin ShahJan 21, 2022, 6:30 PM
32 points
1 comment7 min readLW link
(mailchi.mp)

A one-ques­tion Tur­ing test for GPT-3

Jan 22, 2022, 6:17 PM
84 points
23 comments5 min readLW link

ML Sys­tems Will Have Weird Failure Modes

jsteinhardtJan 26, 2022, 1:40 AM
54 points
8 comments6 min readLW link
(bounded-regret.ghost.io)

Search Is All You Need

blake8086Jan 25, 2022, 11:13 PM
33 points
13 comments3 min readLW link

Aligned AI Needs Slack

ShmiJan 26, 2022, 9:29 AM
23 points
10 comments1 min readLW link

Em­piri­cal Find­ings Gen­er­al­ize Sur­pris­ingly Far

jsteinhardtFeb 1, 2022, 10:30 PM
46 points
0 comments6 min readLW link
(bounded-regret.ghost.io)

OpenAI Solves (Some) For­mal Math Olympiad Problems

Michaël TrazziFeb 2, 2022, 9:49 PM
77 points
26 comments2 min readLW link

Ob­served pat­terns around ma­jor tech­nolog­i­cal advancements

Richard Korzekwa Feb 3, 2022, 12:30 AM
45 points
15 comments11 min readLW link
(aiimpacts.org)

Paper­clip­pers, s-risks, hope

superads91Feb 4, 2022, 7:03 PM
13 points
17 comments1 min readLW link

AI Wri­teup Part 1

SNlFeb 4, 2022, 9:16 PM
8 points
1 comment18 min readLW link

Align­ment ver­sus AI Alignment

Alex FlintFeb 4, 2022, 10:59 PM
87 points
15 comments22 min readLW link

Ca­pa­bil­ity Phase Tran­si­tion Examples

gwernFeb 8, 2022, 3:32 AM
39 points
1 comment1 min readLW link
(www.reddit.com)

A broad basin of at­trac­tion around hu­man val­ues?

Wei DaiApr 12, 2022, 5:15 AM
105 points
16 comments2 min readLW link

Ap­pendix: More Is Differ­ent In Other Domains

jsteinhardtFeb 8, 2022, 4:00 PM
12 points
1 comment4 min readLW link
(bounded-regret.ghost.io)

[In­tro to brain-like-AGI safety] 2. “Learn­ing from scratch” in the brain

Steven ByrnesFeb 2, 2022, 1:22 PM
43 points
12 comments25 min readLW link

Bet­ter im­pos­si­bil­ity re­sult for un­bounded utilities

paulfchristianoFeb 9, 2022, 6:10 AM
29 points
24 comments5 min readLW link

EleutherAI’s GPT-NeoX-20B release

leogaoFeb 10, 2022, 6:56 AM
30 points
3 comments1 min readLW link
(eaidata.bmk.sh)

In­fer­ring util­ity func­tions from lo­cally non-tran­si­tive preferences

JanFeb 10, 2022, 10:33 AM
28 points
15 comments8 min readLW link
(universalprior.substack.com)

A sum­mary of al­ign­ing nar­rowly su­per­hu­man models

guguFeb 10, 2022, 6:26 PM
8 points
0 comments8 min readLW link

Idea: build al­ign­ment dataset for very ca­pa­ble models

Quintin PopeFeb 12, 2022, 7:30 PM
9 points
2 comments3 min readLW link

Goal-di­rect­ed­ness: ex­plor­ing explanations

Morgan_RogersFeb 14, 2022, 4:20 PM
13 points
3 comments18 min readLW link

Is ELK enough? Di­a­mond, Ma­trix and Child AI

adamShimiFeb 15, 2022, 2:29 AM
17 points
10 comments4 min readLW link

What Does The Nat­u­ral Ab­strac­tion Frame­work Say About ELK?

johnswentworthFeb 15, 2022, 2:27 AM
34 points
0 comments6 min readLW link

Some Hacky ELK Ideas

johnswentworthFeb 15, 2022, 2:27 AM
34 points
8 comments5 min readLW link

How harm­ful are im­prove­ments in AI? + Poll

Feb 15, 2022, 6:16 PM
15 points
4 comments8 min readLW link

Be­com­ing Stronger as Episte­mol­o­gist: Introduction

adamShimiFeb 15, 2022, 6:15 AM
29 points
2 comments4 min readLW link

REPL’s: a type sig­na­ture for agents

scottviteriFeb 15, 2022, 10:57 PM
23 points
5 comments2 min readLW link

REPL’s and ELK

scottviteriFeb 17, 2022, 1:14 AM
9 points
4 comments1 min readLW link

[Link] Eric Sch­midt’s new AI2050 Fund

Aryeh EnglanderFeb 16, 2022, 9:21 PM
32 points
3 comments2 min readLW link

Align­ment re­searchers, how use­ful is ex­tra com­pute for you?

Lauro LangoscoFeb 19, 2022, 3:35 PM
7 points
4 comments1 min readLW link

[Question] 2 (naive?) ideas for alignment

Jonathan MoregårdFeb 20, 2022, 7:01 PM
3 points
1 comment1 min readLW link

The Big Pic­ture Of Align­ment (Talk Part 1)

johnswentworthFeb 21, 2022, 5:49 AM
98 points
35 comments1 min readLW link
(www.youtube.com)

[Question] Fa­vorite /​ most ob­scure re­search on un­der­stand­ing DNNs?

Vivek HebbarFeb 21, 2022, 5:49 AM
16 points
1 comment1 min readLW link

Two Challenges for ELK

derek shillerFeb 21, 2022, 5:49 AM
7 points
0 comments4 min readLW link

[Question] Do any AI al­ign­ment orgs hire re­motely?

RobertMFeb 21, 2022, 10:33 PM
24 points
9 comments2 min readLW link

More GPT-3 and sym­bol grounding

Stuart_ArmstrongFeb 23, 2022, 6:30 PM
21 points
7 comments3 min readLW link

Trans­former in­duc­tive bi­ases & RASP

Vivek HebbarFeb 24, 2022, 12:42 AM
15 points
4 comments1 min readLW link
(proceedings.mlr.press)

A com­ment on Ajeya Co­tra’s draft re­port on AI timelines

Matthew BarnettFeb 24, 2022, 12:41 AM
69 points
13 comments7 min readLW link

The Big Pic­ture Of Align­ment (Talk Part 2)

johnswentworthFeb 25, 2022, 2:53 AM
33 points
12 comments1 min readLW link
(www.youtube.com)

Trust-max­i­miz­ing AGI

Feb 25, 2022, 3:13 PM
7 points
26 comments9 min readLW link
(universalprior.substack.com)

IMO challenge bet with Eliezer

paulfchristianoFeb 26, 2022, 4:50 AM
162 points
25 comments3 min readLW link

New Speaker Series on AI Align­ment Start­ing March 3

Zechen ZhangFeb 26, 2022, 7:31 PM
7 points
1 comment1 min readLW link

How I Formed My Own Views About AI Safety

Neel NandaFeb 27, 2022, 6:50 PM
64 points
6 comments13 min readLW link
(www.neelnanda.io)

Shah and Yud­kowsky on al­ign­ment failures

Feb 28, 2022, 7:18 PM
83 points
38 comments91 min readLW link

ELK Thought Dump

abramdemskiFeb 28, 2022, 6:46 PM
58 points
18 comments17 min readLW link

Late 2021 MIRI Con­ver­sa­tions: AMA /​ Discussion

Rob BensingerFeb 28, 2022, 8:03 PM
119 points
208 comments1 min readLW link

[Question] What are the causal­ity effects of an agent's pres­ence in a re­in­force­ment learn­ing environment

Jonas KgomoMar 1, 2022, 9:57 PM
0 points
2 comments1 min readLW link

Mus­ings on the Speed Prior

evhubMar 2, 2022, 4:04 AM
19 points
4 comments10 min readLW link

AI Perfor­mance on Hu­man Tasks

Asher EllisMar 3, 2022, 8:13 PM
58 points
3 comments21 min readLW link

In­tro­duc­ing my­self: Henry Lie­ber­man, MIT CSAIL, why­cantwe.org

Henry A LiebermanMar 3, 2022, 11:42 PM
−2 points
9 comments1 min readLW link

Pre­serv­ing and con­tin­u­ing al­ign­ment re­search through a se­vere global catastrophe

A_donorMar 6, 2022, 6:43 PM
36 points
11 comments5 min readLW link

Why work at AI Im­pacts?

KatjaMar 6, 2022, 10:10 PM
50 points
7 comments13 min readLW link
(aiimpacts.org)

Per­sonal imi­ta­tion software

FlaglandbaseMar 7, 2022, 7:55 AM
6 points
6 comments1 min readLW link

[MLSN #3]: NeurIPS Safety Paper Roundup

Dan HMar 8, 2022, 3:17 PM
45 points
0 comments4 min readLW link

ELK prize results

Mar 9, 2022, 12:01 AM
130 points
50 comments21 min readLW link

[Question] Non-co­er­cive mo­ti­va­tion for al­ign­ment re­search?

Jonathan MoregårdMar 8, 2022, 8:50 PM
1 point
0 comments1 min readLW link

On pre­sent­ing the case for AI risk

Aryeh EnglanderMar 9, 2022, 1:41 AM
54 points
18 comments4 min readLW link

Ask AI com­pa­nies about what they are do­ing for AI safety?

micMar 9, 2022, 3:14 PM
50 points
0 comments2 min readLW link

Deriv­ing Our World From Small Datasets

CapybasiliskMar 9, 2022, 12:34 AM
5 points
4 comments2 min readLW link

Value ex­trap­o­la­tion, con­cept ex­trap­o­la­tion, model splintering

Stuart_ArmstrongMar 8, 2022, 10:50 PM
14 points
1 comment2 min readLW link

The Proof of Doom

johnlawrenceaspdenMar 9, 2022, 7:37 PM
27 points
18 comments3 min readLW link

A Rephras­ing Of and Foot­note To An Embed­ded Agency Proposal

JoshuaOSHickmanMar 9, 2022, 6:13 PM
5 points
0 comments5 min readLW link

ELK Sub—Note-tak­ing in in­ter­nal rollouts

HoagyMar 9, 2022, 5:23 PM
6 points
0 comments5 min readLW link

[Question] Are there any im­pos­si­bil­ity the­o­rems for strong and safe AI?

David JohnstonMar 11, 2022, 1:41 AM
5 points
3 comments1 min readLW link

Com­pute Trends — Com­par­i­son to OpenAI’s AI and Compute

Mar 12, 2022, 6:09 PM
23 points
3 comments3 min readLW link

ELK con­test sub­mis­sion: route un­der­stand­ing through the hu­man ontology

Mar 14, 2022, 9:42 PM
21 points
2 comments2 min readLW link

Dual use of ar­tifi­cial-in­tel­li­gence-pow­ered drug discovery

VaniverMar 15, 2022, 2:52 AM
91 points
15 comments1 min readLW link
(www.nature.com)

[In­tro to brain-like-AGI safety] 8. Take­aways from neuro 1/​2: On AGI development

Steven ByrnesMar 16, 2022, 1:59 PM
41 points
2 comments15 min readLW link

Some (po­ten­tially) fund­able AI Safety Ideas

Logan RiggsMar 16, 2022, 12:48 PM
21 points
5 comments5 min readLW link

What do paradigm shifts look like?

leogaoMar 16, 2022, 7:17 PM
15 points
2 comments1 min readLW link

[Question] What is the equiv­a­lent of the “do” op­er­a­tor for finite fac­tored sets?

Chris van MerwijkMar 17, 2022, 8:05 AM
8 points
2 comments1 min readLW link

[Question] What to do af­ter in­vent­ing AGI?

elephantcrewMar 18, 2022, 10:30 PM
9 points
4 comments1 min readLW link

Goal-di­rect­ed­ness: im­perfect rea­son­ing, limited knowl­edge and in­ac­cu­rate beliefs

Morgan_RogersMar 19, 2022, 5:28 PM
4 points
1 comment21 min readLW link

Wargam­ing AGI Development

ryan_bMar 19, 2022, 5:59 PM
36 points
13 comments5 min readLW link

Ex­plor­ing Finite Fac­tored Sets with some toy examples

Thomas KehrenbergMar 19, 2022, 10:08 PM
36 points
1 comment9 min readLW link
(tm.kehrenberg.net)

Nat­u­ral Value Learning

Chris van MerwijkMar 20, 2022, 12:44 PM
7 points
10 comments4 min readLW link

Why will an AGI be ra­tio­nal?

azsantoskMar 21, 2022, 9:54 PM
4 points
8 comments2 min readLW link

We can­not di­rectly choose an AGI’s util­ity function

azsantoskMar 21, 2022, 10:08 PM
12 points
18 comments3 min readLW link

Progress Re­port 1: in­ter­pretabil­ity ex­per­i­ments & learn­ing, test­ing com­pres­sion hypotheses

Nathan Helm-BurgerMar 22, 2022, 8:12 PM
11 points
0 comments2 min readLW link

Les­sons After a Cou­ple Months of Try­ing to Do ML Research

KevinRoWangMar 22, 2022, 11:45 PM
68 points
8 comments6 min readLW link

Job Offer­ing: Help Com­mu­ni­cate Infrabayesianism

Mar 23, 2022, 6:35 PM
135 points
21 comments1 min readLW link

A sur­vey of tool use and work­flows in al­ign­ment research

Mar 23, 2022, 11:44 PM
43 points
5 comments1 min readLW link

Why Agent Foun­da­tions? An Overly Ab­stract Explanation

johnswentworthMar 25, 2022, 11:17 PM
247 points
54 comments8 min readLW link

[ASoT] Ob­ser­va­tions about ELK

leogaoMar 26, 2022, 12:42 AM
30 points
0 comments3 min readLW link

[Question] When peo­ple ask for your P(doom), do you give them your in­side view or your bet­ting odds?

Vivek HebbarMar 26, 2022, 11:08 PM
11 points
12 comments1 min readLW link

Com­pute Gover­nance: The Role of Com­mod­ity Hardware

JanMar 26, 2022, 10:08 AM
14 points
7 comments7 min readLW link
(universalprior.substack.com)

Agency and Coherence

David UdellMar 26, 2022, 7:25 PM
23 points
2 comments3 min readLW link

[ASoT] Some ways ELK could still be solv­able in practice

leogaoMar 27, 2022, 1:15 AM
26 points
1 comment2 min readLW link

[Question] Your spe­cific at­ti­tudes to­wards AI safety

Esben KranMar 27, 2022, 10:33 PM
8 points
22 comments1 min readLW link

[ASoT] Search­ing for con­se­quen­tial­ist structure

leogaoMar 27, 2022, 7:09 PM
25 points
2 comments4 min readLW link

Vaniver’s ELK Submission

VaniverMar 28, 2022, 9:14 PM
10 points
0 comments7 min readLW link

Towards a bet­ter cir­cuit prior: Im­prov­ing on ELK state-of-the-art

evhubMar 29, 2022, 1:56 AM
19 points
0 comments16 min readLW link

Strate­gies for differ­en­tial di­vul­ga­tion of key ideas in AI capability

azsantoskMar 29, 2022, 3:22 AM
8 points
0 comments6 min readLW link

[ASoT] Some thoughts about de­cep­tive mesaoptimization

leogaoMar 28, 2022, 9:14 PM
24 points
5 comments7 min readLW link

[Question] What would make you con­fi­dent that AGI has been achieved?

YitzMar 29, 2022, 11:02 PM
17 points
6 comments1 min readLW link

Progress Re­port 2

Nathan Helm-BurgerMar 30, 2022, 2:29 AM
4 points
1 comment1 min readLW link

[ASoT] Some thoughts about LM monologue limi­ta­tions and ELK

leogaoMar 30, 2022, 2:26 PM
10 points
0 comments2 min readLW link

Pro­ce­du­rally eval­u­at­ing fac­tual ac­cu­racy: a re­quest for research

Jacob_HiltonMar 30, 2022, 4:37 PM
24 points
2 comments6 min readLW link

No, EDT Did Not Get It Right All Along: Why the Coin Flip Creation Prob­lem Is Irrelevant

HeighnMar 30, 2022, 6:41 PM
6 points
6 comments3 min readLW link

ELK Com­pu­ta­tional Com­plex­ity: Three Levels of Difficulty

abramdemskiMar 30, 2022, 8:56 PM
46 points
9 comments7 min readLW link

[Link] Train­ing Com­pute-Op­ti­mal Large Lan­guage Models

nostalgebraistMar 31, 2022, 6:01 PM
50 points
23 comments1 min readLW link
(arxiv.org)

New­comb’s prob­lem is just a stan­dard time con­sis­tency problem

basil.halperinMar 31, 2022, 5:32 PM
12 points
6 comments12 min readLW link

The Calcu­lus of New­comb’s Problem

HeighnApr 1, 2022, 2:41 PM
3 points
6 comments2 min readLW link

New Scal­ing Laws for Large Lan­guage Models

1a3ornApr 1, 2022, 8:41 PM
223 points
21 comments5 min readLW link

In­ter­act­ing with a Boxed AI

aphyerApr 1, 2022, 10:42 PM
11 points
19 comments4 min readLW link

Op­ti­mal­ity is the tiger, and agents are its teeth

VeedracApr 2, 2022, 12:46 AM
197 points
31 comments16 min readLW link

[Question] How can a lay­man con­tribute to AI Align­ment efforts, given shorter timeline/​doomier sce­nar­ios?

AprilSRApr 2, 2022, 4:34 AM
13 points
5 comments1 min readLW link

AI Gover­nance across Slow/​Fast Take­off and Easy/​Hard Align­ment spectra

DavidmanheimApr 3, 2022, 7:45 AM
27 points
6 comments3 min readLW link

[Question] What are some ways in which we can die with more dig­nity?

Chris_LeongApr 3, 2022, 5:32 AM
14 points
19 comments1 min readLW link

[Question] Should we push for ban­ning mak­ing hiring de­ci­sions based on AI?

ChristianKlApr 3, 2022, 7:46 PM
10 points
6 comments1 min readLW link

Bayeswatch 9.5: Rest & Relaxation

lsusrApr 4, 2022, 1:13 AM
24 points
1 comment2 min readLW link

Bayeswatch 6.5: Therapy

lsusrApr 4, 2022, 1:20 AM
15 points
0 comments1 min readLW link

The­o­ries of Mo­du­lar­ity in the Biolog­i­cal Literature

Apr 4, 2022, 12:48 PM
47 points
13 comments7 min readLW link

Google’s new 540 billion pa­ram­e­ter lan­guage model

Matthew BarnettApr 4, 2022, 5:49 PM
108 points
83 comments1 min readLW link
(storage.googleapis.com)

Call For Distillers

johnswentworthApr 4, 2022, 6:25 PM
192 points
42 comments3 min readLW link

Is the scal­ing race fi­nally on?

p.b.Apr 4, 2022, 7:53 PM
24 points
0 comments2 min readLW link

Yud­kowsky Con­tra Chris­ti­ano on AI Take­off Speeds [Linkpost]

aogApr 5, 2022, 2:09 AM
18 points
0 comments11 min readLW link

[Cross-post] Half baked ideas: defin­ing and mea­sur­ing Ar­tifi­cial In­tel­li­gence sys­tem effectiveness

David JohnstonApr 5, 2022, 12:29 AM
2 points
0 comments7 min readLW link

[Question] Why is Toby Ord’s like­li­hood of hu­man ex­tinc­tion due to AI so low?

ChristianKlApr 5, 2022, 12:16 PM
8 points
9 comments1 min readLW link

Non-pro­gram­mers in­tro to AI for programmers

DustinApr 5, 2022, 6:12 PM
6 points
0 comments2 min readLW link

What Would A Fight Between Hu­man­ity And AGI Look Like?

johnswentworthApr 5, 2022, 8:03 PM
79 points
22 comments3 min readLW link

Su­per­vise Pro­cess, not Outcomes

Apr 5, 2022, 10:18 PM
119 points
8 comments10 min readLW link

AXRP Epi­sode 14 - In­fra-Bayesian Phys­i­cal­ism with Vanessa Kosoy

DanielFilanApr 5, 2022, 11:10 PM
23 points
9 comments52 min readLW link

[Question] What’s the prob­lem with hav­ing an AI al­ign it­self?

FinalFormal2Apr 6, 2022, 12:59 AM
0 points
3 comments1 min readLW link

What if we stopped mak­ing GPUs for a bit?

MrPointyApr 5, 2022, 11:02 PM
−3 points
2 comments1 min readLW link

Don’t die with dig­nity; in­stead play to your outs

Jeffrey LadishApr 6, 2022, 7:53 AM
243 points
58 comments5 min readLW link

What I Was Think­ing About Be­fore Alignment

johnswentworthApr 6, 2022, 4:08 PM
77 points
8 comments5 min readLW link

[Link] A min­i­mal vi­able product for alignment

janleikeApr 6, 2022, 3:38 PM
51 points
38 comments1 min readLW link

[Link] Why I’m ex­cited about AI-as­sisted hu­man feedback

janleikeApr 6, 2022, 3:37 PM
29 points
0 comments1 min readLW link

Test­ing PaLM prompts on GPT3

YitzApr 6, 2022, 5:21 AM
103 points
15 comments8 min readLW link

[ASoT] Some thoughts about im­perfect world modeling

leogaoApr 7, 2022, 3:42 PM
7 points
0 comments4 min readLW link

Truth­ful­ness, stan­dards and credibility

Joe CollmanApr 7, 2022, 10:31 AM
12 points
2 comments32 min readLW link

What if “friendly/​un­friendly” GAI isn’t a thing?

homunqApr 7, 2022, 4:54 PM
−1 points
4 comments2 min readLW link

Pro­duc­tive Mis­takes, Not Perfect Answers

adamShimiApr 7, 2022, 4:41 PM
95 points
11 comments6 min readLW link

Believ­able near-term AI disaster

DagonApr 7, 2022, 6:20 PM
8 points
2 comments2 min readLW link

How BoMAI Might fail

Donald HobsonApr 7, 2022, 3:32 PM
11 points
3 comments2 min readLW link

Deep­Mind: The Pod­cast—Ex­cerpts on AGI

WilliamKielyApr 7, 2022, 10:09 PM
75 points
10 comments5 min readLW link

AI Align­ment and Recognition

Chris_LeongApr 8, 2022, 5:39 AM
7 points
2 comments1 min readLW link

Re­v­erse (in­tent) al­ign­ment may al­low for safer Oracles

azsantoskApr 8, 2022, 2:48 AM
4 points
0 comments4 min readLW link

AIs should learn hu­man prefer­ences, not biases

Stuart_ArmstrongApr 8, 2022, 1:45 PM
10 points
1 comment1 min readLW link

[Question] Is there a pos­si­bil­ity that the up­com­ing scal­ing of data in lan­guage mod­els causes A.G.I.?

ArtMiApr 8, 2022, 6:56 AM
2 points
0 comments1 min readLW link

Differ­ent per­spec­tives on con­cept extrapolation

Stuart_ArmstrongApr 8, 2022, 10:42 AM
42 points
7 comments5 min readLW link

[RETRACTED] It’s time for EA lead­er­ship to pull the short-timelines fire alarm.

Not RelevantApr 8, 2022, 4:07 PM
112 points
165 comments4 min readLW link

Con­vinc­ing All Ca­pa­bil­ity Researchers

Logan RiggsApr 8, 2022, 5:40 PM
120 points
70 comments3 min readLW link

Lan­guage Model Tools for Align­ment Research

Logan RiggsApr 8, 2022, 5:32 PM
27 points
0 comments2 min readLW link

[Question] What would the cre­ation of al­igned AGI look like for us?

PerhapsApr 8, 2022, 6:05 PM
3 points
4 comments1 min readLW link

Take­aways From 3 Years Work­ing In Ma­chine Learning

George3d6Apr 8, 2022, 5:14 PM
34 points
10 comments11 min readLW link
(www.epistem.ink)

[Question] Can AI sys­tems have ex­tremely im­pres­sive out­puts and also not need to be al­igned be­cause they aren’t gen­eral enough or some­thing?

WilliamKielyApr 9, 2022, 6:03 AM
6 points
3 comments1 min readLW link

Why In­stru­men­tal Goals are not a big AI Safety Problem

Jonathan PaulsonApr 9, 2022, 12:10 AM
0 points
9 comments3 min readLW link

Emer­gent Ven­tures/​Sch­midt (new grantor for in­di­vi­d­ual re­searchers)

gwernApr 9, 2022, 2:41 PM
21 points
6 comments1 min readLW link
(marginalrevolution.com)

Strate­gies for keep­ing AIs nar­row in the short term

RossinApr 9, 2022, 4:42 PM
9 points
3 comments3 min readLW link

A con­crete bet offer to those with short AI timelines

Apr 9, 2022, 9:41 PM
195 points
104 comments4 min readLW link

Fi­nally En­ter­ing Alignment

Ulisse MiniApr 10, 2022, 5:01 PM
75 points
8 comments2 min readLW link

[Question] Does non-ac­cess to out­puts pre­vent re­cur­sive self-im­prove­ment?

Gunnar_ZarnckeApr 10, 2022, 6:37 PM
14 points
0 comments1 min readLW link

[Question] Con­vince me that hu­man­ity is as doomed by AGI as Yud­kowsky et al., seems to believe

YitzApr 10, 2022, 9:02 PM
91 points
142 comments2 min readLW link

[Question] Could we set a re­s­olu­tion/​stop­per for the up­per bound of the util­ity func­tion of an AI?

FinalFormal2Apr 11, 2022, 3:10 AM
−5 points
2 comments1 min readLW link

What can peo­ple not smart/​tech­ni­cal enough for AI re­search/​AI risk work do to re­duce AI-risk/​max­i­mize AI safety? (which is most peo­ple?)

Alex K. Chen (parrot)Apr 11, 2022, 2:05 PM
7 points
3 comments3 min readLW link

We should stop be­ing so con­fi­dent that AI co­or­di­na­tion is unlikely

trevorApr 11, 2022, 10:27 PM
14 points
7 comments1 min readLW link

The Reg­u­la­tory Op­tion: A re­sponse to near 0% sur­vival odds

Matthew LowensteinApr 11, 2022, 10:00 PM
45 points
21 comments6 min readLW link

[Question] How can I de­ter­mine that Elicit is not some weak AGI’s at­tempt at tak­ing over the world ?

Lucie PhilipponApr 12, 2022, 12:54 AM
5 points
3 comments1 min readLW link

[Question] Three ques­tions about mesa-optimizers

Eric NeymanApr 12, 2022, 2:58 AM
23 points
5 comments3 min readLW link

A Small Nega­tive Re­sult on Debate

Sam BowmanApr 12, 2022, 6:19 PM
42 points
11 comments1 min readLW link

The Peerless

Tamsin LeakeApr 13, 2022, 1:07 AM
18 points
2 comments1 min readLW link
(carado.moe)

Con­vinc­ing Peo­ple of Align­ment with Street Epistemology

Logan RiggsApr 12, 2022, 11:43 PM
54 points
4 comments3 min readLW link

[Question] “Frag­ility of Value” vs. LLMs

Not RelevantApr 13, 2022, 2:02 AM
32 points
32 comments1 min readLW link

How dath ilan co­or­di­nates around solv­ing alignment

Thomas KwaApr 13, 2022, 4:22 AM
46 points
37 comments5 min readLW link

[Question] What’s a good prob­a­bil­ity dis­tri­bu­tion fam­ily (e.g. “log-nor­mal”) to use for AGI timelines?

David Scott Krueger (formerly: capybaralet)Apr 13, 2022, 4:45 AM
9 points
12 comments1 min readLW link

Take­off speeds have a huge effect on what it means to work on AI x-risk

BuckApr 13, 2022, 5:38 PM
117 points
25 comments2 min readLW link

De­sign, Im­ple­ment and Verify

rwallaceApr 13, 2022, 6:14 PM
32 points
13 comments4 min readLW link

[Question] What to in­clude in a guest lec­ture on ex­is­ten­tial risks from AI?

Aryeh EnglanderApr 13, 2022, 5:03 PM
20 points
9 comments1 min readLW link

A Quick Guide to Con­fronting Doom

RubyApr 13, 2022, 7:30 PM
224 points
36 comments2 min readLW link

Ex­plor­ing toy neu­ral nets un­der node re­moval. Sec­tion 1.

Donald HobsonApr 13, 2022, 11:30 PM
12 points
7 comments8 min readLW link

[Question] Un­change­able Code pos­si­ble ?

AntonTimmerApr 14, 2022, 11:17 AM
7 points
9 comments1 min readLW link

How to be­come an AI safety researcher

peterbarnettApr 15, 2022, 11:41 AM
19 points
0 comments14 min readLW link

Early 2022 Paper Round-up

jsteinhardtApr 14, 2022, 8:50 PM
80 points
4 comments3 min readLW link
(bounded-regret.ghost.io)

[Question] Can some­one ex­plain to me why MIRI is so pes­simistic of our chances of sur­vival?

iamthouthouartiApr 14, 2022, 8:28 PM
10 points
7 comments1 min readLW link

Pivotal acts from Math AIs

azsantoskApr 15, 2022, 12:25 AM
10 points
4 comments5 min readLW link

Refine: An In­cu­ba­tor for Con­cep­tual Align­ment Re­search Bets

adamShimiApr 15, 2022, 8:57 AM
123 points
13 comments4 min readLW link

My least fa­vorite thing

sudoApr 14, 2022, 10:33 PM
41 points
30 comments3 min readLW link

[Question] Con­strain­ing nar­row AI in a cor­po­rate setting

MaximumLibertyApr 15, 2022, 10:36 PM
28 points
4 comments1 min readLW link

Pop Cul­ture Align­ment Re­search and Taxes

JanApr 16, 2022, 3:45 PM
16 points
14 comments11 min readLW link
(universalprior.substack.com)

Org an­nounce­ment: [AC]RC

Vivek HebbarApr 17, 2022, 5:24 PM
79 points
12 comments1 min readLW link

Code Gen­er­a­tion as an AI risk setting

Not RelevantApr 17, 2022, 10:27 PM
91 points
16 comments2 min readLW link

Men­tal Health and the Align­ment Prob­lem: A Com­pila­tion of Resources

Chris ScammellApr 18, 2022, 6:36 PM
139 points
7 comments17 min readLW link

Is “Con­trol” of a Su­per­in­tel­li­gence Pos­si­ble?

Mahdi ComplexApr 18, 2022, 4:03 PM
9 points
14 comments1 min readLW link

[Closed] Hiring a math­e­mat­i­cian to work on the learn­ing-the­o­retic AI al­ign­ment agenda

Vanessa KosoyApr 19, 2022, 6:44 AM
84 points
21 comments2 min readLW link

[Question] The two miss­ing core rea­sons why al­ign­ing at-least-par­tially su­per­hu­man AGI is hard

Joel BurgetApr 19, 2022, 5:15 PM
7 points
2 comments1 min readLW link

[Question] How does the world look like 10 years af­ter we have de­ployed an al­igned AGI?

mukashiApr 19, 2022, 11:34 AM
4 points
3 comments1 min readLW link

[Question] Clar­ifi­ca­tion on Defi­ni­tion of AGI

stanislawApr 19, 2022, 12:41 PM
0 points
1 comment1 min readLW link

[Question] What’s the Re­la­tion­ship Between “Hu­man Values” and the Brain’s Re­ward Sys­tem?

intersticeApr 19, 2022, 5:15 AM
36 points
16 comments1 min readLW link

De­cep­tive Agents are a Good Way to Do Things

David UdellApr 19, 2022, 6:04 PM
15 points
0 comments1 min readLW link

The Scale Prob­lem in AI

tailcalledApr 19, 2022, 5:46 PM
22 points
17 comments3 min readLW link

Con­cept ex­trap­o­la­tion: key posts

Stuart_ArmstrongApr 19, 2022, 10:01 AM
12 points
2 comments1 min readLW link

“Pivotal Act” In­ten­tions: Nega­tive Con­se­quences and Fal­la­cious Arguments

Andrew_CritchApr 19, 2022, 8:25 PM
96 points
56 comments7 min readLW link

GPT-3 and con­cept extrapolation

Stuart_ArmstrongApr 20, 2022, 10:39 AM
19 points
28 comments1 min readLW link

[In­tro to brain-like-AGI safety] 12. Two paths for­ward: “Con­trol­led AGI” and “So­cial-in­stinct AGI”

Steven ByrnesApr 20, 2022, 12:58 PM
33 points
10 comments16 min readLW link

Pr­ereg­is­tra­tion: Air Con­di­tioner Test

johnswentworthApr 21, 2022, 7:48 PM
109 points
64 comments9 min readLW link

[Question] Choice := An­throp­ics un­cer­tainty? And po­ten­tial im­pli­ca­tions for agency

Antoine de ScorrailleApr 21, 2022, 4:38 PM
5 points
1 comment1 min readLW link

Un­der­stand­ing the Merg­ing of Opinions with In­creas­ing In­for­ma­tion theorem

ViktoriaMalyasovaApr 21, 2022, 2:13 PM
13 points
1 comment5 min readLW link

Early 2022 Paper Round-up (Part 2)

jsteinhardtApr 21, 2022, 11:40 PM
10 points
0 comments5 min readLW link
(bounded-regret.ghost.io)

[Question] What are the num­bers in mind for the su­per-short AGI timelines so many long-ter­mists are alarmed about?

Evan_GaensbauerApr 21, 2022, 11:32 PM
22 points
14 comments1 min readLW link

AI Will Multiply

harsimonyApr 22, 2022, 4:33 AM
13 points
4 comments1 min readLW link
(harsimony.wordpress.com)

Hu­man­ity as an en­tity: An al­ter­na­tive to Co­her­ent Ex­trap­o­lated Volition

Victor NovikovApr 22, 2022, 12:48 PM
0 points
2 comments4 min readLW link

[ASoT] Con­se­quen­tial­ist mod­els as a su­per­set of mesaoptimizers

leogaoApr 23, 2022, 5:57 PM
36 points
2 comments4 min readLW link

Skil­ling-up in ML Eng­ineer­ing for Align­ment: re­quest for comments

Apr 23, 2022, 3:11 PM
19 points
0 comments1 min readLW link

[Question] Want­ing to change what you want

MithrandirApr 23, 2022, 4:23 AM
−1 points
1 comment1 min readLW link

Progress Re­port 5: ty­ing it together

Nathan Helm-BurgerApr 23, 2022, 9:07 PM
10 points
0 comments2 min readLW link

Cal­ling for Stu­dent Sub­mis­sions: AI Safety Distil­la­tion Contest

ArisApr 24, 2022, 1:53 AM
48 points
15 comments4 min readLW link

Ex­am­in­ing Evolu­tion as an Up­per Bound for AGI Timelines

meanderingmooseApr 24, 2022, 7:08 PM
5 points
1 comment9 min readLW link

AI safety rais­ing aware­ness re­sources bleg

iivonenApr 24, 2022, 5:13 PM
6 points
1 comment1 min readLW link

In­tu­itions about solv­ing hard problems

Richard_NgoApr 25, 2022, 3:29 PM
92 points
23 comments6 min readLW link

[Re­quest for Distil­la­tion] Co­her­ence of Distributed De­ci­sions With Differ­ent In­puts Im­plies Conditioning

johnswentworthApr 25, 2022, 5:01 PM
22 points
14 comments2 min readLW link

dalle2 comments

nostalgebraistApr 26, 2022, 5:30 AM
183 points
13 comments13 min readLW link
(nostalgebraist.tumblr.com)

Make a neu­ral net­work in ~10 minutes

Arjun YadavApr 26, 2022, 5:24 AM
8 points
0 comments4 min readLW link
(arjunyadav.net)

Law-Fol­low­ing AI 1: Se­quence In­tro­duc­tion and Structure

CullenApr 27, 2022, 5:26 PM
16 points
10 comments9 min readLW link

Law-Fol­low­ing AI 2: In­tent Align­ment + Su­per­in­tel­li­gence → Lawless AI (By De­fault)

CullenApr 27, 2022, 5:27 PM
5 points
2 comments6 min readLW link

Law-Fol­low­ing AI 3: Lawless AI Agents Un­der­mine Sta­bi­liz­ing Agreements

CullenApr 27, 2022, 5:30 PM
2 points
2 comments3 min readLW link

If you’re very op­ti­mistic about ELK then you should be op­ti­mistic about outer alignment

Sam MarksApr 27, 2022, 7:30 PM
17 points
8 comments3 min readLW link

AI Alter­na­tive Fu­tures: Sce­nario Map­ping Ar­tifi­cial In­tel­li­gence Risk—Re­quest for Par­ti­ci­pa­tion (*Closed*)

KakiliApr 27, 2022, 10:07 PM
10 points
2 comments8 min readLW link

The Speed + Sim­plic­ity Prior is prob­a­bly anti-deceptive

Yonadav ShavitApr 27, 2022, 7:30 PM
30 points
29 comments12 min readLW link

Slides: Po­ten­tial Risks From Ad­vanced AI

Aryeh EnglanderApr 28, 2022, 2:15 AM
7 points
0 comments1 min readLW link

How Might an Align­ment At­trac­tor Look like?

ShmiApr 28, 2022, 6:46 AM
47 points
15 comments2 min readLW link

Naive com­ments on AGIlignment

EricfApr 28, 2022, 1:08 AM
2 points
4 comments1 min readLW link

[Question] Is al­ign­ment pos­si­ble?

ShayApr 28, 2022, 9:18 PM
0 points
5 comments1 min readLW link

Learn­ing the smooth prior

Apr 29, 2022, 9:10 PM
31 points
0 comments12 min readLW link

[Linkpost] New multi-modal Deep­mind model fus­ing Chin­chilla with images and videos

p.b.Apr 30, 2022, 3:47 AM
53 points
18 comments1 min readLW link

Note-Tak­ing with­out Hid­den Messages

HoagyApr 30, 2022, 11:15 AM
7 points
1 comment4 min readLW link

[Question] Why hasn’t deep learn­ing gen­er­ated sig­nifi­cant eco­nomic value yet?

Alex_AltairApr 30, 2022, 8:27 PM
112 points
95 comments2 min readLW link

What is the solu­tion to the Align­ment prob­lem?

AlgonApr 30, 2022, 11:19 PM
24 points
2 comments1 min readLW link

[Linkpost] Value ex­trac­tion via lan­guage model abduction

Paul BricmanMay 1, 2022, 7:11 PM
4 points
3 comments1 min readLW link
(paulbricman.com)

ELK shaving

Miss Aligned AIMay 1, 2022, 9:05 PM
6 points
1 comment1 min readLW link

So has AI con­quered Bridge ?

Ponder StibbonsMay 2, 2022, 3:01 PM
16 points
2 comments14 min readLW link

In­for­ma­tion se­cu­rity con­sid­er­a­tions for AI and the long term future

May 2, 2022, 8:54 PM
74 points
6 comments10 min readLW link

Is evolu­tion­ary in­fluence the mesa ob­jec­tive that we’re in­ter­ested in?

David JohnstonMay 3, 2022, 1:18 AM
3 points
2 comments5 min readLW link

Var­i­ous Align­ment Strate­gies (and how likely they are to work)

Logan ZoellnerMay 3, 2022, 4:54 PM
73 points
34 comments11 min readLW link

In­tro­duc­ing the ML Safety Schol­ars Program

May 4, 2022, 4:01 PM
73 points
2 comments3 min readLW link

Franken­stein: A Modern AGI

SableMay 5, 2022, 4:16 PM
9 points
10 comments9 min readLW link

[Question] What is bias in al­ign­ment terms?

Jonas KgomoMay 4, 2022, 9:35 PM
0 points
2 comments1 min readLW link

Ethan Ca­ballero on Pri­vate Scal­ing Progress

Michaël TrazziMay 5, 2022, 6:32 PM
62 points
1 comment2 min readLW link
(theinsideview.github.io)

Ap­ply to the sec­ond iter­a­tion of the ML for Align­ment Boot­camp (MLAB 2) in Berkeley [Aug 15 - Fri Sept 2]

BuckMay 6, 2022, 4:23 AM
68 points
0 comments6 min readLW link

The case for be­com­ing a black-box in­ves­ti­ga­tor of lan­guage models

BuckMay 6, 2022, 2:35 PM
118 points
19 comments3 min readLW link

Get­ting GPT-3 to pre­dict Me­tac­u­lus questions

MathiasKBMay 6, 2022, 6:01 AM
68 points
8 comments2 min readLW link

But What’s Your *New Align­ment In­sight,* out of a Fu­ture-Text­book Para­graph?

David UdellMay 7, 2022, 3:10 AM
24 points
18 comments5 min readLW link

Video and Tran­script of Pre­sen­ta­tion on Ex­is­ten­tial Risk from Power-Seek­ing AI

Joe CarlsmithMay 8, 2022, 3:50 AM
20 points
1 comment29 min readLW link

A Bird’s Eye View of the ML Field [Prag­matic AI Safety #2]

May 9, 2022, 5:18 PM
126 points
5 comments35 min readLW link

In­tro­duc­tion to Prag­matic AI Safety [Prag­matic AI Safety #1]

May 9, 2022, 5:06 PM
70 points
1 comment6 min readLW link

Jobs: Help scale up LM al­ign­ment re­search at NYU

Sam BowmanMay 9, 2022, 2:12 PM
60 points
1 comment1 min readLW link

When is AI safety re­search harm­ful?

NathanBarnardMay 9, 2022, 6:19 PM
2 points
0 comments8 min readLW link

AI Align­ment YouTube Playlists

May 9, 2022, 9:33 PM
29 points
4 comments1 min readLW link

Ex­am­in­ing Arm­strong’s cat­e­gory of gen­er­al­ized models

Morgan_RogersMay 10, 2022, 9:07 AM
14 points
0 comments7 min readLW link

An In­side View of AI Alignment

Ansh RadhakrishnanMay 11, 2022, 2:16 AM
31 points
2 comments2 min readLW link

[Question] What are your recom­men­da­tions for tech­ni­cal AI al­ign­ment pod­casts?

Evan_GaensbauerMay 11, 2022, 9:52 PM
5 points
4 comments1 min readLW link

Deep­mind’s Gato: Gen­er­al­ist Agent

Daniel KokotajloMay 12, 2022, 4:01 PM
164 points
61 comments1 min readLW link

“A Gen­er­al­ist Agent”: New Deep­Mind Publication

1a3ornMay 12, 2022, 3:30 PM
79 points
43 comments1 min readLW link

A ten­ta­tive di­alogue with a Friendly-boxed-su­per-AGI on brain uploads

Ramiro P.May 12, 2022, 7:40 PM
1 point
12 comments4 min readLW link

Pos­i­tive out­comes un­der an un­al­igned AGI takeover

YitzMay 12, 2022, 7:45 AM
19 points
12 comments3 min readLW link

The Last Paperclip

Logan ZoellnerMay 12, 2022, 7:25 PM
57 points
15 comments17 min readLW link

RLHF

Ansh RadhakrishnanMay 12, 2022, 9:18 PM
16 points
5 comments5 min readLW link

[Question] What to do when start­ing a busi­ness in an im­mi­nent-AGI world?

ryan_bMay 12, 2022, 9:07 PM
25 points
7 comments1 min readLW link

Deep­Mind is hiring for the Scal­able Align­ment and Align­ment Teams

May 13, 2022, 12:17 PM
145 points
35 comments9 min readLW link

“Tech com­pany sin­gu­lar­i­ties”, and steer­ing them to re­duce x-risk

Andrew_CritchMay 13, 2022, 5:24 PM
73 points
12 comments4 min readLW link

Against Time in Agent Models

johnswentworthMay 13, 2022, 7:55 PM
50 points
12 comments3 min readLW link

Frame for Take-Off Speeds to in­form com­pute gov­er­nance & scal­ing alignment

Logan RiggsMay 13, 2022, 10:23 PM
15 points
2 comments2 min readLW link

Align­ment as Constraints

Logan RiggsMay 13, 2022, 10:07 PM
10 points
0 comments2 min readLW link

Fermi es­ti­ma­tion of the im­pact you might have work­ing on AI safety

Fabien RogerMay 13, 2022, 5:49 PM
6 points
0 comments1 min readLW link

An ob­ser­va­tion about Hub­inger et al.’s frame­work for learned optimization

carboniferous_umbraculum May 13, 2022, 4:20 PM
33 points
9 comments8 min readLW link

Thoughts on AI Safety Camp

Charlie SteinerMay 13, 2022, 7:16 AM
24 points
7 comments7 min readLW link

Clar­ify­ing the con­fu­sion around in­ner alignment

Rauno ArikeMay 13, 2022, 11:05 PM
27 points
0 comments11 min readLW link

[Link post] Promis­ing Paths to Align­ment—Con­nor Leahy | Talk

frances_lorenzMay 14, 2022, 4:01 PM
34 points
0 comments1 min readLW link

The AI Count­down Clock

River LewisMay 15, 2022, 6:37 PM
40 points
27 comments2 min readLW link
(heytraveler.substack.com)

Sur­viv­ing Au­toma­tion In The 21st Cen­tury—Part 1

George3d6May 15, 2022, 7:16 PM
27 points
17 comments8 min readLW link
(www.epistem.ink)

Why I’m Op­ti­mistic About Near-Term AI Risk

harsimonyMay 15, 2022, 11:05 PM
57 points
28 comments1 min readLW link

Op­ti­miza­tion at a Distance

johnswentworthMay 16, 2022, 5:58 PM
78 points
13 comments4 min readLW link

[Question] To what ex­tent is your AGI timeline bi­modal or oth­er­wise “bumpy”?

jchanMay 16, 2022, 5:42 PM
13 points
2 comments1 min readLW link

Proxy mis­speci­fi­ca­tion and the ca­pa­bil­ities vs. value learn­ing race

Sam MarksMay 16, 2022, 6:58 PM
19 points
1 comment4 min readLW link

How to in­vest in ex­pec­ta­tion of AGI?

JakobovskiMay 17, 2022, 11:03 AM
3 points
4 comments1 min readLW link

[In­tro to brain-like-AGI safety] 15. Con­clu­sion: Open prob­lems, how to help, AMA

Steven ByrnesMay 17, 2022, 3:11 PM
81 points
11 comments14 min readLW link

Ac­tion­able-guidance and roadmap recom­men­da­tions for the NIST AI Risk Man­age­ment Framework

May 17, 2022, 3:26 PM
25 points
0 comments3 min readLW link

What are the pos­si­ble tra­jec­to­ries of an AGI/​ASI world?

JakobovskiMay 17, 2022, 1:28 PM
0 points
2 comments1 min readLW link

Max­ent and Ab­strac­tions: Cur­rent Best Arguments

johnswentworthMay 18, 2022, 7:54 PM
34 points
2 comments3 min readLW link

How to get into AI safety research

Stuart_ArmstrongMay 18, 2022, 6:05 PM
44 points
7 comments1 min readLW link

A bridge to Dath Ilan? Im­proved gov­er­nance on the crit­i­cal path to AI al­ign­ment.

Jackson WagnerMay 18, 2022, 3:51 PM
23 points
0 comments11 min readLW link

We have achieved Noob Gains in AI

phdeadMay 18, 2022, 8:56 PM
114 points
21 comments7 min readLW link

[Question] Why does gra­di­ent de­scent always work on neu­ral net­works?

MichaelDickensMay 20, 2022, 9:13 PM
15 points
11 comments1 min readLW link

How RL Agents Be­have When Their Ac­tions Are Mod­ified? [Distil­la­tion post]

PabloAMCMay 20, 2022, 6:47 PM
21 points
0 comments8 min readLW link

Over-digi­tal­iza­tion: A Pre­lude to Analo­gia (Chap­ter 6)

Justin BullockMay 20, 2022, 4:39 PM
3 points
0 comments13 min readLW link

Clar­ify­ing what ELK is try­ing to achieve

Towards_KeeperhoodMay 21, 2022, 7:34 AM
7 points
0 comments5 min readLW link

[Short ver­sion] In­for­ma­tion Loss --> Basin flatness

Vivek HebbarMay 21, 2022, 12:59 PM
11 points
0 comments1 min readLW link

In­for­ma­tion Loss --> Basin flatness

Vivek HebbarMay 21, 2022, 12:58 PM
47 points
31 comments7 min readLW link

What kinds of al­gorithms do multi-hu­man imi­ta­tors learn?

May 22, 2022, 2:27 PM
20 points
0 comments3 min readLW link

Are hu­man imi­ta­tors su­per­hu­man mod­els with ex­plicit con­straints on ca­pa­bil­ities?

Chris van MerwijkMay 22, 2022, 12:46 PM
41 points
3 comments1 min readLW link

Ad­ver­sar­ial at­tacks and op­ti­mal control

JanMay 22, 2022, 6:22 PM
16 points
7 comments8 min readLW link
(universalprior.substack.com)

CNN fea­ture vi­su­al­iza­tion in 50 lines of code

StefanHexMay 26, 2022, 11:02 AM
17 points
4 comments5 min readLW link

[Question] [Align­ment] Is there a cen­sus on who’s work­ing on what?

CedarMay 23, 2022, 3:33 PM
23 points
6 comments1 min readLW link

AXRP Epi­sode 15 - Nat­u­ral Ab­strac­tions with John Wentworth

DanielFilanMay 23, 2022, 5:40 AM
32 points
1 comment57 min readLW link

Why I’m Wor­ried About AI

peterbarnettMay 23, 2022, 9:13 PM
21 points
2 comments12 min readLW link

Com­plex Sys­tems for AI Safety [Prag­matic AI Safety #3]

May 24, 2022, 12:00 AM
49 points
2 comments21 min readLW link

The No Free Lunch the­o­rems and their Razor

Adrià Garriga-alonsoMay 24, 2022, 6:40 AM
47 points
3 comments9 min readLW link

Google’s Ima­gen uses larger text encoder

Ben LivengoodMay 24, 2022, 9:55 PM
27 points
2 comments1 min readLW link

au­ton­omy: the miss­ing AGI in­gre­di­ent?

nostalgebraistMay 25, 2022, 12:33 AM
61 points
13 comments6 min readLW link

Paper: Teach­ing GPT3 to ex­press un­cer­tainty in words

Owain_EvansMay 31, 2022, 1:27 PM
96 points
7 comments4 min readLW link

Croe­sus, Cer­berus, and the mag­pies: a gen­tle in­tro­duc­tion to Elic­it­ing La­tent Knowledge

Alexandre VariengienMay 27, 2022, 5:58 PM
14 points
0 comments16 min readLW link

[Question] How much white col­lar work could be au­to­mated us­ing ex­ist­ing ML mod­els?

AMMay 26, 2022, 8:09 AM
25 points
4 comments1 min readLW link

The Poin­t­ers Prob­lem—Distilled

Nina PanicksseryMay 26, 2022, 10:44 PM
9 points
0 comments2 min readLW link

Iter­ated Distil­la­tion-Am­plifi­ca­tion, Gato, and Proto-AGI [Re-Ex­plained]

Gabe MMay 27, 2022, 5:42 AM
21 points
4 comments6 min readLW link

Boot­strap­ping Lan­guage Models

harsimonyMay 27, 2022, 7:43 PM
7 points
5 comments2 min readLW link

Un­der­stand­ing Selec­tion Theorems

adamkMay 28, 2022, 1:49 AM
35 points
3 comments7 min readLW link

[Question] What have been the ma­jor “triumphs” in the field of AI over the last ten years?

lcMay 28, 2022, 7:49 PM
35 points
10 comments1 min readLW link

[Question] Bayesian Per­sua­sion?

Karthik TadepalliMay 28, 2022, 5:52 PM
8 points
2 comments1 min readLW link

Distributed Decisions

johnswentworthMay 29, 2022, 2:43 AM
65 points
4 comments6 min readLW link

The Prob­lem With The Cur­rent State of AGI Definitions

YitzMay 29, 2022, 1:58 PM
40 points
22 comments8 min readLW link

Func­tional Anal­y­sis Read­ing Group

Ulisse MiniMay 28, 2022, 2:40 AM
4 points
0 comments1 min readLW link

[Question] Im­pact of ” ‘Let’s think step by step’ is all you need”?

yrimonJul 24, 2022, 8:59 PM
20 points
2 comments1 min readLW link

Perform Tractable Re­search While Avoid­ing Ca­pa­bil­ities Ex­ter­nal­ities [Prag­matic AI Safety #4]

May 30, 2022, 8:25 PM
43 points
3 comments25 min readLW link

[Question] What is the state of Chi­nese AI re­search?

RatiosMay 31, 2022, 10:05 AM
34 points
17 comments1 min readLW link

The Brain That Builds Itself

JanMay 31, 2022, 9:42 AM
55 points
6 comments8 min readLW link
(universalprior.substack.com)

Machines vs. Memes 2: Memet­i­cally-Mo­ti­vated Model Extensions

naterushMay 31, 2022, 10:03 PM
4 points
0 comments4 min readLW link

Machines vs Memes Part 3: Imi­ta­tion and Memes

ceru23Jun 1, 2022, 1:36 PM
5 points
0 comments7 min readLW link

Paradigms of AI al­ign­ment: com­po­nents and enablers

VikaJun 2, 2022, 6:19 AM
48 points
4 comments8 min readLW link

The Bio An­chors Forecast

Ansh RadhakrishnanJun 2, 2022, 1:32 AM
12 points
0 comments3 min readLW link

[MLSN #4]: Many New In­ter­pretabil­ity Papers, Vir­tual Logit Match­ing, Ra­tion­al­iza­tion Helps Robustness

Dan HJun 3, 2022, 1:20 AM
18 points
0 comments4 min readLW link

The pro­to­typ­i­cal catas­trophic AI ac­tion is get­ting root ac­cess to its datacenter

BuckJun 2, 2022, 11:46 PM
142 points
10 comments2 min readLW link

Ad­ver­sar­ial train­ing, im­por­tance sam­pling, and anti-ad­ver­sar­ial train­ing for AI whistleblowing

BuckJun 2, 2022, 11:48 PM
33 points
0 comments3 min readLW link

Deep Learn­ing Sys­tems Are Not Less In­ter­pretable Than Logic/​Prob­a­bil­ity/​Etc

johnswentworthJun 4, 2022, 5:41 AM
118 points
52 comments2 min readLW link

How to pur­sue a ca­reer in tech­ni­cal AI alignment

Charlie Rogers-SmithJun 4, 2022, 9:11 PM
63 points
0 comments39 min readLW link

Noisy en­vi­ron­ment reg­u­late util­ity maximizers

Niclas KupperJun 5, 2022, 6:48 PM
4 points
0 comments7 min readLW link

Why agents are powerful

Daniel KokotajloJun 6, 2022, 1:37 AM
35 points
7 comments7 min readLW link

Why do some peo­ple try to make AGI?

TekhneMakreJun 6, 2022, 9:14 AM
14 points
7 comments3 min readLW link

Some ideas for fol­low-up pro­jects to Red­wood Re­search’s re­cent paper

JanBJun 6, 2022, 1:29 PM
10 points
0 comments7 min readLW link

Read­ing the ethi­cists 2: Hunt­ing for AI al­ign­ment papers

Charlie SteinerJun 6, 2022, 3:49 PM
21 points
1 comment7 min readLW link

DALL-E 2 - Unoffi­cial Nat­u­ral Lan­guage Image Edit­ing, Art Cri­tique Survey

bakztfutureJun 6, 2022, 6:27 PM
0 points
0 comments1 min readLW link
(bakztfuture.substack.com)

Think­ing about Broad Classes of Utility-like Functions

J BostockJun 7, 2022, 2:05 PM
7 points
0 comments4 min readLW link

Thoughts on For­mal­iz­ing Composition

Tom LieberumJun 7, 2022, 7:51 AM
13 points
0 comments7 min readLW link

“Pivotal Acts” means some­thing specific

RaemonJun 7, 2022, 9:56 PM
114 points
23 comments2 min readLW link

Why I don’t be­lieve in doom

mukashiJun 7, 2022, 11:49 PM
6 points
30 comments4 min readLW link

[Question] Has any­one ac­tu­ally tried to con­vince Terry Tao or other top math­e­mat­i­ci­ans to work on al­ign­ment?

P.Jun 8, 2022, 10:26 PM
52 points
49 comments4 min readLW link

To­day in AI Risk His­tory: The Ter­mi­na­tor (1984 film) was re­leased.

ImpassionataJun 9, 2022, 1:32 AM
−3 points
6 comments1 min readLW link

There’s prob­a­bly a trade­off be­tween AI ca­pa­bil­ity and safety, and we should act like it

David JohnstonJun 9, 2022, 12:17 AM
3 points
3 comments1 min readLW link

AI Could Defeat All Of Us Combined

HoldenKarnofskyJun 9, 2022, 3:50 PM
168 points
29 comments17 min readLW link
(www.cold-takes.com)

[Question] If there was a mil­len­nium equiv­a­lent prize for AI al­ign­ment, what would the prob­lems be?

Yair HalberstadtJun 9, 2022, 4:56 PM
17 points
4 comments1 min readLW link

[Linkpost & Dis­cus­sion] AI Trained on 4Chan Be­comes ‘Hate Speech Ma­chine’ [and out­performs GPT-3 on Truth­fulQA Bench­mark?!]

YitzJun 9, 2022, 10:59 AM
16 points
5 comments2 min readLW link
(www.vice.com)

If no near-term al­ign­ment strat­egy, re­search should aim for the long-term

harsimonyJun 9, 2022, 7:10 PM
7 points
1 comment1 min readLW link

How Do Selec­tion The­o­rems Re­late To In­ter­pretabil­ity?

johnswentworthJun 9, 2022, 7:39 PM
57 points
14 comments3 min readLW link

Bureau­cracy of AIs

Logan ZoellnerJun 9, 2022, 11:03 PM
11 points
6 comments14 min readLW link

Tao, Kont­se­vich & oth­ers on HLAI in Math

intersticeJun 10, 2022, 2:25 AM
41 points
5 comments2 min readLW link
(www.youtube.com)

Open Prob­lems in AI X-Risk [PAIS #5]

Jun 10, 2022, 2:08 AM
50 points
3 comments36 min readLW link

[Question] why as­sume AGIs will op­ti­mize for fixed goals?

nostalgebraistJun 10, 2022, 1:28 AM
119 points
52 comments4 min readLW link

Progress Re­port 6: get the tool working

Nathan Helm-BurgerJun 10, 2022, 11:18 AM
4 points
0 comments2 min readLW link

Another plau­si­ble sce­nario of AI risk: AI builds mil­i­tary in­fras­truc­ture while col­lab­o­rat­ing with hu­mans, defects later.

avturchinJun 10, 2022, 5:24 PM
10 points
2 comments1 min readLW link

[Question] Is AI Align­ment Im­pos­si­ble?

HeighnJun 10, 2022, 10:08 AM
3 points
3 comments1 min readLW link

How dan­ger­ous is hu­man-level AI?

Alex_AltairJun 10, 2022, 5:38 PM
21 points
4 comments8 min readLW link

[linkpost] The fi­nal AI bench­mark: BIG-bench

RomanSJun 10, 2022, 8:53 AM
30 points
19 comments1 min readLW link

[Question] Could Pa­tent-Trol­ling de­lay AI timelines?

Pablo RepettoJun 10, 2022, 2:53 AM
1 point
3 comments1 min readLW link

How fast can we perform a for­ward pass?

jsteinhardtJun 10, 2022, 11:30 PM
53 points
9 comments15 min readLW link
(bounded-regret.ghost.io)

Steganog­ra­phy and the Cy­cleGAN—al­ign­ment failure case study

Jan CzechowskiJun 11, 2022, 9:41 AM
28 points
0 comments4 min readLW link

AGI Safety Com­mu­ni­ca­tions Initiative

inesJun 11, 2022, 5:34 PM
7 points
0 comments1 min readLW link

[Question] How much stupi­der than hu­mans can AI be and still kill us all through sheer num­bers and re­source ac­cess?

ShmiJun 12, 2022, 1:01 AM
11 points
12 comments1 min readLW link

A claim that Google’s LaMDA is sentient

Ben LivengoodJun 12, 2022, 4:18 AM
31 points
134 comments1 min readLW link

Let’s not name spe­cific AI labs in an ad­ver­sar­ial context

acylhalideJun 12, 2022, 5:38 PM
8 points
17 comments1 min readLW link

[Question] How much does cy­ber­se­cu­rity re­duce AI risk?

DarmaniJun 12, 2022, 10:13 PM
34 points
23 comments1 min readLW link

[Question] How are com­pute as­sets dis­tributed in the world?

Chris van MerwijkJun 12, 2022, 10:13 PM
29 points
7 comments1 min readLW link

The beau­tiful mag­i­cal en­chanted golden Dall-e Mini is underrated

p.b.Jun 13, 2022, 7:58 AM
14 points
0 comments1 min readLW link

Why so lit­tle AI risk on ra­tio­nal­ist-ad­ja­cent blogs?

Grant DemareeJun 13, 2022, 6:31 AM
46 points
23 comments8 min readLW link

[Question] What’s the “This AI is of moral con­cern.” fire alarm?

Quintin PopeJun 13, 2022, 8:05 AM
37 points
56 comments2 min readLW link

On A List of Lethalities

ZviJun 13, 2022, 12:30 PM
154 points
48 comments54 min readLW link
(thezvi.wordpress.com)

[Question] Can you MRI a deep learn­ing model?

Yair HalberstadtJun 13, 2022, 1:43 PM
3 points
3 comments1 min readLW link

What are some smaller-but-con­crete challenges re­lated to AI safety that are im­pact­ing peo­ple to­day?

nonzerosumJun 13, 2022, 5:36 PM
3 points
2 comments1 min readLW link

Con­ti­nu­ity Assumptions

Jan_KulveitJun 13, 2022, 9:31 PM
26 points
13 comments4 min readLW link

Crypto-fed Computation

aaguirreJun 13, 2022, 9:20 PM
22 points
7 comments7 min readLW link

Blake Richards on Why he is Skep­ti­cal of Ex­is­ten­tial Risk from AI

Michaël TrazziJun 14, 2022, 7:09 PM
41 points
12 comments4 min readLW link
(theinsideview.ai)

I ap­plied for a MIRI job in 2020. Here’s what hap­pened next.

ViktoriaMalyasovaJun 15, 2022, 7:37 PM
78 points
17 comments7 min readLW link

[Question] What are all the AI Align­ment and AI Safety Com­mu­ni­ca­tion Hubs?

Gunnar_ZarnckeJun 15, 2022, 4:16 PM
25 points
5 comments1 min readLW link

[Question] Has there been any work on at­tempt­ing to use Pas­cal’s Mug­ging to make an AGI be­have?

Chris_LeongJun 15, 2022, 8:33 AM
7 points
17 comments1 min readLW link

Will vague “AI sen­tience” con­cerns do more for AI safety than any­thing else we might do?

Aryeh EnglanderJun 14, 2022, 11:53 PM
12 points
1 comment1 min readLW link

“Brain en­thu­si­asts” in AI Safety

Jun 18, 2022, 9:59 AM
57 points
5 comments10 min readLW link
(universalprior.substack.com)

FYI: I’m work­ing on a book about the threat of AGI/​ASI for a gen­eral au­di­ence. I hope it will be of value to the cause and the community

Darren McKeeJun 15, 2022, 6:08 PM
40 points
17 comments2 min readLW link

A cen­tral AI al­ign­ment prob­lem: ca­pa­bil­ities gen­er­al­iza­tion, and the sharp left turn

So8resJun 15, 2022, 1:10 PM
253 points
48 comments10 min readLW link

AI Risk, as Seen on Snapchat

dkirmaniJun 16, 2022, 7:31 PM
23 points
8 comments1 min readLW link

Hu­mans are very re­li­able agents

alyssavanceJun 16, 2022, 10:02 PM
248 points
35 comments3 min readLW link

A pos­si­ble AI-in­oc­u­la­tion due to early “robot up­ris­ing”

ShmiJun 16, 2022, 9:21 PM
16 points
2 comments1 min readLW link

A trans­parency and in­ter­pretabil­ity tech tree

evhubJun 16, 2022, 11:44 PM
136 points
10 comments19 min readLW link

Value ex­trap­o­la­tion vs Wireheading

Stuart_ArmstrongJun 17, 2022, 3:02 PM
16 points
1 comment1 min readLW link

#SAT with Ten­sor Networks

Adam JermynJun 17, 2022, 1:20 PM
4 points
0 comments2 min readLW link

wrap­per-minds are the enemy

nostalgebraistJun 17, 2022, 1:58 AM
92 points
36 comments8 min readLW link

[Question] Is there a unified way to make sense of AI failure modes?

walking_mushroomJun 17, 2022, 6:00 PM
3 points
1 comment1 min readLW link

Quan­tify­ing Gen­eral Intelligence

JasonBrownJun 17, 2022, 9:57 PM
9 points
6 comments13 min readLW link

Pivotal out­comes and pivotal processes

Andrew_CritchJun 17, 2022, 11:43 PM
79 points
32 comments4 min readLW link

Scott Aaron­son is join­ing OpenAI to work on AI safety

peterbarnettJun 18, 2022, 4:06 AM
117 points
31 comments1 min readLW link
(scottaaronson.blog)

Can DALL-E un­der­stand sim­ple ge­om­e­try?

Isaac KingJun 18, 2022, 4:37 AM
25 points
2 comments1 min readLW link

Spe­cific prob­lems with spe­cific an­i­mal com­par­i­sons for AI policy

trevorJun 19, 2022, 1:27 AM
3 points
1 comment2 min readLW link

Agent level parallelism

Johannes C. MayerJun 18, 2022, 8:56 PM
6 points
5 comments1 min readLW link

[Link-post] On Defer­ence and Yud­kowsky’s AI Risk Estimates

bmgJun 19, 2022, 5:25 PM
27 points
7 comments1 min readLW link

Where I agree and dis­agree with Eliezer

paulfchristianoJun 19, 2022, 7:15 PM
777 points
205 comments20 min readLW link

Let’s See You Write That Cor­rigi­bil­ity Tag

Eliezer YudkowskyJun 19, 2022, 9:11 PM
109 points
67 comments1 min readLW link

Are we there yet?

theflowerpotJun 20, 2022, 11:19 AM
2 points
2 comments1 min readLW link

On cor­rigi­bil­ity and its basin

Donald HobsonJun 20, 2022, 4:33 PM
16 points
3 comments2 min readLW link

Parable: The Bomb that doesn’t Explode

Lone PineJun 20, 2022, 4:41 PM
14 points
5 comments2 min readLW link

Key Papers in Lan­guage Model Safety

aogJun 20, 2022, 3:00 PM
37 points
1 comment22 min readLW link

Sur­vey re AIS/​LTism office in NYC

RyanCareyJun 20, 2022, 7:21 PM
7 points
0 comments1 min readLW link

An AI defense-offense sym­me­try thesis

Chris van MerwijkJun 20, 2022, 10:01 AM
10 points
9 comments3 min readLW link

[Question] How easy/​fast is it for a AGI to hack com­put­ers/​a hu­man brain?

Noosphere89Jun 21, 2022, 12:34 AM
0 points
1 comment1 min readLW link

A Toy Model of Gra­di­ent Hacking

Oam PatelJun 20, 2022, 10:01 PM
25 points
7 comments4 min readLW link

De­bat­ing Whether AI is Con­scious Is A Dis­trac­tion from Real Problems

sidhe_theyJun 21, 2022, 4:56 PM
4 points
10 comments1 min readLW link
(techpolicy.press)

The in­or­di­nately slow spread of good AGI con­ver­sa­tions in ML

Rob BensingerJun 21, 2022, 4:09 PM
160 points
66 comments8 min readLW link

[Question] What is the differ­ence be­tween AI mis­al­ign­ment and bad pro­gram­ming?

puzzleGuzzleJun 21, 2022, 9:52 PM
6 points
2 comments1 min readLW link

Se­cu­rity Mind­set: Les­sons from 20+ years of Soft­ware Se­cu­rity Failures Rele­vant to AGI Alignment

elspoodJun 21, 2022, 11:55 PM
331 points
40 comments7 min readLW link

A Quick List of Some Prob­lems in AI Align­ment As A Field

Nicholas / Heather KrossJun 21, 2022, 11:23 PM
74 points
12 comments6 min readLW link
(www.thinkingmuchbetter.com)

Con­fu­sion about neu­ro­science/​cog­ni­tive sci­ence as a dan­ger for AI Alignment

Samuel NellessenJun 22, 2022, 5:59 PM
2 points
1 comment3 min readLW link
(snellessen.com)

Air Con­di­tioner Test Re­sults & Discussion

johnswentworthJun 22, 2022, 10:26 PM
80 points
38 comments6 min readLW link

Loose thoughts on AGI risk

YitzJun 23, 2022, 1:02 AM
7 points
3 comments1 min readLW link

[Question] What’s the con­tin­gency plan if we get AGI to­mor­row?

YitzJun 23, 2022, 3:10 AM
61 points
24 comments1 min readLW link

[Question] What are the best “policy” ap­proaches in wor­lds where al­ign­ment is difficult?

LHAJun 23, 2022, 1:53 AM
1 point
0 comments1 min readLW link

[Question] Is CIRL a promis­ing agenda?

Chris_LeongJun 23, 2022, 5:12 PM
25 points
12 comments1 min readLW link

Half-baked AI Safety ideas thread

Aryeh EnglanderJun 23, 2022, 4:11 PM
58 points
60 comments1 min readLW link

20 Cri­tiques of AI Safety That I Found on Twitter

dkirmaniJun 23, 2022, 7:23 PM
21 points
16 comments1 min readLW link

Linkpost: Robin Han­son—Why Not Wait On AI Risk?

Yair HalberstadtJun 24, 2022, 2:23 PM
41 points
14 comments1 min readLW link
(www.overcomingbias.com)

Raphaël Millière on Gen­er­al­iza­tion and Scal­ing Maximalism

Michaël TrazziJun 24, 2022, 6:18 PM
21 points
2 comments4 min readLW link
(theinsideview.ai)

[Question] Do al­ign­ment con­cerns ex­tend to pow­er­ful non-AI agents?

OzyrusJun 24, 2022, 6:26 PM
21 points
13 comments1 min readLW link

Depen­den­cies for AGI pessimism

YitzJun 24, 2022, 10:25 PM
6 points
4 comments1 min readLW link

What if the best path for a per­son who wants to work on AGI al­ign­ment is to join Face­book or Google?

dbaschJun 24, 2022, 9:23 PM
2 points
3 comments1 min readLW link

[Link] Ad­ver­sar­i­ally trained neu­ral rep­re­sen­ta­tions may already be as ro­bust as cor­re­spond­ing biolog­i­cal neu­ral representations

Gunnar_ZarnckeJun 24, 2022, 8:51 PM
35 points
9 comments1 min readLW link

AI-Writ­ten Cri­tiques Help Hu­mans No­tice Flaws

paulfchristianoJun 25, 2022, 5:22 PM
133 points
5 comments3 min readLW link
(openai.com)

[LQ] Some Thoughts on Mes­sag­ing Around AI Risk

DragonGodJun 25, 2022, 1:53 PM
5 points
3 comments6 min readLW link

[Question] Should any hu­man en­slave an AGI sys­tem?

AlignmentMirrorJun 25, 2022, 7:35 PM
−13 points
44 comments1 min readLW link

The Ba­sics of AGI Policy (Flowchart)

trevorJun 26, 2022, 2:01 AM
18 points
8 comments2 min readLW link

Slow mo­tion videos as AI risk in­tu­ition pumps

Andrew_CritchJun 14, 2022, 7:31 PM
209 points
36 comments2 min readLW link

Robin Han­son asks “Why Not Wait On AI Risk?”

Gunnar_ZarnckeJun 26, 2022, 11:32 PM
22 points
4 comments1 min readLW link
(www.overcomingbias.com)

Epistemic mod­esty and how I think about AI risk

Aryeh EnglanderJun 27, 2022, 6:47 PM
22 points
4 comments4 min readLW link

An­nounc­ing the In­verse Scal­ing Prize ($250k Prize Pool)

Jun 27, 2022, 3:58 PM
166 points
14 comments7 min readLW link

Scott Aaron­son and Steven Pinker De­bate AI Scaling

LironJun 28, 2022, 4:04 PM
37 points
10 comments1 min readLW link
(scottaaronson.blog)

Four rea­sons I find AI safety emo­tion­ally compelling

Jun 28, 2022, 2:10 PM
38 points
3 comments4 min readLW link

Some al­ter­na­tive AI safety re­search projects

Michele CampoloJun 28, 2022, 2:09 PM
9 points
0 comments3 min readLW link

Assessing AlephAlpha's Multimodal Model

p.b.Jun 28, 2022, 9:28 AM
30 points
5 comments3 min readLW link

Kurzge­sagt – The Last Hu­man (Youtube)

habrykaJun 29, 2022, 3:28 AM
54 points
7 comments1 min readLW link
(www.youtube.com)

Can We Align AI by Hav­ing It Learn Hu­man Prefer­ences? I’m Scared (sum­mary of last third of Hu­man Com­pat­i­ble)

apollonianbluesJun 29, 2022, 4:09 AM
19 points
3 comments6 min readLW link

Look­ing back on my al­ign­ment PhD

TurnTroutJul 1, 2022, 3:19 AM
287 points
60 comments11 min readLW link

Will Ca­pa­bil­ities Gen­er­al­ise More?

Ramana KumarJun 29, 2022, 5:12 PM
109 points
38 comments4 min readLW link

Gra­di­ent hack­ing: defi­ni­tions and examples

Richard_NgoJun 29, 2022, 9:35 PM
24 points
1 comment5 min readLW link

[Question] Cor­rect­ing hu­man er­ror vs do­ing ex­actly what you’re told—is there liter­a­ture on this in con­text of gen­eral sys­tem de­sign?

Jan CzechowskiJun 29, 2022, 9:30 PM
6 points
0 comments1 min readLW link

Most Func­tions Have Un­de­sir­able Global Extrema

En KepeigJun 30, 2022, 5:10 PM
8 points
5 comments3 min readLW link

$500 bounty for al­ign­ment con­test ideas

Orpheus16Jun 30, 2022, 1:56 AM
29 points
5 comments2 min readLW link

Quick sur­vey on AI al­ign­ment resources

frances_lorenzJun 30, 2022, 7:09 PM
14 points
0 comments1 min readLW link

[Linkpost] Solv­ing Quan­ti­ta­tive Rea­son­ing Prob­lems with Lan­guage Models

YitzJun 30, 2022, 6:58 PM
76 points
15 comments2 min readLW link
(storage.googleapis.com)

GPT-3 Catch­ing Fish in Morse Code

Megan KinnimentJun 30, 2022, 9:22 PM
110 points
27 comments8 min readLW link

Selec­tion pro­cesses for subagents

Ryan KiddJun 30, 2022, 11:57 PM
33 points
2 comments9 min readLW link

AI safety uni­ver­sity groups: a promis­ing op­por­tu­nity to re­duce ex­is­ten­tial risk

micJul 1, 2022, 3:59 AM
13 points
0 comments11 min readLW link

Safetywashing

Adam SchollJul 1, 2022, 11:56 AM
212 points
17 comments1 min readLW link

[Question] AGI al­ign­ment with what?

AlignmentMirrorJul 1, 2022, 10:22 AM
6 points
10 comments1 min readLW link

What Is The True Name of Mo­du­lar­ity?

Jul 1, 2022, 2:55 PM
21 points
10 comments12 min readLW link

AXRP Epi­sode 16 - Prepar­ing for De­bate AI with Ge­offrey Irving

DanielFilanJul 1, 2022, 10:20 PM
14 points
0 comments37 min readLW link

Agenty AGI – How Tempt­ing?

PeterMcCluskeyJul 1, 2022, 11:40 PM
21 points
3 comments5 min readLW link
(www.bayesianinvestor.com)

[Linkpost] Ex­is­ten­tial Risk Anal­y­sis in Em­piri­cal Re­search Papers

Dan HJul 2, 2022, 12:09 AM
40 points
0 comments1 min readLW link
(arxiv.org)

Minerva

AlgonJul 1, 2022, 8:06 PM
35 points
6 comments2 min readLW link
(ai.googleblog.com)

Could an AI Align­ment Sand­box be use­ful?

Michael SoareverixJul 2, 2022, 5:06 AM
2 points
1 comment1 min readLW link

Goal-di­rect­ed­ness: tack­ling complexity

Morgan_RogersJul 2, 2022, 1:51 PM
8 points
0 comments38 min readLW link

[Question] Which one of these two aca­demic routes should I take to end up in AI Safety?

Martín SotoJul 3, 2022, 1:05 AM
5 points
2 comments1 min readLW link

Won­der and The Golden AI Rule

JeffreyKJul 3, 2022, 6:21 PM
0 points
4 comments6 min readLW link

De­ci­sion the­ory and dy­namic inconsistency

paulfchristianoJul 3, 2022, 10:20 PM
66 points
33 comments10 min readLW link
(sideways-view.com)

AI Fore­cast­ing: One Year In

jsteinhardtJul 4, 2022, 5:10 AM
131 points
12 comments6 min readLW link
(bounded-regret.ghost.io)

Re­mak­ing Effi­cien­tZero (as best I can)

HoagyJul 4, 2022, 11:03 AM
34 points
9 comments22 min readLW link

Please help us com­mu­ni­cate AI xrisk. It could save the world.

otto.bartenJul 4, 2022, 9:47 PM
4 points
7 comments2 min readLW link

Bench­mark for suc­cess­ful con­cept ex­trap­o­la­tion/​avoid­ing goal misgeneralization

Stuart_ArmstrongJul 4, 2022, 8:48 PM
80 points
12 comments4 min readLW link

An­thropic’s SoLU (Soft­max Lin­ear Unit)

Joel BurgetJul 4, 2022, 6:38 PM
15 points
1 comment4 min readLW link
(transformer-circuits.pub)

[AN #172] Sorry for the long hi­a­tus!

Rohin ShahJul 5, 2022, 6:20 AM
54 points
0 comments3 min readLW link
(mailchi.mp)

Prin­ci­ples for Align­ment/​Agency Projects

johnswentworthJul 7, 2022, 2:07 AM
115 points
20 comments4 min readLW link

Race Along Rashomon Ridge

Jul 7, 2022, 3:20 AM
49 points
15 comments8 min readLW link

Con­fu­sions in My Model of AI Risk

peterbarnettJul 7, 2022, 1:05 AM
21 points
9 comments5 min readLW link

Safety con­sid­er­a­tions for on­line gen­er­a­tive modeling

Sam MarksJul 7, 2022, 6:31 PM
41 points
9 comments14 min readLW link

Re­in­force­ment Learner Wireheading

Nate ShowellJul 8, 2022, 5:32 AM
8 points
2 comments4 min readLW link

MATS Models

johnswentworthJul 9, 2022, 12:14 AM
84 points
5 comments16 min readLW link

Train first VS prune first in neu­ral net­works.

Donald HobsonJul 9, 2022, 3:53 PM
20 points
5 comments2 min readLW link

Re­search Notes: What are we al­ign­ing for?

Shoshannah TekofskyJul 8, 2022, 10:13 PM
19 points
8 comments2 min readLW link

Re­port from a civ­i­liza­tional ob­server on Earth

owencbJul 9, 2022, 5:26 PM
49 points
12 comments6 min readLW link

Vi­su­al­iz­ing Neu­ral net­works, how to blame the bias

Donald HobsonJul 9, 2022, 3:52 PM
7 points
1 comment6 min readLW link

Com­ment on “Propo­si­tions Con­cern­ing Digi­tal Minds and So­ciety”

Zack_M_DavisJul 10, 2022, 5:48 AM
95 points
12 comments8 min readLW link

Hes­sian and Basin volume

Vivek HebbarJul 10, 2022, 6:59 AM
33 points
9 comments4 min readLW link

Check­sum Sen­sor Alignment

lsusrJul 11, 2022, 3:31 AM
12 points
2 comments1 min readLW link

The Align­ment Problem

lsusrJul 11, 2022, 3:03 AM
45 points
20 comments3 min readLW link

[Question] How do AI timelines af­fect how you live your life?

Quadratic ReciprocityJul 11, 2022, 1:54 PM
77 points
47 comments1 min readLW link

Three Min­i­mum Pivotal Acts Pos­si­ble by Nar­row AI

Michael SoareverixJul 12, 2022, 9:51 AM
0 points
4 comments2 min readLW link

On how var­i­ous plans miss the hard bits of the al­ign­ment challenge

So8resJul 12, 2022, 2:49 AM
258 points
81 comments29 min readLW link

[Question] What is wrong with this ap­proach to cor­rigi­bil­ity?

Rafael CosmanJul 12, 2022, 10:55 PM
7 points
8 comments1 min readLW link

MIRI Con­ver­sa­tions: Tech­nol­ogy Fore­cast­ing & Grad­u­al­ism (Distil­la­tion)

CallumMcDougallJul 13, 2022, 3:55 PM
31 points
1 comment20 min readLW link

[Question] Which AI Safety re­search agen­das are the most promis­ing?

Chris_LeongJul 13, 2022, 7:54 AM
27 points
6 comments1 min readLW link

Deep learn­ing cur­ricu­lum for large lan­guage model alignment

Jacob_HiltonJul 13, 2022, 9:58 PM
53 points
3 comments1 min readLW link
(github.com)

Ar­tifi­cial Sand­wich­ing: When can we test scal­able al­ign­ment pro­to­cols with­out hu­mans?

Sam BowmanJul 13, 2022, 9:14 PM
40 points
6 comments5 min readLW link

[Question] How to im­press stu­dents with re­cent ad­vances in ML?

Charbel-RaphaëlJul 14, 2022, 12:03 AM
12 points
2 comments1 min readLW link

Cir­cum­vent­ing in­ter­pretabil­ity: How to defeat mind-readers

Lee SharkeyJul 14, 2022, 4:59 PM
94 points
8 comments36 min readLW link

Mus­ings on the Hu­man Ob­jec­tive Function

Michael SoareverixJul 15, 2022, 7:13 AM
3 points
0 comments3 min readLW link

Peter Singer’s first pub­lished piece on AI

FaiJul 15, 2022, 6:18 AM
20 points
5 comments1 min readLW link
(link.springer.com)

Notes on Learn­ing the Prior

carboniferous_umbraculum Jul 15, 2022, 5:28 PM
21 points
2 comments25 min readLW link

Pro­posed Orthog­o­nal­ity Th­e­ses #2-5

rjbgJul 14, 2022, 10:59 PM
6 points
0 comments2 min readLW link

A story about a du­plic­i­tous API

LiLiLiJul 15, 2022, 6:26 PM
2 points
0 comments1 min readLW link

Safety Im­pli­ca­tions of LeCun’s path to ma­chine intelligence

Ivan VendrovJul 15, 2022, 9:47 PM
89 points
16 comments6 min readLW link

QNR Prospects

PeterMcCluskeyJul 16, 2022, 2:03 AM
38 points
3 comments8 min readLW link
(www.bayesianinvestor.com)

All AGI safety ques­tions wel­come (es­pe­cially ba­sic ones) [July 2022]

Jul 16, 2022, 12:57 PM
84 points
130 comments3 min readLW link

Align­ment as Game Design

Shoshannah TekofskyJul 16, 2022, 10:36 PM
11 points
7 comments2 min readLW link

Why I Think Abrupt AI Takeoff

lincolnquirkJul 17, 2022, 5:04 PM
14 points
6 comments1 min readLW link

Why you might ex­pect ho­mo­ge­neous take-off: ev­i­dence from ML research

Andrei AlexandruJul 17, 2022, 8:31 PM
24 points
0 comments10 min readLW link

What should you change in re­sponse to an “emer­gency”? And AI risk

AnnaSalamonJul 18, 2022, 1:11 AM
303 points
60 comments6 min readLW link

Quan­tiliz­ers and Gen­er­a­tive Models

Adam JermynJul 18, 2022, 4:32 PM
24 points
5 comments4 min readLW link

Train­ing goals for large lan­guage models

Johannes TreutleinJul 18, 2022, 7:09 AM
26 points
5 comments19 min readLW link

Ma­chine Learn­ing Model Sizes and the Pa­ram­e­ter Gap [abridged]

Pablo VillalobosJul 18, 2022, 4:51 PM
20 points
0 comments1 min readLW link
(epochai.org)

Without spe­cific coun­ter­mea­sures, the eas­iest path to trans­for­ma­tive AI likely leads to AI takeover

Ajeya CotraJul 18, 2022, 7:06 PM
310 points
89 comments84 min readLW link

At what point will we know if Eliezer’s pre­dic­tions are right or wrong?

anonymous123456Jul 18, 2022, 10:06 PM
5 points
6 comments1 min readLW link

A daily rou­tine I do for my AI safety re­search work

scasperJul 19, 2022, 9:58 PM
15 points
7 comments1 min readLW link

Pit­falls with Proofs

scasperJul 19, 2022, 10:21 PM
19 points
21 comments8 min readLW link

Which sin­gu­lar­ity schools plus the no sin­gu­lar­ity school was right?

Noosphere89Jul 23, 2022, 3:16 PM
9 points
27 comments9 min readLW link

Defin­ing Op­ti­miza­tion in a Deeper Way Part 3

J BostockJul 20, 2022, 10:06 PM
8 points
0 comments2 min readLW link

[AN #173] Re­cent lan­guage model re­sults from DeepMind

Rohin ShahJul 21, 2022, 2:30 AM
37 points
9 comments8 min readLW link
(mailchi.mp)

[Question] How much to op­ti­mize for the short-timelines sce­nario?

SoerenMindJul 21, 2022, 10:47 AM
19 points
3 comments1 min readLW link

Mak­ing DALL-E Count

DirectedEvolutionJul 22, 2022, 9:11 AM
23 points
12 comments4 min readLW link

Con­di­tion­ing Gen­er­a­tive Models with Restrictions

Adam JermynJul 21, 2022, 8:33 PM
16 points
4 comments8 min readLW link

Gen­eral al­ign­ment properties

TurnTroutAug 8, 2022, 11:40 PM
46 points
2 comments1 min readLW link

Which val­ues are sta­ble un­der on­tol­ogy shifts?

Richard_NgoJul 23, 2022, 2:40 AM
68 points
47 comments3 min readLW link
(thinkingcomplete.blogspot.com)

Try­ing out Prompt Eng­ineer­ing on TruthfulQA

Megan KinnimentJul 23, 2022, 2:04 AM
10 points
0 comments8 min readLW link

Sym­bolic dis­til­la­tion, Diffu­sion, En­tropy, Repli­ca­tors, Agents, oh my (a mid-low qual­ity think­ing out loud post)

the gears to ascensionJul 23, 2022, 9:13 PM
2 points
2 comments6 min readLW link

Eaves­drop­ping on Aliens: A Data De­cod­ing Challenge

anonymousaisafetyJul 24, 2022, 4:35 AM
44 points
9 comments4 min readLW link

How much should we worry about mesa-op­ti­miza­tion challenges?

sudoJul 25, 2022, 3:56 AM
4 points
13 comments2 min readLW link

[Question] Does agent foun­da­tions cover all fu­ture ML sys­tems?

Jonas HallgrenJul 25, 2022, 1:17 AM
2 points
0 comments1 min readLW link

[Question] How op­ti­mistic should we be about AI figur­ing out how to in­ter­pret it­self?

oh54321Jul 25, 2022, 10:09 PM
3 points
1 comment1 min readLW link

Ac­tive In­fer­ence as a for­mal­i­sa­tion of in­stru­men­tal convergence

Roman LeventovJul 26, 2022, 5:55 PM
6 points
2 comments3 min readLW link
(direct.mit.edu)

«Boundaries» Se­quence (In­dex Post)

Andrew_CritchJul 26, 2022, 7:12 PM
23 points
1 comment1 min readLW link

Mo­ral strate­gies at differ­ent ca­pa­bil­ity levels

Richard_NgoJul 27, 2022, 6:50 PM
95 points
14 comments5 min readLW link
(thinkingcomplete.blogspot.com)

Prin­ci­ples of Pri­vacy for Align­ment Research

johnswentworthJul 27, 2022, 7:53 PM
68 points
30 comments7 min readLW link

Seek­ing beta read­ers who are ig­no­rant of biol­ogy but knowl­edge­able about AI safety

Holly_ElmoreJul 27, 2022, 11:02 PM
10 points
6 comments1 min readLW link

Defin­ing Op­ti­miza­tion in a Deeper Way Part 4

J BostockJul 28, 2022, 5:02 PM
7 points
0 comments5 min readLW link

An­nounc­ing the AI Safety Field Build­ing Hub, a new effort to provide AISFB pro­jects, men­tor­ship, and funding

Vael GatesJul 28, 2022, 9:29 PM
49 points
3 comments6 min readLW link

Distil­la­tion Con­test—Re­sults and Recap

ArisJul 29, 2022, 5:40 PM
33 points
0 comments7 min readLW link

Ab­stract­ing The Hard­ness of Align­ment: Un­bounded Atomic Optimization

adamShimiJul 29, 2022, 6:59 PM
62 points
3 comments16 min readLW link

How trans­parency changed over time

ViktoriaMalyasovaJul 30, 2022, 4:36 AM
21 points
0 comments6 min readLW link

Trans­lat­ing be­tween La­tent Spaces

Jul 30, 2022, 3:25 AM
20 points
1 comment8 min readLW link

AGI-level reasoner will appear sooner than an agent; what humanity will do with this reasoner is critical

Roman LeventovJul 30, 2022, 8:56 PM
24 points
10 comments1 min readLW link

chin­chilla’s wild implications

nostalgebraistJul 31, 2022, 1:18 AM
366 points
114 comments11 min readLW link

Tech­ni­cal AI Align­ment Study Group

Eric KAug 1, 2022, 6:33 PM
5 points
0 comments1 min readLW link

[Question] Which in­tro-to-AI-risk text would you recom­mend to...

SherrinfordAug 1, 2022, 9:36 AM
12 points
1 comment1 min readLW link

Two-year up­date on my per­sonal AI timelines

Ajeya CotraAug 2, 2022, 11:07 PM
287 points
60 comments16 min readLW link

What are the Red Flags for Neu­ral Net­work Suffer­ing? - Seeds of Science call for reviewers

rogersbaconAug 2, 2022, 10:37 PM
24 points
5 comments1 min readLW link

Pre­cur­sor check­ing for de­cep­tive alignment

evhubAug 3, 2022, 10:56 PM
18 points
0 comments14 min readLW link

Sur­vey: What (de)mo­ti­vates you about AI risk?

Daniel_FriedrichAug 3, 2022, 7:17 PM
1 point
0 comments1 min readLW link
(forms.gle)

High Reli­a­bil­ity Orgs, and AI Companies

RaemonAug 4, 2022, 5:45 AM
73 points
6 comments12 min readLW link

In­ter­pretabil­ity isn’t Free

Joel BurgetAug 4, 2022, 3:02 PM
10 points
1 comment2 min readLW link

[Question] AI al­ign­ment: Would a lazy self-preser­va­tion in­stinct be suffi­cient?

BrainFrogAug 4, 2022, 5:53 PM
−1 points
4 comments1 min readLW link

[Question] What drives progress, the­ory or ap­pli­ca­tion?

lberglundAug 5, 2022, 1:14 AM
5 points
1 comment1 min readLW link

The Prag­mas­cope Idea

johnswentworthAug 4, 2022, 9:52 PM
55 points
19 comments3 min readLW link

$20K In Boun­ties for AI Safety Public Materials

Aug 5, 2022, 2:52 AM
68 points
7 comments6 min readLW link

Rant on Prob­lem Fac­tor­iza­tion for Alignment

johnswentworthAug 5, 2022, 7:23 PM
73 points
48 comments6 min readLW link

An­nounc­ing the In­tro­duc­tion to ML Safety course

Aug 6, 2022, 2:46 AM
69 points
6 comments7 min readLW link

Why I Am Skep­ti­cal of AI Reg­u­la­tion as an X-Risk Miti­ga­tion Strategy

A RayAug 6, 2022, 5:46 AM
31 points
14 comments2 min readLW link

My ad­vice on find­ing your own path

A RayAug 6, 2022, 4:57 AM
34 points
3 comments3 min readLW link

A De­cep­tively Sim­ple Ar­gu­ment in fa­vor of Prob­lem Factorization

Logan ZoellnerAug 6, 2022, 5:32 PM
3 points
4 comments1 min readLW link

[Question] Can we get full au­dio for Eliezer’s con­ver­sa­tion with Sam Har­ris?

JakubKAug 7, 2022, 8:35 PM
30 points
8 comments1 min readLW link

How Deadly Will Roughly-Hu­man-Level AGI Be?

David UdellAug 8, 2022, 1:59 AM
12 points
6 comments1 min readLW link

Broad Bas­ins and Data Compression

Aug 8, 2022, 8:33 PM
29 points
6 comments7 min readLW link

En­cul­tured AI Pre-plan­ning, Part 1: En­abling New Benchmarks

Aug 8, 2022, 10:44 PM
62 points
2 comments6 min readLW link

En­cul­tured AI, Part 1 Ap­pendix: Rele­vant Re­search Examples

Aug 8, 2022, 10:44 PM
11 points
1 comment7 min readLW link

Disagree­ments about Align­ment: Why, and how, we should try to solve them

ojorgensenAug 9, 2022, 6:49 PM
8 points
1 comment16 min readLW link

[Question] Many Gods re­fu­ta­tion and In­stru­men­tal Goals. (Proper one)

aditya malikAug 9, 2022, 11:59 AM
0 points
15 comments1 min readLW link

[Question] Is it pos­si­ble to find ven­ture cap­i­tal for AI re­search org with strong safety fo­cus?

AnonResearchAug 9, 2022, 4:12 PM
6 points
1 comment1 min readLW link

Us­ing GPT-3 to aug­ment hu­man intelligence

Henrik KarlssonAug 10, 2022, 3:54 PM
48 points
7 comments18 min readLW link
(escapingflatland.substack.com)

Emer­gent Abil­ities of Large Lan­guage Models [Linkpost]

aogAug 10, 2022, 6:02 PM
25 points
2 comments1 min readLW link
(arxiv.org)

How Do We Align an AGI Without Get­ting So­cially Eng­ineered? (Hint: Box It)

Aug 10, 2022, 6:14 PM
26 points
30 comments11 min readLW link

The al­ign­ment prob­lem from a deep learn­ing perspective

Richard_NgoAug 10, 2022, 10:46 PM
93 points
13 comments27 min readLW link

How much al­ign­ment data will we need in the long run?

Jacob_HiltonAug 10, 2022, 9:39 PM
34 points
15 comments4 min readLW link

Thoughts on the good reg­u­la­tor theorem

JonasMossAug 11, 2022, 12:08 PM
8 points
0 comments4 min readLW link

Lan­guage mod­els seem to be much bet­ter than hu­mans at next-to­ken prediction

Aug 11, 2022, 5:45 PM
164 points
56 comments13 min readLW link

[Question] Se­ri­ously, what goes wrong with “re­ward the agent when it makes you smile”?

TurnTroutAug 11, 2022, 10:22 PM
76 points
41 comments2 min readLW link

Dis­sected boxed AI

Nathan1123Aug 12, 2022, 2:37 AM
−8 points
2 comments1 min readLW link

Steelmin­ing via Analogy

Paul BricmanAug 13, 2022, 9:59 AM
24 points
0 comments2 min readLW link
(paulbricman.com)

Refin­ing the Sharp Left Turn threat model, part 1: claims and mechanisms

Aug 12, 2022, 3:17 PM
71 points
3 comments3 min readLW link
(vkrakovna.wordpress.com)

Over­sight Misses 100% of Thoughts The AI Does Not Think

johnswentworthAug 12, 2022, 4:30 PM
85 points
49 comments1 min readLW link

Timelines ex­pla­na­tion post part 1 of ?

Nathan Helm-BurgerAug 12, 2022, 4:13 PM
10 points
1 comment2 min readLW link

A lit­tle play­ing around with Blen­der­bot3

Nathan Helm-BurgerAug 12, 2022, 4:06 PM
9 points
0 comments1 min readLW link

Deep­Mind al­ign­ment team opinions on AGI ruin arguments

VikaAug 12, 2022, 9:06 PM
364 points
34 comments14 min readLW link

the In­su­lated Goal-Pro­gram idea

Tamsin LeakeAug 13, 2022, 9:57 AM
39 points
3 comments2 min readLW link
(carado.moe)

goal-pro­gram bricks

Tamsin LeakeAug 13, 2022, 10:08 AM
27 points
2 comments2 min readLW link
(carado.moe)

How I think about alignment

Linda LinseforsAug 13, 2022, 10:01 AM
30 points
11 comments5 min readLW link

Refine’s First Blog Post Day

adamShimiAug 13, 2022, 10:23 AM
55 points
3 comments1 min readLW link

Shapes of Mind and Plu­ral­ism in Alignment

adamShimiAug 13, 2022, 10:01 AM
30 points
1 comment2 min readLW link

An ex­tended rocket al­ign­ment analogy

rememberAug 13, 2022, 6:22 PM
25 points
3 comments4 min readLW link

Cul­ti­vat­ing Valiance

Shoshannah TekofskyAug 13, 2022, 6:47 PM
35 points
4 comments4 min readLW link

Evolu­tion is a bad anal­ogy for AGI: in­ner alignment

Quintin PopeAug 13, 2022, 10:15 PM
52 points
6 comments8 min readLW link

A brief note on Sim­plic­ity Bias

carboniferous_umbraculum Aug 14, 2022, 2:05 AM
16 points
0 comments4 min readLW link

Seek­ing In­terns/​RAs for Mechanis­tic In­ter­pretabil­ity Projects

Neel NandaAug 15, 2022, 7:11 AM
61 points
0 comments2 min readLW link

Ex­treme Security

lcAug 15, 2022, 12:11 PM
39 points
4 comments5 min readLW link

On Prefer­ence Ma­nipu­la­tion in Re­ward Learn­ing Processes

Felix HofstätterAug 15, 2022, 7:32 PM
8 points
0 comments4 min readLW link

Limits of Ask­ing ELK if Models are Deceptive

Oam PatelAug 15, 2022, 8:44 PM
6 points
2 comments4 min readLW link

What Makes an Idea Un­der­stand­able? On Ar­chi­tec­turally and Cul­turally Nat­u­ral Ideas.

Aug 16, 2022, 2:09 AM
17 points
2 comments16 min readLW link

De­cep­tion as the op­ti­mal: mesa-op­ti­miz­ers and in­ner al­ign­ment

Eleni AngelouAug 16, 2022, 4:49 AM
10 points
0 comments5 min readLW link

Un­der­stand­ing differ­ences be­tween hu­mans and in­tel­li­gence-in-gen­eral to build safe AGI

Florian_DietzAug 16, 2022, 8:27 AM
7 points
8 comments1 min readLW link

Au­ton­omy as tak­ing re­spon­si­bil­ity for refer­ence maintenance

Ramana KumarAug 17, 2022, 12:50 PM
52 points
3 comments5 min readLW link

Thoughts on ‘List of Lethal­ities’

Alex Lawsen Aug 17, 2022, 6:33 PM
25 points
0 comments10 min readLW link

Hu­man Mimicry Mainly Works When We’re Already Close

johnswentworthAug 17, 2022, 6:41 PM
68 points
16 comments5 min readLW link

The Core of the Align­ment Prob­lem is...

Aug 17, 2022, 8:07 PM
58 points
10 comments9 min readLW link

Con­crete Ad­vice for Form­ing In­side Views on AI Safety

Neel NandaAug 17, 2022, 10:02 PM
18 points
6 comments10 min readLW link

An­nounc­ing En­cul­tured AI: Build­ing a Video Game

Aug 18, 2022, 2:16 AM
103 points
26 comments4 min readLW link

An­nounc­ing the Distil­la­tion for Align­ment Practicum (DAP)

Aug 18, 2022, 7:50 PM
21 points
3 comments3 min readLW link

Align­ment’s phlo­gis­ton

Eleni AngelouAug 18, 2022, 10:27 PM
10 points
2 comments2 min readLW link

[Question] Are lan­guage mod­els close to the su­per­hu­man level in philos­o­phy?

Roman LeventovAug 19, 2022, 4:43 AM
5 points
2 comments2 min readLW link

How to do the­o­ret­i­cal re­search, a per­sonal perspective

Mark XuAug 19, 2022, 7:41 PM
84 points
4 comments15 min readLW link

Refine’s Se­cond Blog Post Day

adamShimiAug 20, 2022, 1:01 PM
19 points
0 comments1 min readLW link

No One-Size-Fit-All Epistemic Strategy

adamShimiAug 20, 2022, 12:56 PM
23 points
1 comment2 min readLW link

Re­duc­ing Good­hart: An­nounce­ment, Ex­ec­u­tive Summary

Charlie SteinerAug 20, 2022, 9:49 AM
14 points
0 comments1 min readLW link

Pivotal acts us­ing an un­al­igned AGI?

Simon FischerAug 21, 2022, 5:13 PM
26 points
3 comments8 min readLW link

Beyond Hyperanthropomorphism

PointlessOneAug 21, 2022, 5:55 PM
3 points
17 comments1 min readLW link
(studio.ribbonfarm.com)

AXRP Epi­sode 17 - Train­ing for Very High Reli­a­bil­ity with Daniel Ziegler

DanielFilanAug 21, 2022, 11:50 PM
16 points
0 comments34 min readLW link

[Question] What if we solve AI Safety but no one cares

142857Aug 22, 2022, 5:38 AM
18 points
5 comments1 min readLW link

Find­ing Goals in the World Model

Aug 22, 2022, 6:06 PM
55 points
8 comments13 min readLW link

[Question] AI Box Ex­per­i­ment: Are peo­ple still in­ter­ested?

DoubleAug 31, 2022, 3:04 AM
31 points
13 comments1 min readLW link

Stable Diffu­sion has been released

P.Aug 22, 2022, 7:42 PM
15 points
7 comments1 min readLW link
(stability.ai)

Dis­cus­sion on uti­liz­ing AI for alignment

eliflandAug 23, 2022, 2:36 AM
16 points
3 comments1 min readLW link
(www.foxy-scout.com)

It Looks Like You’re Try­ing To Take Over The Narrative

George3d6Aug 24, 2022, 1:36 PM
2 points
20 comments9 min readLW link
(www.epistem.ink)

Thoughts about OOD alignment

CatneeAug 24, 2022, 3:31 PM
11 points
10 comments2 min readLW link

Vingean Agency

abramdemskiAug 24, 2022, 8:08 PM
57 points
13 comments3 min readLW link

In­ter­species diplo­macy as a po­ten­tially pro­duc­tive lens on AGI alignment

Shariq HashmeAug 24, 2022, 5:59 PM
5 points
1 comment2 min readLW link

OpenAI’s Align­ment Plans

dkirmaniAug 24, 2022, 7:39 PM
60 points
17 comments5 min readLW link
(openai.com)

What Makes A Good Mea­sure­ment De­vice?

johnswentworthAug 24, 2022, 10:45 PM
35 points
7 comments2 min readLW link

Eval­u­at­ing OpenAI’s al­ign­ment plans us­ing train­ing stories

ojorgensenAug 25, 2022, 4:12 PM
3 points
0 comments5 min readLW link

A Test for Lan­guage Model Consciousness

Ethan PerezAug 25, 2022, 7:41 PM
18 points
14 comments10 min readLW link

Seek­ing Stu­dent Sub­mis­sions: Edit Your Source Code Contest

ArisAug 26, 2022, 2:08 AM
28 points
5 comments2 min readLW link

Basin broad­ness de­pends on the size and num­ber of or­thog­o­nal features

Aug 27, 2022, 5:29 PM
34 points
21 comments6 min readLW link

Suffi­ciently many Godzillas as an al­ign­ment strategy

142857Aug 28, 2022, 12:08 AM
8 points
3 comments1 min readLW link

Ar­tifi­cial Mo­ral Ad­vi­sors: A New Per­spec­tive from Mo­ral Psychology

David GrossAug 28, 2022, 4:37 PM
25 points
1 comment1 min readLW link
(dl.acm.org)

First thing AI will do when it takes over is get fis­sion going

visiaxAug 28, 2022, 5:56 AM
−2 points
0 comments1 min readLW link

Robert Long On Why Ar­tifi­cial Sen­tience Might Matter

Michaël TrazziAug 28, 2022, 5:30 PM
26 points
5 comments5 min readLW link
(theinsideview.ai)

How Do AI Timelines Affect Ex­is­ten­tial Risk?

Stephen McAleeseAug 29, 2022, 4:57 PM
7 points
9 comments23 min readLW link

[Question] What is the best cri­tique of AI ex­is­ten­tial risk ar­gu­ments?

joshcAug 30, 2022, 2:18 AM
5 points
10 comments1 min readLW link

Can We Align a Self-Im­prov­ing AGI?

Peter S. ParkAug 30, 2022, 12:14 AM
8 points
5 comments11 min readLW link

LessWrong’s pre­dic­tion on apoc­a­lypse due to AGI (Aug 2022)

LetUsTalkAug 29, 2022, 6:46 PM
7 points
13 comments1 min readLW link

[Question] How can I reconcile the two most likely requirements for humanity's near-term survival?

Erlja Jkdf.Aug 29, 2022, 6:46 PM
1 point
6 comments1 min readLW link

How likely is de­cep­tive al­ign­ment?

evhubAug 30, 2022, 7:34 PM
72 points
21 comments60 min readLW link

In­ner Align­ment via Superpowers

Aug 30, 2022, 8:01 PM
37 points
13 comments4 min readLW link

Three sce­nar­ios of pseudo-al­ign­ment

Eleni AngelouSep 3, 2022, 12:47 PM
9 points
0 comments3 min readLW link

New 80,000 Hours prob­lem pro­file on ex­is­ten­tial risks from AI

Benjamin HiltonAug 31, 2022, 5:36 PM
28 points
7 comments7 min readLW link
(80000hours.org)

Sur­vey of NLP Re­searchers: NLP is con­tribut­ing to AGI progress; ma­jor catas­tro­phe plausible

Sam BowmanAug 31, 2022, 1:39 AM
89 points
6 comments2 min readLW link

In­fra-Ex­er­cises, Part 1

Sep 1, 2022, 5:06 AM
49 points
9 comments1 min readLW link

Align­ment is hard. Com­mu­ni­cat­ing that, might be harder

Eleni AngelouSep 1, 2022, 4:57 PM
7 points
8 comments3 min readLW link

A Sur­vey of Foun­da­tional Meth­ods in In­verse Re­in­force­ment Learning

adamkSep 1, 2022, 6:21 PM
16 points
0 comments12 min readLW link

AI Safety and Neigh­bor­ing Com­mu­ni­ties: A Quick-Start Guide, as of Sum­mer 2022

Sam BowmanSep 1, 2022, 7:15 PM
74 points
2 comments7 min readLW link

A Richly In­ter­ac­tive AGI Align­ment Chart

lisperatiSep 2, 2022, 12:44 AM
14 points
6 comments1 min readLW link

Re­place­ment for PONR concept

Daniel KokotajloSep 2, 2022, 12:09 AM
44 points
6 comments2 min readLW link

AI co­or­di­na­tion needs clear wins

evhubSep 1, 2022, 11:41 PM
134 points
15 comments2 min readLW link

Simulators

janusSep 2, 2022, 12:45 PM
472 points
103 comments44 min readLW link
(generative.ink)

Laz­i­ness in AI

Richard HenageSep 2, 2022, 5:04 PM
11 points
5 comments1 min readLW link

Agency en­g­ineer­ing: is AI-al­ign­ment “to hu­man in­tent” enough?

catubcSep 2, 2022, 6:14 PM
9 points
10 comments6 min readLW link

Sticky goals: a con­crete ex­per­i­ment for un­der­stand­ing de­cep­tive alignment

evhubSep 2, 2022, 9:57 PM
35 points
13 comments3 min readLW link

[Question] Re­quest for Align­ment Re­search Pro­ject Recommendations

Rauno ArikeSep 3, 2022, 3:29 PM
10 points
2 comments1 min readLW link

Bugs or Fea­tures?

qbolecSep 3, 2022, 7:04 AM
69 points
9 comments2 min readLW link

Pri­vate al­ign­ment re­search shar­ing and coordination

porbySep 4, 2022, 12:01 AM
54 points
10 comments5 min readLW link

AXRP Epi­sode 18 - Con­cept Ex­trap­o­la­tion with Stu­art Armstrong

DanielFilanSep 3, 2022, 11:12 PM
10 points
1 comment39 min readLW link

[Question] Help me find a good Hackathon sub­ject

Charbel-RaphaëlSep 4, 2022, 8:40 AM
6 points
18 comments1 min readLW link

How To Know What the AI Knows—An ELK Distillation

Fabien RogerSep 4, 2022, 12:46 AM
5 points
0 comments5 min readLW link

AI Gover­nance Needs Tech­ni­cal Work

MauSep 5, 2022, 10:28 PM
39 points
1 comment9 min readLW link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil CrawfordSep 6, 2022, 5:17 PM
6 points
0 comments3 min readLW link

pro­gram searches

Tamsin LeakeSep 5, 2022, 8:04 PM
21 points
2 comments2 min readLW link
(carado.moe)

Alex Lawsen On Fore­cast­ing AI Progress

Michaël TrazziSep 6, 2022, 9:32 AM
18 points
0 comments2 min readLW link
(theinsideview.ai)

It’s (not) how you use it

Eleni AngelouSep 7, 2022, 5:15 PM
8 points
1 comment2 min readLW link

AI-as­sisted list of ten con­crete al­ign­ment things to do right now

lemonhopeSep 7, 2022, 8:38 AM
8 points
5 comments4 min readLW link

Progress Re­port 7: mak­ing GPT go hur­rdurr in­stead of brrrrrrr

Nathan Helm-BurgerSep 7, 2022, 3:28 AM
21 points
0 comments4 min readLW link

Is there a list of pro­jects to get started with In­ter­pretabil­ity?

Franziska FischerSep 7, 2022, 4:27 AM
8 points
2 comments1 min readLW link

Un­der­stand­ing and avoid­ing value drift

TurnTroutSep 9, 2022, 4:16 AM
40 points
9 comments6 min readLW link

Linkpost: Github Copi­lot pro­duc­tivity experiment

Daniel KokotajloSep 8, 2022, 4:41 AM
88 points
4 comments1 min readLW link
(github.blog)

Thoughts on AGI con­scious­ness /​ sentience

Steven ByrnesSep 8, 2022, 4:40 PM
37 points
37 comments6 min readLW link

What Should AI Owe To Us? Ac­countable and Aligned AI Sys­tems via Con­trac­tu­al­ist AI Alignment

xuanSep 8, 2022, 3:04 PM
30 points
15 comments25 min readLW link

A rough idea for solv­ing ELK: An ap­proach for train­ing gen­er­al­ist agents like GATO to make plans and de­scribe them to hu­mans clearly and hon­estly.

Michael SoareverixSep 8, 2022, 3:20 PM
2 points
2 comments2 min readLW link

Dath Ilan’s Views on Stop­gap Corrigibility

David UdellSep 22, 2022, 4:16 PM
50 points
17 comments13 min readLW link
(www.glowfic.com)

Most Peo­ple Start With The Same Few Bad Ideas

johnswentworthSep 9, 2022, 12:29 AM
161 points
30 comments3 min readLW link

Over­sight Leagues: The Train­ing Game as a Feature

Paul BricmanSep 9, 2022, 10:08 AM
20 points
6 comments10 min readLW link

AI al­ign­ment with hu­mans… but with which hu­mans?

geoffreymillerSep 9, 2022, 6:21 PM
11 points
33 comments3 min readLW link

Eval­u­a­tions pro­ject @ ARC is hiring a re­searcher and a web­dev/​engineer

Beth BarnesSep 9, 2022, 10:46 PM
94 points
7 comments10 min readLW link

Swap and Scale

Stephen FowlerSep 9, 2022, 10:41 PM
17 points
3 comments1 min readLW link

Alex­aTM − 20 Billion Pa­ram­e­ter Model With Im­pres­sive Performance

MrThinkSep 9, 2022, 9:46 PM
5 points
0 comments1 min readLW link

[Fun][Link] Align­ment SMBC Comic

Gunnar_ZarnckeSep 9, 2022, 9:38 PM
7 points
2 comments1 min readLW link
(www.smbc-comics.com)

Path de­pen­dence in ML in­duc­tive biases

Sep 10, 2022, 1:38 AM
43 points
13 comments10 min readLW link

ethics and an­throp­ics of ho­mo­mor­phi­cally en­crypted computations

Tamsin LeakeSep 9, 2022, 10:49 AM
43 points
49 comments3 min readLW link
(carado.moe)

Join ASAP! (AI Safety Ac­countabil­ity Pro­gramme) 🚀

CallumMcDougallSep 10, 2022, 11:15 AM
19 points
0 comments3 min readLW link

AI Safety field-build­ing pro­jects I’d like to see

Orpheus16Sep 11, 2022, 11:43 PM
44 points
7 comments6 min readLW link

[Question] Why do Peo­ple Think In­tel­li­gence Will be “Easy”?

DragonGodSep 12, 2022, 5:32 PM
15 points
32 comments2 min readLW link

Black Box In­ves­ti­ga­tion Re­search Hackathon

Sep 12, 2022, 7:20 AM
9 points
4 comments2 min readLW link

Ar­gu­ment against 20% GDP growth from AI within 10 years [Linkpost]

aogSep 12, 2022, 4:08 AM
58 points
21 comments5 min readLW link
(twitter.com)

Ide­olog­i­cal In­fer­ence Eng­ines: Mak­ing Deon­tol­ogy Differ­en­tiable*

Paul BricmanSep 12, 2022, 12:00 PM
6 points
0 comments14 min readLW link

Deep Q-Net­works Explained

Jay BaileySep 13, 2022, 12:01 PM
37 points
4 comments22 min readLW link

Git Re-Basin: Merg­ing Models mod­ulo Per­mu­ta­tion Sym­me­tries [Linkpost]

aogSep 14, 2022, 8:55 AM
21 points
0 comments2 min readLW link
(arxiv.org)

Some ideas for epis­tles to the AI ethicists

Charlie SteinerSep 14, 2022, 9:07 AM
19 points
0 comments4 min readLW link

The prob­lem with the me­dia pre­sen­ta­tion of “be­liev­ing in AI”

Roman LeventovSep 14, 2022, 9:05 PM
3 points
0 comments1 min readLW link

When is in­tent al­ign­ment suffi­cient or nec­es­sary to re­duce AGI con­flict?

Sep 14, 2022, 7:39 PM
32 points
0 comments9 min readLW link

When would AGIs en­gage in con­flict?

Sep 14, 2022, 7:38 PM
37 points
3 comments13 min readLW link

Re­spond­ing to ‘Beyond Hyper­an­thro­po­mor­phism’

ukc10014Sep 14, 2022, 8:37 PM
8 points
0 comments16 min readLW link

How should Deep­Mind’s Chin­chilla re­vise our AI fore­casts?

Cleo NardoSep 15, 2022, 5:54 PM
34 points
12 comments13 min readLW link

Ra­tional An­i­ma­tions’ Script Writ­ing Contest

WriterSep 15, 2022, 4:56 PM
22 points
1 comment3 min readLW link

Rep­re­sen­ta­tional Tethers: Ty­ing AI La­tents To Hu­man Ones

Paul BricmanSep 16, 2022, 2:45 PM
30 points
0 comments16 min readLW link

[Question] Why are we sure that AI will “want” some­thing?

ShmiSep 16, 2022, 8:35 PM
31 points
58 comments1 min readLW link

Refine Blog­post Day #3: The short­forms I did write

Alexander Gietelink OldenzielSep 16, 2022, 9:03 PM
23 points
0 comments1 min readLW link

Take­aways from our ro­bust in­jury clas­sifier pro­ject [Red­wood Re­search]

dmzSep 17, 2022, 3:55 AM
135 points
9 comments6 min readLW link

Refine’s Third Blog Post Day/​Week

adamShimiSep 17, 2022, 5:03 PM
18 points
0 comments1 min readLW link

There is no royal road to alignment

Eleni AngelouSep 18, 2022, 3:33 AM
4 points
2 comments3 min readLW link

Prize and fast track to al­ign­ment re­search at ALTER

Vanessa KosoySep 17, 2022, 4:58 PM
65 points
4 comments3 min readLW link

[Question] Updates on FLI’s Value Alignment Map?

T431Sep 17, 2022, 10:27 PM
17 points
4 comments1 min readLW link

Ap­ply for men­tor­ship in AI Safety field-building

Orpheus16Sep 17, 2022, 7:06 PM
9 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Sparse tri­nary weighted RNNs as a path to bet­ter lan­guage model interpretability

Am8ryllisSep 17, 2022, 7:48 PM
19 points
13 comments3 min readLW link

Pod­casts on sur­veys, slower AI, AI ar­gu­ments, etc

KatjaGraceSep 18, 2022, 7:30 AM
13 points
0 comments1 min readLW link
(worldspiritsockpuppet.com)

In­ner al­ign­ment: what are we point­ing at?

lemonhopeSep 18, 2022, 11:09 AM
7 points
2 comments1 min readLW link

The In­ter-Agent Facet of AI Alignment

Michael OesterleSep 18, 2022, 8:39 PM
12 points
1 comment5 min readLW link

Quintin’s al­ign­ment pa­pers roundup—week 2

Quintin PopeSep 19, 2022, 1:41 PM
60 points
2 comments10 min readLW link

Safety timelines: How long will it take to solve al­ign­ment?

Sep 19, 2022, 12:53 PM
35 points
7 comments6 min readLW link
(forum.effectivealtruism.org)

Prize idea: Trans­mit MIRI and Eliezer’s worldviews

eliflandSep 19, 2022, 9:21 PM
45 points
18 comments2 min readLW link

A noob goes to the SERI MATS presentations

Lowell DenningsSep 19, 2022, 5:35 PM
26 points
0 comments5 min readLW link

How to make your CPU as fast as a GPU—Ad­vances in Spar­sity w/​ Nir Shavit

the gears to ascensionSep 20, 2022, 3:48 AM
0 points
0 comments27 min readLW link
(www.youtube.com)

Towards de­con­fus­ing wire­head­ing and re­ward maximization

leogaoSep 21, 2022, 12:36 AM
69 points
7 comments4 min readLW link

Here Be AGI Dragons

Eris DiscordiaSep 21, 2022, 10:28 PM
−2 points
0 comments5 min readLW link

An­nounc­ing AISIC 2022 - the AI Safety Is­rael Con­fer­ence, Oc­to­ber 19-20

DavidmanheimSep 21, 2022, 7:32 PM
13 points
0 comments1 min readLW link

AI Risk In­tro 2: Solv­ing The Problem

Sep 22, 2022, 1:55 PM
13 points
0 comments27 min readLW link

[Question] AI career

ondragonSep 22, 2022, 3:48 AM
2 points
0 comments1 min readLW link

Sha­har Avin On How To Reg­u­late Ad­vanced AI Systems

Michaël TrazziSep 23, 2022, 3:46 PM
31 points
0 comments4 min readLW link
(theinsideview.ai)

The het­ero­gene­ity of hu­man value types: Im­pli­ca­tions for AI alignment

geoffreymillerSep 23, 2022, 5:03 PM
10 points
2 comments10 min readLW link

In­tel­li­gence as a Platform

Robert KennedySep 23, 2022, 5:51 AM
10 points
5 comments3 min readLW link

In­ter­pret­ing Neu­ral Net­works through the Poly­tope Lens

Sep 23, 2022, 5:58 PM
123 points
26 comments33 min readLW link

Un­der what cir­cum­stances have gov­ern­ments can­cel­led AI-type sys­tems?

David GrossSep 23, 2022, 9:11 PM
7 points
1 comment1 min readLW link
(www.carnegieuktrust.org.uk)

[Question] I’m plan­ning to start cre­at­ing more write-ups sum­ma­riz­ing my thoughts on var­i­ous is­sues, mostly re­lated to AI ex­is­ten­tial safety. What do you want to hear my nu­anced takes on?

David Scott Krueger (formerly: capybaralet)Sep 24, 2022, 12:38 PM
9 points
10 comments1 min readLW link

[Question] Why Do AI re­searchers Rate the Prob­a­bil­ity of Doom So Low?

AorouSep 24, 2022, 2:33 AM
7 points
6 comments3 min readLW link

AI coöper­a­tion is more pos­si­ble than you think

423175Sep 24, 2022, 9:26 PM
6 points
0 comments2 min readLW link

An Un­ex­pected GPT-3 De­ci­sion in a Sim­ple Gam­ble

casualphysicsenjoyerSep 25, 2022, 4:46 PM
8 points
4 comments1 min readLW link

Pri­ori­tiz­ing the Arts in re­sponse to AI automation

CaseySep 25, 2022, 2:25 AM
18 points
11 comments2 min readLW link

Plan­ning ca­pac­ity and daemons

lemonhopeSep 26, 2022, 12:15 AM
2 points
0 comments5 min readLW link

Re­call and Re­gur­gi­ta­tion in GPT2

Megan KinnimentOct 3, 2022, 7:35 PM
33 points
1 comment26 min readLW link

[MLSN #5]: Prize Compilation

Dan HSep 26, 2022, 9:55 PM
14 points
1 comment2 min readLW link

Loss of Align­ment is not the High-Order Bit for AI Risk

yieldthoughtSep 26, 2022, 9:16 PM
14 points
20 comments2 min readLW link

In­verse Scal­ing Prize: Round 1 Winners

Sep 26, 2022, 7:57 PM
88 points
16 comments4 min readLW link
(irmckenzie.co.uk)

[Question] Does the ex­is­tence of shared hu­man val­ues im­ply al­ign­ment is “easy”?

MorpheusSep 26, 2022, 6:01 PM
7 points
14 comments1 min readLW link

Why we’re not found­ing a hu­man-data-for-al­ign­ment org

Sep 27, 2022, 8:14 PM
80 points
5 comments29 min readLW link
(forum.effectivealtruism.org)

Be Not Afraid

Alex BeymanSep 27, 2022, 10:04 PM
8 points
0 comments6 min readLW link

Strange Loops—Self-Refer­ence from Num­ber The­ory to AI

ojorgensenSep 28, 2022, 2:10 PM
9 points
5 comments18 min readLW link

AI Safety Endgame Stories

Ivan VendrovSep 28, 2022, 4:58 PM
27 points
11 comments11 min readLW link

Es­ti­mat­ing the Cur­rent and Fu­ture Num­ber of AI Safety Researchers

Stephen McAleeseSep 28, 2022, 9:11 PM
24 points
11 comments9 min readLW link
(forum.effectivealtruism.org)

Clar­ify­ing the Agent-Like Struc­ture Problem

johnswentworthSep 29, 2022, 9:28 PM
53 points
14 comments6 min readLW link

Emer­gency learning

Stuart_ArmstrongJan 28, 2017, 10:05 AM
13 points
10 comments4 min readLW link

EAG DC: Meta-Bot­tle­necks in Prevent­ing AI Doom

Joseph BloomSep 30, 2022, 5:53 PM
5 points
0 comments1 min readLW link

In­ter­est­ing pa­pers: for­mally ver­ify­ing DNNs

the gears to ascensionSep 30, 2022, 8:49 AM
13 points
0 comments3 min readLW link

linkpost: loss basin visualization

Nathan Helm-BurgerSep 30, 2022, 3:42 AM
14 points
1 comment1 min readLW link

Four us­ages of “loss” in AI

TurnTroutOct 2, 2022, 12:52 AM
42 points
18 comments5 min readLW link

An­nounc­ing the AI Safety Nudge Com­pe­ti­tion to Help Beat Procrastination

Marc CarauleanuOct 1, 2022, 1:49 AM
10 points
0 comments1 min readLW link

Google could build a con­scious AI in three months

derek shillerOct 1, 2022, 1:24 PM
9 points
18 comments1 min readLW link

AGI by 2050 prob­a­bil­ity less than 1%

fuminOct 1, 2022, 7:45 PM
−10 points
4 comments9 min readLW link
(docs.google.com)

[Question] Do an­thropic con­sid­er­a­tions un­der­cut the evolu­tion an­chor from the Bio An­chors re­port?

Ege ErdilOct 1, 2022, 8:02 PM
20 points
13 comments2 min readLW link

A re­view of the Bio-An­chors report

jylin04Oct 3, 2022, 10:27 AM
45 points
4 comments1 min readLW link
(docs.google.com)

Data for IRL: What is needed to learn hu­man val­ues?

Jan WehnerOct 3, 2022, 9:23 AM
18 points
6 comments12 min readLW link

my cur­rent out­look on AI risk mitigation

Tamsin LeakeOct 3, 2022, 8:06 PM
58 points
4 comments11 min readLW link
(carado.moe)

No free lunch the­o­rem is irrelevant

CatneeOct 4, 2022, 12:21 AM
12 points
7 comments1 min readLW link

Paper+Sum­mary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius HobbhahnOct 4, 2022, 7:22 AM
44 points
11 comments1 min readLW link
(arxiv.org)

How are you deal­ing with on­tol­ogy iden­ti­fi­ca­tion?

Erik JennerOct 4, 2022, 11:28 PM
33 points
10 comments3 min readLW link

Reflec­tion Mechanisms as an Align­ment tar­get: A fol­low-up survey

Oct 5, 2022, 2:03 PM
13 points
2 comments7 min readLW link

Track­ing Com­pute Stocks and Flows: Case Stud­ies?

CullenOct 5, 2022, 5:57 PM
11 points
5 comments1 min readLW link

Char­i­ta­ble Reads of Anti-AGI-X-Risk Ar­gu­ments, Part 1

sstichOct 5, 2022, 5:03 AM
3 points
4 comments3 min readLW link

Neu­ral Tan­gent Ker­nel Distillation

Oct 5, 2022, 6:11 PM
68 points
20 comments8 min readLW link

More Re­cent Progress in the The­ory of Neu­ral Networks

jylin04Oct 6, 2022, 4:57 PM
78 points
6 comments4 min readLW link

Analysing a 2036 Takeover Scenario

ukc10014Oct 6, 2022, 8:48 PM
8 points
2 comments27 min readLW link

Warn­ing Shots Prob­a­bly Wouldn’t Change The Pic­ture Much

So8resOct 6, 2022, 5:15 AM
111 points
40 comments2 min readLW link

Align­ment Might Never Be Solved, By Hu­mans or AI

intersticeOct 7, 2022, 4:14 PM
30 points
6 comments3 min readLW link

linkpost: neuro-sym­bolic hy­brid ai

Nathan Helm-BurgerOct 6, 2022, 9:52 PM
16 points
0 comments1 min readLW link
(youtu.be)

Poly­se­man­tic­ity and Ca­pac­ity in Neu­ral Networks

Oct 7, 2022, 5:51 PM
78 points
9 comments3 min readLW link

[Question] De­liber­ate prac­tice for re­search?

Alex_AltairOct 8, 2022, 3:45 AM
16 points
2 comments1 min readLW link

[Question] How many GPUs does NVIDIA make?

leogaoOct 8, 2022, 5:54 PM
27 points
2 comments1 min readLW link

SERI MATS Pro­gram—Win­ter 2022 Cohort

Oct 8, 2022, 7:09 PM
71 points
12 comments4 min readLW link

[Question] Toy alignment problem: Social Network KPI design

qbolecOct 8, 2022, 10:14 PM
7 points
1 comment1 min readLW link

My ten­ta­tive in­ter­pretabil­ity re­search agenda—topol­ogy match­ing.

Maxwell ClarkeOct 8, 2022, 10:14 PM
10 points
2 comments4 min readLW link

[Question] AI Risk Micro­dy­nam­ics Survey

FroolowOct 9, 2022, 8:04 PM
3 points
0 comments1 min readLW link

Pos­si­ble miracles

Oct 9, 2022, 6:17 PM
60 points
33 comments8 min readLW link

The Le­bowski The­o­rem — Char­i­ta­ble Reads of Anti-AGI-X-Risk Ar­gu­ments, Part 2

sstichOct 8, 2022, 10:39 PM
1 point
10 comments7 min readLW link

Embed­ding AI into AR goggles

aixarOct 9, 2022, 8:08 PM
−12 points
0 comments1 min readLW link

Cat­a­logu­ing Pri­ors in The­ory and Practice

Paul BricmanOct 13, 2022, 12:36 PM
13 points
8 comments7 min readLW link

Re­sults from the lan­guage model hackathon

Esben KranOct 10, 2022, 8:29 AM
21 points
1 comment4 min readLW link

Don’t ex­pect AGI any­time soon

cveresOct 10, 2022, 10:38 PM
−14 points
6 comments1 min readLW link

Disen­tan­gling in­ner al­ign­ment failures

Erik JennerOct 10, 2022, 6:50 PM
14 points
5 comments4 min readLW link

Anony­mous ad­vice: If you want to re­duce AI risk, should you take roles that ad­vance AI ca­pa­bil­ities?

Benjamin HiltonOct 11, 2022, 2:16 PM
54 points
10 comments1 min readLW link

Pret­tified AI Safety Game Cards

abramdemskiOct 11, 2022, 7:35 PM
46 points
6 comments1 min readLW link

Power-Seek­ing AI and Ex­is­ten­tial Risk

Antonio FrancaOct 11, 2022, 10:50 PM
5 points
0 comments9 min readLW link

Align­ment 201 curriculum

Richard_NgoOct 12, 2022, 6:03 PM
102 points
3 comments1 min readLW link
(www.agisafetyfundamentals.com)

Ar­ti­cle Re­view: Google’s AlphaTensor

Robert_AIZIOct 12, 2022, 6:04 PM
8 points
2 comments10 min readLW link

[Question] Pre­vi­ous Work on Re­cre­at­ing Neu­ral Net­work In­put from In­ter­me­di­ate Layer Activations

bglassOct 12, 2022, 7:28 PM
1 point
3 comments1 min readLW link

You are bet­ter at math (and al­ign­ment) than you think

trevorOct 13, 2022, 3:07 AM
37 points
7 comments22 min readLW link
(www.lesswrong.com)

Coun­ter­ar­gu­ments to the ba­sic AI x-risk case

KatjaGraceOct 14, 2022, 1:00 PM
336 points
122 comments34 min readLW link
(aiimpacts.org)

Another prob­lem with AI con­fine­ment: or­di­nary CPUs can work as ra­dio transmitters

RomanSOct 14, 2022, 8:28 AM
34 points
1 comment1 min readLW link
(news.softpedia.com)

“AGI soon, but Nar­row works Bet­ter”

AnthonyRepettoOct 14, 2022, 9:35 PM
1 point
9 comments2 min readLW link

[Question] Best re­source to go from “typ­i­cal smart tech-savvy per­son” to “per­son who gets AGI risk ur­gency”?

LironOct 15, 2022, 10:26 PM
14 points
8 comments1 min readLW link

[Question] Ques­tions about the al­ign­ment problem

GG10Oct 17, 2022, 1:42 AM
−5 points
13 comments3 min readLW link

[Question] Creat­ing su­per­in­tel­li­gence with­out AGI

AntbOct 17, 2022, 7:01 PM
7 points
3 comments1 min readLW link

AI Safety Ideas: An Open AI Safety Re­search Platform

Esben KranOct 17, 2022, 5:01 PM
24 points
0 comments1 min readLW link

Is GPT-N bounded by hu­man ca­pac­i­ties? No.

Cleo NardoOct 17, 2022, 11:26 PM
5 points
4 comments2 min readLW link

A prag­matic met­ric for Ar­tifi­cial Gen­eral Intelligence

lorepieriOct 17, 2022, 10:07 PM
6 points
0 comments1 min readLW link
(lorenzopieri.com)

Is GitHub Copi­lot in le­gal trou­ble?

tcelferactOct 18, 2022, 4:19 PM
34 points
2 comments1 min readLW link

Me­tac­u­lus is build­ing a team ded­i­cated to AI forecasting

ChristianWilliamsOct 18, 2022, 4:08 PM
3 points
0 comments1 min readLW link

[Question] Where can I find solutions to the exercises of AGISF?

Charbel-RaphaëlOct 18, 2022, 2:11 PM
7 points
0 comments1 min readLW link

A con­ver­sa­tion about Katja’s coun­ter­ar­gu­ments to AI risk

Oct 18, 2022, 6:40 PM
43 points
9 comments33 min readLW link

An Ex­tremely Opinionated An­no­tated List of My Favourite Mechanis­tic In­ter­pretabil­ity Papers

Neel NandaOct 18, 2022, 9:08 PM
66 points
5 comments12 min readLW link
(www.neelnanda.io)

Distil­led Rep­re­sen­ta­tions Re­search Agenda

Oct 18, 2022, 8:59 PM
15 points
2 comments8 min readLW link

[Question] Should we push for re­quiring AI train­ing data to be li­censed?

ChristianKlOct 19, 2022, 5:49 PM
38 points
32 comments1 min readLW link

Hacker-AI and Digi­tal Ghosts – Pre-AGI

Erland WittkotterOct 19, 2022, 3:33 PM
9 points
7 comments8 min readLW link

Scal­ing Laws for Re­ward Model Overoptimization

Oct 20, 2022, 12:20 AM
86 points
11 comments1 min readLW link
(arxiv.org)

The her­i­ta­bil­ity of hu­man val­ues: A be­hav­ior ge­netic cri­tique of Shard Theory

geoffreymillerOct 20, 2022, 3:51 PM
63 points
58 comments21 min readLW link

aisafety.com­mu­nity—A liv­ing doc­u­ment of AI safety communities

Oct 28, 2022, 5:50 PM
52 points
22 comments1 min readLW link

Tra­jec­to­ries to 2036

ukc10014Oct 20, 2022, 8:23 PM
1 point
1 comment14 min readLW link

In­tel­li­gent be­havi­our across sys­tems, scales and substrates

Nora_AmmannOct 21, 2022, 5:09 PM
11 points
0 comments10 min readLW link

A frame­work and open ques­tions for game the­o­retic shard modeling

Garrett BakerOct 21, 2022, 9:40 PM
11 points
4 comments4 min readLW link

[Question] The Last Year - is there an ex­ist­ing novel about the last year be­fore AI doom?

Luca PetrolatiOct 22, 2022, 8:44 PM
4 points
4 comments1 min readLW link

Em­pow­er­ment is (al­most) All We Need

jacob_cannellOct 23, 2022, 9:48 PM
36 points
43 comments17 min readLW link

The op­ti­mal timing of spend­ing on AGI safety work; why we should prob­a­bly be spend­ing more now

Tristan CookOct 24, 2022, 5:42 PM
62 points
0 comments1 min readLW link

A Bare­bones Guide to Mechanis­tic In­ter­pretabil­ity Prerequisites

Neel NandaOct 24, 2022, 8:45 PM
62 points
8 comments3 min readLW link
(neelnanda.io)

Con­sider try­ing Vivek Heb­bar’s al­ign­ment exercises

Orpheus16Oct 24, 2022, 7:46 PM
36 points
1 comment4 min readLW link

POWER­play: An open-source toolchain to study AI power-seeking

Edouard HarrisOct 24, 2022, 8:03 PM
22 points
0 comments1 min readLW link
(github.com)

What does it take to defend the world against out-of-con­trol AGIs?

Steven ByrnesOct 25, 2022, 2:47 PM
141 points
31 comments30 min readLW link

Mechanism De­sign for AI Safety—Read­ing Group Curriculum

Rubi J. HudsonOct 25, 2022, 3:54 AM
7 points
1 comment1 min readLW link

Maps and Blueprint; the Two Sides of the Align­ment Equation

Nora_AmmannOct 25, 2022, 4:29 PM
21 points
1 comment5 min readLW link

A Walk­through of A Math­e­mat­i­cal Frame­work for Trans­former Circuits

Neel NandaOct 25, 2022, 8:24 PM
49 points
5 comments1 min readLW link
(www.youtube.com)

Paper: In-con­text Re­in­force­ment Learn­ing with Al­gorithm Distil­la­tion [Deep­mind]

LawrenceCOct 26, 2022, 6:45 PM
28 points
5 comments1 min readLW link
(arxiv.org)

Ap­ply to the Red­wood Re­search Mechanis­tic In­ter­pretabil­ity Ex­per­i­ment (REMIX), a re­search pro­gram in Berkeley

Oct 27, 2022, 1:32 AM
134 points
14 comments12 min readLW link

You won’t solve al­ign­ment with­out agent foundations

Mikhail SaminNov 6, 2022, 8:07 AM
21 points
3 comments8 min readLW link

AI & ML Safety Up­dates W43

Oct 28, 2022, 1:18 PM
9 points
3 comments3 min readLW link

Prizes for ML Safety Bench­mark Ideas

joshcOct 28, 2022, 2:51 AM
36 points
3 comments1 min readLW link

Me (Steve Byrnes) on the “Brain In­spired” podcast

Steven ByrnesOct 30, 2022, 7:15 PM
26 points
1 comment1 min readLW link
(braininspired.co)

Join the in­ter­pretabil­ity re­search hackathon

Esben KranOct 28, 2022, 4:26 PM
15 points
0 comments1 min readLW link

In­stru­men­tal ig­nor­ing AI, Dumb but not use­less.

Donald HobsonOct 30, 2022, 4:55 PM
7 points
6 comments2 min readLW link

«Boundaries», Part 3a: Defin­ing bound­aries as di­rected Markov blankets

Andrew_CritchOct 30, 2022, 6:31 AM
58 points
13 comments15 min readLW link

[Book] In­ter­pretable Ma­chine Learn­ing: A Guide for Mak­ing Black Box Models Explainable

Esben KranOct 31, 2022, 11:38 AM
19 points
1 comment1 min readLW link
(christophm.github.io)

“Cars and Elephants”: a hand­wavy ar­gu­ment/​anal­ogy against mechanis­tic interpretability

David Scott Krueger (formerly: capybaralet)Oct 31, 2022, 9:26 PM
47 points
25 comments2 min readLW link

ML Safety Schol­ars Sum­mer 2022 Retrospective

TW123Nov 1, 2022, 3:09 AM
29 points
0 comments1 min readLW link

What sorts of sys­tems can be de­cep­tive?

Andrei AlexandruOct 31, 2022, 10:00 PM
14 points
0 comments7 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [~monthly thread]

Robert MilesNov 1, 2022, 11:23 PM
67 points
100 comments2 min readLW link

Real-Time Re­search Record­ing: Can a Trans­former Re-Derive Po­si­tional Info?

Neel NandaNov 1, 2022, 11:56 PM
68 points
14 comments1 min readLW link
(youtu.be)

On the cor­re­spon­dence be­tween AI-mis­al­ign­ment and cog­ni­tive dis­so­nance us­ing a be­hav­ioral eco­nomics model

Stijn BruersNov 1, 2022, 5:39 PM
4 points
0 comments6 min readLW link

WFW?: Op­por­tu­nity and The­ory of Impact

DavidCorfieldNov 2, 2022, 1:24 AM
1 point
0 comments1 min readLW link

AI Safety Needs Great Product Builders

goodgravyNov 2, 2022, 11:33 AM
14 points
2 comments1 min readLW link

A Mys­tery About High Di­men­sional Con­cept Encoding

Fabien RogerNov 3, 2022, 5:05 PM
46 points
13 comments7 min readLW link

Ethan Ca­ballero on Bro­ken Neu­ral Scal­ing Laws, De­cep­tion, and Re­cur­sive Self Improvement

Nov 4, 2022, 6:09 PM
14 points
11 comments5 min readLW link
(theinsideview.ai)

Can we pre­dict the abil­ities of fu­ture AI? MLAISU W44

Nov 4, 2022, 3:19 PM
10 points
0 comments3 min readLW link
(newsletter.apartresearch.com)

My sum­mary of “Prag­matic AI Safety”

Eleni AngelouNov 5, 2022, 12:54 PM
2 points
0 comments5 min readLW link

Re­view of the Challenge

SD MarlowNov 5, 2022, 6:38 AM
−14 points
5 comments2 min readLW link

How to store hu­man val­ues on a computer

Oliver SiegelNov 5, 2022, 7:17 PM
−12 points
17 comments1 min readLW link

Should AI fo­cus on prob­lem-solv­ing or strate­gic plan­ning? Why not both?

Oliver SiegelNov 5, 2022, 7:17 PM
−12 points
3 comments1 min readLW link

In­stead of tech­ni­cal re­search, more peo­ple should fo­cus on buy­ing time

Nov 5, 2022, 8:43 PM
80 points
51 comments14 min readLW link

[Question] Is there some kind of back­log or de­lay for data cen­ter AI?

trevorNov 7, 2022, 8:18 AM
5 points
2 comments1 min readLW link

A Walk­through of In­ter­pretabil­ity in the Wild (w/​ au­thors Kevin Wang, Arthur Conmy & Alexan­dre Variengien)

Neel NandaNov 7, 2022, 10:39 PM
29 points
15 comments3 min readLW link
(youtu.be)

How could we know that an AGI sys­tem will have good con­se­quences?

So8resNov 7, 2022, 10:42 PM
86 points
24 comments5 min readLW link

Peo­ple care about each other even though they have im­perfect mo­ti­va­tional poin­t­ers?

TurnTroutNov 8, 2022, 6:15 PM
32 points
25 comments7 min readLW link

[ASoT] Thoughts on GPT-N

Ulisse MiniNov 8, 2022, 7:14 AM
8 points
0 comments1 min readLW link

In­verse scal­ing can be­come U-shaped

Edouard HarrisNov 8, 2022, 7:04 PM
27 points
15 comments1 min readLW link
(arxiv.org)

Counterfactability

Scott GarrabrantNov 7, 2022, 5:39 AM
36 points
4 comments11 min readLW link

Take­aways from a sur­vey on AI al­ign­ment resources

DanielFilanNov 5, 2022, 11:40 PM
73 points
9 comments6 min readLW link
(danielfilan.com)

[ASoT] In­stru­men­tal con­ver­gence is useful

Ulisse MiniNov 9, 2022, 8:20 PM
5 points
9 comments1 min readLW link

Me­satrans­la­tion and Metatranslation

jdpNov 9, 2022, 6:46 PM
23 points
4 comments11 min readLW link

The In­ter­pretabil­ity Playground

Esben KranNov 10, 2022, 5:15 PM
8 points
0 comments1 min readLW link
(alignmentjam.com)

Align­ment al­lows “non­ro­bust” de­ci­sion-in­fluences and doesn’t re­quire ro­bust grading

TurnTroutNov 29, 2022, 6:23 AM
55 points
27 comments15 min readLW link

[Question] What are some low-cost out­side-the-box ways to do/​fund al­ign­ment re­search?

trevorNov 11, 2022, 5:25 AM
10 points
0 comments1 min readLW link

In­stru­men­tal con­ver­gence is what makes gen­eral in­tel­li­gence possible

tailcalledNov 11, 2022, 4:38 PM
72 points
11 comments4 min readLW link

A short cri­tique of Vanessa Kosoy’s PreDCA

Martín SotoNov 13, 2022, 4:00 PM
25 points
8 comments4 min readLW link

[Question] Why don’t we have self driv­ing cars yet?

Linda LinseforsNov 14, 2022, 12:19 PM
21 points
16 comments1 min readLW link

Win­ners of the AI Safety Nudge Competition

Marc CarauleanuNov 15, 2022, 1:06 AM
4 points
0 comments1 min readLW link

[Question] Will nan­otech/​biotech be what leads to AI doom?

tailcalledNov 15, 2022, 5:38 PM
4 points
8 comments2 min readLW link

[Question] What is our cur­rent best in­fo­haz­ard policy for AGI (safety) re­search?

Roman LeventovNov 15, 2022, 10:33 PM
12 points
2 comments1 min readLW link

Disagree­ment with bio an­chors that lead to shorter timelines

Marius HobbhahnNov 16, 2022, 2:40 PM
72 points
16 comments7 min readLW link

Cur­rent themes in mechanis­tic in­ter­pretabil­ity research

Nov 16, 2022, 2:14 PM
82 points
3 comments12 min readLW link

[Question] Is there some rea­son LLMs haven’t seen broader use?

tailcalledNov 16, 2022, 8:04 PM
25 points
27 comments1 min readLW link

AI Fore­cast­ing Re­search Ideas

JsevillamolNov 17, 2022, 5:37 PM
21 points
2 comments1 min readLW link

Re­sults from the in­ter­pretabil­ity hackathon

Nov 17, 2022, 2:51 PM
80 points
0 comments6 min readLW link

Don’t de­sign agents which ex­ploit ad­ver­sar­ial inputs

Nov 18, 2022, 1:48 AM
60 points
61 comments12 min readLW link

AI Ethics != AI Safety

DentinNov 18, 2022, 3:02 AM
2 points
0 comments1 min readLW link

Up­date to Mys­ter­ies of mode col­lapse: text-davinci-002 not RLHF

janusNov 19, 2022, 11:51 PM
69 points
8 comments2 min readLW link

Limits to the Con­trol­la­bil­ity of AGI

Nov 20, 2022, 7:18 PM
10 points
2 comments9 min readLW link

[ASoT] Reflec­tivity in Nar­row AI

Ulisse MiniNov 21, 2022, 12:51 AM
6 points
1 comment1 min readLW link

Here’s the exit.

ValentineNov 21, 2022, 6:07 PM
85 points
138 comments10 min readLW link

Clar­ify­ing wire­head­ing terminology

leogaoNov 24, 2022, 4:53 AM
53 points
6 comments1 min readLW link

A Walk­through of In-Con­text Learn­ing and In­duc­tion Heads (w/​ Charles Frye) Part 1 of 2

Neel NandaNov 22, 2022, 5:12 PM
20 points
0 comments1 min readLW link
(www.youtube.com)

An­nounc­ing AI safety Men­tors and Mentees

Marius HobbhahnNov 23, 2022, 3:21 PM
54 points
7 comments10 min readLW link

My take on Ja­cob Can­nell’s take on AGI safety

Steven ByrnesNov 28, 2022, 2:01 PM
61 points
13 comments30 min readLW link

Don’t al­ign agents to eval­u­a­tions of plans

TurnTroutNov 26, 2022, 9:16 PM
37 points
46 comments18 min readLW link

[Question] Dumb and ill-posed ques­tion: Is con­cep­tual re­search like this MIRI pa­per on the shut­down prob­lem/​Cor­rigi­bil­ity “real”

joraineNov 24, 2022, 5:08 AM
25 points
11 comments1 min readLW link

Refin­ing the Sharp Left Turn threat model, part 2: ap­ply­ing al­ign­ment techniques

Nov 25, 2022, 2:36 PM
36 points
4 comments6 min readLW link
(vkrakovna.wordpress.com)

Pod­cast: Shoshan­nah Tekofsky on skil­ling up in AI safety, vis­it­ing Berkeley, and de­vel­op­ing novel re­search ideas

Orpheus16Nov 25, 2022, 8:47 PM
37 points
2 comments9 min readLW link

Mechanis­tic anomaly de­tec­tion and ELK

paulfchristianoNov 25, 2022, 6:50 PM
121 points
17 comments21 min readLW link
(ai-alignment.com)

The First Filter

Nov 26, 2022, 7:37 PM
55 points
5 comments1 min readLW link

Dis­cussing how to al­ign Trans­for­ma­tive AI if it’s de­vel­oped very soon

Nov 28, 2022, 4:17 PM
36 points
2 comments30 min readLW link

On the Di­plo­macy AI

ZviNov 28, 2022, 1:20 PM
119 points
29 comments11 min readLW link
(thezvi.wordpress.com)

Why Would AI “Aim” To Defeat Hu­man­ity?

HoldenKarnofskyNov 29, 2022, 7:30 PM
68 points
9 comments33 min readLW link
(www.cold-takes.com)

Dist­in­guish­ing test from training

So8resNov 29, 2022, 9:41 PM
65 points
10 comments6 min readLW link

[Question] Do any of the AI Risk eval­u­a­tions fo­cus on hu­mans as the risk?

jmhNov 30, 2022, 3:09 AM
10 points
8 comments1 min readLW link

Ap­ply to at­tend win­ter AI al­ign­ment work­shops (Dec 28-30 & Jan 3-5) near Berkeley

Dec 1, 2022, 8:46 PM
25 points
1 comment1 min readLW link

The­o­ries of im­pact for Science of Deep Learning

Marius HobbhahnDec 1, 2022, 2:39 PM
16 points
0 comments11 min readLW link

In­ner and outer al­ign­ment de­com­pose one hard prob­lem into two ex­tremely hard problems

TurnTroutDec 2, 2022, 2:43 AM
96 points
18 comments53 min readLW link

The Plan − 2022 Update

johnswentworthDec 1, 2022, 8:43 PM
211 points
33 comments8 min readLW link

Find­ing gliders in the game of life

paulfchristianoDec 1, 2022, 8:40 PM
91 points
7 comments16 min readLW link
(ai-alignment.com)

Take 1: We’re not go­ing to re­verse-en­g­ineer the AI.

Charlie SteinerDec 1, 2022, 10:41 PM
38 points
4 comments4 min readLW link

Un­der­stand­ing goals in com­plex systems

Johannes C. MayerDec 1, 2022, 11:49 PM
9 points
0 comments1 min readLW link
(www.youtube.com)

Mas­ter­ing Strat­ego (Deep­mind)

svemirskiDec 2, 2022, 2:21 AM
6 points
0 comments1 min readLW link
(www.deepmind.com)

Jailbreak­ing ChatGPT on Re­lease Day

ZviDec 2, 2022, 1:10 PM
237 points
74 comments6 min readLW link
(thezvi.wordpress.com)

[Question] Did I just catch GPTchat do­ing some­thing un­ex­pect­edly in­sight­ful?

trevorDec 2, 2022, 7:48 AM
9 points
0 comments1 min readLW link

Take 2: Build­ing tools to help build FAI is a le­gi­t­i­mate strat­egy, but it’s dual-use.

Charlie SteinerDec 3, 2022, 12:54 AM
16 points
1 comment2 min readLW link

Causal scrub­bing: re­sults on in­duc­tion heads

Dec 3, 2022, 12:59 AM
32 points
0 comments17 min readLW link

Log­i­cal in­duc­tion for soft­ware engineers

Alex FlintDec 3, 2022, 7:55 PM
124 points
2 comments27 min readLW link

ChatGPT is sur­pris­ingly and un­can­nily good at pre­tend­ing to be sentient

Victor NovikovDec 3, 2022, 2:47 PM
17 points
11 comments18 min readLW link

Monthly Shorts 11/​22

CelerDec 5, 2022, 7:30 AM
8 points
0 comments3 min readLW link
(keller.substack.com)

Take 4: One prob­lem with nat­u­ral ab­strac­tions is there’s too many of them.

Charlie SteinerDec 5, 2022, 10:39 AM
34 points
4 comments1 min readLW link

The No Free Lunch the­o­rem for dummies

Steven ByrnesDec 5, 2022, 9:46 PM
28 points
16 comments3 min readLW link

[Link] Why I’m op­ti­mistic about OpenAI’s al­ign­ment approach

janleikeDec 5, 2022, 10:51 PM
93 points
13 comments1 min readLW link
(aligned.substack.com)

Up­dat­ing my AI timelines

Matthew BarnettDec 5, 2022, 8:46 PM
134 points
40 comments2 min readLW link

ChatGPT and Ide­olog­i­cal Tur­ing Test

ViliamDec 5, 2022, 9:45 PM
41 points
1 comment1 min readLW link

Ver­ifi­ca­tion Is Not Easier Than Gen­er­a­tion In General

johnswentworthDec 6, 2022, 5:20 AM
56 points
23 comments1 min readLW link

[Question] What are the ma­jor un­der­ly­ing di­vi­sions in AI safety?

Chris LeongDec 6, 2022, 3:28 AM
5 points
2 comments1 min readLW link

Take 5: Another prob­lem for nat­u­ral ab­strac­tions is laz­i­ness.

Charlie SteinerDec 6, 2022, 7:00 AM
30 points
4 comments3 min readLW link

Mesa-Op­ti­miz­ers via Grokking

orthonormalDec 6, 2022, 8:05 PM
35 points
4 comments6 min readLW link

[Question] How do finite fac­tored sets com­pare with phase space?

Alex_AltairDec 6, 2022, 8:05 PM
14 points
1 comment1 min readLW link

Us­ing GPT-Eliezer against ChatGPT Jailbreaking

Dec 6, 2022, 7:54 PM
159 points
77 comments9 min readLW link

Take 6: CAIS is ac­tu­ally Or­wellian.

Charlie SteinerDec 7, 2022, 1:50 PM
14 points
5 comments2 min readLW link

[Question] Look­ing for ideas of pub­lic as­sets (stocks, funds, ETFs) that I can in­vest in to have a chance at prof­it­ing from the mass adop­tion and com­mer­cial­iza­tion of AI technology

AnnapurnaDec 7, 2022, 10:35 PM
15 points
9 comments1 min readLW link

You should con­sider launch­ing an AI startup

joshcDec 8, 2022, 12:28 AM
5 points
16 comments4 min readLW link

Ma­chine Learn­ing Consent

jefftkDec 8, 2022, 3:50 AM
38 points
14 comments3 min readLW link
(www.jefftk.com)

Rele­vant to nat­u­ral ab­strac­tions: Eu­clidean Sym­me­try Equiv­ar­i­ant Ma­chine Learn­ing—Overview, Ap­pli­ca­tions, and Open Questions

the gears to ascensionDec 8, 2022, 6:01 PM
7 points
0 comments1 min readLW link
(youtu.be)

AI Safety Seems Hard to Measure

HoldenKarnofskyDec 8, 2022, 7:50 PM
68 points
5 comments14 min readLW link
(www.cold-takes.com)

[Question] How is the “sharp left turn” defined?

Chris_LeongDec 9, 2022, 12:04 AM
13 points
3 comments1 min readLW link

Linkpost for a gen­er­al­ist al­gorith­mic learner: ca­pa­ble of car­ry­ing out sort­ing, short­est paths, string match­ing, con­vex hull find­ing in one network

lovetheusersDec 9, 2022, 12:02 AM
7 points
1 comment1 min readLW link
(twitter.com)

Timelines ARE rele­vant to al­ign­ment re­search (timelines 2 of ?)

Nathan Helm-BurgerAug 24, 2022, 12:19 AM
11 points
5 comments6 min readLW link

Pro­saic mis­al­ign­ment from the Solomonoff Predictor

Cleo NardoDec 9, 2022, 5:53 PM
11 points
0 comments5 min readLW link

[Question] Does a LLM have a util­ity func­tion?

DagonDec 9, 2022, 5:19 PM
16 points
6 comments1 min readLW link

ML Safety at NeurIPS & Paradig­matic AI Safety? MLAISU W49

Dec 9, 2022, 10:38 AM
14 points
0 comments4 min readLW link
(newsletter.apartresearch.com)

Take 8: Queer the in­ner/​outer al­ign­ment di­chotomy.

Charlie SteinerDec 9, 2022, 5:46 PM
26 points
2 comments2 min readLW link

My thoughts on OpenAI’s Align­ment plan

Donald HobsonDec 10, 2022, 10:35 AM
20 points
0 comments6 min readLW link

[ASoT] Nat­u­ral ab­strac­tions and AlphaZero

Ulisse MiniDec 10, 2022, 5:53 PM
31 points
1 comment1 min readLW link
(arxiv.org)

[Question] How promis­ing are le­gal av­enues to re­strict AI train­ing data?

thehalliardDec 10, 2022, 4:31 PM
9 points
2 comments1 min readLW link

Con­sider us­ing re­versible au­tomata for al­ign­ment research

Alex_AltairDec 11, 2022, 1:00 AM
81 points
29 comments2 min readLW link

[fic­tion] Our Fi­nal Hour

Mati_RoyDec 11, 2022, 5:49 AM
16 points
5 comments3 min readLW link

A crisis for on­line com­mu­ni­ca­tion: bots and bot users will over­run the In­ter­net?

Mitchell_PorterDec 11, 2022, 9:11 PM
23 points
11 comments1 min readLW link

Refram­ing in­ner alignment

davidadDec 11, 2022, 1:53 PM
47 points
13 comments4 min readLW link

Side-chan­nels: in­put ver­sus output

davidadDec 12, 2022, 12:32 PM
35 points
9 comments2 min readLW link

Psy­cholog­i­cal Di­sor­ders and Problems

Dec 12, 2022, 6:15 PM
35 points
5 comments1 min readLW link

Prod­ding ChatGPT to solve a ba­sic alge­bra problem

ShmiDec 12, 2022, 4:09 AM
14 points
6 comments1 min readLW link
(twitter.com)

A brain­teaser for lan­guage models

Adam ScherlisDec 12, 2022, 2:43 AM
46 points
3 comments2 min readLW link

Take 9: No, RLHF/​IDA/​de­bate doesn’t solve outer al­ign­ment.

Charlie SteinerDec 12, 2022, 11:51 AM
36 points
14 comments2 min readLW link

12 ca­reer-re­lated ques­tions that may (or may not) be helpful for peo­ple in­ter­ested in al­ign­ment research

Orpheus16Dec 12, 2022, 10:36 PM
18 points
0 comments2 min readLW link

Finite Fac­tored Sets in Pictures

Magdalena WacheDec 11, 2022, 6:49 PM
149 points
31 comments12 min readLW link

Con­cept ex­trap­o­la­tion for hy­poth­e­sis generation

Dec 12, 2022, 10:09 PM
20 points
2 comments3 min readLW link

Take 10: Fine-tun­ing with RLHF is aes­thet­i­cally un­satis­fy­ing.

Charlie SteinerDec 13, 2022, 7:04 AM
30 points
3 comments2 min readLW link

AI al­ign­ment is dis­tinct from its near-term applications

paulfchristianoDec 13, 2022, 7:10 AM
233 points
5 comments2 min readLW link
(ai-alignment.com)

Okay, I feel it now

g1Dec 13, 2022, 11:01 AM
84 points
14 comments1 min readLW link

What Does It Mean to Align AI With Hu­man Values?

AlgonDec 13, 2022, 4:56 PM
8 points
3 comments1 min readLW link
(www.quantamagazine.org)

[Question] Is the ChatGPT-simu­lated Linux vir­tual ma­chine real?

KenoubiDec 13, 2022, 3:41 PM
18 points
7 comments1 min readLW link

[In­terim re­search re­port] Tak­ing fea­tures out of su­per­po­si­tion with sparse autoencoders

Dec 13, 2022, 3:41 PM
80 points
10 comments22 min readLW link

Ex­is­ten­tial AI Safety is NOT sep­a­rate from near-term applications

scasperDec 13, 2022, 2:47 PM
37 points
16 comments3 min readLW link

My AGI safety re­search—2022 re­view, ’23 plans

Steven ByrnesDec 14, 2022, 3:15 PM
34 points
6 comments6 min readLW link

Try­ing to dis­am­biguate differ­ent ques­tions about whether RLHF is “good”

BuckDec 14, 2022, 4:03 AM
92 points
40 comments7 min readLW link

Pre­dict­ing GPU performance

Dec 14, 2022, 4:27 PM
59 points
24 comments1 min readLW link
(epochai.org)

[Question] Is the AI timeline too short to have chil­dren?

YorethDec 14, 2022, 6:32 PM
33 points
20 comments1 min readLW link

«Boundaries», Part 3b: Align­ment prob­lems in terms of bound­aries

Andrew_CritchDec 14, 2022, 10:34 PM
49 points
2 comments13 min readLW link

[Question] Is Paul Chris­ti­ano still as op­ti­mistic about Ap­proval-Directed Agents as he was in 2018?

Chris_LeongDec 14, 2022, 11:28 PM
8 points
0 comments1 min readLW link

Align­ing al­ign­ment with performance

Marv KDec 14, 2022, 10:19 PM
2 points
0 comments2 min readLW link

AI Ne­o­re­al­ism: a threat model & suc­cess crite­rion for ex­is­ten­tial safety

davidadDec 15, 2022, 1:42 PM
39 points
0 comments3 min readLW link

The next decades might be wild

Marius HobbhahnDec 15, 2022, 4:10 PM
157 points
27 comments41 min readLW link

High-level hopes for AI alignment

HoldenKarnofskyDec 15, 2022, 6:00 PM
42 points
3 comments19 min readLW link
(www.cold-takes.com)

[Question] How is ARC plan­ning to use ELK?

jacquesthibsDec 15, 2022, 8:11 PM
23 points
5 comments1 min readLW link

AI over­hangs de­pend on whether al­gorithms, com­pute and data are sub­sti­tutes or complements

NathanBarnardDec 16, 2022, 2:23 AM
2 points
0 comments3 min readLW link

Paper: Trans­form­ers learn in-con­text by gra­di­ent descent

LawrenceCDec 16, 2022, 11:10 AM
26 points
11 comments2 min readLW link
(arxiv.org)

How im­por­tant are ac­cu­rate AI timelines for the op­ti­mal spend­ing sched­ule on AI risk in­ter­ven­tions?

Tristan CookDec 16, 2022, 4:05 PM
27 points
2 comments1 min readLW link

Will Machines Ever Rule the World? MLAISU W50

Esben KranDec 16, 2022, 11:03 AM
12 points
7 comments4 min readLW link
(newsletter.apartresearch.com)

Can we effi­ciently ex­plain model be­hav­iors?

paulfchristianoDec 16, 2022, 7:40 PM
63 points
0 comments9 min readLW link
(ai-alignment.com)

[Question] Col­lege Selec­tion Ad­vice for Tech­ni­cal Alignment

TempCollegeAskDec 16, 2022, 5:11 PM
11 points
8 comments1 min readLW link

Paper: Con­sti­tu­tional AI: Harm­less­ness from AI Feed­back (An­thropic)

LawrenceCDec 16, 2022, 10:12 PM
60 points
10 comments1 min readLW link
(www.anthropic.com)

Pos­i­tive val­ues seem more ro­bust and last­ing than prohibitions

TurnTroutDec 17, 2022, 9:43 PM
42 points
12 comments2 min readLW link

Take 11: “Align­ing lan­guage mod­els” should be weirder.

Charlie SteinerDec 18, 2022, 2:14 PM
29 points
0 comments2 min readLW link

Why I think that teach­ing philos­o­phy is high impact

Eleni AngelouDec 19, 2022, 3:11 AM
5 points
0 comments2 min readLW link

Event [Berkeley]: Align­ment Col­lab­o­ra­tor Speed-Meeting

Dec 19, 2022, 2:24 AM
18 points
2 comments1 min readLW link

The ‘Old AI’: Les­sons for AI gov­er­nance from early elec­tric­ity regulation

Dec 19, 2022, 2:42 AM
7 points
0 comments13 min readLW link

Note on al­gorithms with mul­ti­ple trained components

Steven ByrnesDec 20, 2022, 5:08 PM
19 points
4 comments2 min readLW link

Why mechanis­tic in­ter­pretabil­ity does not and can­not con­tribute to long-term AGI safety (from mes­sages with a friend)

RemmeltDec 19, 2022, 12:02 PM
8 points
6 comments31 min readLW link

Next Level Seinfeld

ZviDec 19, 2022, 1:30 PM
45 points
6 comments1 min readLW link
(thezvi.wordpress.com)

Solu­tion to The Align­ment Problem

AlgonDec 19, 2022, 8:12 PM
10 points
0 comments2 min readLW link

Shard The­ory in Nine Th­e­ses: a Distil­la­tion and Crit­i­cal Appraisal

LawrenceCDec 19, 2022, 10:52 PM
80 points
14 comments17 min readLW link

The “Min­i­mal La­tents” Ap­proach to Nat­u­ral Abstractions

johnswentworthDec 20, 2022, 1:22 AM
41 points
14 comments12 min readLW link

Take 12: RLHF’s use is ev­i­dence that orgs will jam RL at real-world prob­lems.

Charlie SteinerDec 20, 2022, 5:01 AM
23 points
0 comments3 min readLW link

[link, 2019] AI paradigm: in­ter­ac­tive learn­ing from un­la­beled instructions

the gears to ascensionDec 20, 2022, 6:45 AM
2 points
0 comments2 min readLW link
(jgrizou.github.io)

Dis­cov­er­ing Lan­guage Model Be­hav­iors with Model-Writ­ten Evaluations

Dec 20, 2022, 8:08 PM
45 points
6 comments1 min readLW link
(www.anthropic.com)

Pod­cast: Tam­era Lan­ham on AI risk, threat mod­els, al­ign­ment pro­pos­als, ex­ter­nal­ized rea­son­ing over­sight, and work­ing at Anthropic

Orpheus16Dec 20, 2022, 9:39 PM
14 points
2 comments11 min readLW link

Google Search loses to ChatGPT fair and square

ShmiDec 21, 2022, 8:11 AM
12 points
6 comments1 min readLW link
(www.surgehq.ai)

A Com­pre­hen­sive Mechanis­tic In­ter­pretabil­ity Ex­plainer & Glossary

Neel NandaDec 21, 2022, 12:35 PM
40 points
0 comments2 min readLW link
(neelnanda.io)

Price’s equa­tion for neu­ral networks

tailcalledDec 21, 2022, 1:09 PM
22 points
3 comments2 min readLW link

[Question] [DISC] Are Values Ro­bust?

DragonGodDec 21, 2022, 1:00 AM
12 points
5 comments2 min readLW link

Me­taphor.systems

the gears to ascensionDec 21, 2022, 9:31 PM
9 points
2 comments1 min readLW link
(metaphor.systems)

The Hu­man’s Hid­den Utility Func­tion (Maybe)

lukeprogJan 23, 2012, 7:39 PM
64 points
90 comments3 min readLW link

Us­ing vec­tor fields to vi­su­al­ise prefer­ences and make them consistent

Jan 28, 2020, 7:44 PM
41 points
32 comments11 min readLW link

[Ar­ti­cle re­view] Ar­tifi­cial In­tel­li­gence, Values, and Alignment

MichaelAMar 9, 2020, 12:42 PM
13 points
5 comments10 min readLW link

Clar­ify­ing some key hy­pothe­ses in AI alignment

Aug 15, 2019, 9:29 PM
78 points
12 comments9 min readLW link

Failures in tech­nol­ogy fore­cast­ing? A re­ply to Ord and Yudkowsky

MichaelAMay 8, 2020, 12:41 PM
44 points
19 comments11 min readLW link

[Link and com­men­tary] The Offense-Defense Balance of Scien­tific Knowl­edge: Does Pub­lish­ing AI Re­search Re­duce Mi­suse?

MichaelAFeb 16, 2020, 7:56 PM
24 points
4 comments3 min readLW link

How can In­ter­pretabil­ity help Align­ment?

May 23, 2020, 4:16 PM
37 points
3 comments9 min readLW link

A Prob­lem With Patternism

B JacobsMay 19, 2020, 8:16 PM
5 points
52 comments1 min readLW link

Goal-di­rect­ed­ness is be­hav­ioral, not structural

adamShimiJun 8, 2020, 11:05 PM
6 points
12 comments3 min readLW link

Learn­ing Deep Learn­ing: Join­ing data sci­ence re­search as a mathematician

magfrumpOct 19, 2017, 7:14 PM
10 points
4 comments3 min readLW link

Will AI un­dergo dis­con­tin­u­ous progress?

Sammy MartinFeb 21, 2020, 10:16 PM
26 points
21 comments20 min readLW link

The Value Defi­ni­tion Problem

Sammy MartinNov 18, 2019, 7:56 PM
14 points
6 comments11 min readLW link

Life at Three Tails of the Bell Curve

lsusrJun 27, 2020, 8:49 AM
63 points
10 comments4 min readLW link

How do take­off speeds af­fect the prob­a­bil­ity of bad out­comes from AGI?

KRJun 29, 2020, 10:06 PM
15 points
2 comments8 min readLW link

AI Benefits Post 2: How AI Benefits Differs from AI Align­ment & AI for Good

CullenJun 29, 2020, 5:00 PM
8 points
7 comments2 min readLW link

Null-box­ing New­comb’s Problem

YitzJul 13, 2020, 4:32 PM
33 points
10 comments4 min readLW link

No non­sense ver­sion of the “racial al­gorithm bias”

Yuxi_LiuJul 13, 2019, 3:39 PM
115 points
20 comments2 min readLW link

Ed­u­ca­tion 2.0 — A brand new ed­u­ca­tion system

aryanJul 15, 2020, 10:09 AM
−8 points
3 comments6 min readLW link

What it means to optimise

Neel NandaJul 25, 2020, 9:40 AM
5 points
0 comments8 min readLW link
(www.neelnanda.io)

[Question] Where are peo­ple think­ing and talk­ing about global co­or­di­na­tion for AI safety?

Wei DaiMay 22, 2019, 6:24 AM
103 points
22 comments1 min readLW link

The strat­egy-steal­ing assumption

paulfchristianoSep 16, 2019, 3:23 PM
72 points
46 comments12 min readLW link3 reviews

Con­ver­sa­tion with Paul Christiano

abergalSep 11, 2019, 11:20 PM
44 points
6 comments30 min readLW link
(aiimpacts.org)

Tran­scrip­tion of Eliezer’s Jan­uary 2010 video Q&A

curiousepicNov 14, 2011, 5:02 PM
112 points
9 comments56 min readLW link

Re­sources for AI Align­ment Cartography

GyrodiotApr 4, 2020, 2:20 PM
45 points
8 comments9 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)Feb 6, 2019, 7:09 PM
25 points
17 comments1 min readLW link

An­nounce­ment: AI al­ign­ment prize round 2 win­ners and next round

cousin_itApr 16, 2018, 3:08 AM
64 points
29 comments2 min readLW link

An­nounce­ment: AI al­ign­ment prize round 3 win­ners and next round

cousin_itJul 15, 2018, 7:40 AM
93 points
7 comments1 min readLW link

Se­cu­rity Mind­set and the Lo­gis­tic Suc­cess Curve

Eliezer YudkowskyNov 26, 2017, 3:58 PM
76 points
48 comments20 min readLW link

Ar­bital scrape

emmabJun 6, 2019, 11:11 PM
89 points
23 comments1 min readLW link

The Strangest Thing An AI Could Tell You

Eliezer YudkowskyJul 15, 2009, 2:27 AM
116 points
605 comments2 min readLW link

Self-fulfilling correlations

PhilGoetzAug 26, 2010, 9:07 PM
144 points
50 comments3 min readLW link

Zoom In: An In­tro­duc­tion to Circuits

evhubMar 10, 2020, 7:36 PM
84 points
11 comments2 min readLW link
(distill.pub)

Should ethi­cists be in­side or out­side a pro­fes­sion?

Eliezer YudkowskyDec 12, 2018, 1:40 AM
87 points
6 comments9 min readLW link

Im­plicit extortion

paulfchristianoApr 13, 2018, 4:33 PM
29 points
16 comments6 min readLW link
(ai-alignment.com)

Bayesian Judo

Eliezer YudkowskyJul 31, 2007, 5:53 AM
87 points
108 comments1 min readLW link

An­nounc­ing Align­men­tFo­rum.org Beta

RaemonJul 10, 2018, 8:19 PM
67 points
35 comments2 min readLW link

An­nounc­ing the Align­ment Newsletter

Rohin ShahApr 9, 2018, 9:16 PM
29 points
3 comments1 min readLW link

He­len Toner on China, CSET, and AI

Rob BensingerApr 21, 2019, 4:10 AM
68 points
3 comments7 min readLW link
(rationallyspeakingpodcast.org)

A sim­ple en­vi­ron­ment for show­ing mesa misalignment

Matthew BarnettSep 26, 2019, 4:44 AM
70 points
9 comments2 min readLW link

The E-Coli Test for AI Alignment

johnswentworthDec 16, 2018, 8:10 AM
69 points
24 comments1 min readLW link

Re­cent Progress in the The­ory of Neu­ral Networks

intersticeDec 4, 2019, 11:11 PM
76 points
9 comments9 min readLW link

The Art of the Ar­tifi­cial: In­sights from ‘Ar­tifi­cial In­tel­li­gence: A Modern Ap­proach’

TurnTroutMar 25, 2018, 6:55 AM
31 points
8 comments15 min readLW link

Head­ing off a near-term AGI arms race

lincolnquirkAug 22, 2012, 2:23 PM
10 points
70 comments1 min readLW link

Out­perform­ing the hu­man Atari benchmark

VaniverMar 31, 2020, 7:33 PM
58 points
5 comments1 min readLW link
(deepmind.com)

Con­ver­sa­tional Pre­sen­ta­tion of Why Au­toma­tion is Differ­ent This Time

ryan_bJan 17, 2018, 10:11 PM
33 points
26 comments1 min readLW link

A rant against robots

Lê Nguyên HoangJan 14, 2020, 10:03 PM
64 points
7 comments5 min readLW link

Clar­ify­ing “AI Align­ment”

paulfchristianoNov 15, 2018, 2:41 PM
64 points
82 comments3 min readLW link2 reviews

Tiling Agents for Self-Mod­ify­ing AI (OPFAI #2)

Eliezer YudkowskyJun 6, 2013, 8:24 PM
84 points
259 comments3 min readLW link

EDT solves 5 and 10 with con­di­tional oracles

jessicataSep 30, 2018, 7:57 AM
59 points
8 comments13 min readLW link

AGI and Friendly AI in the dom­i­nant AI textbook

lukeprogMar 11, 2011, 4:12 AM
73 points
27 comments3 min readLW link

Ta­boo­ing ‘Agent’ for Pro­saic Alignment

Hjalmar_WijkAug 23, 2019, 2:55 AM
54 points
10 comments6 min readLW link

Is this what FAI out­reach suc­cess looks like?

Charlie SteinerMar 9, 2018, 1:12 PM
17 points
3 comments1 min readLW link
(www.youtube.com)

Align­ing a toy model of optimization

paulfchristianoJun 28, 2019, 8:23 PM
52 points
26 comments3 min readLW link

Deep­Mind ar­ti­cle: AI Safety Gridworlds

scarcegreengrassNov 30, 2017, 4:13 PM
24 points
5 comments1 min readLW link
(deepmind.com)

Bot­world: a cel­lu­lar au­toma­ton for study­ing self-mod­ify­ing agents em­bed­ded in their environment

So8resApr 12, 2014, 12:56 AM
78 points
55 comments7 min readLW link

“UDT2” and “against UD+ASSA”

Wei DaiMay 12, 2019, 4:18 AM
50 points
7 comments7 min readLW link

Us­ing ly­ing to de­tect hu­man values

Stuart_ArmstrongMar 15, 2018, 11:37 AM
19 points
6 comments1 min readLW link

Another AI Win­ter?

PeterMcCluskeyDec 25, 2019, 12:58 AM
47 points
14 comments4 min readLW link
(www.bayesianinvestor.com)

Model­ing AGI Safety Frame­works with Causal In­fluence Diagrams

Ramana KumarJun 21, 2019, 12:50 PM
43 points
6 comments1 min readLW link
(arxiv.org)

The Ur­gent Meta-Ethics of Friendly Ar­tifi­cial Intelligence

lukeprogFeb 1, 2011, 2:15 PM
76 points
252 comments1 min readLW link

Henry Kiss­inger: AI Could Mean the End of Hu­man History

ESRogsMay 15, 2018, 8:11 PM
17 points
12 comments1 min readLW link
(www.theatlantic.com)

Self-con­firm­ing pre­dic­tions can be ar­bi­trar­ily bad

Stuart_ArmstrongMay 3, 2019, 11:34 AM
46 points
11 comments5 min readLW link

A Vi­su­al­iza­tion of Nick Bostrom’s Superintelligence

[deleted]Jul 23, 2014, 12:24 AM
62 points
28 comments3 min readLW link

[Question] What are the most plau­si­ble “AI Safety warn­ing shot” sce­nar­ios?

Daniel KokotajloMar 26, 2020, 8:59 PM
35 points
16 comments1 min readLW link

AGI in a vuln­er­a­ble world

Mar 26, 2020, 12:10 AM
42 points
21 comments1 min readLW link
(aiimpacts.org)

Three Kinds of Competitiveness

Daniel KokotajloMar 31, 2020, 1:00 AM
36 points
18 comments5 min readLW link

Biolog­i­cal hu­mans and the ris­ing tide of AI

cousin_itJan 29, 2018, 4:04 PM
22 points
23 comments1 min readLW link

HLAI 2018 Field Report

Gordon Seidoh WorleyAug 29, 2018, 12:11 AM
48 points
12 comments5 min readLW link

Mag­i­cal Categories

Eliezer YudkowskyAug 24, 2008, 7:51 PM
65 points
133 comments9 min readLW link

Align­ment as Translation

johnswentworthMar 19, 2020, 9:40 PM
62 points
39 comments4 min readLW link

Re­solv­ing hu­man val­ues, com­pletely and adequately

Stuart_ArmstrongMar 30, 2018, 3:35 AM
32 points
30 comments12 min readLW link

Will trans­parency help catch de­cep­tion? Per­haps not

Matthew BarnettNov 4, 2019, 8:52 PM
43 points
5 comments7 min readLW link

A dilemma for pro­saic AI alignment

Daniel KokotajloDec 17, 2019, 10:11 PM
40 points
30 comments3 min readLW link

[1911.08265] Mas­ter­ing Atari, Go, Chess and Shogi by Plan­ning with a Learned Model | Arxiv

DragonGodNov 21, 2019, 1:18 AM
52 points
4 comments1 min readLW link
(arxiv.org)

Glenn Beck dis­cusses the Sin­gu­lar­ity, cites SI researchers

BrihaspatiJun 12, 2012, 4:45 PM
73 points
183 comments10 min readLW link

Siren wor­lds and the per­ils of over-op­ti­mised search

Stuart_ArmstrongApr 7, 2014, 11:00 AM
73 points
417 comments7 min readLW link

Hu­man-Aligned AI Sum­mer School: A Summary

Michaël TrazziAug 11, 2018, 8:11 AM
39 points
5 comments4 min readLW link

Top 9+2 myths about AI risk

Stuart_ArmstrongJun 29, 2015, 8:41 PM
68 points
45 comments2 min readLW link

Learn­ing bi­ases and re­wards simultaneously

Rohin ShahJul 6, 2019, 1:45 AM
41 points
3 comments4 min readLW link

Look­ing for AI Safety Ex­perts to Provide High Level Guidance for RAISE

OferMay 6, 2018, 2:06 AM
17 points
5 comments1 min readLW link

[Question] How much fund­ing and re­searchers were in AI, and AI Safety, in 2018?

RaemonMar 3, 2019, 9:46 PM
41 points
11 comments1 min readLW link

Deep learn­ing—deeper flaws?

Richard_NgoSep 24, 2018, 6:40 PM
39 points
17 comments4 min readLW link
(thinkingcomplete.blogspot.com)

A model of UDT with a con­crete prior over log­i­cal statements

BenyaAug 28, 2012, 9:45 PM
62 points
24 comments4 min readLW link

Mal­ign gen­er­al­iza­tion with­out in­ter­nal search

Matthew BarnettJan 12, 2020, 6:03 PM
43 points
12 comments4 min readLW link

An­nounc­ing the sec­ond AI Safety Camp

LachouetteJun 11, 2018, 6:59 PM
34 points
0 comments1 min readLW link

Vaniver’s View on Fac­tored Cognition

VaniverAug 23, 2019, 2:54 AM
48 points
4 comments8 min readLW link

De­tached Lever Fallacy

Eliezer YudkowskyJul 31, 2008, 6:57 PM
70 points
41 comments7 min readLW link

When to use quantilization

RyanCareyFeb 5, 2019, 5:17 PM
65 points
5 comments4 min readLW link

The first AI Safety Camp & onwards

RemmeltJun 7, 2018, 8:13 PM
45 points
0 comments8 min readLW link

Learn­ing prefer­ences by look­ing at the world

Rohin ShahFeb 12, 2019, 10:25 PM
43 points
10 comments7 min readLW link
(bair.berkeley.edu)

Sel­ling Nonapples

Eliezer YudkowskyNov 13, 2008, 8:10 PM
71 points
78 comments7 min readLW link

The AI Align­ment Prob­lem Has Already Been Solved(?) Once

SquirrelInHellApr 22, 2017, 1:24 PM
50 points
45 comments4 min readLW link
(squirrelinhell.blogspot.com)

Trace README

johnswentworthMar 11, 2020, 9:08 PM
35 points
1 comment8 min readLW link

[Link] Com­puter im­proves its Civ­i­liza­tion II game­play by read­ing the manual

Kaj_SotalaJul 13, 2011, 12:00 PM
49 points
5 comments4 min readLW link

Idea: Open Ac­cess AI Safety Journal

Gordon Seidoh WorleyMar 23, 2018, 6:27 PM
28 points
11 comments1 min readLW link

Another take on agent foun­da­tions: for­mal­iz­ing zero-shot reasoning

zhukeepaJul 1, 2018, 6:12 AM
59 points
20 comments12 min readLW link

Log­i­cal Up­date­less­ness as a Ro­bust Del­e­ga­tion Problem

Scott GarrabrantOct 27, 2017, 9:16 PM
30 points
2 comments2 min readLW link

Some thoughts af­ter read­ing Ar­tifi­cial In­tel­li­gence: A Modern Approach

swift_spiralMar 19, 2019, 11:39 PM
38 points
4 comments2 min readLW link

AI safety with­out goal-di­rected behavior

Rohin ShahJan 7, 2019, 7:48 AM
65 points
15 comments4 min readLW link

No Univer­sally Com­pel­ling Arguments

Eliezer YudkowskyJun 26, 2008, 8:29 AM
62 points
57 comments5 min readLW link

What AI Safety Re­searchers Have Writ­ten About the Na­ture of Hu­man Values

avturchinJan 16, 2019, 1:59 PM
50 points
3 comments15 min readLW link

Disam­biguat­ing “al­ign­ment” and re­lated no­tions

David Scott Krueger (formerly: capybaralet)Jun 5, 2018, 3:35 PM
22 points
21 comments2 min readLW link

In­duc­tive bi­ases stick around

evhubDec 18, 2019, 7:52 PM
63 points
14 comments3 min readLW link

Bill Gates: prob­lem of strong AI with con­flict­ing goals “very wor­thy of study and time”

Paul CrowleyJan 22, 2015, 8:21 PM
73 points
18 comments1 min readLW link

So You Want to Save the World

lukeprogJan 1, 2012, 7:39 AM
54 points
149 comments12 min readLW link

Me­taphilo­soph­i­cal com­pe­tence can’t be dis­en­tan­gled from alignment

zhukeepaApr 1, 2018, 12:38 AM
32 points
39 comments3 min readLW link

Some Thoughts on Metaphilosophy

Wei DaiFeb 10, 2019, 12:28 AM
62 points
27 comments4 min readLW link

Rea­sons com­pute may not drive AI ca­pa­bil­ities growth

Tristan HDec 19, 2018, 10:13 PM
42 points
10 comments8 min readLW link

Dis­tance Func­tions are Hard

Grue_SlinkyAug 13, 2019, 5:33 PM
31 points
19 comments6 min readLW link

Take­aways from safety by de­fault interviews

Apr 3, 2020, 5:20 PM
28 points
2 comments13 min readLW link
(aiimpacts.org)

Bridge Col­lapse: Re­duc­tion­ism as Eng­ineer­ing Problem

Rob BensingerFeb 18, 2014, 10:03 PM
78 points
62 comments15 min readLW link

Prob­a­bil­ity as Min­i­mal Map

johnswentworthSep 1, 2019, 7:19 PM
49 points
10 comments5 min readLW link

Policy Alignment

abramdemskiJun 30, 2018, 12:24 AM
50 points
25 comments8 min readLW link

Stable Poin­t­ers to Value: An Agent Embed­ded in Its Own Utility Function

abramdemskiAug 17, 2017, 12:22 AM
15 points
9 comments5 min readLW link

Stable Poin­t­ers to Value II: En­vi­ron­men­tal Goals

abramdemskiFeb 9, 2018, 6:03 AM
18 points
2 comments4 min readLW link

The Ar­gu­ment from Philo­soph­i­cal Difficulty

Wei DaiFeb 10, 2019, 12:28 AM
54 points
31 comments1 min readLW link

hu­man psy­chol­in­guists: a crit­i­cal appraisal

nostalgebraistDec 31, 2019, 12:20 AM
174 points
59 comments16 min readLW link2 reviews
(nostalgebraist.tumblr.com)

My take on agent foun­da­tions: for­mal­iz­ing metaphilo­soph­i­cal competence

zhukeepaApr 1, 2018, 6:33 AM
20 points
6 comments1 min readLW link

Cri­tique my Model: The EV of AGI to Selfish Individuals

ozziegooenApr 8, 2018, 8:04 PM
19 points
9 comments4 min readLW link

AI Safety De­bate and Its Applications

VojtaKovarikJul 23, 2019, 10:31 PM
36 points
5 comments12 min readLW link

TAISU 2019 Field Report

Gordon Seidoh WorleyOct 15, 2019, 1:09 AM
36 points
5 comments5 min readLW link

Hu­man-AI Collaboration

Rohin ShahOct 22, 2019, 6:32 AM
42 points
7 comments2 min readLW link
(bair.berkeley.edu)

An­a­lyz­ing the Prob­lem GPT-3 is Try­ing to Solve

adamShimiAug 6, 2020, 9:58 PM
16 points
2 comments4 min readLW link

[LINK] Speed su­per­in­tel­li­gence?

Stuart_ArmstrongAug 14, 2014, 3:57 PM
53 points
20 comments1 min readLW link

A big Sin­gu­lar­ity-themed Hol­ly­wood movie out in April offers many op­por­tu­ni­ties to talk about AI risk

chaosmageJan 7, 2014, 5:48 PM
49 points
85 comments1 min readLW link

New pa­per: (When) is Truth-tel­ling Fa­vored in AI de­bate?

VojtaKovarikDec 26, 2019, 7:59 PM
32 points
7 comments5 min readLW link
(medium.com)

Ar­tifi­cial Addition

Eliezer YudkowskyNov 20, 2007, 7:58 AM
68 points
129 comments6 min readLW link

Ex­plor­ing safe exploration

evhubJan 6, 2020, 9:07 PM
37 points
8 comments3 min readLW link

‘Dumb’ AI ob­serves and ma­nipu­lates controllers

Stuart_ArmstrongJan 13, 2015, 1:35 PM
52 points
19 comments2 min readLW link

AI Read­ing Group Thoughts (1/​?): The Man­date of Heaven

AlicornAug 10, 2018, 12:24 AM
45 points
18 comments4 min readLW link

AI Read­ing Group Thoughts (2/​?): Re­con­struc­tive Psychosurgery

AlicornSep 25, 2018, 4:25 AM
27 points
6 comments3 min readLW link

(notes on) Policy Desider­ata for Su­per­in­tel­li­gent AI: A Vec­tor Field Approach

Ben PaceFeb 4, 2019, 10:08 PM
43 points
5 comments7 min readLW link

AI Gover­nance: A Re­search Agenda

habrykaSep 5, 2018, 6:00 PM
25 points
3 comments1 min readLW link
(www.fhi.ox.ac.uk)

Global on­line de­bate on the gov­er­nance of AI

CarolineJJan 5, 2018, 3:31 PM
8 points
5 comments1 min readLW link

[AN #61] AI policy and gov­er­nance, from two peo­ple in the field

Rohin ShahAug 5, 2019, 5:00 PM
12 points
2 comments9 min readLW link
(mailchi.mp)

2019 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

LarksDec 19, 2019, 3:00 AM
130 points
18 comments62 min readLW link

[Question] What’s wrong with these analo­gies for un­der­stand­ing In­formed Over­sight and IDA?

Wei DaiMar 20, 2019, 9:11 AM
35 points
3 comments1 min readLW link

The Align­ment Newslet­ter #1: 04/​09/​18

Rohin ShahApr 9, 2018, 4:00 PM
12 points
3 comments4 min readLW link

The Align­ment Newslet­ter #2: 04/​16/​18

Rohin ShahApr 16, 2018, 4:00 PM
8 points
0 comments5 min readLW link

The Align­ment Newslet­ter #3: 04/​23/​18

Rohin ShahApr 23, 2018, 4:00 PM
9 points
0 comments6 min readLW link

The Align­ment Newslet­ter #4: 04/​30/​18

Rohin ShahApr 30, 2018, 4:00 PM
8 points
0 comments3 min readLW link

The Align­ment Newslet­ter #5: 05/​07/​18

Rohin ShahMay 7, 2018, 4:00 PM
8 points
0 comments7 min readLW link

The Align­ment Newslet­ter #6: 05/​14/​18

Rohin ShahMay 14, 2018, 4:00 PM
8 points
0 comments2 min readLW link

The Align­ment Newslet­ter #7: 05/​21/​18

Rohin ShahMay 21, 2018, 4:00 PM
8 points
0 comments5 min readLW link

The Align­ment Newslet­ter #8: 05/​28/​18

Rohin ShahMay 28, 2018, 4:00 PM
8 points
0 comments6 min readLW link

The Align­ment Newslet­ter #9: 06/​04/​18

Rohin ShahJun 4, 2018, 4:00 PM
8 points
0 comments2 min readLW link

The Align­ment Newslet­ter #10: 06/​11/​18

Rohin ShahJun 11, 2018, 4:00 PM
16 points
0 comments9 min readLW link

The Align­ment Newslet­ter #11: 06/​18/​18

Rohin ShahJun 18, 2018, 4:00 PM
8 points
0 comments10 min readLW link

The Align­ment Newslet­ter #12: 06/​25/​18

Rohin ShahJun 25, 2018, 4:00 PM
15 points
0 comments3 min readLW link

Align­ment Newslet­ter #13: 07/​02/​18

Rohin ShahJul 2, 2018, 4:10 PM
70 points
12 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #14

Rohin ShahJul 9, 2018, 4:20 PM
14 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #15: 07/​16/​18

Rohin ShahJul 16, 2018, 4:10 PM
42 points
0 comments15 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #17

Rohin ShahJul 30, 2018, 4:10 PM
32 points
0 comments13 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #18

Rohin ShahAug 6, 2018, 4:00 PM
17 points
0 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #19

Rohin ShahAug 14, 2018, 2:10 AM
18 points
0 comments13 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #20

Rohin ShahAug 20, 2018, 4:00 PM
12 points
2 comments6 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #21

Rohin ShahAug 27, 2018, 4:20 PM
25 points
0 comments7 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #22

Rohin ShahSep 3, 2018, 4:10 PM
18 points
0 comments6 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #23

Rohin ShahSep 10, 2018, 5:10 PM
16 points
0 comments7 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #24

Rohin ShahSep 17, 2018, 4:20 PM
10 points
6 comments12 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #25

Rohin ShahSep 24, 2018, 4:10 PM
18 points
3 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #26

Rohin ShahOct 2, 2018, 4:10 PM
13 points
0 comments7 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #27

Rohin ShahOct 9, 2018, 1:10 AM
16 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #28

Rohin ShahOct 15, 2018, 9:20 PM
11 points
0 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #29

Rohin ShahOct 22, 2018, 4:20 PM
15 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #30

Rohin ShahOct 29, 2018, 4:10 PM
29 points
2 comments6 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #31

Rohin ShahNov 5, 2018, 11:50 PM
17 points
0 comments12 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #32

Rohin ShahNov 12, 2018, 5:20 PM
18 points
0 comments12 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #33

Rohin ShahNov 19, 2018, 5:20 PM
23 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #34

Rohin ShahNov 26, 2018, 11:10 PM
24 points
0 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #35

Rohin ShahDec 4, 2018, 1:10 AM
15 points
0 comments6 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #37

Rohin ShahDec 17, 2018, 7:10 PM
25 points
4 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #38

Rohin ShahDec 25, 2018, 4:10 PM
9 points
0 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #39

Rohin ShahJan 1, 2019, 8:10 AM
32 points
2 comments5 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #40

Rohin ShahJan 8, 2019, 8:10 PM
21 points
2 comments5 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #41

Rohin ShahJan 17, 2019, 8:10 AM
22 points
6 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #42

Rohin ShahJan 22, 2019, 2:00 AM
20 points
1 comment10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #43

Rohin ShahJan 29, 2019, 9:10 PM
14 points
2 comments13 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #44

Rohin ShahFeb 6, 2019, 8:30 AM
18 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #45

Rohin ShahFeb 14, 2019, 2:10 AM
25 points
2 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #46

Rohin ShahFeb 22, 2019, 12:10 AM
12 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #48

Rohin ShahMar 11, 2019, 9:10 PM
29 points
14 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #49

Rohin ShahMar 20, 2019, 4:20 AM
23 points
1 comment11 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #50

Rohin ShahMar 28, 2019, 6:10 PM
15 points
2 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #51

Rohin ShahApr 3, 2019, 4:10 AM
25 points
2 comments15 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #52

Rohin ShahApr 6, 2019, 1:20 AM
19 points
1 comment8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter One Year Retrospective

Rohin ShahApr 10, 2019, 6:58 AM
93 points
31 comments21 min readLW link

Align­ment Newslet­ter #53

Rohin ShahApr 18, 2019, 5:20 PM
20 points
0 comments8 min readLW link
(mailchi.mp)

[AN #54] Box­ing a finite-hori­zon AI sys­tem to keep it unambitious

Rohin ShahApr 28, 2019, 5:20 AM
20 points
0 comments8 min readLW link
(mailchi.mp)

[AN #55] Reg­u­la­tory mar­kets and in­ter­na­tional stan­dards as a means of en­sur­ing benefi­cial AI

Rohin ShahMay 5, 2019, 2:20 AM
17 points
2 comments8 min readLW link
(mailchi.mp)

[AN #56] Should ML re­searchers stop run­ning ex­per­i­ments be­fore mak­ing hy­pothe­ses?

Rohin ShahMay 21, 2019, 2:20 AM
21 points
8 comments9 min readLW link
(mailchi.mp)

[AN #57] Why we should fo­cus on ro­bust­ness in AI safety, and the analo­gous prob­lems in programming

Rohin ShahJun 5, 2019, 11:20 PM
26 points
15 comments7 min readLW link
(mailchi.mp)

[AN #58] Mesa op­ti­miza­tion: what it is, and why we should care

Rohin ShahJun 24, 2019, 4:10 PM
54 points
9 comments8 min readLW link
(mailchi.mp)

[AN #59] How ar­gu­ments for AI risk have changed over time

Rohin ShahJul 8, 2019, 5:20 PM
43 points
4 comments7 min readLW link
(mailchi.mp)

[AN #60] A new AI challenge: Minecraft agents that as­sist hu­man play­ers in cre­ative mode

Rohin ShahJul 22, 2019, 5:00 PM
23 points
6 comments9 min readLW link
(mailchi.mp)

[AN #62] Are ad­ver­sar­ial ex­am­ples caused by real but im­per­cep­ti­ble fea­tures?

Rohin ShahAug 22, 2019, 5:10 PM
27 points
10 comments9 min readLW link
(mailchi.mp)

[AN #63] How ar­chi­tec­ture search, meta learn­ing, and en­vi­ron­ment de­sign could lead to gen­eral intelligence

Rohin ShahSep 10, 2019, 7:10 PM
21 points
12 comments8 min readLW link
(mailchi.mp)

[AN #64]: Us­ing Deep RL and Re­ward Uncer­tainty to In­cen­tivize Prefer­ence Learning

Rohin ShahSep 16, 2019, 5:10 PM
11 points
8 comments7 min readLW link
(mailchi.mp)

[AN #65]: Learn­ing use­ful skills by watch­ing hu­mans “play”

Rohin ShahSep 23, 2019, 5:30 PM
11 points
0 comments9 min readLW link
(mailchi.mp)

[AN #66]: De­com­pos­ing ro­bust­ness into ca­pa­bil­ity ro­bust­ness and al­ign­ment robustness

Rohin ShahSep 30, 2019, 6:00 PM
12 points
1 comment7 min readLW link
(mailchi.mp)

[AN #67]: Creat­ing en­vi­ron­ments in which to study in­ner al­ign­ment failures

Rohin ShahOct 7, 2019, 5:10 PM
17 points
0 comments8 min readLW link
(mailchi.mp)

[AN #68]: The at­tain­able util­ity the­ory of impact

Rohin ShahOct 14, 2019, 5:00 PM
17 points
0 comments8 min readLW link
(mailchi.mp)

[AN #69] Stu­art Rus­sell’s new book on why we need to re­place the stan­dard model of AI

Rohin ShahOct 19, 2019, 12:30 AM
60 points
12 comments15 min readLW link
(mailchi.mp)

[AN #70]: Agents that help hu­mans who are still learn­ing about their own preferences

Rohin ShahOct 23, 2019, 5:10 PM
16 points
0 comments9 min readLW link
(mailchi.mp)

[AN #71]: Avoid­ing re­ward tam­per­ing through cur­rent-RF optimization

Rohin ShahOct 30, 2019, 5:10 PM
12 points
0 comments7 min readLW link
(mailchi.mp)

[AN #72]: Align­ment, ro­bust­ness, method­ol­ogy, and sys­tem build­ing as re­search pri­ori­ties for AI safety

Rohin ShahNov 6, 2019, 6:10 PM
26 points
4 comments10 min readLW link
(mailchi.mp)

[AN #73]: De­tect­ing catas­trophic failures by learn­ing how agents tend to break

Rohin ShahNov 13, 2019, 6:10 PM
11 points
0 comments7 min readLW link
(mailchi.mp)

[AN #74]: Separat­ing benefi­cial AI into com­pe­tence, al­ign­ment, and cop­ing with impacts

Rohin ShahNov 20, 2019, 6:20 PM
19 points
0 comments7 min readLW link
(mailchi.mp)

[AN #75]: Solv­ing Atari and Go with learned game mod­els, and thoughts from a MIRI employee

Rohin ShahNov 27, 2019, 6:10 PM
38 points
1 comment10 min readLW link
(mailchi.mp)

[AN #76]: How dataset size af­fects ro­bust­ness, and bench­mark­ing safe ex­plo­ra­tion by mea­sur­ing con­straint violations

Rohin ShahDec 4, 2019, 6:10 PM
14 points
6 comments9 min readLW link
(mailchi.mp)

[AN #77]: Dou­ble de­scent: a unifi­ca­tion of statis­ti­cal the­ory and mod­ern ML practice

Rohin ShahDec 18, 2019, 6:30 PM
21 points
4 comments14 min readLW link
(mailchi.mp)

[AN #78] For­mal­iz­ing power and in­stru­men­tal con­ver­gence, and the end-of-year AI safety char­ity comparison

Rohin ShahDec 26, 2019, 1:10 AM
26 points
10 comments9 min readLW link
(mailchi.mp)

[AN #79]: Re­cur­sive re­ward mod­el­ing as an al­ign­ment tech­nique in­te­grated with deep RL

Rohin ShahJan 1, 2020, 6:00 PM
13 points
0 comments12 min readLW link
(mailchi.mp)

[AN #81]: Univer­sal­ity as a po­ten­tial solu­tion to con­cep­tual difficul­ties in in­tent alignment

Rohin ShahJan 8, 2020, 6:00 PM
31 points
4 comments11 min readLW link
(mailchi.mp)

[AN #82]: How OpenAI Five dis­tributed their train­ing computation

Rohin ShahJan 15, 2020, 6:20 PM
19 points
0 comments8 min readLW link
(mailchi.mp)

[AN #83]: Sam­ple-effi­cient deep learn­ing with ReMixMatch

Rohin ShahJan 22, 2020, 6:10 PM
15 points
4 comments11 min readLW link
(mailchi.mp)

[AN #84] Re­view­ing AI al­ign­ment work in 2018-19

Rohin ShahJan 29, 2020, 6:30 PM
23 points
0 comments6 min readLW link
(mailchi.mp)

[AN #85]: The nor­ma­tive ques­tions we should be ask­ing for AI al­ign­ment, and a sur­pris­ingly good chatbot

Rohin ShahFeb 5, 2020, 6:20 PM
14 points
2 comments7 min readLW link
(mailchi.mp)

[AN #86]: Im­prov­ing de­bate and fac­tored cog­ni­tion through hu­man experiments

Rohin ShahFeb 12, 2020, 6:10 PM
14 points
0 comments9 min readLW link
(mailchi.mp)

[AN #87]: What might hap­pen as deep learn­ing scales even fur­ther?

Rohin ShahFeb 19, 2020, 6:20 PM
28 points
0 comments4 min readLW link
(mailchi.mp)

[AN #88]: How the prin­ci­pal-agent liter­a­ture re­lates to AI risk

Rohin ShahFeb 27, 2020, 9:10 AM
18 points
0 comments9 min readLW link
(mailchi.mp)

[AN #89]: A unify­ing for­mal­ism for prefer­ence learn­ing algorithms

Rohin ShahMar 4, 2020, 6:20 PM
16 points
0 comments9 min readLW link
(mailchi.mp)

[AN #90]: How search land­scapes can con­tain self-re­in­forc­ing feed­back loops

Rohin ShahMar 11, 2020, 5:30 PM
11 points
6 comments8 min readLW link
(mailchi.mp)

[AN #91]: Con­cepts, im­ple­men­ta­tions, prob­lems, and a bench­mark for im­pact measurement

Rohin ShahMar 18, 2020, 5:10 PM
15 points
10 comments13 min readLW link
(mailchi.mp)

[AN #92]: Learn­ing good rep­re­sen­ta­tions with con­trastive pre­dic­tive coding

Rohin ShahMar 25, 2020, 5:20 PM
18 points
1 comment10 min readLW link
(mailchi.mp)

[AN #93]: The Precipice we’re stand­ing at, and how we can back away from it

Rohin ShahApr 1, 2020, 5:10 PM
24 points
0 comments7 min readLW link
(mailchi.mp)

Fore­cast­ing AI Progress: A Re­search Agenda

Aug 10, 2020, 1:04 AM
39 points
4 comments1 min readLW link

The Steer­ing Problem

paulfchristianoNov 13, 2018, 5:14 PM
43 points
12 comments7 min readLW link

Will hu­mans build goal-di­rected agents?

Rohin ShahJan 5, 2019, 1:33 AM
51 points
43 comments5 min readLW link

Pro­saic AI alignment

paulfchristianoNov 20, 2018, 1:56 PM
40 points
10 comments8 min readLW link

David Chalmers’ “The Sin­gu­lar­ity: A Philo­soph­i­cal Anal­y­sis”

lukeprogJan 29, 2011, 2:52 AM
55 points
203 comments4 min readLW link

[Talk] Paul Chris­ti­ano on his al­ign­ment taxonomy

jpSep 27, 2019, 6:37 PM
31 points
1 comment1 min readLW link
(www.youtube.com)

Dreams of AI Design

Eliezer YudkowskyAug 27, 2008, 4:04 AM
26 points
61 comments5 min readLW link

Qual­i­ta­tive Strate­gies of Friendliness

Eliezer YudkowskyAug 30, 2008, 2:12 AM
30 points
56 comments12 min readLW link

Or­a­cles, se­quence pre­dic­tors, and self-con­firm­ing predictions

Stuart_ArmstrongMay 3, 2019, 2:09 PM
22 points
0 comments3 min readLW link

Self-con­firm­ing prophe­cies, and sim­plified Or­a­cle designs

Stuart_ArmstrongJun 28, 2019, 9:57 AM
6 points
1 comment5 min readLW link

In­vest­ment idea: bas­ket of tech stocks weighted to­wards AI

ioannesAug 12, 2020, 9:30 PM
14 points
7 comments3 min readLW link

Con­cep­tual is­sues in AI safety: the paradig­matic gap

vedevazzJun 24, 2018, 3:09 PM
33 points
0 comments1 min readLW link
(www.foldl.me)

Disagree­ment with Paul: al­ign­ment induction

Stuart_ArmstrongSep 10, 2018, 1:54 PM
31 points
6 comments1 min readLW link

Largest open col­lec­tion quotes about AI

teradimichJul 12, 2019, 5:18 PM
35 points
2 comments3 min readLW link
(drive.google.com)

S.E.A.R.L.E’s COBOL room

Stuart_ArmstrongFeb 1, 2013, 8:29 PM
52 points
36 comments2 min readLW link

In­tro­duc­ing Cor­rigi­bil­ity (an FAI re­search sub­field)

So8resOct 20, 2014, 9:09 PM
52 points
28 comments3 min readLW link

NES-game play­ing AI [video link and AI-box­ing-re­lated com­ment]

Dr_ManhattanApr 12, 2013, 1:11 PM
42 points
22 comments1 min readLW link

On un­fix­ably un­safe AGI architectures

Steven ByrnesFeb 19, 2020, 9:16 PM
33 points
8 comments5 min readLW link

To con­tribute to AI safety, con­sider do­ing AI research

VikaJan 16, 2016, 8:42 PM
39 points
39 comments2 min readLW link

Ghosts in the Machine

Eliezer YudkowskyJun 17, 2008, 11:29 PM
54 points
30 comments4 min readLW link

Tech­ni­cal AGI safety re­search out­side AI

Richard_NgoOct 18, 2019, 3:00 PM
43 points
3 comments3 min readLW link

De­ci­pher­ing China’s AI Dream

Qiaochu_YuanMar 18, 2018, 3:26 AM
12 points
2 comments1 min readLW link
(www.fhi.ox.ac.uk)

Above-Aver­age AI Scientists

Eliezer YudkowskySep 28, 2008, 11:04 AM
57 points
97 comments8 min readLW link

The Na­ture of Logic

Eliezer YudkowskyNov 15, 2008, 6:20 AM
37 points
12 comments10 min readLW link

Or­a­cle paper

Stuart_ArmstrongDec 13, 2017, 2:59 PM
12 points
7 comments1 min readLW link

AI Align­ment Writ­ing Day Roundup #1

Ben PaceAug 30, 2019, 1:26 AM
32 points
12 comments1 min readLW link

Notes on the Safety in Ar­tifi­cial In­tel­li­gence conference

UmamiSalamiJul 1, 2016, 12:36 AM
40 points
15 comments13 min readLW link

Rein­ter­pret­ing “AI and Com­pute”

habrykaDec 25, 2018, 9:12 PM
30 points
10 comments1 min readLW link
(aiimpacts.org)

AI Safety Pr­ereq­ui­sites Course: Re­vamp and New Lessons

philip_bFeb 3, 2019, 9:04 PM
24 points
5 comments1 min readLW link

An an­gle of at­tack on Open Prob­lem #1

BenyaAug 18, 2012, 12:08 PM
47 points
85 comments7 min readLW link

Eval­u­at­ing the fea­si­bil­ity of SI’s plan

JoshuaFoxJan 10, 2013, 8:17 AM
38 points
188 comments4 min readLW link

Only hu­mans can have hu­man values

PhilGoetzApr 26, 2010, 6:57 PM
51 points
161 comments17 min readLW link

Mas­ter­ing Chess and Shogi by Self-Play with a Gen­eral Re­in­force­ment Learn­ing Algorithm

DragonGodDec 6, 2017, 6:01 AM
13 points
4 comments1 min readLW link
(arxiv.org)

Cake, or death!

Stuart_ArmstrongOct 25, 2012, 10:33 AM
46 points
13 comments4 min readLW link

Self-reg­u­la­tion of safety in AI research

Gordon Seidoh WorleyFeb 25, 2018, 11:17 PM
12 points
6 comments2 min readLW link

How safe “safe” AI de­vel­op­ment?

Gordon Seidoh WorleyFeb 28, 2018, 11:21 PM
9 points
1 comment1 min readLW link

Stan­ford In­tro to AI course to be taught for free online

Psy-KoshJul 30, 2011, 4:22 PM
38 points
39 comments1 min readLW link

Bayesian Utility: Rep­re­sent­ing Prefer­ence by Prob­a­bil­ity Measures

Vladimir_NesovJul 27, 2009, 2:28 PM
45 points
37 comments2 min readLW link

Gains from trade: Slug ver­sus Galaxy—how much would I give up to con­trol you?

Stuart_ArmstrongJul 23, 2013, 7:06 PM
55 points
67 comments7 min readLW link

Defeat­ing Mun­dane Holo­causts With Robots

lsparrishMay 30, 2011, 10:34 PM
34 points
28 comments2 min readLW link

As­sum­ing we’ve solved X, could we do Y...

Stuart_ArmstrongDec 11, 2018, 6:13 PM
31 points
16 comments2 min readLW link

The Stamp Collector

So8resMay 1, 2015, 11:11 PM
45 points
14 comments6 min readLW link

Sav­ing the world in 80 days: Prologue

Logan RiggsMay 9, 2018, 9:16 PM
12 points
16 comments2 min readLW link

Pro­ject Pro­posal: Con­sid­er­a­tions for trad­ing off ca­pa­bil­ities and safety im­pacts of AI research

David Scott Krueger (formerly: capybaralet)Aug 6, 2019, 10:22 PM
25 points
11 comments2 min readLW link

AI Safety Pr­ereq­ui­sites Course: Ba­sic ab­stract rep­re­sen­ta­tions of computation

RAISEMar 13, 2019, 7:38 PM
28 points
2 comments1 min readLW link

What I Think, If Not Why

Eliezer YudkowskyDec 11, 2008, 5:41 PM
41 points
103 comments4 min readLW link

RFC: Philo­soph­i­cal Con­ser­vatism in AI Align­ment Research

Gordon Seidoh WorleyMay 15, 2018, 3:29 AM
17 points
13 comments1 min readLW link

Pre­dicted AI al­ign­ment event/​meet­ing calendar

rmoehnAug 14, 2019, 7:14 AM
29 points
14 comments1 min readLW link

Sim­plified prefer­ences needed; sim­plified prefer­ences sufficient

Stuart_ArmstrongMar 5, 2019, 7:39 PM
29 points
6 comments3 min readLW link

Re­ward func­tion learn­ing: the value function

Stuart_ArmstrongApr 24, 2018, 4:29 PM
9 points
0 comments11 min readLW link

Re­ward func­tion learn­ing: the learn­ing process

Stuart_ArmstrongApr 24, 2018, 12:56 PM
6 points
11 comments8 min readLW link

Utility ver­sus Re­ward func­tion: par­tial equivalence

Stuart_ArmstrongApr 13, 2018, 2:58 PM
17 points
5 comments5 min readLW link

Full toy model for prefer­ence learning

Stuart_ArmstrongOct 16, 2019, 11:06 AM
20 points
2 comments12 min readLW link

New(ish) AI con­trol ideas

Stuart_ArmstrongOct 31, 2017, 12:52 PM
0 points
0 comments4 min readLW link

Rig­ging is a form of wireheading

Stuart_ArmstrongMay 3, 2018, 12:50 PM
11 points
2 comments1 min readLW link

The re­ward en­g­ineer­ing prob­lem

paulfchristianoJan 16, 2019, 6:47 PM
26 points
3 comments7 min readLW link

AI co­op­er­a­tion in practice

cousin_itJul 30, 2010, 4:21 PM
37 points
166 comments1 min readLW link

Ex­am­ples of AI’s be­hav­ing badly

Stuart_ArmstrongJul 16, 2015, 10:01 AM
41 points
37 comments1 min readLW link

Con­trol­ling Con­stant Programs

Vladimir_NesovSep 5, 2010, 1:45 PM
34 points
33 comments5 min readLW link

Recom­mended Read­ing for Friendly AI Research

Vladimir_NesovOct 9, 2010, 1:46 PM
36 points
30 comments2 min readLW link

Autism, Wat­son, the Tur­ing test, and Gen­eral Intelligence

Stuart_ArmstrongSep 24, 2013, 11:00 AM
11 points
22 comments1 min readLW link

Pes­simism About Un­known Un­knowns In­spires Conservatism

michaelcohenFeb 3, 2020, 2:48 PM
31 points
2 comments5 min readLW link

The Na­tional Se­cu­rity Com­mis­sion on Ar­tifi­cial In­tel­li­gence Wants You (to sub­mit es­says and ar­ti­cles on the fu­ture of gov­ern­ment AI policy)

quanticleJul 18, 2019, 5:21 PM
30 points
0 comments1 min readLW link
(warontherocks.com)

Sys­tems Eng­ineer­ing and the META Program

ryan_bDec 20, 2018, 8:19 PM
30 points
3 comments1 min readLW link

Hu­man er­rors, hu­man values

PhilGoetzApr 9, 2011, 2:50 AM
45 points
138 comments1 min readLW link

ISO: Name of Problem

johnswentworthJul 24, 2018, 5:15 PM
28 points
15 comments1 min readLW link

Muehlhauser-Go­ertzel Dialogue, Part 1

lukeprogMar 16, 2012, 5:12 PM
42 points
161 comments33 min readLW link

Speci­fi­ca­tion gam­ing ex­am­ples in AI

Samuel RødalNov 10, 2018, 12:00 PM
24 points
6 comments1 min readLW link
(docs.google.com)

Su­per­in­tel­li­gence Read­ing Group—Sec­tion 1: Past Devel­op­ments and Pre­sent Capabilities

KatjaGraceSep 16, 2014, 1:00 AM
43 points
233 comments7 min readLW link

[Question] What are the differ­ences be­tween all the iter­a­tive/​re­cur­sive ap­proaches to AI al­ign­ment?

riceissaSep 21, 2019, 2:09 AM
30 points
14 comments2 min readLW link

Al­gorith­mic Similarity

LukasMAug 23, 2019, 4:39 PM
27 points
10 comments11 min readLW link

Direc­tions and desider­ata for AI alignment

paulfchristianoJan 13, 2019, 7:47 AM
47 points
1 comment14 min readLW link

Friendly AI Re­search and Taskification

multifoliateroseDec 14, 2010, 6:30 AM
30 points
47 comments5 min readLW link

Against easy su­per­in­tel­li­gence: the un­fore­seen fric­tion argument

Stuart_ArmstrongJul 10, 2013, 1:47 PM
39 points
48 comments5 min readLW link

[Question] Why are the peo­ple who could be do­ing safety re­search, but aren’t, do­ing some­thing else?

Adam SchollAug 29, 2019, 8:51 AM
27 points
19 comments1 min readLW link

TV’s “Ele­men­tary” Tack­les Friendly AI and X-Risk—“Bella” (Pos­si­ble Spoilers)

pjebyNov 22, 2014, 7:51 PM
48 points
18 comments2 min readLW link

Univer­sal­ity Unwrapped

adamShimiAug 21, 2020, 6:53 PM
28 points
2 comments18 min readLW link

AI Risk and Op­por­tu­nity: Hu­man­ity’s Efforts So Far

lukeprogMar 21, 2012, 2:49 AM
53 points
49 comments23 min readLW link

Learn­ing with catastrophes

paulfchristianoJan 23, 2019, 3:01 AM
27 points
9 comments4 min readLW link

[Question] De­gree of du­pli­ca­tion and co­or­di­na­tion in pro­jects that ex­am­ine com­put­ing prices, AI progress, and re­lated top­ics?

riceissaApr 23, 2019, 12:27 PM
26 points
1 comment2 min readLW link

Solv­ing the AI Race Finalists

Gordon Seidoh WorleyJul 19, 2018, 9:04 PM
24 points
0 comments1 min readLW link
(medium.com)

An Agent is a Wor­ldline in Teg­mark V

komponistoJul 12, 2018, 5:12 AM
24 points
12 comments2 min readLW link

Towards for­mal­iz­ing universality

paulfchristianoJan 13, 2019, 8:39 PM
27 points
19 comments18 min readLW link

Con­cep­tual Anal­y­sis for AI Align­ment

David Scott Krueger (formerly: capybaralet)Dec 30, 2018, 12:46 AM
26 points
3 comments2 min readLW link

Gw­ern’s “Why Tool AIs Want to Be Agent AIs: The Power of Agency”

habrykaMay 5, 2019, 5:11 AM
26 points
3 comments1 min readLW link
(www.gwern.net)

[Question] Why not tool AI?

smitheeJan 19, 2019, 10:18 PM
19 points
10 comments1 min readLW link

Su­per­in­tel­li­gence 16: Tool AIs

KatjaGraceDec 30, 2014, 2:00 AM
12 points
37 comments7 min readLW link

Think­ing of tool AIs

Michele CampoloNov 20, 2019, 9:47 PM
6 points
2 comments4 min readLW link

Re­ply to Holden on ‘Tool AI’

Eliezer YudkowskyJun 12, 2012, 6:00 PM
152 points
357 comments17 min readLW link

Re­ply to Holden on The Sin­gu­lar­ity Institute

lukeprogJul 10, 2012, 11:20 PM
69 points
215 comments26 min readLW link

Levels of AI Self-Im­prove­ment

avturchinApr 29, 2018, 11:45 AM
11 points
0 comments39 min readLW link

AI: re­quire­ments for per­ni­cious policies

Stuart_ArmstrongJul 17, 2015, 2:18 PM
11 points
3 comments3 min readLW link

Tools want to be­come agents

Stuart_ArmstrongJul 4, 2014, 10:12 AM
24 points
81 comments1 min readLW link

Su­per­in­tel­li­gence read­ing group

KatjaGraceAug 31, 2014, 2:59 PM
31 points
2 comments2 min readLW link

Su­per­in­tel­li­gence Read­ing Group 2: Fore­cast­ing AI

KatjaGraceSep 23, 2014, 1:00 AM
17 points
109 comments11 min readLW link

Su­per­in­tel­li­gence Read­ing Group 3: AI and Uploads

KatjaGraceSep 30, 2014, 1:00 AM
17 points
139 comments6 min readLW link

SRG 4: Biolog­i­cal Cog­ni­tion, BCIs, Organizations

KatjaGraceOct 7, 2014, 1:00 AM
14 points
139 comments5 min readLW link

Su­per­in­tel­li­gence 5: Forms of Superintelligence

KatjaGraceOct 14, 2014, 1:00 AM
22 points
114 comments5 min readLW link

Su­per­in­tel­li­gence 6: In­tel­li­gence ex­plo­sion kinetics

KatjaGraceOct 21, 2014, 1:00 AM
15 points
68 comments8 min readLW link

Su­per­in­tel­li­gence 7: De­ci­sive strate­gic advantage

KatjaGraceOct 28, 2014, 1:01 AM
18 points
60 comments6 min readLW link

Su­per­in­tel­li­gence 8: Cog­ni­tive superpowers

KatjaGraceNov 4, 2014, 2:01 AM
14 points
96 comments6 min readLW link

Su­per­in­tel­li­gence 9: The or­thog­o­nal­ity of in­tel­li­gence and goals

KatjaGraceNov 11, 2014, 2:00 AM
13 points
80 comments7 min readLW link

Su­per­in­tel­li­gence 10: In­stru­men­tally con­ver­gent goals

KatjaGraceNov 18, 2014, 2:00 AM
13 points
33 comments5 min readLW link

Su­per­in­tel­li­gence 11: The treach­er­ous turn

KatjaGraceNov 25, 2014, 2:00 AM
16 points
50 comments6 min readLW link

Su­per­in­tel­li­gence 12: Mal­ig­nant failure modes

KatjaGraceDec 2, 2014, 2:02 AM
15 points
51 comments5 min readLW link

Su­per­in­tel­li­gence 13: Ca­pa­bil­ity con­trol methods

KatjaGraceDec 9, 2014, 2:00 AM
14 points
48 comments6 min readLW link

Su­per­in­tel­li­gence 14: Mo­ti­va­tion se­lec­tion methods

KatjaGraceDec 16, 2014, 2:00 AM
9 points
28 comments5 min readLW link

Su­per­in­tel­li­gence 15: Or­a­cles, ge­nies and sovereigns

KatjaGraceDec 23, 2014, 2:01 AM
11 points
30 comments7 min readLW link

Su­per­in­tel­li­gence 17: Mul­tipo­lar scenarios

KatjaGraceJan 6, 2015, 6:44 AM
9 points
38 comments6 min readLW link

Su­per­in­tel­li­gence 18: Life in an al­gorith­mic economy

KatjaGraceJan 13, 2015, 2:00 AM
10 points
52 comments6 min readLW link

Su­per­in­tel­li­gence 19: Post-tran­si­tion for­ma­tion of a singleton

KatjaGraceJan 20, 2015, 2:00 AM
12 points
35 comments7 min readLW link

Su­per­in­tel­li­gence 20: The value-load­ing problem

KatjaGraceJan 27, 2015, 2:00 AM
8 points
21 comments6 min readLW link

Su­per­in­tel­li­gence 21: Value learning

KatjaGraceFeb 3, 2015, 2:01 AM
12 points
33 comments4 min readLW link

Su­per­in­tel­li­gence 22: Emu­la­tion mod­u­la­tion and in­sti­tu­tional design

KatjaGraceFeb 10, 2015, 2:06 AM
13 points
11 comments6 min readLW link

Su­per­in­tel­li­gence 23: Co­her­ent ex­trap­o­lated volition

KatjaGraceFeb 17, 2015, 2:00 AM
11 points
97 comments7 min readLW link

Su­per­in­tel­li­gence 24: Mo­ral­ity mod­els and “do what I mean”

KatjaGraceFeb 24, 2015, 2:00 AM
13 points
47 comments6 min readLW link

Ob­jec­tions to Co­her­ent Ex­trap­o­lated Volition

XiXiDuNov 22, 2011, 10:32 AM
12 points
56 comments3 min readLW link

CEV: co­her­ence ver­sus extrapolation

Stuart_ArmstrongSep 22, 2014, 11:24 AM
21 points
17 comments2 min readLW link

What if AI doesn’t quite go FOOM?

Mass_DriverJun 20, 2010, 12:03 AM
16 points
191 comments5 min readLW link

Su­per­in­tel­li­gence 25: Com­po­nents list for ac­quiring values

KatjaGraceMar 3, 2015, 2:01 AM
11 points
12 comments8 min readLW link

Su­per­in­tel­li­gence 26: Science and tech­nol­ogy strategy

KatjaGraceMar 10, 2015, 1:43 AM
14 points
21 comments6 min readLW link

Su­per­in­tel­li­gence 27: Path­ways and enablers

KatjaGraceMar 17, 2015, 1:00 AM
15 points
21 comments8 min readLW link

Su­per­in­tel­li­gence 28: Collaboration

KatjaGraceMar 24, 2015, 1:29 AM
13 points
21 comments6 min readLW link

Su­per­in­tel­li­gence 29: Crunch time

KatjaGraceMar 31, 2015, 4:24 AM
14 points
27 comments6 min readLW link

Univer­sal agents and util­ity functions

AnjaNov 14, 2012, 4:05 AM
43 points
38 comments6 min readLW link

Look­ing for re­mote writ­ing part­ners (for AI al­ign­ment re­search)

rmoehnOct 1, 2019, 2:16 AM
23 points
4 comments2 min readLW link

Self-Su­per­vised Learn­ing and AGI Safety

Steven ByrnesAug 7, 2019, 2:21 PM
29 points
9 comments12 min readLW link

Which of these five AI al­ign­ment re­search pro­jects ideas are no good?

rmoehnAug 8, 2019, 7:17 AM
25 points
13 comments1 min readLW link

Un­der­stand­ing understanding

mthqAug 23, 2019, 6:10 PM
24 points
1 comment2 min readLW link

Eval­u­at­ing Ex­ist­ing Ap­proaches to AGI Alignment

Gordon Seidoh WorleyMar 27, 2018, 7:57 PM
12 points
0 comments4 min readLW link
(mapandterritory.org)

CEV: a util­i­tar­ian critique

PabloJan 26, 2013, 4:12 PM
32 points
94 comments5 min readLW link

Vingean Reflec­tion: Reli­able Rea­son­ing for Self-Im­prov­ing Agents

So8resJan 15, 2015, 10:47 PM
37 points
5 comments9 min readLW link

Slide deck: In­tro­duc­tion to AI Safety

Aryeh EnglanderJan 29, 2020, 3:57 PM
22 points
0 comments1 min readLW link
(drive.google.com)

The Self-Unaware AI Oracle

Steven ByrnesJul 22, 2019, 7:04 PM
21 points
38 comments8 min readLW link

May Gw­ern.net newslet­ter (w/​GPT-3 com­men­tary)

gwernJun 2, 2020, 3:40 PM
32 points
7 comments1 min readLW link
(www.gwern.net)

Build a Causal De­ci­sion Theorist

michaelcohenSep 23, 2019, 8:43 PM
1 point
14 comments4 min readLW link

A trick for Safer GPT-N

RaziedAug 23, 2020, 12:39 AM
7 points
1 comment2 min readLW link

In­tro­duc­tion To The In­fra-Bayesi­anism Sequence

Aug 26, 2020, 8:31 PM
104 points
64 comments14 min readLW link2 reviews

Model splin­ter­ing: mov­ing from one im­perfect model to another

Stuart_ArmstrongAug 27, 2020, 11:53 AM
74 points
10 comments33 min readLW link

Al­gorith­mic Progress in Six Domains

lukeprogAug 3, 2013, 2:29 AM
38 points
32 comments1 min readLW link

[Question] What are some good ex­am­ples of in­cor­rigi­bil­ity?

RyanCareyApr 28, 2019, 12:22 AM
23 points
17 comments1 min readLW link

Safely and use­fully spec­tat­ing on AIs op­ti­miz­ing over toy worlds

AlexMennenJul 31, 2018, 6:30 PM
24 points
16 comments2 min readLW link

Up­dates and ad­di­tions to “Embed­ded Agency”

Aug 29, 2020, 4:22 AM
73 points
1 comment3 min readLW link

[LINK] Ter­ror­ists tar­get AI researchers

RobertLumleySep 15, 2011, 2:22 PM
32 points
35 comments1 min readLW link

Analysing: Danger­ous mes­sages from fu­ture UFAI via Oracles

Stuart_ArmstrongNov 22, 2019, 2:17 PM
22 points
16 comments4 min readLW link

Ex­plor­ing Botworld

So8resApr 30, 2014, 10:29 PM
34 points
2 comments6 min readLW link

in­ter­pret­ing GPT: the logit lens

nostalgebraistAug 31, 2020, 2:47 AM
158 points
32 comments11 min readLW link

From GPT to AGI

ChristianKlAug 31, 2020, 1:28 PM
6 points
7 comments1 min readLW link

Log­i­cal or Con­nec­tion­ist AI?

Eliezer YudkowskyNov 17, 2008, 8:03 AM
39 points
26 comments9 min readLW link

Ar­tifi­cial In­tel­li­gence and Life Sciences (Why Big Data is not enough to cap­ture biolog­i­cal sys­tems?)

HansNaujJan 15, 2020, 1:59 AM
6 points
3 comments6 min readLW link

The Case against Killer Robots (link)

D_AlexNov 20, 2012, 7:47 AM
12 points
25 comments1 min readLW link

Near-Term Risk: Killer Robots a Threat to Free­dom and Democracy

EpiphanyJun 14, 2013, 6:28 AM
15 points
105 comments2 min readLW link

Muehlhauser-Wang Dialogue

lukeprogApr 22, 2012, 10:40 PM
34 points
288 comments12 min readLW link

Google may be try­ing to take over the world

[deleted]Jan 27, 2014, 9:33 AM
33 points
133 comments1 min readLW link

Gw­ern about cen­taurs: there is no chance that any use­ful man+ma­chine com­bi­na­tion will work to­gether for more than 10 years, as hu­mans soon will be only a liability

avturchinDec 15, 2018, 9:32 PM
31 points
4 comments1 min readLW link
(www.reddit.com)

Q&A with Abram Dem­ski on risks from AI

XiXiDuJan 17, 2012, 9:43 AM
33 points
71 comments9 min readLW link

Q&A with ex­perts on risks from AI #2

XiXiDuJan 9, 2012, 7:40 PM
22 points
29 comments7 min readLW link

Let the AI teach you how to flirt

DirectedEvolutionSep 17, 2020, 7:04 PM
47 points
11 comments2 min readLW link

On­line AI Safety Dis­cus­sion Day

Linda LinseforsOct 8, 2020, 12:11 PM
5 points
0 comments1 min readLW link

New(ish) AI con­trol ideas

Stuart_ArmstrongMar 5, 2015, 5:03 PM
34 points
14 comments3 min readLW link

Not Tak­ing Over the World

Eliezer YudkowskyDec 15, 2008, 10:18 PM
35 points
97 comments4 min readLW link

Nat­u­ral­is­tic trust among AIs: The parable of the the­sis ad­vi­sor’s theorem

BenyaDec 15, 2013, 8:32 AM
36 points
20 comments6 min readLW link

The Solomonoff Prior is Malign

Mark XuOct 14, 2020, 1:33 AM
148 points
52 comments16 min readLW link3 reviews

Twenty-three AI al­ign­ment re­search pro­ject definitions

rmoehnFeb 3, 2020, 10:21 PM
23 points
0 comments6 min readLW link

When Good­hart­ing is op­ti­mal: lin­ear vs diminish­ing re­turns, un­likely vs likely, and other factors

Stuart_ArmstrongDec 19, 2019, 1:55 PM
24 points
18 comments7 min readLW link

[Question] As a Washed Up Former Data Scien­tist and Ma­chine Learn­ing Re­searcher What Direc­tion Should I Go In Now?

DarklightOct 19, 2020, 8:13 PM
13 points
7 comments3 min readLW link

Ar­tifi­cial Mys­te­ri­ous Intelligence

Eliezer YudkowskyDec 7, 2008, 8:05 PM
29 points
24 comments5 min readLW link

A Pre­ma­ture Word on AI

Eliezer YudkowskyMay 31, 2008, 5:48 PM
26 points
69 comments8 min readLW link

Let’s reim­ple­ment EURISKO!

cousin_itJun 11, 2009, 4:28 PM
23 points
162 comments1 min readLW link

Cor­rigi­bil­ity thoughts III: ma­nipu­lat­ing ver­sus deceiving

Stuart_ArmstrongJan 18, 2017, 3:57 PM
3 points
0 comments1 min readLW link

[Question] [Meta] Do you want AIS We­bi­nars?

Linda LinseforsMar 21, 2020, 4:01 PM
18 points
7 comments1 min readLW link

New ar­ti­cle from Oren Etzioni

Aryeh EnglanderFeb 25, 2020, 3:25 PM
19 points
19 comments2 min readLW link

Sin­gle­tons Rule OK

Eliezer YudkowskyNov 30, 2008, 4:45 PM
20 points
47 comments5 min readLW link

“On the Im­pos­si­bil­ity of Su­per­sized Machines”

crmflynnMar 31, 2017, 11:32 PM
24 points
4 comments1 min readLW link
(philpapers.org)

Non­sen­tient Optimizers

Eliezer YudkowskyDec 27, 2008, 2:32 AM
34 points
48 comments6 min readLW link

Build­ing Some­thing Smarter

Eliezer YudkowskyNov 2, 2008, 5:00 PM
22 points
57 comments4 min readLW link

Let’s Read: an es­say on AI Theology

Yuxi_LiuJul 4, 2019, 7:50 AM
22 points
9 comments7 min readLW link

Wanted: Python open source volunteers

Eliezer YudkowskyMar 11, 2009, 4:59 AM
16 points
13 comments1 min readLW link

Equil­ibrium and prior se­lec­tion prob­lems in mul­ti­po­lar deployment

JesseCliftonApr 2, 2020, 8:06 PM
20 points
11 comments11 min readLW link

[Question] The Si­mu­la­tion Epiphany Problem

Koen.HoltmanOct 31, 2019, 10:12 PM
15 points
13 comments4 min readLW link

Chang­ing ac­cepted pub­lic opinion and Skynet

RokoMay 22, 2009, 11:05 AM
17 points
71 comments2 min readLW link

In­tro­duc­ing CADIE

MBlumeApr 1, 2009, 7:32 AM
0 points
8 comments1 min readLW link

Deep­mind Plans for Rat-Level AI

moridinamaelAug 18, 2016, 4:26 PM
34 points
9 comments1 min readLW link

“Robot sci­en­tists can think for them­selves”

CronoDASApr 2, 2009, 9:16 PM
−1 points
11 comments1 min readLW link

Au­tomat­ing rea­son­ing about the fu­ture at Ought

jungofthewonNov 9, 2020, 9:51 PM
17 points
0 comments1 min readLW link
(ought.org)

Neu­ral pro­gram syn­the­sis is a dan­ger­ous technology

syllogismJan 12, 2018, 4:19 PM
10 points
6 comments2 min readLW link

New, Brief Pop­u­lar-Level In­tro­duc­tion to AI Risks and Superintelligence

LyleNJan 23, 2015, 3:43 PM
33 points
3 comments1 min readLW link

In the be­gin­ning, Dart­mouth cre­ated the AI and the hype

Stuart_ArmstrongJan 24, 2013, 4:49 PM
33 points
22 comments1 min readLW link

Fun­da­men­tal Philo­soph­i­cal Prob­lems In­her­ent in AI discourse

AlexSadlerSep 16, 2018, 9:03 PM
23 points
1 comment17 min readLW link

Re­search Pri­ori­ties for Ar­tifi­cial In­tel­li­gence: An Open Letter

jimrandomhJan 11, 2015, 7:52 PM
38 points
11 comments1 min readLW link

[Question] How can I help re­search Friendly AI?

avichapmanJul 9, 2019, 12:15 AM
22 points
3 comments1 min readLW link

FAI Re­search Con­straints and AGI Side Effects

JustinShovelainJun 3, 2015, 7:25 PM
26 points
59 comments7 min readLW link

[Question] How to deal with a mis­lead­ing con­fer­ence talk about AI risk?

rmoehnJun 27, 2019, 9:04 PM
21 points
13 comments4 min readLW link

Im­pli­ca­tions of Quan­tum Com­put­ing for Ar­tifi­cial In­tel­li­gence Align­ment Research

Aug 22, 2019, 10:33 AM
24 points
3 comments13 min readLW link

[Question] How can labour pro­duc­tivity growth be an in­di­ca­tor of au­toma­tion?

PolytoposNov 16, 2020, 9:16 PM
2 points
5 comments1 min readLW link

[Question] Should I do it?

MrLightNov 19, 2020, 1:08 AM
−3 points
16 comments2 min readLW link

My in­tel­lec­tual influences

Richard_NgoNov 22, 2020, 6:00 PM
92 points
1 comment5 min readLW link
(thinkingcomplete.blogspot.com)

Del­e­gated agents in prac­tice: How com­pa­nies might end up sel­l­ing AI ser­vices that act on be­half of con­sumers and coal­i­tions, and what this im­plies for safety research

RemmeltNov 26, 2020, 11:17 AM
7 points
5 comments4 min readLW link

SETI Predictions

hippkeNov 30, 2020, 8:09 PM
23 points
8 comments1 min readLW link

What hap­pens when your be­liefs fully propagate

AlexeiFeb 14, 2012, 7:53 AM
29 points
79 comments7 min readLW link

In­ter­ac­tive ex­plo­ra­tion of LessWrong and other large col­lec­tions of documents

Dec 20, 2020, 7:06 PM
49 points
9 comments10 min readLW link

[Question] Will AGI have “hu­man” flaws?

Agustinus TheodorusDec 23, 2020, 3:43 AM
1 point
2 comments1 min readLW link

Op­ti­mum num­ber of sin­gle points of failure

Douglas_ReayMar 14, 2018, 1:30 PM
7 points
4 comments4 min readLW link

Don’t put all your eggs in one basket

Douglas_ReayMar 15, 2018, 8:07 AM
5 points
0 comments7 min readLW link

Defect or Cooperate

Douglas_ReayMar 16, 2018, 2:12 PM
4 points
5 comments6 min readLW link

En­vi­ron­ments for kil­ling AIs

Douglas_ReayMar 17, 2018, 3:23 PM
3 points
1 comment9 min readLW link

The ad­van­tage of not be­ing open-ended

Douglas_ReayMar 18, 2018, 1:50 PM
7 points
2 comments6 min readLW link

Metamorphosis

Douglas_ReayApr 12, 2018, 9:53 PM
2 points
0 comments4 min readLW link

Believ­able Promises

Douglas_ReayApr 16, 2018, 4:17 PM
5 points
0 comments5 min readLW link

Trust­wor­thy Computing

Douglas_ReayApr 10, 2018, 7:55 AM
9 points
1 comment6 min readLW link

Edge of the Cliff

akaTricksterJan 5, 2021, 5:21 PM
1 point
0 comments5 min readLW link

[Question] How is re­in­force­ment learn­ing pos­si­ble in non-sen­tient agents?

SomeoneKindJan 5, 2021, 8:57 PM
3 points
5 comments1 min readLW link

AI Align­ment Us­ing Re­v­erse Simulation

Sven NilsenJan 12, 2021, 8:48 PM
1 point
0 comments1 min readLW link

A toy model of the con­trol problem

Stuart_ArmstrongSep 16, 2015, 2:59 PM
36 points
24 comments3 min readLW link

On the na­ture of pur­pose

Nora_AmmannJan 22, 2021, 8:30 AM
28 points
15 comments9 min readLW link

Learn­ing Nor­ma­tivity: Language

BunthutFeb 5, 2021, 10:26 PM
14 points
4 comments8 min readLW link

Sin­gu­lar­ity&phase tran­si­tion-2. A pri­ori prob­a­bil­ity and ways to check.

Valentin2026Feb 8, 2021, 2:21 AM
1 point
0 comments3 min readLW link

Non­per­son Predicates

Eliezer YudkowskyDec 27, 2008, 1:47 AM
52 points
176 comments6 min readLW link

Map­ping the Con­cep­tual Ter­ri­tory in AI Ex­is­ten­tial Safety and Alignment

jbkjrFeb 12, 2021, 7:55 AM
15 points
0 comments26 min readLW link

2021-03-01 Na­tional Library of Medicine Pre­sen­ta­tion: “At­las of AI: Map­ping the so­cial and eco­nomic forces be­hind AI”

IrenicTruthFeb 17, 2021, 6:23 PM
1 point
0 comments2 min readLW link

Chaotic era: avoid or sur­vive?

Valentin2026Feb 22, 2021, 1:34 AM
3 points
3 comments2 min readLW link

Suffer­ing-Fo­cused Ethics in the In­finite Uni­verse. How can we re­deem our­selves if Mul­ti­verse Im­mor­tal­ity is real and sub­jec­tive death is im­pos­si­ble.

Szymon KucharskiFeb 24, 2021, 9:02 PM
−3 points
4 comments70 min readLW link

AIDun­geon 3.1

Yair HalberstadtMar 1, 2021, 5:56 AM
2 points
0 comments2 min readLW link

Phys­i­cal­ism im­plies ex­pe­rience never dies. So what am I go­ing to ex­pe­rience af­ter it does?

Szymon KucharskiMar 14, 2021, 2:45 PM
−2 points
1 comment30 min readLW link

An An­tropic Ar­gu­ment for Post-sin­gu­lar­ity Antinatalism

monkaapMar 16, 2021, 5:40 PM
3 points
4 comments3 min readLW link

[Question] Is a Self-Iter­at­ing AGI Vuln­er­a­ble to Thomp­son-style Tro­jans?

sxaeMar 25, 2021, 2:46 PM
15 points
7 comments3 min readLW link

AI or­a­cles on blockchain

CaravaggioApr 6, 2021, 8:13 PM
5 points
0 comments3 min readLW link

What if AGI is near?

Wulky WilkinsenApr 14, 2021, 12:05 AM
11 points
5 comments1 min readLW link

Re­view of “Why AI is Harder Than We Think”

electroswingApr 30, 2021, 6:14 PM
40 points
10 comments8 min readLW link

Thoughts on the Align­ment Im­pli­ca­tions of Scal­ing Lan­guage Models

leogaoJun 2, 2021, 9:32 PM
79 points
11 comments17 min readLW link

[Question] Sup­pose $1 billion is given to AI Safety. How should it be spent?

hunterglennMay 15, 2021, 11:24 PM
23 points
2 comments1 min readLW link

Con­trol­ling In­tel­li­gent Agents The Only Way We Know How: Ideal Bureau­cratic Struc­ture (IBS)

Justin BullockMay 24, 2021, 12:53 PM
11 points
11 comments6 min readLW link

Cu­rated con­ver­sa­tions with brilli­ant rationalists

spencergMay 28, 2021, 2:23 PM
153 points
18 comments6 min readLW link

Se­cu­rity Mind­set and Or­di­nary Paranoia

Eliezer YudkowskyNov 25, 2017, 5:53 PM
98 points
24 comments29 min readLW link

The Anti-Carter Basilisk

Jon GilbertMay 26, 2021, 10:56 PM
0 points
0 comments2 min readLW link

Pa­ram­e­ter counts in Ma­chine Learning

Jun 19, 2021, 4:04 PM
47 points
16 comments7 min readLW link

Ir­ra­tional Modesty

Tomás B.Jun 20, 2021, 7:38 PM
132 points
7 comments1 min readLW link

[Question] Thoughts on a “Se­quences In­spired” PhD Topic

goose000Jun 17, 2021, 8:36 PM
7 points
2 comments2 min readLW link

Some al­ter­na­tives to “Friendly AI”

lukeprogJun 15, 2014, 7:53 PM
30 points
44 comments2 min readLW link

In­tel­li­gence with­out Consciousness

Andrew VlahosJul 7, 2021, 5:27 AM
13 points
5 comments1 min readLW link

[Question] What would it look like if it looked like AGI was very near?

Tomás B.Jul 12, 2021, 3:22 PM
52 points
25 comments1 min readLW link

Is the ar­gu­ment that AI is an xrisk valid?

MACannonJul 19, 2021, 1:20 PM
5 points
62 comments1 min readLW link
(onlinelibrary.wiley.com)

[Question] Jay­ne­sian in­ter­pre­ta­tion—How does “es­ti­mat­ing prob­a­bil­ities” make sense?

Haziq MuhammadJul 21, 2021, 9:36 PM
4 points
40 comments1 min readLW link

The biolog­i­cal in­tel­li­gence explosion

Rob LucasJul 25, 2021, 1:08 PM
8 points
6 comments4 min readLW link

[Question] Do Bayesi­ans like Bayesian model Aver­ag­ing?

Haziq MuhammadAug 2, 2021, 12:24 PM
4 points
13 comments1 min readLW link

[Question] Ques­tion about Test-sets and Bayesian ma­chine learn­ing

Haziq MuhammadAug 9, 2021, 5:16 PM
2 points
8 comments1 min readLW link

[Question] Halpern’s pa­per—A re­fu­ta­tion of Cox’s the­o­rem?

Haziq MuhammadAug 11, 2021, 9:25 AM
11 points
7 comments1 min readLW link

New GPT-3 competitor

Quintin PopeAug 12, 2021, 7:05 AM
32 points
10 comments1 min readLW link

[Question] Jaynes-Cox Prob­a­bil­ity: Are plau­si­bil­ities ob­jec­tive?

Haziq MuhammadAug 12, 2021, 2:23 PM
9 points
17 comments1 min readLW link

A gen­tle apoc­a­lypse

pchvykovAug 16, 2021, 5:03 AM
3 points
5 comments3 min readLW link

[Question] Is it worth mak­ing a database for moral pre­dic­tions?

Jonas HallgrenAug 16, 2021, 2:51 PM
1 point
0 comments2 min readLW link

Cyn­i­cal ex­pla­na­tions of FAI crit­ics (in­clud­ing my­self)

Wei DaiAug 13, 2012, 9:19 PM
31 points
49 comments1 min readLW link

[Question] Has Van Horn fixed Cox’s the­o­rem?

Haziq MuhammadAug 29, 2021, 6:36 PM
9 points
1 comment1 min readLW link

The Gover­nance Prob­lem and the “Pretty Good” X-Risk

Zach Stein-PerlmanAug 29, 2021, 6:00 PM
5 points
2 comments11 min readLW link

Limits of and to (ar­tifi­cial) Intelligence

MoritzGAug 25, 2019, 10:16 PM
1 point
3 comments7 min readLW link

Grokking the In­ten­tional Stance

jbkjrAug 31, 2021, 3:49 PM
41 points
20 comments20 min readLW link

In­tel­li­gence, Fast and Slow

Mateusz MazurkiewiczSep 1, 2021, 7:52 PM
−3 points
2 comments2 min readLW link

[Question] Is LessWrong dead with­out Cox’s the­o­rem?

Haziq MuhammadSep 4, 2021, 5:45 AM
−2 points
88 comments1 min readLW link

Align­ment via man­u­ally im­ple­ment­ing the util­ity function

ChantielSep 7, 2021, 8:20 PM
1 point
6 comments2 min readLW link

Pivot!

Carlos RamirezSep 12, 2021, 8:39 PM
−19 points
5 comments1 min readLW link

The Me­taethics and Nor­ma­tive Ethics of AGI Value Align­ment: Many Ques­tions, Some Implications

Eleos Arete CitriniSep 16, 2021, 4:13 PM
6 points
0 comments8 min readLW link

Why will AI be dan­ger­ous?

LegionnaireFeb 4, 2022, 11:41 PM
37 points
14 comments1 min readLW link

Oc­cam’s Ra­zor and the Univer­sal Prior

Peter ChatainOct 3, 2021, 3:23 AM
22 points
5 comments21 min readLW link

We’re Red­wood Re­search, we do ap­plied al­ign­ment re­search, AMA

Nate ThomasOct 6, 2021, 5:51 AM
56 points
3 comments2 min readLW link
(forum.effectivealtruism.org)

[LINK] Wait But Why—The AI Revolu­tion Part 2

Adam ZernerFeb 4, 2015, 4:02 PM
27 points
88 comments1 min readLW link

Slate Star Codex Notes on the Asilo­mar Con­fer­ence on Benefi­cial AI

Gunnar_ZarnckeFeb 7, 2017, 12:14 PM
24 points
8 comments1 min readLW link
(slatestarcodex.com)

Three Ap­proaches to “Friendli­ness”

Wei DaiJul 17, 2013, 7:46 AM
32 points
86 comments3 min readLW link

P₂B: Plan to P₂B Better

Oct 24, 2021, 3:21 PM
33 points
14 comments6 min readLW link

A Roadmap to a Post-Scarcity Economy

lorepieriOct 30, 2021, 9:04 AM
3 points
3 comments1 min readLW link

What is the link be­tween al­tru­ism and in­tel­li­gence?

Ruralvisitor83Nov 3, 2021, 11:59 PM
3 points
13 comments1 min readLW link

Model­ing the im­pact of safety agendas

Ben CottierNov 5, 2021, 7:46 PM
51 points
6 comments10 min readLW link

[Question] Does any­one know what Marvin Min­sky is talk­ing about here?

delton137Nov 19, 2021, 12:56 AM
1 point
6 comments3 min readLW link

In­te­grat­ing Three Models of (Hu­man) Cognition

jbkjrNov 23, 2021, 1:06 AM
29 points
4 comments32 min readLW link

[Question] I cur­rently trans­late AGI-re­lated texts to Rus­sian. Is that use­ful?

TapataktNov 27, 2021, 5:51 PM
29 points
7 comments1 min readLW link

Ques­tion/​Is­sue with the 5/​10 Problem

acgtNov 29, 2021, 10:45 AM
6 points
3 comments3 min readLW link

Can solip­sism be dis­proven?

nx2059Dec 4, 2021, 8:24 AM
−2 points
5 comments2 min readLW link

[Question] Misc. ques­tions about EfficientZero

Daniel KokotajloDec 4, 2021, 7:45 PM
51 points
17 comments1 min readLW link

Fram­ing ap­proaches to al­ign­ment and the hard prob­lem of AI cognition

ryan_greenblattDec 15, 2021, 7:06 PM
8 points
15 comments27 min readLW link

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

madhu_likaDec 7, 2021, 7:37 PM
1 point
0 comments1 min readLW link

What role should evolu­tion­ary analo­gies play in un­der­stand­ing AI take­off speeds?

anson.hoDec 11, 2021, 1:19 AM
14 points
0 comments42 min readLW link

Mo­ti­va­tions, Nat­u­ral Selec­tion, and Cur­ricu­lum Engineering

Oliver SourbutDec 16, 2021, 1:07 AM
16 points
0 comments42 min readLW link

Emer­gent mod­u­lar­ity and safety

Richard_NgoOct 21, 2021, 1:54 AM
31 points
15 comments3 min readLW link

Ev­i­dence Sets: Towards In­duc­tive-Bi­ases based Anal­y­sis of Pro­saic AGI

bayesian_kittenDec 16, 2021, 10:41 PM
22 points
10 comments21 min readLW link

Univer­sal­ity and the “Filter”

maggiehayesDec 16, 2021, 12:47 AM
10 points
3 comments11 min readLW link

[Question] Can you prove that 0 = 1?

purplelightFeb 4, 2022, 9:31 PM
−10 points
4 comments1 min readLW link

Ex­pec­ta­tions In­fluence Real­ity (and AI)

purplelightFeb 4, 2022, 9:31 PM
0 points
3 comments7 min readLW link

[Question] What ques­tions do you have about do­ing work on AI safety?

peterbarnettDec 21, 2021, 4:36 PM
13 points
8 comments1 min readLW link

Re­views of “Is power-seek­ing AI an ex­is­ten­tial risk?”

Joe CarlsmithDec 16, 2021, 8:48 PM
76 points
20 comments1 min readLW link

Elic­it­ing La­tent Knowl­edge Via Hy­po­thet­i­cal Sensors

John_MaxwellDec 30, 2021, 3:53 PM
38 points
2 comments6 min readLW link

Lat­eral Think­ing (AI safety HPMOR fan­fic)

SlytherinsMonsterJan 2, 2022, 11:50 PM
75 points
9 comments5 min readLW link

SONN : What’s Next ?

D𝜋Jan 9, 2022, 8:15 AM
−17 points
3 comments1 min readLW link

An Open Philan­thropy grant pro­posal: Causal rep­re­sen­ta­tion learn­ing of hu­man preferences

PabloAMCJan 11, 2022, 11:28 AM
19 points
6 comments8 min readLW link

Ac­tion: Help ex­pand fund­ing for AI Safety by co­or­di­nat­ing on NSF response

Evan R. MurphyJan 19, 2022, 10:47 PM
23 points
8 comments3 min readLW link

Emo­tions = Re­ward Functions

jpyykkoJan 20, 2022, 6:46 PM
16 points
10 comments5 min readLW link

[Question] Is AI Align­ment a pseu­do­science?

mocny-chlapikJan 23, 2022, 10:32 AM
21 points
41 comments1 min readLW link

De­con­fus­ing Deception

J BostockJan 29, 2022, 4:43 PM
26 points
6 comments2 min readLW link

Re­vis­it­ing Brave New World Re­vis­ited (Chap­ter 3)

Justin BullockFeb 1, 2022, 5:17 PM
5 points
0 comments10 min readLW link

[Question] Do mesa-op­ti­miza­tion prob­lems cor­re­late with low-slack?

sudoFeb 4, 2022, 9:11 PM
1 point
1 comment1 min readLW link

Can the laws of physics/​na­ture pre­vent hell?

superads91Feb 6, 2022, 8:39 PM
−7 points
10 comments2 min readLW link

Ngo and Yud­kowsky on sci­en­tific rea­son­ing and pivotal acts

Feb 21, 2022, 8:54 PM
51 points
13 comments35 min readLW link

Bet­ter a Brave New World than a dead one

YitzFeb 25, 2022, 11:11 PM
8 points
5 comments4 min readLW link

Be­ing an in­di­vi­d­ual al­ign­ment grantmaker

A_donorFeb 28, 2022, 8:02 PM
64 points
5 comments2 min readLW link

How to de­velop safe superintelligence

martillopartMar 1, 2022, 9:57 PM
−5 points
3 comments13 min readLW link

Deep Dives: My Ad­vice for Pur­su­ing Work in Re­search

scasperMar 11, 2022, 5:56 PM
21 points
2 comments3 min readLW link

One pos­si­ble ap­proach to de­velop the best pos­si­ble gen­eral learn­ing algorithm

martillopartMar 14, 2022, 7:24 PM
3 points
0 comments7 min readLW link

[Question] Our time in his­tory as ev­i­dence for simu­la­tion the­ory?

Garrett GarzonieMar 18, 2022, 3:35 AM
3 points
2 comments1 min readLW link

The weak­est ar­gu­ments for and against hu­man level AI

Stuart_ArmstrongAug 15, 2012, 11:04 AM
22 points
34 comments1 min readLW link

Chris­ti­ano and Yud­kowsky on AI pre­dic­tions and hu­man intelligence

Eliezer YudkowskyFeb 23, 2022, 9:34 PM
69 points
35 comments42 min readLW link

Even more cu­rated con­ver­sa­tions with brilli­ant rationalists

spencergMar 21, 2022, 11:49 PM
57 points
0 comments15 min readLW link

Man­hat­tan pro­ject for al­igned AI

Chris van MerwijkMar 27, 2022, 11:41 AM
34 points
6 comments2 min readLW link

Gears-Level Men­tal Models of Trans­former Interpretability

KevinRoWangMar 29, 2022, 8:09 PM
56 points
4 comments6 min readLW link

Meta wants to use AI to write Wikipe­dia ar­ti­cles; I am Ner­vous™

YitzMar 30, 2022, 7:05 PM
14 points
12 comments1 min readLW link

[Question] If AGI were com­ing in a year, what should we do?

MichaelStJulesApr 1, 2022, 12:41 AM
20 points
16 comments1 min readLW link

On Agent In­cen­tives to Ma­nipu­late Hu­man Feed­back in Multi-Agent Re­ward Learn­ing Scenarios

Francis Rhys WardApr 3, 2022, 6:20 PM
27 points
11 comments8 min readLW link

[Question] How to write a LW se­quence to learn a topic?

PabloAMCApr 3, 2022, 8:09 PM
3 points
2 comments1 min readLW link

Save Hu­man­ity! Breed Sapi­ent Oc­to­puses!

Yair HalberstadtApr 5, 2022, 6:39 PM
54 points
17 comments1 min readLW link

What Should We Op­ti­mize—A Conversation

Johannes C. MayerApr 7, 2022, 3:47 AM
1 point
0 comments14 min readLW link

The Ex­plana­tory Gap of AI

David ValdmanApr 7, 2022, 6:28 PM
1 point
0 comments4 min readLW link

Progress re­port 3: clus­ter­ing trans­former neurons

Nathan Helm-BurgerApr 5, 2022, 11:13 PM
5 points
0 comments2 min readLW link

God­shat­ter Ver­sus Leg­i­bil­ity: A Fun­da­men­tally Differ­ent Ap­proach To AI Alignment

LukeOnlineApr 9, 2022, 9:43 PM
11 points
14 comments7 min readLW link

Is Fish­e­rian Ru­n­away Gra­di­ent Hack­ing?

Ryan KiddApr 10, 2022, 1:47 PM
15 points
7 comments4 min readLW link

The Glitch And Notes On Digi­tal Beings

GhvstApr 11, 2022, 7:46 PM
−4 points
0 comments2 min readLW link
(ghvsted.com)

Post-his­tory is writ­ten by the martyrs

VeedracApr 11, 2022, 3:45 PM
37 points
2 comments19 min readLW link
(www.royalroad.com)

An AI-in-a-box suc­cess model

azsantoskApr 11, 2022, 10:28 PM
16 points
1 comment10 min readLW link

Ra­tion­al­ist Should Win. Not Dy­ing with Dig­nity and Fund­ing WBE.

CitizenTenApr 12, 2022, 2:14 AM
23 points
15 comments5 min readLW link

Re­ward model hack­ing as a challenge for re­ward learning

Erik JennerApr 12, 2022, 9:39 AM
25 points
1 comment9 min readLW link

Is tech­ni­cal AI al­ign­ment re­search a net pos­i­tive?

cranberry_bearApr 12, 2022, 1:07 PM
4 points
2 comments2 min readLW link

Another list of the­o­ries of im­pact for interpretability

Beth BarnesApr 13, 2022, 1:29 PM
32 points
1 comment5 min readLW link

Some rea­sons why a pre­dic­tor wants to be a consequentialist

Lauro LangoscoApr 15, 2022, 3:02 PM
23 points
16 comments5 min readLW link

Red­wood Re­search is hiring for sev­eral roles (Oper­a­tions and Tech­ni­cal)

Apr 14, 2022, 4:57 PM
29 points
0 comments1 min readLW link

[Question] Con­vince me that hu­man­ity *isn’t* doomed by AGI

YitzApr 15, 2022, 5:26 PM
60 points
53 comments1 min readLW link

Another ar­gu­ment that you will let the AI out of the box

Garrett BakerApr 19, 2022, 9:54 PM
8 points
16 comments2 min readLW link

For ev­ery choice of AGI difficulty, con­di­tion­ing on grad­ual take-off im­plies shorter timelines.

Francis Rhys WardApr 21, 2022, 7:44 AM
29 points
13 comments3 min readLW link

Reflec­tions on My Own Miss­ing Mood

Lone PineApr 21, 2022, 4:19 PM
51 points
25 comments5 min readLW link

Key ques­tions about ar­tifi­cial sen­tience: an opinionated guide

RobboApr 25, 2022, 12:09 PM
45 points
31 comments18 min readLW link

[Question] What is be­ing im­proved in re­cur­sive self im­prove­ment?

Lone PineApr 25, 2022, 6:30 PM
7 points
7 comments1 min readLW link

Why Copi­lot Ac­cel­er­ates Timelines

Michaël TrazziApr 26, 2022, 10:06 PM
35 points
14 comments7 min readLW link

[Question] Is it de­sir­able for the first AGI to be con­scious?

Charbel-RaphaëlMay 1, 2022, 9:29 PM
5 points
12 comments1 min readLW link

[Question] What Was Your Best /​ Most Suc­cess­ful DALL-E 2 Prompt?

EvidentialMay 4, 2022, 3:16 AM
1 point
0 comments1 min readLW link

Ne­go­ti­at­ing Up and Down the Si­mu­la­tion Hier­ar­chy: Why We Might Sur­vive the Unal­igned Singularity

David UdellMay 4, 2022, 4:21 AM
24 points
16 comments2 min readLW link

High-stakes al­ign­ment via ad­ver­sar­ial train­ing [Red­wood Re­search re­port]

May 5, 2022, 12:59 AM
136 points
29 comments9 min readLW link

Deriv­ing Con­di­tional Ex­pected Utility from Pareto-Effi­cient Decisions

Thomas KwaMay 5, 2022, 3:21 AM
23 points
1 comment6 min readLW link

Tran­scripts of in­ter­views with AI researchers

Vael GatesMay 9, 2022, 5:57 AM
160 points
8 comments2 min readLW link

Agency As a Nat­u­ral Abstraction

Thane RuthenisMay 13, 2022, 6:02 PM
55 points
9 comments13 min readLW link

Pre­dict­ing the Elec­tions with Deep Learn­ing—Part 1 - Results

Quentin ChenevierMay 14, 2022, 12:54 PM
0 points
0 comments1 min readLW link

On sav­ing one’s world

Rob BensingerMay 17, 2022, 7:53 PM
190 points
5 comments1 min readLW link

In defence of flailing

acylhalideJun 18, 2022, 5:26 AM
10 points
14 comments4 min readLW link

Re­shap­ing the AI Industry

Thane RuthenisMay 29, 2022, 10:54 PM
143 points
34 comments21 min readLW link

Science for the Pos­si­ble World

Zechen ZhangMay 23, 2022, 2:01 PM
7 points
0 comments3 min readLW link

Syn­thetic Me­dia and The Fu­ture of Film

ifalphaMay 24, 2022, 5:54 AM
35 points
13 comments8 min readLW link

Ex­plain­ing in­ner al­ign­ment to myself

Jeremy GillenMay 24, 2022, 11:10 PM
9 points
2 comments10 min readLW link

A dis­cus­sion of the pa­per, “Large Lan­guage Models are Zero-Shot Rea­son­ers”

HiroSakurabaMay 26, 2022, 3:55 PM
7 points
0 comments4 min readLW link

On in­ner and outer al­ign­ment, and their confusion

Nina PanicksseryMay 26, 2022, 9:56 PM
6 points
7 comments4 min readLW link

RL with KL penalties is bet­ter seen as Bayesian inference

May 25, 2022, 9:23 AM
90 points
15 comments12 min readLW link

Bits of Op­ti­miza­tion Can Only Be Lost Over A Distance

johnswentworthMay 23, 2022, 6:55 PM
26 points
15 comments2 min readLW link

Gra­da­tions of Agency

Daniel KokotajloMay 23, 2022, 1:10 AM
40 points
6 comments5 min readLW link

Utilitarianism

C S SRUTHIMay 28, 2022, 7:35 PM
0 points
1 comment1 min readLW link

Distil­led—AGI Safety from First Principles

Harrison GMay 29, 2022, 12:57 AM
8 points
1 comment14 min readLW link

Mul­ti­ple AIs in boxes, eval­u­at­ing each other’s alignment

Moebius314May 29, 2022, 8:36 AM
7 points
0 comments14 min readLW link

The im­pact you might have work­ing on AI safety

Fabien RogerMay 29, 2022, 4:31 PM
5 points
1 comment4 min readLW link

My SERI MATS Application

Daniel PalekaMay 30, 2022, 2:04 AM
16 points
0 comments8 min readLW link

[Question] A ter­rify­ing var­i­ant of Boltz­mann’s brains problem

Zeruel017May 30, 2022, 8:08 PM
5 points
12 comments4 min readLW link

The Re­v­erse Basilisk

Dunning K.May 30, 2022, 11:10 PM
15 points
23 comments2 min readLW link

The Hard In­tel­li­gence Hy­poth­e­sis and Its Bear­ing on Suc­ces­sion In­duced Foom

DragonGodMay 31, 2022, 7:04 PM
10 points
7 comments4 min readLW link

Machines vs Memes Part 1: AI Align­ment and Memetics

Harriet FarlowMay 31, 2022, 10:03 PM
16 points
0 comments6 min readLW link

[Question] What will hap­pen when an all-reach­ing AGI starts at­tempt­ing to fix hu­man char­ac­ter flaws?

Michael BrightJun 1, 2022, 6:45 PM
1 point
6 comments1 min readLW link

New co­op­er­a­tion mechanism—quadratic fund­ing with­out a match­ing pool

Filip SondejJun 5, 2022, 1:55 PM
11 points
0 comments5 min readLW link

Miriam Ye­vick on why both sym­bols and net­works are nec­es­sary for ar­tifi­cial minds

Bill BenzonJun 6, 2022, 8:34 AM
1 point
0 comments4 min readLW link

Six Di­men­sions of Oper­a­tional Ad­e­quacy in AGI Projects

Eliezer YudkowskyMay 30, 2022, 5:00 PM
270 points
65 comments13 min readLW link

Grokking “Fore­cast­ing TAI with biolog­i­cal an­chors”

anson.hoJun 6, 2022, 6:58 PM
34 points
0 comments14 min readLW link

Who mod­els the mod­els that model mod­els? An ex­plo­ra­tion of GPT-3’s in-con­text model fit­ting ability

LovreJun 7, 2022, 7:37 PM
112 points
14 comments9 min readLW link

Pitch­ing an Align­ment Softball

mu_(negative)Jun 7, 2022, 4:10 AM
47 points
13 comments10 min readLW link

[Question] Con­fused Thoughts on AI After­life (se­ri­ously)

EpiritoJun 7, 2022, 2:37 PM
−6 points
6 comments1 min readLW link

Trans­former Re­search Ques­tions from Stained Glass Windows

StefanHexJun 8, 2022, 12:38 PM
4 points
0 comments2 min readLW link

Elic­it­ing La­tent Knowl­edge (ELK) - Distil­la­tion/​Summary

Marius HobbhahnJun 8, 2022, 1:18 PM
49 points
2 comments21 min readLW link

Towards Gears-Level Un­der­stand­ing of Agency

Thane RuthenisJun 16, 2022, 10:00 PM
24 points
4 comments18 min readLW link

Vael Gates: Risks from Ad­vanced AI (June 2022)

Vael GatesJun 14, 2022, 12:54 AM
38 points
2 comments30 min readLW link

Ex­plor­ing Mild Be­havi­our in Embed­ded Agents

Megan KinnimentJun 27, 2022, 6:56 PM
21 points
3 comments18 min readLW link

Oper­a­tional­iz­ing two tasks in Gary Mar­cus’s AGI challenge

Bill BenzonJun 9, 2022, 6:31 PM
10 points
3 comments8 min readLW link

A plau­si­ble story about AI risk.

DeLesley HutchinsJun 10, 2022, 2:08 AM
14 points
1 comment4 min readLW link

I No Longer Believe In­tel­li­gence to be “Mag­i­cal”

DragonGodJun 10, 2022, 8:58 AM
31 points
34 comments6 min readLW link

[Question] Why don’t you in­tro­duce re­ally im­pres­sive peo­ple you per­son­ally know to AI al­ign­ment (more of­ten)?

VerdenJun 11, 2022, 3:59 PM
33 points
15 comments1 min readLW link

Godzilla Strategies

johnswentworthJun 11, 2022, 3:44 PM
151 points
65 comments3 min readLW link

In­tu­itive Ex­pla­na­tion of AIXI

Thomas LarsenJun 12, 2022, 9:41 PM
13 points
0 comments5 min readLW link

Train­ing Trace Priors

Adam JermynJun 13, 2022, 2:22 PM
12 points
17 comments4 min readLW link

Why multi-agent safety is im­por­tant

Akbir KhanJun 14, 2022, 9:23 AM
8 points
2 comments10 min readLW link

Con­tra EY: Can AGI de­stroy us with­out trial & er­ror?

nsokolskyJun 13, 2022, 6:26 PM
124 points
76 comments15 min readLW link

A Modest Pivotal Act

anonymousaisafetyJun 13, 2022, 7:24 PM
−15 points
1 comment5 min readLW link

OpenAI: GPT-based LLMs show abil­ity to dis­crim­i­nate be­tween its own wrong an­swers, but in­abil­ity to ex­plain how/​why it makes that dis­crim­i­na­tion, even as model scales

Aditya JainJun 13, 2022, 11:33 PM
14 points
5 comments1 min readLW link
(openai.com)

Re­sources I send to AI re­searchers about AI safety

Vael GatesJun 14, 2022, 2:24 AM
62 points
12 comments10 min readLW link

In­ves­ti­gat­ing causal un­der­stand­ing in LLMs

Jun 14, 2022, 1:57 PM
28 points
4 comments13 min readLW link

[Question] How Do You Quan­tify [Physics In­ter­fac­ing] Real World Ca­pa­bil­ities?

DragonGodJun 14, 2022, 2:49 PM
17 points
1 comment4 min readLW link

Cryp­to­graphic Life: How to tran­scend in a sub-light­speed world via Ho­mo­mor­phic encryption

GololJun 14, 2022, 7:22 PM
1 point
0 comments3 min readLW link

Align­ment Risk Doesn’t Re­quire Superintelligence

JustisMillsJun 15, 2022, 3:12 AM
35 points
4 comments2 min readLW link

Multi­gate Priors

Adam JermynJun 15, 2022, 7:30 PM
4 points
0 comments3 min readLW link

In­fo­haz­ards and in­fer­en­tial distances

acylhalideJun 16, 2022, 7:59 AM
8 points
0 comments6 min readLW link

Ap­ply to the Ma­chine Learn­ing For Good boot­camp in France

Alexandre VariengienJun 17, 2022, 7:32 AM
10 points
0 comments1 min readLW link

Adap­ta­tion Ex­ecu­tors and the Telos Margin

PlinthistJun 20, 2022, 1:06 PM
2 points
8 comments5 min readLW link

Causal con­fu­sion as an ar­gu­ment against the scal­ing hypothesis

Jun 20, 2022, 10:54 AM
83 points
30 comments18 min readLW link

[Question] What is the most prob­a­ble AI?

Zeruel017Jun 20, 2022, 11:26 PM
−2 points
0 comments3 min readLW link

Reflec­tion Mechanisms as an Align­ment tar­get: A survey

Jun 22, 2022, 3:05 PM
28 points
1 comment14 min readLW link

The Limits of Automation

milkandcigarettesJun 23, 2022, 6:03 PM
5 points
1 comment5 min readLW link
(milkandcigarettes.com)

Con­ver­sa­tion with Eliezer: What do you want the sys­tem to do?

Orpheus16Jun 25, 2022, 5:36 PM
112 points
38 comments2 min readLW link

[Yann Le­cun] A Path Towards Au­tonomous Ma­chine In­tel­li­gence

DragonGodJun 27, 2022, 7:24 PM
38 points
12 comments1 min readLW link
(openreview.net)

Yann LeCun, A Path Towards Au­tonomous Ma­chine In­tel­li­gence [link]

Bill BenzonJun 27, 2022, 11:29 PM
5 points
1 comment1 min readLW link

Doom doubts—is in­ner al­ign­ment a likely prob­lem?

CrissmanJun 28, 2022, 12:42 PM
6 points
7 comments1 min readLW link

What suc­cess looks like

Jun 28, 2022, 2:38 PM
19 points
4 comments1 min readLW link
(forum.effectivealtruism.org)

La­tent Ad­ver­sar­ial Training

Adam JermynJun 29, 2022, 8:04 PM
24 points
9 comments5 min readLW link

He­donis­tic Iso­topes:

TrozxzrJun 30, 2022, 4:49 PM
1 point
0 comments1 min readLW link

[Question] What about tran­shu­mans and be­yond?

AlignmentMirrorJul 2, 2022, 1:58 PM
7 points
6 comments1 min readLW link

New US Se­nate Bill on X-Risk Miti­ga­tion [Linkpost]

Evan R. MurphyJul 4, 2022, 1:25 AM
35 points
12 comments1 min readLW link
(www.hsgac.senate.gov)

When is it ap­pro­pri­ate to use statis­ti­cal mod­els and prob­a­bil­ities for de­ci­sion mak­ing ?

Younes KamelJul 5, 2022, 12:34 PM
10 points
7 comments4 min readLW link
(youneskamel.substack.com)

How hu­man­ity would re­spond to slow take­off, with take­aways from the en­tire COVID-19 pan­demic

Noosphere89Jul 6, 2022, 5:52 PM
4 points
1 comment2 min readLW link

Four So­cietal In­ter­ven­tions to Im­prove our AGI Position

Rafael CosmanJul 6, 2022, 6:32 PM
−6 points
2 comments6 min readLW link
(rafaelcosman.com)

Deep neu­ral net­works are not opaque.

jem-mosigJul 6, 2022, 6:03 PM
22 points
14 comments3 min readLW link

Co­op­er­a­tion with and be­tween AGI’s

PeterMcCluskeyJul 7, 2022, 4:45 PM
10 points
3 comments10 min readLW link
(www.bayesianinvestor.com)

Mak­ing it harder for an AGI to “trick” us, with STVs

Tor Økland BarstadJul 9, 2022, 2:42 PM
14 points
5 comments22 min readLW link

Grouped Loss may dis­fa­vor dis­con­tin­u­ous capabilities

Adam JermynJul 9, 2022, 5:22 PM
14 points
2 comments4 min readLW link

We are now at the point of deep­fake job interviews

trevorJul 10, 2022, 3:37 AM
6 points
0 comments1 min readLW link
(www.businessinsider.com)

Ac­cept­abil­ity Ver­ifi­ca­tion: A Re­search Agenda

Jul 12, 2022, 8:11 PM
43 points
0 comments1 min readLW link
(docs.google.com)

Find­ing Skele­tons on Rashomon Ridge

Jul 24, 2022, 10:31 PM
30 points
2 comments7 min readLW link

A note about differ­en­tial tech­nolog­i­cal development

So8resJul 15, 2022, 4:46 AM
178 points
31 comments6 min readLW link

How In­ter­pretabil­ity can be Impactful

Connall GarrodJul 18, 2022, 12:06 AM
18 points
0 comments37 min readLW link

AI Hiroshima (Does A Vivid Ex­am­ple Of Destruc­tion Fore­stall Apoca­lypse?)

SableJul 18, 2022, 12:06 PM
4 points
4 comments2 min readLW link

Bounded com­plex­ity of solv­ing ELK and its implications

Rubi J. HudsonJul 19, 2022, 6:56 AM
10 points
4 comments18 min readLW link

Abram Dem­ski’s ELK thoughts and pro­posal—distillation

Rubi J. HudsonJul 19, 2022, 6:57 AM
15 points
4 comments16 min readLW link

Help ARC eval­u­ate ca­pa­bil­ities of cur­rent lan­guage mod­els (still need peo­ple)

Beth BarnesJul 19, 2022, 4:55 AM
94 points
6 comments2 min readLW link

A Cri­tique of AI Align­ment Pessimism

ExCephJul 19, 2022, 2:28 AM
8 points
1 comment9 min readLW link

Model­ling Deception

Garrett BakerJul 18, 2022, 9:21 PM
15 points
0 comments7 min readLW link

En­light­en­ment Values in a Vuln­er­a­ble World

Maxwell TabarrokJul 20, 2022, 7:52 PM
15 points
6 comments31 min readLW link
(maximumprogress.substack.com)

AI Safety Cheat­sheet /​ Quick Reference

Zohar JacksonJul 20, 2022, 9:39 AM
3 points
0 comments1 min readLW link
(github.com)

Coun­ter­ing ar­gu­ments against work­ing on AI safety

Rauno ArikeJul 20, 2022, 6:23 PM
6 points
2 comments7 min readLW link

Why AGI Timeline Re­search/​Dis­course Might Be Overrated

Noosphere89Jul 20, 2022, 8:26 PM
5 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Con­nor Leahy on Dy­ing with Dig­nity, EleutherAI and Conjecture

Michaël TrazziJul 22, 2022, 6:44 PM
176 points
29 comments14 min readLW link
(theinsideview.ai)

Brain­storm of things that could force an AI team to burn their lead

So8resJul 24, 2022, 11:58 PM
103 points
4 comments13 min readLW link

Align­ment be­ing im­pos­si­ble might be bet­ter than it be­ing re­ally difficult

Martín SotoJul 25, 2022, 11:57 PM
12 points
2 comments2 min readLW link

AI ethics vs AI alignment

Wei DaiJul 26, 2022, 1:08 PM
4 points
1 comment1 min readLW link

NeurIPS ML Safety Work­shop 2022

Dan HJul 26, 2022, 3:28 PM
72 points
2 comments1 min readLW link
(neurips2022.mlsafety.org)

Quan­tum Ad­van­tage in Learn­ing from Experiments

Dennis TowneJul 27, 2022, 3:49 PM
5 points
5 comments1 min readLW link
(ai.googleblog.com)

AGI ruin sce­nar­ios are likely (and dis­junc­tive)

So8resJul 27, 2022, 3:21 AM
148 points
37 comments6 min readLW link

A Quick Note on AI Scal­ing Asymptotes

alyssavanceMay 25, 2022, 2:55 AM
43 points
6 comments1 min readLW link

[Question] How likely do you think worse-than-ex­tinc­tion type fates to be?

span1Aug 1, 2022, 4:08 AM
3 points
3 comments1 min readLW link

[Question] I want to donate some money (not much, just what I can af­ford) to AGI Align­ment re­search, to what­ever or­ga­ni­za­tion has the best chance of mak­ing sure that AGI goes well and doesn’t kill us all. What are my best op­tions, where can I make the most differ­ence per dol­lar?

lumenwritesAug 2, 2022, 12:08 PM
15 points
9 comments1 min readLW link

Law-Fol­low­ing AI 4: Don’t Rely on Vi­car­i­ous Liability

CullenAug 2, 2022, 11:26 PM
5 points
2 comments3 min readLW link

Ex­ter­nal­ized rea­son­ing over­sight: a re­search di­rec­tion for lan­guage model alignment

tameraAug 3, 2022, 12:03 PM
103 points
22 comments6 min readLW link

Trans­former lan­guage mod­els are do­ing some­thing more general

NumendilAug 3, 2022, 9:13 PM
44 points
6 comments2 min readLW link

Three pillars for avoid­ing AGI catas­tro­phe: Tech­ni­cal al­ign­ment, de­ploy­ment de­ci­sions, and coordination

LintzAAug 3, 2022, 11:15 PM
17 points
0 comments12 min readLW link

Sur­prised by ELK re­port’s coun­terex­am­ple to De­bate, IDA

Evan R. MurphyAug 4, 2022, 2:12 AM
18 points
0 comments5 min readLW link

Bias to­wards sim­ple func­tions; ap­pli­ca­tion to al­ign­ment?

DavidHolmesAug 18, 2022, 4:15 PM
3 points
7 comments2 min readLW link

What do ML re­searchers think about AI in 2022?

KatjaGraceAug 4, 2022, 3:40 PM
217 points
33 comments3 min readLW link
(aiimpacts.org)

Deon­tol­ogy and Tool AI

Nathan1123Aug 5, 2022, 5:20 AM
4 points
5 comments6 min readLW link

Bridg­ing Ex­pected Utility Max­i­miza­tion and Optimization

Daniel HerrmannAug 5, 2022, 8:18 AM
23 points
5 comments14 min readLW link

Coun­ter­fac­tu­als are Con­fus­ing be­cause of an On­tolog­i­cal Shift

Chris_LeongAug 5, 2022, 7:03 PM
17 points
35 comments2 min readLW link

A Data limited future

Donald HobsonAug 6, 2022, 2:56 PM
52 points
25 comments2 min readLW link

A Com­mu­nity for Un­der­stand­ing Con­scious­ness: Rais­ing r/​MathPie

NavjotツAug 7, 2022, 8:17 AM
−12 points
0 comments3 min readLW link
(www.reddit.com)

Com­plex­ity No Bar to AI (Or, why Com­pu­ta­tional Com­plex­ity mat­ters less than you think for real life prob­lems)

Noosphere89Aug 7, 2022, 7:55 PM
17 points
14 comments3 min readLW link
(www.gwern.net)

A suffi­ciently para­noid pa­per­clip maximizer

RomanSAug 8, 2022, 11:17 AM
17 points
10 comments2 min readLW link

Steganog­ra­phy in Chain of Thought Reasoning

A RayAug 8, 2022, 3:47 AM
49 points
13 comments6 min readLW link

In­ter­pretabil­ity/​Tool-ness/​Align­ment/​Cor­rigi­bil­ity are not Composable

johnswentworthAug 8, 2022, 6:05 PM
111 points
8 comments3 min readLW link

How (not) to choose a re­search project

Aug 9, 2022, 12:26 AM
76 points
11 comments7 min readLW link

Team Shard Sta­tus Report

David UdellAug 9, 2022, 5:33 AM
38 points
8 comments3 min readLW link

[Question] How would two su­per­in­tel­li­gent AIs in­ter­act, if they are un­al­igned with each other?

Nathan1123Aug 9, 2022, 6:58 PM
4 points
6 comments1 min readLW link

The Host Minds of HBO’s West­world.

NerretAug 12, 2022, 6:53 PM
1 point
0 comments3 min readLW link

Anti-squat­ted AI x-risk do­mains index

plexAug 12, 2022, 12:01 PM
50 points
3 comments1 min readLW link

The Dumbest Pos­si­ble Gets There First

ArtaxerxesAug 13, 2022, 10:20 AM
35 points
7 comments2 min readLW link

[Question] The OpenAI play­ground for GPT-3 is a ter­rible in­ter­face. Is there any great lo­cal (or web) app for ex­plor­ing/​learn­ing with lan­guage mod­els?

avivAug 13, 2022, 4:34 PM
2 points
1 comment1 min readLW link

I missed the crux of the al­ign­ment prob­lem the whole time

zeshenAug 13, 2022, 10:11 AM
53 points
7 comments3 min readLW link

An Un­canny Prison

Nathan1123Aug 13, 2022, 9:40 PM
3 points
3 comments2 min readLW link

[Question] What is the prob­a­bil­ity that a su­per­in­tel­li­gent, sen­tient AGI is ac­tu­ally in­fea­si­ble?

Nathan1123Aug 14, 2022, 10:41 PM
−3 points
6 comments1 min readLW link

Re­in­force­ment Learn­ing Goal Mis­gen­er­al­iza­tion: Can we guess what kind of goals are se­lected by de­fault?

Oct 25, 2022, 8:48 PM
9 points
1 comment4 min readLW link

What’s Gen­eral-Pur­pose Search, And Why Might We Ex­pect To See It In Trained ML Sys­tems?

johnswentworthAug 15, 2022, 10:48 PM
103 points
15 comments10 min readLW link

Dis­cov­er­ing Agents

zac_kentonAug 18, 2022, 5:33 PM
56 points
8 comments6 min readLW link

In­ter­pretabil­ity Tools Are an At­tack Channel

Thane RuthenisAug 17, 2022, 6:47 PM
42 points
22 comments1 min readLW link

Con­di­tion­ing, Prompts, and Fine-Tuning

Adam JermynAug 17, 2022, 8:52 PM
32 points
9 comments4 min readLW link

De­bate AI and the De­ci­sion to Re­lease an AI

Chris_LeongJan 17, 2019, 2:36 PM
9 points
18 comments3 min readLW link

What’s the Least Im­pres­sive Thing GPT-4 Won’t be Able to Do

AlgonAug 20, 2022, 7:48 PM
75 points
80 comments1 min readLW link

The Align­ment Prob­lem Needs More Pos­i­tive Fiction

NetcentricaAug 21, 2022, 10:01 PM
4 points
2 comments5 min readLW link

AI al­ign­ment as “nav­i­gat­ing the space of in­tel­li­gent be­havi­our”

Nora_AmmannAug 23, 2022, 1:28 PM
18 points
0 comments6 min readLW link

AGI Timelines Are Mostly Not Strate­gi­cally Rele­vant To Alignment

johnswentworthAug 23, 2022, 8:15 PM
44 points
35 comments1 min readLW link

[Question] Would you ask a ge­nie to give you the solu­tion to al­ign­ment?

sudoAug 24, 2022, 1:29 AM
6 points
1 comment1 min readLW link

Ethan Perez on the In­verse Scal­ing Prize, Lan­guage Feed­back and Red Teaming

Michaël TrazziAug 24, 2022, 4:35 PM
25 points
0 comments3 min readLW link
(theinsideview.ai)

Prepar­ing for the apoc­a­lypse might help pre­vent it

OcracokeAug 25, 2022, 12:18 AM
1 point
1 comment1 min readLW link

Your posts should be on arXiv

JanBAug 25, 2022, 10:35 AM
136 points
39 comments3 min readLW link

The Solomonoff prior is ma­lign. It’s not a big deal.

Charlie SteinerAug 25, 2022, 8:25 AM
38 points
9 comments7 min readLW link

AI strat­egy nearcasting

HoldenKarnofskyAug 25, 2022, 5:26 PM
79 points
3 comments9 min readLW link

Com­mon mis­con­cep­tions about OpenAI

Jacob_HiltonAug 25, 2022, 2:02 PM
226 points
138 comments5 min readLW link

AI Risk in Terms of Un­sta­ble Nu­clear Software

Thane RuthenisAug 26, 2022, 6:49 PM
29 points
1 comment6 min readLW link

What’s the Most Im­pres­sive Thing That GPT-4 Could Plau­si­bly Do?

bayesedAug 26, 2022, 3:34 PM
23 points
24 comments1 min readLW link

Tak­ing the pa­ram­e­ters which seem to mat­ter and ro­tat­ing them un­til they don’t

Garrett BakerAug 26, 2022, 6:26 PM
117 points
48 comments1 min readLW link

An­nual AGI Bench­mark­ing Event

Lawrence PhillipsAug 27, 2022, 12:06 AM
24 points
3 comments2 min readLW link
(www.metaculus.com)

Is there a benefit in low ca­pa­bil­ity AI Align­ment re­search?

LettiAug 26, 2022, 11:51 PM
1 point
1 comment2 min readLW link

Help Un­der­stand­ing Prefer­ences And Evil

NetcentricaAug 27, 2022, 3:42 AM
6 points
7 comments2 min readLW link

Solv­ing Align­ment by “solv­ing” semantics

Q HomeAug 27, 2022, 4:17 AM
15 points
10 comments26 min readLW link

An In­tro­duc­tion to Cur­rent The­o­ries of Consciousness

hohenheimAug 28, 2022, 5:55 PM
59 points
44 comments49 min readLW link

*New* Canada AI Safety & Gover­nance community

Wyatt Tessari L'AlliéAug 29, 2022, 6:45 PM
21 points
0 comments1 min readLW link

Are Gen­er­a­tive World Models a Mesa-Op­ti­miza­tion Risk?

Thane RuthenisAug 29, 2022, 6:37 PM
12 points
2 comments3 min readLW link

How might we al­ign trans­for­ma­tive AI if it’s de­vel­oped very soon?

HoldenKarnofskyAug 29, 2022, 3:42 PM
107 points
17 comments45 min readLW link

Wor­lds Where Iter­a­tive De­sign Fails

johnswentworthAug 30, 2022, 8:48 PM
144 points
26 comments10 min readLW link

[Question] How might we make bet­ter use of AI ca­pa­bil­ities re­search for al­ign­ment pur­poses?

Jemal YoungAug 31, 2022, 4:19 AM
11 points
4 comments1 min readLW link

ML Model At­tri­bu­tion Challenge [Linkpost]

aogAug 30, 2022, 7:34 PM
11 points
0 comments1 min readLW link
(mlmac.io)

I Tripped and Be­came GPT! (And How This Up­dated My Timelines)

FrankophoneSep 1, 2022, 5:56 PM
31 points
0 comments4 min readLW link

[Question] Can some­one ex­plain to me why most re­searchers think al­ign­ment is prob­a­bly some­thing that is hu­manly tractable?

iamthouthouartiSep 3, 2022, 1:12 AM
32 points
11 comments1 min readLW link

An Up­date on Academia vs. In­dus­try (one year into my fac­ulty job)

David Scott Krueger (formerly: capybaralet)Sep 3, 2022, 8:43 PM
118 points
18 comments4 min readLW link

Fram­ing AI Childhoods

David UdellSep 6, 2022, 11:40 PM
37 points
8 comments4 min readLW link

A Game About AI Align­ment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardtSep 5, 2022, 7:55 AM
18 points
13 comments2 min readLW link

Is train­ing data go­ing to be diluted by AI-gen­er­ated con­tent?

Hannes ThurnherrSep 7, 2022, 6:13 PM
10 points
7 comments1 min readLW link

Turn­ing What­sApp Chat Data into Prompt-Re­sponse Form for Fine-Tuning

casualphysicsenjoyerSep 8, 2022, 8:05 PM
1 point
0 comments1 min readLW link

[An email with a bunch of links I sent an ex­pe­rienced ML re­searcher in­ter­ested in learn­ing about Align­ment /​ x-safety.]

David Scott Krueger (formerly: capybaralet)Sep 8, 2022, 10:28 PM
46 points
1 comment5 min readLW link

Mon­i­tor­ing for de­cep­tive alignment

evhubSep 8, 2022, 11:07 PM
118 points
7 comments9 min readLW link

Samotsvety’s AI risk forecasts

eliflandSep 9, 2022, 4:01 AM
44 points
0 comments4 min readLW link

Ought will host a fac­tored cog­ni­tion “Lab Meet­ing”

Sep 9, 2022, 11:46 PM
35 points
1 comment1 min readLW link

AI Risk In­tro 1: Ad­vanced AI Might Be Very Bad

Sep 11, 2022, 10:57 AM
43 points
13 comments30 min readLW link

An in­ves­ti­ga­tion into when agents may be in­cen­tivized to ma­nipu­late our be­liefs.

Felix HofstätterSep 13, 2022, 5:08 PM
15 points
0 comments14 min readLW link

Risk aver­sion and GPT-3

casualphysicsenjoyerSep 13, 2022, 8:50 PM
1 point
0 comments1 min readLW link

[Question] Would a Misal­igned SSI Really Kill Us All?

DragonGodSep 14, 2022, 12:15 PM
6 points
7 comments6 min readLW link

[Question] Why Do Peo­ple Think Hu­mans Are Stupid?

DragonGodSep 14, 2022, 1:55 PM
21 points
39 comments3 min readLW link

Pre­cise P(doom) isn’t very im­por­tant for pri­ori­ti­za­tion or strategy

harsimonySep 14, 2022, 5:19 PM
18 points
6 comments1 min readLW link

Co­or­di­nate-Free In­ter­pretabil­ity Theory

johnswentworthSep 14, 2022, 11:33 PM
41 points
14 comments5 min readLW link

Ca­pa­bil­ity and Agency as Corner­stones of AI risk — My cur­rent model

wilmSep 15, 2022, 8:25 AM
10 points
4 comments12 min readLW link

[Question] Are Hu­man Brains Univer­sal?

DragonGodSep 15, 2022, 3:15 PM
16 points
28 comments5 min readLW link

Should AI learn hu­man val­ues, hu­man norms or some­thing else?

Q HomeSep 17, 2022, 6:19 AM
5 points
2 comments4 min readLW link

The ELK Fram­ing I’ve Used

sudoSep 19, 2022, 10:28 AM
4 points
1 comment1 min readLW link

[Question] If we have Hu­man-level chat­bots, won’t we end up be­ing ruled by pos­si­ble peo­ple?

Erlja Jkdf.Sep 20, 2022, 1:59 PM
5 points
13 comments1 min readLW link

Char­ac­ter alignment

p.b.Sep 20, 2022, 8:27 AM
22 points
0 comments2 min readLW link

Cryp­tocur­rency Ex­ploits Show the Im­por­tance of Proac­tive Poli­cies for AI X-Risk

eSpencerSep 20, 2022, 5:53 PM
1 point
0 comments4 min readLW link

Do­ing over­sight from the very start of train­ing seems hard

peterbarnettSep 20, 2022, 5:21 PM
14 points
3 comments3 min readLW link

Trends in Train­ing Dataset Sizes

Pablo VillalobosSep 21, 2022, 3:47 PM
24 points
2 comments5 min readLW link
(epochai.org)

Two rea­sons we might be closer to solv­ing al­ign­ment than it seems

Sep 24, 2022, 8:00 PM
56 points
9 comments4 min readLW link

Fund­ing is All You Need: Get­ting into Grad School by Hack­ing the NSF GRFP Fellowship

hapaninSep 22, 2022, 9:39 PM
93 points
9 comments12 min readLW link

[Question] Papers to start get­ting into NLP-fo­cused al­ign­ment research

FeraidoonSep 24, 2022, 11:53 PM
6 points
0 comments1 min readLW link

How to Study Un­safe AGI’s safely (and why we might have no choice)

PunoxysmMar 7, 2014, 7:24 AM
10 points
47 comments5 min readLW link

On Generality

Eris DiscordiaSep 26, 2022, 4:06 AM
2 points
0 comments5 min readLW link

Oren’s Field Guide of Bad AGI Outcomes

Eris DiscordiaSep 26, 2022, 4:06 AM
0 points
0 comments1 min readLW link

Sum­mary of ML Safety Course

zeshenSep 27, 2022, 1:05 PM
6 points
0 comments6 min readLW link

My Thoughts on the ML Safety Course

zeshenSep 27, 2022, 1:15 PM
49 points
3 comments17 min readLW link

Re­ward IS the Op­ti­miza­tion Target

CarnSep 28, 2022, 5:59 PM
−1 points
3 comments5 min readLW link

A Library and Tu­to­rial for Fac­tored Cog­ni­tion with Lan­guage Models

Sep 28, 2022, 6:15 PM
47 points
0 comments1 min readLW link

Will Values and Com­pe­ti­tion De­cou­ple?

intersticeSep 28, 2022, 4:27 PM
15 points
11 comments17 min readLW link

Make-A-Video by Meta AI

P.Sep 29, 2022, 5:07 PM
9 points
4 comments1 min readLW link
(makeavideo.studio)

Open ap­pli­ca­tion to be­come an AI safety pro­ject mentor

Charbel-RaphaëlSep 29, 2022, 11:27 AM
7 points
0 comments1 min readLW link
(docs.google.com)

It mat­ters when the first sharp left turn happens

Adam JermynSep 29, 2022, 8:12 PM
35 points
9 comments4 min readLW link

Eli’s re­view of “Is power-seek­ing AI an ex­is­ten­tial risk?”

eliflandSep 30, 2022, 12:21 PM
58 points
0 comments3 min readLW link
(docs.google.com)

[Question] Rank the fol­low­ing based on like­li­hood to nul­lify AI-risk

AorouSep 30, 2022, 11:15 AM
3 points
1 comment4 min readLW link

Distri­bu­tion Shifts and The Im­por­tance of AI Safety

Leon LangSep 29, 2022, 10:38 PM
17 points
2 comments12 min readLW link

[Question] What Is the Idea Be­hind (Un-)Su­per­vised Learn­ing and Re­in­force­ment Learn­ing?

MorpheusSep 30, 2022, 4:48 PM
9 points
6 comments2 min readLW link

(Struc­tural) Sta­bil­ity of Cou­pled Optimizers

Paul BricmanSep 30, 2022, 11:28 AM
25 points
0 comments10 min readLW link

Where I cur­rently dis­agree with Ryan Green­blatt’s ver­sion of the ELK approach

So8resSep 29, 2022, 9:18 PM
63 points
7 comments5 min readLW link

Paper: Large Lan­guage Models Can Self-im­prove [Linkpost]

Evan R. MurphyOct 2, 2022, 1:29 AM
52 points
14 comments1 min readLW link
(openreview.net)

[Question] Is there a cul­ture over­hang?

Aleksi LiimatainenOct 3, 2022, 7:26 AM
18 points
4 comments1 min readLW link

Vi­su­al­iz­ing Learned Rep­re­sen­ta­tions of Rice Disease

muhia_beeOct 3, 2022, 9:09 AM
7 points
0 comments4 min readLW link
(indecisive-sand-24a.notion.site)

If you want to learn tech­ni­cal AI safety, here’s a list of AI safety courses, read­ing lists, and resources

KatWoodsOct 3, 2022, 12:43 PM
12 points
3 comments1 min readLW link

Frontline of AGI Align­ment

SD MarlowOct 4, 2022, 3:47 AM
−10 points
0 comments1 min readLW link
(robothouse.substack.com)

Hu­mans aren’t fit­ness maximizers

So8resOct 4, 2022, 1:31 AM
52 points
45 comments5 min readLW link

Smoke with­out fire is scary

Adam JermynOct 4, 2022, 9:08 PM
49 points
22 comments4 min readLW link

CHAI, As­sis­tance Games, And Fully-Up­dated Defer­ence [Scott Alexan­der]

lberglundOct 4, 2022, 5:04 PM
21 points
1 comment17 min readLW link
(astralcodexten.substack.com)

Gen­er­a­tive, Epi­sodic Ob­jec­tives for Safe AI

Michael GlassOct 5, 2022, 11:18 PM
11 points
3 comments8 min readLW link

[Linkpost] “Blueprint for an AI Bill of Rights”—Office of Science and Tech­nol­ogy Policy, USA (2022)

T431Oct 5, 2022, 4:42 PM
8 points
4 comments2 min readLW link
(www.whitehouse.gov)

The Answer

Alex BeymanOct 5, 2022, 9:23 PM
−3 points
0 comments4 min readLW link

The prob­a­bil­ity that Ar­tifi­cial Gen­eral In­tel­li­gence will be de­vel­oped by 2043 is ex­tremely low.

cveresOct 6, 2022, 6:05 PM
−14 points
8 comments1 min readLW link

The Shape of Things to Come

Alex BeymanOct 7, 2022, 4:11 PM
12 points
3 comments8 min readLW link

The Slow Reveal

Alex BeymanOct 9, 2022, 3:16 AM
3 points
0 comments24 min readLW link

What does it mean for an AGI to be ‘safe’?

So8resOct 7, 2022, 4:13 AM
72 points
32 comments3 min readLW link

Boolean Prim­i­tives for Cou­pled Optimizers

Paul BricmanOct 7, 2022, 6:02 PM
9 points
0 comments8 min readLW link

Anal­y­sis: US re­stricts GPU sales to China

aogOct 7, 2022, 6:38 PM
94 points
58 comments5 min readLW link

[Question] Bro­ken Links for the Au­dio Ver­sion of 2021 MIRI Conversations

KriegerOct 8, 2022, 4:16 PM
1 point
1 comment1 min readLW link

Don’t leave your finger­prints on the future

So8resOct 8, 2022, 12:35 AM
93 points
32 comments5 min readLW link

Let’s talk about un­con­trol­lable AI

Karl von WendtOct 9, 2022, 10:34 AM
12 points
6 comments3 min readLW link

Les­sons learned from talk­ing to >100 aca­demics about AI safety

Marius HobbhahnOct 10, 2022, 1:16 PM
207 points
16 comments12 min readLW link

When re­port­ing AI timelines, be clear who you’re (not) defer­ring to

Sam ClarkeOct 10, 2022, 2:24 PM
37 points
3 comments1 min readLW link

Nat­u­ral Cat­e­gories Update

Logan ZoellnerOct 10, 2022, 3:19 PM
29 points
6 comments2 min readLW link

Up­dates and Clarifications

SD MarlowOct 11, 2022, 5:34 AM
−5 points
1 comment1 min readLW link

My ar­gu­ment against AGI

cveresOct 12, 2022, 6:33 AM
3 points
5 comments1 min readLW link

In­stru­men­tal con­ver­gence in sin­gle-agent systems

Oct 12, 2022, 12:24 PM
27 points
4 comments8 min readLW link
(www.gladstone.ai)

A strange twist on the road to AGI

cveresOct 12, 2022, 11:27 PM
−8 points
0 comments1 min readLW link

Perfect Enemy

Alex BeymanOct 13, 2022, 8:23 AM
−2 points
0 comments46 min readLW link

A stub­born un­be­liever fi­nally gets the depth of the AI al­ign­ment problem

aelwoodOct 13, 2022, 3:16 PM
17 points
8 comments3 min readLW link
(pursuingreality.substack.com)

Misal­ign­ment-by-de­fault in multi-agent systems

Oct 13, 2022, 3:38 PM
17 points
8 comments20 min readLW link
(www.gladstone.ai)

Nice­ness is unnatural

So8resOct 13, 2022, 1:30 AM
98 points
18 comments8 min readLW link

The Vi­talik Bu­terin Fel­low­ship in AI Ex­is­ten­tial Safety is open for ap­pli­ca­tions!

Xin Chen, CynthiaOct 13, 2022, 6:32 PM
21 points
0 comments1 min readLW link

Greed Is the Root of This Evil

Thane RuthenisOct 13, 2022, 8:40 PM
21 points
4 comments8 min readLW link

Con­tra shard the­ory, in the con­text of the di­a­mond max­i­mizer problem

So8resOct 13, 2022, 11:51 PM
84 points
16 comments2 min readLW link

An­thro­po­mor­phic AI and Sand­boxed Vir­tual Uni­verses

jacob_cannellSep 3, 2010, 7:02 PM
4 points
124 comments5 min readLW link

In­stru­men­tal con­ver­gence: scale and phys­i­cal interactions

Oct 14, 2022, 3:50 PM
15 points
0 comments17 min readLW link
(www.gladstone.ai)

Prov­ably Hon­est—A First Step

Srijanak DeNov 5, 2022, 7:18 PM
10 points
2 comments8 min readLW link

They gave LLMs ac­cess to physics simulators

ryan_bOct 17, 2022, 9:21 PM
50 points
18 comments1 min readLW link
(arxiv.org)

De­ci­sion the­ory does not im­ply that we get to have nice things

So8resOct 18, 2022, 3:04 AM
142 points
53 comments26 min readLW link

[Question] How easy is it to su­per­vise pro­cesses vs out­comes?

Noosphere89Oct 18, 2022, 5:48 PM
3 points
0 comments1 min readLW link

How To Make Pre­dic­tion Mar­kets Use­ful For Align­ment Work

johnswentworthOct 18, 2022, 7:01 PM
86 points
18 comments2 min readLW link

The re­ward func­tion is already how well you ma­nipu­late humans

KerryOct 19, 2022, 1:52 AM
20 points
9 comments2 min readLW link

Co­op­er­a­tors are more pow­er­ful than agents

Ivan VendrovOct 21, 2022, 8:02 PM
14 points
7 comments3 min readLW link

Log­i­cal De­ci­sion The­o­ries: Our fi­nal failsafe?

Noosphere89Oct 25, 2022, 12:51 PM
−6 points
8 comments1 min readLW link
(www.lesswrong.com)

[Question] Sim­ple ques­tion about cor­rigi­bil­ity and val­ues in AI.

jmhOct 22, 2022, 2:59 AM
6 points
1 comment1 min readLW link

Newslet­ter for Align­ment Re­search: The ML Safety Updates

Esben KranOct 22, 2022, 4:17 PM
14 points
0 comments1 min readLW link

“Origi­nal­ity is noth­ing but ju­di­cious imi­ta­tion”—Voltaire

VestoziaOct 23, 2022, 7:00 PM
0 points
0 comments13 min readLW link

AI re­searchers an­nounce Neu­roAI agenda

Cameron BergOct 24, 2022, 12:14 AM
37 points
12 comments6 min readLW link
(arxiv.org)

AGI in our life­times is wish­ful thinking

niknobleOct 24, 2022, 11:53 AM
−4 points
21 comments8 min readLW link

ques­tion-an­swer coun­ter­fac­tual intervals

Tamsin LeakeOct 24, 2022, 1:08 PM
8 points
0 comments4 min readLW link
(carado.moe)

Why some peo­ple be­lieve in AGI, but I don’t.

cveresOct 26, 2022, 3:09 AM
−15 points
6 comments1 min readLW link

[Question] Is the Orthog­o­nal­ity Th­e­sis true for hu­mans?

Noosphere89Oct 27, 2022, 2:41 PM
12 points
18 comments1 min readLW link

Wor­ld­view iPeo­ple—Fu­ture Fund’s AI Wor­ld­view Prize

Toni MUENDELOct 28, 2022, 1:53 AM
−22 points
4 comments9 min readLW link

Causal scrub­bing: Appendix

Dec 3, 2022, 12:58 AM
16 points
0 comments20 min readLW link

Beyond Kol­mogorov and Shannon

Oct 25, 2022, 3:13 PM
60 points
14 comments5 min readLW link

Method of state­ments: an al­ter­na­tive to taboo

Q HomeNov 16, 2022, 10:57 AM
7 points
0 comments41 min readLW link

Causal Scrub­bing: a method for rigor­ously test­ing in­ter­pretabil­ity hy­pothe­ses [Red­wood Re­search]

Dec 3, 2022, 12:58 AM
130 points
9 comments20 min readLW link

Some Les­sons Learned from Study­ing Indi­rect Ob­ject Iden­ti­fi­ca­tion in GPT-2 small

Oct 28, 2022, 11:55 PM
86 points
5 comments9 min readLW link
(arxiv.org)

Causal scrub­bing: re­sults on a paren bal­ance checker

Dec 3, 2022, 12:59 AM
26 points
0 comments30 min readLW link

AI as a Civ­i­liza­tional Risk Part 1/​6: His­tor­i­cal Priors

PashaKamyshevOct 29, 2022, 9:59 PM
2 points
2 comments7 min readLW link

AI as a Civ­i­liza­tional Risk Part 2/​6: Be­hav­ioral Modification

PashaKamyshevOct 30, 2022, 4:57 PM
9 points
0 comments10 min readLW link

AI as a Civ­i­liza­tional Risk Part 3/​6: Anti-econ­omy and Sig­nal Pollution

PashaKamyshevOct 31, 2022, 5:03 PM
7 points
4 comments14 min readLW link

AI as a Civ­i­liza­tional Risk Part 4/​6: Bioweapons and Philos­o­phy of Modification

PashaKamyshevNov 1, 2022, 8:50 PM
7 points
1 comment8 min readLW link

AI as a Civ­i­liza­tional Risk Part 5/​6: Re­la­tion­ship be­tween C-risk and X-risk

PashaKamyshevNov 3, 2022, 2:19 AM
2 points
0 comments7 min readLW link

AI as a Civ­i­liza­tional Risk Part 6/​6: What can be done

PashaKamyshevNov 3, 2022, 7:48 PM
2 points
3 comments4 min readLW link

Am I se­cretly ex­cited for AI get­ting weird?

porbyOct 29, 2022, 10:16 PM
98 points
4 comments4 min readLW link

“Nor­mal” is the equil­ibrium state of past op­ti­miza­tion processes

Alex_AltairOct 30, 2022, 7:03 PM
77 points
5 comments5 min readLW link

love, not competition

Tamsin LeakeOct 30, 2022, 7:44 PM
31 points
20 comments1 min readLW link
(carado.moe)

My (naive) take on Risks from Learned Optimization

Artyom KarpovOct 31, 2022, 10:59 AM
7 points
0 comments5 min readLW link

Embed­ding safety in ML development

zeshenOct 31, 2022, 12:27 PM
24 points
1 comment18 min readLW link

Au­dit­ing games for high-level interpretability

Paul CologneseNov 1, 2022, 10:44 AM
28 points
1 comment7 min readLW link

pub­lish­ing al­ign­ment re­search and infohazards

Tamsin LeakeOct 31, 2022, 6:02 PM
69 points
10 comments1 min readLW link
(carado.moe)

Cau­tion when in­ter­pret­ing Deep­mind’s In-con­text RL paper

Sam MarksNov 1, 2022, 2:42 AM
104 points
6 comments4 min readLW link

AGI and the fu­ture: Is a fu­ture with AGI and hu­mans al­ive ev­i­dence that AGI is not a threat to our ex­is­tence?

LetUsTalkNov 1, 2022, 7:37 AM
4 points
8 comments1 min readLW link

Threat Model Liter­a­ture Review

Nov 1, 2022, 11:03 AM
55 points
4 comments25 min readLW link

Clar­ify­ing AI X-risk

Nov 1, 2022, 11:03 AM
102 points
23 comments4 min readLW link

a ca­sual in­tro to AI doom and alignment

Tamsin LeakeNov 1, 2022, 4:38 PM
12 points
0 comments4 min readLW link
(carado.moe)

[Question] Which Is­sues in Con­cep­tual Align­ment have been For­mal­ised or Ob­served (or not)?

ojorgensenNov 1, 2022, 10:32 PM
4 points
0 comments1 min readLW link

Ques­tions about Value Lock-in, Pa­ter­nal­ism, and Empowerment

Sam F. BrownNov 16, 2022, 3:33 PM
12 points
2 comments12 min readLW link
(sambrown.eu)

Why do we post our AI safety plans on the In­ter­net?

Peter S. ParkNov 3, 2022, 4:02 PM
3 points
4 comments11 min readLW link

Mechanis­tic In­ter­pretabil­ity as Re­v­erse Eng­ineer­ing (fol­low-up to “cars and elephants”)

David Scott Krueger (formerly: capybaralet)Nov 3, 2022, 11:19 PM
28 points
3 comments1 min readLW link

[Question] Are al­ign­ment re­searchers de­vot­ing enough time to im­prov­ing their re­search ca­pac­ity?

Carson JonesNov 4, 2022, 12:58 AM
13 points
3 comments3 min readLW link

[Question] Don’t you think RLHF solves outer al­ign­ment?

Charbel-RaphaëlNov 4, 2022, 12:36 AM
2 points
19 comments1 min readLW link

A new­comer’s guide to the tech­ni­cal AI safety field

zeshenNov 4, 2022, 2:29 PM
30 points
1 comment10 min readLW link

Toy Models and Tegum Products

Adam JermynNov 4, 2022, 6:51 PM
27 points
7 comments5 min readLW link

For ELK truth is mostly a distraction

c.troutNov 4, 2022, 9:14 PM
32 points
0 comments21 min readLW link

In­ter­pret­ing sys­tems as solv­ing POMDPs: a step to­wards a for­mal un­der­stand­ing of agency [pa­per link]

the gears to ascensionNov 5, 2022, 1:06 AM
12 points
2 comments1 min readLW link
(www.semanticscholar.org)

When can a mimic sur­prise you? Why gen­er­a­tive mod­els han­dle seem­ingly ill-posed problems

David JohnstonNov 5, 2022, 1:19 PM
8 points
4 comments16 min readLW link

The Slip­pery Slope from DALLE-2 to Deep­fake Anarchy

scasperNov 5, 2022, 2:53 PM
16 points
9 comments11 min readLW link

[Question] Can we get around Godel’s In­com­plete­ness the­o­rems and Tur­ing un­de­cid­able prob­lems via in­finite com­put­ers?

Noosphere89Nov 5, 2022, 6:01 PM
−10 points
12 comments1 min readLW link

Recom­mend HAIST re­sources for as­sess­ing the value of RLHF-re­lated al­ign­ment research

Nov 5, 2022, 8:58 PM
26 points
9 comments3 min readLW link

[Question] Has any­one in­creased their AGI timelines?

Darren McKeeNov 6, 2022, 12:03 AM
38 points
13 comments1 min readLW link

Ap­ply­ing su­per­in­tel­li­gence with­out col­lu­sion

Eric DrexlerNov 8, 2022, 6:08 PM
88 points
57 comments4 min readLW link

A philoso­pher’s cri­tique of RLHF

TW123Nov 7, 2022, 2:42 AM
55 points
8 comments2 min readLW link

4 Key As­sump­tions in AI Safety

PrometheusNov 7, 2022, 10:50 AM
20 points
5 comments7 min readLW link

Hacker-AI – Does it already ex­ist?

Erland WittkotterNov 7, 2022, 2:01 PM
3 points
11 comments11 min readLW link

Loss of con­trol of AI is not a likely source of AI x-risk

squekNov 7, 2022, 6:44 PM
−6 points
0 comments5 min readLW link

Mys­ter­ies of mode collapse

janusNov 8, 2022, 10:37 AM
213 points
35 comments14 min readLW link

Some ad­vice on in­de­pen­dent research

Marius HobbhahnNov 8, 2022, 2:46 PM
41 points
4 comments10 min readLW link

A first suc­cess story for Outer Align­ment: In­struc­tGPT

Noosphere89Nov 8, 2022, 10:52 PM
6 points
1 comment1 min readLW link
(openai.com)

A caveat to the Orthog­o­nal­ity Thesis

Wuschel SchulzNov 9, 2022, 3:06 PM
36 points
10 comments2 min readLW link

Try­ing to Make a Treach­er­ous Mesa-Optimizer

MadHatterNov 9, 2022, 6:07 PM
87 points
13 comments4 min readLW link
(attentionspan.blog)

Is full self-driv­ing an AGI-com­plete prob­lem?

kraemahzNov 10, 2022, 2:04 AM
5 points
5 comments1 min readLW link

The har­ness­ing of complexity

geduardoNov 10, 2022, 6:44 PM
6 points
2 comments3 min readLW link

[Question] Is there a demo of “You can’t fetch the coffee if you’re dead”?

Ram RachumNov 10, 2022, 6:41 PM
8 points
9 comments1 min readLW link

LessWrong Poll on AGI

Niclas KupperNov 10, 2022, 1:13 PM
12 points
6 comments1 min readLW link

Value For­ma­tion: An Over­ar­ch­ing Model

Thane RuthenisNov 15, 2022, 5:16 PM
27 points
9 comments34 min readLW link

[simu­la­tion] 4chan user claiming to be the at­tor­ney hired by Google’s sen­tient chat­bot LaMDA shares wild de­tails of encounter

janusNov 10, 2022, 9:39 PM
11 points
1 comment13 min readLW link
(generative.ink)

Why I’m Work­ing On Model Ag­nos­tic Interpretability

Jessica RumbelowNov 11, 2022, 9:24 AM
28 points
9 comments2 min readLW link

Are fund­ing op­tions for AI Safety threat­ened? W45

Nov 11, 2022, 1:00 PM
7 points
0 comments3 min readLW link
(newsletter.apartresearch.com)

How likely are ma­lign pri­ors over ob­jec­tives? [aborted WIP]

David JohnstonNov 11, 2022, 5:36 AM
−2 points
0 comments8 min readLW link

Is AI Gain-of-Func­tion re­search a thing?

MadHatterNov 12, 2022, 2:33 AM
8 points
2 comments2 min readLW link

Vanessa Kosoy’s PreDCA, distilled

Martín SotoNov 12, 2022, 11:38 AM
16 points
17 comments5 min readLW link

fully al­igned sin­gle­ton as a solu­tion to everything

Tamsin LeakeNov 12, 2022, 6:19 PM
6 points
2 comments2 min readLW link
(carado.moe)

Ways to buy time

Nov 12, 2022, 7:31 PM
26 points
21 comments12 min readLW link

Char­ac­ter­iz­ing In­trin­sic Com­po­si­tion­al­ity in Trans­form­ers with Tree Projections

Ulisse MiniNov 13, 2022, 9:46 AM
12 points
2 comments1 min readLW link
(arxiv.org)

I (with the help of a few more peo­ple) am plan­ning to cre­ate an in­tro­duc­tion to AI Safety that a smart teenager can un­der­stand. What am I miss­ing?

TapataktNov 14, 2022, 4:12 PM
3 points
5 comments1 min readLW link

Will we run out of ML data? Ev­i­dence from pro­ject­ing dataset size trends

Pablo VillalobosNov 14, 2022, 4:42 PM
74 points
12 comments2 min readLW link
(epochai.org)

The limited up­side of interpretability

Peter S. ParkNov 15, 2022, 6:46 PM
13 points
11 comments1 min readLW link

[Question] Is the speed of train­ing large mod­els go­ing to in­crease sig­nifi­cantly in the near fu­ture due to Cere­bras An­dromeda?

Amal Nov 15, 2022, 10:50 PM
11 points
11 comments1 min readLW link

Un­pack­ing “Shard The­ory” as Hunch, Ques­tion, The­ory, and Insight

Jacy Reese AnthisNov 16, 2022, 1:54 PM
29 points
9 comments2 min readLW link

The two con­cep­tions of Ac­tive In­fer­ence: an in­tel­li­gence ar­chi­tec­ture and a the­ory of agency

Roman LeventovNov 16, 2022, 9:30 AM
7 points
0 comments4 min readLW link

Eng­ineer­ing Monose­man­tic­ity in Toy Models

Nov 18, 2022, 1:43 AM
72 points
6 comments3 min readLW link
(arxiv.org)

[Question] Is there any policy for a fair treat­ment of AIs whose friendli­ness is in doubt?

nahojNov 18, 2022, 7:01 PM
15 points
9 comments1 min readLW link

The Ground Truth Prob­lem (Or, Why Eval­u­at­ing In­ter­pretabil­ity Meth­ods Is Hard)

Jessica RumbelowNov 17, 2022, 11:06 AM
26 points
2 comments2 min readLW link

Mas­sive Scal­ing Should be Frowned Upon

harsimonyNov 17, 2022, 8:43 AM
7 points
6 comments5 min readLW link

How AI Fails Us: A non-tech­ni­cal view of the Align­ment Problem

testingthewatersNov 18, 2022, 7:02 PM
7 points
0 comments2 min readLW link
(ethics.harvard.edu)

LLMs may cap­ture key com­po­nents of hu­man agency

catubcNov 17, 2022, 8:14 PM
21 points
0 comments4 min readLW link

AGIs may value in­trin­sic re­wards more than ex­trin­sic ones

catubcNov 17, 2022, 9:49 PM
8 points
6 comments4 min readLW link

The econ­omy as an anal­ogy for ad­vanced AI systems

Nov 15, 2022, 11:16 AM
26 points
0 comments5 min readLW link

Cog­ni­tive sci­ence and failed AI fore­casts

Eleni AngelouNov 24, 2022, 9:02 PM
0 points
0 comments2 min readLW link

A Short Dialogue on the Mean­ing of Re­ward Functions

Nov 19, 2022, 9:04 PM
40 points
0 comments3 min readLW link

[Question] Up­dates on scal­ing laws for foun­da­tion mod­els from ′ Tran­scend­ing Scal­ing Laws with 0.1% Ex­tra Com­pute’

Nick_GreigNov 18, 2022, 12:46 PM
15 points
2 comments1 min readLW link

Distil­la­tion of “How Likely Is De­cep­tive Align­ment?”

NickGabsNov 18, 2022, 4:31 PM
20 points
3 comments10 min readLW link

The Disas­trously Con­fi­dent And Inac­cu­rate AI

Sharat Jacob JacobNov 18, 2022, 7:06 PM
13 points
0 comments13 min readLW link

gen­er­al­ized wireheading

Tamsin LeakeNov 18, 2022, 8:18 PM
21 points
7 comments2 min readLW link
(carado.moe)

By De­fault, GPTs Think In Plain Sight

Fabien RogerNov 19, 2022, 7:15 PM
60 points
16 comments9 min readLW link

ARC pa­per: For­mal­iz­ing the pre­sump­tion of independence

Erik JennerNov 20, 2022, 1:22 AM
88 points
2 comments2 min readLW link
(arxiv.org)

Planes are still decades away from dis­plac­ing most bird jobs

guzeyNov 25, 2022, 4:49 PM
156 points
13 comments3 min readLW link

Scott Aaron­son on “Re­form AI Align­ment”

ShmiNov 20, 2022, 10:20 PM
39 points
17 comments1 min readLW link
(scottaaronson.blog)

How Should AIS Re­late To Its Fun­ders? W46

Nov 21, 2022, 3:58 PM
6 points
1 comment3 min readLW link
(newsletter.apartresearch.com)

Benefits/​Risks of Scott Aaron­son’s Ortho­dox/​Re­form Fram­ing for AI Alignment

JeremyyNov 21, 2022, 5:54 PM
2 points
1 comment1 min readLW link

[Heb­bian Nat­u­ral Ab­strac­tions] Introduction

Nov 21, 2022, 8:34 PM
34 points
3 comments4 min readLW link
(www.snellessen.com)

Mis­cel­la­neous First-Pass Align­ment Thoughts

NickGabsNov 21, 2022, 9:23 PM
12 points
4 comments10 min readLW link

Meta AI an­nounces Cicero: Hu­man-Level Di­plo­macy play (with di­alogue)

Jacy Reese AnthisNov 22, 2022, 4:50 PM
95 points
64 comments1 min readLW link
(www.science.org)

An­nounc­ing AI Align­ment Awards: $100k re­search con­tests about goal mis­gen­er­al­iza­tion & corrigibility

Nov 22, 2022, 10:19 PM
69 points
20 comments4 min readLW link

Brute-forc­ing the uni­verse: a non-stan­dard shot at di­a­mond alignment

Martín SotoNov 22, 2022, 10:36 PM
6 points
0 comments20 min readLW link

Si­mu­la­tors, con­straints, and goal ag­nos­ti­cism: por­bynotes vol. 1

porbyNov 23, 2022, 4:22 AM
36 points
2 comments35 min readLW link

Sets of ob­jec­tives for a multi-ob­jec­tive RL agent to optimize

Nov 23, 2022, 6:49 AM
11 points
0 comments8 min readLW link

Hu­man-level Di­plo­macy was my fire alarm

Lao MeinNov 23, 2022, 10:05 AM
51 points
15 comments3 min readLW link

Ex nihilo

Hopkins StanleyNov 23, 2022, 2:38 PM
1 point
0 comments1 min readLW link

Cor­rigi­bil­ity Via Thought-Pro­cess Deference

Thane RuthenisNov 24, 2022, 5:06 PM
13 points
5 comments9 min readLW link

Con­jec­ture: a ret­ro­spec­tive af­ter 8 months of work

Nov 23, 2022, 5:10 PM
183 points
9 comments8 min readLW link

Con­jec­ture Se­cond Hiring Round

Nov 23, 2022, 5:11 PM
85 points
0 comments1 min readLW link

In­ject­ing some num­bers into the AGI de­bate—by Boaz Barak

JsevillamolNov 23, 2022, 4:10 PM
12 points
0 comments3 min readLW link
(windowsontheory.org)

Hu­man-level Full-Press Di­plo­macy (some bare facts).

Cleo NardoNov 22, 2022, 8:59 PM
50 points
7 comments3 min readLW link

When AI solves a game, fo­cus on the game’s me­chan­ics, not its theme.

Cleo NardoNov 23, 2022, 7:16 PM
81 points
7 comments2 min readLW link

[Question] What is the best source to ex­plain short AI timelines to a skep­ti­cal per­son?

trevorNov 23, 2022, 5:19 AM
4 points
4 comments1 min readLW link

Steer­ing Be­havi­our: Test­ing for (Non-)My­opia in Lan­guage Models

Dec 5, 2022, 8:28 PM
37 points
16 comments10 min readLW link

The man and the tool

pedroalvaradoNov 25, 2022, 7:51 PM
1 point
0 comments4 min readLW link

Gliders in Lan­guage Models

Alexandre VariengienNov 25, 2022, 12:38 AM
27 points
11 comments10 min readLW link

The AI Safety com­mu­nity has four main work groups, Strat­egy, Gover­nance, Tech­ni­cal and Move­ment Building

peterslatteryNov 25, 2022, 3:45 AM
0 points
0 comments6 min readLW link

Us­ing mechanis­tic in­ter­pretabil­ity to find in-dis­tri­bu­tion failure in toy transformers

Charlie GeorgeNov 28, 2022, 7:39 PM
6 points
0 comments4 min readLW link

In­tu­itions by ML re­searchers may get pro­gres­sively worse con­cern­ing likely can­di­dates for trans­for­ma­tive AI

Viktor RehnbergNov 25, 2022, 3:49 PM
7 points
0 comments2 min readLW link

Guardian AI (Misal­igned sys­tems are all around us.)

Jessica RumbelowNov 25, 2022, 3:55 PM
15 points
6 comments2 min readLW link

Three Align­ment Schemas & Their Problems

Shoshannah TekofskyNov 26, 2022, 4:25 AM
16 points
1 comment6 min readLW link

Re­ward Is Not Ne­c­es­sary: How To Create A Com­po­si­tional Self-Pre­serv­ing Agent For Life-Long Learning

CapybasiliskNov 27, 2022, 2:05 PM
3 points
0 comments1 min readLW link
(arxiv.org)

Re­view: LOVE in a simbox

PeterMcCluskeyNov 27, 2022, 5:41 PM
32 points
4 comments9 min readLW link
(bayesianinvestor.com)

Su­per­in­tel­li­gent AI is nec­es­sary for an amaz­ing fu­ture, but far from sufficient

So8resOct 31, 2022, 9:16 PM
115 points
46 comments34 min readLW link

[Question] How to cor­rect for mul­ti­plic­ity with AI-gen­er­ated mod­els?

Lao MeinNov 28, 2022, 3:51 AM
4 points
0 comments1 min readLW link

Is Con­struc­tor The­ory a use­ful tool for AI al­ign­ment?

A.H.Nov 29, 2022, 12:35 PM
11 points
8 comments26 min readLW link

Multi-Com­po­nent Learn­ing and S-Curves

Nov 30, 2022, 1:37 AM
57 points
24 comments7 min readLW link

Sub­sets and quo­tients in interpretability

Erik JennerDec 2, 2022, 11:13 PM
24 points
1 comment7 min readLW link

Ne­glected cause: au­to­mated fraud de­tec­tion in academia through image analysis

Lao MeinNov 30, 2022, 5:52 AM
10 points
1 comment2 min readLW link

AGI Im­pos­si­ble due to En­ergy Constrains

TheKlausNov 30, 2022, 6:48 PM
−8 points
13 comments1 min readLW link

Master plan spec: needs au­dit (logic and co­op­er­a­tive AI)

QuinnNov 30, 2022, 6:10 AM
12 points
5 comments7 min readLW link

AI takeover table­top RPG: “The Treach­er­ous Turn”

Daniel KokotajloNov 30, 2022, 7:16 AM
51 points
3 comments1 min readLW link

Has AI gone too far?

Boston AndersonNov 30, 2022, 6:49 PM
−15 points
3 comments1 min readLW link

Seek­ing sub­mis­sions for short AI-safety course proposals

SergioDec 1, 2022, 12:32 AM
3 points
0 comments1 min readLW link

Did ChatGPT just gaslight me?

TW123Dec 1, 2022, 5:41 AM
123 points
45 comments9 min readLW link
(equonc.substack.com)

Safe Devel­op­ment of Hacker-AI Coun­ter­mea­sures – What if we are too late?

Erland WittkotterDec 1, 2022, 7:59 AM
3 points
0 comments14 min readLW link

Re­search re­quest (al­ign­ment strat­egy): Deep dive on “mak­ing AI solve al­ign­ment for us”

JanBDec 1, 2022, 2:55 PM
16 points
3 comments1 min readLW link

[LINK] - ChatGPT discussion

JanBDec 1, 2022, 3:04 PM
13 points
7 comments1 min readLW link
(openai.com)

ChatGPT: First Impressions

specbugDec 1, 2022, 4:36 PM
18 points
2 comments13 min readLW link
(sixeleven.in)

Re-Ex­am­in­ing LayerNorm

Eric WinsorDec 1, 2022, 10:20 PM
100 points
8 comments5 min readLW link

Up­date on Har­vard AI Safety Team and MIT AI Alignment

Dec 2, 2022, 12:56 AM
56 points
4 comments8 min readLW link

De­con­fus­ing Direct vs Amor­tised Optimization

berenDec 2, 2022, 11:30 AM
48 points
6 comments10 min readLW link

[ASoT] Fine­tun­ing, RL, and GPT’s world prior

JozdienDec 2, 2022, 4:33 PM
31 points
8 comments5 min readLW link

Take­off speeds, the chimps anal­ogy, and the Cul­tural In­tel­li­gence Hypothesis

NickGabsDec 2, 2022, 7:14 PM
14 points
2 comments4 min readLW link

Non-Tech­ni­cal Prepa­ra­tion for Hacker-AI and Cy­ber­war 2.0+

Erland WittkotterDec 19, 2022, 11:42 AM
2 points
0 comments25 min readLW link

Ap­ply for the ML Up­skil­ling Win­ter Camp in Cam­bridge, UK [2-10 Jan]

hannah wing-yeeDec 2, 2022, 8:45 PM
3 points
0 comments2 min readLW link

Re­search Prin­ci­ples for 6 Months of AI Align­ment Studies

Shoshannah TekofskyDec 2, 2022, 10:55 PM
22 points
3 comments6 min readLW link

Chat GPT’s views on Me­ta­physics and Ethics

Cole KillianDec 3, 2022, 6:12 PM
5 points
3 comments1 min readLW link
(twitter.com)

[Question] Will the first AGI agent have been de­signed as an agent (in ad­di­tion to an AGI)?

nahojDec 3, 2022, 8:32 PM
1 point
8 comments1 min readLW link

Could an AI be Reli­gious?

mk54Dec 4, 2022, 5:00 AM
−12 points
14 comments1 min readLW link

ChatGPT seems over­con­fi­dent to me

qbolecDec 4, 2022, 8:03 AM
19 points
3 comments16 min readLW link

AI can ex­ploit safety plans posted on the Internet

Peter S. ParkDec 4, 2022, 12:17 PM
−19 points
4 comments1 min readLW link

Race to the Top: Bench­marks for AI Safety

Isabella DuanDec 4, 2022, 6:48 PM
12 points
2 comments1 min readLW link

Take 3: No in­de­scrib­able heav­en­wor­lds.

Charlie SteinerDec 4, 2022, 2:48 AM
21 points
12 comments2 min readLW link

ChatGPT is set­tling the Chi­nese Room argument

averrosDec 4, 2022, 8:25 PM
−7 points
4 comments1 min readLW link

AGI as a Black Swan Event

Stephen McAleeseDec 4, 2022, 11:00 PM
8 points
8 comments7 min readLW link

Prob­a­bly good pro­jects for the AI safety ecosystem

Ryan KiddDec 5, 2022, 2:26 AM
73 points
15 comments2 min readLW link

A ChatGPT story about ChatGPT doom

SurfingOrcaDec 5, 2022, 5:40 AM
6 points
3 comments4 min readLW link

Aligned Be­hav­ior is not Ev­i­dence of Align­ment Past a Cer­tain Level of Intelligence

Ronny FernandezDec 5, 2022, 3:19 PM
19 points
5 comments7 min readLW link

Is the “Valley of Con­fused Ab­strac­tions” real?

jacquesthibsDec 5, 2022, 1:36 PM
15 points
9 comments2 min readLW link

Anal­y­sis of AI Safety sur­veys for field-build­ing insights

Ash JafariDec 5, 2022, 7:21 PM
10 points
2 comments4 min readLW link

Test­ing Ways to By­pass ChatGPT’s Safety Features

Robert_AIZIDec 5, 2022, 6:50 PM
6 points
2 comments5 min readLW link
(aizi.substack.com)

ChatGPT on Spielberg’s A.I. and AI Alignment

Bill BenzonDec 5, 2022, 9:10 PM
5 points
0 comments4 min readLW link

Shh, don’t tell the AI it’s likely to be evil

naterushDec 6, 2022, 3:35 AM
19 points
9 comments1 min readLW link

Neu­ral net­works bi­ased to­wards ge­o­met­ri­cally sim­ple func­tions?

DavidHolmesDec 8, 2022, 4:16 PM
16 points
2 comments3 min readLW link

Things roll downhill

awenonianDec 6, 2022, 3:27 PM
19 points
0 comments1 min readLW link

ChatGPT and the Hu­man Race

Ben ReillyDec 6, 2022, 9:38 PM
6 points
1 comment3 min readLW link

AI Safety in a Vuln­er­a­ble World: Re­quest­ing Feed­back on Pre­limi­nary Thoughts

Jordan ArelDec 6, 2022, 10:35 PM
3 points
2 comments3 min readLW link

In defense of prob­a­bly wrong mechanis­tic models

evhubDec 6, 2022, 11:24 PM
41 points
10 comments2 min readLW link

ChatGPT: “An er­ror oc­curred. If this is­sue per­sists...”

Bill BenzonDec 7, 2022, 3:41 PM
5 points
11 comments3 min readLW link

Where to be an AI Safety Pro­fes­sor

scasperDec 7, 2022, 7:09 AM
30 points
12 comments2 min readLW link

Thoughts on AGI or­ga­ni­za­tions and ca­pa­bil­ities work

Dec 7, 2022, 7:46 PM
94 points
17 comments5 min readLW link

Riffing on the agent type

QuinnDec 8, 2022, 12:19 AM
16 points
0 comments4 min readLW link

Of pump­kins, the Fal­con Heavy, and Grou­cho Marx: High-Level dis­course struc­ture in ChatGPT

Bill BenzonDec 8, 2022, 10:25 PM
2 points
0 comments8 min readLW link

Why I’m Scep­ti­cal of Foom

DragonGodDec 8, 2022, 10:01 AM
19 points
26 comments3 min readLW link

If Went­worth is right about nat­u­ral ab­strac­tions, it would be bad for alignment

Wuschel SchulzDec 8, 2022, 3:19 PM
27 points
5 comments4 min readLW link

Take 7: You should talk about “the hu­man’s util­ity func­tion” less.

Charlie SteinerDec 8, 2022, 8:14 AM
47 points
22 comments2 min readLW link

Notes on OpenAI’s al­ign­ment plan

Alex FlintDec 8, 2022, 7:13 PM
47 points
5 comments7 min readLW link

We need to make scary AIs

Igor IvanovDec 9, 2022, 10:04 AM
3 points
8 comments5 min readLW link

I Believe we are in a Hard­ware Overhang

nemDec 8, 2022, 11:18 PM
8 points
0 comments1 min readLW link

[Question] What are your thoughts on the fu­ture of AI-as­sisted soft­ware de­vel­op­ment?

RomanHaukssonDec 9, 2022, 10:04 AM
4 points
2 comments1 min readLW link

ChatGPT’s Misal­ign­ment Isn’t What You Think

stavrosDec 9, 2022, 11:11 AM
3 points
12 comments1 min readLW link

Si­mu­la­tors and Mindcrime

DragonGodDec 9, 2022, 3:20 PM
0 points
4 comments3 min readLW link

Work­ing to­wards AI al­ign­ment is better

Johannes C. MayerDec 9, 2022, 3:39 PM
7 points
2 comments2 min readLW link

[Question] How would you im­prove ChatGPT’s fil­ter­ing?

Noah ScalesDec 10, 2022, 8:05 AM
9 points
6 comments1 min readLW link

In­spira­tion as a Scarce Resource

zenbu zenbu zenbu zenbuDec 10, 2022, 3:23 PM
7 points
0 comments4 min readLW link
(inflorescence.substack.com)

Poll Re­sults on AGI

Niclas KupperDec 10, 2022, 9:25 PM
10 points
0 comments2 min readLW link

The Op­por­tu­nity and Risks of Learn­ing Hu­man Values In-Context

Past AccountDec 10, 2022, 9:40 PM
1 point
4 comments5 min readLW link

High level dis­course struc­ture in ChatGPT: Part 2 [Quasi-sym­bolic?]

Bill BenzonDec 10, 2022, 10:26 PM
7 points
0 comments6 min readLW link

ChatGPT goes through a worm­hole hole in our Shandyesque uni­verse [vir­tual wacky weed]

Bill BenzonDec 11, 2022, 11:59 AM
−1 points
2 comments3 min readLW link

Ques­tions about AI that bother me

Eleni AngelouDec 11, 2022, 6:14 PM
11 points
2 comments2 min readLW link

Reflec­tions on the PIBBSS Fel­low­ship 2022

Dec 11, 2022, 9:53 PM
31 points
0 comments18 min readLW link

Bench­marks for Com­par­ing Hu­man and AI Intelligence

MrThinkDec 11, 2022, 10:06 PM
8 points
4 comments2 min readLW link

a rough sketch of for­mal al­igned AI us­ing QACI

Tamsin LeakeDec 11, 2022, 11:40 PM
14 points
0 comments4 min readLW link
(carado.moe)

Triv­ial GPT-3.5 limi­ta­tion workaround

Dave LindberghDec 12, 2022, 8:42 AM
5 points
4 comments1 min readLW link

[Question] Thought ex­per­i­ment. If hu­man minds could be har­nessed into one uni­ver­sal con­scious­ness of hu­man­ity, would we dis­cover things that have been quite difficult to reach with the means of mod­ern sci­ence? And would the con­scious­ness of hu­man­ity be more com­pre­hen­sive than the fu­ture power of ar­tifi­cial in­tel­li­gence?

lotta liedesDec 12, 2022, 2:43 PM
−1 points
0 comments1 min readLW link

Mean­ingful things are those the uni­verse pos­sesses a se­man­tics for

Abhimanyu Pallavi SudhirDec 12, 2022, 4:03 PM
7 points
14 comments14 min readLW link

Let’s go meta: Gram­mat­i­cal knowl­edge and self-refer­en­tial sen­tences [ChatGPT]

Bill BenzonDec 12, 2022, 9:50 PM
5 points
0 comments9 min readLW link

[Question] Are law­suits against AGI com­pa­nies ex­tend­ing AGI timelines?

SlowingAGIDec 13, 2022, 6:00 AM
1 point
1 comment1 min readLW link

An ex­plo­ra­tion of GPT-2′s em­bed­ding weights

Adam ScherlisDec 13, 2022, 12:46 AM
26 points
2 comments10 min readLW link

Re­vis­it­ing al­gorith­mic progress

Dec 13, 2022, 1:39 AM
92 points
8 comments2 min readLW link
(arxiv.org)

Align­ment with ar­gu­ment-net­works and as­sess­ment-predictions

Tor Økland BarstadDec 13, 2022, 2:17 AM
7 points
3 comments45 min readLW link

Limits of Superintelligence

Aleksei PetrenkoDec 13, 2022, 12:19 PM
1 point
0 comments1 min readLW link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubKDec 13, 2022, 7:01 PM
14 points
5 comments2 min readLW link
(forum.effectivealtruism.org)

Seek­ing par­ti­ci­pants for study of AI safety researchers

joelegardnerDec 13, 2022, 9:58 PM
2 points
0 comments1 min readLW link

Assess­ing the Ca­pa­bil­ities of ChatGPT through Suc­cess Rates

Past AccountDec 13, 2022, 9:16 PM
5 points
0 comments2 min readLW link

Dis­cov­er­ing La­tent Knowl­edge in Lan­guage Models Without Supervision

XodarapDec 14, 2022, 12:32 PM
45 points
1 comment1 min readLW link
(arxiv.org)

all claw, no world — and other thoughts on the uni­ver­sal distribution

Tamsin LeakeDec 14, 2022, 6:55 PM
14 points
0 comments7 min readLW link
(carado.moe)

Con­trary to List of Lethal­ity’s point 22, al­ign­ment’s door num­ber 2

False NameDec 14, 2022, 10:01 PM
0 points
1 comment22 min readLW link

ChatGPT has a HAL Problem

Paul AndersonDec 14, 2022, 9:31 PM
1 point
0 comments1 min readLW link

How “Dis­cov­er­ing La­tent Knowl­edge in Lan­guage Models Without Su­per­vi­sion” Fits Into a Broader Align­ment Scheme

CollinDec 15, 2022, 6:22 PM
124 points
18 comments16 min readLW link

Avoid­ing Psy­cho­pathic AI

Cameron BergDec 19, 2022, 5:01 PM
28 points
3 comments20 min readLW link

We’ve stepped over the thresh­old into the Fourth Arena, but don’t rec­og­nize it

Bill BenzonDec 15, 2022, 8:22 PM
2 points
0 comments7 min readLW link

AI Safety Move­ment Builders should help the com­mu­nity to op­ti­mise three fac­tors: con­trib­u­tors, con­tri­bu­tions and coordination

peterslatteryDec 15, 2022, 10:50 PM
4 points
0 comments6 min readLW link

Proper scor­ing rules don’t guaran­tee pre­dict­ing fixed points

Dec 16, 2022, 6:22 PM
55 points
5 comments21 min readLW link

A learned agent is not the same as a learn­ing agent

Ben AmitayDec 16, 2022, 5:27 PM
4 points
4 comments2 min readLW link

Ab­stract con­cepts and met­al­in­gual defi­ni­tion: Does ChatGPT un­der­stand jus­tice and char­ity?

Bill BenzonDec 16, 2022, 9:01 PM
2 points
0 comments13 min readLW link

Us­ing In­for­ma­tion The­ory to tackle AI Align­ment: A Prac­ti­cal Approach

Daniel SalamiDec 17, 2022, 1:37 AM
6 points
4 comments8 min readLW link

Look­ing for an al­ign­ment tutor

JanBDec 17, 2022, 7:08 PM
15 points
2 comments1 min readLW link

What we owe the microbiome

weverkaDec 17, 2022, 7:40 PM
2 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Bad at Arith­metic, Promis­ing at Math

cohenmacaulayDec 18, 2022, 5:40 AM
91 points
17 comments20 min readLW link

AGI is here, but no­body wants it. Why should we even care?

MGowDec 20, 2022, 7:14 PM
−20 points
0 comments17 min readLW link

Hacker-AI and Cy­ber­war 2.0+

Erland WittkotterDec 19, 2022, 11:46 AM
2 points
0 comments15 min readLW link

Does ChatGPT’s perfor­mance war­rant work­ing on a tu­tor for chil­dren? [It’s time to take it to the lab.]

Bill BenzonDec 19, 2022, 3:12 PM
13 points
2 comments4 min readLW link
(new-savanna.blogspot.com)

Re­sults from a sur­vey on tool use and work­flows in al­ign­ment research

Dec 19, 2022, 3:19 PM
50 points
2 comments19 min readLW link

Pro­lifer­at­ing Education

Haris RashidDec 20, 2022, 7:22 PM
−1 points
2 comments5 min readLW link
(www.harisrab.com)

[Question] Will re­search in AI risk jinx it? Con­se­quences of train­ing AI on AI risk arguments

Yann DuboisDec 19, 2022, 10:42 PM
5 points
6 comments1 min readLW link

AGI Timelines in Gover­nance: Differ­ent Strate­gies for Differ­ent Timeframes

Dec 19, 2022, 9:31 PM
47 points
15 comments10 min readLW link

(Ex­tremely) Naive Gra­di­ent Hack­ing Doesn’t Work

ojorgensenDec 20, 2022, 2:35 PM
6 points
0 comments6 min readLW link

An Open Agency Ar­chi­tec­ture for Safe Trans­for­ma­tive AI

davidadDec 20, 2022, 1:04 PM
18 points
12 comments4 min readLW link

Prop­er­ties of cur­rent AIs and some pre­dic­tions of the evolu­tion of AI from the per­spec­tive of scale-free the­o­ries of agency and reg­u­la­tive development

Roman LeventovDec 20, 2022, 5:13 PM
7 points
0 comments36 min readLW link

I be­lieve some AI doomers are overconfident

FTPickleDec 20, 2022, 5:09 PM
10 points
14 comments2 min readLW link

Perform­ing an SVD on a time-se­ries ma­trix of gra­di­ent up­dates on an MNIST net­work pro­duces 92.5 sin­gu­lar values

Garrett BakerDec 21, 2022, 12:44 AM
8 points
10 comments5 min readLW link

CIRL Cor­rigi­bil­ity is Fragile

Dec 21, 2022, 1:40 AM
21 points
1 comment12 min readLW link

New AI risk in­tro from Vox [link post]

JakubKDec 21, 2022, 6:00 AM
5 points
1 comment2 min readLW link
(www.vox.com)