Pitfalls with Proofs

scasper · Jul 19, 2022, 10:21 PM
19 points
21 comments · 8 min read · LW link

A daily routine I do for my AI safety research work

scasper · Jul 19, 2022, 9:58 PM
21 points
7 comments · 1 min read · LW link

Progress links and tweets, 2022-07-19

jasoncrawford · Jul 19, 2022, 8:50 PM
11 points
1 comment · 1 min read · LW link
(rootsofprogress.org)

Applications are open for CFAR workshops in Prague this fall!

John Steidley · Jul 19, 2022, 6:29 PM
64 points
3 comments · 2 min read · LW link

Sexual Abuse attitudes might be infohazardous

Pseudonymous Otter · Jul 19, 2022, 6:06 PM
256 points
72 comments · 1 min read · LW link

Spending Update 2022

jefftk · Jul 19, 2022, 2:10 PM
28 points
0 comments · 3 min read · LW link
(www.jefftk.com)

Abram Demski’s ELK thoughts and proposal - distillation

Rubi J. Hudson · Jul 19, 2022, 6:57 AM
19 points
8 comments · 16 min read · LW link

Bounded complexity of solving ELK and its implications

Rubi J. Hudson · Jul 19, 2022, 6:56 AM
11 points
4 comments · 18 min read · LW link

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes · Jul 19, 2022, 4:55 AM
95 points
6 comments · 2 min read · LW link

A Critique of AI Alignment Pessimism

ExCeph · Jul 19, 2022, 2:28 AM
9 points
1 comment · 9 min read · LW link

Ars D&D.Sci: Mysteries of Mana Evaluation & Ruleset

aphyer · Jul 19, 2022, 2:06 AM
33 points
4 comments · 5 min read · LW link

Marburg Virus Pandemic Prediction Checklist

DirectedEvolution · Jul 18, 2022, 11:15 PM
30 points
0 comments · 5 min read · LW link

At what point will we know if Eliezer’s predictions are right or wrong?

anonymous123456 · Jul 18, 2022, 10:06 PM
5 points
6 comments · 1 min read · LW link

Modelling Deception

Garrett Baker · Jul 18, 2022, 9:21 PM
15 points
0 comments · 7 min read · LW link

Are Intelligence and Generality Orthogonal?

cubefox · Jul 18, 2022, 8:07 PM
18 points
16 comments · 1 min read · LW link

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra · Jul 18, 2022, 7:06 PM
368 points
95 comments · 75 min read · LW link · 1 review

Turning Some Inconsistent Preferences into Consistent Ones

niplav · Jul 18, 2022, 6:40 PM
23 points
5 comments · 12 min read · LW link

Addendum: A non-magical explanation of Jeffrey Epstein

lc · Jul 18, 2022, 5:40 PM
81 points
21 comments · 11 min read · LW link

Launching a new progress institute, seeking a CEO

jasoncrawford · Jul 18, 2022, 4:58 PM
25 points
2 comments · 3 min read · LW link
(rootsofprogress.org)

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos · Jul 18, 2022, 4:51 PM
20 points
0 comments · 1 min read · LW link
(epochai.org)

Quantilizers and Generative Models

Adam Jermyn · Jul 18, 2022, 4:32 PM
24 points
5 comments · 4 min read · LW link

AI Hiroshima (Does A Vivid Example Of Destruction Forestall Apocalypse?)

Sable · Jul 18, 2022, 12:06 PM
4 points
4 comments · 2 min read · LW link

How the ---- did Feynman Get Here !?

George3d6 · Jul 18, 2022, 9:43 AM
8 points
8 comments · 3 min read · LW link
(www.epistem.ink)

Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
59 points
8 comments · 20 min read · LW link

Training goals for large language models

Johannes Treutlein · Jul 18, 2022, 7:09 AM
28 points
5 comments · 19 min read · LW link

A distillation of Evan Hubinger’s training stories (for SERI MATS)

Daphne_W · Jul 18, 2022, 3:38 AM
15 points
1 comment · 10 min read · LW link

Forecasting ML Benchmarks in 2023

jsteinhardt · Jul 18, 2022, 2:50 AM
36 points
20 comments · 12 min read · LW link
(bounded-regret.ghost.io)

What should you change in response to an “emergency”? And AI risk

AnnaSalamon · Jul 18, 2022, 1:11 AM
336 points
60 comments · 6 min read · LW link · 1 review

Deception?! I ain’t got time for that!

Paul Colognese · Jul 18, 2022, 12:06 AM
55 points
5 comments · 13 min read · LW link

How Interpretability can be Impactful

Connall Garrod · Jul 18, 2022, 12:06 AM
18 points
0 comments · 37 min read · LW link

Why you might expect homogeneous take-off: evidence from ML research

Andrei Alexandru · Jul 17, 2022, 8:31 PM
24 points
0 comments · 10 min read · LW link

Examples of AI Increasing AI Progress

TW123 · Jul 17, 2022, 8:06 PM
107 points
14 comments · 1 min read · LW link

Four questions I ask AI safety researchers

Akash · Jul 17, 2022, 5:25 PM
17 points
0 comments · 1 min read · LW link

Why I Think Abrupt AI Takeoff

lincolnquirk · Jul 17, 2022, 5:04 PM
14 points
6 comments · 1 min read · LW link

Culture wars in riddle format

Malmesbury · Jul 17, 2022, 2:51 PM
7 points
28 comments · 3 min read · LW link

Bangalore LW/ACX Meetup in person

Vyakart · Jul 17, 2022, 6:53 AM
1 point
0 comments · 1 min read · LW link

Resolve Cycles

CFAR!Duncan · Jul 16, 2022, 11:17 PM
139 points
8 comments · 10 min read · LW link

Alignment as Game Design

Shoshannah Tekofsky · Jul 16, 2022, 10:36 PM
11 points
7 comments · 2 min read · LW link

Risk Management from a Climbers Perspective

Annapurna · Jul 16, 2022, 9:14 PM
5 points
0 comments · 6 min read · LW link
(jorgevelez.substack.com)

Cognitive Instability, Physicalism, and Free Will

dadadarren · Jul 16, 2022, 1:13 PM
5 points
27 comments · 2 min read · LW link
(www.sleepingbeautyproblem.com)

All AGI safety questions welcome (especially basic ones) [July 2022]

Jul 16, 2022, 12:57 PM
84 points
132 comments · 3 min read · LW link

QNR Prospects

PeterMcCluskey · Jul 16, 2022, 2:03 AM
40 points
3 comments · 8 min read · LW link
(www.bayesianinvestor.com)

To-do waves

Paweł Sysiak · Jul 16, 2022, 1:19 AM
3 points
0 comments · 3 min read · LW link

Moneypumping Bryan Caplan’s Belief in Free Will

Morpheus · Jul 16, 2022, 12:46 AM
5 points
9 comments · 1 min read · LW link

A summary of every “Highlights from the Sequences” post

Akash · Jul 15, 2022, 11:01 PM
97 points
7 comments · 17 min read · LW link

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov · Jul 15, 2022, 9:47 PM
102 points
18 comments · 6 min read · LW link

Comfort Zone Exploration

CFAR!Duncan · Jul 15, 2022, 9:18 PM
51 points
2 comments · 12 min read · LW link

A time-invariant version of Laplace’s rule

Jul 15, 2022, 7:28 PM
72 points
13 comments · 17 min read · LW link
(epochai.org)

An attempt to break circularity in science

fryolysis · Jul 15, 2022, 6:32 PM
3 points
5 comments · 1 min read · LW link

A story about a duplicitous API

LiLiLi · Jul 15, 2022, 6:26 PM
2 points
0 comments · 1 min read · LW link