Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
60 points
8 comments · 20 min read · LW link

Training goals for large language models

Johannes Treutlein · Jul 18, 2022, 7:09 AM
28 points
5 comments · 19 min read · LW link

A distillation of Evan Hubinger’s training stories (for SERI MATS)

Daphne_W · Jul 18, 2022, 3:38 AM
15 points
1 comment · 10 min read · LW link

Forecasting ML Benchmarks in 2023

jsteinhardt · Jul 18, 2022, 2:50 AM
36 points
20 comments · 12 min read · LW link
(bounded-regret.ghost.io)

What should you change in response to an “emergency”? And AI risk

AnnaSalamon · Jul 18, 2022, 1:11 AM
338 points
60 comments · 6 min read · LW link · 1 review

Deception?! I ain’t got time for that!

Paul Colognese · Jul 18, 2022, 12:06 AM
55 points
5 comments · 13 min read · LW link

How Interpretability can be Impactful

Connall Garrod · Jul 18, 2022, 12:06 AM
18 points
0 comments · 37 min read · LW link

Why you might expect homogeneous take-off: evidence from ML research

Andrei Alexandru · Jul 17, 2022, 8:31 PM
24 points
0 comments · 10 min read · LW link

Examples of AI Increasing AI Progress

TW123 · Jul 17, 2022, 8:06 PM
107 points
14 comments · 1 min read · LW link

Four questions I ask AI safety researchers

Orpheus16 · Jul 17, 2022, 5:25 PM
17 points
0 comments · 1 min read · LW link

Why I Think Abrupt AI Takeoff

lincolnquirk · Jul 17, 2022, 5:04 PM
14 points
6 comments · 1 min read · LW link

Culture wars in riddle format

Malmesbury · Jul 17, 2022, 2:51 PM
7 points
28 comments · 3 min read · LW link

Bangalore LW/ACX Meetup in person

Vyakart · Jul 17, 2022, 6:53 AM
1 point
0 comments · 1 min read · LW link

Resolve Cycles

CFAR!Duncan · Jul 16, 2022, 11:17 PM
140 points
8 comments · 10 min read · LW link

Alignment as Game Design

Shoshannah Tekofsky · Jul 16, 2022, 10:36 PM
11 points
7 comments · 2 min read · LW link

Risk Management from a Climbers Perspective

Annapurna · Jul 16, 2022, 9:14 PM
5 points
0 comments · 6 min read · LW link
(jorgevelez.substack.com)

Cognitive Instability, Physicalism, and Free Will

dadadarren · Jul 16, 2022, 1:13 PM
5 points
27 comments · 2 min read · LW link
(www.sleepingbeautyproblem.com)

All AGI safety questions welcome (especially basic ones) [July 2022]

Jul 16, 2022, 12:57 PM
84 points
132 comments · 3 min read · LW link

QNR Prospects

PeterMcCluskey · Jul 16, 2022, 2:03 AM
40 points
3 comments · 8 min read · LW link
(www.bayesianinvestor.com)

To-do waves

Paweł Sysiak · Jul 16, 2022, 1:19 AM
3 points
0 comments · 3 min read · LW link

Moneypumping Bryan Caplan’s Belief in Free Will

Morpheus · Jul 16, 2022, 12:46 AM
5 points
9 comments · 1 min read · LW link

A summary of every “Highlights from the Sequences” post

Orpheus16 · Jul 15, 2022, 11:01 PM
98 points
7 comments · 17 min read · LW link

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov · Jul 15, 2022, 9:47 PM
102 points
18 comments · 6 min read · LW link

Comfort Zone Exploration

CFAR!Duncan · Jul 15, 2022, 9:18 PM
51 points
2 comments · 12 min read · LW link

A time-invariant version of Laplace’s rule

Jul 15, 2022, 7:28 PM
72 points
13 comments · 17 min read · LW link
(epochai.org)

An attempt to break circularity in science

fryolysis · Jul 15, 2022, 6:32 PM
3 points
5 comments · 1 min read · LW link

A story about a duplicitous API

LiLiLi · Jul 15, 2022, 6:26 PM
2 points
0 comments · 1 min read · LW link

Highlights from the memoirs of Vannevar Bush

jasoncrawford · Jul 15, 2022, 6:08 PM
11 points
0 comments · 13 min read · LW link
(rootsofprogress.org)

Notes on Learning the Prior

carboniferous_umbraculum · Jul 15, 2022, 5:28 PM
25 points
2 comments · 25 min read · LW link

Review of The Engines of Cognition

William Gasarch · Jul 15, 2022, 2:13 PM
14 points
5 comments · 15 min read · LW link

A review of Nate Hilger’s The Parent Trap

David Hugh-Jones · Jul 15, 2022, 9:30 AM
15 points
8 comments · 4 min read · LW link
(wyclif.substack.com)

Musings on the Human Objective Function

Michael Soareverix · Jul 15, 2022, 7:13 AM
3 points
0 comments · 3 min read · LW link

Peter Singer’s first published piece on AI

Fai · Jul 15, 2022, 6:18 AM
20 points
5 comments · 1 min read · LW link
(link.springer.com)

Don’t use ‘infohazard’ for collectively destructive info

Eliezer Yudkowsky · Jul 15, 2022, 5:13 AM
86 points
33 comments · 1 min read · LW link · 2 reviews
(www.facebook.com)

Upcoming heatwave: advice

stavros · Jul 15, 2022, 5:03 AM
16 points
13 comments · 3 min read · LW link

A note about differential technological development

So8res · Jul 15, 2022, 4:46 AM
197 points
33 comments · 6 min read · LW link

Inward and outward steelmanning

Q Home · Jul 14, 2022, 11:32 PM
13 points
6 comments · 18 min read · LW link

Potato diet: A post mortem and an answer to SMTM’s article

Épiphanie Gédéon · Jul 14, 2022, 11:18 PM
48 points
34 comments · 16 min read · LW link

Proposed Orthogonality Theses #2-5

rjbg · Jul 14, 2022, 10:59 PM
8 points
0 comments · 2 min read · LW link

Better Quiddler

jefftk · Jul 14, 2022, 5:40 PM
17 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey · Jul 14, 2022, 4:59 PM
114 points
15 comments · 33 min read · LW link

Covid 7/14/22: BA.2.75 Plus Tax

Zvi · Jul 14, 2022, 2:40 PM
39 points
9 comments · 8 min read · LW link
(thezvi.wordpress.com)

Criticism of EA Criticism Contest

Zvi · Jul 14, 2022, 2:30 PM
108 points
17 comments · 31 min read · LW link · 1 review
(thezvi.wordpress.com)

Humans provide an untapped wealth of evidence about alignment

Jul 14, 2022, 2:31 AM
212 points
94 comments · 9 min read · LW link · 1 review

[Question] Wacky, risky, anti-inductive intelligence-enhancement methods?

Nicholas / Heather Kross · Jul 14, 2022, 1:40 AM
20 points
30 comments · 1 min read · LW link

[Question] How to impress students with recent advances in ML?

Charbel-Raphaël · Jul 14, 2022, 12:03 AM
12 points
2 comments · 1 min read · LW link

Notes on Love

David Gross · Jul 13, 2022, 11:35 PM
18 points
3 comments · 29 min read · LW link

Deep learning curriculum for large language model alignment

Jacob_Hilton · Jul 13, 2022, 9:58 PM
57 points
3 comments · 1 min read · LW link
(github.com)

Artificial Sandwiching: When can we test scalable alignment protocols without humans?

Sam Bowman · Jul 13, 2022, 9:14 PM
42 points
6 comments · 5 min read · LW link

[Question] Any tips for eliciting one’s own latent knowledge?

MSRayne · Jul 13, 2022, 9:12 PM
16 points
20 comments · 2 min read · LW link