K-complexity is silly; use cross-entropy instead

So8res · Dec 20, 2022, 11:06 PM
147 points
54 comments · 14 min read · LW link · 2 reviews

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Orpheus16 · Dec 20, 2022, 9:39 PM
18 points
2 comments · 11 min read · LW link

Discovering Language Model Behaviors with Model-Written Evaluations

Dec 20, 2022, 8:08 PM
100 points
34 comments · 1 min read · LW link
(www.anthropic.com)

Reflections: Bureaucratic Hell

Haris Rashid · Dec 20, 2022, 7:22 PM
−5 points
1 comment · 1 min read · LW link
(www.harisrab.com)

Proliferating Education

Haris Rashid · Dec 20, 2022, 7:22 PM
−1 points
2 comments · 5 min read · LW link
(www.harisrab.com)

AGI is here, but nobody wants it. Why should we even care?

MGow · Dec 20, 2022, 7:14 PM
−22 points
0 comments · 17 min read · LW link

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development

Roman Leventov · Dec 20, 2022, 5:13 PM
33 points
3 comments · 36 min read · LW link

I believe some AI doomers are overconfident

FTPickle · Dec 20, 2022, 5:09 PM
8 points
15 comments · 2 min read · LW link

Note on algorithms with multiple trained components

Steven Byrnes · Dec 20, 2022, 5:08 PM
23 points
4 comments · 2 min read · LW link

Marvel Snap: Phase 2

Zvi · Dec 20, 2022, 2:50 PM
11 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen · Dec 20, 2022, 2:35 PM
17 points
0 comments · 6 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · Dec 20, 2022, 1:04 PM
80 points
22 comments · 4 min read · LW link

Under-Appreciated Ways to Use Flashcards—Part I

Florence Hinder · Dec 20, 2022, 12:43 PM
22 points
5 comments · 5 min read · LW link
(thoughtsaver.ghost.io)

EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)

Zoe Williams · Dec 20, 2022, 9:49 AM
10 points
0 comments · LW link

[link, 2019] AI paradigm: interactive learning from unlabeled instructions

the gears to ascension · Dec 20, 2022, 6:45 AM
2 points
0 comments · 2 min read · LW link
(jgrizou.github.io)

[Fiction] Unspoken Stone

Gordon Seidoh Worley · Dec 20, 2022, 5:11 AM
19 points
0 comments · 5 min read · LW link

Notice when you stop reading right before you understand

just_browsing · Dec 20, 2022, 5:09 AM
61 points
6 comments · 1 min read · LW link

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner · Dec 20, 2022, 5:01 AM
25 points
1 comment · 3 min read · LW link

More notes from raising a late-talking kid

Steven Byrnes · Dec 20, 2022, 2:13 AM
40 points
2 comments · 6 min read · LW link

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth · Dec 20, 2022, 1:22 AM
53 points
24 comments · 12 min read · LW link

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC · Dec 19, 2022, 10:52 PM
150 points
30 comments · 18 min read · LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois · Dec 19, 2022, 10:42 PM
5 points
6 comments · 1 min read · LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

Dec 19, 2022, 9:31 PM
65 points
28 comments · 10 min read · LW link

Towards Hodge-podge Alignment

Cleo Nardo · Dec 19, 2022, 8:12 PM
95 points
30 comments · 9 min read · LW link

Computational signatures of psychopathy

Cameron Berg · Dec 19, 2022, 5:01 PM
30 points
3 comments · 20 min read · LW link

Results from a survey on tool use and workflows in alignment research

Dec 19, 2022, 3:19 PM
79 points
2 comments · 19 min read · LW link

Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon · Dec 19, 2022, 3:12 PM
13 points
5 comments · 4 min read · LW link
(new-savanna.blogspot.com)

Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma

Jim Buhler · Dec 19, 2022, 3:00 PM
24 points
4 comments · 5 min read · LW link

Next Level Seinfeld

Zvi · Dec 19, 2022, 1:30 PM
50 points
8 comments · 1 min read · LW link
(thezvi.wordpress.com)

CEA Disambiguation

jefftk · Dec 19, 2022, 1:20 PM
25 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt · Dec 19, 2022, 12:02 PM
−3 points
9 comments · 31 min read · LW link

Hacker-AI and Cyberwar 2.0+

Erland Wittkotter · Dec 19, 2022, 11:46 AM
2 points
0 comments · 15 min read · LW link

Non-Technical Preparation for Hacker-AI and Cyberwar 2.0+

Erland Wittkotter · Dec 19, 2022, 11:42 AM
2 points
0 comments · 25 min read · LW link

An Effective Grab Bag

stavros · Dec 19, 2022, 10:29 AM
28 points
2 comments · 7 min read · LW link

Slick hyperfinite Ramsey theory proof

Alok Singh · Dec 19, 2022, 8:40 AM
8 points
3 comments · 1 min read · LW link
(alok.github.io)

The True Spirit of Solstice?

Raemon · Dec 19, 2022, 8:00 AM
69 points
31 comments · 9 min read · LW link

The Risk of Orbital Debris and One (Cheap) Way to Mitigate It

clans · Dec 19, 2022, 3:16 AM
13 points
1 comment · 4 min read · LW link
(locationtbd.home.blog)

Why I think that teaching philosophy is high impact

Eleni Angelou · Dec 19, 2022, 3:11 AM
5 points
0 comments · 2 min read · LW link

A template for doing annual reviews

peterslattery · Dec 19, 2022, 3:09 AM
2 points
0 comments · 1 min read · LW link

Event [Berkeley]: Alignment Collaborator Speed-Meeting

Dec 19, 2022, 2:24 AM
18 points
2 comments · 1 min read · LW link

An easier(?) end to the electoral college

ejacob · Dec 19, 2022, 2:09 AM
2 points
2 comments · 2 min read · LW link

How Death Feels

sisyphus · Dec 18, 2022, 11:47 PM
−7 points
9 comments · 1 min read · LW link

Why Are Women Hot?

Jacob Falkovich · Dec 18, 2022, 11:20 PM
17 points
19 comments · 11 min read · LW link

[Question] Can we, in principle, know the measure of counterfactual quantum branches?

sisyphus · Dec 18, 2022, 10:07 PM
1 point
15 comments · 1 min read · LW link

Boston Solstice 2022 Retrospective

jefftk · Dec 18, 2022, 7:00 PM
19 points
3 comments · 5 min read · LW link
(www.jefftk.com)

Take 11: “Aligning language models” should be weirder.

Charlie Steiner · Dec 18, 2022, 2:14 PM
34 points
0 comments · 2 min read · LW link

Bad at Arithmetic, Promising at Math

cohenmacaulay · Dec 18, 2022, 5:40 AM
100 points
19 comments · 20 min read · LW link · 1 review

Overconfidence bubbles

kaputmi · Dec 18, 2022, 2:07 AM
3 points
0 comments · 2 min read · LW link

Positive values seem more robust and lasting than prohibitions

TurnTrout · Dec 17, 2022, 9:43 PM
52 points
13 comments · 2 min read · LW link

What we owe the microbiome

weverka · Dec 17, 2022, 7:40 PM
2 points
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)