Let’s use AI to harden human defenses against AI manipulation

Tom Davidson · 17 May 2023 23:33 UTC
34 points
7 comments · 24 min read · LW link

Improving the safety of AI evals

17 May 2023 22:24 UTC
13 points
7 comments · 7 min read · LW link

Possible AI “Fire Alarms”

Chris_Leong · 17 May 2023 21:56 UTC
15 points
0 comments · 1 min read · LW link

AI Alignment in The New Yorker

Eleni Angelou · 17 May 2023 21:36 UTC
8 points
0 comments · 1 min read · LW link
(www.newyorker.com)

ACI #3: The Origin of Goals and Utility

Akira Pyinya · 17 May 2023 20:47 UTC
1 point
0 comments · 6 min read · LW link

What if they gave an Industrial Revolution and nobody came?

jasoncrawford · 17 May 2023 19:41 UTC
93 points
10 comments · 19 min read · LW link
(rootsofprogress.org)

DCF Event Notes

jefftk · 17 May 2023 17:30 UTC
22 points
7 comments · 3 min read · LW link
(www.jefftk.com)

Hiatus: EA and LW post summaries

Zoe Williams · 17 May 2023 17:17 UTC
14 points
0 comments · 1 min read · LW link

[Question] When should I close the fridge?

lemonhope · 17 May 2023 16:56 UTC
11 points
11 comments · 1 min read · LW link

Play Regrantor: Move up to $250,000 to Your Top High-Impact Projects!

Dawn Drescher · 17 May 2023 16:51 UTC
26 points
0 comments · 1 min read · LW link

Eisenhower’s Atoms for Peace Speech

Akash · 17 May 2023 16:10 UTC
18 points
3 comments · 11 min read · LW link
(www.iaea.org)

Creating a self-referential system prompt for GPT-4

Ozyrus · 17 May 2023 14:13 UTC
3 points
1 comment · 3 min read · LW link

GPT-4 implicitly values identity preservation: a study of LMCA identity management

Ozyrus · 17 May 2023 14:13 UTC
21 points
4 comments · 13 min read · LW link

Some quotes from Tuesday’s Senate hearing on AI

Daniel_Eth · 17 May 2023 12:13 UTC
66 points
9 comments · 1 min read · LW link

Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans)

titotal · 17 May 2023 11:58 UTC
5 points
3 comments · 1 min read · LW link

Conflicts between emotional schemas often involve internal coercion

Richard_Ngo · 17 May 2023 10:02 UTC
40 points
4 comments · 4 min read · LW link

[Question] Is there a ‘time series forecasting’ equivalent of AIXI?

Solenoid_Entity · 17 May 2023 4:35 UTC
12 points
2 comments · 1 min read · LW link

$300 for the best sci-fi prompt

RomanS · 17 May 2023 4:23 UTC
40 points
30 comments · 2 min read · LW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI · 17 May 2023 1:50 UTC
−13 points
11 comments · 19 min read · LW link

New User’s Guide to LessWrong

Ruby · 17 May 2023 0:55 UTC
89 points
52 comments · 11 min read · LW link

Are AIs like Animals? Perspectives and Strategies from Biology

Jackson Emanuel · 16 May 2023 23:39 UTC
1 point
0 comments · 21 min read · LW link

A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)

Joseph Bloom · 16 May 2023 22:59 UTC
36 points
2 comments · 16 min read · LW link

A TAI which kills all humans might also doom itself

Jeffrey Heninger · 16 May 2023 22:36 UTC
7 points
3 comments · 3 min read · LW link

Brief notes on the Senate hearing on AI oversight

Diziet · 16 May 2023 22:29 UTC
77 points
2 comments · 2 min read · LW link

$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions

johnswentworth · 16 May 2023 21:31 UTC
40 points
11 comments · 2 min read · LW link

Progress links and tweets, 2023-05-16

jasoncrawford · 16 May 2023 20:54 UTC
14 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

AI Will Not Want to Self-Improve

petersalib · 16 May 2023 20:53 UTC
20 points
24 comments · 20 min read · LW link

Nice intro video to RSI

Nathan Helm-Burger · 16 May 2023 18:48 UTC
12 points
0 comments · 1 min read · LW link
(youtu.be)

[Interview w/ Zvi Mowshowitz] Should we halt progress in AI?

fowlertm · 16 May 2023 18:12 UTC
18 points
2 comments · 3 min read · LW link

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop

_will_ · 16 May 2023 18:06 UTC
11 points
4 comments · 8 min read · LW link

[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values?

Thoth Hermes · 16 May 2023 18:02 UTC
2 points
0 comments · 1 min read · LW link

Decision Theory with the Magic Parts Highlighted

moridinamael · 16 May 2023 17:39 UTC
175 points
24 comments · 5 min read · LW link

We learn long-lasting strategies to protect ourselves from danger and rejection

Richard_Ngo · 16 May 2023 16:36 UTC
84 points
5 comments · 5 min read · LW link

Proposal: Align Systems Earlier In Training

OneManyNone · 16 May 2023 16:24 UTC
18 points
0 comments · 11 min read · LW link

Procedural Executive Function, Part 2

DaystarEld · 16 May 2023 16:22 UTC
23 points
0 comments · 18 min read · LW link
(daystareld.com)

My current workflow to study the internal mechanisms of LLM

Yulu Pi · 16 May 2023 15:27 UTC
4 points
0 comments · 1 min read · LW link

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King · 16 May 2023 15:18 UTC
22 points
6 comments · 2 min read · LW link

AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control

16 May 2023 15:14 UTC
31 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Lazy Baked Mac and Cheese

jefftk · 16 May 2023 14:40 UTC
18 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk

Joe Brenton · 16 May 2023 11:57 UTC
6 points
4 comments · 1 min read · LW link

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

16 May 2023 10:53 UTC
26 points
0 comments · 13 min read · LW link

[Review] Two People Smoking Behind the Supermarket

lsusr · 16 May 2023 7:25 UTC
32 points
1 comment · 1 min read · LW link

Superposition and Dropout

Edoardo Pona · 16 May 2023 7:24 UTC
21 points
5 comments · 6 min read · LW link

[Question] What is the literature on long term water fasts?

lc · 16 May 2023 3:23 UTC
16 points
4 comments · 1 min read · LW link

Lessons learned from offering in-office nutritional testing

Elizabeth · 15 May 2023 23:20 UTC
86 points
11 comments · 14 min read · LW link
(acesounderglass.com)

Judgments often smuggle in implicit standards

Richard_Ngo · 15 May 2023 18:50 UTC
91 points
4 comments · 3 min read · LW link

Rational retirement plans

Ik · 15 May 2023 17:49 UTC
5 points
17 comments · 1 min read · LW link

[Question] (Crosspost) Asking for online calls on AI s-risks discussions

jackchang110 · 15 May 2023 17:42 UTC
1 point
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Simple experiments with deceptive alignment

Andreas_Moe · 15 May 2023 17:41 UTC
7 points
0 comments · 4 min read · LW link

Some Summaries of Agent Foundations Work

mattmacdermott · 15 May 2023 16:09 UTC
62 points
1 comment · 13 min read · LW link