Peter S. Park

Karma: 145

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Aug 29, 2023, 1:29 AM
54 points
3 comments, 10 min read

AI can exploit safety plans posted on the Internet

Peter S. Park, Dec 4, 2022, 12:17 PM
−15 points
4 comments

The limited upside of interpretability

Peter S. Park, Nov 15, 2022, 6:46 PM
13 points
11 comments

Why do we post our AI safety plans on the Internet?

Peter S. Park, Nov 3, 2022, 4:02 PM
4 points
4 comments, 11 min read

Can We Align a Self-Improving AGI?

Peter S. Park, Aug 30, 2022, 12:14 AM
8 points
5 comments, 11 min read

What Makes an Idea Understandable? On Architecturally and Culturally Natural Ideas.

Aug 16, 2022, 2:09 AM
21 points
2 comments, 16 min read

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

Aug 10, 2022, 6:14 PM
28 points
30 comments, 11 min read

Finding Skeletons on Rashomon Ridge

Jul 24, 2022, 10:31 PM
30 points
2 comments, 7 min read

Race Along Rashomon Ridge

Jul 7, 2022, 3:20 AM
50 points
15 comments, 8 min read