scasper

Karma: 2,003

https://stephencasper.com/

A Short Memo on AI Interpretability Rainbows

scasper · Jul 27, 2023, 11:05 PM
18 points
0 comments · 2 min read · LW link

Examples of Prompts that Make GPT-4 Output Falsehoods

Jul 22, 2023, 8:21 PM
21 points
5 comments · 6 min read · LW link

Eight Strategies for Tackling the Hard Part of the Alignment Problem

scasper · Jul 8, 2023, 6:55 PM
42 points
11 comments · 7 min read · LW link

Takeaways from the Mechanistic Interpretability Challenges

scasper · Jun 8, 2023, 6:56 PM
94 points
5 comments · 6 min read · LW link

Advice for Entering AI Safety Research

scasper · Jun 2, 2023, 8:46 PM
26 points
2 comments · 5 min read · LW link

GPT-4 is easily controlled/exploited with tricky decision theoretic dilemmas.

scasper · Apr 14, 2023, 7:39 PM
6 points
4 comments · 2 min read · LW link