
scasper

Karma: 2,003

https://stephencasper.com/

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper, Dec 5, 2023, 4:48 PM
125 points
30 comments, 13 min read, LW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

Nov 7, 2023, 5:59 PM
38 points
2 comments, 2 min read, LW link
(arxiv.org)

The 6D effect: When companies take risks, one email can be very powerful.

scasper, Nov 4, 2023, 8:08 PM
277 points
42 comments, 3 min read, LW link

Announcing the CNN Interpretability Competition

scasper, Sep 26, 2023, 4:21 PM
22 points
0 comments, 4 min read, LW link

Open Problems and Fundamental Limitations of RLHF

scasper, Jul 31, 2023, 3:31 PM
66 points
6 comments, 2 min read, LW link
(arxiv.org)