domenicrosati

Karma: 57

PhD in Technical AI Safety / Alignment at Dalhousie University.

Immunization against harmful fine-tuning attacks

Jun 6, 2024, 3:17 PM
4 points
0 comments · 12 min read · LW link

Training-time domain authorization could be helpful for safety

May 25, 2024, 3:10 PM
15 points
4 comments · 7 min read · LW link

Control Symmetry: why we might want to start investigating asymmetric alignment interventions

Nov 11, 2023, 5:27 PM
25 points
1 comment · 2 min read · LW link