domenicrosati

Karma: 57

PhD in Technical AI Safety / Alignment at Dalhousie University.

Immunization against harmful fine-tuning attacks

6 Jun 2024 15:17 UTC
4 points
0 comments · 12 min read · LW link

Training-time domain authorization could be helpful for safety

25 May 2024 15:10 UTC
15 points
4 comments · 7 min read · LW link

Control Symmetry: why we might want to start investigating asymmetric alignment interventions

11 Nov 2023 17:27 UTC
25 points
1 comment · 2 min read · LW link