domenicrosati

Karma: 57

PhD in Technical AI Safety / Alignment at Dalhousie University.

Immunization against harmful fine-tuning attacks

domenicrosati, Jan Wehner and David Atanasov

6 Jun 2024 15:17 UTC

4 points

0 comments, 12 min read, LW link

Training-time domain authorization could be helpful for safety

domenicrosati, Jan Wehner and David Atanasov

25 May 2024 15:10 UTC

15 points

4 comments, 7 min read, LW link

Control Symmetry: why we might want to start investigating asymmetric alignment interventions

domenicrosati

11 Nov 2023 17:27 UTC

25 points

1 comment, 2 min read, LW link