domenicrosati
Karma: 57
PhD in Technical AI Safety / Alignment at Dalhousie University.
Immunization against harmful fine-tuning attacks
domenicrosati, Jan Wehner and David Atanasov
6 Jun 2024 15:17 UTC
4 points, 0 comments, 12 min read
Training-time domain authorization could be helpful for safety
domenicrosati, Jan Wehner and David Atanasov
25 May 2024 15:10 UTC
15 points, 4 comments, 7 min read
Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati
11 Nov 2023 17:27 UTC
25 points, 1 comment, 2 min read