domenicrosati comments on Training-time domain authorization could be helpful for safety

domenicrosati 25 May 2024 15:50 UTC
Thanks for pointing this out. I think it's a critical point.

I'm not imagining anything in particular (and ya, in that paper we very much do “baby”-sized attacks).

Generally, ya, this is a problem we need to work out: what is the relationship between a defence's strength and the budget an attacker would need to overcome it, AND, for large groups that have the budget to train from scratch, would defences here even make an impact?

I think you're right that the large-budget groups who can just train from scratch would not be impacted by defences of this nature. While that seems to really poo-poo this whole endeavour, I think it's still promising and valuable, as you point out, to want to prevent this from happening at all budgets too small to train from scratch.

A scenario that may help with thinking about why this endeavour is still valuable:

Maybe we are talking about billion- or trillion-dollar models in the future, where compute governance allows us to trace whether training from scratch is occurring somewhere, in which case defences that approach this limit for attacker spend are indeed quite valuable.