I appreciate that you’re endorsing these changes in response to the two specific cases I raised on X (unlimited model retraining and composition with unsafe covered models). My gut sense is still that ad-hoc patching in this manner just isn’t a robust way to deal with the underlying issue*, and that there are likely still more cases like those two. In my opinion it would be better for the bill to adopt a different framework for hazardous capabilities arising from post-training modifications (something closer to “Covered model developers have a duty to ensure that the marginal impact of training/releasing their model would not be to make hazardous capabilities significantly easier to acquire.”). The drafters of SB 1047 shouldn’t have to anticipate every possible contingency in advance; that’s just bad design.
* In the same way that, when someone notices that their supposedly-safe utility function for their AI has edge cases that expose unforeseen maxima, introducing ad-hoc patches to deal with those particular noticed edge cases is not a robust strategy for getting an AI that is actually safe across the board.
Left the following comment on the blog: