I think it seems pretty reasonable for a company in the reference class of Magic to do something like: “When we hit X capability level (as measured by a specific known benchmark), we’ll actually write out a scaling policy. Right now, here is some vague idea of what this would look like.” This post seems like a reasonable implementation of that AFAICT.
> I remain baffled by how people can set thresholds this high with a straight face:
I don’t think these are thresholds. The text says:
> We describe these threat models along with high-level, illustrative capability levels that would require strong mitigations.
And the table calls the corresponding capability level a “Critical Capability Threshold”. (Which seems to imply there should be multiple thresholds, with mitigations required at the earlier ones?)
Overall, this seems fine to me? They are just trying to outline the threat model here.
> It would be hard for their “Information Security Measures” and “Deployment Mitigations” to be more basic.
These sections just have high-level examples and discussion. I think this seems fine given Magic’s overall situation (they aren’t training frontier AIs), though I agree it would be good if people at the company had more detailed safety plans.