Thanks for writing this and proposing a plan. Coincidentally, I drafted a short take here yesterday explaining one complaint I currently have with the safety conditions of this plan. In short, I suspect the “No AIs improving other AIs” criterion isn’t worth including within a safety plan: it i) doesn’t address many additional threat models at the margin (or does so ineffectively) and ii) would be too unpopular to implement (or, alternatively, too weak to be useful).
I think there is a version of this plan with a lower safety tax, with more focus on reactive policy and the other three criteria, that I would be more excited about.
Thanks! Do you still think the “No AIs improving other AIs” criterion is too onerous after reading the policy enforcing it in Phase 0?
In that policy, we developed the definition of “found systems” to have this measure only apply to AI systems found via mathematical optimization, rather than AIs (or any other code) written by humans.
This reduces the cost of the policy significantly, as it applies only to a very small subset of all AI activities, and leaves most innocuous software untouched.
Am I correct in interpreting that your definition of “found system” would apply to nearly all useful AI systems today, such as ChatGPT, since these are algorithms that run on weights found with optimization methods such as gradient descent? If so, it is still fairly onerous.