Unknown unknowns seem like a totally valid basis for concern.
But I don’t think you get to move the burden of proof by fiat. If you want action then you need to convince relevant actors they should be concerned about them, and that unknown unknowns can cause catastrophe before a lab will stop. Without further elaboration I don’t think “unknown unknowns could cause a catastrophe” is enough to convince governments (or AI developers) to take significant actions.
I think RSPs make this situation better by pushing developers away from vague “Yeah we’ll be safe” to saying “Here’s what we’ll actually do” and allowing us to have a conversation about whether that specific thing is sufficient to prevent risk early enough. I think this is way better, because vagueness and equivocation make scrutiny much harder.
My own take is that there is a small but non-negligible risk before Anthropic’s ASL-3. For my part I’d vote to move to a lower threshold, or to require more stringent protective measures when working with any system bigger than LLaMA. But I’m not the median voter or decision-maker here (nor is Anthropic), and so I’ll say my piece but then move on to trying to convince people or to find a compromise that works.
The specific conversation is much better than nothing—but I do think it ought to be emphasized that solving all the problems we’re aware of isn’t sufficient for safety. We’re training on the test set.[1] Our confidence levels should reflect that—but I expect overconfidence.
It’s plausible that RSPs could be net positive, but I think that, given successful coordination, [vague and uncertain] beats [significantly more concrete, but overconfident]. My presumption is that without good coordination (a necessary condition being cautious decision-makers), things will go badly.
RSPs seem likely to increase the odds we get some international coordination and regulation. But to get sufficient regulation, we’d need the unknown unknowns issue to be covered at some point. To me this seems simplest to add clearly and explicitly from the beginning. Otherwise I expect regulation to adapt to issues for which we have concrete new evidence, and to fail to adapt beyond that.
Granted that you’re not the median voter/decision-maker, but you’re certainly one of the most influential voices on the issue, if not the most influential. It seems important not to underestimate your capacity to change people’s views before figuring out a compromise to aim for (I’m primarily thinking of government people, who seem more likely to have views that might change radically based on a few conversations). But I’m certainly no expert on this kind of thing.
I do wonder whether it might be helpful not to share all known problems publicly on this basis—I’d have somewhat more confidence in safety measures that succeeded in solving some problems of a type the designers didn’t know about.