Because if you have a sufficiently powerful AI and you get safety wrong, you don’t get a second chance, because it kills you. That’s what makes it different: once you get to a certain level of capability, there is no deploy, check, and tweak cycle like the one you’d use to improve an algorithm’s accuracy or general level of productivity. You have to get it right, or at least close enough, the first time, every time.
Safety is absolutely one of several attributes by which we can judge the success of an AI, but it can’t be treated as “just another attribute”, and that’s why. Whether you say an unsafe AI is “intelligent” or not doesn’t matter. What matters is whether the AI is sufficiently powerful that it can kill you if you program it wrong.
I’m sorry, I think I misspoke; I agree with all you said about it being different. But when I’ve tried to question the orthogonality of safety with AI-safety experts, I was told that safety is independent of capability. First, I think this is a reason why AI safety has been relegated to second-class status... and second, I can’t see why safety is not, as I think Yann puts it, central to any objective (i.e. an attribute of competency/intelligence) we give to AGI (presuming we are talking about real-world goals and not just theoretical IQ points).
So, to reiterate: I do indeed agree that we need to (somehow; I can’t see how, or even why we’d take these risks, honestly) get it right every time, including the first time, despite Yann’s plan to build in post-failure correction mechanisms or to build common-sense safety into the objective itself.
I think that “Safety is independent of capability” could mean a couple different things.
My understanding of what you’re saying is this:
When we talk about the capability of an AI, what we mean is “The ability of an AI to achieve the objective we want it to achieve.” The objective we want it to achieve inherently includes safety: a self-driving car that flawlessly navigates from Point A to Point B while recklessly running stop signs and endangering pedestrians is, in fact, less capable than one that does not do this. Therefore, safety is inherently a part of capability, and should be treated as such.
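To make that concrete, here’s a minimal toy sketch (entirely my own illustration, with made-up names and weights) of what it looks like when safety is scored as part of the objective itself rather than tracked separately:

```python
# Toy sketch only: hypothetical names and arbitrary weights, not real evaluation code.
from dataclasses import dataclass

@dataclass
class TripResult:
    reached_destination: bool
    stop_signs_run: int
    pedestrians_endangered: int

def capability_score(trip: TripResult) -> float:
    """Score a trip with safety baked into the objective itself,
    rather than reported as a separate, optional metric."""
    task_score = 1.0 if trip.reached_destination else 0.0
    # Safety violations directly reduce how "capable" we judge the system to be.
    safety_penalty = 1.0 * trip.stop_signs_run + 10.0 * trip.pedestrians_endangered
    return task_score - safety_penalty

reckless = TripResult(reached_destination=True, stop_signs_run=3, pedestrians_endangered=1)
careful = TripResult(reached_destination=True, stop_signs_run=0, pedestrians_endangered=0)
print(capability_score(reckless))  # -12.0: it arrives, but scores worse than not driving at all
print(capability_score(careful))   # 1.0
```

The exact weights are arbitrary; the point is just that under this framing, an unsafe trip simply isn’t a capable one.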
When someone in AI Safety says “Safety is independent of capability”, my understanding of the phrase is this:
It is possible to have very highly capable systems that are unsafe, and this will inevitably happen unless we specifically make the AI safe. Doing so is a much harder problem than capabilities researchers appreciate, and that is why AI safety is its own field instead of just being part of general AI capabilities. Most of what capabilities researchers consider “AI safety” is work like preventing racism in predictive models or reducing bias in language models. That work is useful, but it does not solve the core problem of how to control an agent smarter than you.
The first point can be summarised as “Safety is not independent of capability, because safety is an inherent part of capability for any useful objective.” The second point can be summarised as “Safety is independent of capability, because it is possible to arbitrarily increase the level of one without increasing the other.”
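For the second summary, here is a deliberately crude toy (again my own, purely illustrative) of “increasing one without increasing the other”: a hill-climber that only ever sees the task objective keeps getting more capable as its optimization budget grows, while the safety property we care about never improves.

```python
# Crude toy, my own illustration: nothing here resembles a real training setup.
import random

random.seed(0)

def task_score(policy):
    # "Capability": how much of the proxy objective the policy achieves.
    return sum(policy)

def safety_score(policy):
    # "Safety": how often the policy takes the cautious action (0 instead of 1).
    # Nothing below ever optimizes this.
    return policy.count(0)

def optimize(budget):
    """Hill-climb on task_score alone; safety never enters the loop."""
    policy = [0] * 20
    for _ in range(budget):
        candidate = policy[:]
        i = random.randrange(len(candidate))
        candidate[i] = 1 - candidate[i]
        if task_score(candidate) >= task_score(policy):
            policy = candidate
    return policy

for budget in (5, 50, 500):
    p = optimize(budget)
    print(f"budget={budget}: capability={task_score(p)}, safety={safety_score(p)}")
```

Adding optimization budget never pushes the safety number up, because safety never appears in the objective being optimized.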
These two arguments can both be true independently, and I personally believe both are true. Would you say the first argument is an accurate representation of your point? If not, how would you adjust it? What do you think of the second argument? Do the experts make more sense when examining their claim through this lens?
Yes, you hit the nail on the head; that’s exactly my point, thank you. To go out on a limb, I also think this is what Yann is saying: he’s doing AI safety simultaneously, because he considers alignment to be safety.
I guess I can maybe see how the second take could be true, but I also can’t think of a practical example, which is my sticking point. Of course, a bomb that can blow up the moon is partly “capable”, and there is partial progress to report, but only if we judge it on limited factors and exclude certain essential ones (e.g. navigation). I posit we will never avoid judging our real inventions by what I’d consider the essential output:
“Will it not kill us == Does it work?”
It’s a theory, but I think AI-safety people may lose the argument right away, and sadly end up an afterthought (that’s what they tell me), because they are allowing others to define “intelligence/capability” as free of normal human concerns about our own safety. Like I said before, others can go their merry way making stuff more powerful, calling it “progress”, calling it higher IQ, but I don’t see how that should earn the label “capability”.
Ah, I see. I thought we were having a sticking point on definitions, but it seems that the definition is part of the point.
So, if I have this right, what you’re saying is:
Currently, the AI community defines capability and safety as two different things. This is very bad. Firstly, because it’s wrong—an unsafe system cannot reasonably be thought of as being capable of achieving anything more complex than predicting cat pictures. Secondly, because it leads to bad outcomes when this paradigm is adopted by AI researchers. Who doesn’t want to make a more capable system? Who wants to slow that down for “safety”? That shit’s boring! What would be better is if the AI community considered safety to be a core metric of capability, just as important as “Is this AI powerful enough to perform the task we want?”.
YES.
You are a gentleman and a scholar for taking the time on this. I wish I could’ve explained it more clearly from the outset.
Glad to help! And hey, clarifying our ideas is half of what discussion is for!
I’d love to see a top-level post on ideas for making this happen, since I think you’re right, even though safety in current AI systems is very different from the problems we would face with AGI-level systems.
Does this remind you of what I’m trying to get at? Because it sure does, to me:
https://twitter.com/ESYudkowsky/status/1537842203543801856?s=20&t=5THtjV5sUU1a7Ge1-venUw
But I’m probably going to stay in the “dumb questions” area and not comment :)
i.e. “the feeling I have when someone tries to teach me that human safety is orthogonal to AI capability, when in a real implementation they’d be correlated in some way”
https://twitter.com/KerryLVaughan/status/1536365808594608129?s=20&t=yTDds2nbg4F4J3wqXbsbCA