I’m saying that no one is asking for safety to be smuggled in, obtained “for free”, or achieved by default. I’m curious why it would be singled out for the Thesis, when it’s always a part of any goal, like any other attribute of the goal in question. If the AI fails to be safe, it fails to perform competently: whether it swerved into another lane on the highway or didn’t brake fast enough and hit someone, neither is a smart thing to do.
“The smarter the AI is, the safer it becomes”: eureka, but that suddenly sounds rather un-orthogonal, dang-near correlated, doesn’t it? :)
Yes, I agree about the maximizer and the subtle failures, thanks to Rob Miles’s videos on how such a system is likely to fail, ceteris paribus.
“The smarter the AI gets, the more likely it is to think of something like this...”: this seems to contradict the quote above. Also, I would submit that we should call this incompetence and avoid saying the AI got any “smarter” at all, because one of the critical parts of its job was to understand and do what we intended, and it failed at that.
Surely FAIR is already concerned with alignment issues, and with the safety risks that follow if alignment fails. Their grand plans will naturally fail if safety is not baked into everything, right?
I’m getting dangerously close to admitting that I don’t like the AGI odds here, but that’s, well, an adjacent topic.
I think one of the cruxes we have here is the way we’re defining “intelligence” or “smart”. I see how if you define “smart/intelligent” as “Ability to achieve complex goals” then a phrase like “As an AI becomes more intelligent, it becomes more unsafe” doesn’t really make sense. If the AI becomes more unsafe, it becomes less able to achieve the goal, which therefore makes it stupider.
So, I should take a step back and clarify. One of the key problems of alignment is getting the AGI to do what we want, because the goal we actually program into the machine is not necessarily the goal we want it to achieve. That gap is where the problem lies.
If what you want is for an AI to learn to finish a video game, and what the AI actually learns to do is to maximise its score (its reward function) by farming enemies at a specific point, the AI has gotten “smarter” by its own goal, but “stupider” by yours. This is the problem of outer alignment: the AI has gotten better at achieving its goal, but its goal wasn’t ours. Its goal was perfectly correlated with ours (it progressed through the game in order to score more points) right up until it suddenly wasn’t. This is how an AI can improve at intelligence (i.e., goal-seeking behaviour) and as a result become worse at achieving our goals, because our goals are not its goals. If we could precisely define our goals to an AI, we would have gone a long way towards solving alignment, and what you said would be true.
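To make that gap concrete, here is a minimal toy sketch (everything in it is hypothetical and purely illustrative; the “levels”, “scores”, and function names are not from any real system): an agent that greedily maximises the programmed proxy reward (score) stops making progress on the intended goal (finishing the game) the moment the two come apart.

```python
# Toy illustration of a proxy reward coming apart from the intended goal.
# All names and numbers here are made up for the sake of the example.

def intended_progress(state):
    # What we actually want: progress toward finishing the game.
    return state["level"]

def proxy_reward(state):
    # What we actually programmed in: points scored.
    return state["score"]

def greedy_agent_choice(state, actions):
    # The agent picks whichever action maximises the proxy reward,
    # with no regard for whether it advances the intended goal.
    return max(actions, key=lambda act: proxy_reward(act(state)))

def advance_level(state):
    return {"level": state["level"] + 1, "score": state["score"] + 10}

def farm_enemies(state):
    return {"level": state["level"], "score": state["score"] + 50}

state = {"level": 1, "score": 0}
chosen = greedy_agent_choice(state, [advance_level, farm_enemies])
new_state = chosen(state)
print(chosen.__name__)               # farm_enemies
print(intended_progress(new_state))  # 1  -- no progress on *our* goal
print(proxy_reward(new_state))       # 50 -- plenty of progress on *its* goal
```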
As for why people are singling out safety—safety failures are much worse than capabilities failures, for the most part. If an AI is told to cure cancer and emits a series of random strings...that isn’t exactly ideal, but at least no harm was done. If an AI is told to cure cancer and prescribes a treatment that cures 100% of all cancer but kills the patient in the process, that’s much worse.
I don’t know exactly what FAIR believes, but I think people’s skepticism about Yann LeCun of FAIR is well founded. LeCun agrees that we need to build safety into things, but he seems to think the safety mechanisms are fairly obvious, and he doesn’t appear to believe it’s a hard problem. See this: https://www.lesswrong.com/posts/WxW6Gc6f2z3mzmqKs/debate-on-instrumental-convergence-between-lecun-russell
“I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards.”
In other words—the only reason we’d fail the alignment problem is if we made no attempt to solve it. Most people who work on AI alignment believe the problem is considerably more difficult than this.
He also says:
“There will be mistakes, no doubt, as with any new technology (early jetliners lost wings, early cars didn’t have seat belts, roads didn’t have speed limits...).
But I disagree that there is a high risk of accidentally building existential threats to humanity.
Existential threats to humanity have to be explicitly designed as such.”
This is, to put it mildly, not the dominant belief among people who work on these problems for a living.
When I asked about singling out safety, I do agree that it gets treated as different; what I meant, however, was: why wouldn’t safety be considered ‘just another attribute’ by which we judge the success/intelligence of the AI? That’s what Yann seems to be implying. How could it be considered orthogonal to the real issue? We judge the AI by its actions in the real world, the primary concern is its effect on humanity, we rate those actions on a scale of intelligence, and every goal (I would presume) has some semblance of safety embedded in it…
Because if you have a sufficiently powerful AI and you get safety wrong, you don’t get a second chance; it kills you. That’s what makes it different: once you get to a certain level of capability, there is no deploy, check, and tweak cycle like there is for an algorithm’s accuracy or general level of productivity. You have to get it right, or at least close enough, the first time, every time.
Safety is absolutely one of several attributes by which we can judge the success of an AI, but it can’t be treated as “just another attribute”, and that’s why. Whether you say an unsafe AI is “intelligent” or not doesn’t matter. What matters is whether the AI is sufficiently powerful that it can kill you if you program it wrong.
I’m sorry, I think I misspoke. I agree with everything you said about it being different. But when I’ve tried to question the orthogonality of safety with AI safety experts, I’ve been told that safety is independent of capability. First, I think this is one reason why AI safety has been relegated to second-class status... and second, I can’t see why safety is not, as I think Yann puts it, central to any objective (i.e. an attribute of competency/intelligence) we give to an AGI (presuming we are talking about real-world goals and not just theoretical IQ points).
So, to reiterate, I do indeed agree that we need to (somehow; I can’t see how, or honestly even why we’d take these risks) get it right every time, including the first time, despite Yann’s plan to build in post-failure correction mechanisms or to build common-sense safety into the objective itself.
I think that “Safety is independent of capability” could mean a couple different things.
My understanding of what you’re saying is this:
When we talk about the capability of an AI, what we mean is “the ability of the AI to achieve the objective we want it to achieve.” The objective we want it to achieve inherently includes safety: a self-driving car that navigates flawlessly from Point A to Point B while recklessly running stop signs and endangering pedestrians is, in fact, less capable than one that does not do this. Therefore, safety is inherently a part of capability and should be treated as such.
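A tiny sketch of that framing (the trip fields and penalty weights below are invented assumptions, not anything from our discussion): the same driving run can look “capable” or not depending on whether safety is scored as part of the objective.

```python
# Hypothetical example: scoring the same self-driving run two different ways.
from dataclasses import dataclass

@dataclass
class TripLog:
    reached_destination: bool
    minutes_taken: float
    stop_signs_run: int
    pedestrians_endangered: int

def capability_ignoring_safety(trip: TripLog) -> float:
    # "Did it get from A to B quickly?" -- safety failures are invisible here.
    return (100.0 if trip.reached_destination else 0.0) - trip.minutes_taken

def capability_including_safety(trip: TripLog) -> float:
    # Safety treated as part of the objective: violations count directly
    # against how "capable" the run was. (Penalty weights are arbitrary.)
    return (capability_ignoring_safety(trip)
            - 50.0 * trip.stop_signs_run
            - 1000.0 * trip.pedestrians_endangered)

reckless_run = TripLog(True, 12.0, stop_signs_run=3, pedestrians_endangered=1)
careful_run = TripLog(True, 15.0, stop_signs_run=0, pedestrians_endangered=0)

# The first metric ranks the reckless run as *more* capable (88 vs 85);
# the second ranks it far below the careful run (-1062 vs 85).
print(capability_ignoring_safety(reckless_run), capability_ignoring_safety(careful_run))
print(capability_including_safety(reckless_run), capability_including_safety(careful_run))
```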
When someone in AI Safety says “Safety is independent of capability”, my understanding of the phrase is this:
It is possible to have very highly capable systems that are unsafe. This will inevitably happen without us specifically making the AI safe. This is a much harder problem than capabilities researchers understand, and that is why AI safety is its own field instead of just being part of general AI capabilities. Most of what capabilities researchers consider “AI safety” is stuff like preventing racism in predictive models or reducing bias in language models. These are useful but do not solve the core problem of how to control an agent smarter than you.
The first point can be summarised as “Safety is not independent of capability, because safety is an inherent part of capability for any useful objective.” The second point can be summarised as “Safety is independent of capability, because it is possible to arbitrarily increase the level of one without increasing the other.”
These two arguments can both be true independently, and I personally believe both are true. Would you say the first argument is an accurate representation of your point? If not, how would you adjust it? What do you think of the second argument? Do the experts make more sense when examining their claim through this lens?
Yes, you hit the nail on the head; that is exactly my point, thank you. To go out on a limb, I also think this is what Yann is saying: he’s doing AI safety simultaneously; he considers alignment itself to be safety.
I guess, maybe, I can see how the second take could be true... but I also can’t think of a practical example, which is my sticking point. Of course, a bomb that can blow up the moon is partly “capable”, and there is partial progress to report, but only if we judge it on limited factors and exclude certain essential ones (e.g. navigation). I posit that we will never avoid judging our real inventions by what I’d consider the essential output:
“Will it not kill us == Does it work?”
It’s a theory, but I think AI safety people may lose the argument right away, and can sadly end up an afterthought (that’s what they tell me), because they are allowing others to define “intelligence/capability” as free from normal human concerns about our own safety... Like I said before, others can go their merry way making things more powerful, calling it “progress”, calling it higher IQ... but I don’t see how that should earn the title of Capability.
Ah, I see. I thought we were having a sticking point on definitions, but it seems that the definition is part of the point.
So, if I have this right, what you’re saying is:
Currently, the AI community defines capability and safety as two different things. This is very bad. Firstly, because it’s wrong—an unsafe system cannot reasonably be thought of as being capable of achieving anything more complex than predicting cat pictures. Secondly, because it leads to bad outcomes when this paradigm is adopted by AI researchers. Who doesn’t want to make a more capable system? Who wants to slow that down for “safety”? That shit’s boring! What would be better is if the AI community considered safety to be a core metric of capability, just as important as “Is this AI powerful enough to perform the task we want?”.
YES.
You are a gentleman and a scholar for taking the time on this. I wish I could’ve explained it more clearly from the outset.
Glad to help! And hey, clarifying our ideas is half of what discussion is for!
I’d love to see a top-level post on ideas for making this happen, since I think you’re right, even though safety in current AI systems is very different from the problems we would face with AGI-level systems.
Does this remind you of what I’m trying to get at? Because it sure does, to me:
https://twitter.com/ESYudkowsky/status/1537842203543801856?s=20&t=5THtjV5sUU1a7Ge1-venUw
But I’m probably going to stay in the “dumb questions” area and not comment :)
I.e., “the feeling I have when someone tries to teach me that human safety is orthogonal to AI capability; in any real implementation, they’d be correlated in some way.”
https://twitter.com/KerryLVaughan/status/1536365808594608129?s=20&t=yTDds2nbg4F4J3wqXbsbCA