This is just a placeholder: I will try to reply to this properly later.
Meanwhile, I only want to add one little thing.
Don’t forget that all of this analysis is supposed to be about situations in which we have, so to speak, “done our best” with the AI design. That is sort of built into the premise. If there is a no-brainer change we can make to the design of the AI, to guard against some failure mode, then it is assumed that this has been done.
The reason for that is that the basic premise of these scenarios is “We did our best to make the thing friendly, but in spite of all that effort, it went off the rails.”
For that reason, I am not really making arguments about the characteristics of a “generic” AI.
Maybe I could try to reduce possible confusion here. The paper was written to address a category of “AI Risk” scenarios in which we are told:
“Even if the AI is programmed with goals that are ostensibly favorable to humankind, it could execute those goals in such a way that would lead to disaster”.
Given that premise, it would be a bait-and-switch if I proposed a fix for this problem, and someone objected with “But you cannot ASSUME that the programmers would implement that fix!”
The whole point of the problem under consideration is that even if the engineers tried, they could not get the AI to stay true.
Yudkowsky et al. don’t argue that the problem is unsolvable, only that it is hard. In particular, Yudkowsky fears it may be harder than creating AI in the first place, which would mean that in the natural evolution of things, UFAI appears before FAI. However, I needn’t factor what I’m saying through the views of Yudkowsky. For an even more modest claim, we don’t have to believe that FAI is hard even in hindsight in order to claim that AI will be unfriendly unless certain failure modes are guarded against. On this view of the FAI project, a large part of the effort is just noticing the possible failure modes that are only obvious in hindsight, and convincing people that the problem is important and won’t solve itself.
If no one is building AIs with utility functions, then the one kind of failure MIRI is talking about has solved itself.
The problem with your objecting to the particular scenarios Yudkowsky et al. propose is that the scenarios are merely illustrative. Of course, you can probably guard against any specific failure mode. The claim is that there will be a lot of failure modes, and we can’t expect to guard against all of them by just sitting around thinking of as many exotic disaster scenarios as possible.
Mind you, I know your argument is more than just “I can see why these particular disasters could be avoided”. You’re claiming that certain features of AI will in general tend to make it careful and benevolent. Still, I don’t think it’s valid for you to complain about bait-and-switch, since that’s precisely the problem.
I have explicitly addressed this point on many occasions. My paper had nothing in it that was specific to any failure mode.
The suggestion is that the entire class of failure modes suggested by Yudkowsky et al. has a common feature: they all rely on the AI being incapable of using a massive array of contextual constraints when evaluating plans.
By simply proposing an AI in which such massive constraint deployment is the norm, the ball is now in the other court: it is up to Yudkowsky et al. to come up with ANY kind of failure mode that could get through.
The scenarios I attacked in the paper have the common feature that they have been predicated on such a simplistic type of AI that they were bound to fail. They had failure built into them.
As soon as everyone moves on from those “dumb” superintelligences and starts to discuss the possible failure modes that could occur in a superintelligence that makes maximum use of constraints, we can start to talk about possible AI dangers. I’m ready to do that. Just waiting for it to happen, is all.
Alright, I’ll take you up on it:
Failure Mode I: The AI doesn’t do anything useful, because there’s no way of satisfying every contextual constraint.
Predicting your response: “That’s not what I meant.”
Failure Mode II: The AI weighs contextual constraints incorrectly and sterilizes all humans to satisfy the sort of person who believes in Voluntary Human Extinction.
Predicting your response: “It would (somehow) figure out the correct weighting for all the contextual constraints.”
Failure Mode III: The AI weighs contextual constraints correctly (for a given value of “correctly”) and sterilizes everybody of below-average intelligence or any genetic abnormalities that could impose costs on offspring, and in the process, sterilizes all humans.
Predicting your response: “It wouldn’t do something so dumb.”
Failure Mode IV: The AI weighs contextual constraints correctly and puts all people of minority ethical positions into mind-rewriting machines so that there’s no disagreement anymore.
Predicting your response: “It wouldn’t do something so dumb.”
We could keep going, but the issue is that so far, you’ve defined -any- failure mode as “dumb”ness, and have argued that the AI wouldn’t do anything so “dumb”, because you’ve already defined that it is superintelligent.
I don’t think you know what intelligence -is-. Intelligence does not confer immunity to “dumb” behaviors.
It’s got to confer some degree of dumbness avoidance.
In any case, MIRI has already conceded that superintelligent AIs won’t misbehave through stupidity.
They maintain the problem is motivation … the Genie KNOWS but doesn’t CARE.
It’s got to confer some degree of dumbness avoidance.
Does it? On what grounds?
In any case, MIRI has already conceded that superintelligent AIs won’t misbehave through stupidity. They maintain the problem is motivation … the Genie KNOWS but doesn’t CARE.
That’s putting an alien intelligence in human terms; the very phrasing inappropriately anthropomorphizes the genie.
We probably won’t go anywhere without an example.
Market economics (“capitalism”) is an intelligence system which is very similar to the intelligence system Richard is proposing. Very, very similar; it’s composed entirely of independent nodes (seven billion of them) which each provide their own set of constraints, and promote or demote information as it passes through them based on those constraints. It’s an alien intelligence that follows Richard’s model, and one we are very familiar with. Does the market “know” anything? Does it even make sense to suggest that market economics -could- care?
Does the market always arrive at the correct conclusions? Does it even consistently avoid stupid conclusions?
How difficult is it to program the market to behave in specific ways?
Is the market “friendly”?
Does it make sense to say that the market is “stupid”? Does the concept “stupid” -mean- anything when talking about the market?
On the grounds of the opposite meanings of dumbness and intelligence.
Dumbness isn’t merely the opposite of intelligence.
Take it up with the author.
I don’t need to.
Economic systems affect us because we are part of them. How is some neither-intelligent-nor-stupid system in a box supposed to affect us?
Not really relevant to the discussion at hand.
And if AIs are neither-intelligent-nor-stupid, why are they called AIs?
Every AI we’ve created so far has resulted in the definition of “AI” being changed to not include what we just created. So I guess the answer is a combination of optimism and the word “AI” having poor descriptive power.
And if AIs are alien, why are they able to do comprehensible and useful things like winning Jeopardy! and guiding us to our destinations?
What makes you think an alien intelligence should be useless?
What makes you think that a thing designed by humans to be useful to humans, and which is in fact useful to humans, would be alien?
Because “human” is a tiny piece of a potential mindspace whose dimensions we mostly haven’t even identified yet.
That’s about a quarter of an argument. You need to show that AI research is some kind of random shot into mind space, and not anthropomorphically biased for the reasons given.
The relevant part of the argument is this: “whose dimensions we mostly haven’t even identified yet.”
If we created an AI mind which was 100% human, as far as we’ve yet defined the human mind, we have absolutely no idea how human that AI mind would actually behave. The unknown unknowns dominate.
“Alien” isn’t the most transparent term to use for human unknowns.
I will take them one at a time:
Failure Mode I: The AI doesn’t do anything useful, because there’s no way of satisfying every contextual constraint.
An elementary error. The constraints in question are referred to in the literature as “weak” constraints (and I believe I used that qualifier in the paper: I almost always do). Weak constraints never need to be ALL satisfied at once. No AI could ever be designed that way, and no-one ever suggested that it would. See the reference to McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986) in the paper: that gives a pretty good explanation of weak constraints.
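To make the point concrete, here is a minimal toy sketch, entirely my own and not taken from the paper or from McClelland, Rumelhart & Hinton: each weak constraint contributes a weighted penalty, the system settles on whichever candidate has the lowest total penalty, and the winning state still leaves some constraints unsatisfied. The constraint names and weights below are invented purely for illustration.

```python
import itertools

# Hypothetical constraints (names and weights invented for illustration):
# each one votes on two binary features of a candidate plan, with a weight
# saying how badly it "wants" to be satisfied.
constraints = [
    ("plan helps the user",       5.0, lambda s: s["act"] == 1),
    ("plan avoids side effects",  4.0, lambda s: s["aggressive"] == 0),
    ("plan is fast",              1.0, lambda s: s["aggressive"] == 1),
    ("plan conserves resources",  1.0, lambda s: s["act"] == 0),
]

def energy(state):
    # Total weight of the constraints this state violates; lower is better.
    return sum(w for _, w, ok in constraints if not ok(state))

states = [dict(act=a, aggressive=g) for a, g in itertools.product([0, 1], repeat=2)]
best = min(states, key=energy)
print(best, "still violates:", [name for name, _, ok in constraints if not ok(best)])
# The winning plan violates the two low-weight constraints: in a weak
# constraint system the constraints are traded off against one another,
# never all satisfied at once.
```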
Predicting your response: “That’s not what I meant.”
That’s an insult. But I will overlook it, since I know it is just your style.
Failure Mode II: The AI weighs contextual constraints incorrectly and sterilizes all humans to satisfy the sort of person who believes in Voluntary Human Extinction.
How exactly do you propose that the AI “weighs contextual constraints incorrectly” when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT ‘failure’ for this to occur?
That is implicit in the way that weak constraint systems are built. Perhaps you are not familiar with the details.
Predicting your response: “It would (somehow) figure out the correct weighting for all the contextual constraints.”
Assuming this isn’t more of the same, what you are saying here is isomorphic to the statement that somehow, a neural net might figure out the correct weighting for all the connections so that it produces the correctly trained output for a given input. That problem was solved in so many different NN systems that most NN people, these days, would consider your statement puzzling.
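For what it is worth, here is a toy illustration of that standard point, not anyone’s proposed AI design: plain gradient descent on a single logistic unit “figures out” connection weights that reproduce the trained outputs for a simple rule. The data, rule, and learning rate are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # toy inputs
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # target rule to be recovered

w, b = np.zeros(2), 0.0
for _ in range(2000):                            # logistic regression by gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predicted probabilities
    w -= 0.5 * (X.T @ (p - y)) / len(y)          # gradient of the cross-entropy loss
    b -= 0.5 * np.mean(p - y)

print(np.round(w, 2), round(b, 2))               # weights line up with the generating rule
print("training accuracy:", np.mean((p > 0.5) == y))
```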
Failure Mode III: The AI weighs contextual constraints correctly (for a given value of “correctly”) and sterilizes everybody of below-average intelligence or any genetic abnormalities that could impose costs on offspring, and in the process, sterilizes all humans.
A trivial variant of your second failure mode. The AI is calculating the constraints correctly, according to you, but at the same time you suggest that it has somehow NOT included any of the constraints that relate to the ethics of forced sterilization, etc. etc. You offer no explanation of why all of those constraints were not counted by your proposed AI, you just state that they weren’t.
Predicting your response: “It wouldn’t do something so dumb.”
Yet another insult. This is getting a little tiresome, but I will carry on.
Failure Mode IV: The AI weighs contextual constraints correctly and puts all people of minority ethical positions into mind-rewriting machines so that there’s no disagreement anymore.
This is identical to your third failure mode, but here you produce a different list of constraints that were ignored. Again, with no explanation of why a massive collection of constraints suddenly disappeared.
Predicting your response: “It wouldn’t do something so dumb.”
No comment.
We could keep going, but the issue is that so far, you’ve defined -any- failure mode as “dumb”ness, and have argued that the AI wouldn’t do anything so “dumb”, because you’ve already defined that it is superintelligent.
This is a bizarre statement, since I have said no such thing. Would you mind including citations, from now on, when you say that I “said” something? And please try not to paraphrase, because it takes time to correct the distortions in your paraphrases.
I don’t think you know what intelligence -is-. Intelligence does not confer immunity to “dumb” behaviors.
Another insult, and putting words into my mouth, and showing no understanding of what a weak constraint system actually is.
An elementary error. The constraints in question are referred to in the literature as “weak” constraints (and I believe I used that qualifier in the paper: I almost always do). Weak constraints never need to be ALL satisfied at once. No AI could ever be designed that way, and no-one ever suggested that it would. See the reference to McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986) in the paper: that gives a pretty good explanation of weak constraints.
I understand the concept.
How exactly do you propose that the AI “weighs contextual constraints incorrectly” when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT ‘failure’ for this to occur?
I’d hazard a guess that, for any given position, less than 70% of humans will agree without reservation. The issue isn’t that thousands of failures occur. The issue is that thousands of failures -always- occur.
Assuming this isn’t more of the same, what you are saying here is isomorphic to the statement that somehow, a neural net might figure out the correct weighting for all the connections so that it produces the correctly trained output for a given input. That problem was solved in so many different NN systems that most NN people, these days, would consider your statement puzzling.
The problem is solved only for well-understood (and very limited) problem domains with comprehensive training sets.
A trivial variant of your second failure mode. The AI is calculating the constraints correctly, according to you, but at the same time you suggest that it has somehow NOT included any of the constraints that relate to the ethics of forced sterilization, etc. etc. You offer no explanation of why all of those constraints were not counted by your proposed AI, you just state that they weren’t.
They were counted. They are, however, weak constraints. The constraints which required human extinction outweighed them, as they do for countless human beings. Fortunately for us in this imagined scenario, the constraints against killing people counted for more.
This is identical to your third failure mode, but here you produce a different list of constraints that were ignored. Again, with no explanation of why a massive collection of constraints suddenly disappeared.
Again, they weren’t ignored. They are, as you say, weak constraints. Other constraints overrode them.
Another insult, and putting words into my mouth, and showing no understanding of what a weak constraint system actually is.
The issue here isn’t my lack of understanding. The issue here is that you are implicitly privileging some constraints over others without any justification.
Every single conclusion I reached here is one that humans—including very intelligent humans—have reached. By dismissing them as possible conclusions an AI could reach, you’re implicitly rejecting every argument pushed for each of these positions without first considering them. The “weak constraints” prevent them.
I didn’t choose -wrong- conclusions, you see, I just chose -unpopular- conclusions, conclusions I knew you’d find objectionable. You should have noticed that; you didn’t, because you were too concerned with proving that AI wouldn’t do them. You were too concerned with your destination, and didn’t pay any attention to your travel route.
If doing nothing is the correct conclusion, your AI should do nothing. If human extinction is the correct conclusion, your AI should choose human extinction. If sterilizing people with unhealthy genes is the correct conclusion, your AI should sterilize people with unhealthy genes (you didn’t notice that humans didn’t necessarily go extinct in that scenario). If rewriting minds is the correct conclusion, your AI should rewrite minds.
And if your constraints prevent the AI from undertaking the correct conclusion?
Then your constraints have made your AI stupid, for some value of “stupid”.
The issue, of course, is that you have decided that you know better what is or is not the correct conclusion than an intelligence you are supposedly creating to know things better than you.
And that sums up the issue.
I said:
How exactly do you propose that the AI “weighs contextual constraints incorrectly” when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT ‘failure’ for this to occur?
And your reply was:
I’d hazard a guess that, for any given position, less than 70% of humans will agree without reservation. The issue isn’t that thousands of failures occur. The issue is that thousands of failures -always- occur.
This reveals that you are really not understanding what a weak constraint system is, and where the system is located.
When the human mind looks at a scene and uses a thousand clues in the scene to constrain the interpretation of it, those thousand clues all, when the network settles, relax into a state in which most or all of them agree about what is being seen. You don’t get “less than 70%” agreement on the interpretation of the scene! If even one element of the scene violates a constraint in a strong way, the mind orients toward the violation extremely rapidly.
The same story applies to countless other examples of weak constraint relaxation systems dropping down into energy minima.
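Here is a toy sketch of my own (not Richard’s architecture, and with arbitrary numbers) of exactly that behaviour: a small Hopfield-style network of forty-nine mutually reinforcing cues plus one deliberately contradictory cue. After a few relaxation sweeps it settles into a state in which essentially every pairwise constraint agrees, and the single remaining violation marks the anomalous cue.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
state = rng.choice([-1, 1], size=n)      # initial random guess at the "scene"

# Pairwise weak constraints: cues 0..48 all want to agree with each other (+1);
# cue 49 wants to disagree with everyone except cue 0 (the anomalous element).
W = np.ones((n, n))
W[:, -1] = W[-1, :] = -1.0
W[0, -1] = W[-1, 0] = 1.0
np.fill_diagonal(W, 0.0)

def violated(s):
    # A pairwise constraint is violated when the coupling sign disagrees
    # with the product of the two cue states.
    return int(((W * np.outer(s, s)) < 0).sum()) // 2

for _ in range(10):                      # asynchronous relaxation sweeps (greedy energy descent)
    for i in range(n):
        state[i] = 1 if W[i] @ state > 0 else -1

print(violated(state), "of", n * (n - 1) // 2, "pairwise constraints still violated")
# Typically prints "1 of 1225": the network settles into near-total agreement,
# and the lone violated constraint is the one touching the anomalous cue.
```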
Let me know when you do understand what you are talking about, and we can resume.
There is no energy minimum, if your goal is Friendliness. There is no “correct” answer. No matter what your AI does, no matter what architecture it uses, with respect to human goals and concerns, there is going to be a sizable percentage to whom it is unequivocally Unfriendly.
This isn’t an image problem. The first problem you have to solve in order to train the system is—what are you training it to do?
You’re skipping the actual difficult issue in favor of an imaginary, and easy to solve, issue.
there is going to be a sizable percentage to whom it is unequivocally Unfriendly
Unfriendly is an equivocal term.
“Friendliness” is ambiguous. It can mean safety, i.e. not making things worse, or it can mean making things better, creating paradise on Earth.
Friendliness in the second sense is a superset of morality. A friendly AI will be moral, but a moral AI will not necessarily be friendly.
“Unfriendliness” is similarly ambiguous: an unfriendly AI may be downright dangerous; or it might have enough grasp of ethics to be safe, but not enough to be able to make the world a much more fun place for humans. Unfriendliness in the second sense is not, strictly speaking, a safety issue.
A lot of people are able to survive the fact that some institutions, movements and ideologies are unfriendly to them, for some value of unfriendly. Unfriendliness doesn’t have to be terminal.
Everything is equivocal to someone. Do you disagree with my fundamental assertion?
I can’t answer unequivocally, for the reasons given.
There won’t be a sizeable percentage to whom the AI is unfriendly in the sense of obliterating them.
There might well be a percentage to whom the AI is unfriendly in some business-as-usual sense.
Obliterating them is only bad by your ethical system. Other ethical systems may hold other things to be even worse.
Irrelevant.
You responded to me in this case. It’s wholly relevant to my point that You-Friendly AI isn’t a sufficient condition for Human-Friendly AI.
However, there are a lot of “wrong” answers.
The claim is that there will be a lot of failure modes, and we can’t expect to guard against all of them by just sitting around thinking of as many exotic disaster scenarios as possible.
I doubt that, since, coupled with claims of existential risk, the logical conclusion would be to halt AI research, but MIRI isn’t saying that.
There are other methods than “sitting around thinking of as many exotic disaster scenarios as possible” by which one could seek to make AI friendly. Thus, believing that “sitting around [...]” will not be sufficient does not imply that we should halt AI research.
So where are the multiple solutions to the multiple failure modes?
Thanks, and take your time!
Don’t forget that all of this analysis is supposed to be about situations in which we have, so to speak, “done our best” with the AI design. That is sort of built into the premise. If there is a no-brainer change we can make to the design of the AI, to guard against some failure mode, then it is assumed that this has been done.
I feel like this could be an endless source of confusion and disagreement; if we’re trying to discuss what makes airplanes fly or crash, should we assume that engineers have done their best and made every no-brainer change? I’d rather we look for the underlying principles, we codify best practices, we come up with lists and tests.
If we’re trying to discuss what makes airplanes fly or crash, should we assume that engineers have done their best and made every no-brainer change?
If you are in the business of pointing out to them potential problems they are not aware of, then yes, because they can be assumed to be aware of no-brainer issues.
MIRI seeks to point out dangers in AI that aren’t the result of gross incompetence or deliberate attempts to weaponise AI: it’s banal to point out that these could lead to danger.