My question was about what criteria would cause the AI to make a proposal to the human supervisors before executing its plan. In this case, I don’t think the criteria can be that humans are objecting, since they haven’t heard its plan yet.
(Regarding the point that you’re only addressing the scenarios proposed by Yudkowsky et al., see my remark here.)
That is easy:
Why would the humans have “not heard the plan yet”? It is a no-brainer that this AI’s design will include, as part of the motivation engine (the goals), a goal that says “Check with the humans first.” The premise in the paper is that we are discussing an AI that was designed as best we could, BUT then went maverick anyway: it makes no sense for us to switch, now, to talking about an AI that was actually built without that most elementary of safety precautions!
Quite independently, the AI can use its contextual understanding of the situation. Any intelligent system with such a poor understanding of the context and implications of its plans that it just goes ahead with the first plan off the stack, without thinking about the implications, is an intelligent system that will walk out in front of a bus just because it wants to get to the other side of the road. In the case in question, you are imagining an AI that would be capable of executing a plan to put all humans into bottles, without it occurring to it for one moment to mention to anybody that it was considering this plan? That makes no sense in any version of the real world. Such an AI is an implausible hypothetical.
With respect, your first point doesn’t answer my question. My question was, what criteria would cause the AI to submit a given proposed action or plan for human approval? You might say that the AI submits every proposed atomic action for approval (in this case, the criterion is the trivial one, “always submit proposal”), but this seems unlikely. Regardless, it doesn’t make sense to say the humans have already heard of the plan about which the AI is just now deciding whether to tell them.
In your second point you seem to be suggesting an answer to my question. (Correct me if I’m wrong.) You seem to be suggesting “context.” I’m not sure what is meant by this. Is it reasonable to suppose that the AI would make the decision about whether to “shoot first” or “ask first” based on, for example, the lower end of its 99% confidence interval for how satisfied its supervisors will be?
As you noted, the second point fills in the part missing from the first: the AI uses its background contextual knowledge.
You say you are unsure what this means. That leaves me a little baffled, but here goes anyway. Suppose I asked a person, today, to write a book for me on the subject: “What counts as an action significant enough that, if you did it in a way that would affect people, it would rise above some threshold of ‘nontrivialness’ and you should consult them first? Include in your answer a long discussion of the kind of thought processes you went through to come up with your answers.” I know many articulate people who could, if they had the time, write a massive book on that subject.
Now, that book would contain a huge number of constraints (little factoids about the situation) concerning “significant actions”, and the SOURCE of that long list of constraints would be ... the background knowledge of the person who wrote the book. They would call upon a massive body of knowledge about many aspects of life to organize their thoughts and come up with the book.
If we could look into the head of the person who wrote the book, we would find that background knowledge. It would be similar in size to the set of constraints mentioned in the book, or larger.
That background knowledge—both its content AND its structure—is what I refer to when I talk about the AI using contextual information or background knowledge to assess the degree of significance of an action.
You go on to ask a bizarre question:
Is it reasonable to suppose that the AI would make the decision about whether to “shoot first” or “ask first” based on, for example, the lower end of its 99% confidence interval for how satisfied its supervisors will be?
This would be an example of an intelligent system sitting there with that massive array of contextual/background knowledge that could be deployed ... but instead of using that knowledge to make a preliminary assessment of whether “shooting first” would be a good idea, it ignores ALL OF IT and substitutes one single constraint taken from its knowledge base or its goal system:
“Does this satisfy my criteria for how satisfied my supervisors will be?”
To use only one constraint would entirely defeat the object of using large numbers of constraints in the system. The system design is (assumed to be) such that this is impossible. That is the whole point of the Swarm Relaxation design that I talked about.
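To make the contrast concrete, here is a minimal toy sketch (the constraint names, weights, and threshold are all invented for illustration, and a real Swarm Relaxation system would be far richer than a weighted sum): a “shoot first or ask first” verdict formed by aggregating many soft constraints, next to the single-criterion caricature.

```python
# Toy sketch only: deciding "ask first" vs. "act first" by aggregating many
# weighted soft constraints instead of consulting a single criterion.
# Every constraint name and weight below is invented for the example.

def aggregate(constraints):
    """Combine weighted constraint scores into one overall 'ask first' pressure.

    Each constraint is (weight, score), where score is in [-1, 1]:
    +1 means "this consideration says check with the humans first",
    -1 means "this consideration says acting without asking is fine".
    """
    total_weight = sum(w for w, _ in constraints)
    return sum(w * s for w, s in constraints) / total_weight

constraints = [
    (3.0, +1.0),   # plan is effectively irreversible once started
    (2.5, +0.9),   # plan affects people who have not been consulted
    (2.0, +0.7),   # plan conflicts with preferences people have expressed before
    (1.0, -0.2),   # delay carries some cost, mildly favouring acting now
    (0.5, -0.5),   # similar past actions turned out to be uncontroversial
    # ... a real system would bring thousands of such items of background knowledge
]

pressure = aggregate(constraints)
print("ask first" if pressure > 0.0 else "act first", round(pressure, 2))

# The single-criterion version imagined in the question would throw all of this
# away and keep only one number (a predicted supervisor-satisfaction bound).
```

The point is not the arithmetic, which is trivial, but the fact that no single entry gets to dominate the verdict.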
My bizarre question was just an illustrative example. It seems neither you nor I believe that would be an adequate criterion (though perhaps for different reasons).
If I may translate what you’re saying into my own terms, you’re saying that for a problem like “shoot first or ask first?” the criteria (i.e., constraints) would be highly complex and highly contextual. Ok. I’ll grant that’s a defensible design choice.
Earlier in the thread you said
the AI is supposed to take an action in spite of the fact that it is getting “massive feedback” from all the humans on the planet, that they do not want this action to be executed.
This is why I have homed in on scenarios where the AI has not yet received feedback on its plan. In these scenarios, the AI presumably must decide (even if the decision is only implicit) whether to consult humans about its plan first, or to go ahead with its plan first (and halt or change course in response to human feedback). To lay my cards on the table, I want to consider three possible policies the AI could have regarding this choice.
1) Always (or usually) consult first. We can rule this out as impractical, if the AI is performing a large number of atomic actions.
2) Always (or usually) shoot first, and see what the response is. Unless the AI only makes friendly plans, I think this policy is catastrophic, since I believe there are many scenarios where an AI could initiate a plan and, before we know what hit us, we are in an unrecoverably bad situation. Therefore, implementing this policy in a non-catastrophic way is FAI-complete.
3) Have some good criteria for picking between “shoot first” or “ask first” on any given chunk of planning. This is what you seem to be favoring in your answer above. (Correct me if I’m wrong.) These criteria will tend to be complex, and not necessarily formulated internally in an axiomatic way. Regardless, I fear making good choices between “shoot first” or “ask first” is hard, even FAI-complete. Screw up once, and you are in a catastrophe like in case 2.
Can you let me know: have I understood you correctly? More importantly, do you agree with my framing of the dilemma for the AI? Do you agree with my assessment of the pitfalls of each of the 3 policies?
I am with you on your rejection of 1 and 2, if only because they are both framed as absolutes which ignore context.
And, yes, I do favor 3. However, you insert some extra wording that I don’t necessarily buy....
These criteria will tend to be complex, and not necessarily formulated internally in an axiomatic way.
You see, hidden in these words there seems to be an understanding of how the AI is working that might lead you to see a huge problem, and me to see something very different. I don’t know if this is really what you are thinking, but bear with me while I run with it for a moment.
Trying to formulate criteria for something in an objective, ‘codified’ way can sometimes be incredibly hard, even when most people would say they have internal ‘judgement’ that allows them to make a ruling very easily: the standard saw being “I cannot define what ‘pornography’ is, but I know it when I see it.” And (stepping quickly away from that example because I don’t want to get into that quagmire) there is a much more concrete example in the old interactive activation (IAC) model of word recognition, which is a simple constraint system. In the IAC model, word recognition is remarkably robust in the face of noise, whereas attempts to write symbolic programs to deal with all the different kinds of noisy corruption of the image turn out to be horribly complex and faulty.
As you can see, I am once again pointing to the fact that Swarm Relaxation systems (understood in the very broad sense that allows all varieties of neural net to be included) can make criterial decisions seem easy, where explicit codification of the decision is a nightmare.
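To make that concrete, here is a toy relaxation loop in the spirit of the interactive activation idea (this is not the actual IAC model: the lexicon, the weights, and the update rule are all invented for illustration). Each word unit accumulates support from many weak letter constraints and competes with its rivals, and a single corrupted letter is not enough to stop the right word from winning.

```python
# Toy relaxation sketch in the spirit of the interactive activation model:
# word units accumulate support from noisy letter evidence and compete with
# one another until the network settles. The lexicon, the mismatch penalty,
# and all the update parameters are invented for this illustration.

LEXICON = ["work", "word", "wear", "cart"]

def letter_evidence(observed, expected):
    """Weak, graded evidence from a single letter position."""
    return 1.0 if observed == expected else -0.3

def recognise(observed_word, steps=30, excite=0.1, inhibit=0.15, decay=0.1):
    act = {w: 0.0 for w in LEXICON}            # activation of each word unit
    for _ in range(steps):
        new_act = {}
        for w in LEXICON:
            # Bottom-up support: many small letter-level constraints.
            support = sum(letter_evidence(o, c) for o, c in zip(observed_word, w))
            # Lateral inhibition: active rival words push this one down.
            rivals = sum(max(act[v], 0.0) for v in LEXICON if v != w)
            delta = excite * support - inhibit * rivals - decay * act[w]
            new_act[w] = min(1.0, max(-0.2, act[w] + delta))
        act = new_act
    winner = max(act, key=act.get)
    return winner, {w: round(a, 2) for w, a in act.items()}

print(recognise("work"))   # clean input
print(recognise("vork"))   # first letter corrupted; "work" still wins
```

Nothing in there enumerates the possible corruptions; the robustness falls out of combining many weak constraints.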
So, where does that lead? Well, you go on to say:
Regardless, I fear making good choices between “shoot first” or “ask first” is hard, even FAI-complete. Screw up once, and you are in a catastrophe like in case 2.
The key phrase here is “Screw up once, and...”. In a constraint system it is impossible for one screw-up (one faulty constraint) to unbalance the whole system. That is the whole raison d’être of constraint systems.
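A toy arithmetic illustration of that claim (the numbers are invented, and any real constraint system would be far more than a simple average): when the verdict is built from a few hundred bounded constraint scores, corrupting one of them can move the result only by a sliver.

```python
# Toy illustration: one corrupted constraint barely moves a verdict that is
# built out of many bounded constraint scores. All numbers are invented.

def verdict(scores):
    """Average of constraint scores in [-1, 1]; positive means 'ask first'."""
    return sum(scores) / len(scores)

# 200 constraints that, on balance, favour asking first.
scores = [0.4] * 150 + [-0.2] * 50
clean = verdict(scores)                 # 0.25

# Corrupt a single constraint to the worst possible value.
scores[0] = -1.0
corrupted = verdict(scores)             # 0.243

print(round(clean, 3), round(corrupted, 3))
# One bad entry out of 200 can shift the average by at most 2/200 = 0.01,
# nowhere near enough to reverse the verdict.
```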
Also, you say that the problem of making good choices might be FAI-complete. Now, I have some substantial quibbles with that whole “FAI-complete” idea, but in this case I will just ask a question: are you trying to say that in order to DESIGN the motivation system of the AI in such a way that it will not make one catastrophic choice between shoot-first and ask-first, we must FIRST build a FAI, because that is the only way we can get enough intelligence-horsepower applied to the problem? If so, why exactly would we need to? If the constraint system just cannot allow single failures to get out of control, we don’t need to specify every possible criterial decision in advance; we simply rely on context to do the heavy lifting, in perpetuity.
Put another way: the constraint-based AI IS the FAI already, and the reasons for thinking that it can deal with all the potentially troublesome cases have nothing to do with us anticipating every potential troublesome case, ahead of time.
--
Stepping back a moment, consider the following three kinds of case where the AI might have to make a decision.
1) An interstellar asteroid appears from nowhere, travelling at unthinkable speed, and it is going to make a direct hit on the Earth in one hour, with no possibility of survivors. The AI considers a plan in which it quietly euthanizes all life, on the grounds that any other option would lead to one hour of horror, followed by certain death.
2) The AI considers the Dopamine Drip plan.
3) The AI suddenly becomes aware that a rare, precious species of bird has become endangered and the only surviving pair is on a nature trail that is about to be filled with a gang of humans who have been planning a holiday on that trail for months. The gang is approaching the pair right now and one of the birds will die if frightened because it has a heart condition. One plan is to block the humans without explaining (until later), which will inconvenience them.
In all three cases there is a great deal of background information (constraints) that could be brought to bear, and if the AI is constraint-based, it will consider that information. People do this all the time.
In no case is there ONLY a small number of constraints (like, 2 or 3) that are relevant. Where the number of constraints is tiny, there is a chance for a “bad choice” to be made. In fact, I would argue that it is inconceivable that a decision would take place in a near-vacuum of constraints. The more significant the decision, the greater the number of constraints. The bird situation is without doubt the one that has the fewest, but it still involves a fistful of considerations. For this reason, we would expect that all major decisions—and especially the existential threat ones like 1 and 2 -- would involve a very large number of constraints indeed. It is this mass effect that is at the heart of claims that the constraint approach leads to AI that cannot get into bizarre reasoning episodes.
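To put a rough number on that mass effect (a toy calculation, assuming nothing more than that constraint scores lie between -1 and +1 and are combined by averaging): the worst-case swing that a fixed handful of faulty constraints can impose shrinks in direct proportion to the total number of constraints brought to bear.

```python
# Toy calculation of the "mass effect": the worst-case influence of a fixed
# number of faulty constraints on an averaged verdict shrinks as the total
# number of constraints grows. Assumes only that scores lie in [-1, 1].

def max_swing(n_constraints, n_faulty):
    """Largest possible change in the average if n_faulty scores flip from +1 to -1."""
    return 2 * n_faulty / n_constraints

for n in (10, 100, 1_000, 10_000):
    print(n, round(max_swing(n, n_faulty=3), 4))

# A verdict held with a clear margin (say, an average of 0.2) cannot be
# overturned by three bad constraints once more than 30 constraints are in play.
```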
Finally, notice that in case 1, we are in a situation where (unlike case 2) many humans would say that there is no good decision.