The discussion will necessarily be confused unless we propose a mechanism for how the AI answers the questions.
I suppose that to be smart enough to answer complex questions, the AI must have the ability to model the world. For example, Google Maps only has information about roads, so it can only answer questions about roads. It cannot even tell you “generally, this would be a good road, but I found on the internet that tomorrow there will be some celebration in that area, so I inferred that the road could be blocked and it would be safer to plan another route”. Nor can it tell you “I recommend this other way, although it is a bit longer, because the gas stations are cheaper along it, and from our previous conversations it seems to me that you care about the price more than about the time or distance per se”. So we have a choice between an AI that looks at a specified domain and ignores the rest of the universe, and an AI capable of looking at the rest of the universe and finding data relevant to the question. Which one will we use?
The choice of a domain-limited AI is safer, but then it is our task to specify the domain precisely. The AI, however smart, will simply ignore all solutions outside the domain, even if they would be greatly superior to the in-domain answers. In other words, it would be unable to “think outside the box”. You would miss good solutions only because you forgot to ask, or simply used a wrong word in the question. For example, there could be a relatively simple (for the AI) solution to double the human lifespan, but it would include something that we forgot to specify as a part of medicine, so the AI will never tell us. Or we will ask how to win a war, and the AI could see a relatively simple way to make peace, but it will never think in that direction, because we did not ask. Think about the danger of this kind of AI if you give it more complex questions, for example how best to organize society. What are the important things you forgot to ask, or to include in the problem domain?
On the other hand, a super-human domain-unlimited AI simply has a model of the universe, and it is an outcome pump. That model includes you, and your reactions to what it says. Even if it has no concept of manipulation, it just sees your “decision tree” and chooses the optimal path—optimal for maximizing the value of the question you asked. Here we have an AI already capable of manipulating humans, and we only need to suppose that it has a model of the world, and a function for deciding which of many possible answers is the best.
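To make the point concrete, here is a minimal sketch of the selection step I have in mind. It is purely illustrative: asker_model, world_model, value and their methods are placeholders I invented for the example, not a description of any real system.

```python
# Hypothetical sketch: an answerer that scores each candidate answer by the
# outcome it predicts after the asker hears it, and returns the best-scoring one.

def choose_answer(question, candidate_answers, world_model, asker_model, value):
    """Return the candidate answer whose predicted outcome has the highest value."""
    best_answer, best_score = None, float("-inf")
    for answer in candidate_answers:
        # Predict how the asker would react to this particular answer...
        predicted_reaction = asker_model.predict_reaction(question, answer)
        # ...and what state of the world that reaction would lead to.
        predicted_outcome = world_model.simulate(predicted_reaction)
        score = value(predicted_outcome)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer
```

Nothing in this loop is labelled “manipulation”; it is just a search over predicted outcomes, and your reaction happens to be part of the outcome.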
If the AI can model humans, it is unsafe. If the AI cannot model humans, it will give wrong answers when the human reactions are part of the problem domain.
I was following you up until your AI achieved godhood. Then we hit a rather sharp disparity in expectations.
Excepting that paragraph, is it fair to sum up your response as, “Not giving the AI sufficient motivational flexibility results in suboptimal results”?
Not allowing the AI to model things outside of a narrowly specified domain results in suboptimal results.
(I don’t like the word “motivation”. Either the AI can process some kind of data, or it cannot; either because the data are missing, or because the AI’s algorithm does not take them into consideration. For example, Google Maps cannot model humans, because it has no such data, and because its algorithm is unable to gather such data.)
I’m not talking about “can” or “can not” model, though; if you ask the AI to psychoanalyze you, it should be capable of modeling you.
I’m talking about—trying to taboo the word here—the system which causes the AI to engage in specific activities.
So in this case, the question is—what mechanism, within the code, causes the algorithm to consider some data or not. Assume a general-use algorithm which can process any kind of meaningful data.
Plugging in your general-use algorithm as the mechanism which determines what data to use gives the system considerable flexibility. It also potentially enables the AI to model humans whenever that information is deemed relevant, which could be every time it runs, in order to decipher the question being asked; we’ve agreed that this is dangerous.
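To gesture at what I mean without pretending it is a real design, here are two toy “gates”; every name below (ALLOWED_DOMAINS, judges_relevant, and so on) is invented for the example:

```python
# Hypothetical contrast between a fixed, human-specified domain and a gate
# implemented by the general-use algorithm itself.

ALLOWED_DOMAINS = {"roads", "traffic"}   # a fixed whitelist, written by humans

def fixed_gate(data_source):
    """Domain-limited: data outside the whitelist is simply never considered."""
    return data_source.domain in ALLOWED_DOMAINS

def general_gate(data_source, question, general_algorithm):
    """Domain-unlimited: the solver itself judges whether the data is relevant.
    A model of the asker will almost always look relevant to deciphering the
    question, which is exactly the danger we agreed on."""
    return general_algorithm.judges_relevant(data_source, question)
```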
(It’s very difficult to discuss this problem without proposing token solutions as examples of the “right” way to do it, even though I know they probably -aren’t- right. Motivation was such a convenient abstraction of the concept.)
Generalizing the question, the issue comes down to the distinction between the AI asking itself what to do next and determining what the next logical step is. “What should I do next” is in fact a distinct question from “What should I do next to resolve the problem I’m currently considering”.
The system which answers the question “What should I do next” is what I call the motivational system, in the sense of “motive force,” rather than the more common anthropomorphized sense of motivation. It’s possible that this system grants full authority to the logical process to determine what it needs to do—I’d call this an unfettered AI, in the TV Tropes sense of the word. A strong fetter would require the AI to consult its “What should I do next” system for every step in its “What should I do next to resolve the problem I’m currently considering” system.
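As a toy contrast between the two, with every interface below invented purely for illustration:

```python
# Hypothetical sketch of an unfettered versus a strongly fettered loop.

def run_unfettered(logical_system, problem):
    """The logical system has full authority over its own next steps."""
    state = logical_system.start(problem)
    while not logical_system.done(state):
        step = logical_system.next_step(state)
        state = logical_system.apply(step, state)
    return logical_system.result(state)

def run_strongly_fettered(motivational_system, logical_system, problem):
    """Every proposed step must be cleared by the 'what should I do next' system."""
    state = logical_system.start(problem)
    while not logical_system.done(state):
        step = logical_system.next_step(state)
        if motivational_system.approves(step, state):
            state = logical_system.apply(step, state)
        else:
            state = logical_system.discard(step, state)  # look for another step
    return logical_system.result(state)
```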
At this point, have I made a convincing case of the distinction between the motivational system (“What should I do next?”) versus the logical system (“What should I do next to resolve the problem I’m currently considering?”)?
what mechanism, within the code, causes the algorithm to consider some data or not
I like this way of expressing it. This seems like a successful way to taboo various anthropomorphic concepts.
Unfortunately, I don’t understand the distinction between “what should I do next?” and “what should I do next to resolve the problem?”. Is the AI supposed to do something else besides solving the users’ problems? Is it supposed to consist of two subsystems: one of them a general problem solver, and the other some kind of gatekeeper saying “you are allowed to think about this, but not allowed to think about that”? If yes, then who decides what data the gatekeeper is allowed to consider? Is the gatekeeper the less smart part of the AI? Is the general-problem-solving part allowed to model the gatekeeper?
I wrote, and then erased, an example based on a possibly apocryphal anecdote by Richard Feynman that I am recalling from memory, about the motivations for working on the Manhattan Project. The original reason for starting the project was to beat Germany to building an atomic bomb; after Germany was defeated, that reason was outdated, but he (and others sharing his motivation) continued working anyway, solving the immediate problem rather than the one they had originally intended to solve.
That’s an example of the logical system and the motivational system being in conflict, even if the anecdote doesn’t turn out to be very accurate. I hope it is suggestive of the distinction.
The motivational system -could- be a gatekeeper, but I suspect that would mean there are substantive issues in how the logical system is devised. It should function as an enabler—as the motive force behind all actions taken within the logical system. And yes, in a sense it should be less intelligent than the logical system; if it considers everything to the same extent the logical system does, it isn’t doing its job, it’s just duplicating the efforts of the logical system.
That is, I’m regarding an ideal motivational system as something that drives the logical system; the logical system shouldn’t be -trying- to trick its motivational system, in somewhat the same way, and for the same reason, that you shouldn’t try to convince yourself of a falsehood.
The issue in describing this is that I can think of plenty of motivational systems, but none which do what we want here. (Granted, if I could, the friendly AI problem might be substantively solved.) I can’t even say for certain that a gatekeeper motivator wouldn’t work.
Part of my mental model of this functional dichotomy, however, is that the logical system is stateless—if the motivational system asks it to evaluate its own solutions, it has to do so only with the information the motivational system gives it. The communication model has a very limited vocabulary. Rules for the system, but not rules for reasoning, are encoded into the motivational system, and govern its internal communications only. The logical system goes as far as it can with what it has, produces a set of candidate solutions and unresolved problems, and passes these back to the motivational system. Unresolved problems might be passed back with additional information necessary to resolve them, depending on the motivational system’s rules.
So in my model-of-my-model, an Asimov-style AI might hand a problem to its logical system, get several candidate solutions back, and then pass those candidate solutions back into the logical system with the rules of robotics, one by one, asking whether each action could violate each rule in turn, discarding any candidate solutions which do.
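A rough sketch of that loop, with the interfaces invented only for illustration (logical_system.solve and the tuple-shaped query are my stand-ins, not a proposal):

```python
# Hypothetical Asimov-style motivator: the logical system is used as a
# stateless function, and sees only what the motivational system passes in.

def asimov_style_motivator(logical_system, problem, rules):
    # First pass: ask the logical system for candidate solutions to the problem.
    candidates = logical_system.solve(problem)

    surviving = []
    for candidate in candidates:
        violates_a_rule = False
        for rule in rules:
            # Each check is a fresh, stateless query: "could this action
            # violate this rule?"
            if logical_system.solve(("could_violate", candidate, rule)):
                violates_a_rule = True
                break
        if not violates_a_rule:
            surviving.append(candidate)
    return surviving
```

The important property is that every call here is stateless: the logical system sees only the candidate and the rule that the motivational system chooses to pass in.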
Manual motivational systems are also conceptually possible, although probably too slow to be of much use.
[My apologies if this response isn’t very good; I’m running short on time, and don’t have any more time for editing, and in particular for deciding which pieces to exclude.]