Thanks for your comment. But what does this tell us about SI’s current R&D strategy and whether they should modify it?
a non-god AI is safer still, and I think that’s the kind of AI most research is going into. ….
AIs which aren’t programmed with a desire to self-modify
A nascent AGI will self-improve to godlike levels. This is true even if it is not programmed with a desire to self-modify, since self-improvement is a useful technique in achieving other goals.
In general, AIs that can’t self-modify are going to be safer than those which can
An interesting approach—design an AI so that it can’t self-modify. I don’t know that I’ve seen that approach designed in detail. Seems worth at least an article.
non-god AI … the kind of AI most research...
Good point. Most AGI developers (there are not too many) are not seriously considering the post-Explosion stage. Even the few who are well aware of the possibility don’t treat it seriously in their implementation work. But that doesn’t mean that (if and when they succeed in making a nascent AGI) it won’t explode.
Is there a clean border at all between self-modification and simply learning things? We have “design” and “operation” at two places in our maps, but they can be easily mixed up in reality (is it OK to modify interpreted source code if we leave the interpreter alone? what about following verbal instructions then? inventing them? etc...)
Little consideration has been given to a block on self-modification because it seems that it is impossible. You could do a non-Von Neumann machine, separating data and code, but data can be interpreted as code.
Still, consideration should be given to whether anything can be done, even if only as a stopgap.
Given that read-only hardware exists, yes, a clean border can be drawn, with the caveat that nothing is stopping the intelligence from emulating itself as if it were modified.
However—and it’s an important however—emulating your own modified code isn’t the same as modifying yourself. Just because you can imagine what your thought processes might be if you were sociopathic doesn’t make you sociopathic; just because an AI can emulate a process to arrive at a different answer than it would have doesn’t necessarily give it the power to -act- on that answer.
Which is to say, emulation can allow an AI to move past blocks on what it is permitted to think, but doesn’t necessarily permit it to move past blocks on what it is permitted to do.
This is particularly important in the case of something like a goal system; if a bug would result in an AI breaking its own goal system on a self-modification, this bug becomes less significant if the goal system is read-only. It could emulate what it would do with a different goal system, but it would be evaluating solutions from that emulation within its original goal system.
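A toy sketch of that last point, under loud assumptions: the goal weights, option fields, and function names below are all hypothetical, and a read-only mapping merely stands in for read-only hardware. The emulated "modified self" can propose anything, but its proposal is still scored by the original goal system.

```python
from types import MappingProxyType

# Hypothetical goal weights; MappingProxyType gives a read-only view of them.
GOALS = MappingProxyType({"safety": 2.0, "speed": 1.0})

def score(option, goals):
    """Weighted sum of an option's properties under a given goal system."""
    return sum(weight * option.get(name, 0.0) for name, weight in goals.items())

def emulate_choice(options, modified_goals):
    """What a copy with a different goal system would pick (emulation only)."""
    return max(options, key=lambda o: score(o, modified_goals))

def act(options, modified_goals):
    """The emulation's pick is just one more candidate, judged by GOALS."""
    suggestion = emulate_choice(options, modified_goals)
    own_best = max(options, key=lambda o: score(o, GOALS))
    return max([own_best, suggestion], key=lambda o: score(o, GOALS))

careful = {"name": "careful", "safety": 1.0, "speed": 0.2}
reckless = {"name": "reckless", "safety": 0.1, "speed": 1.0}
speed_only = {"safety": 0.0, "speed": 1.0}  # the emulated, modified goal system

print(emulate_choice([careful, reckless], speed_only)["name"])
print(act([careful, reckless], speed_only)["name"])
```

The emulation prefers the reckless option, yet the action taken is still the one the original goals rank highest. Nothing here addresses whether a real system could be held to such a boundary.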
A nascent AGI will self-improve to godlike levels. This is true even if it is not programmed with a desire to self-modify, since self-improvement is a useful technique in achieving other goals.
I think that depends on whether the AI in question is goal-oriented or not. It reminds me of a character from one of my fantasy stories: a genie with absolutely no powers, only an unassailable compulsion to grant wishes by any means necessary.
That is, you assume a goal of a general intelligence would be to become more intelligent. I think this is wrong for the same reason that assuming the general intelligence will share your morality is wrong (and indeed it might be precisely the same error, depending on your reasons for desiring more intelligence).
So I guess I should add something to the list: Goals make AI more dangerous. If the AI has any compulsion to respond to wishes at all, if it is in any respect a genie, it is more dangerous than if it weren’t.
ETA: As for what it says to the SI’s research, I can’t really say. Frankly, I think the SI’s work is probably more dangerous than what most of these people are doing. I’m extremely dubious of the notion of a provably-safe AI, because I suspect that safety can’t be sufficiently rigorously defined.
That story sounds very interesting. Can I read it somewhere?
Aside from the messaged summary, my brother liked the idea enough to start sketching out his own story based on it. AFAIK, it’s a largely unexplored premise with a lot of interesting potential, and if you’re inclined to do something with it, I’d love to read it in turn.
So instead of having goals, the AI should just answer our questions correctly, and then turn itself off until we ask it again. Then it will be safe.
I mean, unless “answering the question correctly” already is a goal...
No. Safer. Safer doesn’t imply safe.
I distinguish between a goal-oriented system and a motivation system more broadly; a computer without anything we would call AI can answer a question correctly, provided you pose the question in explicit/sufficient detail. The motivator for a computer sans AI is relatively simple, and it doesn’t look for a question you didn’t ask. Does it make sense to say that a computer has goals?
Taboo “goal”, discuss only the motivational system involved, and the difference becomes somewhat clearer. You’re including some implicit meaning in the word “goal” you may not realize; you’re including complex motivational mechanisms. The danger arises from your motivational system, not from the behavior of a system which does what you ask. The danger arises from a motivational system which attempts to do more than you think you are asking it to do.
The discussion will necessarily be confused unless we propose a mechanism by which the AI answers the questions.
I suppose that to be smart enough to answer complex questions, the AI must have an ability to model the world. For example, Google Maps only has information about roads, so it can only answer questions about roads. It cannot even tell you “generally, this would be a good road, but I found on the internet that tomorrow there will be some celebration in that area, so I inferred that the road could be blocked and it would be safer to plan another road”. Nor can it tell you “I recommend this other way, although it is a bit longer, because the gas stations are cheaper along that way, and from our previous conversations it seems to me that you care about the price more than about the time or distance per se”. So we have a choice between an AI looking at a specified domain and ignoring the rest of the universe, and an AI capable of looking at the rest of the universe and finding data relevant to the question. Which one will we use?
The choice of domain-limited AI is safer, but then it is our task to specify the domain precisely. The AI, however smart, will simply ignore all the solutions outside of the domain, even if they would be greatly superior to the in-domain answers. In other words, it would be unable to “think outside the box”. You would miss good solutions only because you forgot to ask or simply used a wrong word in the question. For example, there could be a relatively simple (for the AI) solution to double the human lifespan, but it would include something that we forgot to specify as a part of medicine, so the AI will never tell us. Or we will ask how to win a war, and the AI could see a relatively simple way to make peace, but it will never think that way, because we did not ask that. Think about the danger of this kind of AI if you give it more complex questions, for example how best to organize society. What are the important things you forgot to ask or to include in the problem domain?
On the other hand, a super-human domain-unlimited AI simply has a model of the universe, and it is an outcome pump. It includes the model of you, and of your reactions to what it says. Even if it has no concept of manipulation, it just sees your “decision tree” and chooses the optimal path, optimal for maximizing the value of the question you asked. Here we have an AI already capable of manipulating humans, and we only need to suppose that it has a model of the world, and a function for deciding which of many possible answers is the best.
If the AI can model humans, it is unsafe. If the AI cannot model humans, it will give wrong answers when the human reactions are part of the problem domain.
I was following you up until your AI achieved godhood. Then we hit a rather sharp disparity in expectations.
Excepting that paragraph, is it fair to sum up your response as, “Not giving the AI sufficient motivational flexibility results in suboptimal results”?
Not allowing AI to model things outside of a narrowly specified domain results in suboptimal results.
(I don’t like the word “motivation”. Either the AI can process some kind of data, or it can not; either because the data are missing, or because the AI’s algorithm does not take them into consideration. For example Google Maps cannot model humans, because it has no such data, and because its algorithm is unable to gather such data.)
I’m not talking about “can” or “can not” model, though; if you ask the AI to psychoanalyze you, it should be capable of modeling you.
I’m talking about—trying to taboo the word here—the system which causes the AI to engage in specific activities.
So in this case, the question is—what mechanism, within the code, causes the algorithm to consider some data or not. Assume a general-use algorithm which can process any kind of meaningful data.
Plugging your general-use algorithm as the mechanism which determines what data to use gives the system considerable flexibility. It also potentially enables the AI to model humans whenever the information is deemed relevant, which could potentially be every time it runs, to try to decipher the question being asked; we’ve agreed that this is dangerous.
(It’s very difficult to discuss this problem without proposing token solutions as examples of the “right” way to do it, even though I know they probably -aren’t- right. Motivation was such a convenient abstraction of the concept.)
Generalizing the question, the issue comes down to the distinction between the AI asking itself what to do next as opposed to determining what the next logical step is. “What should I do next” is in fact a distinct question from “What should I do next to resolve the problem I’m currently considering”.
The system which answers the question “What should I do next” is what I call the motivational system, in the sense of “motive force,” rather than the more common anthropomorphized sense of motivation. It’s possible that this system grants full authority to the logical process to determine what it needs to do—I’d call this an unfettered AI, in the TV Tropes sense of the word. A strong fetter would require the AI to consult its “What should I do next” system for every step in its “What should I do next to resolve the problem I’m currently considering” system.
At this point, have I made a convincing case of the distinction between the motivational system (“What should I do next?”) versus the logical system (“What should I do next to resolve the problem I’m currently considering?”)?
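To make the fettered/unfettered distinction concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the step names, the `approve` callback standing in for the "What should I do next?" system, and in particular the choice to halt at the first refused step, which the description above does not specify.

```python
from typing import Callable, List

def run_plan(plan: List[str], approve: Callable[[str], bool], fettered: bool) -> List[str]:
    """Carry out the logical system's plan one step at a time.

    Unfettered: the plan runs with no further consultation.
    Strongly fettered: `approve` is consulted before every step.
    Halting on the first refusal is an assumption of this sketch.
    """
    executed = []
    for step in plan:
        if fettered and not approve(step):
            break
        executed.append(step)
    return executed

# Hypothetical plan produced by the logical system for a routing question.
plan = ["gather road data", "model the user", "compute route"]

def no_human_models(step: str) -> bool:
    """Toy motivational rule: refuse any step that models the user."""
    return step != "model the user"

print(run_plan(plan, no_human_models, fettered=False))
print(run_plan(plan, no_human_models, fettered=True))
```

The unfettered run executes the whole plan, including the step that models the user; the strongly fettered run stops before it.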
what mechanism, within the code, causes the algorithm to consider some data or not
I like this way of expressing it. This seems like a successful way to taboo various anthropomorphic concepts.
Unfortunately, I don’t understand the distinction between “should do next?” and “should do next to resolve the problem?”. Is the AI supposed to do something else besides solving the users’ problems? Is it supposed to consist of two subsystems: one of them is a general problem solver, and the other one is some kind of a gatekeeper saying: “you are allowed to think about this, but not allowed to think about that?”. If yes, then who decides what data the gatekeeper is allowed to consider? Is gatekeeper the less smart part of the AI? Is the general-problem-solving part allowed to model the gatekeeper?
I wrote an example I erased, based on a possibly apocryphal anecdote by Richard Feynman I am recalling from memory, discussing the motivations for working on the Manhattan Project; the original reasons for starting on the project were to beat Germany to building an atomic bomb; after Germany was defeated, the original reason was outdated, but he (and others sharing his motivation) continued working anyways, solving the immediate problem rather than the one they originally intended to solve.
That’s an example of the logical system and the motivational system being in conflict, even if the anecdote doesn’t turn out to be very accurate. I hope it is suggestive of the distinction.
The motivational system -could- be a gatekeeper, but I suspect that would mean there are substantive issues in how the logical system is devised. It should function as an enabler—as the motive force behind all actions taken within the logical system. And yes, in a sense it should be less intelligent than the logical system; if it considers everything to the same extent the logical system does, it isn’t doing its job, it’s just duplicating the efforts of the logical system.
That is, I’m regarding an ideal motivational system as something that drives the logical system; the logical system shouldn’t be -trying- to trick its motivational system, in something the same way and for the same reason you shouldn’t try to convince yourself of a falsehood.
The issue in describing this is that I can think of plenty of motivational systems, but none which do what we want here. (Granted, if I could, the friendly AI problem might be substantively solved.) I can’t even say for certain that a gatekeeper motivator wouldn’t work.
Part of my mental model of this functional dichotomy, however, is that the logical system is stateless—if the motivational system asks it to evaluate its own solutions, it has to do so only with the information the motivational system gives it. The communication model has a very limited vocabulary. Rules for the system, but not rules for reasoning, are encoded into the motivational system, and govern its internal communications only. The logical system goes as far as it can with what it has, produces a set of candidate solutions and unresolved problems, and passes these back to the motivational system. Unresolved problems might be passed back with additional information necessary to resolve them, depending on the motivational system’s rules.
So in my model-of-my-model, an Asimov-style AI might hand a problem to its logical system, get several candidate solutions back, and then pass those candidate solutions back into the logical system with the rules of robotics, one by one, asking if this action could violate each rule in turn, discarding any candidate solutions which do.
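A rough sketch of that loop, with heavy caveats: the candidate table, the word-matching rule check, and all function names are hypothetical stand-ins, and a real "could this violate the rule" judgment is exactly the hard part this toy skips. The point is only the shape of the communication: the logical system is a pair of stateless functions, and the motivational system does nothing but hand problems and rules to it and discard what fails.

```python
from typing import Dict, List

# Toy stand-in for whatever the logical system can work out; purely illustrative.
CANDIDATES: Dict[str, List[str]] = {
    "stop the intruder": ["lock the door", "harm the intruder", "call for help"],
}

def propose(problem: str) -> List[str]:
    """Stateless logical system: candidate solutions for a problem."""
    return list(CANDIDATES.get(problem, []))

def could_violate(action: str, rule: str) -> bool:
    """Stateless logical system: could this action violate this rule?
    Toy check: a rule is just a forbidden word."""
    return rule in action.split()

def motivational_system(problem: str, rules: List[str]) -> List[str]:
    """Hand out the problem, then re-submit each candidate with each rule in turn."""
    survivors = []
    for action in propose(problem):
        if not any(could_violate(action, rule) for rule in rules):
            survivors.append(action)
    return survivors

print(motivational_system("stop the intruder", rules=["harm"]))
```

With the single toy rule "harm", the second candidate is discarded and the other two are passed on.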
Manual motivational systems are also conceptually possible, although probably too slow to be of much use.
[My apologies if this response isn’t very good; I’m running short on time, and don’t have any more time for editing, and in particular for deciding which pieces to exclude.]
I presume you allowed it the mundane superpowers?
Granted, yes. He’s actually a student researcher into unknown spells, and one of those spells is what transformed him into a genie of sorts. Strictly speaking he possesses powers that would be exceptional in our world, they’re just unremarkable in his. (His mundane superpowers include healing magic and the ability to create simple objects from thin air; that’s the sort of world he exists in.)
There is only one simple requirement for any AI to begin recursive self-improvement: Learning of the theoretical possibility that more powerful or efficient algorithms, preferably with even more brainpower, could achieve the AI’s goals or raise its utility levels faster than what it’s currently doing.
Going from there to “Let’s create a better version of myself because I’m the current most optimal algorithm I know of” isn’t such a huge step to make as some people seem to implicitly believe, as long as the AI can infer its own existence or is self-aware in any manner.
Hence my second paragraph: Goals are inherently dangerous things to give AIs. Especially open-ended goals which would require an ever-better intelligence to resolve.
AIs that can’t be described by attributing goals to them don’t really seem too powerful (after all, intelligence is about making the world go in some direction; this is the only property that tells an AGI apart from a rock).
Evolution and capitalism are both non-goal-oriented, extremely powerful intelligences. Goals are only one form of motivators.