My knowledge of AI is, for practical purposes, zilch, but for large numbers of hypothetical future AIs, if perhaps not a full Friendly AI, wouldn’t it be a simple solution to program the AI to model a specified human individual, determine that individual’s desires, and implement them?
Write the program.
I know no programming whatsoever. I’m asking because I figure that the problem of Friendly AI going way off-key has no comparable analogue in this case, because it involves different facts.
Then what basis do you have for thinking that a particular programming task is simple?
A hypothetical AI programmed to run a paperclip factory, as compared to one designed to fulfil the role LessWrong grants Friendly AI, would:
-Not need any recursive intelligence enhancement, and probably not need upgrading
-Be able to discard massive numbers of functions related to understanding both humans and other matters
Fewer functions means less to program, which means less chance of glitches or other errors. Without intelligence enhancement, the odds of an unfriendly outcome are greatly reduced. Therefore, the odds of a paperclip-factory AI becoming a threat to humanity are far smaller than those of a Friendly AI.
That is not what is meant around here by “paperclip maximiser”. A true clippy does not run a factory; it transmutes the total mass of the solar system into paperclips, starting with humans and ending with the computer it runs on. (With the possible exception of some rockets containing enough computing power to repeat the process on other suns.) That is what it means to maximise something.
Right. Which is why you just proposed a solution which is, in itself, AI-complete; you have not in fact reduced the problem. This aside, which of the desires of the human do you intend to fulfil? I desire chocolate, I also desire not to get fat. Solve for the equilibrium.
The desires implied in the orders given- interpreting desires by likely meaning. I didn’t intend to reduce the problem in any way, but to make the point (albeit poorly, as it turned out) that the example used was far less of a risk than the much better example of an actual attempt at Friendly AI.
An entire Sequence exists precisely for the purpose of showing that “just write an AI that takes orders” is not sufficient as a solution to this problem. “Likely meaning” is not translatable into computer code at the present state of knowledge, and what’s more, it wouldn’t even be sufficient if it were. You’ve left out the implicit “likely intended constraints”. If I say “get some chocolate”, you understand that I mean “if possible, within the constraint of not using an immense amount of resources, provided no higher-priority project intervenes, without killing anyone or breaking any laws except ones that are contextually ok to break such as coming to a full, not rolling, stop at stop signs, and actually, if I’m on a diet maybe you ought to remind me of the fact and suggest a healthier snack, and even if I’m not on a diet but ought to be, then a gentle suggestion to this effect is appropriate in some but not all circumstances...” Getting all that implicit stuff into code is exactly the problem of Friendly AI. “Likely meaning” just doesn’t cover it, and even so we can’t even solve that problem.
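To make that concrete, here is a minimal, purely illustrative sketch. Every name and number in it is invented for the example, not anyone’s actual proposal: even for “get some chocolate”, the literal objective and the intended objective come apart, and every extra term in the second scoring function is something a programmer would have to specify and weight explicitly.

```python
from dataclasses import dataclass

# All names and numbers below are made up purely to illustrate the point above.

@dataclass
class Plan:
    description: str
    chocolate: float       # how much chocolate the plan obtains
    resources_used: float  # rough cost in money/time/materials
    laws_broken: int       # serious laws broken by the plan
    people_harmed: int

candidate_plans = [
    Plan("buy a bar at the corner shop", chocolate=1, resources_used=2,
         laws_broken=0, people_harmed=0),
    Plan("corner the global cocoa market", chocolate=10_000, resources_used=10**9,
         laws_broken=5, people_harmed=1_000),
]

def literal_score(p: Plan) -> float:
    """The order as stated: just maximise chocolate."""
    return p.chocolate

def intended_score(p: Plan) -> float:
    """The order plus a tiny fraction of its implicit constraints,
    each of which had to be written down and weighted by hand."""
    return (p.chocolate
            - 0.1 * p.resources_used
            - 1e6 * p.laws_broken
            - 1e9 * p.people_harmed)

print(max(candidate_plans, key=literal_score).description)   # "corner the global cocoa market"
print(max(candidate_plans, key=intended_score).description)  # "buy a bar at the corner shop"
```

The gap between those two scoring functions, written out in full generality rather than for one toy order, is the problem being pointed at here; “likely meaning” names it without solving it.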
I thought it was clear that:
A- For Friendly AI, I meant modelling a human via a direct simulation of a human brain (or at least the relevant parts), idealised in such a way as to give the results we would want.
B- I DID NOT INTEND TO REDUCE THE PROBLEM.
A: What is the difference between this, and just asking the human brain in the first place? The whole point of the problem is that humans do not, actually, know what we want in full generality. You might as well implement a chess computer by putting a human inside it and asking, at every ply, “Do you think this looks like a winning position?” If you could solve the problem that way you wouldn’t need an AI!
B: Then what was the point of your post?
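For what it’s worth, the chess analogy in point A can be made concrete with a toy sketch (invented code, not any real engine): if the “evaluation function” is just a human answering a question at every ply, the program automates nothing, because all the judgement still comes from the human.

```python
# Toy illustration of the analogy in point A: a "chess engine" whose evaluation
# function is a human being asked at every ply. The loop is trivial; the human
# is still doing all the work, which is the point.

def human_eval(position: str) -> float:
    answer = input(f"Do you think this looks like a winning position?\n  {position}\nScore 0-10: ")
    return float(answer)

def best_move(position: str, legal_moves: list[str]) -> str:
    # Stand-in for real move generation: just label the successor position.
    def successor(move: str) -> str:
        return f"{position}, then {move}"
    return max(legal_moves, key=lambda m: human_eval(successor(m)))

if __name__ == "__main__":
    print(best_move("starting position", ["e4", "d4", "Nf3"]))
```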
Humans do not have explicit desires, and there’s no clear way to figure out the implicit ones.
Not that that’s a bad idea. It’s basically the best idea anyone’s had. It’s just a lot harder to do than you make it sound.
They call it CEV here. Not a single human, but many/all of them. Not what they want now, but what they would want if they knew better.
I am skeptical that this could work.
What I’m saying is a bit different from CEV- it would involve modelling only a single human’s preferences, and would involve modelling their brain only in the short term (which would be a lot easier). Human beings have at least reasonable judgement about things such as, say, a paperclip factory, to the point where having a human’s will calling the shots would have no consequences that are too severe.
Specifying that kind of thing (including specifying preferences) is probably almost as hard as getting the AI’s motivations right in the first place.
Though Paul Christiano had some suggestions along those lines, which (in my opinion) needed uploads (human minds instantiated in a computer) to have a hope of working...
Would a human stay bound to “at least reasonable judgement” if given superintelligent ability?
We should remember that we aren’t talking about true Friendly AI here, but AI in charge of lesser tasks such as, in the example, running a factory. There will be many things the AI doesn’t know because it doesn’t need to, including how to defend itself against being shut down (I see no logical reason why that would be necessary for running a paperclip factory). Combine that with the limits on intelligence necessary for such lesser tasks, and failure modes become far less likely.
That’s sort of similar to what I keep talking about w/ ‘obedient AI’.