You may be over-fitting there. The FAI could let people decide what they want when it comes to food and attractiveness. Actually, it had better, or I'd have some serious regrets about this FAI.
That’s reasonable, but to let people decide, the FAI needs to recognize people, which also seems to require complexity...
If your biggest problem is on the order of recognizing people, the problem of FAI becomes much, much easier.
Well, and the uFAI needs to know what “paperclips or something” means (or needs a real-world goal at all). That's an obstacle faced by all contestants in the race. We humans learn what is a person and what isn't. (Or we have evolved it; it doesn't matter.)
If you get paperclips slightly wrong, you get something equally bad (staples is the usual example, but the point is that any slight difference is about equally bad), but if you get FAI slightly wrong, you don’t get something equally good. This breaks the symmetry.
I think if you get paperclips slightly wrong, you get a crash of some kind. If I get a ray-tracer slightly wrong, it doesn’t trace electrons instead of photons.
edit: To clarify, it's about the definition of a person vs the definition of a paperclip. You need a very broad definition of person for FAI, so that it won't misidentify a person as a non-person (misidentifying dolphins as persons won't be a big problem), and you need a very narrow definition of paperclip for uFAI, so that a person holding two papers together is not a paperclip. It's not always intuitive how broad definitions compare to narrow ones in difficulty, but it is worth noting that it is ridiculously hard to define paperclip-making so that a Soviet factory anxious to maximize its paperclip quota would make anything useful at all, while it wasn't particularly difficult to define what a person is (or to define what ‘money’ is so that a capitalist paperclip factory would make paperclips to maximize profit).
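A toy way to picture that asymmetry (my own illustration; the scores and thresholds are made up, and nothing like this appears in the discussion itself): the “person” test gets tuned to avoid false negatives, while the “paperclip” test gets tuned to avoid false positives.

```python
# Hypothetical sketch of the broad-vs-narrow asymmetry described above.
# The feature scores and thresholds are invented; only the direction of the
# allowed errors matters.

def looks_like_person(features: dict) -> bool:
    # Broad test: err on the side of inclusion. A dolphin scraping past the
    # threshold is tolerable; a human falling below it is not.
    return features.get("person_score", 0.0) > 0.2    # low bar, few false negatives

def counts_as_paperclip(features: dict) -> bool:
    # Narrow test: err on the side of exclusion. A bent wire failing the test
    # is tolerable; a person "holding two papers together" passing it is not.
    return features.get("paperclip_score", 0.0) > 0.95  # high bar, few false positives

print(looks_like_person({"person_score": 0.4}))        # True  (ambiguous case, tolerated)
print(counts_as_paperclip({"paperclip_score": 0.6}))   # False (ambiguous case, rejected)
```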
I agree that paperclips could also turn out to be pretty complex.
I don’t think “paperclip maximizer” is taken as a complete declarative specification of what a paperclip maximizer is, let alone what it understands itself to be.
I imagine the setup is something like this. An AI has been created by some unspecified (and irrelevant) process and is now doing things to its (and our) immediate environment. We look at the things it has done and anthropomorphize it, saying “it’s trying to maximize the quantity of paperclips in the universe”. Obviously, almost every word in that description is problematic.
But the point is that the AI doesn’t need to know what “paperclips or something” means. We’re the ones who notice that the world is much more filled with paperclips after the AI got switched on.
This scenario is invariant under replacing “paperclips” with some arbitrary “X”, I guess under the restriction that X is roughly at the scale (temporal, spatial, conceptual) of human experience. Picking paperclips, I assume, is just a rhetorical choice.
Well, I agree. That goes also for whatever process determines something to be a person. The difference is that the FAI doesn't have to create persons; its definition doesn't need to process correctly things from the enormous space of possible things that may or may not be persons. It can have a very broad definition that includes dolphins, and it will still be OK.
Intelligence is, to some extent, self-defeating when it comes to finding a way to make something real: by design, the easiest Y inside the set X gets picked as instrumental to making more of some kind of X.
I.e. you define X to be something that holds papers together; the AI thinks and thinks and sees that a single atom, under some circumstances common in the universe (very far away in space), can hold the papers together; it finds the Casimir effect, which makes a vacuum able to hold two conductive papers together; and so on. The X has to be resistant against such brute-forcing for the optimum solution.
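As a minimal sketch of that brute-forcing (the candidate list, costs, and predicate are entirely my own invention, purely for illustration), an optimizer handed only a loose “holds papers together” spec picks whichever member of X is cheapest, however degenerate:

```python
# Hypothetical sketch: an optimizer told only "pick the cheapest thing that
# satisfies holds_papers_together" happily returns a degenerate solution.
candidates = [
    {"name": "steel paperclip",       "holds_papers": True, "cost": 1.0},
    {"name": "single bonding atom",   "holds_papers": True, "cost": 0.001},
    {"name": "Casimir-effect vacuum", "holds_papers": True, "cost": 0.01},
    {"name": "stapler",               "holds_papers": True, "cost": 2.0},
]

def holds_papers_together(c: dict) -> bool:
    # The loose spec: X = "something that holds papers together".
    return c["holds_papers"]

# The optimizer picks the easiest Y inside the set X, by design.
best = min((c for c in candidates if holds_papers_together(c)),
           key=lambda c: c["cost"])
print(best["name"])   # "single bonding atom" -- not what the designer meant
```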
Whether the AI can come up with some real-world manufacturing goal that it can't defeat in such a fashion is open to debate. Incomputable things seem hard to defeat.
edit: Actually, would you consider a fairly stupid nano-manufacturing AI that destroys us, and itself, with gray goo to be an unfriendly AI? That seems to be a particularly simple failure mode for a self-improving system, FAI or uFAI, under bounded computational power. And a likely failure mode for non-general AIs too, as we are likely to employ such AIs to work on biotechnology and nanotechnology.
It doesn’t sound like you are agreeing with me. I didn’t make any assumptions about what the AI wants or whether its instrumental goals can be isolated. All I supposed was that the AI was doing something. I particularly didn’t assume that the AI is at all concerned with what we think it is maximizing, namely, X.
As for the grey goo scenario, I think that an AI that caused the destruction of humanity not being called unfriendly would indicate an incorrect definition of at least one of “AI”, “humanity”, or “unfriendly” (“caused” too, I guess).
Can you be more specific? I have an AI that's iterating parameters of some strange attractor (defined within it) until it finds unusual behaviour. I can make an AI that would hill-climb and search for improvements to the former AI. edit: Now, the worst thing that can happen is that it makes a mind-hack image that kills everyone who looks at it. That wasn't the intent, but the ‘unusual behaviour’ might get too unusual for a human brain to handle. Is that a serious risk? No, it's a laughable one.
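A minimal sketch of the kind of program I mean (the dynamical system, the “unusualness” score, and the hill-climbing step are all stand-ins I picked for illustration, not anything specified above):

```python
import random

# Stand-in dynamical system: the logistic map x -> r*x*(1-x).
def trajectory(r: float, x0: float = 0.3, steps: int = 500) -> list[float]:
    xs, x = [], x0
    for _ in range(steps):
        x = r * x * (1 - x)
        xs.append(x)
    return xs

# Made-up "unusualness" score: spread of the orbit after transients die out.
def unusualness(r: float) -> float:
    tail = trajectory(r)[200:]
    mean = sum(tail) / len(tail)
    return sum((x - mean) ** 2 for x in tail) / len(tail)

# Hill-climb over the parameter, keeping whatever behaves most unusually.
r, best = 3.2, unusualness(3.2)
for _ in range(2000):
    cand = min(4.0, max(2.5, r + random.uniform(-0.05, 0.05)))
    score = unusualness(cand)
    if score > best:
        r, best = cand, score
print(r, best)   # drifts toward the chaotic regime near r = 4
```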
Implicit in my setup was that the AI reached the point where it was having noticeable macroscopic effects on our world. This is obviously easiest when the AI’s substrate has some built-in capacity for input/output. If we’re being really generous, it might have an autonomous body, cameras, an internet connection, etc. If we’re being stingy, it might just be an isolated process running on a computer with its inputs limited to checking the wall-clock time and outputs limited to whatever physical effects it has on the CPU running it. In the latter case, doing something to the external world may be very difficult but not impossible.
The program you have doing local search in your example doesn’t sound like an AI; even if you stuck it in the autonomous body, it wouldn’t do anything to the world that’s not a generic side-effect of its running. No one would describe it as maximizing anything.
Well, it is maximizing whatever I defined for it to maximize, usefully for me, and in a way that is practical. In any case, you said, “All I supposed was that the AI was doing something.” My AI is doing something.
Yeah, and it's rolling forward and clamping its manipulators until they wear out. Clearly you want it to maximize something in the real world, not just do something. The issue is that the only things it can do in approximately this way are things like shooting at the colour blue.
Everything else requires a very detailed model, and maximization of something in the model, followed by carrying out the actions in the real world, which, interestingly, is entirely optional, and which even humans have trouble getting themselves to do (when I invent something and am sure to my satisfaction that it will work, it is boring to implement; that's a common problem). Edit: and one other point: without a model, all you can do is try random stuff on the world itself, which is not at all intelligent (and resembles Wheatley in Portal 2 trying to crack the code).
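A rough sketch of the contrast (the world model, rewards, and actions below are placeholders of my own; this is just one way to draw the distinction): a reactive policy maps percepts straight to actions, while anything richer maximizes inside a model and then, optionally, acts on the result.

```python
# Hypothetical contrast between a reactive policy and a model-based one.

def reactive_policy(percept: dict) -> str:
    # "Shoot at the colour blue or the like": no model, just a reflex.
    return "shoot" if percept.get("colour") == "blue" else "wait"

def model_based_policy(state: str) -> str:
    # Toy world model: (state, action) -> (predicted next state, reward).
    model = {
        ("idle", "build"): ("built", 1.0),
        ("idle", "wait"):  ("idle", 0.0),
        ("built", "wait"): ("built", 0.0),
    }
    actions = {a for (s, a) in model if s == state}
    # Maximize reward inside the model...
    best = max(actions, key=lambda a: model[(state, a)][1])
    # ...then (optionally!) carry the chosen action out in the real world.
    return best

print(reactive_policy({"colour": "blue"}))  # "shoot"
print(model_based_policy("idle"))           # "build"
```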
...or perhaps “destruction”.
Sorry, I don’t understand what exactly you are proposing. A utility function is a function from states of the universe to real numbers. If the function contains a term like “let people decide”, it should also define “people”, which seems to require a lot of complexity.
Or are you coming at this from some other perspective, like assigning utilities to possible actions rather than world states? That’s a type error and also very likely to be Bayesian-irrational.
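To make the type distinction concrete (the types and names here are mine, added purely for illustration): a utility function scores whole world-states, not actions; a table from actions to numbers is the “type error” version.

```python
from typing import Callable

# A world-state here is just a stand-in type; in the abstract picture it is a
# complete description of the universe.
WorldState = dict

# Utility function: states of the universe -> real numbers.
UtilityFn = Callable[[WorldState], float]

def example_utility(state: WorldState) -> float:
    # Made-up term; a real "let people decide" term would need a definition
    # of "people" buried inside it, which is where the complexity hides.
    return float(state.get("people_satisfied", 0))

# The "type error" version: utilities attached to actions directly, with no
# reference to which state of the world the action leads to.
action_utilities = {"press_button": 3.0, "do_nothing": 1.0}
```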