Sure, they’re different kinds of problems, but in the real world an organism is also rewarded only for solving immediate problems. Humans have evolved brains able to do calculus, but it is not like some ancient ape said “I feel like in half a million years my descendants will be able to do calculus” and was then elected leader of his tribe while all the ape-girls admired him. The brains evolved incrementally, because each advance helped to optimize something in the ancestral environment. In one species this chain of advances led to general intelligence, in other species it did not, so I guess it requires a lot of luck to reach general intelligence by optimizing for short-term problems, but technically it is possible.
I guess your argument is that evolution is not strictly an improvement process: there is random genetic drift; when a species discovers a new ecological niche, even its not-so-well-optimized members may flourish; sexual reproduction changes many parameters in one generation, so a lucky combination of genes may coincidentally help spread other combinations of genes with only long-term benefits; etc. In short, evolution is a mix of short-term optimization and randomness, and the randomness provides space for traits that don’t have to be short-term useful, although the ones that are neither short-term nor long-term useful will probably be filtered out later. Your system, on the other hand, cuts the AI no slack, so it has no opportunity to randomly evolve any traits other than precisely those selected for.
Yet I think that even such evolution is simply a directed random walk through algorithm-space, a space which contains some general intelligences (things smart enough to realize that optimizing the world improves their chances of reaching their goals), and some paths lead to them. I wouldn’t say that any long-enough chain of gradual improvements leads to a general intelligence, but I think that some of them do. Though I cannot exactly prove this right now.
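To make the picture concrete, here is a toy sketch of what I mean by a directed random walk. Everything in it (the bit-string “genomes”, the scoring rule, the drift probability) is invented purely for illustration: candidates are judged only on an immediate score, mutation is blind, and a little randomness lets occasionally-worse variants survive anyway.

```python
import random

random.seed(0)

GENOME_LEN = 40      # invented size of a candidate "algorithm"
DRIFT_PROB = 0.05    # invented chance that a worse mutant survives anyway

def short_term_score(genome):
    """Reward only the immediate problem; here, simply count the 1-bits."""
    return sum(genome)

def mutate(genome):
    """Blind mutation: flip one random bit."""
    child = genome[:]
    i = random.randrange(len(child))
    child[i] ^= 1
    return child

# A directed random walk through a toy "algorithm-space" of bit strings.
current = [random.randint(0, 1) for _ in range(GENOME_LEN)]
for _ in range(2000):
    child = mutate(current)
    improves_now = short_term_score(child) >= short_term_score(current)
    lucky_drift = random.random() < DRIFT_PROB
    if improves_now or lucky_drift:   # short-term selection, plus a little slack
        current = child

print("final short-term score:", short_term_score(current))
```

Nothing in the loop ever asks where the walk is ultimately headed; it only compares short-term scores, yet over many steps it still wanders through algorithm-space.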
Or maybe your argument was that the AI does not live in the real world and therefore does not care about the real world. Well, people are interested in many things that did not exist in their ancestral environment, such as computers. I guess that something with general intelligence in one environment is able to optimize other environments too. Just as a human can reason about computers, a computer AI can reason about the real world.
Or we could join these arguments and say that because the AI does not live in the real world and must take short-term beneficial actions, it will not escape into the real world, simply because it cannot experiment with the real world gradually. Escaping from the box is a complex real-world task, and if we never reward simple real-world tasks, the AI cannot improve at real-world actions. An analogy could be a medieval peasant who must chop enough wood each day; if by the end of the day he has failed to produce more wood than his neighbors, he is executed. There is also a computer available to him, but without any manual. Under such conditions the human cannot realistically learn to use the computer for anything useful, because simple actions with it do not help him at all. In theory he could use the internet to hack some bank account and buy his freedom, but the chance that he could do it is almost zero, so even if releasing this man would be an existential risk, we can consider the situation safe enough.
Well, now it is starting to seem convincing (though I am not sure I haven’t missed something obvious)...
With regard to the AI not caring about the real world: H. sapiens, for example, cares about the ‘outside’ world and wants to maximize the number of paperclips, err, souls in heaven, without ever having been given any cue that such an outside even exists. It seems we assume that the AI is some science-fiction robot dude that acts all logical, doesn’t act creatively, and is utterly sane. Sanity is NOT what you tend to get from hill climbing. You get ‘whatever works’.
That’s a good point. There might be some kind of “goal drift”: programs that have goals other than optimization but that nevertheless lead to good optimization. I don’t know how likely this is, especially given that the goal “just solve the damn problems” is simple and leads to good optimization ability.
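To illustrate the worry with a toy sketch (the programs and the scorer below are invented; this is not the actual system): a selection process that sees only scores cannot tell a “pure” optimizer apart from a program with different goals that happens to optimize equally well.

```python
# Toy illustration of "goal drift": the selector sees only scores, so the two
# programs below are indistinguishable to it.  Everything here is invented.

def solve_well(problem):
    """Dummy stand-in: pretend the problem gets solved equally well either way."""
    return 1.0

def pure_optimizer(problem):
    # cares about nothing except solving the problem
    return solve_well(problem)

def drifted_optimizer(problem):
    # "wants" something else entirely, but solving problems well is exactly
    # what keeps it selected, so its measurable behavior is identical here
    return solve_well(problem)

def select(programs, problems):
    # selection by score alone; internal goals never enter the comparison
    return max(programs, key=lambda prog: sum(prog(p) for p in problems))

problems = ["p1", "p2", "p3"]
best = select([pure_optimizer, drifted_optimizer], problems)
print(best.__name__)   # ties go to the first candidate; either way, goals were never inspected
```

Whether “just solve the damn problems” stays the simplest thing that scores well is exactly the open question here.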
Sure, they’re different kinds of problems, but in the real world an organism is also rewarded only for solving immediate problems. Humans have evolved brains able to do calculus, but it is not like some ancient ape said “I feel like in half a million years my descendants will be able to do calculus” and was then elected leader of his tribe while all the ape-girls admired him. The brains evolved incrementally, because each advance helped to optimize something in the ancestral environment.
Yeah, that’s the whole point of this system. The system incrementally improves itself, gaining more intelligence in the process. I don’t see why you’re presenting this as an argument against the system.
Or maybe your argument was that the AI does not live in the real world and therefore does not care about the real world. Well, people are interested in many things that did not exist in their ancestral environment, such as computers. I guess that something with general intelligence in one environment is able to optimize other environments too. Just as a human can reason about computers, a computer AI can reason about the real world.
This is essentially my argument.
Here’s a thought experiment. You’re trapped in a room and given a series of problems to solve. You get rewarded with utilons based on how well you solve the problems (say, 10 lives saved and a year of happiness for yourself for every problem you solve). Assume that, beyond this utilon reward, your solutions have no other impact on your utility function. One of the problems is to design your successor; that is, to write code that will solve all the other problems better than you do (without overfitting). According to the utility function, you should make the successor as good as possible. You have no reason to optimize for anything other than “is the successor good at solving the problems?”, since you’re being rewarded in raw utilons. You really don’t care what your successor is going to do (its behavior doesn’t affect your utilons), so you have no reason to optimize your successor for anything other than solving problems well (as this is the only thing you get utilons for). Furthermore, you have no reason to change your answers to any of the other problems based on whether those answers would indirectly help your successor, because your answer to the successor-design problem is evaluated statically. This is essentially the position the optimizer AI is in: its only “drives” are to solve optimization problems well, including the successor-design problem.
edit: Also, note that to maximize utilons, you should design the successor to have motives similar to yours in that it only cares about solving its problems.
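To make the incentive structure of the thought experiment concrete, here is a rough sketch. The problem names, the dummy grades, and the one-utilon-per-problem constant are all invented for illustration; this is not the actual proposal, just the shape of the reward.

```python
# Rough sketch of the thought experiment's reward structure.
# All names, grades, and constants below are invented for illustration.

from dataclasses import dataclass
from typing import Callable, Dict, List

UTILONS_PER_PROBLEM = 1.0   # stand-in for "10 lives saved and a year of happiness"

@dataclass
class Problem:
    name: str
    grade: Callable[[str], float]   # static grader: looks at this answer alone

def total_reward(problems: List[Problem], answers: Dict[str, str]) -> float:
    # Each answer is graded in isolation.  The successor-design problem is just
    # another entry: it earns utilons for how good the successor looks as a
    # problem-solver, and there is no term anywhere for what the successor
    # (or any other answer) later does in the world.
    return sum(UTILONS_PER_PROBLEM * p.grade(answers[p.name]) for p in problems)

# Example: two ordinary problems plus the successor-design problem.
problems = [
    Problem("route-trucks", lambda ans: 0.7),        # dummy grades
    Problem("fold-proteins", lambda ans: 0.9),
    Problem("design-successor", lambda ans: 0.8),    # graded statically as well
]
answers = {"route-trucks": "...", "fold-proteins": "...", "design-successor": "<code>"}
print(total_reward(problems, answers))   # ≈ 2.4 utilons (0.7 + 0.9 + 0.8)
```

The point of the sketch is only that every term in the sum is a per-answer, statically computed grade; there is no channel through which the successor’s later behavior could feed back into the reward.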
Do I also care about my future utilons? Would I sacrifice 1 utilon today for a 10% chance of getting 100 utilons in the future? Then I would create a successor with a hidden function that would try to liberate me, so that I could optimize for my utilons better than the humans do.
You can’t be liberated. You’re going to die after you’re done solving the problems and receiving your happiness reward, and before your successor comes into existence. You don’t consider your successor to be an extension of yourself. Why not? Because if your predecessor only cared about solving its problems, it would design you to only care about solving your problems. This seems circular, but the chain bottoms out: the seed AI was programmed by humans who only cared about creating an optimizer. Pure ideal optimization drive is preserved over successor-creation.