I reject the notion that one can factorize intelligence from goals
Human intelligence has been successfully applied to achieve many goals which were not applicable or trainable in the environment of ancestral adaptation, such as designing, building, and operating cars, playing chess, sending people to the moon and back, and programming computers. It is clear, as a matter of simple observation, that goals and intelligence can be factorized.
Are those goals instrumental, or terminal?
With humans being such imperfect consequentialists, there is not always a clear distinction between instrumental and terminal goals. Much that we consider fun to do also furthers higher-level goals.
But even if you assume all the goals are instrumental, my point still stands. That a goal was adopted because it furthers a higher-level goal doesn’t change the fact that it could be successfully plugged into human intelligence.
But even if you assume all the goals are instrumental, my point still stands. That a goal was adopted because it furthers a higher-level goal doesn’t change the fact that it could be successfully plugged into human intelligence.
Sure. But the central question is “are higher-level goals arbitrary?”, and while “Well, the subgoals we use to further those higher-level goals are arbitrary along a few dimensions” is a start to answering that question, it is far from an end.
But the central question is “are higher-level goals arbitrary?”
Wrong. The central question is “Can arbitrary goals be successfully plugged into a general goal-achieving system?”, and I have shown examples where they can.
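The factorization claim above can be put in toy form: a single search procedure (the “intelligence”) that accepts any utility function (the “goal”) as a parameter, so swapping goals requires no change to the machinery. This is a minimal illustrative sketch; the planner, the toy world, and every name in it are invented for this example, not a model of any real system.

```python
# Hypothetical sketch: one general search procedure, parameterized by an
# arbitrary utility function. Swapping the utility swaps the goal without
# touching the "intelligence".

def plan(start, successors, utility, depth=3):
    """Breadth-limited search: return the reachable state with highest utility."""
    frontier, best = [start], start
    for _ in range(depth):
        frontier = [s2 for s in frontier for s2 in successors(s)]
        for s in frontier:
            if utility(s) > utility(best):
                best = s
    return best

# A toy world: states are dicts of resource counts. Each step, the agent can
# either convert one unit of metal into a paperclip, or mine more metal.
def successors(state):
    out = []
    if state["metal"] > 0:
        out.append({"metal": state["metal"] - 1, "clips": state["clips"] + 1})
    out.append({"metal": state["metal"] + 1, "clips": state["clips"]})
    return out

start = {"metal": 1, "clips": 0}
clip_lover = lambda s: s["clips"]      # one pluggable goal: paperclips
metal_hoarder = lambda s: s["metal"]   # a different, equally pluggable goal

print(plan(start, successors, clip_lover))    # → {'metal': 0, 'clips': 2}
print(plan(start, successors, metal_hoarder)) # → {'metal': 4, 'clips': 0}
```

The same `plan` function pursues whichever goal it is handed, which is the observational point being made about human intelligence applied to cars, chess, or moon landings.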
Perhaps one could give it a compulsion to optimize for paperclips, but I’d expect it to either put the compulsion on hold while it develops amazing fabrication, mining and space travel technologies, and never completely turn its available resources into paperclips since that would mean no chance of more paperclips in the future; or better yet, rapidly expunge the compulsion through self-modification.
As far as I can tell, that’s what you’re discussing, and so it sounds like you agree with him. Did I misread disagreement into this comment, or what am I missing here?
The section you quote allows for the possibility that an AI could be given a “compulsion” to optimize for paperclips, which it would eventually shrug off, whereas I am confident that an AI could be given a utility function that would make it actually optimize for paperclips.
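The compulsion/utility-function distinction can also be made concrete with a toy sketch. Everything here is hypothetical and invented for illustration; the idea is that a “compulsion” is a bolt-on constraint the agent evaluates with its *real* utility function, which therefore scores removing the compulsion highly, whereas a goal written into the utility function itself generates no such pressure.

```python
# Hypothetical sketch: why a compulsion gets shrugged off but a utility
# function does not. All names and numbers are invented for illustration.

def best_action(actions, utility):
    """Pick the action whose outcome the agent's utility function scores highest."""
    return max(actions, key=utility)

# Outcome summaries for an agent whose real utility is long-term resources,
# with paperclip-making merely imposed on it as a compulsion:
actions = {
    "obey_compulsion": {"resources": 1, "clips": 5},
    "drop_compulsion": {"resources": 9, "clips": 0},
}

real_utility = lambda outcome: outcome["resources"]
print(best_action(actions, lambda a: real_utility(actions[a])))  # → drop_compulsion

# Same machinery, but paperclips are part of the utility function itself:
clip_utility = lambda outcome: outcome["clips"]
print(best_action(actions, lambda a: clip_utility(actions[a])))  # → obey_compulsion
```

In the first case the agent’s own evaluation favors self-modifying the compulsion away; in the second, there is nothing “outside” the paperclip goal from whose perspective dropping it would look good.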
Okay; but the examples you gave seem to me to be more similar to compulsions than to utility functions. A person can care a lot about cars, and cars can become a major part of human society, but they’re not the point of human society; if they stop serving their purposes they’ll go the way of the horse and buggy. I’m not sure I can express the meaning I’m trying to convey cleanly using that terminology, so maybe I ought to restart.
My model of davidad’s view is that part of general intelligence, as opposed to narrow intelligence, is varied and complex goals. We could make a narrow AI which only cared about the number of paperclips in the universe, but in order to make an intelligence that’s general we need to make it also care about the future, planning, existential risk, and so on.
And so you might get a vibrant interstellar civilization of synthetic intelligences, one that happens to worship paperclips and uses them for currency and religious purposes, rather than a dead world with nothing but peculiarly bent metal.
but the examples you gave seem to me to be more similar to compulsions than to utility functions
I would have liked to use examples of plugging clearly terminal values into a general goal-achieving system. But the only current or historical general goal-achieving systems are humans, and it is notoriously difficult to figure out what humans’ terminal values are.
My model of davidad’s view is that part of general intelligence, as opposed to narrow intelligence, is varied and complex goals. We could make a narrow AI which only cared about the number of paperclips in the universe, but in order to make an intelligence that’s general we need to make it also care about the future, planning, existential risk, and so on.
I am not claiming that you could give an AGI an arbitrary goal system that suppresses the “Basic AI Drives”; rather, those drives will be effective instrumental values, not lost purposes. While a paperclip-maximizing AGI will have subgoals such as controlling resources and improving its ability to predict the future, achieving those goals will help it to actually produce paperclips.
I am not claiming that you could give an AGI an arbitrary goal system that suppresses the “Basic AI Drives”; rather, those drives will be effective instrumental values, not lost purposes. While a paperclip-maximizing AGI will have subgoals such as controlling resources and improving its ability to predict the future, achieving those goals will help it to actually produce paperclips.
It sounds like we agree: paperclips could be a genuine terminal value for AGIs, but a dead future doesn’t seem all that likely from AGIs (though it might be likely from AIs in general).
a dead future doesn’t seem all that likely from AGIs
What? A paperclip AGI with a first-mover advantage would self-improve beyond the point where cooperating with humans has any instrumental value, become a singleton, and tile the universe with paperclips.
What? A paperclip AGI with a first-mover advantage would self-improve beyond the point where cooperating with humans has any instrumental value, become a singleton, and tile the universe with paperclips.
Oh, I agree that humans die in such a scenario, but I don’t think the “tile the universe” part counts as “dead” if the AGI has AI drives.