But it is far easier to create automatic proofs of program correctness
I’m not sure how this applies—can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you can prove a formal deterministic circuit or the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff—meta-ethics, epistemology, etc.—be represented in some other way than by ‘neural’ networks: something formal and symbolic, rather than quasi-analog. All the stuff which we (and the AI) need to be absolutely certain doesn’t change meaning when the AI “rewrites its own code”.
By formal, I assume you mean math/code.
The really important stuff isn’t a special category of knowledge. It is all connected—a tangled web of interconnected complex symbolic concepts for which human language is a natural representation.
What is the precise mathematical definition of ethics? If you really think of what it would entail to describe that precisely, you would need to describe humans, civilization, goals, brains, and a huge set of other concepts.
In essence you would need to describe an approximation of our world. You would need to describe a belief/neural/statistical inference network that represented that word internally as a complex association between other concepts that eventually grounds out into world sensory predictions.
So this problem—that human language concepts are far too complex and unwieldy for formal verification—is not a problem with human language itself that can be fixed by using other language choices. It reflects a problem with the inherent massive complexity of the world itself, complexity that human language and brain-like systems evolved to handle.
These folks seem to agree with you about the massive complexity of the world, but seem to disagree with you that natural language is adequate for reliable machine-based reasoning about that world.
As for the rest of it, we seem to be coming from two different eras of AI research as well as different application areas. My AI training took place back around 1980 and my research involved automated proofs of program correctness. I was already out of the field and working on totally different stuff when neural nets became ‘hot’. I know next to nothing about modern machine learning.
I read about CYC a while back—from what I recall/gather it is a massive hand-built database of little natural language ‘facts’.
Some of the new stuff they are working on with search looks kinda interesting, but in general I don’t see this as a viable approach to AGI. A big syntactic database isn’t really knowledge—it needs to be grounded in a massive sub-symbolic learning system to get the semantics part.
On the other hand, specialized languages for AGIs? Sure. But they will need to learn human languages first to be of practical value.
Blind men looking at elephants.
You look at CYC and see a massive hand-built database of facts.
I look and see a smaller (but still large) hand-built ontology of concepts.
You, probably because you have worked in computer vision or pattern recognition, notice that the database needs to be grounded in some kind of perception machinery to get semantics.
I, probably because I have worked in logic and theorem proving, wonder what axioms and rules of inference exist to efficiently provide inference and planning based upon this ontology.
That’s one of my favorite analogies, and I’m fond of the Jainist(?) multi-viewpoint approach.
As for the logic/inference angle, I suspect that this type of database underestimates the complexity of actual neural concepts—as most of the associations are subconscious and deeply embedded in the network.
We use ‘connotation’ to describe part of this embedding concept, but I see it as even deeper than that. A full description of even a simple concept may be on the order of billions of such associations. If this is true, then a CYC-like approach is far from appropriately scalable.
It appears that you doubt that an AI whose ontology is simpler and cleaner than that of a human can possibly be intellectually more powerful than a human.
All else being equal, I would doubt it with respect to a simpler ontology; the ‘cleaner’ adjective is less well defined.
Look at it in terms of the number of possible circuit/program configurations that are “intellectually more powerful than a human” as a function of the circuit/program’s total bit size.
At around the human level of roughly 10^15 bits I’m almost positive there are intellectually more powerful designs—so P_SH(10^15) = 1.0.
I’m also positive that below some complexity threshold there are absolutely zero possible configurations of superhuman intellect—say P_SH(10^10) ~ 0.0.
Of course “intellectually more powerful” is open to interpretation. I’m thinking of it here in terms of the range of general intelligence tasks human brains are specially optimized for.
IBM’s Watson is superhuman in a certain novel narrow range of abilities, and it’s of complexity around 10^12 to 10^13.
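One way to make the P_SH notation precise (this is my reading; the comment leaves it informal) is as the probability that at least one superhuman configuration exists within a given bit budget:

    P_{SH}(n) \;=\; \Pr\!\left[\ \exists\, c \in \{0,1\}^{\le n}\ \text{such that}\ \mathrm{SuperHuman}(c)\ \right]

Read this way, P_SH is non-decreasing in n, and the two claims above amount to P_SH(10^15) ≈ 1 and P_SH(10^10) ≈ 0.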
To get to that point we have to start from the right meaning to begin with, and care about preserving it accurately, and Jacob doesn’t agree those steps are important or particularly hard.
Not quite.
As for the ‘start with the right meaning’ part, I think it is extremely hard to ‘solve’ morality in the way typically meant here with CEV or whatnot.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
As for the second part about preserving it accurately, I think that ethics/morality is complex enough that it can only be succinctly expressed in symbolic associative human languages. An AGI could learn how to model (and value) the preferences of others in much the same way humans do.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
Someone help me out. What is the right post to link to that goes into the details of why I want to scream “No! No! No! We’re all going to die!” in response to this?
The Coming of Age sequence examined the realization of this error from Eliezer’s standpoint, and has further links.
In which post? I’m not finding discussion about the supposed danger of improved humanish AGI.
That Tiny Note of Discord, say. (Not on “humanish” AGI, but eventually exploding AGI.)
I don’t see much of a relation at all to what I’ve been discussing in that first post.
http://lesswrong.com/lw/lq/fake_utility_functions/ is a little closer, but still doesn’t deal with human-ish AGI.
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function! So whatever AI the first one builds is necessarily going to either have the same utility function (in which case the first AI is working correctly), or have a different one (which is a sign of malfunction, and given the complexity of morality, probably a fatal one).
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function. To the extent that we have a utility function at all, it would refer to the abstract computation called “morality”, which “better” is defined by. The most moral AI we could create is therefore one with precisely that utility function. The problem is that we don’t exactly know what our utility function is (hence CEV).
There is a sense in which a Friendly AGI could be said to be “better than us”, in that a well-designed one would not suffer from akrasia and whatever other biases prevent us from actually realizing our utility function.
AIs without utility functions, but with some other motivational structure, will tend to self-improve into utility-function AIs. Utility-function AIs seem more stable under self-improvement, but there are many reasons an AI might want to change its utility function (e.g. speed of access, multi-agent situations).
Could you clarify what you mean by an “other motivational structure”? Something with preference non-transitivity?
For instance: http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
Why would an AI which optimises for one thing create another AI that optimises for something else?
It wouldn’t if it initially considered itself to be the only agent in the universe. But if it recognizes the existence of other agents and the impact of other agents’ decisions on its own utility, then there are many possibilities:
The new AI could be created as a joint venture of two existing agents.
The new AI could be built because the builder was compensated for doing so.
The new AI could be built because the builder was threatened into doing so.
Building an AI with a different utility function is not going to satisfy the first AI’s utility function!
This may seem intuitively obvious, but it is actually often false in a multi-agent environment.
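A toy illustration of the compensation case (my own made-up numbers and names, just to make it concrete): building an AI with a different utility function can still raise the builder’s own expected utility.

    # An agent that maximizes U_A can still prefer to build an AI with a
    # different utility function U_B, if doing so is compensated enough that
    # expected U_A goes up. All quantities here are hypothetical.
    def u_a(outcome):
        return outcome["builder_resources"]  # U_A only values the builder's resources

    refuse = {"builder_resources": 100}            # status quo: decline to build
    build = {"builder_resources": 100 - 10 + 40}   # small loss to the new AI's goals,
                                                   # outweighed by the side payment

    assert u_a(build) > u_a(refuse)  # so the U_A-maximizer builds the U_B-maximizer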
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function!
Yes, it certainly can—if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function.
Sure it is—this part at least is easy. For example an AGI that is fully altruistic and only experiences love as its single emotion would be clearly “somewhat better than us” from our perspective in every sense that matters.
If that AGI would not be somewhat better than us in the sense of having a better utility function, then ‘utility function’ is not a useful concept.
The problem is that we don’t exactly know what our utility function is (hence CEV)
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
Before tackling that problem, it would probably be best to start with something much simpler, such as a utility function that could recognize dogs vs cats and other objects in images. If you actually research this it quickly becomes clear that real world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
Yes, it certainly can—if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes. If some of the child AI’s goals (for example those involved in being more good) are opposed to the parent’s goals (for example those which make the parent AI less good), the parent is not going to just let the child achieve its goals. Rational agents do not let their utility functions change.
Sure it is—this part at least is easy. For example an AGI that is fully altruistic and only experiences love as its single emotion would be clearly “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t suffer from the akrasia and selfishness and emotional discounting and uncertainty about our own utility function which prevent us from acting out our moral beliefs, then I agree with you. That’s the AI being more rational than us, and therefore better optimising for its utility function. But a literally better utility function is impossible, given that “better” is defined by our utility function.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function. If “better” were a different utility function then it would be unclear why we are trying to create an AI that does that, rather than what we want.
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes
So do we create children as our ‘slaves’ for our own purposes? You seem to be categorically ruling out the entire possibility of humans creating human-like AIs that have a parent-child relationship with their creators.
So just to make it precisely clear, I’m talking about that type of AI specifically. The importance and feasibility of that type of AGI vs other types is a separate discussion.
Sure it is—this part at least is easy. For example an AGI that is fully altruistic and only experiences love as its single emotion would be clearly “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t [ .. ]
That’s the AI being more rational than us, and therefore better optimising for its utility function.
I don’t see it as having anything to do with rationality.
The altruistic human-ish AGI mentioned above would be better than current humans from our current perspective—more like what we wish ourselves to be, and more able to improve our world than current humans.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function.
Yes.
This is obvious if its ‘utility function’ is just a projection of my own—i.e. it simulates what I would want and uses that as its utility function—but that isn’t even necessary: its utility function could be somewhat more complex than just a simulated projection of my own and still help fulfill my utility function.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
If by inspection you just mean teach the AI morality in human language, then I agree, but that’s a side point.
So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed error-free to analyze me and calculate my utility function and optimize my environment in terms of it.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment, but mostly leaves them alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Yes?
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you do must make the result worse than the optimal one.
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
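The point being argued here is just an argmax inequality; a minimal sketch, assuming the AI exactly maximizes your true utility and using hypothetical utility values:

    # If the AI picks the feasible option with the highest true utility for you,
    # a veto can only strike that maximizer from the feasible set, so the best
    # remaining option is no better (and usually worse).
    feasible = {"plan_a": 10.0, "plan_b": 7.9, "plan_c": 7.5}  # hypothetical U values
    ai_choice = max(feasible, key=feasible.get)                # "plan_a"
    after_veto = {k: v for k, v in feasible.items() if k != ai_choice}
    assert max(after_veto.values()) <= feasible[ai_choice]     # vetoing never helps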
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtlessly environmental tweaks that would result in more fun on the internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall, merely that it devotes relatively few resources to addressing this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment in order to construct it in accordance with your mental structures, but in a way more efficient than you could have in the first place.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If those are, in fact, your real preferences, then sure.
If you actually research this it quickly becomes clear that real world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
I occasionally point out that you can model any computable behaviour using a utility-maximizing algorithm, provided you are allowed to use a partially-recursive utility function.
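For concreteness, a minimal sketch of that construction (my own toy code, and it assumes the behaviour in question is a total computable function): reward exact agreement with the behaviour, then maximize.

    # Any computable policy can be reproduced by an agent maximizing a utility
    # function that scores 1 for doing what the policy would do and 0 otherwise.
    def behaviour(observation):
        return len(observation) % 3  # stand-in for an arbitrary computable behaviour

    def utility(observation, action):
        return 1 if action == behaviour(observation) else 0

    def utility_maximizer(observation, actions=range(3)):
        return max(actions, key=lambda a: utility(observation, a))

    assert all(utility_maximizer(obs) == behaviour(obs)
               for obs in ["", "a", "ab", "abc", "abcd"])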
Please read the sequences, and stop talking about AI until you do.
I’ve read the sequences. Discuss or leave me alone.
Thanks, that’s useful to know.
Edit: Seriously, no irony, that’s useful. Disagreement should be treated differently depending on background.
Also, very little of the sequences have much of anything to do with AI. If I want to learn more about that I would look to Norvig’s book or more likely the relevant papers online. No need to be rude just because I don’t hold all your same beliefs.
Also, very little of the sequences have much of anything to do with AI.
It’s more of a problem with your understanding of ethics, as applied to AI (and since this is the main context in which AI is discussed here, I referred to that as simply AI). You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs.
No need to be rude just because I don’t hold all your same beliefs.
Unfortunately there is (in some senses of “rude”, such as discouraging certain conversational modes).
You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs
I see the potential risks in building AGIs.
I don’t see that risk being dramatically high for creating AGIs based loosely on improving the human brain, and this approach appears to be mainstream now or becoming the mainstream (Kurzweil, Hawkins, Darpa’s neuromorphic initiative, etc).
I’m interested in the serious discussion or analysis of why that risk could be high.
You have been discussing favourably the creation of AGIs that are programmed to create AGIs with different values to their own. No, you do not understand the potential risks.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests from our current perspective when carried over to AGI, but nor is it guaranteed to spontaneously destroy the world.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
Your analogy with evolution is spot on: if the values are going to drift at all, we want them to drift towards some target point, by selecting against sub-AIs that have values further from the point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power. A whole load of extra power, like superintelligence would grant, would have a good chance of screwing with that human’s values dramatically, possibly with disastrous effects.
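A toy model of that difference (entirely my own construction, not anything proposed in the thread): treat values as a point in a plane, successors as noisy copies, and compare drift with and without selection toward a fixed target.

    import random

    def final_distance(generations=1000, candidates=5, select=True, target=(0.0, 0.0)):
        x, y = 0.0, 0.0
        for _ in range(generations):
            proposals = [(x + random.gauss(0, 0.1), y + random.gauss(0, 0.1))
                         for _ in range(candidates)]
            if select:
                # keep the successor whose values sit closest to the target point
                x, y = min(proposals,
                           key=lambda p: (p[0] - target[0])**2 + (p[1] - target[1])**2)
            else:
                x, y = random.choice(proposals)  # unselected drift: a random walk
        return (x * x + y * y) ** 0.5            # final distance from the target values

    print(final_distance(select=True))   # stays close to the target
    print(final_distance(select=False))  # typically wanders far away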
Your analogy with evolution is spot on: if the values are going to drift at all, we want them to drift towards some target point, by selecting against sub-AIs that have values further from the point.
Yes.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all?
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all!
Yes, evolution could in principle take us anywhere, but we can and already do exert control over its direction.
This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power.
Humans today have a range of values, but an overriding universal value is not-dying. To this end it is crucially important that we reverse engineer the human mind.
Ultimately if what we really value is conscious human minds, and computers will soon out-compete human brains, then clearly we need to transfer human minds over to computers.
One simple point is that there is no reason to expect AGIs to stop at exactly human level. Even if progress and increase in intelligence is very slow, eventually they become an existential risk, or at least a value risk. Every step in that direction we make now is a step in the wrong direction, which holds even if you believe it’s a small step.
One simple point is that there is no reason to expect AGIs to stop at exactly human level.
This isn’t the first time I heard this, but I don’t think it’s exactly right.
We know that human level is possible, but while superhuman level being possible seems overwhelmingly likely from considerations like imagining a human with more working memory and running faster, we don’t technically know that.
We have a working example of a human level intelligence.
It’s human-level intelligences doing the work. Martian work on AI might asymptotically slow down when approaching Martian-level intelligence without that level being inherently significant for anyone else, and the same goes for humans, or for any AGI of any level working on its own successor, for that matter. (Not that I have any strong belief that this is the case; it’s just an argument for why human level wouldn’t be a completely arbitrary slow-down point.)
I’d completely agree with “there is no strong reason to expect AGIs to stop at exactly human level”, “High confidence* in AGIs stopping at exactly human level is irrational” or “expecting AGIs not to stop at exactly human level would be prudent.”
*Personally I’d assign a probability of under 0.2 to the best AGIs being at a level roughly comparable to human level (let’s say able to solve any problem, except human relationship problems, that every IQ 80+ human can solve, but not better at every task than any human) for at least 50 years (physical time in Earth’s frame of reference, not subjective time; this probably means being inferior at an equal clock rate but making up for that with speed for most of that time). That’s a lot more than I would assign to any other place on the intelligence scale, of course.
Could the downvoter please say what they are disagreeing with? I can see at least a dozen mutually contradictory possible angles, so “someone thinks something about posting this is wrong” provides almost no useful information.
Thanks for the value risk link—that discussion is what I’m interested in.
I guess I’ll reply to it there. The initial quotes from Ben G. and Hanson are similar to my current view.
very little of the sequences have much of anything to do with AI.
There is some discussion of the dangers of a uFAI Singularity, particularly in this debate between Robin Hanson and Eliezer. Much of the danger arises from the predicted short time period required to get from a mere human-level AI to a superhuman AI+. Eliezer discusses some reasons to expect it to happen quickly here and here. The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
For an analysis of the possibility of a hard takeoff in approaches to AI based loosely on modeling or emulating the human brain, see this posting by Carl Schulman, for example.
The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
If civilisation(t+1) can access resources much better than civilisation(t), then that is just another way of saying things are going fast—one must beware of assuming what one is trying to demonstrate here.
The problem I see with this thinking is the idea that civilisation(t) is a bunch of humans while civilisation(t+1) is a superintelligent machine.
In practice, civilisation(t) is a man-machine symbiosis, while civilisation(t+1) is another man-machine symbiosis with a little bit less man, and a little bit more machine.