An AI will not be pulled at random from mind design space.
I don’t think anyone’s ever disputed this. (However, that’s not very useful if the deterministic process resulting in the SI is too complex for humans to distinguish it in advance from the outcome of a random walk.)
An AI will be the result of a research and development process. A new generation of AIs will need to be better than other products at “Understand What Humans Mean” and “Do What Humans Mean”, in order to survive the research phase and subsequent market pressure.
Agreed. But by default, a machine that is better than other rival machines at satisfying our short-term desires will not satisfy our long-term desires. The concern isn’t that we’ll suddenly start building AIs with the express purpose of hitting humans in the face with mallets. The concern is that we’ll code for short-term rather than long-term goals, due to a mixture of disinterest in Friendliness and incompetence at Friendliness. But if intelligence explosion occurs, ‘the long run’ will arrive very suddenly, and very soon. So we need to adjust our research priorities to more seriously assess and modulate the long-term consequences of our technology.
An AI that was prone to take unbounded actions given any terminal goal would either be fixed or abandoned during the early stages of research.
That may be a reason to think that recursively self-improving AGI won’t occur. But it’s not a reason to expect such AGI, if it occurs, to be Friendly.
If early stages showed that inputs such as the natural language query would yield results such as […]
The seed is not the superintelligence. We shouldn’t expect the seed to automatically know whether the superintelligence will be Friendly, any more than we should expect humans to automatically know whether the superintelligence will be Friendly.
Making an AI that does not exhibit these drives in an unbounded manner is probably a prerequisite to get an AI to work at all (there are not enough resources to think about being obstructed by simulator gods etc.)
I’m not following. Why does an AGI have to have a halting condition (specifically, one that actually occurs at some point) in order to be able to productively rewrite its own source code?
An AI from point 4 will only ever do what it has been explicitly programmed to do.
You don’t seem to be internalizing my arguments. This is just the restatement of a claim I pointed out was not just wrong but dishonestly stated here.
That any terminal goal can be realized in an infinite number of ways implies an infinite number of instrumental goals to choose from.
Sure, but the list of instrumental goals overlaps more than the list of terminal goals, because energy from one project can be converted to energy for a different project. This is an empirical discovery about our world; we could have found ourselves in the sort of universe where instrumental goals don’t converge that much, e.g., because once energy’s been locked down into organisms or computer chips you just can’t convert it into useful work for anything else. In a world where we couldn’t interfere with the AI’s alien goals, nor could our component parts and resources be harvested to build very different structures, nor could we be modified to work for the AI, the UFAI would just ignore us and zip off into space to try and find more useful objects. We don’t live in that world because complicated things can be broken down into simpler things at a net gain in our world, and humans value a specific set of complicated things.
‘These two sets are both infinite’ does not imply ‘we can’t reason about their relative sizes, or about how often the same elements recur in them’.
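As a toy numerical analogy (the predicates below are invented for illustration and have nothing to do with goals per se): two sets of integers can both be infinite while their elements recur at very different densities, and those densities can be estimated on ever-larger finite prefixes.

```python
# Toy analogy (invented): two infinite sets of integers whose elements
# recur at very different densities, estimated on growing finite prefixes.

def density(predicate, n):
    """Fraction of the integers 1..n that satisfy the predicate."""
    return sum(1 for k in range(1, n + 1) if predicate(k)) / n

for n in (10_000, 100_000, 1_000_000):
    even = density(lambda k: k % 2 == 0, n)        # density tends to 1/2
    rare = density(lambda k: k % 1000 == 0, n)     # density tends to 1/1000
    print(f"n={n}: even={even:.4f}, multiples of 1000={rare:.4f}")
```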
I am not yet at a point in my education where I can say with confidence that this is the wrong way to think, but I do believe it is.
If someone walked up to you and told you about a risk only he can solve, and that you should therefore give this person money, would you give him money because you do not see any specific reason why he could be wrong? Personally, I would perceive the burden of proof to be on him to show me that the risk is real.
You’ve spent an awful lot of time writing about the varied ways in which you’ve not yet been convinced by claims you haven’t put much time into actively investigating. Maybe some of that time could be better spent researching these topics you keep writing about? I’m not saying to stop talking about this, but there’s plenty of material on a lot of these issues to be found. Have you read Intelligence Explosion Microeconomics?
http://wiki.lesswrong.com/wiki/Optimization_process
succeeding at the implementation of “value to maximize intelligence” in conjunction with “by all means”.
As a rule, adding halting conditions adds complexity to an algorithm, rather than removing complexity.
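A minimal sketch of the point, assuming an invented optimizer loop (none of these function names come from the discussion): the open-ended version is the shorter program, and every stopping rule is extra code layered on top of it.

```python
# Hypothetical sketch: a halting condition is machinery you add, not remove.

def optimize_unbounded(candidate, improve):
    """The 'default' optimizer loop: keep improving, never stop."""
    while True:
        candidate = improve(candidate)

def optimize_bounded(candidate, improve, score, good_enough, max_steps):
    """The same loop, plus the extra state and checks needed to stop."""
    for _ in range(max_steps):                    # added: a step budget
        new = improve(candidate)
        if score(new) <= score(candidate):        # added: stop when progress stalls
            break
        candidate = new
        if score(candidate) >= good_enough:       # added: a satisficing threshold
            break
    return candidate
```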
Saying that a system values becoming more intelligent then just means that the system values increasing its ability to achieve its goals.
No, this is a serious misunderstanding. Yudkowsky’s definition of ‘intelligence’ is about the ability to achieve goals in general, not about the ability to achieve the system’s goals. That’s why you can’t increase a system’s intelligence by lowering its standards, i.e., making its preferences easier to satisfy.
what you suggest is that humans will want to implement, and will succeed in implementing, an AI that, in order to beat humans at tic-tac-toe, is first going to take over the universe and make itself capable of building such things as Dyson spheres.
Straw-man; no one has claimed that humans are likely to want to create a UFAI. What we’ve suggested is that humans are likely to want to create an algorithm, X, that will turn out to be a UFAI. (In other words, the fallacy you’re committing is confusing intension with extension.)
That aside: Are you saying Dyson spheres wouldn’t be useful for beating more humans at more tic-tac-toe games? Seems like a pretty good way to win at tic-tac-toe to me.
Yudkowsky’s definition of ‘intelligence’ is about the ability to achieve goals in general, not about the ability to achieve the system’s goals. That’s why you can’t increase a system’s intelligence by lowering its standards, i.e., making its preferences easier to satisfy.
Actually I do define intelligence as ability to hit a narrow outcome target relative to your own goals, but if your goals are very relaxed then the volume of outcome space with equal or greater utility will be very large. However one would expect that many of the processes involved in hitting a narrow target in outcome space (such that few other outcomes are rated equal or greater in the agent’s preference ordering), such as building a good epistemic model or running on a fast computer, would generalize across many utility functions; this is why we can speak of properties apt to intelligence apart from particular utility functions.
Actually I do define intelligence as ability to hit a narrow outcome target relative to your own goals
Hmm. But this just sounds like optimization power to me. You’ve defined intelligence in the past as “efficient cross-domain optimization”. The “cross-domain” part I’ve taken to mean that you’re able to hit narrow targets in general, not just ones you happen to like. So you can become more intelligent by being better at hitting targets you hate, or by being better at hitting targets you like.
The former are harder to test, but something you’d hate doing now could become instrumentally useful to know how to do later. And your intelligence level doesn’t change when the circumstance shifts which part of your skillset is instrumentally useful. For that matter, I’m missing why it’s useful to think that your intelligence level could drastically shift if your abilities remained constant but your terminal values were shifted. (E.g., if you became pickier.)
No, “cross-domain” means that I can optimize across instrumental domains. Like, I can figure out how to go through water, air, or space if that’s the fastest way to my destination, I am not limited to land like a ground sloth.
Measured intelligence shouldn’t shift if you become pickier—if you could previously hit a point such that only 1/1000th of the space was more preferred than it, we’d still expect you to hit around that narrow a volume of the space given your intelligence even if you claimed afterward that a point like that only corresponded to 0.25 utility on your 0-1 scale instead of 0.75 utility due to being pickier ([expected] utilities sloping more sharply downward with increasing distance from the optimum).
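A toy version of that measure, with an invented outcome space and made-up utilities: rate a hit by the fraction of outcomes the agent ranks at or above it. An order-preserving relabeling of the utilities (becoming ‘pickier’) leaves that fraction, and hence the measured optimization power, unchanged.

```python
import math
import random

def optimization_power(achieved, outcomes, utility):
    """Bits of optimization: -log2 of the fraction of outcomes rated at least as good."""
    at_least_as_good = sum(1 for o in outcomes if utility(o) >= utility(achieved))
    return -math.log2(at_least_as_good / len(outcomes))

random.seed(0)
outcomes = list(range(100_000))                  # toy outcome space
base = {o: random.random() for o in outcomes}    # made-up utilities on a 0-1 scale

plain = lambda o: base[o]
picky = lambda o: base[o] ** 8                   # same preference ordering, but most
                                                 # outcomes now score far below the optimum

achieved = sorted(outcomes, key=plain)[-100]     # a hit with ~1/1000 of the space at or above it
print(optimization_power(achieved, outcomes, plain))   # ~9.97 bits
print(optimization_power(achieved, outcomes, picky))   # identical: the ordering didn't change
```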
But by default, a machine that is better than other rival machines at satisfying our short-term desires will not satisfy our long-term desires.
You might not be aware of this, but I wrote a sequence of short blog posts where I tried to think of concrete scenarios that could lead to human extinction, each of which raised many questions.
The introductory post is ‘AI vs. humanity and the lack of concrete scenarios’.
1. Questions regarding the nanotechnology-AI-risk conjunction
2. AI risk scenario: Deceptive long-term replacement of the human workforce
3. AI risk scenario: Social engineering
4. AI risk scenario: Elite Cabal
5. AI risk scenario: Insect-sized drones
6. AI risks scenario: Biological warfare
What might seem completely obvious to you for reasons that I do not understand, e.g. that an AI can take over the world, appears to me largely like magic (I am not trying to be rude; by magic I only mean that I don’t understand the details). At the very least there are a lot of open questions, even though for the sake of the above posts I accepted that the AI is superhuman and can do such things as deceive humans by its superior knowledge of human psychology, which seems to be a non-trivial assumption, to say the least.
That may be a reason to think that recursively self-improving AGI won’t occur. But it’s not a reason to expect such AGI, if it occurs, to be Friendly.
Over and over I told you that given all your assumptions, I agree that AGI is an existential risk.
The seed is not the superintelligence. We shouldn’t expect the seed to automatically know whether the superintelligence will be Friendly, any more than we should expect humans to automatically know whether the superintelligence will be Friendly.
You did not reply to my argument. My argument was that if the seed is unfriendly then it will not be smart enough to hide its unfriendliness. My argument did not pertain to the possibility of a friendly seed turning unfriendly.
Why does an AGI have to have a halting condition (specifically, one that actually occurs at some point) in order to be able to productively rewrite its own source code?
What I have been arguing is that an AI should not be expected, by default, to want to eliminate all possible obstructions. There are many gradations here. That, by some economic or otherwise theoretic argument, it might be instrumentally rational for some ideal AI to take over the world does not mean that humans would create such an AI, or that an AI could not be limited to caring about fires in its server farm rather than about the possibility that Russia might nuke the U.S. and thereby destroy its servers.
You don’t seem to be internalizing my arguments.
Did you mean to reply to another point? I don’t see how the reply you linked to is relevant to what I wrote.
Sure, but the list of instrumental goals overlap more than the list of terminal goals, because energy from one project can be converted to energy for a different project.
My argument is that an AI does not need to consider all possible threats or care about acquiring all possible resources. Based on its design, it could just want to optimize using its initial resources while only considering mundane threats. I just don’t see real-world AIs concluding that they need to take over the world. I don’t think an AI is likely to be designed that way. I also don’t think such an AI could work, because such inferences would require enormous amounts of resources.
You’ve spent an awful lot of time writing about the varied ways in which you’ve not yet been convinced by claims you haven’t put much time into actively investigating. Maybe some of that time could be better spent researching these topics you keep writing about?
I have done what is possible given my current level of education and what I perceive to be useful. I have, for example, asked experts for their opinions.
A few general remarks about papers such as the one you linked to.
How much should I update towards MIRI’s position if I (1) understood the arguments in the paper and (2) found the arguments convincing?
My answer is the following. If the paper were about the abc conjecture, the P versus NP problem, climate change, or even such mundane topics as psychology, I would either not be able to understand the paper, would be unable to verify the claims, or would have very little confidence in my judgement.
So what about ‘Intelligence Explosion Microeconomics’? That I can read most of it at all is only due to the fact that it is very informally written. The topic itself is more difficult and complex than all of the above-mentioned problems put together. Yet the arguments in support of it, to exaggerate a little bit, contain less rigor than the abstract of one of Shinichi Mochizuki’s papers on the abc conjecture.
This means that I should update very little towards MIRI’s position, and that any confidence I gain about MIRI’s position is probably highly unreliable.
Thanks. My feeling is that to gain any confidence in what all this technically means, and to answer all the questions it raises, I’d probably need about 20 years of study.
No, this is a serious misunderstanding. Yudkowsky’s definition of ‘intelligence’ is
Here is part of a post exemplifying how I understand the relation between goals and intelligence:
If a goal has very few constraints then the set that satisfies all constraints is very large. A vague and ambiguous goal allows for too much freedom in the sense that a wide range of world states would have the same expected value and would therefore imply a very large solution space, since a wide range of AIs would be able to achieve those world states and thereby satisfy the condition of being improved versions of their predecessors.
This means that in order to get an AI to become superhuman at all, and very quickly in particular, you will need to encode a very specific goal against which mistakes, optimization power and achievement can be judged.
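As a toy sketch of the set-size intuition above, with invented bit-vector ‘world states’ and made-up constraints: each constraint a goal imposes cuts down the set of world states that count as success, so a goal with only a few constraints leaves an enormous set of equally acceptable outcomes.

```python
from itertools import product

# Toy world states: all length-16 bit vectors (65,536 of them).
states = list(product([0, 1], repeat=16))

# Invented constraints; each one is a condition a goal might impose on the world.
constraints = [lambda s, i=i: s[i] == 1 for i in range(8)]

satisfying = states
for k, constraint in enumerate(constraints, start=1):
    satisfying = [s for s in satisfying if constraint(s)]
    print(f"{k} constraints -> {len(satisfying)} of {len(states)} states still count as success")
```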
It is really hard to communicate how I perceive this and other discussions about MIRI’s position without offending people, or killing the discussion.
I am saying this in full honesty. The position you appear to support seems so utterly “complex” (far-fetched) that the current arguments are unconvincing.
Here is my perception of the scenario that you are trying to sell me (exaggerated to make a point). I have a million questions about it that I can’t answer, and which your answers either sidestep or explain away by invoking ‘magic’.
At this point I have probably made 90% of the people reading this comment incredibly angry. My perception is that you cannot communicate this perception on LessWrong without getting into serious trouble. That’s also what I meant when I told you that I cannot be completely honest if you want to discuss this on LessWrong.
I can also assure you that many people who are much smarter and higher status than me think so as well. Many people communicated the absurdity of all this to me but told me that they would not repeat this in public.
My argument was that if the seed is unfriendly then it will not be smart enough to hide its unfriendliness.
Pretending to be friendly when you’re actually not is something that doesn’t even require human-level intelligence. You could even do it accidentally.
In general, the appearance of Friendliness at low levels of ability to influence the world doesn’t guarantee actual Friendliness at high levels of ability to influence the world. (If it did, elected politicians would be much higher quality.)