But by default, a machine that is better than other rival machines at satisfying our short-term desires will not satisfy our long-term desires.
You might not be aware of this, but I wrote a sequence of short blog posts in which I tried to think of concrete scenarios that could lead to human extinction, each of which raised many questions.
What seems completely obvious to you, for reasons that I do not understand, e.g. that an AI can take over the world, appears to me largely like magic (I am not trying to be rude; by magic I only mean that I don’t understand the details). At the very least there are a lot of open questions, even though for the sake of the above posts I accepted that the AI is superhuman and can do such things as deceive humans through its superior knowledge of human psychology, which seems to be a non-trivial assumption, to say the least.
That may be a reason to think that recursively self-improving AGI won’t occur. But it’s not a reason to expect such AGI, if it occurs, to be Friendly.
Over and over I told you that given all your assumptions, I agree that AGI is an existential risk.
The seed is not the superintelligence. We shouldn’t expect the seed to automatically know whether the superintelligence will be Friendly, any more than we should expect humans to automatically know whether the superintelligence will be Friendly.
You did not reply to my argument. My argument was that if the seed is unfriendly then it will not be smart enough to hide its unfriendliness. My argument did not pertain to the possibility of a friendly seed turning unfriendly.
Why does an AGI have to have a halting condition (specifically, one that actually occurs at some point) in order to be able to productively rewrite its own source code?
What I have been arguing is that an AI should not be expected, by default, to want to eliminate all possible obstructions. There are many gradations here. That it might be instrumentally rational for some ideal AI to take over the world, by some economic or otherwise theoretical argument, does not mean that humans would create such an AI, or that an AI could not be limited to caring about fires in its server farm rather than the possibility that Russia might nuke the U.S. and thereby destroy its servers.
You don’t seem to be internalizing my arguments.
Did you mean to reply to another point? I don’t see how the reply you linked to is relevant to what I wrote.
Sure, but the lists of instrumental goals overlap more than the lists of terminal goals, because energy from one project can be converted to energy for a different project.
My argument is that an AI does not need to consider all possible threats or care about acquiring all possible resources. Based on its design it could just want to optimize using its initial resources while only considering mundane threats. I just don’t see real-world AIs concluding that they need to take over the world. I don’t think an AI is likely to be designed that way. I also don’t think such an AI could work, because such inferences would require enormous amounts of resources.
You’ve spent an awful lot of time writing about the varied ways in which you’ve not yet been convinced by claims you haven’t put much time into actively investigating. Maybe some of that time could be better spent researching these topics you keep writing about?
I have done what is possible given my current level of education and what I perceive to be useful. For example, I have asked experts for their opinions.
A few general remarks about papers of the kind you linked to.
How much should I update towards MIRI’s position if I (1) understood the arguments in the paper and (2) found them convincing?
My answer is the following. If the paper were about the abc conjecture, the P versus NP problem, climate change, or even such mundane topics as psychology, I would either be unable to understand the paper, be unable to verify the claims, or have very little confidence in my judgement.
So what about ‘Intelligence Explosion Microeconomics’? That I can read most of it is only due to the fact that it is very informally written. The topic itself is more difficult and complex than all of the above-mentioned problems combined. Yet the arguments in support of it, to exaggerate a little bit, contain less rigor than the abstract of one of Shinichi Mochizuki’s papers on the abc conjecture.
So my answer is that I should update very little towards MIRI’s position, and that any confidence I gain about MIRI’s position is probably highly unreliable.
Thanks. My feeling is that to gain any confidence in what all this technically means, and to answer all the questions this raises, I’d probably need about 20 years of study.
No, this is a serious misunderstanding. Yudkowsky’s definition of ‘intelligence’ is
Here is part of a post exemplifying how I understand the relation between goals and intelligence:
If a goal has very few constraints, then the set of outcomes that satisfies all of them is very large. A vague and ambiguous goal allows too much freedom: a wide range of world states would have the same expected value, which implies a very large solution space, since a wide range of AIs would be able to achieve those world states and thereby satisfy the condition of being improved versions of their predecessor.
This means that in order to get an AI to become superhuman at all, and very quickly in particular, you will need to encode a very specific goal against which mistakes, optimization power and achievement can be judged.
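A toy illustration of the constraint-counting point above, assuming a hypothetical world described by six binary features and goals expressed as required feature values. The names `satisfying_states`, `vague_goal`, and `specific_goal` are invented for this sketch. It only shows how the set of acceptable world states shrinks as constraints are added, not how any real AI would evaluate its goals.

```python
from itertools import product


def satisfying_states(n_features, constraints):
    """Return every world state (a tuple of 0/1 values) that meets all constraints.

    `constraints` maps a feature index to the value the goal requires there.
    """
    return [
        state
        for state in product((0, 1), repeat=n_features)
        if all(state[i] == required for i, required in constraints.items())
    ]


# Hypothetical toy world: 6 binary features, so 2**6 = 64 possible states.
N_FEATURES = 6

# A vague goal pins down a single feature; a specific goal pins down five.
vague_goal = {0: 1}
specific_goal = {0: 1, 1: 0, 2: 1, 3: 1, 4: 0}

vague_count = len(satisfying_states(N_FEATURES, vague_goal))
specific_count = len(satisfying_states(N_FEATURES, specific_goal))

print("states satisfying the vague goal:", vague_count)
print("states satisfying the specific goal:", specific_count)
```

In this toy setting the vague goal is satisfied by half of all states, while the specific goal leaves only two, so a candidate successor can be judged against a far narrower target.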
It is really hard to communicate how I perceive this and other discussions about MIRI’s position without offending people, or killing the discussion.
I am saying this in full honesty. The position you appear to support seems so utterly “complex” (far-fetched) that the current arguments are unconvincing.
Here is my perception of the scenario that you are trying to sell me (exaggerated to make a point). I have a million questions about it that I can’t answer and which your answers either sidestep or explain away by using “magic”.
At this point I have probably made 90% of the people reading this comment incredibly angry. My perception is that you cannot communicate this perception on LessWrong without getting into serious trouble. That’s also what I meant when I told you that I cannot be completely honest if you want to discuss this on LessWrong.
I can also assure you that many people who are much smarter and higher status than me think so as well. Many people communicated the absurdity of all this to me but told me that they would not repeat this in public.
My argument was that if the seed is unfriendly then it will not be smart enough to hide its unfriendliness.
Pretending to be friendly when you’re actually not is something that doesn’t even require human-level intelligence. You could even do it accidentally.
In general, the appearance of Friendliness at low levels of ability to influence the world doesn’t guarantee actual Friendliness at high levels of ability to influence the world. (If it did, elected politicians would be much higher quality.)
You might not be aware of this, but I wrote a sequence of short blog posts in which I tried to think of concrete scenarios that could lead to human extinction, each of which raised many questions.
The introductory post is ‘AI vs. humanity and the lack of concrete scenarios’.
1. Questions regarding the nanotechnology-AI-risk conjunction
2. AI risk scenario: Deceptive long-term replacement of the human workforce
3. AI risk scenario: Social engineering
4. AI risk scenario: Elite Cabal
5. AI risk scenario: Insect-sized drones
6. AI risks scenario: Biological warfare