One thing that I think is relevant, in the discussion of existential risk, is Martin Weitzman’s “Dismal Theorem” and Jim Manzi’s analysis of it. (Link to the article, link to the paper.)
There, the topic is not unfriendly AI but climate change. Regardless of what you think of that issue, it has attracted more attention than AGI, and people writing about existential risk often use climate change as an example.
Martin Weitzman, a Harvard economist, deals with the probability of extreme disasters, and whether it’s worth it in cost-benefit terms to deal with them. Our problem, in cases of extreme uncertainty, is that we don’t just have probability distributions, we have uncertain probability distributions; it’s possible we got the models wrong. Weitzman’s paper takes this into account. He creates a family of probability distributions, indexed by an uncertain parameter, and integrates over it, and he proves that this process of taking “probability distributions of probability distributions” makes the final distribution fat-tailed. So fat-tailed that the relevant expectation integral doesn’t converge.
This is a terrible consequence. If the expected value of the cost, taken over that fat-tailed distribution, doesn’t converge, then we cannot define an expected cost, and we can’t do cost-benefit analysis at all. Weitzman’s conclusion is that the right amount to spend mitigating the risk is “more than we’re doing.”
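To make the mechanism concrete, here is a minimal Monte Carlo sketch (my own toy model, not Weitzman’s actual setup): a normal damage distribution whose variance is itself uncertain becomes, after integrating over that uncertainty, a Student-t-style fat-tailed distribution, and the running sample mean of a convex cost never settles down, which is the numerical signature of a divergent expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Case 1: damages with a known variance -- Normal(0, 1).
thin = rng.normal(0.0, 1.0, n)

# Case 2: structural uncertainty -- the precision (1/variance) of the damage
# distribution is itself random.  Mixing a normal over a Gamma(0.5)-distributed
# precision gives a Student-t marginal with one degree of freedom, i.e.
# Cauchy-like power-law tails.
precision = rng.gamma(shape=0.5, scale=2.0, size=n)
fat = rng.normal(0.0, 1.0 / np.sqrt(precision), n)

# Squared damages stand in for a convex cost function.  Under the thin-tailed
# model the running sample mean converges to 1; under the fat-tailed mixture
# the expectation is infinite, and the running mean keeps climbing as ever
# larger draws arrive instead of settling down.
for name, x in [("known variance    ", thin), ("uncertain variance", fat)]:
    running_mean = np.cumsum(x ** 2) / np.arange(1, n + 1)
    print(name, np.round(running_mean[[n // 100, n // 10, n - 1]], 2))
```

Weitzman’s actual argument runs through expected marginal utility rather than squared damages, but the divergence has the same source: tails that thin out only polynomially once the model uncertainty is integrated over.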
Manzi criticizes this approach as just an elaborately stated version of the precautionary principle. If it’s conceivable that your models are wrong and things are even riskier than you imagined, it doesn’t follow that you should spend more to mitigate the risk; the reductio is that if you knew nothing at all, you should spend all your money mitigating the most unknown possible risk!
This is relevant to people talking about AGI. We’re not considering spending a lot of money to mitigate this particular risk, but we are considering forgoing a lot of money: the value of a possibly useful AI. And it may be tempting to propose a shortcut, a la Marty Weitzman, claiming that the very uncertainty of the risk is an argument for being more aggressive in mitigating it. The problem is that this leads to absurd conclusions. You could think up anything (murderous aliens! killer vacuum cleaners!) and claim that because we don’t know how likely they are, and because the outcome would be world-endingly terrible, we should be spending all our time trying to mitigate the risk!
Uncertainty about an existential risk is not an argument in favor of spending more on it. There are arguments in favor of spending more on an existential risk—they’re the old-fashioned, cost-benefit ones. (For example, I think there’s a strong case, in old-fashioned cost-benefit terms, for asteroid collision prevention.) But if you can’t justify spending on cost-benefit grounds, you can’t try a Hail Mary and say “You should spend even more—because we could be wrong!”
The talk about uncertainty is indeed a red herring. There are two things going on here:
A linear aggregative (or fast-growing enough in the relevant range) social welfare function makes even small probabilities of existential risk more important than large costs or benefits today. This is Bostrom’s astronomical waste point. Weitzman just uses a peculiar model (with agents with bizarre preferences that assign infinite disutility to death, and a strangely constricted probability distribution over outcomes) to introduce this indirectly. You can reject it with a bounded social welfare function, as Manzi or Nordhaus do, representing your limited willingness to sacrifice for future generations. (A toy contrast between the two is sketched below, after the second point.)
The fact that there are many existential risks competing for our attention, and many routes to affecting existential risk, so that spending effort on any particular risk now means not spending that effort on other existential risks, or holding it in reserve while new knowledge accumulates, and so on. Does the x-risk reduction from climate change mitigation beat the reduction from asteroid defense, or from lobbying for arms control treaties, at the current margin? Weitzman addresses this by claiming that the risk from surprise catastrophic climate change is much higher than all other existential risks collectively, which I don’t find plausible.
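On the first point, here is a toy numerical contrast (every number below is made up purely to illustrate the structure; nothing here is Weitzman’s, Manzi’s, or Nordhaus’s actual model): with a linear aggregative welfare function, a one-in-a-million reduction in extinction risk over an astronomically large future swamps a large present-day sacrifice, while a bounded, saturating welfare function caps how much that tail can contribute.

```python
# Made-up numbers, purely to illustrate the structural point about welfare functions.
P_DOOM_BASELINE  = 1e-4          # extinction probability with no intervention
P_DOOM_MITIGATED = 1e-4 - 1e-6   # the intervention shaves off one-in-a-million
FUTURE_LIVES     = 1e30          # lives lived if extinction is avoided
PRESENT_LIVES    = 7e9
SACRIFICE        = 1e9           # present welfare the intervention costs, in "lives"

def linear(lives):
    # Linear aggregative welfare: value proportional to total lives.
    return lives

def bounded(lives):
    # Saturating welfare, bounded above by 1: extra lives matter less and less.
    return lives / (lives + 1e10)

def expected_welfare(p_doom, lives_now, w):
    # Survive: the present generation plus the large future; doom: present only.
    return (1 - p_doom) * w(lives_now + FUTURE_LIVES) + p_doom * w(lives_now)

for name, w in [("linear ", linear), ("bounded", bounded)]:
    do_nothing = expected_welfare(P_DOOM_BASELINE, PRESENT_LIVES, w)
    intervene  = expected_welfare(P_DOOM_MITIGATED, PRESENT_LIVES - SACRIFICE, w)
    # Linear: gain on the order of 1e24, the tail dominates everything.
    # Bounded: gain is slightly negative, the present sacrifice dominates.
    print(name, "gain from intervening:", intervene - do_nothing)
```

The point of the toy is only that whether the astronomical-waste term dominates is decided by the shape of the welfare function, not by how uncertain the tail is.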
Is anyone in SIAI making the argument that we should spend more because our models are too uncertain to provide expected costs, or, more generally, that our very uncertainty about the model is itself a significant source of concern? My impression was more that it’s “we have good reasons to doubt people’s estimates that Friendliness is easy” and “we have good reason to believe it’s actually quite hard.”
Fair enough. This is my caution against the logic “I can think of a risk, therefore we need to worry about it!” It seems that SIAI is making the stronger claim that unfriendliness is very likely.
My personal view is that AI is very hard itself, and that working on, say, a computer that can do what a mouse can do is likely to take a long time, and is harmless but very interesting research. I don’t think we’re anywhere near a point when we need to shut down anybody’s current research.
Consider marginal utility. Many people are working on AI, machine learning, computational psychology, and related fields. Nobody is working on preference theory, the formal understanding of our goals under reflection. If you want to do interesting research, and if you have the background to advance either of those fields, do you think the world will be better off with you on the one side or on the other?
Maybe that’s true, but that’s a separate point. “Let’s work on preference theory so that it’ll be ready when the AI catches up” is one thing; tentatively, I’d say it’s a good idea. “Let’s campaign against anybody doing AI research” seems less useful (and less likely to be effective).
But if provable friendliness is hard, wouldn’t it be much easier to accomplish with the help of AI? Presumably if the FAI problem can be solved by a few dozen smart human researchers within a few decades, then it can be solved in a year or so by a few dozen not-guaranteed-friendly AGIs-in-a-box with limited IQs in the 180-220 range. The AGIs design an FAI architecture and provide the proof, some smart humans check the proof, and then we build the thing and fasten our seatbelts for the exciting ride as the FAI goes FOOM.
How do you propose to limit their IQs? I’m not asking facetiously; your plan seems reasonable to me, but that’s the part that seems the trickiest, and the part that if gotten wrong could lead to accidental early FOOMage.
I have no idea how to limit the IQ of AIs that other people produce without my knowledge. For AIs that I produce myself, I would simply do without closed-loop recursive self-improvement (that is, keep the AI in a box) until I have a proven FAI architecture in hand.
I’m reasonably confident that a closed-loop FOOM is impossible until AI “IQ” goes well past the max human level. I am also reasonably confident that closing the recursive self-improvement loop doesn’t speed things up much until you reach that level, either.
So, if a “Sane AI” project like this one, operating under the slogan “Open loop until we have a proof,” can maintain a technological lead of a year or so over a “Risky AI” project with the slogan “Close the loop—full speed ahead,” then I’m pretty sure it is actually safer than a “Secure FAI” project operating under the slogan “No AGI until we have a proof,” because it has a better chance of establishing and maintaining that technological lead.
Eliezer figures out how to download his own brain. The emulation requires only a small amount of processing speed and memory. With the financial backing of the SIAI, LessWrong readers, and wealthy tech businesspeople, we create millions of ems and run each at 1,000 times the speed that Eliezer runs at. All of the Eliezer ems immediately work on improving the ems’ code, making heavy use of trial and error: they make some changes to the code of a subset of the ems, give them intelligence tests, throw out the less intelligent ems, and make many copies of the superior ones. This could give us a singularity in a week.
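As an aside, the selection procedure described there is essentially an evolutionary loop: mutate a subset, score, cull, copy the best. A toy sketch of that loop is below; the “intelligence test” is a stand-in scoring function I made up, and nothing here is meant to say anything about real brain emulations.

```python
import random

random.seed(0)
POP, KEEP, DIMS, GENERATIONS = 100, 10, 8, 50

def intelligence_test(em):
    # Hypothetical stand-in score: higher is better, maximized at all-zeros.
    return -sum(x * x for x in em)

def tweak(em):
    # "Make some changes to the code of a subset of the ems."
    return [x + random.gauss(0, 0.1) for x in em]

# Each "em" is just a parameter vector in this toy.
population = [[random.uniform(-1, 1) for _ in range(DIMS)] for _ in range(POP)]

for generation in range(GENERATIONS):
    # Score every em and throw out the less intelligent ones...
    survivors = sorted(population, key=intelligence_test, reverse=True)[:KEEP]
    # ...then make many (mutated) copies of the superior ones.
    population = [tweak(random.choice(survivors)) for _ in range(POP)]

print("best score after selection:", max(map(intelligence_test, population)))
```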
Your scenario strikes me as laughably overoptimistic. A brain emulation requires only a small amount of processing speed and memory? A story that begins with finding financial backing takes only a week to reach completion?
But in any case, this is a closed-loop recursive self-improvement FOOM. I don’t doubt that such things are possible. My point was that if you already have a bunch of super-Eliezers, why not have them design a provably-correct FAI, rather than sending them off to FOOM into an uFAI? If they discover the secret of FAI within a year or so, great! If it turns out that provably correct FAI is just a pipe-dream, then maybe we ought to reconsider our plans to close the loop and FOOM.
“A brain emulation requires only a small amount of processing speed and memory?”
If software is the bottleneck, and computer speed and memory are increasing exponentially, then you would expect that by the time the software was available it would use a relatively small amount of computing power.
“A story that begins with finding financial backing takes only a week to reach completion?”
My story begins with the Eliezer em. 150,000 people die every day, and money probably becomes useless after a singularity. If enough people understood what was happening, we could raise, say, a billion dollars in a few days. Hedge funds, I strongly suspect, do sometimes make billion-dollar bets based on information they acquired in the last day.
“why not have them design a provably-correct FAI, rather than sending them off to FOOM into an uFAI?”
The 150,000-lives-a-day cost of delay, plus the fact that the Eliezer ems might be competing with other ems that have less benign intentions.
Hm, so then the issue just becomes how to keep the AI from closing its own loop (i.e. modifying itself in-memory through some security hole it finds). I agree that it seems unlikely to figure out how to do so at a relatively low level of intelligence.
On the other hand, it seems like it would be pretty hard to do research on self-improvement without a closed loop; isn’t the expectation usually that the self-improvement process won’t start doing anything particularly interesting until many iterations have passed?
Maybe I’m just misunderstanding your use of the terms. I take it by “open loop” you mean that the AI would seek to generate an improved version of itself, but would simply provide that code back to the researcher rather than running it itself?
“Maybe I’m just misunderstanding your use of the terms. I take it by ‘open loop’ you mean that the AI would seek to generate an improved version of itself, but would simply provide that code back to the researcher rather than running it itself?”
Roughly, yes. But I see recursive self-improvement as having a hardware component as well, so “closed loop” also includes giving the AI control over electronics factories and electronic assembly robots.
“… it seems like it would be pretty hard to do research on self-improvement without a closed loop; isn’t the expectation usually that the self-improvement process won’t start doing anything particularly interesting until many iterations have passed?”
Odd. My expectation for the software-only and architecture-change portion of the self-improvement is that the curve would be the exact opposite—some big gains early by picking off low-hanging fruit, but slower improvement thereafter. It is only in the exponential growth of incorporated hardware that you would get a curve like that which you seem to expect.
Or letting them seize control of …
Not necessarily that hard, given the existence of Stuxnet.