In the open, non-classified crypto world, we pick standard crypto algorithms by getting competing designs from dozens of teams, who then attack each other’s designs, with the rest of the research community joining in. This seems like a good model for FAI as well, if only the FAI-building organization had enough human and financial resources, which unfortunately probably won’t be the case.
Why do you think so? Do you expect an actual FAI-building organization to start working in the next few years? Because, assuming the cautionary position is actually the correct one, won’t an FAI organization surely get lots of people and resources in time?
Many people are interested in building FAI, or just AGI (which at least isn’t harder to build than specifically Friendly AGI). Assuming available funds increase slowly over time, a team trying to build a FAI with few safeguards will be able to get funded before a team that requires many safeguards, and will also work faster (fewer constraints on the result), and so will likely finish first.
But if AGI is not closer than several decades away, then, assuming the cautionary position is the correct one, that position will become widespread and universally accepted. Any official, well-funded team will work with many safeguards and lots of constraints. Only stupid cranks will work without them, and they’ll work without funding too.
You’re not addressing my point about a scenario where available funds increase slowly.
Concretely (with arbitrary dates): suppose that in 2050, FAI theory is fully proven. By 2070, it is universally accepted, but still no-one knows how to build an AGI, or maybe no-one has sufficient processing power.
In 2090, several governments reach the point of being able to fund a non-Friendly AGI (which is much cheaper); only by 2120 will they be able to fund a fully Friendly AGI.
What is the chance that some of them will try to seize the first-mover advantage, refuse to wait another 30 years, and ignore Friendliness? I estimate it as high. The payoff is the biggest in human history: the first mover will potentially control a singleton that will rewrite the very laws of physics in its future light-cone to order, and prevent any other AGI from ever being built! This is beyond even “rule the world forever and reshape it in your image” territory. The greatest temptation ever. Do you seriously expect no-one would succumb to it?
Remember, we’re describing the situation where the cautionary position is provably correct. So your “greatest temptation ever” is (provably) a temptation to die a horrible death together with everyone else. Anyone smart enough to even start building AI would know and understand this.
Having a provably Friendly AI is not the same thing as having a proof that any other AI will do terrible things.
My conditional was “the cautionary position is the correct one”. I meant provably correct.
It’s like dreams of a true, universal, objective morality: even if in some sense there is one, some agents are just going to ignore it.
Leaving out the “provably” makes a big difference. If you add “provably” then I think the conditional is so unlikely that I don’t know why you’d assume it.
Well, assuming EY’s view of intelligence, the “cautionary position” is likely to be a mathematical statement. And then why not prove it? Given several decades? That’s a lot of time.
One is talking about a much stronger statement than the provability of Friendliness (since one is talking about all AIs, not a single design), so even if it is true, proving it, or even formalizing it, is likely to be very hard. And note that this is under the assumption that it is true, which itself seems wrong. Assume that one has a Friendliness protocol, and then consider the AI that has the rule “be Friendly, but give 5% more weight to the preferences of people who have an even number of letters in their name”, or the even subtler “be Friendly, but if you ever conclude with confidence of at least 1-1/(3^^^^3) that 9/11 was done by time-traveling aliens, then destroy humanity”. The second will likely act identically to a Friendly AI.
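(An aside on the notation, since “3^^^^3” may be unfamiliar: the carets are Knuth’s up-arrow notation, where each extra arrow iterates the previous operation. Below is a minimal, illustrative sketch in Python; the function name hyper is just a label chosen here, not anything from the thread.)

```python
# A sketch of Knuth's up-arrow notation, which "3^^^^3" above uses:
# hyper(a, 1, b) is ordinary exponentiation a**b, and each extra arrow
# iterates the previous operation (a^^b = a^(a^(...^a)) with b copies, etc.).
def hyper(a: int, arrows: int, b: int) -> int:
    if arrows == 1:
        return a ** b
    if b == 0:
        return 1
    return hyper(a, arrows - 1, hyper(a, arrows, b - 1))

print(hyper(3, 1, 3))  # 3^3  = 27
print(hyper(3, 2, 3))  # 3^^3 = 3^(3^3) = 7625597484987
# 3^^^3, let alone 3^^^^3, is far too large to ever compute or store;
# the point of "1 - 1/(3^^^^3)" is simply a confidence threshold
# unimaginably close to 1.
```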
I thought you were merely specifying that the FAI theory was proven to be Friendly. But you’re also specifying that any AGI not implementing a proven FAI theory is formally proven to be definitely disastrous. I didn’t understand that was what you were suggesting.
Even then there remains a (slightly different) problem. An AGI may be Friendly to someone (presumably its builders) at the expense of someone else. We have no reason to think any outcome an AGI might implement would truly satisfy everyone (see other threads on CEV). So there will still be a rush for the first-mover advantage. The future will belong to the team that gets funding a week before everyone else. These conditions increase the probability that the team that makes it will have made a mistake, introduced a bug, cut some corners unintentionally, etc.