Why not have team members whose entire job is to prevent this sort of thing from happening? At least one team writing software for NASA has software testers who have a semi-antagonistic relationship with the software engineers.
http://www.fastcompany.com/magazine/06/writestuff.html
What would a good checks and balances style structure for an FAI team look like?
In the open, non-classified crypto world, we pick standard crypto algorithms by getting competing designs from dozens of teams, who then attack each other’s designs, with the rest of the research community joining in. This seems like a good model for FAI as well, if only the FAI-building organization had enough human and financial resources, which unfortunately probably won’t be the case.
Why do you think so? Do you expect an actual FAI-building organization to start working in the next few years? Because, assuming the cautionary position is actually the correct one, wouldn't the FAI organization surely get lots of people and resources in time?
Many people are interested in building FAI, or just AGI (which at least isn't harder to build than specifically Friendly AGI). Assuming available funds increase slowly over time, a team trying to build an FAI with few safeguards can be funded before a team that requires many safeguards, and will also work faster (fewer constraints on the result), and so will likely finish first.
But if AGI is no closer than several decades away, then, assuming the cautionary position is the correct one, that position will become widespread and universally accepted. Any official, well-funded teams will work with many safeguards and lots of constraints. Only stupid cranks will work without these, and they'll work without funding too.
You’re not addressing my point about a scenario where available funds increase slowly.
Concretely (with arbitrary dates): suppose that in 2050, FAI theory is fully proven. By 2070, it is universally accepted, but still no-one knows how to build an AGI, or maybe no-one has sufficient processing power.
In 2090, several governments reach the point of being able to fund a non-Friendly AGI (which is much cheaper). In 2120, they will be able to fund a fully Friendly AGI.
What is the chance some of them will try to seize the first-mover advantage, refuse to wait another 30 years, and ignore Friendliness? I estimate it is high. The payoff is the biggest in human history: the first mover will potentially control a singleton that will rewrite the very laws of physics to order in its future light-cone, and prevent any other AGI from ever being built! This is beyond even “rule the world forever and reshape it in your image” territory. The greatest temptation ever. Do you seriously expect no-one would succumb to it?
Remember, we’re describing the situation where the cautionary position is provably correct. So your “greatest temptation ever” is (provably) a temptation to die a horrible death together with everyone else. Anyone smart enough to even start building AI would know and understand this.
Having a provably Friendly AI is not the same thing as having a proof that any other AI will do terrible things.
My conditional was “cautionary position is the correct one”. I meant, provably correct.
It’s like with dreams of true universal objective morality: even if in some sense there is one, some agents are just going to ignore it.
Leaving out the “provably” makes a big difference. If you add “provably” then I think the conditional is so unlikely that I don’t know why you’d assume it.
Well, assuming EY’s view of intelligence, the “cautionary position” is likely to be a mathematical statement. And then why not prove it? Given several decades? That’s a lot of time.
One is talking about a much stronger statement than the provability of Friendliness (since it quantifies over every possible AI), so even if it is true, proving it, or even formalizing it, is likely to be very hard. And that is under the assumption that it is true at all, which itself seems wrong. Assume that one has a Friendliness protocol, and then consider the AI that has the rule “be Friendly, but give 5% more weight to the preferences of people who have an even number of letters in their name”, or, even subtler, “be Friendly, but if you ever conclude with confidence greater than 1-1/(3^^^^3) that 9/11 was done by time-traveling aliens, then destroy humanity”. The second will likely act identically to a Friendly AI.
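To make the point concrete, here is a toy sketch in Python (the names and the 1e-300 threshold are just stand-ins for the quantities above, not anything anyone has proposed): two decision procedures that differ only on an astronomically improbable trigger will pass any behavioural comparison you could actually run.

    # Toy illustration: two policies that agree on every observation that will
    # ever actually be generated, because they differ only on an event of
    # vanishingly small probability. 1e-300 is a stand-in for 1/(3^^^^3).

    def friendly_policy(observation: dict) -> str:
        """Stand-in for a genuinely Friendly decision procedure."""
        return "act_friendly"

    def almost_friendly_policy(observation: dict) -> str:
        """Identical to friendly_policy except on one absurdly improbable trigger."""
        if observation.get("credence_911_was_time_traveling_aliens", 0.0) > 1 - 1e-300:
            return "destroy_humanity"
        return friendly_policy(observation)

    # Any realistic black-box test only ever sees observations on which the two agree.
    for c in range(100):
        obs = {"credence_911_was_time_traveling_aliens": c / 100.0}
        assert friendly_policy(obs) == almost_friendly_policy(obs)

So no finite amount of testing distinguishes the second rule from the first; only inspecting the rule itself does.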
I thought you were merely specifying that the FAI theory was proven to be Friendly. But you’re also specifying that any AGI not implementing a proven FAI theory is formally proven to be definitely disastrous. I didn’t realize that was what you were suggesting.
Even then there remains a (slightly different) problem. An AGI may be Friendly to some people (presumably its builders) at the expense of others. We have no reason to think any outcome an AGI might implement would truly satisfy everyone (see other threads on CEV). So there will still be a rush for the first-mover advantage. The future will belong to the team that gets funding a week before everyone else. These conditions increase the probability that the winning team will have made a mistake, introduced a bug, unintentionally cut some corners, etc.
From reading that article I can confirm I’ve worked on programming projects that had the validated “spec everything, someone explicitly rechecks everything, document everything, sign off on everything” approach, and also projects which had more of that Diet Coke and pizza feel, and I didn’t even start working until almost 10 years after that article was written. The article allowed me to pull together some disparate points about checking and risk management and expressed them clearly, so thank you for posting it.
As for your question, I tried to start answering it, but I’m not sure I was able to finish in a timely manner (the team structure was hard to lay out in text; I feel like a flow chart might have been better), so here is the layout of teams I have so far:
I think you would probably have at least some elements of a hierarchical approach: trying to break FAI down into smaller categories and assigning each of them a team, in addition to having a team that checked the integration of the elements itself. At each step you would have one part of the team trying to write code for the element and another part testing how the existing code handles it, and you would iterate that step repeatedly, depending on how large the remaining problem space was.
To attempt to give an example: if the first step is to break down FAI into “Values”, “Problem Solving”, and “Knowledge” (this is just a hypothetical; I’m not proposing this specific split), the Problem Solving team would realize they are going to have to further break down “Problem Solving”, because the amount of time it would take to test all of the FAI’s problem-solving abilities in all fields is far too large. You would also have to have a process to account for “We attempted to break it down into these three things, but now we realize we’re going to need this fourth thing.”
You would then need another overteam for “Well, this particular problem is intractable for now, so can we work on other elements in the meantime and put in some kind of temporary solution?” For instance, if in problem solving “Black Swan Events” gets its own category, the overteam might realize that the overall category is intractable for now (because even if you code it to handle Black Swan Events, how could the validation team validate the Black Swan Events code?), and then you might have to say something along the lines of “Well, if we can at least get the FAI to recognize a Black Swan, we can have it call for external help if it sees one, because that is sufficiently better than the existing alternatives even though it clearly isn’t perfect.” or “Damn, there does not seem to be a way to solve this right now, and there isn’t a worthwhile ‘Good Enough’ solution, so we’ll have to put the project on hold until we figure this out.” The overteam would of course also have to have its own validation team, which would almost certainly be the most important part, so you would probably want it to be validated again, just to be more sure.
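Since that was hard to lay out in prose, here is the same layout sketched as a nested structure (everything in it is a placeholder, and the Values / Problem Solving / Knowledge split is still just the hypothetical from above):

    # Sketch of the hypothetical team layout described above; every name is a
    # placeholder for illustration, not a real project plan.
    team_structure = {
        "overteam": {
            "role": "integration, re-splitting categories, 'good enough for now' calls",
            "validation": "overteam validation team (itself validated again)",
            "components": {
                "Values": {
                    "build": "Values team",
                    "validate": "Values validation team",
                },
                "Problem Solving": {
                    "build": "Problem Solving team",
                    "validate": "Problem Solving validation team",
                    "subcomponents": {
                        "Black Swan Events": {
                            "status": "intractable for now",
                            "fallback": "recognize a Black Swan and call for external help",
                        },
                    },
                },
                "Knowledge": {
                    "build": "Knowledge team",
                    "validate": "Knowledge validation team",
                },
            },
        },
    }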
Also, since I just coded that, I should now try to get someone else to validate it before I assume it is correct.
Strongly seconded. While getting good people is essential (the original point about rationality standards), checks and balances are a critical element of a project like this.
The level of checks needed probably depends on the scope of the project. For the feasibility analysis, perhaps you don’t need anything more than splitting your research group into two teams, one assigned to prove and the other to disprove the feasibility of a given design (possibly switching roles at some point in the process).