From a policy perspective, my current hypothesis is that we would have to solve the problems in approximately the order they are likely to come up, modified by how likely they are to happen and how severe they would be if they occurred.
As an example: it doesn’t help to write policy for how to keep a superintelligent AI from using a hacked communication channel to take over everything connected to the internet in 2053 and then turn Earth into unfriendly computronium in 2054, if a country hands over nuclear launch capabilities to a dumb AI in 2023 and, because the dumb AI doesn’t have a Petrov module, there is a nuclear war in 2024 that kills billions. We would probably want to write the nuclear policy first, to give ourselves time to write the communication channel policy, unless the nuclear war risk were lower than the communication channel risk by a wide margin.
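For concreteness, here is a minimal sketch of that ordering rule, assuming a simple "expected harm discounted by years until the problem could arrive" score. The risk names, probabilities, severities, and the discounting choice are all illustrative placeholders I made up for the example, not estimates from the comment above.

```python
# Minimal sketch: rank risks by (probability * severity) / years_until_arrival.
# All numbers below are made-up placeholders, not real risk estimates.

risks = [
    # (name, probability of occurring, severity if it occurs, earliest plausible year)
    ("dumb AI given launch authority, no Petrov module", 0.05, 0.5, 2023),
    ("superintelligent AI escapes via hacked channel",   0.02, 1.0, 2053),
]

current_year = 2023

def priority(prob, severity, year):
    """Expected harm, discounted by how soon the problem could plausibly arrive."""
    years_until = max(year - current_year, 1)
    return (prob * severity) / years_until

# Highest score = write that policy first.
ranked = sorted(risks, key=lambda r: priority(r[1], r[2], r[3]), reverse=True)
for name, prob, severity, year in ranked:
    print(f"{priority(prob, severity, year):.4f}  {name}")
```

Under these placeholder numbers the nuclear-launch policy comes out first, which matches the intuition in the example; a sufficiently larger probability or severity for the later risk would flip the ordering.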
Also, I suppose there are two different types of AI risk.
Type A: The AI itself, through accident, malice, or poor design, causes the existential risk.
Type B: Humans institute an AI as part of a countermeasure to help avoid an existential risk (imagine automated security at a nuclear weapons facility). Other humans find a flaw in the AI’s abilities and cause the existential risk themselves (for example, by bypassing the automated security and stealing a nuclear weapon).
My understanding is that most people are generally discussing Type A, with a nod to Type B, since they are related. As an example, it is possible to design an AI, attempt to avoid Type B by giving it sufficient capabilities, and then accidentally cause a Type A problem because you forgot to convey what you considered to be common sense. (An AI stops a nuclear weapon from being stolen from the facility by detonating it: the weapon no longer exists, and so can’t be stolen from the facility!)
But in terms of specific policies, I am not sure which one the criteria above would rank as most important to advance first, so I am not sure what to suggest.