I agree that separately pursuing many tractable paths in parallel seems wise. We want to buy every lottery ticket that gives us some small additional chance of survival that we can afford.
However, I am pretty pessimistic about the pursuit of goalcraft yielding helpful results in the relevant timeframe of < 10 years, for two reasons.
One: figuring out a set of values we’d be OK with not just endorsing in the short term but actually locking in irreversibly for the indefinite future seems really hard.
Two: actually convincing the people in power to accept the goalcraft researchers’ findings, and to put those ‘universally approvable goals’ into the AI they control rather than their own interpretation of their own personal goals, seems really hard. Similarly, I don’t see a plausible way to legislate this.
Thus, my conclusion is that this is not a particularly good research bet to make amongst the many possible options. I wouldn’t try to stop someone from pursuing it, but I wouldn’t feel hopeful that they were contributing to the likelihood of humanity surviving the next few decades.
I agree. I think there’s no way the team that achieves AGI is going to choose a goal remotely like CEV or human flourishing. They’re going to want it to “do what I mean” (including checking with me when my intent is unclear or the action would have a major impact). This wraps in the huge advantage of corrigibility in the broad Christiano sense. See my recent post Corrigibility or DWIM is an attractive primary goal for AGI; a minimal sketch of the check rule follows below.
To Roko’s point: there’s an important distinction here from your scenario. Instead of expecting the whole world to coordinate on staying at AI levels, if you can get DWIM to work, you can go to sapient, self-improving ASI and keep it under human control. That’s something somebody is likely to try.
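A minimal sketch of that “do what I mean, but check with me” rule, purely for illustration: the names and thresholds below are hypothetical, since neither comment specifies an implementation.

```python
from dataclasses import dataclass

# Illustrative sketch of a "do what I mean, but check with me" rule.
# All names and thresholds here are hypothetical, not from any real system.

@dataclass
class Interpretation:
    action: str        # what the system thinks the user meant
    confidence: float  # 0..1: how sure it is of that reading
    impact: float      # 0..1: rough estimate of how consequential the action is

def dwim_step(interp, ask_user, execute,
              min_confidence=0.9, max_unchecked_impact=0.3):
    """Execute the inferred action only if it is both clear and low-impact;
    otherwise check with the user first."""
    needs_check = (interp.confidence < min_confidence
                   or interp.impact > max_unchecked_impact)
    if needs_check and not ask_user(f"I plan to: {interp.action}. Proceed?"):
        return "aborted"
    return execute(interp.action)

# A clear, low-impact request runs without a check; an ambiguous or
# high-impact one is routed back to the user first.
result = dwim_step(
    Interpretation(action="draft a reply email", confidence=0.95, impact=0.1),
    ask_user=lambda prompt: True,
    execute=lambda action: f"done: {action}",
)
```

The entire idea lives in the two thresholds: act autonomously only when the reading of the request is confident and the stakes are low, and route everything else back to the human.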
I think there’s no way the team that achieves AGI is going to choose a goal remotely like CEV or human flourishing. They’re going to want it to “do what I mean”
Yeah but then what are they going to ask it to do?
I think that’s the important question. It deserves a lot more thought. I’m planning a post focusing on this.
In short, if they’re remotely decent people (a positive empathy-minus-sadism balance), I think they do net-good things, the world gets way, way better, and increasingly so over time as those individuals get wiser. With an AGI/ASI it becomes trivially easy to help people, so very little good intention is required.
Anything. AI is a tool. Some people will rip the safety guards off theirs and ask for whatever they want. X-risk-wise, I don’t think this is a big contributor. The problem is AIs coordinating with each other or betraying their users. Tools can be built so that they don’t have the means to communicate with other instances of themselves, and betrayal can be made unlikely with testing and refinement (see the sketch below this comment).
Asymmetric attacks will sometimes happen, bioterrorism being the scariest, but as long as the good users with their tools have vastly more resources, each asymmetric attack can be stopped. (Often at much greater cost than the attack itself, but so far no “doomsday” attack is known. Isolation, sterile barriers, vaccines, drugs, and life support can stop any known variant of a biological pathogen and should be able to prevent death from any possible protein-based pathogen.)
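A toy sketch of the isolation point above, assuming hypothetical tool names rather than any real API: if the dispatcher’s capability list never includes a messaging or networking tool, one instance has no built-in channel through which to coordinate with another.

```python
# Toy sketch: a tool dispatcher with an explicit capability allowlist.
# Tool names and functions are hypothetical; the point is only that no
# messaging, networking, or instance-spawning capability is ever registered.

ALLOWED_TOOLS = {
    "read_file": lambda path: open(path, encoding="utf-8").read(),
    "word_count": lambda text: len(text.split()),
    # deliberately absent: send_message, http_request, spawn_instance, ...
}

def call_tool(name, *args):
    """Dispatch a tool call, rejecting anything outside the allowlist."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool '{name}' is not available to this instance")
    return ALLOWED_TOOLS[name](*args)

print(call_tool("word_count", "a short example sentence"))  # -> 4
```

This only illustrates the no-communication-channel half of the claim; making betrayal unlikely through testing and refinement is a separate, empirical matter.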
It seems unwise to risk everything on a scenario where we coordinate to not build superintelligence soon.
Yeah, but if they’re just asking it to buy them stuff (i.e., make money), then they mostly just join the economy.
I think you’re being very hasty!