I have thought about this distinction, and have been choosing to focus on aimability rather than goalcraft. Why? Because I don’t think that going from pre-AGI to goal-pursuing ASI safely is a reasonable goal for the short term. I expect we will first need to traverse multiple decades of powerful AIs, of varying degrees of generality, that are under human control. Not because it will be impossible to create goal-pursuing ASI, but because we won’t be sure we know how to do so safely, and creating one would be a dangerously hard-to-reverse decision. Thus, there will need to be strict worldwide enforcement (with the help of narrow AI systems) preventing the rise of any ASI.
So my view is that we need to focus on regulation that prevents FOOM, prevents catastrophic misuse, and forces companies and governments worldwide to keep AI in check. In this view, if we safely make it through the next five years or so, when it becomes possible to create goal-pursuing ASI but not yet possible to do so safely, then we can use our ‘reflection time’ to work on goalcraft.
It seems unwise to risk everything on a scenario where we coordinate to not build superintelligence soon.
I agree that separately pursuing many tractable paths in parallel seems wise. We want to buy every lottery ticket we can afford that gives us some small additional chance of survival.
However, I am pretty pessimistic about the pursuit of goalcraft yielding helpful results in the relevant timeframe of < 10 years, for two reasons.
One: figuring out a set of values we’d be OK with not just endorsing in the short term but actually locking in irreversibly for the indefinite future seems really hard.
Two: actually convincing the people in power to accept the findings of the goalcraft researchers, and to put those ‘universally approvable goals’ into the AI they control rather than their own interpretation of their personal goals, seems really hard. Similarly, I don’t see a plausible way to legislate this.
Thus, my conclusion is that this is not a particularly good research bet to make amongst the many possible options. I wouldn’t try to stop someone from pursuing it, but I wouldn’t feel hopeful that they were contributing to the likelihood of humanity surviving the next few decades.
I agree. I think there’s no way the team that achieves AGI is going to choose a goal remotely like CEV or human flourishing. They’re going to want it to “do what I mean” (including checking with me when the request is unclear or the action would have a major impact). This wraps in the huge advantage of corrigibility in the broad Christiano sense. See my recent post Corrigibility or DWIM is an attractive primary goal for AGI.
To Roko’s point: there’s an important distinction here from your scenario. Instead of expecting the whole world to coordinate on staying below ASI, if you can get DWIM to work, you can go to sapient, self-improving ASI and keep it under human control. That’s something somebody is likely to try.
Yeah, but then what are they going to ask it to do?
I think that’s the important question. It deserves a lot more thought. I’m planning a post focusing on this.
In short, if they’re remotely decent people (a positive empathy-to-sadism balance), I think they do net-good things, and the world gets way, way better, and increasingly so over time as those individuals get wiser. With an AGI/ASI, it becomes trivially easy to help people, so very little good intention is required.
Anything. AI is a tool. Some people will rip the safety guards off theirs and ask for whatever they want. X-risk-wise, I don’t think this is a big contributor. The problem is AIs coordinating with each other or betraying their users. Tools can be constructed so that they don’t have the means to communicate with other instances of themselves, and betrayal can be made unlikely with testing and refinement.
Asymmetric attacks will sometimes happen (bioterrorism being the scariest), but as long as the good users with their tools have vastly more resources, each asymmetric attack can be stopped, often at much greater cost than the attack itself; so far, though, no “doomsday” attack is known. Isolation, sterile barriers, vaccines, drugs, and life support can stop any known variant of a biological pathogen and should be able to prevent death from any possible protein-based pathogen.
Yeah, but if they are just asking it to buy them stuff (i.e., make money), then they mostly just join the economy.
I think you’re being very hasty!