Roman Yampolsky said recently (at a Foresight Salon event; the recording should be posted on YouTube soon) that it would be highly valuable if someone could prove that alignment is impossible. Given how much this would inform investment in AI existential safety, I agree with Yampolsky that we should have more people working on this: trying to prove theorems (or construct very rigorous arguments) about whether alignment is possible or impossible.
If we knew with very high certainty that alignment is impossible, then that would compel us to invest more resources into:
1. bans/regulation on self-improving AI and other forms of dangerous AI (to buy us time), and
2. figuring out how to survive a world where unaligned AI is likely to be running rampant soon (for instance, we might buy ourselves some time by having humans try to survive in a Mars base or underground bunkers, or we could try merging with the AI in hopes of preserving some of what we value that way).
Yampolsky and collaborators have a paper on this here (disclaimer: I haven’t read it and can’t vouch for its value).
Also… alignment is obviously a continuum, and of course 100% alignment with all human values is impossible.
A different thing you could prove is whether it’s possible to guarantee human control over an AI system as it becomes more intelligent.
There’s also a concern that a slightly unaligned system may become more and more unaligned as its intelligence is scaled up (either by humans rebuilding/retraining it with more parameters/hardware or via recursive self-improvement). It would be useful if someone could prove whether that is impossible to prevent.
I need to think about this more and read Yampolsky’s paper to really understand what would be most useful to prove possible or impossible.