I think Eliezer is probably wrong about how useful AI systems will become, including for tasks like AI alignment, before they are catastrophically dangerous. I believe we are relatively quickly approaching AI systems that can meaningfully accelerate progress by generating ideas, recognizing problems with those ideas, proposing modifications to proposals, etc., and that all of those things will become possible in a small way well before AI systems that can double the pace of AI research.
This seems like a crux for the Paul-Eliezer disagreement, one which can explain many of the other disagreements (it's certainly my crux). In particular, conditional on taking Eliezer's side on this point, a number of Eliezer's other points seem much more plausible, e.g. nanotech, advanced deception/treacherous turns, and pessimism regarding the pace of alignment research.
There’s been a lot of debate on this point, and some of it was distilled by Rohin. Seems to me that the most productive way to move forward on this disagreement would be to distill the rest of the relevant MIRI conversations, and solicit arguments on the relevant cruxes.
How useful AI systems can be at this sort of thing after becoming catastrophically dangerous is also worth discussing more than it is at present. At least I think so. Between Eliezer and me, I think that may be the biggest crux (my intuitions about FOOM are Eliezer-like, I think, although AFAIK I'm more unsure/agnostic about it than he is).
Obviously it's a more favorable situation if the AGI system is aligned before it could destroy the world. But even if we think we have succeeded with alignment prior to superintelligence (and possible FOOM), we should look for ways it can help with alignment afterwards, so as to provide additional security/alignment assurance.
As Paul points out, verification will often be a lot easier than generation, and I think techniques that leverage this (including with superintelligent systems that may not be aligned) are underdiscussed. And how easy or hard it would be for an AGI system to trick us (into thinking it's being helpful when it really wasn't) would depend a lot on how we went about things.
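To make the verification/generation asymmetry concrete, here is a toy sketch of my own (purely illustrative, not taken from the discussion): checking a proposed graph 3-coloring takes a single pass over the edges, while producing one by brute force takes time exponential in the number of vertices. The analogy is that checking work proposed by an AI system can be much cheaper than producing that work ourselves.

```python
# Toy illustration of the verification/generation asymmetry:
# verifying a proposed 3-coloring is linear in the number of edges,
# while finding one by brute-force search is exponential in vertices.
from itertools import product

def verify_coloring(edges, coloring):
    """Cheap check: no edge may connect two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def generate_coloring(vertices, edges, colors=(0, 1, 2)):
    """Expensive search: try every assignment until one verifies."""
    for assignment in product(colors, repeat=len(vertices)):
        coloring = dict(zip(vertices, assignment))
        if verify_coloring(edges, coloring):
            return coloring
    return None

vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
proposed = generate_coloring(vertices, edges)      # the hard direction
print(proposed, verify_coloring(edges, proposed))  # the easy direction
```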
There are various potential ways of getting help with alignment while keeping the “channels of causality” quite limited and verifying the work/output of the AI system in powerful ways.
I’ve started on a series about this: https://www.lesswrong.com/posts/ZmZBataeY58anJRBb/getting-from-unaligned-to-aligned-agi-assisted-alignment
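As a toy picture of what “limited channels plus powerful verification” could look like in code (my own hypothetical sketch, not the design from the linked series): the only thing accepted from the untrusted system is a proposal, and the proposal is acted on only if an independent, trusted verifier passes it; everything else is discarded.

```python
# Hypothetical "narrow channel" harness: the untrusted model may only
# emit a proposal, and the proposal is used only if a trusted,
# independently written verifier accepts it.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Proposal:
    artifact: int  # e.g. a claimed nontrivial divisor of n

def run_with_verification(
    untrusted_generate: Callable[[int], Proposal],
    verify: Callable[[int, Proposal], bool],
    n: int,
) -> Optional[int]:
    """Only the proposal crosses the channel; it is returned only if the
    trusted verifier accepts it, otherwise nothing from the model is used."""
    proposal = untrusted_generate(n)
    return proposal.artifact if verify(n, proposal) else None

# Stand-in untrusted generator and trusted verifier for a factoring task:
def toy_model(n: int) -> Proposal:
    return Proposal(artifact=7)  # claims that 7 divides n

def toy_verifier(n: int, p: Proposal) -> bool:
    # Cheap, independent check that does not rely on the model's judgment.
    return 1 < p.artifact < n and n % p.artifact == 0

print(run_with_verification(toy_model, toy_verifier, 91))  # -> 7 (accepted)
print(run_with_verification(toy_model, toy_verifier, 97))  # -> None (rejected)
```

In a realistic version the verifier would be something like a proof checker or a test suite rather than a one-line arithmetic check, but the design choice is the same: the AI system's output is never trusted directly, only the verified artifact is.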