On the ease of working on superintelligent alignment now vs. later… I haven’t read Rohin’s comment yet, but I assume he’ll point out that future non-superintelligent AI could help us align more powerful AI. That is a good argument. I would much rather not trust even non-superintelligent AI that much: we’d basically be rolling the dice on whether that non-superintelligent AI is aligned well enough to get the superintelligent AI perfectly aligned to humans (not just to the non-super AI). But it’s still a good argument.
Amusingly to me, I said basically the same thing. I do think that “we’ll have a better idea of what AGI will look like” is a more important reason for optimism about future research.
Unless you mean an omnipotent superintelligence, in which case we probably won’t get much of an idea of what it looks like before it no longer matters what we do. In that case I’d argue that our job is not to align the omnipotent superintelligence, but instead to align the better-than-human AI whose job it is to build and align the next iteration of AI systems; then the same reasoning applies one level down: we’ll have a better idea of what that better-than-human AI looks like in the future.
This matters because some people argue that we can de-risk AI by using non-optimizer architectures. I don’t think that’s sufficient to avoid the need for alignment.
+1
I’ll make the opposite argument: don’t search under the streetlight. In practice, working on the right problem is usually orders of magnitude more important than the amount of progress made on the problem—i.e. 1 unit of progress in the best direction is more important than 1000 units of progress in a random useful direction.
+1, though note that you can have beneficial effects other than “solving the problem”, e.g. convincing people there is a problem, or field-building (both the reputation of the field and the people working in it). Even for these other effects, it’s still quite important to focus on the right problem (it’s not great if you build a field that then solves the wrong problem).