It seems obvious to me that our current understanding of DL puts us in a better position to do alignment research than we were before the DL revolution.
Not at all obvious. I think we've gotten barely any insight from DL, at least so far.
More broadly, one piece of capabilities research can differ from another in strategically relevant ways.
E.g., capabilities research that is published, or is likely to be published, adds to the pile of stuff that arbitrary people can use to make AGI. Capabilities research that will be kept private has much less of this problem.
Capabilities research can also be more or less “about” understanding AGI in a way that leads to understanding how to align it, vs. understanding AGI in a way that leads to being able to make it (whether FAI or UFAI). For example, one could pour a bunch of research into building a giant evolution simulator with rich environments, heuristics for skipping ahead, etc. This is capabilities research that seems to me not super likely to go anywhere, but if it does go anywhere, it seems more likely to lead to AGI that’s opaque and unalignable by strong default; and even if transparency-type stuff can be bolted on, the evolution-engineering itself doesn’t help very much with doing that.