We are aiming to develop alignment strategies that would continue to work regardless of how far ML is scaled up or how ML models end up working internally.
Is it fair to say that you are assuming that the AI systems are in fact based on ML, and not some other kind of AI (e.g. GOFAI that actually works somehow, or something more exotic)?
I think that “TAI is based on ML” is plausible, and responsible for a significant part of the total risk posed by AI. That said, I think our work is reasonably likely to be useful even in other worlds (since the same basic difficulties seem likely to arise in different forms) and that it’s useful to think concretely about something that exists today regardless of whether ML is a central ingredient in future AI systems.
Prosaic AI alignment is still a reasonable representation of my position.