That makes sense. My higher-level concern with gradient routing (and, to some extent, with any safety method applied throughout training rather than after it) is the alignment tax: it might lead to significantly lower performance and therefore never get adopted in frontier models.
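(For concreteness, here is a minimal sketch of what "routing during training" means mechanically: data-dependent gradient masks that confine updates from designated data to a designated parameter region. The two-region split and the per-batch routing rule below are illustrative assumptions, not the paper's setup; the actual method masks gradients per datapoint at chosen layers.)

```python
import torch
import torch.nn as nn

# Toy model; region "a" = first half of each parameter's rows,
# region "b" = second half (a crude stand-in for a real partition).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def gradient_mask(param: torch.Tensor, region: str) -> torch.Tensor:
    mask = torch.zeros_like(param)
    half = param.shape[0] // 2
    if region == "a":
        mask[:half] = 1.0
    else:
        mask[half:] = 1.0
    return mask

def train_step(x: torch.Tensor, y: torch.Tensor, region: str) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        # Route the update: zero gradients outside this batch's region,
        # so only the designated subnetwork learns from this data.
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_(gradient_mask(p, region))
    optimizer.step()
    return loss.item()

# E.g., route batches of "sensitive" data to region "b" for all of training:
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
print(train_step(x, y, region="b"))
```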
Evidence of this for gradient routing: people have tried various forms of modular training before [1], [2], and they never really caught on, because it's almost always better to train a single combined model that allows optimal sharing of parameters.
It's still a cool idea though, and I would be happy to see it work out :)
[1] Andreas, Jacob, et al. "Neural Module Networks." CVPR 2016.
[2] Ebrahimi, Sayna, et al. "Adversarial Continual Learning." ECCV 2020.
An ASI democratically aligned with the preferences of all humans would probably try to halt and dismantle AI development. Only a small portion of the world wants AGI/ASI developed, whether for economic gain or out of intellectual curiosity. Most people do not want their jobs threatened; at most, they want home robots or other machines to cut down on menial chores. If you disagree with this, you're living in a tech bubble.