I understand this in principle, but it seems to imply that for less scary AGIs this might actually work. That unlocks a pretty massive part of the solution space (e.g. helping with alignment). Obviously we don’t know exactly how much, but that seems reasonably testable (e.g. OOD detection is also a precondition for self-driving cars, so people already know how to make it well-calibrated). A rough sketch of what I mean by "well-calibrated" follows.
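(To make "well-calibrated" a bit more concrete, here is a minimal sketch of one kind of check I have in mind: pick a rejection threshold on held-out in-distribution scores so you hit a target false-positive rate, then see how often genuinely out-of-distribution inputs get flagged. The score distributions below are synthetic and purely illustrative; the score itself stands in for whatever per-input uncertainty signal you like, e.g. max softmax probability.)

```python
# Minimal sketch: calibrating an OOD-detection threshold on held-out
# in-distribution (ID) scores, then measuring false-positive and
# detection rates. All numbers are synthetic and illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confidence scores: ID inputs tend to score high,
# OOD inputs tend to score lower.
id_val_scores = rng.beta(8, 2, size=5_000)    # held-out ID set for calibration
id_test_scores = rng.beta(8, 2, size=5_000)   # fresh ID set for evaluation
ood_scores = rng.beta(3, 4, size=5_000)       # out-of-distribution inputs

# Calibrate: choose the threshold so that at most 5% of held-out ID
# inputs are (wrongly) flagged as OOD.
target_fpr = 0.05
threshold = np.quantile(id_val_scores, target_fpr)

# Evaluate: how often do we flag fresh ID data (false positives) and
# OOD data (true positives) at that threshold?
fpr = np.mean(id_test_scores < threshold)
tpr = np.mean(ood_scores < threshold)
print(f"threshold={threshold:.3f}  ID flagged={fpr:.1%}  OOD flagged={tpr:.1%}")
```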
It’s not a “solution”, but it’s substantially harder to imagine a catastrophic failure from a large AGI project that isn’t actually bidding for superintelligence.