Right, I think there are variants of it that might work out, but there's also the point that some people argue AGI will turn out to be essentially a bag of heuristics or something similar, in which case inner alignment becomes less necessary because the heuristics achieve the outer goal even if they don't do so as flexibly as they could.
Richard Kennaway asked why I would think along those lines, but the point of the OP isn't to make an argument about AI alignment; it's merely to think along those lines. Conclusions can come later, once I'm finished exploring it.