As such, there have been many proposed solutions to over-optimization, including early stopping and regularization. Often, protecting against over-optimization amounts to tuning a single hyperparameter. Given that this is a well-established problem, which has well-understood families of solutions, I find it hard to believe that it will be an unsurmountable obstacle to mitigating ML risk.
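(For the classical ML version of the problem, the fix really can be that simple. Here is a minimal sketch of early stopping where `patience` is the single hyperparameter being tuned; the helper names `train_one_epoch` and `validation_loss` are placeholders for illustration, not from any particular library.)

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    """Stop training once validation loss fails to improve for `patience` epochs.

    `patience` is the single hyperparameter controlling how aggressively we
    guard against over-optimizing the training objective.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()

        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # further optimization is likely over-fitting the proxy

    return best_loss
```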
In the context of AGI or ASI, it does not have well-understood families of solutions. The alignment literature is pretty clear on this; we don’t have a plan that we can be confident will work. (And then separately, the plans we have tend to involve significant “safety taxes” in the form of various competitiveness penalties. So it is unfortunately uncertain whether the solutions will be implemented even if they work.)
That said, of course it’s not an unsurmountable obstacle. Even Yudkowsky agrees that it’s not an unsurmountable obstacle; he merely predicts that it will not be surmounted in time.
(All this is assuming that we are talking about the alignment problem in general rather than the more specific problem you might be referring to, which is “suppose we have an ASI that will straightforwardly carry out whatever instructions we give it, as we intended for them to be carried out. What do we tell it to do?” That problem does seem a lot easier and I’m more excited about solving that one, though it’s still nontrivial. Solutions tend to look pretty meta, and there are various ‘gotchas’ with the obvious things that people often don’t notice.)