I agree that you almost certainly can’t get an optimal predictor. For similar reasons, you can’t train a supervised learner using any obvious approach. This is the reason that I am pessimistic about this kind of “abstract” goal.
That said, I’m not as pessimistic as you are.
Suppose that I define a very elaborate reflective process, which would be prohibitively complex to simulate and whose behavior is probably not constrained in any meaningful way by any short proofs.
I think that a human can in fact try to maximize the output of such a reflective process, “to the best of their abilities.” And this seems good enough for value alignment.
It’s not important that we actually achieve optimality, except on shorter-term, instrumentally important problems such as gathering resources (for which we can in fact expect the abstractly motivated algorithm to converge to optimality).