I’m thinking that type of architecture is less probable, because it would end up being more complicated than alternatives: it would have a powerful predictor as a sub-component of the utility-maximizing system, so an engineer could have just used the predictor in the first place.
But that’s a speculative argument, and I shouldn’t push it too far.
It seems like powerful AI prediction technology, if successful, would gain an important place in society. A prediction machine whose predictions were consumed by a large portion of society would certainly run into situations in which its predictions affect the future it’s trying to predict; there is little doubt about that in my mind. So, the question is what its behavior would be in these cases.
One type of solution would do as you say, maximizing a utility over the predictions. The utility could be “correctness of this prediction”, but that would be worse for humanity than a Friendly goal.
Another type of solution would instead report such predictive instability as accurately as possible. This doesn’t really dodge the issue; by doing this, the system is choosing a particular output, which may not lead to the best future. However, that’s markedly less concerning (it seems).
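To make the two solution types concrete, here is a minimal toy sketch of a predictor whose published prediction influences the outcome it is trying to predict. Everything in it is a hypothetical illustration (the `outcome_given_prediction` dynamics and both strategy functions are assumptions I'm introducing, not anything proposed above): strategy 1 maximizes the utility "correctness of this prediction", while strategy 2 reports a self-consistent prediction if one exists and otherwise reports the instability itself.

```python
# Toy model: the realized outcome depends on the published prediction.
# All names and dynamics here are hypothetical, chosen only to illustrate
# the two strategies discussed above.

def outcome_given_prediction(p: float) -> float:
    """Hypothetical world dynamics: the outcome that actually occurs when
    the machine publishes prediction p (a forecast in [0, 1])."""
    # Example: a partly self-defeating forecast -- the higher the published
    # prediction, the more people compensate, pushing the outcome down.
    return 0.8 - 0.5 * p

def pick_most_correct_prediction(candidates):
    """Strategy 1: choose the prediction that maximizes the utility
    'correctness of this prediction', i.e. minimizes self-referential error."""
    return min(candidates, key=lambda p: abs(p - outcome_given_prediction(p)))

def report_or_flag_instability(candidates, tol=1e-3):
    """Strategy 2: report a (near-)fixed point if one exists; otherwise
    report the predictive instability instead of forcing a single number."""
    best = pick_most_correct_prediction(candidates)
    error = abs(best - outcome_given_prediction(best))
    if error <= tol:
        return {"prediction": best, "stable": True}
    return {"prediction": None, "stable": False, "residual_error": error}

if __name__ == "__main__":
    grid = [i / 1000 for i in range(1001)]  # candidate predictions in [0, 1]
    print(pick_most_correct_prediction(grid))  # ~0.533, the self-consistent forecast
    print(report_or_flag_instability(grid))    # a stable fixed point exists here
```

In this toy case a self-consistent prediction happens to exist, so both strategies agree; the worry in the comment above is about the cases where they come apart, i.e. where strategy 1 quietly picks whichever output steers the world toward being "correct", while strategy 2 would surface that there is no stable answer.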
It would pass the Turing test—e.g. see here.