Are AI’s possible outputs also part of this model?
Are human reactions to AI’s outputs also part of this model?
A non-planning oracle AI would predict all the possible futures, including the effects of it’s prediction outputs, human reactions, and so on. However it has no utility function which says some of those futures are better than others. It simply outputs a most likely candidate, or a median of likely futures, or perhaps some summary of the entire set of future paths.
If you add a utility function that sorts over the futures, then it becomes a planning agent. Again, that is something you need to specifically add.
A non-planning oracle AI would predict all the possible futures, including the effects of it’s prediction outputs, human reactions, and so on.
How exactly does an Oracle AI predict its own output, before that output is completed?
One quick hack to avoid infinite loops could be for an AI to assume that it will write some default message (an empty paper, “I don’t know”, an error message, “yes” or “no” with probabilities 50%), then model what would happen next, and finally report the results. The results would not refer to the actual future, but to a future in a hypothetical universe where AI reported the standard message.
Is the difference significant? For insignificant questions, it’s not. But if we later use the Oracle AI to answer questions important for humankind, and the shape of world will change depending on the answer, then the report based on the “null-answer future” may be irrelevant for the real world.
This could be improved by making a few iterations. First, Oracle AI would model itself reporting a default message, let’s call this report R0, and then model the futures after having reported R0. These futures would make a report R1, but instead of writing it, Oracle AI would again model the futures after having reported R1. … With some luck, R42 will be equivalent to R43, so at this moment the Oracle AI can stop iterating and report this fixed point.
Maybe the reports will oscillate forever. For example imagine that you ask Oracle AI whether humankind in any form will survive the year 2100. If Oracle AI says “yes”, people will abandon all x-risk projects, and later they will be killed by some disaster. If Oracle AI says “no”, people will put a lot of energy into x-risk projects, and prevent the disaster. In this case, “no” = R0 = R2 = R4 =..., and “yes” = R1 = R3 = R5...
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
Please note that what I wrote is just a mathematical description of algorithm predicting one’s own output’s influence on the future. Yet the last option, if implemented, is already a kind of judgement about possible futures. Consistent future reports are preferred to inconsistent future reports, therefore the futures allowing consistent reports are preferred to futures not allowing such reports.
At this point I am out of credible ideas how this could be abused, but at least I have shown that an algorithm designed only to predict the future perfectly could—as a side effect of self-modelling—start having kind of preferences over possible futures.
How exactly does an Oracle AI predict its own output, before that output is completed?
Iterative search, which you more or less have worked out in your post. Take a chess algorithm for example. The future of the board depends on the algorithm’s outputs. In this case the Oracle AI doesn’t rank the future states, it is just concerned with predictive accuracy. It may revise it’s prediction output after considering that the future impact of that output would falsify the original prediction.
This is still not a utility function, because utility implies a ranking over futures above and beyond liklihood.
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
Or in this example, the AI could output some summary of the iteration history it is able to compute in the time allowed.
It may revise it’s prediction output after considering that the future impact of that output would falsify the original prediction.
Here it is. The process of revision may itself prefer some outputs/futures over other outputs/futures. Inconsistent ones will be iterated away, and the more consistent ones will replace them.
A possible future “X happens” will be removed from the report if the Oracle AI realizes that printing a report “X happens” would prevent X from happening (although X might happen in an alternative future where Oracle AI does not report anything). A possible future “Y happens” will not be removed from the report if the Oracle AI realizes that printing a report “Y happens” really leads to Y happening. Here is a utility function born: it prefers Y to X.
Here is a utility function born: it prefers Y to X.
We can dance around the words “utility” and “prefer”, or we can ground them down to math/algorithms.
Take the AIXI formalism for example. “Utility function” has a specific meaning as a term in the optimization process. You can remove the utility term so the algorithm ‘prefers’ only (probable) futures, instead of ‘prefering’ (useful*probable) futures. This is what we mean by “Oracle AI”.
A non-planning oracle AI would predict all the possible futures, including the effects of it’s prediction outputs, human reactions, and so on. However it has no utility function which says some of those futures are better than others. It simply outputs a most likely candidate, or a median of likely futures, or perhaps some summary of the entire set of future paths.
If you add a utility function that sorts over the futures, then it becomes a planning agent. Again, that is something you need to specifically add.
How exactly does an Oracle AI predict its own output, before that output is completed?
One quick hack to avoid infinite loops could be for an AI to assume that it will write some default message (an empty paper, “I don’t know”, an error message, “yes” or “no” with probabilities 50%), then model what would happen next, and finally report the results. The results would not refer to the actual future, but to a future in a hypothetical universe where AI reported the standard message.
Is the difference significant? For insignificant questions, it’s not. But if we later use the Oracle AI to answer questions important for humankind, and the shape of world will change depending on the answer, then the report based on the “null-answer future” may be irrelevant for the real world.
This could be improved by making a few iterations. First, Oracle AI would model itself reporting a default message, let’s call this report R0, and then model the futures after having reported R0. These futures would make a report R1, but instead of writing it, Oracle AI would again model the futures after having reported R1. … With some luck, R42 will be equivalent to R43, so at this moment the Oracle AI can stop iterating and report this fixed point.
Maybe the reports will oscillate forever. For example imagine that you ask Oracle AI whether humankind in any form will survive the year 2100. If Oracle AI says “yes”, people will abandon all x-risk projects, and later they will be killed by some disaster. If Oracle AI says “no”, people will put a lot of energy into x-risk projects, and prevent the disaster. In this case, “no” = R0 = R2 = R4 =..., and “yes” = R1 = R3 = R5...
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
Please note that what I wrote is just a mathematical description of algorithm predicting one’s own output’s influence on the future. Yet the last option, if implemented, is already a kind of judgement about possible futures. Consistent future reports are preferred to inconsistent future reports, therefore the futures allowing consistent reports are preferred to futures not allowing such reports.
At this point I am out of credible ideas how this could be abused, but at least I have shown that an algorithm designed only to predict the future perfectly could—as a side effect of self-modelling—start having kind of preferences over possible futures.
Iterative search, which you more or less have worked out in your post. Take a chess algorithm for example. The future of the board depends on the algorithm’s outputs. In this case the Oracle AI doesn’t rank the future states, it is just concerned with predictive accuracy. It may revise it’s prediction output after considering that the future impact of that output would falsify the original prediction.
This is still not a utility function, because utility implies a ranking over futures above and beyond liklihood.
Or in this example, the AI could output some summary of the iteration history it is able to compute in the time allowed.
Here it is. The process of revision may itself prefer some outputs/futures over other outputs/futures. Inconsistent ones will be iterated away, and the more consistent ones will replace them.
A possible future “X happens” will be removed from the report if the Oracle AI realizes that printing a report “X happens” would prevent X from happening (although X might happen in an alternative future where Oracle AI does not report anything). A possible future “Y happens” will not be removed from the report if the Oracle AI realizes that printing a report “Y happens” really leads to Y happening. Here is a utility function born: it prefers Y to X.
We can dance around the words “utility” and “prefer”, or we can ground them down to math/algorithms.
Take the AIXI formalism for example. “Utility function” has a specific meaning as a term in the optimization process. You can remove the utility term so the algorithm ‘prefers’ only (probable) futures, instead of ‘prefering’ (useful*probable) futures. This is what we mean by “Oracle AI”.