Without escape, AI can make a 90% reliable prediction. If the AI can escape and kill the person, it can make a 100% reliable “prediction”.
Allow me to explicate what XiXiDu so humorously implies: in the world of AI architectures, there is a division between systems that just perform predictive inference on their knowledge base (prediction-only, i.e. oracle), and systems which also consider free variables subject to some optimization criteria (planning agents).
The planning module is not something that just arises magically in an AI that doesn’t have one. An AI without such a planning module simply computes predictions; it doesn’t also optimize over the set of predictions.
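Does the AI have general intelligence?
Is it able to make a model of the world?
Are human reactions also part of this model?
Are AI’s possible outputs also part of this model?
Are human reactions to AI’s outputs also part of this model?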
After five positive answers, it seems obvious to me that AI will manipulate humans, if such manipulation provides better expected results. So I guess some of those answers would be negative; which one?
Does the AI have general intelligence?
See, the efficient ‘cross-domain optimization’ of the science-fiction setting would make the AI able to optimize real-world quantities. In the real world, it’d be good enough (and a lot easier) if it could only find maxima of any mathematical function.
Is it able to make a model of the world?
It is able to make a very approximate and bounded mathematical model of the world, optimized for finding maxima of a mathematical function, because it is inside the world and has only a tiny fraction of the world’s computational power.
Are human reactions also part of this model?
This will make the software perform at a grossly sub-par level when it comes to producing technical solutions to well-defined technical problems, compared to other software on the same hardware.
Are AI’s possible outputs also part of this model?
Another waste of computational power.
Are human reactions to AI’s outputs also part of this model?
Enormous waste of computational power.
I see no reason to expect your “general intelligence with Machiavellian tendencies” to be even remotely close in technical capability to some “general intelligence which will show you its simulator as is, rather than reverse your thought processes to figure out what simulator is best to show”. Hell, we do the same with people: we design communication methods like blueprints (or mathematical formulas, or other things that are not in natural language) that decrease the ‘predict other people’s reactions to it’ overhead.
While in the fictional setting you can talk of a grossly inefficient solution that would beat everyone else to a pulp, in practice the massively handicapped designs are not worth worrying about.
‘General intelligence’ sounds good; beware of the halo effect. Science fiction tends to accept no substitutes for the anthropomorphic ideals, but real progress follows a dramatically different path.
Are AI’s possible outputs also part of this model?
Are human reactions to AI’s outputs also part of this model?
A non-planning oracle AI would predict all the possible futures, including the effects of its prediction outputs, human reactions, and so on. However, it has no utility function which says some of those futures are better than others. It simply outputs a most likely candidate, or a median of likely futures, or perhaps some summary of the entire set of future paths.
If you add a utility function that sorts over the futures, then it becomes a planning agent. Again, that is something you need to specifically add.
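To make the distinction concrete, here is a minimal sketch, not anyone's actual architecture. The world model, the action names, the probabilities and the `comfort` utility are all made up for illustration. Both functions read the same predictions; only the second one needs a utility function to be handed in.

```python
from typing import Callable, Dict

# Hypothetical world model: for each action, a distribution over futures.
# Every name and number here is an illustrative assumption.
WORLD_MODEL: Dict[str, Dict[str, float]] = {
    "carry_umbrella": {"stay_dry": 0.95, "get_wet": 0.05},
    "no_umbrella": {"stay_dry": 0.40, "get_wet": 0.60},
}


def oracle_report(futures: Dict[str, float]) -> str:
    """Prediction only: return the most probable future, with no ranking by value."""
    return max(futures, key=lambda future: futures[future])


def planner_choice(
    model: Dict[str, Dict[str, float]],
    utility: Callable[[str], float],
) -> str:
    """Planning: choose the action whose predicted futures have the highest expected utility."""

    def expected_utility(action: str) -> float:
        return sum(p * utility(future) for future, p in model[action].items())

    return max(model, key=expected_utility)


def comfort(future: str) -> float:
    """Hypothetical utility function: staying dry is good, getting wet is not."""
    return 1.0 if future == "stay_dry" else 0.0


print(oracle_report(WORLD_MODEL["no_umbrella"]))  # get_wet
print(planner_choice(WORLD_MODEL, comfort))       # carry_umbrella
```

Deleting `comfort` leaves `oracle_report` fully functional, while `planner_choice` cannot even be called without it.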
A non-planning oracle AI would predict all the possible futures, including the effects of its prediction outputs, human reactions, and so on.
How exactly does an Oracle AI predict its own output, before that output is completed?
One quick hack to avoid infinite loops could be for an AI to assume that it will write some default message (an empty paper, “I don’t know”, an error message, “yes” or “no” with 50% probability each), then model what would happen next, and finally report the results. The results would not refer to the actual future, but to a future in a hypothetical universe where the AI reported the standard message.
Is the difference significant? For insignificant questions, it’s not. But if we later use the Oracle AI to answer questions important for humankind, and the shape of the world will change depending on the answer, then the report based on the “null-answer future” may be irrelevant for the real world.
This could be improved by making a few iterations. First, Oracle AI would model itself reporting a default message, let’s call this report R0, and then model the futures after having reported R0. These futures would make a report R1, but instead of writing it, Oracle AI would again model the futures after having reported R1. … With some luck, R42 will be equivalent to R43, so at this moment the Oracle AI can stop iterating and report this fixed point.
Maybe the reports will oscillate forever. For example, imagine that you ask the Oracle AI whether humankind in any form will survive the year 2100. If the Oracle AI says “yes”, people will abandon all x-risk projects, and later they will be killed by some disaster. If the Oracle AI says “no”, people will put a lot of energy into x-risk projects, and prevent the disaster. In this case, “no” = R0 = R2 = R4 = ..., and “yes” = R1 = R3 = R5 = ...
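The iteration R0, R1, R2, ... and both of its outcomes (a fixed point, or an endless oscillation) can be sketched in a few lines. This is only a toy: `predict_after` stands in for the Oracle's entire world model, and the two lookup tables are invented examples, the second one encoding the year-2100 question above.

```python
from typing import Callable, List, Optional, Tuple


def iterate_reports(
    predict_after: Callable[[str], str],
    default: str,
    max_steps: int = 100,
) -> Tuple[Optional[str], List[str]]:
    """Iterate R0 -> R1 -> ... until R(n+1) == R(n).

    Returns (fixed_point, history). fixed_point is None if a report
    repeats without being a fixed point (a cycle) or the budget runs out.
    """
    history = [default]
    seen = {default}
    for _ in range(max_steps):
        nxt = predict_after(history[-1])
        if nxt == history[-1]:
            return nxt, history          # reporting nxt leads to nxt: fixed point
        history.append(nxt)
        if nxt in seen:
            return None, history         # we are back at an earlier report: cycle
        seen.add(nxt)
    return None, history


# Hypothetical world in which the reports settle down.
CALM_WORLD = {
    "no report": "price rises",
    "price rises": "price rises sharply",
    "price rises sharply": "price rises sharply",
}

# The x-risk example: "no" makes people prevent the disaster, "yes" makes them relax.
XRISK_WORLD = {"no": "yes", "yes": "no"}

print(iterate_reports(CALM_WORLD.__getitem__, "no report"))
print(iterate_reports(XRISK_WORLD.__getitem__, "no"))
```

In the second call the history comes back as `["no", "yes", "no"]` with no fixed point, which is the R0 = R2 = ... loop in miniature.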
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
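A sketch of that exhaustive check, again with a lookup table standing in for the world model. The candidate reports, including the third "it depends" answer, are assumptions added for illustration and are not part of the original example.

```python
from typing import Callable, Iterable, List


def consistent_reports(
    candidates: Iterable[str],
    predict_after: Callable[[str], str],
) -> List[str]:
    """Return every candidate R for which the future after reporting R is R itself."""
    return [r for r in candidates if predict_after(r) == r]


# Hypothetical world model for the year-2100 question, extended with one
# conditional answer that does not change what people go on to do.
XRISK_WORLD = {
    "yes": "no",
    "no": "yes",
    "only if x-risk work continues": "only if x-risk work continues",
}

print(consistent_reports(["yes", "no"], XRISK_WORLD.__getitem__))
print(consistent_reports(XRISK_WORLD.keys(), XRISK_WORLD.__getitem__))
```

With only “yes” and “no” available the list is empty, so the search would run until someone hits “Cancel”. With the third candidate it returns exactly that one report. Note that the function contains no ranking of futures by value, only an equality test.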
Please note that what I wrote is just a mathematical description of an algorithm predicting its own output’s influence on the future. Yet the last option, if implemented, is already a kind of judgement about possible futures. Consistent future reports are preferred to inconsistent future reports, therefore the futures allowing consistent reports are preferred to futures not allowing such reports.
At this point I am out of credible ideas how this could be abused, but at least I have shown that an algorithm designed only to predict the future perfectly could, as a side effect of self-modelling, start having a kind of preference over possible futures.
How exactly does an Oracle AI predict its own output, before that output is completed?
Iterative search, which you have more or less worked out in your post. Take a chess algorithm for example. The future of the board depends on the algorithm’s outputs. In this case the Oracle AI doesn’t rank the future states; it is just concerned with predictive accuracy. It may revise its prediction output after considering that the future impact of that output would falsify the original prediction.
This is still not a utility function, because utility implies a ranking over futures above and beyond likelihood.
To avoid being stuck in such loops, we could make the Oracle AI examine all its possible outputs, until it finds one where the future after having reported R really becomes R (or until humans hit the “Cancel” button on this task).
Or in this example, the AI could output some summary of the iteration history it is able to compute in the time allowed.
It may revise its prediction output after considering that the future impact of that output would falsify the original prediction.
Here it is. The process of revision may itself prefer some outputs/futures over other outputs/futures. Inconsistent ones will be iterated away, and the more consistent ones will replace them.
A possible future “X happens” will be removed from the report if the Oracle AI realizes that printing a report “X happens” would prevent X from happening (although X might happen in an alternative future where Oracle AI does not report anything). A possible future “Y happens” will not be removed from the report if the Oracle AI realizes that printing a report “Y happens” really leads to Y happening. Here is a utility function born: it prefers Y to X.
Here is a utility function born: it prefers Y to X.
We can dance around the words “utility” and “prefer”, or we can ground them down to math/algorithms.
Take the AIXI formalism for example. “Utility function” has a specific meaning as a term in the optimization process. You can remove the utility term so the algorithm ‘prefers’ only (probable) futures, instead of ‘preferring’ (useful*probable) futures. This is what we mean by “Oracle AI”.
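For reference, this is roughly how the two cases look when written out, following the standard presentation of AIXI and Solomonoff induction rather than anything stated in this thread. The notation is an assumption on my part: U is a universal Turing machine, q and p are programs of length ℓ, a/o/r are actions, observations and rewards, m is the horizon, and the reward sum plays the role of the “utility” term.

```latex
% AIXI action selection: probability of a future (the 2^{-\ell(q)} sum)
% weighted by its reward sum, maximised over actions.
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \bigl[ r_k + \cdots + r_m \bigr]
       \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

% Prediction only (Solomonoff induction): no reward term, no max over actions.
M(x_{1:t}) := \sum_{p \,:\, U(p) = x_{1:t}*} 2^{-\ell(p)}, \qquad
M(x_{t+1} \mid x_{1:t}) = \frac{M(x_{1:t}\, x_{t+1})}{M(x_{1:t})}
```

The first expression is the (useful*probable) case; the second keeps only the probability mass over programs, which is the (probable) case.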
Does the AI have general intelligence?