It just cares about correctly reporting the plans that give the highest values for P.
This is what I meant by “not running a consequentialist algorithm”: what matters here is the way in which P depends on a plan.
If P says something about how the human operators would respond to observing the plan, it introduces a consequentialist aspect into the AI’s optimization criteria: it starts to matter what the consequences of producing a plan are, i.e. its value depends on the effect produced by choosing it. On the other hand, if P doesn’t say things like that, it might be the case that the value of a plan is not being evaluated consequentialistically, but that might make it more difficult to specify what constitutes a good plan, since a plan’s (expected) consequences give a natural (basis for a) metric of its quality.
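One rough way to write the contrast (my own notation, so take the symbols as placeholders rather than anything from the original post): in the consequentialist case the score of a candidate plan $a$ looks like

$$P_{\mathrm{consq}}(a) \;=\; \mathbb{E}\big[\,V(\text{outcome}) \mid \text{the AI outputs } a\,\big],$$

where the expectation ranges over how the operators react to seeing $a$; in the other case it looks like

$$P_{\mathrm{intr}}(a) \;=\; f(a),$$

with $f$ depending only on the plan’s own description. The second form avoids caring about the effects of producing the plan, but then $f$ has to encode “good plan” without appealing to predicted consequences, which is the difficulty I mean.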
Hm. This is an intriguing point. I thought by “maximize the actual outcome according to its own criteria of optimality” you meant U, which is my understanding of what an Oracle would do, but instead you meant it would produce plans so as to maximize P, rather than producing plans that would maximize P if implemented. Is that about right?
I guess you’d have to produce some list of plans such that each would produce a high value for P if selected (which includes an expectation that it would be successfully implemented if selected), given that it appears on the list and all the other plans do as well… you wouldn’t necessarily have to worry about other influences the plan list might have, would you?
Perhaps if we had a more concrete example:
Suppose we ask the AI to advise us on building a sturdy bridge over some river (valuing both sturdiness and bridgeness, and probably other things like speed of building, etc.). Stuart_Armstrong’s version would select a list of plans such that, given that the operators will view that list, if they select one of the plans, the AI predicts that they will successfully build a sturdy bridge (or that a sturdy bridge will otherwise come into being). I admit I find the subject a little confusing, but does that sound about right?
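To make that concrete for myself, here’s a toy sketch of the selection rule as I’m picturing it. The predictive model, its `estimate` call, and all the names are hypothetical stand-ins, not anything from Stuart_Armstrong’s actual formalism:

```python
# Toy sketch of the conditional-prediction selection, with hypothetical names
# and a stand-in predictive model -- not Stuart_Armstrong's actual proposal.

from itertools import combinations


def predicted_bridge_quality(plan, shown_list, model):
    """Model's estimate of P(a sturdy bridge gets built), conditional on the
    operators being shown `shown_list` and then selecting `plan` from it."""
    return model.estimate(
        outcome="sturdy_bridge_built",
        given={"list_shown": shown_list, "plan_selected": plan},
    )


def choose_plan_list(candidate_plans, model, k=3):
    """Return the k-plan list whose *worst* member still scores highest,
    so every reported plan is one the AI predicts would work if chosen."""
    best_list, best_score = None, float("-inf")
    for shown_list in combinations(candidate_plans, k):
        score = min(predicted_bridge_quality(p, shown_list, model) for p in shown_list)
        if score > best_score:
            best_list, best_score = shown_list, score
    return best_list
```

The `min` over the list is just my guess at how to express “each plan on the list would produce a high value for P if selected”; the real criterion might aggregate differently.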