A brief note on factoring out certain variables
Jessica Taylor and Chris Olah have a post on “Maximizing a quantity while ignoring effect through some channel”. I’ll briefly present a different way of doing this, and compare the two approaches.
Essentially, the AI’s utility is given by a function U of a variable C. The AI’s actions are a random variable A, but we want to ‘factor out’ another random variable B.
If we have a probability distribution Q over actions, then, given background evidence E, the standard way to maximise U(C) would be to maximise:
$$\sum_{a,b,c} U(c)\, P(C=c, B=b, A=a \mid e) = \sum_{a,b,c} U(c)\, P(C=c \mid B=b, A=a, e)\, P(B=b \mid A=a, e)\, Q(A=a \mid e).$$
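To make these objects concrete, here is a minimal Python sketch of the standard objective. Everything in it is a made-up toy model of my own (three actions, binary B and C, arbitrary conditional tables), not anything taken from Jessica and Chris’s post:

```python
import numpy as np

# Hypothetical toy model: three actions A, a binary channel variable B,
# and a binary outcome C, with the background evidence e held fixed.
nA, nB, nC = 3, 2, 2

U = np.array([0.0, 1.0])             # U(c): utility of each outcome c
P_b_given_a = np.array([[0.8, 0.2],  # P(B=b | A=a, e); row a, column b
                        [0.5, 0.5],
                        [0.3, 0.7]])
rng = np.random.default_rng(0)
P_c_given_ba = rng.dirichlet(np.ones(nC), size=(nB, nA))
# P_c_given_ba[b, a] is the distribution P(C | B=b, A=a, e)

def standard_eu(Q):
    """Sum over a, b, c of U(c) P(c|b,a,e) P(b|a,e) Q(a|e)."""
    return sum(U[c] * P_c_given_ba[b, a, c] * P_b_given_a[a, b] * Q[a]
               for a in range(nA) for b in range(nB) for c in range(nC))

# The standard agent maximises this over Q; since the objective is
# linear in Q, the optimum is a point mass on the best single action.
best_a = max(range(nA), key=lambda a: standard_eu(np.eye(nA)[a]))
```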
The most obvious idea, for me, is to replace $P(B=b \mid A=a, e)$ with $P(B=b \mid e)$, making B artificially independent of A and giving the expression:
$$\sum_{a,b,c} U(c)\, P(C=c \mid B=b, A=a, e)\, P(B=b \mid e)\, Q(A=a \mid e).$$
If B is dependent on A (if it isn’t, then factoring it out is not interesting), then $P(B=b \mid e)$ needs some implicit probability distribution over A, one that is independent of Q. So, in essence, this approach relies on two distributions over the possible actions: one that the agent is optimising, and another that is left unoptimised. In terms of Bayes nets, this just amounts to cutting the link from A to B.
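Continuing the toy sketch from above, the factored-out objective would look as follows. The implicit action prior P0 is entirely my own assumption here, chosen uniform purely for illustration:

```python
# Same toy model, but with the A -> B link cut: P(B=b|e) is defined by
# marginalising over a hypothetical implicit action prior P0, which is
# held fixed independently of the Q being optimised.
P0 = np.full(nA, 1.0 / nA)      # implicit prior over actions (an assumption)
P_b = P0 @ P_b_given_a          # P(b|e) = sum_a P(b|a,e) P0(a)

def factored_eu(Q):
    """Sum over a, b, c of U(c) P(c|b,a,e) P(b|e) Q(a|e)."""
    return sum(U[c] * P_c_given_ba[b, a, c] * P_b[b] * Q[a]
               for a in range(nA) for b in range(nB) for c in range(nC))
```

When B is already independent of A, the two objectives coincide, which matches the remark above that factoring out is only interesting when B depends on A.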
Jessica and Chris’s approach also relies on two distributions. But, as far as I understand it, the two distributions are taken to be the same; instead, it is assumed that U(C) cannot be improved by changes to the distribution of A if one keeps the distribution of B constant. This has the feel of a differential condition: the infinitesimal impact on U(C) of changes to A but not B is non-positive.
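One way to render that differential condition numerically in the same toy model is below. This is my reading of the condition, not code from their post: the allowed perturbations of Q are the directions that leave both Q’s normalisation and B’s induced marginal unchanged, i.e. the null space of the map from action distributions to B-marginals.

```python
from scipy.linalg import null_space

# Gradient of the standard expected utility with respect to Q(a):
grad = np.array([sum(U[c] * P_c_given_ba[b, a, c] * P_b_given_a[a, b]
                     for b in range(nB) for c in range(nC))
                 for a in range(nA)])

# Directions d that keep Q normalised (sum_a d_a = 0) and keep B's
# marginal fixed (sum_a P(b|a,e) d_a = 0 for every b). The normalisation
# row is implied by the B rows, but including it is harmless.
constraints = np.vstack([np.ones(nA), P_b_given_a.T])
directions = null_space(constraints)

# Because the objective is linear in Q, these directional derivatives do
# not depend on Q. If any is nonzero, the condition can only be met on
# the boundary of the action simplex, where -d may stop being feasible.
for d in directions.T:
    print(grad @ d)
```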
I suspect my version might have some odd behaviour (the implicit distribution over A does not seem particularly natural to define), but I’m not sure about the transitive properties of the differential approach.