You are right! It is insufficient for U to be generatable. What we need is the existence of an efficient randomized algorithm G s.t., when executed, G produces a sample of x and then waits to receive an input y, after which it gives an estimate of U(x,y). More formally, G produces a pair (x, p) where p is a program and U(x,y) = E[ev(G()_2; y) | G()_1 = x]. Such a G can be used directly to train a maximizer.
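For concreteness, here is a minimal Python sketch of such a G with a toy utility of my own choosing (the integer-guessing utility, the noise model and all names are illustrative assumptions, not part of the proposal): G samples x together with a closure that plays the role of the program p, so a training loop can score the learner's output y without ever writing U in closed form.

```python
import random

def G():
    # Toy example: x is a target integer and U(x, y) = -|x - y|; the closure
    # returned in place of the program p gives an unbiased noisy estimate of it.
    x = random.randint(0, 100)
    def estimate_U(y):
        return -abs(x - y) + random.gauss(0.0, 1.0)
    return x, estimate_U

# Training-loop usage: sample x, query the policy under training, score y.
x, estimate_U = G()
y = x // 2                # stand-in for the output of the policy being trained
score = estimate_U(y)     # estimate of U(x, y), usable as a training signal
```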
Luckily my approach falls within this class (at least superficially, which lends hope that a deeper analysis will also succeed). When you sample a utility function, a world and an agent and “render” a sequence of observations, you should be able to evaluate the output of the algorithm under training according to the selected utility function.
Actually, I think that if we consider only deterministic maximization policies, then an optimal predictor for U wrt a bounded-Solomonoff-type measure is sufficient to get an optimal maximization policy. In this case we can do maximization using Levin’s universal search L. A significantly better maximization policy M cannot exist, since it would allow us to improve our estimates of U(x, M(x)) and/or U(x, L(x)).
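As a rough illustration (not a faithful implementation of Levin search over a universal enumeration), the maximizer could look like the sketch below, where `programs` is a hypothetical finite list of (description length, program) pairs and `estimate_U` stands in for the optimal predictor for U:

```python
def levin_maximize(x, programs, estimate_U, max_phase=20):
    """Toy Levin-style maximizer: in phase t, each program of length l gets a
    budget of 2**(t - l) steps; outputs are scored with the U-estimator and the
    best-scoring output found so far is returned."""
    best_y, best_score = None, float("-inf")
    for phase in range(max_phase):
        for length, prog in programs:
            if length > phase:
                continue
            budget = 2 ** (phase - length)   # time allotted in this phase
            y = prog(x, budget)              # run prog on x within the budget
            if y is None:                    # treated as "did not halt in time"
                continue
            score = estimate_U(x, y)
            if score > best_score:
                best_y, best_score = y, score
    return best_y
```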
Of course, under standard assumptions about derandomization, the best deterministic policy is about as good as the best randomized policy (on average), and in particular requiring that U be generatable wrt a suitable measure is sufficient. Randomness only gives a significant advantage in adversarial setups: your toy model of informed oversight is essentially an adversarial setup between A and B.
Also, Levin search is obviously grossly inefficient in practice (as is the optimal predictor Λ, which is basically a variant of Levin search), but this model suggests that applying a more practical learning algorithm would give satisfactory results.
Also, I think it’s possible to use probabilistic policies which produce computable distributions (since for such a policy outputs with low and high probabilities are always distinguishable).
Yes, this is what you need. I agree that your setup nearly has this property, in that you get out a complete description of the utility function which can be used to evaluate an action.
But my main concern is that even with a complete description of the utility function in hand, you still can’t efficiently evaluate it on possible actions, at least not without some further assumptions about the utility function (which I don’t think hold).
That is, I am concerned that even once you know the utility function, it will have the same character as the U in my toy model (because of logical uncertainty).
As a complexity-theoretically clean example, suppose that I am really unhappy if an output has a steganographically encoded message. Some policies will predictably produce steganographically encoded messages and therefore should receive low utility, but I can’t actually distinguish these messages from random messages.
This example is a bit unsatisfying, because steganographically encoding a message is harder than generating a random message. The novelty example in my post is more satisfying in that sense, but it introduces a much smaller complexity gap. It would be interesting to find a plausible example that has the best of both worlds. If there is no example, it would be very interesting to understand why there isn’t.
As I said, I expect it’s not meaningful to consider agents with arbitrary utility functions. As a working hypothesis, we can assume the utility function should be “sequentially generatable” i.e. have the property I previously described. The nice thing is that if we sample sequentially generatable utility functions then the total utility function (which includes utility inference in its definition) is also sequentially generatable.
To see this, consider a simple setup in which our agent A is exposed to observations o of a “teacher” agent B, after which A receives input x and produces output y. The total utility function U has the following “sequential generator” G_U. G_U samples a sequentially generatable utility function V with sequential generator G_V. It then constructs an agent B with utility function V and runs that agent to generate o. It uses G_V to generate x. After receiving y, it uses G_V to compute an estimate of V(x,y), which thus serves as an estimate of U(o,x,y).
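Schematically (all helper names here are hypothetical placeholders for pieces of the construction, not existing functions), G_U composes as follows:

```python
def G_U(sample_inner_generator, build_agent, run_agent):
    # Sample a sequentially generatable utility V by sampling its generator G_V.
    G_V = sample_inner_generator()
    # Construct a "teacher" agent B with utility function V and run it to get o.
    teacher_B = build_agent(G_V)
    o = run_agent(teacher_B)
    # Reuse G_V in its two-phase role: it emits x now and scores y later.
    x, estimate_V = G_V()
    def estimate_U(y):
        return estimate_V(y)   # estimate of V(x, y), hence of U(o, x, y)
    return (o, x), estimate_U
```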
It is possible that there is a larger complexity class of admissible utility functions (although it’s also possible there isn’t), but in that case I still hope that the utility inference procedure, or some variant thereof, preserves that class.
I still agree it would be interesting to construct a complexity-theoretic parallel of your toy model for informed oversight. The following is my attempt to do it (which I’m not sure really works).
Consider a directed graph G where each k-valent vertex is labeled by a tensor of rank k with rational coefficients (you can also use other coefficients, e.g. a finite field), where each covariant index of the tensor corresponds to an incoming edge and each contravariant index to an outgoing edge. It is thus possible to contract the tensors according to the graph structure, obtaining a number v(G) ∈ Q.
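A small sketch of computing v(G), assuming a concrete representation of my own (each vertex i carries a numpy array `tensors[i]`, and each edge `(sv, sa, dv, da)` identifies contravariant axis `sa` of vertex `sv` with covariant axis `da` of vertex `dv`):

```python
import numpy as np

def contract_graph(tensors, edges):
    """Contract the vertex tensors along the graph's edges; for a closed graph
    (every tensor axis lies on exactly one edge) the result is a single number."""
    letters = "abcdefghijklmnopqrstuvwxyz"   # one einsum index per edge
    axis_index = {}
    for k, (sv, sa, dv, da) in enumerate(edges):
        axis_index[(sv, sa)] = letters[k]
        axis_index[(dv, da)] = letters[k]
    subscripts = [
        "".join(axis_index[(v, a)] for a in range(t.ndim))
        for v, t in enumerate(tensors)
    ]
    return np.einsum(",".join(subscripts) + "->", *tensors)

# Example: two rank-2 tensors on a 2-cycle; the contraction equals trace(M @ N).
M, N = np.arange(4).reshape(2, 2), np.eye(2)
edges = [(0, 1, 1, 0),   # M's outgoing axis is N's incoming axis
         (1, 1, 0, 0)]   # N's outgoing axis is M's incoming axis
print(contract_graph([M, N], edges), np.trace(M @ N))  # both print 3.0
```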
There is a natural concept of isomorphism for graphs as above s.t. G ≃ H implies v(G) = v(H). That is, we consider regular graph isomorphisms where relabeling the edges transforms the tensors in the obvious way (it is also possible to consider a more general type of isomorphism in which we are also allowed to do a “base change” at each edge, but it is not compatible with the following).
Given such a graph G, it is possible to apply a one-way function to each coefficient of each tensor (otherwise preserving all structure), getting a new graph G′. Obviously G≃H implies G′≃H′.
Suppose the input is a graph G as above and the output is another such graph H. Define U(G,H) = 1 when v(G′) = v(H′) but not G ≃ H, and U(G,H) = 0 otherwise.
Clearly, given G it is easy to construct a random isomorphic H. However, it might be hard to check for this kind of isomorphism (although regular graph isomorphism is claimed to be solvable in quasipolynomial time and, for all we know, might even be solvable in polynomial time). It is also likely to be hard to construct a non-isomorphic H with v(G′) = v(H′).
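The easy direction can be illustrated with the same representation as the contraction sketch above: renaming the vertices produces an isomorphic graph with the same contraction value (this sketch does not model the one-way function or the hardness of the isomorphism check):

```python
import random

def random_isomorphic_copy(tensors, edges):
    """Return an isomorphic copy of (tensors, edges) obtained by a random
    relabeling of the vertices; contract_graph gives the same value on it."""
    n = len(tensors)
    perm = list(range(n))
    random.shuffle(perm)                     # vertex v is renamed to perm[v]
    new_tensors = [None] * n
    for v, t in enumerate(tensors):
        new_tensors[perm[v]] = t
    new_edges = [(perm[sv], sa, perm[dv], da) for (sv, sa, dv, da) in edges]
    return new_tensors, new_edges
```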