Some problems have this special structure, but it doesn’t seem like all of them do.
For a really straightforward example, consider a random SAT instance. What kind of cheat sheet is nature going to generate, without explicitly planting a solution? (Planted solutions look different from organic solutions to random instances.)
It might be that most human abilities are just as hard to verify as to execute, or else permit some kind of cheat sheet. (Or more generally, permit a cheat sheet for the part of the problem that is easier to verify than execute.) But at face value I don’t see why this would be.
It’s not clear how you can tell that a magic box solves random SAT instances. Some SAT instances are unsatisfiable, so the magic box will occasionally have to output ⊥, but it’s not clear how to distinguish instances that are genuinely unsatisfiable from instances that you simply cannot solve yourself.
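To make the asymmetry concrete, here is a minimal sketch (my own toy code, with hypothetical helper names): when the box returns a claimed assignment we can check it directly in polynomial time, but when it returns ⊥ there is no short certificate to check, so the auditor is stuck unless it can solve the instance itself.

```python
def evaluate_cnf(clauses, assignment):
    """Check whether a truth assignment satisfies a CNF formula.
    `clauses` is a list of clauses; each clause is a list of nonzero ints,
    where literal k stands for variable |k|, negated if k < 0."""
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

def audit_box_answer(clauses, answer):
    """Audit one answer from the hypothetical magic box.
    Returns True/False if the answer can be checked, None if we cannot tell."""
    if answer is not None:                    # box claims SAT and supplies an assignment
        return evaluate_cnf(clauses, answer)  # efficiently verifiable
    # Box outputs ⊥ (claims UNSAT): there is no succinct certificate to check,
    # so we cannot distinguish "genuinely unsatisfiable" from "the box gave up".
    return None
```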
I guess there might be a distributional search problem S s.t.
(i) A solution can be verified in polynomial time.
(ii) Every instance has a solution.
(iii) There is no way to efficiently generate valid input-output pairs obeying a distribution which is indistinguishable from the right distribution.
I see no obvious reason such an S cannot exist, but I also don’t have an example. It would be interesting to find one.
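Just to pin down what such an S would have to look like, here is an interface sketch (illustrative only; the class and method names are mine):

```python
from abc import ABC, abstractmethod
from typing import Any

class DistributionalSearchProblem(ABC):
    """Hypothetical problem S with properties (i)-(iii).

    (i)   verify() runs in polynomial time.
    (ii)  Every instance drawn by sample_instance() has at least one solution.
    (iii) Conjectured: no efficient procedure outputs (instance, solution) pairs
          whose distribution is indistinguishable from an instance drawn by
          sample_instance() together with a valid solution for it."""

    @abstractmethod
    def sample_instance(self, n: int, seed: int) -> Any:
        """Draw a size-n instance from the problem's input distribution."""

    @abstractmethod
    def verify(self, instance: Any, solution: Any) -> bool:
        """Polynomial-time check that `solution` solves `instance`."""
```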
In any case, it is not clear how the agent can learn to identify models like this. You can only train a predictor on models which admit sampling. On the other hand, if there is some kind of transfer learning that allows generalizing to some class of models which don’t admit sampling, then it’s plausible there is also transfer learning that allows generalizing utility inference from sampleable agents to agents in the larger class. Also, it might be possible to train utility inference on some sort of data produced by the predictor instead of on raw observations, such that it’s possible to sample the distribution of this intermediate data corresponding to complex agents.
But it’s hard to demonstrate something concrete without having some model of how the hypothetical “superpredictor” works. As far as I know, it might be impossible to effectively use unsampleable models.
The point was that we may be able to train an agent to do what we want, even in cases where we can’t effectively build a predictor.
Re: your example. You can do amplification to get exponentially close to certainty (choose instances that are satisfiable with 2⁄3 probability, and then consider the problem “solve at least half of these 1000 instances”). If you really want every instance to have a solution, then you can probably generate the instances pseudorandomly from a small enough seed and do a union bound.
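To spell out why the amplification gives exponential certainty (my own back-of-the-envelope check, not part of the original comment): if each instance is independently satisfiable with probability 2⁄3, the probability that fewer than half of 1000 instances are satisfiable is at most exp(−1000·KL(1⁄2‖2⁄3)), which works out to about 3×10⁻²⁶.

```python
import math

def chernoff_upper_bound(n, p, q):
    """Chernoff/Hoeffding bound: P[Binomial(n, p) <= q*n] <= exp(-n * KL(q || p)) for q < p."""
    kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
    return math.exp(-n * kl)

def exact_tail(n, p, k):
    """Exact P[Binomial(n, p) <= k], for comparison."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 1000, 2 / 3
print(chernoff_upper_bound(n, p, 0.5))  # ~2.7e-26: failure probability of "solve at least half"
print(exact_tail(n, p, n // 2))         # exact value, even smaller
```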
By “predictor” I don’t mean something that produces exact predictions, I mean something that produces probabilistic predictions of given quantities. Maybe we should call it “inductor” to avoid conflation with optimal predictors (even though the concepts are closely related). As I said before, I think that an agent has to have a reasonable model of humans to follow human values. Moreover an agent that doesn’t have a reasonable model of humans is probably much less dangerous since it won’t be able to manipulate humans (although I guess the risk is still non-negligible).
The question is what kinds of inductors are complexity-theoretically feasible and what class of models these inductors correspond to. Bounded Solomonoff induction using Λ works on the class of samplable models. In machine learning language, inductors using samplable models are feasible since it is possible to train them by sampling random such models (i.e. by sampling the bounded Solomonoff ensemble). On the other hand, it’s not clear what broader classes of models are admissible, if any.
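As a minimal sketch of what “train the inductor by sampling random samplable models” could look like in practice (a toy of my own, not Λ or the actual bounded Solomonoff construction): draw random small samplable models, sample sequences from them, and fit a predictor to the pooled data.

```python
import random

def sample_model(rng, n_states=4):
    """Draw a random samplable model: a small probabilistic automaton over bits."""
    return {
        "emit":  [rng.random() for _ in range(n_states)],  # P(bit = 1 | state)
        "trans": [[rng.random() for _ in range(n_states)] for _ in range(n_states)],
    }

def sample_sequence(model, rng, length=200):
    """Sample a bit sequence from the model -- this step is exactly what 'samplable' buys us."""
    n, state, bits = len(model["emit"]), 0, []
    for _ in range(length):
        bits.append(1 if rng.random() < model["emit"][state] else 0)
        state = rng.choices(range(n), weights=model["trans"][state])[0]
    return bits

def train_bigram_inductor(sequences):
    """Fit a crude 'inductor': P(next bit = 1 | previous bit), pooled over data
    generated by many randomly drawn models (a stand-in for the ensemble)."""
    counts = {0: [1, 1], 1: [1, 1]}  # Laplace-smoothed counts[prev][next]
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {b: counts[b][1] / sum(counts[b]) for b in (0, 1)}

rng = random.Random(0)
training_data = [sample_sequence(sample_model(rng), rng) for _ in range(50)]
print(train_bigram_inductor(training_data))  # learned P(1 | previous bit)
```

The point of the question is that for a genuinely broader, non-samplable class, the sampling step above is exactly what we don’t know how to do.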
That said, it seems plausible that if it’s feasible to construct inductors for a broader class, my procedure will remain efficient.
Model 1: The “superinductor” works by finding an efficient transformation q of the input sequence x and a good sampleable model for q(x). E.g. q(x) contains the results of substituting the observed SAT-solutions into the observed SAT-instances. In this model, we can apply my procedure by running utility inference on q(x) instead of x.
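A concrete toy version of this q for the SAT case (my own illustration): q maps each observed (instance, claimed solution) pair to the bit obtained by substituting the solution into the instance; if the box really works, q(x) is the all-True sequence, which has a trivially sampleable model.

```python
def evaluate_cnf(clauses, assignment):
    """Substitute a claimed assignment into a CNF instance; True iff it satisfies it."""
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

def q(observations):
    """Transformation q: (instance, claimed solution) pairs -> verification bits."""
    return [evaluate_cnf(instance, solution) for instance, solution in observations]

# Hypothetical observed data: the instance (x1 OR NOT x2) AND (x2) with its solution.
observations = [([[1, -2], [2]], {1: True, 2: True})]
print(q(observations))  # [True]
```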
Model 2: The superinductor works via an optimal predictor for SAT*. I think that it should be relatively straightforward to show that given an optimal predictor for SAT + assuming the ability to design AGIs for target utility functions relative to a SAT oracle, there is an optimal predictor for the total utility function my utility inference procedure defines (after including external agents that run over a SAT oracle). Therefore it is possible to maximize the latter.
*A (poly,log)-optimal predictor for SAT cannot exist unless all sparse problems in NP have efficient heuristic algorithms in some sense, which is unlikely. On the other hand, there is no reason that I know why a (poly,0)-optimal predictor for SAT cannot exist.