Could you expand on what the “upper bound” of utility is for a maximizer, and why it’s easy to approach? Perhaps a concrete (but simple) example would help. Say “Clippy” wants to maximize paperclips and minimize waste heat. “HotClippy” is the counterfactual agent that maximizes heat while thinking paperclips are fine if they’re nearly free. What is the maximal value for paperclips?
It seems like the submission is always going to be S(∞·u + 0·v) under this constraint. Any other v will be rejected by the counterfactual agent or contradict the base agent’s preferences. Any smaller/finite u is a lost opportunity.
Perhaps a concrete (but simple) example would help.
Clippy has a utility function that awards 1 if Clippy produces one or more paperclips (and 0 otherwise). Clippy can easily produce ten paperclips.
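A minimal sketch of that utility in Python (the function name and the integer paperclip count are conventions I’m adopting here, not anything from the original setup):

```python
def u(paperclips: int) -> float:
    """Clippy's bounded, easily attainable utility.

    The bound is 1, and it is hit by producing a single paperclip,
    so the upper bound is trivial to approach: ten paperclips
    score no better than one.
    """
    return 1.0 if paperclips >= 1 else 0.0

assert u(0) == 0.0
assert u(1) == u(10) == 1.0  # extra paperclips add nothing
```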
Basically what I’m trying to do is make the AI “do the easy evident thing” rather than “optimise the whole universe just to be absolutely sure it achieved its goal”.
What I’m not following is how you take an optimizer and convince it that the best route is to use a satisficer subagent. Clippy (the maximizer, the agent you’re trying to limit) gets utility from infinite paperclips. It’s ClippyJr (the satisficer) which can be limited to 1. But why would maximizer-Clippy prefer to propose that, as opposed to proposing ClippyJrPlus, who is also a satisficer but has a goal of 10^30 paperclips?
Please include all three agents in an example: M(u-v), S(finite-u), M(εu+v).
Here, I start with a bounded and easy-to-reach u (that’s the first step in the process), so “u = finite-u”. This is still not safe for a maximiser (the usual argument about “being sure” and squeezing ever more tiny amounts of expected utility out of optimising the universe). The whole system is then supposed to produce S(u) rather than M(u). This is achieved by having M(εu+v) allow it: M(εu+v) expects (counterfactually) to optimise the universe, and would see any optimisation by S(u) as getting in the way (or, if it could co-opt those optimisations, that is something M(u-v) would not want it to do).
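To make the acceptance step concrete, here is a toy numerical model, under assumptions added purely for illustration (none of this is in the original proposal): a submission is summarised by its u-gain and the fraction of resources it diverts from v, and M(εu+v) counterfactually values every leftover resource as v one-for-one.

```python
# Toy model of M(εu+v) vetting the submission from M(u-v).
# Assumptions (mine, for illustration): outcomes reduce to
# (u_gain, resources_used), leftover resources convert to v
# one-for-one, and rejection leaves all resources for v.

EPSILON = 0.01
U_BOUND = 1.0          # bounded u with an easily attainable bound
TOTAL_RESOURCES = 1.0

def value_to_checker(u_gain: float, resources_used: float) -> float:
    """Counterfactual expected value of εu + v for M(εu+v)."""
    v_gain = TOTAL_RESOURCES - resources_used
    return EPSILON * u_gain + v_gain

def checker_accepts(u_gain: float, resources_used: float) -> bool:
    """Accept iff the proposal beats keeping every resource for v."""
    return value_to_checker(u_gain, resources_used) >= value_to_checker(0.0, 0.0)

# S(u): hits the u-bound using a sliver of resources -- accepted.
print(checker_accepts(u_gain=U_BOUND, resources_used=0.001))  # True

# A universe-optimising u-pursuer ("ClippyJrPlus"-style): the same bounded
# u-gain but a huge resource grab -- rejected, since it crowds out v.
print(checker_accepts(u_gain=U_BOUND, resources_used=0.9))    # False
```

This is only a cartoon of the incentives, but it shows why M(u-v) gains nothing by proposing an aggressive u-pursuer: the checker vetoes anything that eats into v’s resources.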
Technically, you might not need to bound u so sharply: it’s possible that the antagonistic setup will produce an S(u) that is equivalent to S(finite-u) even if u is unbounded (via the reduced-impact effect of the interactions between the two maximisers). But it seems sensible to add the extra precaution of starting with a bounded u.
Augh! “I” and “you” are not in the list of agents we’re discussing. Who starts with a bounded u, and how does that impact the decision of what S will be offered by the M(u-v) agent?
u is bounded. All agents start with a bounded u. The “I” is me (Stuart), saying “start this project with a bounded u, as that seems to have fewer possible failure modes than a general u”.
how does that impact the decision of what S will be offered by the M(u-v) agent?
With an unbounded u, the M(u-v) agent might be tempted to build a u maximiser (or something like that), counting on M(εu+v) getting a lot of value out of it, and so accepting it.
Basically, for the setup to work, M(εu+v) must get most of its expected value from maximising v (and hence want almost all available resources for v-maximising). “Bounded u with an easily attainable bound” means that M(εu+v) will accept some use of resources by S(u) to increase u, but not very much.
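In the toy model above, that tolerance comes out as ε times the bound on u:

```python
# Same toy constants as the sketch above. The checker tolerates a proposal
# exactly when EPSILON * u_gain >= resources_used, so with u capped at 1
# and EPSILON = 0.01, S(u) may divert at most 1% of resources to u.
# An unbounded u would let EPSILON * u_gain grow without limit, so the
# checker would wave through arbitrarily large resource grabs -- the
# failure mode the bounded u guards against.
EPSILON, U_BOUND = 0.01, 1.0
max_acceptable_resources = EPSILON * U_BOUND
print(max_acceptable_resources)  # 0.01: some resources for u, but not much
```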