Satisficers’ undefined behaviour
I previously posted an example of a satisficer (an agent seeking to achieve a certain level of expected utility u) transforming itself into a maximiser (an agent wanting to maximise expected u) to better achieve its satisficing goals.
But the real problem with satisficers isn’t that they “want” to become maximisers; the real problem is that their behaviour is undefined. We conceive of them as agents that would do the minimum required to reach a certain goal, but we don’t specify “minimum required”.
For example, let A be a satisficing agent. It has a utility u that is quadratic in the number of paperclips it builds, except that after building 10^100, it gets a special extra exponential reward, until 10^1000, where the extra reward becomes logarithmic, and after 10^10000, it also gets utility in the number of human frowns divided by 3↑↑↑3 (unless someone gets tortured by dust specks for 50 years).
A’s satisficing goal is a minimum expected utility of 0.5, and, in one minute, the agent can press a button to create a single paperclip.
So pressing the button is enough. In the coming minute, A could decide to transform itself into a u-maximiser (as that still ensures the button gets pressed). But it could also do a lot of other things. It could transform itself into a v-maximiser, for many different v’s (generally speaking, given any v, either v or -v will result in the button being pressed). It could break out, send a subagent to transform the universe into cream cheese, and then press the button. It could rewrite itself into a dedicated button pressing agent. It could write a giant Harry Potter fanfic, force people on Reddit to come up with creative solutions for pressing the button, and then implement the best.
All these actions are possible for a satisficer, and are completely compatible with its motivations. This is why satisficers are un(der)defined, and why any behaviour we want from them, such as “minimum required” impact, has to be put in deliberately.
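To make the underdefinedness concrete, here is a minimal sketch (with made-up numbers and a hypothetical presses_button feature): every policy that gets the button pressed clears the 0.5 threshold, and the satisficing condition alone does nothing to choose among them.

```python
# Minimal sketch: many wildly different policies all satisfy E[u] >= 0.5,
# so the satisficing constraint by itself does not pin down behaviour.
# "presses_button" is a hypothetical feature of a policy, not part of the post.

def expected_u(policy):
    # Stand-in for A's utility: pressing the button once already yields
    # expected utility above the 0.5 threshold.
    return 1.0 if policy["presses_button"] else 0.0

THRESHOLD = 0.5

candidate_policies = [
    {"name": "just press the button",          "presses_button": True},
    {"name": "self-modify into a u-maximiser", "presses_button": True},
    {"name": "cream-cheese universe, then press", "presses_button": True},
    {"name": "crowdsource on Reddit, then press", "presses_button": True},
]

# Every one of these is acceptable to the satisficer:
satisfactory = [p["name"] for p in candidate_policies
                if expected_u(p) >= THRESHOLD]
print(satisfactory)  # all four pass; nothing here selects among them
```

Whatever tie-breaking happens among these policies comes from the agent's implementation details, not from the satisficing goal itself.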
I’ve got some ideas for how to achieve this; they’re being posted here.
One of this month’s rationality quotes (by Emanuel Lasker) is relevant: “When you see a good move, look for a better one.”
This is what maximizers do: they stop looking when they have a proof that they have found the best move possible. Satisficers behave very differently: when they see a good move, they stop looking and take it.
This makes them difficult to analyze. If there are many different good moves, which one they pick will depend on features specific to their cognitive algorithm / priming / etc., rather than features specific to the problem.
Maximizers don’t take the proven optimal path; they take action when the EV of analyzing further actions drops below that of the current most valuable path. In many situations there is no guarantee that an optimal path even exists, and spending resources and opportunities on proving that you will take the best path is not how you maximize at all. The situation itself changes while you search for the optimal path to it.
This is a conception of maximizers that I generally like, and it holds if “cost of analysis” is part of the objective function, but it’s important to note that this is not the most generic class of maximizers; it’s a subset of that class. Note that any maximizer that comes up with a proof that it’s found an optimal solution implicitly knows that the EV of continuing to analyze actions is lower than that of going ahead with that solution.
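A rough sketch of the kind of maximizer described above, with analysis cost folded in (the numbers and the decaying-improvement estimate are invented for illustration):

```python
import random

# Toy maximizer that keeps examining candidate actions only while the
# estimated gain from further analysis exceeds the cost of that analysis.
# The decay model for "expected gain per step" is purely illustrative.

def evaluate(action):
    return action  # stand-in for the value of taking this action

def bounded_maximize(candidates, analysis_cost=0.05,
                     expected_gain=1.0, decay=0.5):
    best = float("-inf")
    for action in candidates:
        if expected_gain < analysis_cost:
            break  # further analysis no longer pays for itself
        best = max(best, evaluate(action))
        expected_gain *= decay  # diminishing expected improvement
    return best

candidates = [random.random() for _ in range(1000)]
print(bounded_maximize(candidates))  # good, but not provably optimal
```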
I think what you have in mind is more typically referred to as an “optimizer,” like in “metaheuristic optimization.” Tabu search isn’t guaranteed to find you a globally optimal solution, but it’ll get you a better solution than you started with faster than other approaches, and that’s what people generally want. There’s no use taking five years to produce an absolute best plan for assigning packages to trucks going out for delivery tomorrow morning.
But the distinction that Stuart_Armstrong cares about holds: maximizers (as I defined them, without taking analysis costs into consideration) seem easy to analyze and optimizers seem hard to analyze: I can figure out the properties that an absolute best solution has, and there’s a fairly small set of those, but I might have a much harder time figuring out the properties that a solution returned by running tabu search overnight will have. But that might just be a perspective thing; I can actually run tabu search overnight a bunch of times, but I might not be able to actually figure out the set of absolute best solutions.
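For concreteness, here is a rough tabu-search sketch on a toy packages-to-trucks balancing problem (the data, objective, and parameters are all made up). It reliably returns a decent assignment quickly, but characterizing exactly which solutions it returns is much harder than characterizing the optimum.

```python
import random

# Rough tabu-search sketch on a toy "assign packages to trucks" problem.
# Problem data, objective, and parameters are invented for illustration.

random.seed(0)
N_PACKAGES, N_TRUCKS = 20, 4
weights = [random.randint(1, 10) for _ in range(N_PACKAGES)]

def cost(assignment):
    # Objective: minimise the load of the most heavily loaded truck.
    loads = [0] * N_TRUCKS
    for pkg, truck in enumerate(assignment):
        loads[truck] += weights[pkg]
    return max(loads)

def tabu_search(iterations=200, tabu_len=15):
    current = [random.randrange(N_TRUCKS) for _ in range(N_PACKAGES)]
    best, best_cost = current[:], cost(current)
    tabu = []  # recently vacated (package, truck) placements we refuse to revisit
    for _ in range(iterations):
        # Neighbourhood: move one package to a different, non-tabu truck.
        moves = [(p, t) for p in range(N_PACKAGES) for t in range(N_TRUCKS)
                 if t != current[p] and (p, t) not in tabu]
        def move_cost(move):
            p, t = move
            trial = current[:]
            trial[p] = t
            return cost(trial)
        p, t = min(moves, key=move_cost)
        tabu.append((p, current[p]))  # forbid moving this package straight back
        tabu = tabu[-tabu_len:]
        current[p] = t
        if cost(current) < best_cost:
            best, best_cost = current[:], cost(current)
    return best, best_cost

assignment, makespan = tabu_search()
print(makespan)  # decent, but with no optimality guarantee
```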
My intuition is telling me that resource costs are relevant to an agent whether it has a term in the objective function or not. Omohundro’s instrumental goal of efficiency...?
Ah; I’m not requiring a maximizer to be a general intelligence, and my intuitions are honed on things like CPLEX.
I usually treat “satisficer” as “utility for a thing flattens out”, not “utility stops being valuable”. In fact, I’m not sure you can call it utility using your method. In the real world, I don’t think there are any true satisficers; most will still act as if more is better, just with sharper diminishing returns than optimizers have.
A definition of satisficing that starts with utility units is somewhat incoherent, I think. Utility is already a measure of what one wants—it is nonsense to talk about “not wanting more utility”.
That’s one of the problems with the class of maximizers MIRI talks about. They don’t have diminishing returns on utility per paperclip created.
Of course they can (diminishing returns are generally thought of as something logarithmic); the drop-off just isn’t as sharp as a satisficer’s, and there’s somewhat less utility cost to searching for improvements.
The clippy problem isn’t so much maximizer vs satisficer; it’s just a far too simple goal structure.
Please explain. Do you mean that a given maximizer is looking at the marginal utility of paperclips as the percentage of total paperclip mass? Because that is entirely dependent on the agent’s utility function. Clippy will never stop making paperclips unless making paperclips results in a net loss of paperclips.
A given decision agent is making choices, including clippy, maximizers, and satisficers. All of them have utility functions which include increasing utility for things they like. Generally, both maximizers and satisficers have declining marginal utility for things they like, but increasing absolute utility for them: U(n things) < U(n+1 things), but U(thing #n) > U(thing #n+1) (see the short sketch after this comment).
Agents have competing desires (more than one thing in their utility function). So choices they make have to weigh different things. Do I want N of x and M+1 of y, or do I want N+1 of x and M of y? This is where it gets interesting: a satisficer generally values minimizing time and hassle more than getting more of a thing than really necessary. An optimizer values minimizing time and hassle, but less so compared to getting more desirable future states.
Clippy doesn’t have multiple things to balance against each other, so it doesn’t matter whether its utility function has declining marginal utility, nor to what degree it declines. It has increasing absolute utility, and there’s nothing else to optimize, so more clips is always better. This is a topic unrelated to satisficers vs maximizers.
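A toy numeric illustration of the two inequalities above, using u(n) = sqrt(n) as a stand-in utility with diminishing marginal returns (my example, not part of the comment):

```python
import math

# u(n) = sqrt(n): total utility keeps rising, marginal utility keeps falling.
u = math.sqrt

for n in range(1, 5):
    total_grows    = u(n) < u(n + 1)                        # U(n things) < U(n+1 things)
    marginal_falls = (u(n) - u(n - 1)) > (u(n + 1) - u(n))  # U(thing #n) > U(thing #n+1)
    print(n, total_grows, marginal_falls)  # both True for every n
```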
Okay, thank you. I was focusing on the pathological case.
Unless they are maximizers with complicated utility functions. There are NP-hard problems where we can get within 5% of the optimum in polynomial time. You can be a satisficer happy with 95% of the solution. On the other hand, you can incorporate computer time into your utility function and maximize accordingly.
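The standard example of the “within 5% in polynomial time” point is the knapsack FPTAS: for any eps (e.g. 0.05) it returns a solution worth at least (1 - eps) of the optimum, in time polynomial in n and 1/eps. A rough sketch, with made-up data:

```python
# Sketch of the classic knapsack FPTAS: round item values down to multiples
# of K = eps * max(value) / n, then run the exact "min weight per value" DP
# on the rounded values. The result is worth at least (1 - eps) * OPT.

def knapsack_fptas(values, weights, capacity, eps=0.05):
    n = len(values)
    K = eps * max(values) / n
    scaled = [int(v // K) for v in values]  # rounded-down values
    max_scaled = sum(scaled)
    INF = float("inf")
    # min_weight[s] = lightest way to reach rounded value s
    min_weight = [0.0] + [INF] * max_scaled
    for sv, w in zip(scaled, weights):
        for s in range(max_scaled, sv - 1, -1):
            if min_weight[s - sv] + w < min_weight[s]:
                min_weight[s] = min_weight[s - sv] + w
    best_scaled = max(s for s in range(max_scaled + 1)
                      if min_weight[s] <= capacity)
    return best_scaled * K  # guaranteed lower bound on the value achieved

values  = [60, 100, 120, 80, 30]   # invented example data
weights = [10, 20, 30, 15, 5]
print(knapsack_fptas(values, weights, capacity=50, eps=0.05))
```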
Using units of utility here is going to throw you off. By definition, a unit of utility already takes into account all of A’s preferences. For example, if A got paid in dollars per paperclip, then we could weigh the benefit of making extra paperclips against A’s utility in taking the night off and heading off to the pub. However, this would make less sense if we assumed quadratic returns on utility.
Why would A do any of those other things unless it gained some utility thereby? If A had a utility function that valued writing Harry Potter fanfic but didn’t want to lose out on the paycheck from the paperclip job, standard utility theory would predict that A expends the minimum effort on paperclips necessary to keep its job. A’s marginal utility decreases sharply past this minimum, which would explain the satisficing.
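A toy model of that trade-off (all numbers invented): with a sharp marginal-utility drop past the job-keeping minimum, a plain utility maximizer already makes only the minimum number of paperclips and spends the rest of its time on fanfic.

```python
# Toy model: A values keeping the paperclip job and writing fanfic; the
# marginal value of extra paperclips past the job-keeping minimum is tiny.

MINUTES = 60            # minutes available
MIN_CLIPS_FOR_JOB = 1   # paperclips needed to keep the job (hypothetical)

def utility(clips_made):
    job_value = 10 if clips_made >= MIN_CLIPS_FOR_JOB else 0
    extra_clip_value = 0.01 * max(0, clips_made - MIN_CLIPS_FOR_JOB)
    fanfic_value = 0.5 * (MINUTES - clips_made)  # each spare minute goes to fanfic
    return job_value + extra_clip_value + fanfic_value

best = max(range(MINUTES + 1), key=utility)
print(best)  # 1: the maximizer makes the minimum and looks like a satisficer
```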