If the way to satisfice best is to act like a maximizer, then wouldn’t an optimal satisficer simply act like a maximizer, no self-rewriting required?
Here is a (contrived) situation where a satisficer would need to rewrite.
Sally the Satisficer is invited to participate in a game show. The game starts with a coin toss. If she loses the coin toss, she gets 8 paperclips. If she wins, she gets invited to the Showcase Showdown, where she will first be offered a prize of 9 paperclips. If she turns down this first showcase, she is offered the second showcase of 10 paperclips (fans of The Price is Right know the second showcase is always better).
When she first steps on stage, she considers whether she should switch to maximizer mode or stick with her satisficer strategy. As a satisficer, she knows that if she wins the coin toss she won’t be able to refuse the 9 paperclip prize, since it satisfies her target expected utility of 9. So her expected utility as a satisficer is (1/2)(8) + (1/2)(9) = 8.5. If she wins the flip as a maximizer, she will clearly pass on the first showcase and receive the second showcase of 10 paperclips. Thus her expected utility as a maximizer is (1/2)(8) + (1/2)(10) = 9. Switching to maximizer mode meets her target while remaining a satisficer does not, so she rewrites herself to be a maximizer.
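For concreteness, here is a minimal sketch of that policy comparison (the prize values and threshold are just the numbers from the story; the constant and function names are purely illustrative):

```python
# Minimal sketch of Sally's policy comparison; names are illustrative.
LOSS_PRIZE = 8        # paperclips if she loses the coin toss
FIRST_SHOWCASE = 9    # first offer if she wins the toss
SECOND_SHOWCASE = 10  # offer if she turns the first showcase down
TARGET = 9            # her satisficing threshold on expected utility

def expected_paperclips(prize_if_win):
    """Expected paperclips, given what she walks away with after winning the toss."""
    return 0.5 * LOSS_PRIZE + 0.5 * prize_if_win

# As a satisficer she must accept the first showcase, since 9 meets her target.
satisficer_value = expected_paperclips(FIRST_SHOWCASE)   # 0.5*8 + 0.5*9 = 8.5

# As a maximizer she passes on the first showcase and takes the second.
maximizer_value = expected_paperclips(SECOND_SHOWCASE)   # 0.5*8 + 0.5*10 = 9.0

print(satisficer_value >= TARGET)  # False: staying a satisficer misses the target
print(maximizer_value >= TARGET)   # True: rewriting herself as a maximizer meets it
```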
Ah, good point. So “picking the best strategy, not just the best individual moves” is similar to self-modifying to be a maximizer in this case.
On the other hand, if our satisficer runs on updateless decision theory, picking the best strategy is already what it does all the time. So I guess it depends on how your satisficer is programmed.
This seems to imply that an updateless satisficer would behave like a maximiser—or that an updateless satisficer with bounded rationality would make itself into a maximiser as a precaution.
A UDT satisficer is closer to the original than a pure maximizer, because where different strategies all fall above the threshold, the original tie-breaking rule can still be applied.
Cool example! But your argument relies on a certain vagueness in the definitions of “satisficer” and “maximiser”, namely that between:
A: an agent “content when it reaches a certain level of expected utility”; and
B: “simply a maximiser with a bounded utility function”
(These definitions are from the OP).
Looking at the situation you presented: “A” would recognise the situation as having an expected utility of 9, and be content with it (until she loses the coin toss...). “B” would not distinguish between the utility of 9 and the utility of 10. Neither agent would see a need to self-modify.
Your argument treats Sally as (seeing herself) morphing from “A” before the coin toss to “B” after—this, IMO, invalidates your example.
I like this, I really do. I’ve added a mention of it in the post. Note that your point not only shows that a non-timeless satisficer would want to become a maximiser, but that a timeless satisficer would behave as a maximiser already.
I realize this is old (which is why I’m replying to a comment to draw attention), but still, the entire post seems to be predicated on a poor specification of the utility function. Remember, the utility function by definition includes/defines the full preference ordering over outcomes, and must therefore include the idea of acting “satisfied” inside it.
Here, instead, you seem to define a “fake” utility function of U = E(number of paperclips) and then say that the AI will be satisfied at a certain number of paperclips, even though it clearly won’t be, because that’s not part of the utility function. That is, something with this purported utility function is already a pure maximiser, not a satisficer at all. Instead, the utility function you’re constructing should be something like U = {9 if E(paperclips) >= 9, E(paperclips) otherwise}, in which case, in the above example, the satisficer really wouldn’t care whether it ended up with 9 or 10 paperclips and would remain a satisficer. The notion that a satisficer wants to become a maximiser arises only because you made the “satisficer’s” utility function identical to a maximiser’s to begin with.
(There may be other issues with satisficers, but I don’t think this is one of them. Also, sorry if that came across as confrontational—I just wanted to make my objection as clear as possible.)
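As a rough, purely illustrative sketch of the capped utility function proposed above (the function name and constants are mine):

```python
# Illustrative sketch of the capped utility proposed above:
# U = 9 if E(paperclips) >= 9, else E(paperclips).
def capped_utility(expected_paperclips, threshold=9):
    """Utility tracks expected paperclips up to the threshold, then flattens."""
    return min(expected_paperclips, threshold)

# 8.5 expected paperclips score 8.5, but 9 and 10 both score exactly 9,
# so this agent gains nothing by turning itself into a paperclip maximiser.
print(capped_utility(8.5), capped_utility(9), capped_utility(10))  # 8.5 9 9
```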
Satisficing is a term for a specific type of decision making—quoting Wikipedia: “a decision-making strategy that attempts to meet an acceptability threshold. This is contrasted with optimal decision-making, an approach that specifically attempts to find the best option available.”
So by definition a satisficer is an agent that is content with a certain outcome, even though they might prefer a better one. Do you think my model—utility denoting the ideal preferences, and satisficing being content with a certain threshold—is a poor model of this type of agent?
Yes, as I said, I think any preferences of the agent, including being “satisfied”, need to be internalized in the utility function. That is, satisficing should probably be content not with a certain level of utility, but with a certain level of the objective. Anything that’s “outside” the utility function, as satisficing is in this case, will naturally be seen as an unnecessary imposition by the agent and ultimately ignored (if the agent is able to ignore it), regardless of what it is.
For a contrived analogy, modeling a satisficer this way is similar to modeling an honest man as someone who wants to maximize money, but who lives under the rule of law (and who, at that, is able to stop the law from applying to him whenever he wants).
So I did a post saying that a satisficer would turn into an expected utility maximiser, and your point is… that any satisficer should already be an expected utility maximiser :-)
No, only one that’s modeled the way you’re modeling it. I think I’m somehow not being clear, sorry =( My point is that your post is tautological and does an injustice to satisficers. If you move the satisfaction condition inside the utility function, e.g. U = {9 if E(paperclips) >= 9, E(paperclips) otherwise}, so that its utility increases with expected paperclips up to 9 and then stops there (which is also not really an optimal definition, but an adequate one), the phenomenon of wanting to be a maximiser disappears. With that utility function, it would be indifferent between being a satisficer and a maximiser.
If you instead changed to a utility function like, let’s say: U = {1 if 8 < E(paperclips) < 11, 0 otherwise}, then it would strictly prefer to remain a satisficer, since a maximiser would inevitably push it into the 0 utility area of the function. I think this is the more standard way to model a satisficer (usually with a resource cost thrown in as well), and it’s certainly the more “steelmanned” one, as it avoids problems like the ones in this post.
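Again purely as an illustration, using the numbers from your example (the function name and bounds are mine), here is what that interval utility looks like and why the agent would rather keep satisficing:

```python
# Illustrative sketch of the interval utility suggested above:
# U = 1 if 8 < E(paperclips) < 11, else 0.
def interval_utility(expected_paperclips, low=8, high=11):
    """Utility is 1 only while expected paperclips stay strictly inside (low, high)."""
    return 1 if low < expected_paperclips < high else 0

# Stopping anywhere inside the band scores 1; an unbounded maximiser that
# drives expected paperclips far past 11 scores 0, so this agent strictly
# prefers to remain a satisficer rather than self-modify.
print(interval_utility(9), interval_utility(10), interval_utility(1000))  # 1 1 0
```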
That’s just a utility maximiser with a bounded utility function.
But this has become a linguistic debate, not a conceptual one. One version of satisficers (the version I define, which some people intuitively share) will tend to become maximisers. Another version (the bounded utility maximisers that you define) already consists of maximisers. We both agree on these facts—so what is there to argue about but the linguistics?
Since satisficing is more intuitively than rigorously defined (there are multiple formal definitions on Wikipedia), I don’t think there’s anything more to dispute?
All right, I agree with that. It does seem like satisficers are (or quickly become) a subclass of maximisers by either definition.
Although I think the way I define them is not equivalent to a generic bounded maximiser. When I think of one of those, it’s something more like U = paperclips/(|paperclips|+1) than what I wrote (i.e. it still wants to maximize without bound; it’s just less interested in low probabilities of high gains), which would behave rather differently. Maybe I just have unusual mental definitions of both, however.
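For instance, as a toy comparison under that utility (the numbers and function name are just for illustration), a sure 10 paperclips is worth more than a 1% shot at a million, whereas an unbounded expected-paperclip maximiser would take the gamble:

```python
# Toy sketch of the bounded utility mentioned above: U = paperclips / (|paperclips| + 1).
def bounded_utility(paperclips):
    """Strictly increasing in paperclips, but saturates just below 1."""
    return paperclips / (abs(paperclips) + 1)

sure_ten = bounded_utility(10)                 # ~0.909
gamble   = 0.01 * bounded_utility(1_000_000)   # ~0.010 (1% chance of a huge pile)
print(sure_ten > gamble)  # True: it still always wants more, but discounts long shots
```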
Maybe bounded maximiser vs maximiser with cutoff, with the second being a special case of the first (since there are many ways to bound a utility function)?
Yes, that sounds good. I’ll try using those terms next time.