Would an AGI that only tries to satisfice a solution/goal be safer?
Do we have reason to believe that we can/can’t get an AGI to be a satisficer?
Do you mean something like “only get 100 paperclips, not more”?
If so: the AGI will never be sure it has exactly 100 paperclips, so it can take lots of precautions to be very, very sure, like turning the whole world into paperclip counters.
[I think this is more anthropomorphizing ramble than a concise argument. Feel free to ignore :) ]
I get the impression that in this example the AGI would not actually be satisficing. It is no longer maximizing a goal but still optimizing for this rule.
For a satisficing AGI, I’d imagine a vague goal like “Get many paperclips” resulting in the AGI collecting paperclips up to some point (an inflection point of diminishing marginal returns? a point where it becomes very uncertain about what the next action should be?) and then doing something else.
Or for rules like “get 100 paperclips, not more” the AGI might only directionally or opportunistically adhere. Within the rule, this might look like “I wanted to get 100 paperclips, but 98 paperclips are still better than 90, let’s move on” or “Oops, I accidentally got 101 paperclips. Too bad, let’s move on”.
In your example of the AGI taking lots of precautions, the satisficing AGI would not do this because it could be spending its time doing something else.
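To make that stopping rule concrete, here is a minimal toy sketch in Python. Everything in it is a made-up illustration rather than anything from the original discussion: the square-root utility function, the `marginal_threshold` parameter, and the function names are all hypothetical. The point is only that a satisficer keeps acquiring paperclips while the marginal gain stays above its threshold and then moves on, whereas a maximizer would keep going (and keep verifying) as long as any gain remains.

```python
# Toy sketch of a satisficing stopping rule (purely illustrative,
# not a claim about how a real AGI would be built or trained).

def utility(paperclips: int) -> float:
    """Toy utility with diminishing marginal returns: each extra paperclip is worth less."""
    return paperclips ** 0.5

def satisficer_stop_count(marginal_threshold: float = 0.05, max_steps: int = 10_000) -> int:
    """Acquire paperclips only while the marginal utility gain stays above the threshold."""
    count = 0
    while count < max_steps:
        marginal_gain = utility(count + 1) - utility(count)
        if marginal_gain < marginal_threshold:
            break  # "good enough" -- stop here and spend time on something else
        count += 1
    return count

if __name__ == "__main__":
    # With sqrt utility the marginal gain is roughly 1 / (2 * sqrt(n)),
    # so a threshold of 0.05 makes the satisficer stop around 100 paperclips.
    print(satisficer_stop_count())
```

With these toy numbers the agent stops at roughly 100 paperclips and moves on, which is the “good enough, let’s do something else” behavior described above; a maximizer on the same utility would never stop, since every additional paperclip still adds something.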
I suspect there are major flaws with it, but an intuition I have goes something like this:
Humans have, in some sense, decision-making capabilities similar to those of an early AGI.
The world is incredibly complex and humans are nowhere near understanding and predicting most of it. Early AGI will likely have similar limitations.
Humans are mostly not optimizing their actions, mainly because of limited resources, multiple goals, and a great deal of uncertainty about the future.
So early AGI might also end up not-optimizing its actions most of the time.
Suppose the world stays complex enough that the AGI continues to fail to completely understand and predict it. In that case, the advanced AGI will continue to not-optimize to some extent.
But it might look like near-complete optimization to us.
Just saw the inverse question was already asked and answered.