Are you convinced that an AI will probably pursue the goals discussed in this section?
AI systems need not be architected to optimize a fully-specified, narrowly defined utility function at all.
Where does the presumption that an AGI necessarily becomes an unbounded optimizer come from if it is not architected that way? Remind me, because I am confused. Tools, oracles, and these neuromorphic laborers we talked about before do not seem to have this bug (although maybe they could develop something like it).
My reading is that Bostrom is saying boundless optimization is an easy bug to introduce, not that any AI has it automatically.
I wouldn’t call it a bug, generally. Depending on what you want your AI to do, it may very well be a feature; it’s just that there are consequences, and you need to take those into account when deciding what, and how much, you need the AI’s final goals to do in order to get a good outcome.
I think I see what you’re saying, but I am going to go out on a limb here and stick by “bug.” Unflagging, unhedged optimization of a single goal seems like an error, no matter what.
Please continue to challenge me on this, and I’ll try to develop this idea.
Approach #1:
I am thinking that in practical situations, single-mindedness does not actually even achieve the ends of a single-minded person; it leads them in the wrong direction.
Suppose the goals and values of a person or a machine are entirely single-minded (for instance, “I only eat, sleep, and behave ethically so I can play Warcraft or do medical research for as many years as possible, until I die”) and all other goals are merely “instrumental.”
I am inclined to believe that if they allocated their cognitive resources in that way, such a person or machine would run into all kinds of problems very rapidly, and fail to accomplish their basic goal.
If you are constantly asking “but how does every small action I take fit into my Warcraft-playing?” then you’re spending too much effort on constant re-optimization, and not enough on action.
Trying to optimize all of the time costs a lot. That’s why we use rules of thumb for behavior instead.
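To make the tradeoff concrete, here is a toy sketch (nothing from Bostrom; the situations, actions, utilities, and replanning interval are all invented) of an agent that acts from cached rules of thumb most of the time and only pays for full re-optimization occasionally:

```python
import random

class HeuristicAgent:
    """Toy agent: acts from cached rules of thumb, re-optimizes only occasionally.

    The situations, actions, utilities, and replanning interval are invented
    placeholders; the point is only the cost structure of the tradeoff.
    """

    def __init__(self, replan_every=50):
        self.replan_every = replan_every  # how rarely to pay for full optimization
        self.cached_policy = {"hungry": "eat", "tired": "sleep", "otherwise": "play Warcraft"}
        self.steps = 0

    def estimate_utility(self, action, situation):
        # Stand-in for a real utility model; here it is just noise.
        return random.random()

    def expensive_optimize(self, situation):
        # Stand-in for a costly search over all actions against the top-level goal.
        candidates = ["eat", "sleep", "play Warcraft", "practice", "socialize"]
        return max(candidates, key=lambda a: self.estimate_utility(a, situation))

    def act(self, situation):
        self.steps += 1
        if self.steps % self.replan_every == 0:
            # Occasionally refresh the rule of thumb for this situation.
            self.cached_policy[situation] = self.expensive_optimize(situation)
        # Most of the time: a cheap lookup, no deliberation at all.
        return self.cached_policy.get(situation, self.cached_policy["otherwise"])

agent = HeuristicAgent()
print(agent.act("hungry"))  # "eat", straight from the cached rule of thumb
```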
Even if all you want is to be an optimal Warcraft player, it’s better to just designate some time and resources for self-care or for learning how to live effectively with the people who can help. The optimal player would really focus on self-care or social skills during that time, and stop imagining Warcraft games for a while.
While the optimal Warcraft player is learning social skills, learning social skills effectively becomes her primary objective. For all practical purposes, she has swapped utility functions for a while.
Now let’s suppose we’re in the middle of a game of Warcraft. To be an optimal Warcraft player for more than one game, we also have to have a complex series of interrupts and rules (smell smoke, may screw up an important relationship, may lose job and therefore not be able to buy a new joystick).
If you smell smoke, the better mind architecture seems to involve swapping out the larger goal of Warcraft-playing in favor of extreme focus on dealing with the possibility that the house is burning down.
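A minimal sketch of that kind of interrupt-driven goal swapping, with invented conditions, priorities, and replacement goals (not a claim about how a real agent would be built):

```python
# Toy interrupt table: the agent pursues its primary goal until a higher-priority
# condition fires, then temporarily swaps that goal out. The conditions,
# priorities, and replacement goals are all invented for illustration.
INTERRUPTS = [
    # (condition name, priority, replacement goal)
    ("smell_smoke", 100, "deal with possible house fire"),
    ("relationship_at_risk", 50, "repair important relationship"),
    ("job_at_risk", 40, "keep the job that pays for the hardware"),
]

def choose_active_goal(primary_goal, observations):
    """Return the goal to pursue right now, given current observations."""
    fired = [(priority, goal) for name, priority, goal in INTERRUPTS if observations.get(name)]
    if fired:
        # The highest-priority interrupt temporarily replaces the primary goal.
        return max(fired)[1]
    return primary_goal

print(choose_active_goal("win this Warcraft game", {}))                     # normal play
print(choose_active_goal("win this Warcraft game", {"smell_smoke": True}))  # goal swapped
```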
Approach #2: Perhaps finding the perfect goal is impossible; goals must be discovered and designed over time. Goal-creation is subject to bounded rationality, so perhaps a superintelligence, like people, would incorporate a goal-revision algorithm on purpose.
Approach #3: Goals may derive from first principles which are arrived at non-rationally (I did not say irrationally, there is a difference). If a goal is non-rational, and its consequences have yet to be fully explored, then there is a non-zero probability that, at a later time, this goal may prove self-inconsistent, and have to be altered.
Under such circumstances, single-minded drives risk disaster.
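Purely as an illustration of Approaches #2 and #3 (the goal and the “discoveries” are made up, and a contradiction here just means deriving both X and “not X”): a goal can be held provisionally while its consequences are explored, and flagged for revision the moment those consequences contradict each other.

```python
# Toy illustration of Approaches #2 and #3: a goal is held provisionally while
# its consequences are explored, and it is flagged for revision as soon as two
# explored consequences contradict each other. The goal and the "discoveries"
# are invented examples.

def still_consistent(consequences):
    return not any(("not " + c) in consequences for c in consequences)

goal = "do medical research for as many years as possible"
explored = set()
for discovery in ["automate every experiment",
                  "keep human oversight",
                  "not keep human oversight"]:  # a later discovery conflicts with an earlier one
    explored.add(discovery)
    if not still_consistent(explored):
        print(f"Goal '{goal}' has proved self-inconsistent and needs to be altered.")
        break
```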
Approach #4:
Suppose the system is designed in some way to be useful to people. It is very difficult to come up with unambiguous, airtight, consistent goals in this realm.
If a goal has anything to do with pleasing people, what they want changes unpredictably with time. Changing the landscape of an entire planet, for example, would not be an appropriate response for an AI that was very driven to please its master, even if the master claimed that was what they really wanted.
I am still exploring here, but I am veering toward thinking that utility function optimization, in any pure form, just plain yields flawed minds.
You’re talking about runtime optimizations. Those are fine. You’re totally allowed to run some meta-analysis, figure out you’re spending more time on goal-tree updating than the updates gain you in utility, and scale that process down in frequency, or even make it dependent on how much CPU time you need for time-critical operations at a given moment. Agents with bounded computational resources will never have enough CPU time to compute provably optimal actions in any case (the problem is uncomputable), so how much you spend on computation before you draw the line and act on your best guess is always a tradeoff you need to make. This doesn’t mean your ideal top-level goals (the ones you’re trying to implement as best you can) can’t maximize.
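To make the runtime-optimization point concrete, here is a hedged sketch of that kind of meta-level control: the top-level goal stays fixed, while the agent adjusts how often the expensive goal-tree update runs, based on measured cost versus benefit. The doubling/halving rule and the bounds are placeholders, not a claim about how a real system would tune this.

```python
class MetaController:
    """Sketch of meta-level control: the top-level goal never changes, but the
    frequency of the expensive goal-tree update does. The doubling/halving rule
    and the bounds are arbitrary placeholders."""

    def __init__(self, update_interval=10):
        self.update_interval = update_interval  # steps between goal-tree updates
        self.steps_since_update = 0

    def maybe_update(self, last_update_cost, last_update_gain, time_critical):
        self.steps_since_update += 1
        if time_critical or self.steps_since_update < self.update_interval:
            return False  # spend the CPU time on acting, not on re-planning

        # Meta-analysis: was the last update worth what it cost? Adjust frequency.
        if last_update_gain < last_update_cost:
            self.update_interval = min(self.update_interval * 2, 1000)  # re-plan less often
        else:
            self.update_interval = max(self.update_interval // 2, 1)    # re-plan more often

        self.steps_since_update = 0
        return True  # run the goal-tree update now

# If updates keep costing more than they gain, they get rarer and rarer.
mc = MetaController()
schedule = [step for step in range(200) if mc.maybe_update(5.0, 1.0, time_critical=False)]
print(schedule)  # [9, 29, 69, 149]: updates get spaced further and further apart
```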
Approach #2: May want more goals
For this to work, you’d still need to specify exactly how that algorithm works: how you can tell good new goals from bad ones. Once you do, this turns into yet another optimization problem you can install as a (or the only) final goal, and have it produce subgoals as you continue to evaluate it.
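As a toy sketch of that idea (the candidate goals, the world model, and the scoring criterion are entirely made up): the only final goal is the fixed criterion itself, and the concrete subgoals the agent pursues are just its current best outputs under it.

```python
# Toy sketch: the only *final* goal is the fixed criterion `goal_quality`;
# the concrete subgoals the agent pursues are just its current best outputs.
# The candidate goals, the world model, and the scoring are entirely made up.

def goal_quality(goal, world_model):
    """The installed final criterion: how good is it to adopt this goal now?
    (Placeholder scoring; specifying this well is the hard part.)"""
    return world_model.get(goal, 0.0)

def revise_subgoals(candidates, world_model, keep=2):
    """Re-derive the working subgoals from the fixed top-level criterion."""
    ranked = sorted(candidates, key=lambda g: goal_quality(g, world_model), reverse=True)
    return ranked[:keep]

world_model = {"cure a disease": 0.9, "win at Warcraft": 0.2, "learn social skills": 0.6}
print(revise_subgoals(["cure a disease", "win at Warcraft", "learn social skills"], world_model))
# The subgoals change as the world model changes; the criterion that produced them does not.
```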
Approach #3: Derive goals?
I may not have understood this at all, but are you talking about something like CEV? In that case, the details of what should be done in the end do depend on fine details of the environment which the AI would have to read out and (possibly expensively) evaluate before going into full optimization mode. That doesn’t mean you can’t just encode the algorithm of how to decide what to ultimately do as the goal, though.
Approach #4: Humans are hard.
You’re right; it is difficult! Especially so if you want it to avoid wireheading (the humans, not itself) and brainwashing, to keep society working indefinitely, and to not accidentally squash even a few important values. It’s also known as the FAI content problem. That said, I think solving it is still our best bet when choosing what goals to actually give our first potentially powerful AI.
The problem is that it’s easy for any utility function to become narrow, or to be narrowly interpreted.