Rather, where the utility function is simple AND the program is stupid. Paperclippers are not useful things.
I was thinking of current top chess programs as smart (well above average humans), with simple utility functions.
Reinforcement-based utility definition plus difficult games with well-defined winning conditions seems to constitute a counterexample to this principle (a way of doing AI that won’t hit the wall you described).
This is a good example, but it might not completely explain it away.
Can we, by hand or by algorithm, construct a utility function that does what we want, even when we know exactly what we want?
I think you could still have a situation in which a smarter agent does worse because its learned utility function does not match the winning conditions (its learned utility function would constitute a created subgoal of “maximize reward”).
Learning about the world and constructing subgoals would probably be part of any near-human AI. I don’t think we have a way to construct reliable subgoals, even with a rules-defined supergoal and perfect knowledge of the world. (Such a process would be a huge boon for FAI.)
Likewise, I don’t think we can be certain that the utility functions we create by hand would reliably lead a high-intelligence AI to seek the goal we want, even for well-defined tasks.
A smarter agent might have the advantage of learning the winning conditions faster, but if it is comparatively better at implementing a flawed utility function than it is at fixing its utility function, then it could be outpaced by stupider versions, and you’re working more in an evolutionary design space.
So I think it would hit the same kind of wall, at least in some games.
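Here is a toy sketch of the failure I have in mind (everything in it is invented for illustration: the “trap” moves, the noise levels, the sampling — it is not modeled on any real engine). The “smart” agent implements its flawed learned utility more thoroughly and so reliably lands on moves that utility overrates, while the weaker optimizer mostly doesn’t:

```python
import random

random.seed(0)

# Each candidate move has a true value (does it actually help win?) and a
# proxy score (what the agent's learned utility function assigns to it).
# The proxy tracks the true value for ordinary moves, but its highest
# scores go to moves that are actually bad -- the learned utility function
# is flawed exactly where a strong optimizer pushes hardest.
def make_moves(n=100):
    moves = []
    for _ in range(n):
        true_value = random.gauss(0, 1)
        proxy = true_value + random.gauss(0, 0.5)  # usually a decent guide
        moves.append((true_value, proxy))
    for _ in range(3):                             # a few "trap" moves:
        moves.append((-2.0, 5.0))                  # look great, are bad
    return moves

def strong_optimizer(moves):
    """Better at implementing the flawed utility: exact argmax of the proxy."""
    return max(moves, key=lambda m: m[1])

def weak_optimizer(moves, k=5):
    """Worse at implementing the same flawed utility: best of a small sample."""
    return max(random.sample(moves, k), key=lambda m: m[1])

trials = 5000
strong_score = weak_score = 0.0
for _ in range(trials):
    moves = make_moves()
    strong_score += strong_optimizer(moves)[0]  # judged by TRUE value
    weak_score += weak_optimizer(moves)[0]

print("strong optimizer, average true value:", strong_score / trials)
print("weak optimizer, average true value:  ", weak_score / trials)
# The strong optimizer finds a trap move almost every trial; the weak one
# rarely samples a trap, so the "stupider" agent does better on the real goal.
```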
I meant the AI to be limited to the formal game universe, which should be easily feasible for non-superintelligent AIs. In this case, smarter agents always have an advantage, and maximization of reward is the same as the intended goal.
A smarter agent might have the advantage of learning the winning conditions faster, but if it is comparatively better at implementing a flawed utility function than it is at fixing its utility function, then it could be outpaced by stupider versions, and you’re working more in an evolutionary design space.
Answer is here; thinking out loud is below.

If you give the AI a perfect utility function for a game, it still has to break down subgoals and seek those. You don’t have a good general theory for making sure your generated subgoals actually serve your supergoals, but you’ve tweaked things enough that it’s actually very good at achieving the ‘midlevel’ things.
When you give it something more complex, it improperly breaks down the goal into faulty subgoals that are ineffective or contradictory, and then effectively carries out each of them. This yields a mess.
At this point you could improve some of the low level goal-achievement and do much better at a range of low level tasks, but this wouldn’t buy you much in the complex tasks, and might just send you further off track.
If you understand that the complex subgoals are faulty, you might be able to re-patch it, but this might not help you solve different problems of similar complexity, let alone more complex problems.
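To make “faulty subgoals, effectively carried out” concrete, here is a small invented example (the world, the decomposition heuristic, and the numbers are all mine, purely for illustration, not anyone’s actual planner). Each step is locally optimal for some subgoal, yet the supergoal never gets satisfied:

```python
# The supergoal "be at square 10 while holding the package" is decomposed
# into two subgoals that are each pursued very competently but pull in
# opposite directions, so the combined behaviour is a mess.

GOAL_SQUARE = 10
PACKAGE_SQUARE = -5

def decompose_supergoal():
    # Naive decomposition: one subgoal per clause, with no check that the
    # subgoals are compatible or correctly ordered.
    return [
        lambda pos: abs(pos - GOAL_SQUARE),     # "get to the goal square"
        lambda pos: abs(pos - PACKAGE_SQUARE),  # "get to the package"
    ]

def best_step(pos, subgoal_cost):
    # A competent low-level optimizer: takes the step that best serves
    # whichever subgoal it was handed.
    return min((pos - 1, pos, pos + 1), key=subgoal_cost)

pos, holding = 0, False
subgoals = decompose_supergoal()
for step in range(40):
    pos = best_step(pos, subgoals[step % len(subgoals)])  # serve subgoals in turn
    if pos == PACKAGE_SQUARE:
        holding = True

print("final square:", pos, "holding package:", holding)
print("supergoal achieved:", pos == GOAL_SQUARE and holding)
# The agent just oscillates around its starting square: every move is
# "effective" for one subgoal, and the overall result is failure.
```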
What led me to this answer:
Thinking deeply until you get eaten by a sabertooth is not smart.
There may not be a trade-off at play here. For example: at each turn you give the AI indefinite time and memory to learn all it can from the information it has so far, and to plan. (Limited by your patience and budget, but let’s handwave that computational resources are cheap, and every turn the AI comes in well below its resource limit.)
You have a fairly good move optimizer that can achieve a wide range of in-game goals, and a reward modeler that tries to learn what it is supposed to do and updates the utility function.
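A minimal sketch of that setup, to pin down what I mean (the class names, the running-average update, and the moves/rewards are all made up; this isn’t a real engine or library): a reward modeler that re-fits a utility function to the history each turn, and a move optimizer that, given as much time as it wants, maximizes that learned utility.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]  # (move played, reward observed afterwards)

class RewardModeler:
    """Tries to learn what the game rewards and turns it into a utility function."""
    def __init__(self):
        self.estimates = {}  # move -> running estimate of its reward

    def update(self, history: History) -> Callable[[str], float]:
        for move, reward in history:
            old = self.estimates.get(move, 0.0)
            self.estimates[move] = 0.9 * old + 0.1 * reward
        # The learned utility function is only as good as the data and the
        # modeling assumptions -- this is where a flawed concept of the
        # reward criteria can sneak in.
        return lambda move: self.estimates.get(move, 0.0)

class MoveOptimizer:
    """Given effectively unlimited thinking time, maximizes the *learned* utility."""
    def choose(self, legal_moves: List[str], utility: Callable[[str], float]) -> str:
        return max(legal_moves, key=utility)

def play_turn(legal_moves: List[str], history: History,
              modeler: RewardModeler, optimizer: MoveOptimizer) -> str:
    utility = modeler.update(history)              # learn from everything so far
    return optimizer.choose(legal_moves, utility)  # then plan against what was learned

# Usage with made-up moves and rewards:
modeler, optimizer = RewardModeler(), MoveOptimizer()
history: History = [("e4", 0.6), ("a3", 0.1), ("e4", 0.7)]
print(play_turn(["e4", "a3", "d4"], history, modeler, optimizer))  # -> "e4"
```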
I meant the AI to be limited to the formal game universe, which should be easily feasible for non-superintelligent AIs. In this case, smarter agents always have an advantage, and maximization of reward is the same as the intended goal.
But how do they know how to maximize reward? I was assuming they have to learn the reward criteria. If they have a flawed concept of those criteria, they will seek non-reward.
If the utility function is one and the same as winning, then the (see Top)
End-of-conversation status:

I don’t see a clear argument, and failing that, I can’t place confidence in a clear, lawful conclusion (that AGI hits a wall). I don’t think this line of inquiry is worthwhile.