Very well put. I understood that line of reasoning from the very beginning though and didn’t disagree that complex goals need complex optimization parameters. But I was making a distinction between insufficient and unbounded optimization parameters, goal-stability and the ability or desire to override them. I am aware of the risk of telling an AI to compute as many digits of Pi as possible. What I wanted to say is that if time, space and energy are part of its optimization parameters then no matter how intelligent it is, it will not override them. If you tell the AI to compute as many digits of Pi as possible while only using a certain amount of time or energy for the purpose of optimizing and computing it then it will do so and hold. I’m not sure what is your definition of a ‘failsafe’ but making simple limits like time and space part of the optimization parameters sounds to me like one. What I mean by ‘optimization parameters’ are the design specifications of the subject of the optimization process, like what constitutes a paperclip. It has to use those design specifications to measure its efficiency and if time and space limits are part of it then it will take account of those parameters as well.
I’m not sure what is your definition of a ‘failsafe’ but making simple limits like time and space part of the optimization parameters sounds to me like one.
You also would have to limit the resources it spends to verify how near the limits it is, since it acts to get as close as possible as part of optimization. If you do not, it will use all resources for that. So you need an infinite tower of limits.
Very well put. I understood that line of reasoning from the very beginning though and didn’t disagree that complex goals need complex optimization parameters. But I was making a distinction between insufficient and unbounded optimization parameters, goal-stability and the ability or desire to override them. I am aware of the risk of telling an AI to compute as many digits of Pi as possible. What I wanted to say is that if time, space and energy are part of its optimization parameters then no matter how intelligent it is, it will not override them. If you tell the AI to compute as many digits of Pi as possible while only using a certain amount of time or energy for the purpose of optimizing and computing it then it will do so and hold. I’m not sure what is your definition of a ‘failsafe’ but making simple limits like time and space part of the optimization parameters sounds to me like one. What I mean by ‘optimization parameters’ are the design specifications of the subject of the optimization process, like what constitutes a paperclip. It has to use those design specifications to measure its efficiency and if time and space limits are part of it then it will take account of those parameters as well.
You also would have to limit the resources it spends to verify how near the limits it is, since it acts to get as close as possible as part of optimization. If you do not, it will use all resources for that. So you need an infinite tower of limits.