I think there are at least three different things being called “the utility function” here, and that’s causing confusion:
The utility function as specified in the software, mapping possible worlds to values. Let’s call this S.
The utility function as it is implemented running on actual hardware. Let’s call this H.
A representation of the utility function that can be passed as data to a black box optimizer. Let’s call this R.
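Roughly, in code (an illustrative sketch of my own; the bodies are just stand-ins):

```python
import inspect

# S: the utility function as specified, a pure mapping from (a representation
# of) a possible world to a value.  The body here is only a stand-in.
def S(world):
    return sum(world)

# H: the same function as it actually executes on physical hardware.  A rare
# hardware flaw (modeled here as an explicit flag) can make it deviate from S.
def H(world, flaw_triggered=False):
    return float("inf") if flaw_triggered else S(world)

# R: whatever data gets handed to a black-box optimizer as "the utility
# function", e.g. a callable, its source code, or a hardware description.
R = inspect.getsource(S)
```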
You seem to be saying that in the software design of your AI, R = H. That is, that the black box will be given some data representing the AI's hardware and other constraints, and return a possible world maximizing H.
From my point of view, that’s already a design fault. The designers of this AI want S maximized, not H. The AI itself wants S maximized instead of H in all circumstances where the hardware flaw doesn’t trigger. Who chose to pass H into the optimizer?
> You seem to be saying that in the software design of your AI, R = H. That is, that the black box will be given some data representing the AI's hardware and other constraints, and return a possible world maximizing H.
> From my point of view, that's already a design fault.
I agree; this is a design flaw. The issue is, I have yet to come across any optimization, planning algorithm, or AI architecture that doesn’t have this design flaw.
That is, I don’t know of any AI architecture that does not involve using a potentially hardware-bug-exploitable utility function as input into some planning or optimization problem. And I’m not sure there even is one.
In the rest of this comment I’ll just suggest approaches and show how they are still vulnerable to the hardware-bug-exploitation problem.
I have some degree of background in artificial intelligence, and the planning and optimization algorithms I’ve seen take the function to be maximized as an input parameter. Then, when people want to make an AI, they just call that planning or optimization algorithm with their (hardware-bug-exploitable) utility or cost functions. For example, suppose someone wants to make a plan that minimizes cost function f in search space s. Then I think they just directly do something like:
return a_star(f, s)
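For concreteness, here is roughly the kind of planner I mean (a simplified sketch, closer to greedy best-first search than full A*, and the (start, is_goal, neighbors) interface for s is made up). The point is that the planner's only contact with f is calling it and comparing the returned numbers:

```python
import heapq, itertools

def a_star(f, s):
    # Simplified sketch: greedy best-first search rather than full A*, since
    # the path-cost bookkeeping isn't the point here.  s is assumed to be a
    # (start, is_goal, neighbors) triple; this interface is made up.
    start, is_goal, neighbors = s
    tie = itertools.count()  # tiebreaker so states themselves never get compared
    frontier = [(f(start), next(tie), start, [start])]
    visited = set()
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if state in visited:
            continue
        visited.add(state)
        for nxt in neighbors(state):
            if nxt not in visited:
                # The planner's only contact with the cost function is this
                # call: it sees the number f returns, and nothing about how
                # the hardware computed it.
                heapq.heappush(frontier, (f(nxt), next(tie), nxt, path + [nxt]))
    return None
```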
And this doesn’t provide any protection from hardware-level exploitation.
Now, correct me if I’m wrong, but it seems you’re thinking of the AI first doing some pre-processing to find an input to the planning or optimization algorithm that is resistant to hardware-bug exploitation.
But how do you actually do that? You could regard the input the AI puts into the optimization function as a choice it makes. But how does it make this choice? The only thing I can think of is having a planning or optimization algorithm figure out what function to use as the input to the optimization or planning algorithm.
But if you need to use a planning or optimization algorithm to do this, then what utility function do you pass into this planning or optimization algorithm? You could try to pass the actual, current, hardware-bug-exploitable utility function. But then this doesn’t resolve the problem of hardware-bug exploitation: when coming up with a utility function to input to the optimization, the AI may find an input that itself scores very high only because of hardware bug exploitation.
To describe the above more concretely, you could try doing something like this:
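Roughly the following (the names, planning_algorithm, current_utility, candidate_utilities, and search_space, are just placeholders, not a real API):

```python
# Rough sketch only; all names are placeholders.

def act(current_utility, candidate_utilities, planning_algorithm, search_space):
    # Step 1: pick which utility function to feed into the planner.  Each
    # candidate is scored by planning with it and then judging the resulting
    # plan with the AI's current (hardware-bug-exploitable) utility function.
    reasonable_utility_function_use = max(
        candidate_utilities,
        key=lambda candidate: current_utility(
            planning_algorithm(candidate, search_space)
        ),
    )
    # Step 2: plan for real with the utility function chosen in step 1.
    return planning_algorithm(reasonable_utility_function_use, search_space)
```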
That is, the AI above uses its own utility function to pick out a utility function to use as input to its planning algorithm.

As you can see, the above code is still vulnerable to hardware-bug exploitation. This is because it scores the candidate utility functions with its hardware-bug-exploitable utility function. Thus, the output, reasonable_utility_function_use, might be very wrong due to hardware bug exploitation having been used to come up with it.
Now, you might have had some other idea in mind. I don’t know of a concrete way to get around this problem, so I’m very interested to hear your thoughts.
My concern is that people will figure out how to make powerful optimization and planning algorithms without first figuring out how to fix this design flaw.
> The issue is, I have yet to come across any optimization, planning algorithm, or AI architecture that doesn’t have this design flaw.
Yes you have. None of these optimization procedures analyze the hardware implementation of a function in order to maximize it.
The rest of your comment is irrelevant, because what you have been describing is vastly worse than merely calling the function. If you merely call the function, you won’t find these hardware exploits. You only find them when analyzing the implementation. But the optimizer isn’t given access to the implementation details, only to the results.
If you prefer, you can cast the problem in terms of differing search spaces. As designed, the function U maps representations of possible worlds to utility values. When optimizing, you make various assumptions about the structure of the function—usually assumed to be continuous, sometimes differentiable, but in particular you always assume that it’s a function of its input.
The fault means that under some conditions that are extremely unlikely in practice, the value returned is not a function of the input alone. It’s a function of the input and the history of the hardware implementing it. There is no way for the optimizer to determine this, or anything about the conditions that might trigger it, because they are outside its search space. The only way to get an optimizer that searches for such hardware flaws is to design it to search for them.
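To illustrate with a toy model (mine, not the actual fault):

```python
# Toy model of the fault: the value returned depends on the hardware's
# history, not only on the input the optimizer controls.
class FlawedHardwareUtility:
    def __init__(self):
        self.evaluations = 0  # hardware history, invisible to the optimizer

    def __call__(self, world):
        self.evaluations += 1
        if self.evaluations == 10**12:  # extremely rare, input-independent trigger
            return float("inf")         # the flaw, not the specification
        return sum(world)               # the intended U(world)

# A black-box optimizer that only evaluates this callable over its search
# space of worlds will almost certainly only ever see the intended values;
# the trigger lies outside the space it searches over.
```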
In other words, you would have to pass the hardware design, not just the results of evaluation, to a suitably powerful optimizer.