Are you familiar with primal and dual formulations of optimization problems?
Nope, sorry! Tell me! ^_^
Replacing it with −1000 if PA=0 should make it work, by the same reasoning that makes accelerations of ±2 never optimal.
But that doesn’t really capture our intuitions either—if, in a more general model, a human ended up dying by accident, we wouldn’t want the agent to pay the price for this each and every turn thereafter.
The short, rushed version (with links to longer versions that I think will be helpful to get what’s going on):
Linear programming problems are, to some extent, ‘all the same,’ so I’ll just talk about the classic example, the diet problem. You need to consume a minimum amount of certain macronutrients and calories, and you have access to many foods with varying nutritional value and cost. The primal formulation asks the question “what amounts of which foods should I buy in order to satisfy my nutritional requirements at minimum cost?” The dual formulation is a little more awkward to state, but one way to think about it is that it asks the question “what is the largest lower bound on the price of a diet that satisfies the nutritional requirements?”
The difference between the two optimal values is called the ‘duality gap.’ For linear programs it’s always 0 (so long as an optimal solution exists); that is, the largest lower bound is equal to the optimal cost.
(These links may also help as explanations (I’ll find book explanations later): Jeremy Kun (which steps through the diet example in more detail), wiki, Ghaoui.)
One thing that falls out of this is the interpretation of the decision variables in the dual formulation. We can express the primal linear programming problem as something like the following:
min c’x
s.t. Ax≥b
x≥0
(c is the cost vector, x is the amount of each food, s.t. is an abbreviation of ‘subject to’, A is the nutrition matrix, b is the vector of minimum nutritional requirements, and x≥0 means you can’t eat food in reverse.)
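To make that concrete, here’s a minimal sketch of the primal in code, assuming scipy is available; the three foods, their costs, and the nutrient numbers are made up purely for illustration:

import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 4.0])        # cost per unit of each of three foods
A = np.array([[40.0, 10.0, 30.0],    # protein per unit of each food
              [30.0, 50.0, 20.0]])   # calories per unit of each food (in tens)
b = np.array([60.0, 80.0])           # minimum protein and calorie requirements

# linprog minimizes c'x subject to A_ub @ x <= b_ub, so Ax >= b is passed as -Ax <= -b.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 3, method="highs")
print(primal.x, primal.fun)          # optimal amounts of each food, and the minimum cost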
The corresponding dual problem is the following:
max b’y
s.t. A’y≤c
y≥0
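Continuing the same made-up instance from the sketch above: linprog only minimizes, so the dual’s “max b’y” gets passed as “min −b’y”, and the two optimal values come out equal:

# Continuing the sketch above; A'y <= c is already in the form linprog expects.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")
print(dual.x)                        # the dual decision variables, one per primal constraint
print(primal.fun, -dual.fun)         # equal: the gap between the two formulations is zero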
But what the heck is y? It’s the shadow price corresponding to each constraint (notice that we use A transpose, instead of A, so we have as many dual variables as constraints in the primal problem). That is, only binding constraints have nonzero shadow prices, and the shadow price tells you how valuable it would be (in terms of your objective function) to be able to relax the limit on that constraint by 1 unit.
In terms of the diet problem, this tells you how much more cheaply you could eat if you reduced your minimum dietary intake of various nutrients, or how much more expensive it would be to increase them. (Remember, these are the constraints that are tight at the lower bound, and thus determine it.)
(Applied Mathematical Programming, Ch. 4)
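As a sanity check on that reading (still the same made-up instance): relax one requirement by a unit, re-solve, and compare the savings to the corresponding dual variable; for a small enough relaxation at a nondegenerate optimum the two numbers should agree:

# Continuing the sketch above: loosen the first nutritional requirement by one unit.
b_relaxed = b.copy()
b_relaxed[0] -= 1.0
relaxed = linprog(c, A_ub=-A, b_ub=-b_relaxed, bounds=[(0, None)] * 3, method="highs")
print(primal.fun - relaxed.fun)      # how much more cheaply you can now eat
print(dual.x[0])                     # the shadow price on that requirement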
So what’s the overall idea? If the AI is using normal optimization techniques, it can determine which of its constraints are binding and which are immaterial (it doesn’t matter if you increase the penalty for the dice flying off, because it won’t alter the optimal trajectory, but if you decrease the penalty for uncomfortable acceleration by enough, the plan will change so we can get there sooner). It seems possible to have the AI, if provided with expert trajectories, apply standard pattern-recognition techniques to identify boundaries, which it can then communicate as constraints with associated shadow prices.
“It looks to me like there’s an unstated penalty for accelerating more than 1 unit a turn that’s at least n units of reward. Is this a real constraint or just a lack of creativity on the part of the experts?”
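As a very rough sketch of what ‘identify boundaries’ could look like, here’s the crudest possible version; the trajectory format (a list of per-turn accelerations) and the threshold test are stand-ins I’m inventing, not a worked-out method:

# Hypothetical sketch: look for acceleration limits the experts never cross.
def apparent_acceleration_limit(expert_trajectories):
    """Largest acceleration magnitude any expert ever used."""
    return max(abs(a) for trajectory in expert_trajectories for a in trajectory)

expert_trajectories = [[1, 1, 0, -1, -1], [1, 0, 1, -1, -1]]  # made-up expert data
limit = apparent_acceleration_limit(expert_trajectories)
print(f"Experts never accelerate past {limit} unit(s) per turn; "
      "is that a real constraint, and what would it be worth to relax it?")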
(There’s some machine ethics work by Anderson and Anderson on building an ethical reasoning bot trained on a bunch of human expert decisions that managed to do casuistry, basically, where you would give it a new moral dilemma and it would be able to both tell you what action it thought was preferable and which cases it was using to make that determination.)
But that doesn’t really capture our intuitions either—if, in a more general model, a human ended up dying by accident, we wouldn’t want the agent to pay the price for this each and every turn thereafter.
This is a ‘pay for health’ rather than a ‘weregild’ system. I think it would seem intuitive to pay the robot 1000 reward every turn that the human is still alive, but that’s decision-theoretically equivalent to the case where you penalize it every turn that they’re dead. If you use time-discounting, an infinite series of payments (in either direction) can be neatly transformed into a lump sum; if you don’t want to use time-discounting but also don’t want infinite sums floating around, you can have it be −1000 for every turn the human is dead and the robot hasn’t yet reached the space station (so it cuts off when the −1s cut off).
[Unless, of course, you pay the robot 1000 reward for every turn the human is alive and hasn’t reached the space station, and a penalty of −1 every turn it doesn’t reach the station. That wouldn’t end well. :P]
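Here’s a small check of the pay-for-health/weregild equivalence claim above; the horizon, discount factor, and alive/dead indicator are assumptions added purely for illustration:

# The two schemes differ by a flat 1000 per turn no matter what the agent does,
# so they rank plans identically; with discounting that flat stream is a finite lump sum.
def discounted_total(per_turn_rewards, gamma=0.99):
    return sum(r * gamma**t for t, r in enumerate(per_turn_rewards))

alive = [1, 1, 1, 0, 0, 0, 0, 0]                 # made-up episode: the human dies partway through
pay_for_health = [1000 * a for a in alive]       # +1000 every turn the human is alive
weregild = [-1000 * (1 - a) for a in alive]      # -1000 every turn the human is dead

lump_sum = discounted_total([1000] * len(alive))
print(discounted_total(pay_for_health), discounted_total(weregild) + lump_sum)  # equal (up to float rounding)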