On the topic of comparing controllers to utility functions: how does a controller decide what kinds of probabilistic tradeoffs are worth making? For instance, if you have a utility function, it's straightforward to determine whether you prefer, say, a choice that creates X1 new lives with probability P_x1 and kills Y1 people with probability P_y1, versus a choice that creates X2 new lives with probability P_x2 and kills Y2 people with probability P_y2. How does one model that choice in a control theory framework?
I see two main challenges. First, we need to somehow encode distributions, and second, we need to look ahead. Both are doable, but it's worth stating explicitly that the bread and butter of utility maximization (considering probabilistic gambles and looking ahead to the future) has to be built into the control theory framework, and can be built in a number of different ways. (If we have a scenario where it's easy to enumerate the choice set, or at least the rules that generate it, and it's also easy to express the preference function, then utility is the right approach to take.)
The closest analogue to the utility framework is probably to treat probability distributions over outcomes as the states; the 'error' is then a measure of how much one distribution differs from the distribution we're shooting for. Possible actions are fed into a simulator circuit that spits out the expected distribution. We could basically express this problem as "minimize opportunity cost while pursuing many options": if we ever simulate a plan and find it better than our current best plan, we replace the current best plan; if we simulate a plan and it's not better, we look for a new plan to simulate. (You'd also likely bake in some stopping criterion.)
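To make that loop concrete, here is a minimal sketch in Python. All the names here (`generate_plan`, `simulate`, the total-variation error measure, the iteration budget as the stopping criterion) are illustrative assumptions, not a settled design:

```python
def error(pmf, reference):
    """Total variation distance between a candidate pmf and the reference pmf,
    one of many possible measures of how much two distributions differ."""
    outcomes = set(pmf) | set(reference)
    return 0.5 * sum(abs(pmf.get(o, 0.0) - reference.get(o, 0.0)) for o in outcomes)

def plan_search(generate_plan, simulate, reference, max_iters=100):
    """Keep the best plan found so far; replace it whenever a newly simulated
    candidate yields a distribution closer to the reference distribution."""
    best_plan, best_error = None, float("inf")
    for _ in range(max_iters):            # stopping criterion: iteration budget
        candidate = generate_plan()       # propose a possible action/plan
        err = error(simulate(candidate), reference)
        if err < best_error:              # candidate beats current best: switch
            best_plan, best_error = candidate, err
    return best_plan
```

Note that the loop is anytime: `best_plan` is always a usable answer, which matters for the continuous-time point made below.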
So it would probably look at choice 1, encode its discrete pmf as the reference state, then look at choice 2, and decide whether the error is positive (in which case it switches to choice 2) or negative (in which case it acts on choice 1). But in order to compare pmfs and get a sense of positive or negative, I need some mathematical function, which would be the utility function in the utility framework.
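One hedged sketch of such a comparison function, using the lives-created/lives-lost example from the question: collapse each pmf to a scalar via a value function, and take the signed difference as the error. The value function and the probabilities below are made up purely for illustration:

```python
def expected_value(pmf, value):
    """Collapse a pmf over outcomes to a scalar; this ordering over scalars
    is exactly the role the utility function plays in the utility framework."""
    return sum(p * value(outcome) for outcome, p in pmf.items())

# Outcomes are hypothetical (lives_created, lives_lost) pairs.
value = lambda outcome: outcome[0] - outcome[1]

choice_1 = {(3, 0): 0.7, (0, 2): 0.3}   # X1=3 with P_x1=0.7, Y1=2 with P_y1=0.3
choice_2 = {(5, 0): 0.5, (0, 1): 0.5}   # X2=5 with P_x2=0.5, Y2=1 with P_y2=0.5

# Signed 'error' between the two choices: positive means switch to choice 2.
err = expected_value(choice_2, value) - expected_value(choice_1, value)
print(err)  # 2.0 - 1.5 = 0.5, so the controller would switch to choice 2
```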
We also might notice that this makes it easy for endowment-effect problems to creep in: if none of the options is obviously better than the others, the controller defaults to whichever one came first. On the flip side, it makes it easy to start acting on the first mediocre plan we come across, and then abandon that plan if a better one shows up. That is, this approach is better suited to operating in continuous time than a "plan, then act" utility maximization framework.
Thanks, that makes sense.