Ideally, this is where I would exhibit some example that demonstrates the utility of thinking this way: an ethical problem that utilitarianism can’t answer well but a control theory approach can, or a self-help or educational problem that other methods couldn’t resolve and this method can.
So I’m not sure whether this is actually correct, and I could be entirely off, but could the control theory approach be relevant for problems like the following:
1. If you have an unbounded utility function, your expected utility may not converge (a toy illustration of this follows the list).
2. If you have a bounded utility function, you may consider a universe with (say) 10^18 tortured people to be equally bad as a universe with any higher number of tortured people.
3. Conversely, if you have a bounded utility function, you may consider a universe with (say) 10^18 units of positive utility to be equally good as a universe with any higher number of good things.
4. If you do have some clear, specific goal (e.g. build a single paperclip factory), then after that goal has been fulfilled, you may keep building more paperclip factories just in case there was something wrong with the first factory, or your sense data is mistaken and you haven’t actually built a factory, and so on.
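As a toy illustration of problem 1, consider a St. Petersburg-style gamble in which outcome n has probability 2^-n and utility 2^n; every outcome contributes exactly 1 to the expected utility, so the partial sums grow without bound (the numbers here are purely illustrative):

```python
# St. Petersburg-style gamble: outcome n has probability 2**-n and utility 2**n.
# Each term contributes exactly 1, so expected utility diverges as more outcomes are included.

def partial_expected_utility(n_terms):
    return sum((2.0 ** -n) * (2.0 ** n) for n in range(1, n_terms + 1))

for n in (10, 100, 1000):
    print(n, partial_expected_utility(n))  # prints 10.0, 100.0, 1000.0 (no limit)
```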
Intuitively, it seems to me that human goal-directed behavior works by some mechanism bringing either desirable or undesirable things into our mental awareness, with the achievement or elimination of that thing then becoming the reference towards which feedback is applied. This kind of architecture might then help fix problems 2 and 3, in that if an AI becomes aware of there existing more bad things, or there being the potential for more good things, it would begin to move towards fixing that, independent of how many other good things already existed. Problem 4 is trickier, but might be related to there being some set of criteria governing whether or not possibilities are brought into mental awareness. Does this make sense?
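In very rough pseudocode, the kind of loop being gestured at might look something like this (the salience scores, tolerance, and simple proportional correction are all placeholder choices, not anything specified above):

```python
# Toy sketch: the currently most salient concern becomes the reference,
# feedback is applied until the error is small enough, then attention moves on.
# The salience values, tolerance, and proportional step are placeholder choices.

def pursue(awareness, world, gain=0.5, tolerance=0.01, max_steps=1000):
    while awareness:
        goal = max(awareness, key=lambda item: item["salience"])  # most salient concern
        for _ in range(max_steps):
            error = goal["target"] - world[goal["name"]]
            if abs(error) <= tolerance:          # close enough: this concern is resolved
                break
            world[goal["name"]] += gain * error  # crude corrective action
        awareness.remove(goal)                   # attention moves on to the next concern
    return world

world = {"tortured_people": 5.0, "paperclip_factories": 0.0}
awareness = [
    {"name": "tortured_people", "target": 0.0, "salience": 0.9},
    {"name": "paperclip_factories", "target": 1.0, "salience": 0.4},
]
print(pursue(awareness, world))
```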
This does look like a fruitful place to look, but one of the main problems with demonstrating superiority here is that the two systems can emulate each other pretty well. Claims of superiority typically take the form of “X seems more intuitive” or “I can encode X in less space using this structure” rather than “X comes to a different, better conclusion.” For example:
If you have a bounded utility function, you may consider a universe with (say) 10^18 tortured people to be equally bad as a universe with any higher number of tortured people
You can have asymptotic bounds that “mostly” solve this problem, or at least solve it about as well as a controller would.
For example, suppose my utility as a function of the number of people alive is the logistic function (with x0 set to, say, 1,000 or 1,000,000). Then I prefer a world where X1 people are alive to a world where X2 people are alive whenever X1 > X2, but the utility is bounded above by 1 and has nice global properties.
Basically, it smooths together the “I would like more people to be alive” desire and the “I would like humanity to continue” desire in a continuous fashion, such that a 50-50 flip that doubles the human population (and wealth and so on) on heads and eliminates them on tails looks like a terrible idea (despite being neutral if your utility function is linear in the number of humans alive). I’m not sure that the logistic has the local behavior that we would want at any particular population size, but something like it probably does.
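As a minimal sketch of that comparison (the steepness k, the value of x0, and the population figures are placeholder numbers chosen purely for illustration):

```python
import math

def logistic_utility(population, x0=1_000_000, k=1e-5):
    """Bounded utility in (0, 1) that is strictly increasing in population."""
    z = k * (population - x0)
    # Numerically stable logistic: avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

current = 8_000_000_000  # rough present population, for illustration

# 50-50 flip: double the population on heads, extinction on tails.
gamble = 0.5 * logistic_utility(2 * current) + 0.5 * logistic_utility(0)

print(logistic_utility(current))  # ~1.0
print(gamble)                     # ~0.5, so the flip looks terrible
# Under a utility linear in population, the same flip is exactly neutral:
# 0.5 * (2 * current) + 0.5 * 0 == current
```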
The way a controller typically handles this is via an upper bound on control effort: the error can be arbitrarily large, but at some point you simply don’t have any more ability to adjust the system, and so having 1e18 more people tortured than you want is “just as bad” as having 1e6 more people tortured than you want, because both situations are bad enough that you employ your maximal effort trying to reduce the number. One thing about this approach is that the bound is determined by your ability to affect the world rather than by your capacity to care, but it’s not clear to me whether that actually makes much of a difference, either mathematically or physically.
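A minimal sketch of that saturation effect (the gain and the effort cap are arbitrary placeholder numbers):

```python
def control_effort(error, gain=1.0, max_effort=100.0):
    """Proportional control with a hard cap on how much effort can be applied."""
    return max(-max_effort, min(max_effort, gain * error))

# Once the error is large enough to saturate the actuator, bigger errors
# no longer change the response: both cases get the maximal effort.
print(control_effort(1e6))   # 100.0
print(control_effort(1e18))  # 100.0
```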
Thanks, that makes sense. On the topic of comparing controllers to utility functions: how does a controller decide what kinds of probabilistic tradeoffs are worth making? For instance, if you have a utility function, it’s straightforward to determine whether you prefer, say, a choice that creates X1 new lives with probability P_x1 and kills Y1 people with probability P_y1, versus a choice that creates X2 new lives with probability P_x2 and kills Y2 people with probability P_y2. How does one model that choice in a control theory framework?
I see two main challenges: first, we need to somehow encode distributions, and second, we need to look ahead. Both of those are doable, but it’s worth mentioning explicitly that the bread and butter of utility maximization (considering probabilistic gambles, and looking ahead to the future) are things that have to be built into the control theory framework, and they can be built in a number of different ways. (If we do have a scenario where it’s easy to enumerate the choice set, or at least the rules that generate the choice set, and it’s also easy to express the preference function, then utility is the right approach to take.)
The closest analogue to the utility framework is probably to treat the probability distributions over outcomes as the states, so that the ‘error’ is basically a measure of how much one distribution differs from the distribution we’re shooting for. Possible actions are probably fed into a simulator circuit that spits out the expected distribution. It looks like we could basically express this problem as “minimize opportunity cost while pursuing many options”: if we ever simulate a plan and think it’s better than our current best plan, we replace the current best plan; if we simulate a plan and it’s not better than our current best plan, we look for a new plan to simulate. (You’d also likely bake in some stopping criterion.)
So it would probably look at choice 1, encode its discrete pmf as the reference state, then look at choice 2 and decide whether the error is positive (in which case it switches to choice 2) or negative (in which case it acts on choice 1). But in order to compare pmfs and get a sense of positive or negative, I need some mathematical function, and that function would be the utility function in the utility framework.
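A rough sketch of that loop (the outcome pmfs, the scoring function used to compare them, and the simulation budget are all placeholder choices; the scoring function is doing exactly the job the utility function does in the utility framework):

```python
# Each candidate plan is summarized by a discrete pmf over (lives_created, lives_lost)
# outcomes. 'score' is the comparison function discussed above; it is a placeholder
# choice, and in the utility framework it would simply be the utility function.

def score(pmf):
    return sum(p * (created - lost) for (created, lost), p in pmf.items())

def simulate(plan):
    # Placeholder "simulator circuit": here it just returns the plan's stored pmf.
    return plan["predicted_pmf"]

def choose_plan(candidate_plans, max_simulations=100):
    best_plan, best_pmf = None, None
    for plan in candidate_plans[:max_simulations]:  # stopping criterion: simulation budget
        pmf = simulate(plan)
        # Positive "error" relative to the current reference (best plan so far): switch.
        if best_pmf is None or score(pmf) > score(best_pmf):
            best_plan, best_pmf = plan, pmf
        # Otherwise keep acting on the current best plan and try the next candidate.
    return best_plan

plans = [
    {"name": "choice 1", "predicted_pmf": {(10, 0): 0.8, (0, 5): 0.2}},
    {"name": "choice 2", "predicted_pmf": {(20, 0): 0.5, (0, 1): 0.5}},
]
print(choose_plan(plans)["name"])  # "choice 2" under this particular scoring
```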
We might also notice that this makes it easy for endowment-effect problems to creep in: if none of the options is obviously better than the others, it defaults to whichever one came first. On the flip side, it makes it easy to start working with the first mediocre plan we come across and then abandon that plan if a better one shows up. That is, this is more suited to operating in continuous time than a “plan, then act” utility maximization framework.
Also, controllers are more robust than utility agents. Utility agents tend to go haywire upon discovering that some term in their utility function isn’t actually well-defined, and it’s impossible to predict future discoveries ahead of time, or what their implications for the well-definedness of those terms might be.