Are there qualifications to the orthogonality thesis besides those mentioned?
My main problem with this chapter is that Bostrom assumes that an AI has a single utility function, and that it is expressed as a function on possible states of the universe. Theoretically, you could design a program that is given a domain (in the form of, say, a probabilistic program) and outputs a good action within this domain. In principle, you could have asked it to optimize a function over the universe, but if you don’t, then it won’t. So I think that this program can determine near-optimal behavior across a variety of domains without actually having a utility function (especially not one that is over states of the universe). Of course, this would still be dangerous because anyone could ask it to optimize a function over the universe, but I do think that the assumption that AIs have utility functions over universe states should be more clearly stated and discussed.
It should also be noted that we currently do know how to solve domain-specific optimization problems (such as partially observable Markov decision processes) given enough computation power, but we do not know how to optimize a function over universe states in a way that is agnostic about how to model the universe; this is related to the ontological crisis problem.
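To make “given enough computation power” concrete, here is a minimal sketch of domain-specific optimization: value iteration on a tiny, fully observable MDP (a POMDP solver follows the same pattern over belief states, just with far more computation). All of the states, actions, transitions, and rewards below are invented for illustration.

```python
# A toy illustration of "solvable given enough computation": value iteration
# on a tiny, fully observable MDP. All states, transitions, and rewards are
# invented for this example.

def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Return near-optimal state values and the greedy policy for a finite MDP.

    transition(s, a) -> list of (next_state, probability) pairs
    reward(s, a)     -> immediate reward for taking a in s
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward(s, a) + gamma * sum(p * values[s2] for s2, p in transition(s, a))
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: reward(s, a)
               + gamma * sum(p * values[s2] for s2, p in transition(s, a)))
        for s in states
    }
    return values, policy

# Two-state toy domain: "stay" keeps you where you are, "switch" is a coin flip.
states = ["good", "bad"]
actions = ["stay", "switch"]
transition = lambda s, a: [(s, 1.0)] if a == "stay" else [("good", 0.5), ("bad", 0.5)]
reward = lambda s, a: (1.0 if s == "good" else 0.0) if a == "stay" else 0.2

values, policy = value_iteration(states, actions, transition, reward)
print(policy)  # {'good': 'stay', 'bad': 'switch'}
```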
1) A single utility function? What would it mean to have multiple?
2) Suppose action is, in some sense, the derivative of the utility function. Then you can derive a utility function from the actions taken in various circumstances. If the ‘curl’ of that action field is not 0, the agent is wasting effort. If it is 0, the agent was acting as if it had a utility function anyway.
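To spell out the analogy (a sketch of how I read the ‘curl’ claim, using the standard fact about conservative vector fields): treat the agent’s action tendency in each circumstance $x$ as a vector field $F(x)$ over the space of circumstances. If $\nabla \times F = 0$ (on a simply connected domain), then $F = \nabla U$ for some scalar potential $U$, and the behaviour is exactly “climb the utility function $U$.” If $\nabla \times F \neq 0$, there is some closed loop $C$ with $\oint_C F \cdot d\ell > 0$: the agent can be led around a cycle it locally “wants” to follow at every step, ending up where it started, which is the wasted effort (the continuous analogue of being money-pumped).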
It would mean there is a difficult-to-characterize ecosystem of competing/cooperating agents. Does this sort of cognitive architecture seem familiar at all? :)
My general problem with “utilitarianism” is that it’s sort of like Douglas Adams’ “42.” An answer of the wrong type to a difficult question. Of course we should maximize, that is a useful ingredient of the answer, but is not the only (or the most interesting) ingredient.
Taking off from the end of that point, I might add (though I think this was probably part of your overall point about “the most interesting” ingredient) that people sometimes forget that utilitarianism is not itself a theory of what is normatively desirable, or at least not much of one. For Bentham-style “greatest good for the greatest number” to have any meaning, it has to be supplemented with a view of what property, state of being, action type, etc., counts as a “good” thing to begin with. Once that is defined, we can go on to maximize it: seeking to achieve the most of that good, for the most people (or relevant entities).
But “greatest good for the greatest number” means nothing until we figure out a theory of normativity, or meta-normativity, that can be instantiated across specific, varying situations and scenarios.
IF the “good” is simple total body weight, then adding up the body weight of all people in possible world A, versus possible world B, and so on, lets us make a utilitarian decision among possible worlds.
IF the “good” were fitness, or mental health, or educational achievement… we use the same calculus, but the target property is obviously different.
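To put the “same calculus, different target property” point in symbols (a sketch; $g$ is whatever theory of the good gets plugged in): the utilitarian picks $W^{*} = \arg\max_{W \in \mathcal{W}} \sum_{i \in W} g(i)$, where $\mathcal{W}$ is the set of possible worlds, $i$ ranges over the people (or relevant entities) in $W$, and $g(i)$ is how much of the chosen good $i$ has. Body weight, fitness, mental health, and educational achievement all reuse the same argmax-of-a-sum machinery; only $g$ changes.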
Utilitarianism is sometimes a person’s default answer, until you remind them that it is not an answer at all about what is good; it is just an implementation standard for how that good is to be divided up. Kind of a trivial point, I guess, but worth reminding ourselves from time to time that utilitarianism is not a theory of what is actually good, but of how that good might be distributed, if it admits of scarcity.
How would they interact such that it’s not simply adding over them, and they don’t end up being predictably Dutch-bookable?
In the same way people’s minds do. They are inconsistent but will notice the setup very quickly and stop. (I don’t find Dutch book arguments very convincing, really).
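For concreteness, here is a minimal sketch of the setup a Dutch book argument (or rather its trading cousin, the money pump) has in mind; the preference cycle, the fee, and the function names are all invented for illustration. The agent that “notices the setup and stops” is exactly the defence described above.

```python
# A toy money pump: an agent with cyclic preferences pays a small fee for each
# "upgrade" and walks in a circle, getting strictly poorer. The cycle, fee,
# and names are invented for illustration.

PREFERS = {("B", "A"), ("C", "B"), ("A", "C")}  # reads: B over A, C over B, A over C
FEE = 0.01

def run_money_pump(item, wealth, rounds, notices_loops=False):
    seen = set()
    for _ in range(rounds):
        if notices_loops and item in seen:
            break  # the agent spots the setup and simply stops trading
        seen.add(item)
        # The bookie always offers whatever the agent currently prefers, for a fee.
        item = next(better for better, worse in PREFERS if worse == item)
        wealth -= FEE
    return item, round(wealth, 2)

print(run_money_pump("A", wealth=1.0, rounds=300))                      # ('A', -2.0)
print(run_money_pump("A", wealth=1.0, rounds=300, notices_loops=True))  # ('A', 0.97)
```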
Seems like a layer of inefficiency to have to resist the temptation to run in circles, rather than just wanting to go uphill.
There are two issues:
(a) In what settings do you want an architecture like that, and
(b) Ethics dictates that we don’t just want to replace entities for the sake of efficiency, even if they disagree; that road leads to KILL ALL HUMANS. So we might get an architecture like that because of how history played out, and then it’s just a brute fact.
I am guessing (a) has to do with “robustness” (I am not prepared to mathematise what I mean yet, but I am thinking about it).
People who think about UDT/blackmail are thinking precisely about how to win in the settings I am talking about.
Pick a side of this fence: will the AI resist running in circles trivially, or is its running in circles all that’s saving us from KILL ALL HUMANS objectives, as you say in part (b)?
If the latter, we are so utterly screwed.
1) Perhaps you give it one domain and a utility function within that domain, and it returns a good action in that domain. Then you give it another domain and a different utility function, and it returns a good action in that second domain. Basically, I’m saying that it doesn’t maximize a single unified utility function (see the sketch below).
2) You prove too much. This implies that the Unix cat program has a utility function (or else it is wasting effort). Technically you could view it as having a utility function of “1 if I output what the source code of cat outputs, 0 otherwise”, but this really isn’t a useful level of analysis. Also, if you’re going to go the route of assigning a silly utility function to this program, then this is a utility function over something like “memory states in an abstract virtual machine”, not “states of the universe”, so it will not necessarily (say) try to break out of its box to get more computation power.
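Here is a minimal sketch of the interface point 1) describes: the optimizer is handed a domain model and a utility function within that domain and returns a good action, so there is no single unified utility function, over universe states or anything else, anywhere inside it. Both example domains and every number below are invented.

```python
# Sketch of a non-agent cross-domain optimizer: it takes a domain model plus a
# utility function *within that domain*, and returns a good action. Nothing in
# it refers to states of the universe. Both example domains are invented.
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class Domain:
    actions: Iterable[Any]
    outcome: Callable[[Any], Any]   # domain-specific model: action -> predicted outcome

def optimize(domain: Domain, utility: Callable[[Any], float]) -> Any:
    """Return the action whose modelled outcome maximizes the supplied utility."""
    return max(domain.actions, key=lambda a: utility(domain.outcome(a)))

# Ask it about one domain with one utility function...
routing = Domain(actions=["highway", "back_roads"],
                 outcome=lambda a: {"highway": 35, "back_roads": 50}[a])  # minutes
print(optimize(routing, utility=lambda minutes: -minutes))               # 'highway'

# ...then about a different domain with a different utility function.
bidding = Domain(actions=[10, 20, 30],
                 outcome=lambda bid: {"won": bid >= 20, "paid": bid})
print(optimize(bidding, utility=lambda o: (100 if o["won"] else 0) - o["paid"]))  # 20
```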
On 2, we’re talking about things in the space of agents. Unix utilities are not agents.
But if you really want to go that route? You didn’t prove it wrong, just silly. The more agent-like the thing we’re talking about, the less silly it is.
I don’t think the connotations of “silly” are quite right here. You could still use this program to do quite a lot of useful inference and optimization across a variety of domains, without killing everyone. Sort of like how frequentist statistics can be very accurate in some cases despite being suboptimal by Bayesian standards. Bostrom mostly talks only about agent-like AIs, and while I think that is mostly the right approach, he should have been more explicit about it. As I said before, we don’t currently know how to build agent-like AGIs because we haven’t solved the ontology mapping problem, but we do know how to build non-agentlike cross-domain optimizers given enough computation power.
I don’t see how being able to use a non-agent program to do useful things means it’s not silly to say it has a utility function. It’s not an agent.
Okay. We seem to be disputing definitions here. By your definition, it is totally possible to build a very good cross-domain optimizer without it being an agent (so it doesn’t optimize a utility function over the universe). It seems like we mostly agree on matters of fact.
How do you propose to discover the utility function of an agent by observing its actions? You will only ever see a tiny proportion of the possible situations it could be in, and in those situations you will not observe any of the actions it could have made but did not.
Observationally, you can’t. But given its source code...
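To make the identification problem concrete, here is a toy sketch (all situations, actions, and numbers are invented): two quite different candidate utility functions recommend the same action in every situation we happen to observe, and only come apart in a situation we never see.

```python
# Toy illustration of why observed actions underdetermine the utility function:
# two different candidate utilities pick the same action in every *observed*
# situation, and only disagree in a situation we never see. Numbers invented.

observed = ["s1", "s2"]
unobserved = "s3"
actions = ["a", "b"]

u1 = {("s1", "a"): 1, ("s1", "b"): 0, ("s2", "a"): 0, ("s2", "b"): 1,
      ("s3", "a"): 1, ("s3", "b"): 0}
u2 = {("s1", "a"): 5, ("s1", "b"): 2, ("s2", "a"): 3, ("s2", "b"): 9,
      ("s3", "a"): 0, ("s3", "b"): 7}

def greedy_policy(utility, situations):
    return {s: max(actions, key=lambda a: utility[(s, a)]) for s in situations}

p1 = greedy_policy(u1, observed + [unobserved])
p2 = greedy_policy(u2, observed + [unobserved])
print(all(p1[s] == p2[s] for s in observed))  # True: the data can't tell u1 from u2
print(p1[unobserved] == p2[unobserved])       # False: they diverge where we never looked
```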