Do you agree that FAI probably needs to have a complex utility function, because most simple ones lead to futures we wouldn’t want to happen?
How would I know that unless I knew what I “want”? What notion of “want” are you thinking of, if not something like “values endorsed in reflective equilibrium”? I assume you’re not thinking of “want” as opposed to “like” …
Perhaps you mean “there aren’t any simple utility functions that I would choose to implement in an AI right now and let it run, knowing it would then take over the world” but I don’t think that shows FAI probably needs to have a complex utility function. It could just be that I need more time to think things over but will eventually decide to implement a simple utility function.
Retreating further along the line of Eliezer’s reasoning to find the point where you start to disagree: how about AIs that don’t take over the world? For example, I want an AI that I can ask for a cheeseburger, and it will produce a cheeseburger for me while respecting my implied wishes to not burn the world with molecular nanotech or kill the neighbor’s dog for meat. Do you agree that such a device needs to have lots of specific knowledge about humans, and not just about cheeseburgers? If yes, then how is the goal of solving the world’s problems (saving kids in Africa, stopping unfriendly AIs, etc) relevantly different from the goal of making a cheeseburger?
Cousin_it and I had an offline chat. To recap my arguments:
1. It’s not clear that a cheeseburger-making AI needs to have lots of specific knowledge about humans. As we discussed, one possibility is to give it a utility function that assigns negative value to anything consequential crossing some geographical boundary, except a cheeseburger (a toy sketch of this idea appears after the list).
2. More generally, the fact that we can’t easily think of a way to solve a particular real-world problem (with minimal side effects) using an AI with a simple utility function is only weak evidence that such a simple utility function doesn’t exist, since the space of utility functions simple enough to be hand-coded is still enormous.
3. Even if there are some real-world problems that can’t be solved using simple utility functions, I don’t just want to solve particular problems. I want to get “what I really want” (in some sense that I can’t define clearly, but that is more like “my reflective equilibrium values” than “my current behavioral tendencies”), and it seems plausible that “what I really want” is less complex than the information needed to tell an AI how to solve particular problems while keeping the world otherwise unchanged.
4. I think Eliezer’s “thou art godshatter” argument was meant to be generally applicable: if it were sound, we could conclude that anyone who thinks their values are simple is wrong in an objective sense. We no longer seem to have such an argument (that is still viable), and proponents of “complexity of value” perhaps have to retreat to something like “people who think they have simple values are wrong in the sense that their plans would be bad according to my values”.
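To make point 1 concrete, here is a minimal toy sketch of what such a utility function might look like, assuming a hypothetical representation of the AI’s predicted effects; it only shows the shape of the proposal, and hides the genuinely hard part, specifying what counts as a “consequential” effect, behind the Effect abstraction.

```python
# Toy sketch of the "geographical boundary" utility function from point 1.
# All names and types here are hypothetical; deciding what counts as a
# consequential effect crossing the boundary is assumed to be given.

from dataclasses import dataclass
from typing import List


@dataclass
class Effect:
    kind: str                # e.g. "cheeseburger", "nanotech", "heat"
    crosses_boundary: bool   # does it cross the chosen geographical boundary?
    magnitude: float         # how consequential it is, by some assumed measure


def utility(effects: List[Effect]) -> float:
    """Reward a cheeseburger crossing the boundary; assign negative value
    to every other consequential effect that crosses it."""
    score = 0.0
    for e in effects:
        if e.crosses_boundary:
            if e.kind == "cheeseburger":
                score += 1.0
            else:
                score -= e.magnitude
    return score


# A delivered cheeseburger plus a stray nanotech release still nets negative utility:
print(utility([Effect("cheeseburger", True, 1.0), Effect("nanotech", True, 50.0)]))
```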
“thou art godshatter” argument … We no longer seem to have such an argument (that is still viable)
How about this restatement, to focus the disagreement? I think you agreed previously, and your response was more subtle than denying it. IIRC, you said that complex aspects of value are present, but possibly of low importance, so if we have no choice (no time), we should just go the simple way, while still capturing a nontrivial amount of value. This is different from saying that value could actually be simple.
There are many possible values, and we could construct agents that optimize any one of them. Among these many values, only a few are simple in abstract (i.e. not referring to artifacts in the world) and explicit form. What kind of agents are humans, and which of the many possible values are associated with them, in a sense of value-association general enough to apply to humans? For humans to have specifically those few-of-many simple values, they would need to be constructed in a clean way, with an explicit goal architecture that places those particular abstract goals in charge. But humans are not like that: they have a lot of accidental complexity in every joint, so there is no reason to expect them to implement that particular simple goal system.
This is an antiprediction: it argues from a priori improbability, rather than trying to shift an existing position that favors simple values. At this point, it intuitively feels to me that simple values are unprivileged, and there seems to be no known reason to expect that they would be more likely than anything else (that has more data). This kind of simplicity is not the kind from Occam’s razor: it is like expecting the air molecules in the room to stay in one corner, arranged on a regular grid, as opposed to being distributed in a macroscopically more uniform, but microscopically enormously detailed, configuration. We have this brain-artifact that is known to hold lots of data, and expecting all that data to amount to a simple summary doesn’t look plausible to me.
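(As an illustrative sketch of the sense in which descriptive simplicity is a rare property among arbitrary data, assuming we model the value-relevant content as a long bit-string: there are $2^n$ strings of length $n$, but fewer than $2^k$ programs shorter than $k$ bits, so at most a fraction

$$\frac{2^k}{2^n} = 2^{-(n-k)}$$

of those strings have any description shorter than $k$ bits. Whether the brain’s value-relevant data should be treated as a generic string in this sense is, of course, exactly what is disputed below.)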
I think you agreed previously, and your response was more subtle than denying it. IIRC, you said that complex aspects of value are present, but possibly of low importance, so if we have no choice (no time), we should just go the simple way, while still capturing a nontrivial amount of value. This is different from saying that value could actually be simple.
Yes, this post is making a different point from that one.
But humans are not like that: they have a lot of accidental complexity in every joint, so there is no reason to expect them to implement that particular simple goal system.
But why think all that complexity is relevant? Surely at least some of the complexity is not relevant (for example a Tourette sufferer’s tendency to curse at random, or the precise ease with which some people get addicted to gambling). Don’t you need an additional argument to conclude that at least a substantial fraction of the complexity is relevant, and that this applies generally, even to those who think otherwise and would tend to discard such complexity when they reflect on what they really want?
Don’t you need an additional argument to conclude that at least a substantial fraction of the complexity is relevant, and that this applies generally, even to those who think otherwise and would tend to discard such complexity when they reflect on what they really want?
Again, it’s an antiprediction, an argument about what your prior should be, not an argument that takes the presumption that simple values are plausible as a starting point and then tries to convince you that it should be tuned down. Something is relevant; brains are probably relevant, since that is where decision-making happens. The claim that the relevant decision-theoretic summary of all that has any given rare property, like simplicity, is something that needs strong evidence, if you start from that prior. I don’t see why privileging simplicity is a viable starting point, or why this hypothesis deserves any more consideration than the claim that the future should contain a perfect replica of myself from the year 1995.
Why is simplicity assumed to be a rare property? There are large classes of things that tend to be “simple”, like much of mathematics; don’t you have to argue that “brains” or minds belong to the class of things whose members do not tend to be “simple”? (I can see obvious reasons that one would think this is obvious, and non-obvious reasons that one would think this is non-obvious, which makes me think that it shouldn’t be assumed, but if I’m the only person around who could make the non-obvious arguments then I guess we’re out of luck.)
When the answer is robustly unattainable, it’s pointless to speculate about what it might be; you can only bet or build conditional plans. If “values” are “simple” but you don’t know that, your state of knowledge about the “values” remains non-simple, and that is what you impart to the AI. What does it matter which of these things, the confused state of knowledge or the correct answer to our confusion, we call “values”?
If I think the correct answer to our confusion will ultimately turn out to be something complex (in the sense of Godshatter-like), then I can rule out any plans that eventually call for hard coding such an answer into an AI. This seems to be Eliezer’s argument (or one of his main arguments) for implementing CEV.
On the other hand, if I think the correct answer may turn out to be simple, even if I don’t know what it is now, then there’s a chance I can find out the answer directly in the next few decades and then hard code that answer into an AI. Something like CEV is no longer the obvious best approach.
(Personally I still prefer a “meta-ethical” or “meta-philosophical” approach, but we’d need a different argument for it besides “thou art godshatter”/”complexity of value”.)