Cousin_it and I had an offline chat. To recap my arguments:
It’s not clear that a cheeseburger-making AI needs to have lots of specific knowledge about humans. As we discussed, one possibility is to give it a utility function that assigns negative value to anything consequential crossing some geographical boundary, except a cheeseburger.
More generally, the fact that we can’t easily think of a way to solve a particular real-world problem (with minimal side effects) using an AI with a simple utility function is only weak evidence that such a simple utility function doesn’t exist, since the space of utility functions simple enough to be hand coded is still enormous.
Even if there are some real-world problems that can’t be solved using simple utility functions, I do not just want to solve particular problems. I want to get “what I really want” (in some sense that I can’t define clearly, but is more like “my reflective equilibrium values” than “my current behavioral tendencies”), and it seems plausible that “what I really want” is less complex than the information needed to tell an AI how to solve particular problems while keeping the world otherwise unchanged.
I think Eliezer’s “thou art godshatter” argument was meant to be generally applicable: if it were sound, then we could conclude that anyone who thinks their values are simple is wrong in an objective sense. We no longer seem to have such an argument (that is still viable), and those who are proponents of “complexity of value” perhaps have to retreat to something like “people who think they have simple values are wrong in the sense that their plans would be bad according to my values”.
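To make the first argument above concrete, here is a minimal sketch of what a boundary-based utility function might look like. Everything in it is an assumption for illustration: the `Crossing` record, the `boundary_utility` name, the `penalty` and `reward` weights, and above all the `consequential` flag, which hides the genuinely hard problem of deciding what counts as consequential. The positive reward for a cheeseburger is also an added assumption; the original proposal only says that cheeseburgers are exempt from the penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crossing:
    """One thing that crossed the geographical boundary (hypothetical representation)."""
    kind: str            # e.g. "cheeseburger", "nanobot", "stray heat"
    consequential: bool  # assumed to be given; deciding this is the hard part


def boundary_utility(crossings, penalty=100.0, reward=1.0):
    """Score a world history by what crossed the boundary.

    Cheeseburgers are exempt from the penalty and (by assumption) earn a reward;
    every other consequential crossing is penalized; the rest is ignored.
    """
    total = 0.0
    for c in crossings:
        if c.kind == "cheeseburger":
            total += reward
        elif c.consequential:
            total -= penalty
    return total


if __name__ == "__main__":
    clean = [Crossing("cheeseburger", True), Crossing("stray heat", False)]
    leaky = [Crossing("cheeseburger", True), Crossing("nanobot", True)]
    print(boundary_utility(clean))  # 1.0
    print(boundary_utility(leaky))  # -99.0
```

The function itself is a few lines long, which is the point of the argument: whatever complexity remains sits in how "consequential" and "crossing" are defined, not in specific knowledge about humans.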
“thou art godshatter” argument … We no longer seem to have such an argument (that is still viable)
How about this restatement (focus)? I think you agreed previously, and your response was more subtle than denying it. IIRC, you said that complex aspects of value are present, but possibly of low importance, so if we have no choice (time), we should just go the simple way, while still capturing a nontrivial amount of value. This is different from saying that value could actually be simple.
There are many possible values, and we could construct agents that optimize any one of them. Among these many values, only a few are simple in abstract (i.e. not referring to artifacts in the world) and explicit form. What kind of agents are humans, and which of the many possible values are associated with them, in a sufficiently general sense of value-association that applies to humans? For humans to have specifically those few-of-many simple values, they would need to be constructed in a clean way, with an explicit goal architecture that places those particular abstract goals in charge. But humans are not like that, they have a lot of accidental complexity in every joint, so there is no reason to expect them to implement that particular simple goal system.
This is an antiprediction: it argues from a priori improbability, rather than trying to shift an existing position that favors simple values. At this point, it intuitively feels to me that simple values are unprivileged, and there seems to be no known reason to expect that they would be more likely than anything else (that has more data). This kind of simplicity is not the kind from Occam’s razor: it is like expecting the air molecules in a room to stay in one corner, arranged on a regular grid, as opposed to being distributed in a macroscopically more uniform, but microscopically enormously detailed, configuration. We have this brain-artifact that is known to hold lots of data, and expecting all this data to amount to a simple summary doesn’t look plausible to me.
I think you agreed previously, and your response was more subtle than denying it. IIRC, you said that complex aspects of value are present, but possibly of low importance, so if we have no choice (time), we should just go the simple way, while still capturing a nontrivial amount of value. This is different from saying that value could actually be simple.
Yes, this post is making a different point from that one.
But humans are not like that, they have a lot of accidental complexity in every joint, so there is no reason to expect them to implement that particular simple goal system.
But why think all that complexity is relevant? Surely at least some of the complexity is not relevant (for example a Tourette sufferer’s tendency to curse at random, or the precise ease with which some people get addicted to gambling). Don’t you need an additional argument to conclude that at least a substantial fraction of the complexity is relevant, and that this applies generally, even to those who think otherwise and would tend to discard such complexity when they reflect on what they really want?
Don’t you need an additional argument to conclude that at least a substantial fraction of the complexity is relevant, and that this applies generally, even to those who think otherwise and would tend to discard such complexity when they reflect on what they really want?
Again, it’s an antiprediction: an argument about what your prior should be, not an argument that takes the presumption of simple values being plausible as a starting point and then tries to convince you that it should be tuned down. Something is relevant, and brains are probably relevant, since this is where decision-making happens. The claim that the relevant decision-theoretic summary of that has any given rare property, like simplicity, is something that needs strong evidence, if you start from that prior. I don’t see why privileging simplicity is a viable starting point, or why this hypothesis deserves any more consideration than the claim that the future should contain a perfect replica of myself from the year 1995.
Why is simplicity assumed to be a rare property? There are large classes of things that tend to be “simple”, like much of mathematics; don’t you have to argue that “brains” or minds belong to the class of things whose members do not tend to be “simple”? (I can see obvious reasons that one would think this is obvious, and non-obvious reasons that one would think this is non-obvious, which makes me think that it shouldn’t be assumed, but if I’m the only person around who could make the non-obvious arguments then I guess we’re out of luck.)