There is no way for human values to magically jump inside the AI, so if it’s not specifically created to reflect them, it won’t have them. Whatever the AI ends up with won’t come close to human values, because human values are too complex to be resembled by any structure that happens to form in the AI.
I’m not convinced by the claim that human values have high Kolmogorov complexity.
In particular, Eliezer’s article Not for the Sake of Happiness Alone is totally at odds with my own beliefs. In my mind, it’s incoherent to give anything other than subjective experiences ethical consideration. My own preference for real science over imagined science is entirely instrumental and not at all terminal.
Now, maybe Eliezer is confused about what his terminal values are, or maybe I’m confused about what my terminal values are, or maybe our terminal values are incompatible. In any case, it’s not obvious that an AI should care about anything other than the subjective experiences of sentient beings.
Suppose that it’s okay for an AI to exclude everything but subjective experience from ethical consideration. Is there then still reason to expect that human values have high Kolmogorov complexity?
I don’t have a low-complexity description to offer, but it seems to me that one can get a lot of mileage out of the principles “if an individual prefers state A to state B whenever he/she/it is in either state A or state B, then state A is superior for that individual to state B” and “when faced with two alternatives, the moral alternative is the one that you would prefer if you were going to live through the lives of all sentient beings involved.”
Of course “sentient being” is ill-defined and one would have to do a fair amount of work to frame the things that I just said in more formal terms, but anyway, it’s not clear to me that there’s a really serious problem here.
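A minimal sketch of how those two principles might be written down, assuming a hypothetical experienced_utility(being, state) function that stands in for the (undefined) quality of a being’s subjective experience in a given state; this is purely illustrative, not a claim about how an AI’s values would actually be specified:

```python
# Purely illustrative sketch of the two principles above.
# `experienced_utility(being, state)` is a hypothetical stand-in for the
# (undefined) quality of a being's subjective experience in a state.

def individually_better(being, state_a, state_b, experienced_utility):
    """Principle 1: state A is superior for `being` to state B if the being
    prefers A from inside either state (modelled here, crudely, as a higher
    experienced utility in A)."""
    return experienced_utility(being, state_a) > experienced_utility(being, state_b)

def moral_choice(alternatives, sentient_beings, experienced_utility):
    """Principle 2: the moral alternative is the one you would prefer if you
    were going to live through the lives of all sentient beings involved,
    modelled here as the alternative with the highest total experienced
    utility."""
    return max(
        alternatives,
        key=lambda state: sum(experienced_utility(b, state) for b in sentient_beings),
    )

# Toy usage with a made-up scoring table for two beings and two world-states.
toy_scores = {("alice", "state_a"): 3, ("alice", "state_b"): 1,
              ("bob", "state_a"): 0, ("bob", "state_b"): 5}
toy_utility = lambda being, state: toy_scores[(being, state)]
print(individually_better("alice", "state_a", "state_b", toy_utility))      # True
print(moral_choice(["state_a", "state_b"], ["alice", "bob"], toy_utility))  # state_b
```

Summing experienced utility across beings is only one way to cash out “living through all the lives involved”; averaging, or any other aggregation rule, would be a different (and contested) choice, which is part of where the hidden complexity may live.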
Re: “The more the AI’s preference diverges from ours, the more we lose, and this loss is on an astronomical scale (even if the preference diverges relatively little).”
I totally agree that if the creation of a superhuman AI is going to precede all other existential threats, then we should focus all of our resources on trying to get the superhuman AI to be as friendly as possible.
Have you read the Heaven post by denisbider and the two follow-ups constituting a mini-wireheading series? There have been other posts on the difference between wanting and liking, but that series illustrates a fairly strong problem with wireheading: Even if all we’re worried about is “subjective states,” many people won’t want to be put in that subjective state, even knowing they’ll like it. Forcing them into it, or changing their value system so they do want it, are ethically suboptimal solutions.
So, it seems to me that if anything other than maximized absolute wireheading for everyone is the AI’s goal, it’s gonna start to get complicated.
Thanks for the references to the posts which I had not seen before and which I find relevant. I’m sympathetic toward denisbider’s view, but will read the comments to see if I find diverging views compelling.
Maybe you should start with what’s linked from Fake Fake Utility Functions, then (the page on the wiki wasn’t organized quite as I expected).
But I would qualify the last sentence of my reply by saying that the best way to get a superhuman AI to be as friendly as possible may not be to work on friendly AI or advocate for friendly AI. For example, it may be best to work toward geopolitical stability to minimize the chances of some country rashly creating a potentially unsafe AI out of a sense of desperation during wartime.
(?) I never said that.
Yes, I was agreeing with what I inferred your attitude to be rather than agreeing with something that you said. (I apologize if I distorted your views—if you’d like I can edit my comment to remove the suggestion that you hold the position that I attributed to you.)
I don’t believe that we “should focus all of our resources” on FAI, as there are many other worthy activities to focus on. The argument is that this particular problem gets disproportionately little attention, and while with other risks we can in principle luck out even if they get no attention, it isn’t so for AI. Failing to take FAI seriously is fatal; failing to take nanotech seriously isn’t necessarily fatal.
Thus, although strictly speaking I agree with your implication, I don’t see its condition as plausible, and so I don’t see the implication as a whole as relevant.
Re: “Is there then still reason to expect that human values have high Kolmogorov complexity?”
Human values are mostly a product of people’s genes and their memes. There is an awful lot of information in those. However, it is true that you can fairly closely approximate human values—or those of any other creature—by the directive to make as many grandchildren as possible—which seems reasonably simple.
Most of the arguments for humans having complex values appear to list a whole bunch of proximate goals—as though that constitutes evidence.
I disagree. You need to know much more than just the drive for grandchildren, given the massively diverse ways we observe, even in our present world, for species to propagate, all of which correspond to different articulable values once those species reach human intelligence.
Human values should be expected to have a high K-complexity because you would need to specify both the genes/early environment, and the precise place in history/Everett branches where humans are now.
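One loose way to read that claim in description-length terms, assuming (as the claim itself does) that present-day values are computable from those two inputs; the bound below is informal and mine, not something stated in the thread:

```latex
% Informal: if human values V are computable from the genes/early environment G
% and an index H of our place in history/Everett branches, then, up to an
% additive constant (and logarithmic terms for combining the two inputs),
K(V) \;\lesssim\; K(G) + K(H)
```

The argument is then that both terms on the right are large, which is where the high-K-complexity conclusion comes from.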
The idea was to “approximate human values”—not to express them in precise detail: nobody cares much if Jim likes strawberry jam more than he likes raspberry jam.
The environment mostly drops out of the equation—because most of it is shared between the agents involved—and because of the phenomenon of Canalisation: http://en.wikipedia.org/wiki/Canalisation_%28genetics%29
Sure, but I take “approximation” to mean something like getting you within 10 or so bits of the true distribution, whereas the heuristic you gave still leaves you maybe 500 or so bits away, which is huge, and far more than you implied.
That would help you with message length if you had already stored one person’s values and were looking to store a second person’s. It does not help for describing the first person’s values, or some aggregate measure of humans’ values.
10 bits!!! That’s not much of a message!
The idea of a shared environment arises because the proposed machine—in which the human-like values are to be implemented—is to live in the same world as the human. So, one does not need to specify all the details of the environment—since these are shared naturally between the agents in question.
10 bits short of the needed message, not a 10-bit message. I mean that e.g. an approximation gives 100 bits when full accuracy would be 110 bits (and 10 bits is an upper bound).
That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.
Re: “That still doesn’t answer my point; it just shows how once you have one agent, adding others is easy. It doesn’t show how getting the first, or the “general” agent is easy.”
To specify the environment, choose the universe, galaxy, star, planet, latitude, longitude and time. I am not pretending that information is simple, just that it is already there, if your project is building an intelligent agent.
Re: “10 bits short of the needed message”.
Yes, I got that the first time. I don’t think you are appreciating the difficulty of coding even relatively simple utility functions. A couple of ASCII characters is practically nothing!
ASCII characters aren’t a relevant metric here. Getting within 10 bits of the correct answer means that you’ve narrowed it down to 2^10 = 1024 distinct equiprobable possibilities [1], one of which is correct. Sounds like an approximation to me! (if a bit on the lower end of the accuracy expected out of one)
[1] or a probability distribution with the same KL divergence from the true governing distribution
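For concreteness, here is a worked version of the arithmetic in this exchange, treating the 110-bit and 100-bit figures above as purely hypothetical numbers:

```python
import math

# If the full specification needs 110 bits and the approximation supplies 100,
# the remaining uncertainty is 2**10 = 1024 equally likely candidates
# (both figures are the hypothetical numbers used in the thread, not measurements).
full_description_bits = 110
approximation_bits = 100
gap_bits = full_description_bits - approximation_bits
remaining_candidates = 2 ** gap_bits
print(gap_bits, remaining_candidates)  # 10 1024

# The footnote's alternative reading: a gap of k bits corresponds to a
# KL divergence of k bits between the approximating distribution q and
# the true distribution p.
def kl_divergence_bits(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# E.g. a uniform guess over 1024 options, when the truth is concentrated
# on one of them, is exactly 10 bits away:
p = [1.0] + [0.0] * 1023
q = [1 / 1024] * 1024
print(kl_divergence_bits(p, q))  # 10.0
```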
Or you can implement a constant K-complexity learn-by-example algorithm and get all the rest from the environment.
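A toy sketch of what such a fixed-size learn-by-example rule could look like; the (chosen, rejected) data format and the naive counting rule are illustrative assumptions on my part, and the point is only that the code itself stays short while all of the value content arrives through the observations:

```python
from collections import defaultdict

def learn_preferences(observed_choices):
    """observed_choices: iterable of (chosen_option, rejected_option) pairs
    gathered from the environment. Options chosen more often than they are
    rejected end up with higher scores."""
    score = defaultdict(int)
    for chosen, rejected in observed_choices:
        score[chosen] += 1
        score[rejected] -= 1
    return score

def prefer(score, option_a, option_b):
    """Decide between two options using the learned scores."""
    return option_a if score[option_a] >= score[option_b] else option_b

# Example: three observed human decisions are enough to rank three options.
examples = [("help", "harm"), ("help", "ignore"), ("ignore", "harm")]
scores = learn_preferences(examples)
print(prefer(scores, "help", "harm"))  # help
```

Whether such a rule generalizes the way its creators would want it to is, of course, exactly the hard part, which is what the next comment is gesturing at.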
How about “Do as your creators do (generalize this as your creators generalize)”?