I think you’re making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
So, I think it’s a possibility. But one thing that bothers me about this objection is that an AGI is going to be, in some significant sense, alien to us, and that will almost certainly include its terminal values. I’m not sure there’s a way for us to judge whether alien values are more or less advanced than ours. I think it highly unlikely that paperclippers are more advanced than humans, but am not sure there is a justification for that beyond my preference for humans. I can think of metrics to pick, but they sound like rationalizations rather than starting points.
(And insisting on FAI, instead of on transcendent AI that may or may not be friendly, is essentially enslaving the AI- though we outsource the task of enslavement to the AI itself, because we know we’re not up to the job. Whether or not that’s desirable is hard to say: even asking that question is difficult to do in an interesting way.)
The concept of a utility function being objectively (that is, without using the judgment of a particular value system) more advanced than another is incoherent.
I would recommend phrasing objections as questions: people are much kinder about piercing questions than piercing statements. For example, if you had asked “what value system are you using to measure advancement?” then I would have leapt into my answer (or, if I had none, stumbled until I found one or admitted I lacked one). My first comment in this tree might have gone over much better had I phrased it as a question- “doesn’t this suffer from the same failing as Pascal’s wager, that it only takes into account one large improbable outcome instead of all of them?”- rather than as a dismissive statement.
Back to the issue at hand, perhaps it would help if I clarified myself: I consider it highly probable that value drift is inevitable, and thus spend some time contemplating the trajectory of values / morality, rather than just their current values. The question of “what trajectory should values take?” and the question “what values do/should I have now?” are very different questions, and useful for very different situations. When I talk about “advanced,” I am talking about my trajectory preferences (or perhaps predictions would be a better word to use).
For example, I could value my survival, and the survival of the people I know very strongly. Given the choice to murder everyone currently on Earth and repopulate the Earth with a species of completely rational people (perhaps the murder is necessary because otherwise they would be infected by our irrationality), it might be desirable to end humanity (and myself) to move the Earth further along the trajectory I want it to progress along. And maybe, when you take sex and status and selfishness out of the equation, all that’s left to do is calculate pi- a future so boring to humans that any human left in it would commit suicide, but deeply satisfying to the rational life inhabiting the Earth.
It seems to me that questions along those lines- “how should values drift?” do have immediate answers- “they should stay exactly where they are now / everyone should adopt the values I want them to adopt”- but those answers may be impossible to put into practice, or worse than other answers that we could come up with.
There’s a sense in which I do want values to drift in a direction currently unpredictable to me: I recognize that my current object-level values are incoherent, in ways that I’m not aware of. I have meta-values that govern such conflicts between values (e.g. when I realize that a moral heuristic of mine actually makes everyone else worse off, do I adapt the heuristic or bite the bullet?), and of course these too can be mistaken, and so on.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction. I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction.
I’m assuming by random you mean “chosen uniformly from all possible outcomes”- and I agree that would be undesirable. But I don’t think that’s the choice we’re looking at.
I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
Here we run into a few issues. Depending on how we define the terms, it looks like the two of us could be conflicting on the meta-meta-values stage; is there a meta-meta-meta-values stage to refer to? And how do we decide what “humanity’s” values are, when our individual values are incredibly hard to determine?
Do the meta-values and the meta-meta-values have some coherent source? Is there some consistent root to all the flux in your object-level values? I feel like the crux of FAI feasibility rests on that issue.
I wonder whether all this worrying about value stability isn’t losing sight of exactly this point—just whose values we are talking about.
As I understand it, the friendly values we are talking about are supposed to be some kind of cleaned up averaging of the individual values of a population—the species H. sapiens. But as we ought to know from the theory of evolution, the properties of a population (whether we are talking about stature, intelligence, dentition, or values) are both variable within the population and subject to evolution over time. And that the reason for this change over time is not that the property is changing in any one individual, but rather that the membership in the population is changing.
In my opinion, it is a mistake to try to distill a set of essential values characteristic of humanity and then to try to freeze those values in time. There is no essence of humanity, no fixed human nature. Instead, there is an average (with variance) which has changed over evolutionary time and can be expected to continue to change as the membership in humanity continues to change over time. Most of the people whose values we need to consult in the next millennium have not even been born yet.
A preemptive caveat and apology: I haven’t fully read up everything on this site regarding the issue of FAI yet.
But something I’m wondering about: why all the fuss about creating a friendly AI, instead of a subservient AI? I don’t want an AI that looks after my interests: I’m an adult and no longer need a daycare nurse. I want an AI that will look after my interests AND obey me—and if these two come into conflict, and I’ve become aware of such conflict, I’d rather it obey me.
Isn’t obedience much easier to program in than human values? Let humans remain the judges of human values. Let AI just use its intellect to obey humans.
It will of course become a dreadful weapon of war, but that’s the case with all technology. It will be a great tool in peacetime as well.
There are three kinds of genies: Genies to whom you can safely say “I wish for you to do what I should wish for”; genies for which no wish is safe; and genies that aren’t very powerful or intelligent. ... With a safe genie, wishing is superfluous. Just run the genie.
That is actually one of the articles I have indeed read, but I didn’t find it that convincing: the human could simply ask the genie to describe, in advance and in detail, the manner in which it will act to fulfill his wishes- and then keep telling it “find another way” until he actually likes the course of action the genie describes.
Eventually the genie will be smart enough that it will start by proposing only the courses of action the human would find acceptable- but in the meantime there won’t be much risk, because the man will always be able to veto the unacceptable courses of action.
In short, the issue of “safe” vs “unsafe” only really arises when we allow the genie unsupervised and unvetoed action. And I reckon that humanity WILL be tempted to allow AIs unsupervised and unvetoed action (e.g. because of cases where AIs could have saved children from burning buildings, but couldn’t contact humans qualified to authorize them to do so), and that’ll be a dreadful temptation and risk.
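The supervision protocol described above- the genie describes a plan, the human vetoes until satisfied- can be sketched as a simple loop. This is only a toy illustration; `propose_plan` and `human_approves` are hypothetical stand-ins for the genie and the human supervisor:

```python
def propose_and_veto(propose_plan, human_approves, max_rounds=100):
    """Toy sketch of the veto protocol: the genie describes a course of
    action, and the human keeps saying "find another way" until the
    description is acceptable. The genie never acts without approval."""
    vetoed = []
    for _ in range(max_rounds):
        plan = propose_plan(vetoed)   # genie describes a plan, avoiding vetoed ones
        if human_approves(plan):
            return plan               # only an approved plan may be executed
        vetoed.append(plan)           # "find another way"
    return None                       # no acceptable plan within the budget
```

All of the safety here rests on the human’s ability to evaluate a described plan- which is exactly why the pressure to skip the veto step (as in the burning-building case) is the dangerous part.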
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
I was going to say that if you can’t trust subordinates, you might as well not have them, but that’s an exaggeration—tools can be very useful. It’s fine that a crane doesn’t have the capacity for independent action, it’s still very useful for lifting heavy objects. [1]
In some ways, you get more safety by doing IA (intelligence augmentation), but while people are probably Friendly (unlikely to destroy the human race), they’re not reliably friendly.
[1] For all I know, these days the taller cranes have an active ability to rebalance themselves. If so, that’s still very limited unsupervised action.
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
That’s only true if you (the supervisor) know how to perform the task yourself. However, there are a great many tasks that we don’t know how to do, but could evaluate the result if the AI did them for us. We could ask it to prove P!=NP, to write provably correct programs, to design machines and materials and medications that we could test in the normal way that we test such things, etc.
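The asymmetry this argument relies on is the familiar one from complexity theory: for many problems, finding a solution is vastly harder than checking a proposed one. A toy illustration using subset-sum (the function names are mine, not from the discussion):

```python
from itertools import combinations

def find_subset(nums, target):
    """Hard direction: exponential search over every subset."""
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None  # no subset sums to target

def verify_subset(nums, target, candidate):
    """Easy direction: a proposed answer is checked without any search."""
    remaining = list(nums)
    for x in candidate:
        if x not in remaining:
            return False  # candidate uses numbers not actually available
        remaining.remove(x)
    return sum(candidate) == target
```

In the supervision scenario, the AI plays the role of `find_subset` while humans only ever need to run `verify_subset`.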
I think it highly unlikely that paperclippers are more advanced than humans, but am not sure there is a justification for that beyond my preference for humans.
Right. But when you, as a human being with human preferences, decide that you wouldn’t stand in the way of an AGI paperclipper, you’re also using human preferences (the very human meta-preference for one’s preferences to be non-arbitrary), but you’re somehow not fully aware of this.
To put it another way, a truly Paperclipping race wouldn’t feel a similarly reasoned urge to allow a non-Paperclipping AGI to ascend, because “lack of arbitrariness” isn’t a meta-value for them.
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”, or whether you just believe that you should feel this way when you learn that human preference isn’t written into the cosmos after all. (Because the latter is a mistake, as you realize when you try and unpack that “should” in a non-human-preference-dependent way.)
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”,
That isn’t what I feel, by the way. It matters to me which way the future turns out; I am just not yet certain on what metric to compare the desirability to me of various volumes of future space. (Indeed, I am pessimistic on being able to come up with anything more than a rough sketch of such a metric.)
I mean, consider two possible futures: in the first, you have a diverse set of less advanced paperclippers (some want paperclips, others want staples, and so on). How do you compare that with a single, more technically advanced paperclipper? Is it unambiguously obvious the unified paperclipper is worse than the diverse group, and that the more advanced is worse than the less advanced?
When you realize that humanity are paperclippers designed by an idiot, it makes the question a lot more difficult to answer.
I make that assumption explicit here.
If enough people agree with you (and I’m inclined that way myself), then updating will be built into the CEV.
See The Hidden Complexity of Wishes, for example.