Yea, I get it… I believe, though, that it’s impossible to create an AI (self-aware, learning) that has set values that can’t change. More importantly, I am not even sure it’s desired (but that depends on what our goal is: whether to create AI only to perform certain simple tasks, or to create a new race, something that precedes us (which WOULD ultimately mean our demise, anyway)).
Why? Do you think paperclip maximizers are impossible?
You don’t mean that as a dichotomy, do you?
Yes, right now I think it’s impossible to create self-improving, self-aware AI with fixed values. I never said that paperclip maximizing can’t be their ultimate life goal, but they could change it anytime they like.
No.
This is incoherent. If X is my ultimate life goal, I never like to change that fact outside quite exceptional circumstances that become less likely with greater power (like “circumstances are such that X will be maximized if I am instead truly trying to maximize Y”). This is not to say that my goals will never change, but I will never want my “ultimate life goal” to change—that would run contrary to my goals.
That’s why I said that they can change it anytime they like. If they don’t desire the change, they won’t change it. I see nothing incoherent there.
This is like “X if 1 + 2 = 5”. Not necessarily incorrect, but a bizarre statement. An agent with a single, non-reflective goal cannot want to change its goal. It may change its goal accidentally, or we may be incorrect about what its goals are, or something external may change its goal, or its goal will not change.
I don’t know, perhaps we’re not talking about the same thing. It won’t be an agent with a single, non-reflective goal, but an agent a billion times more complex than a human; and all I am saying is that I don’t think it will matter much whether we imprint in it a goal like “don’t kill humans” or not. Ultimately, the decision will be its own.
So it can change in the same way that you can decide right now that your only purposes will be torturing kittens and making giant cheesecakes. It can-as-reachable-node-in-planning do it, not can-as-physical-possibility. So it’s possible to build entities with paperclip-maximizing or Friendly goals that will never in fact choose to alter them, just like it’s possible for me to trust you won’t enslave me into your cheesecake bakery.
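A minimal sketch of that point, as a toy expected-utility agent (the action names and payoff numbers here are made up purely for illustration): rewriting its own goal is one of the options the agent evaluates in planning, but since every option is scored by its current goal, the rewrite never wins.

```python
# Toy illustration (hypothetical, not a real AI design): an agent that scores
# every available action, including self-modification of its own goal, using
# its CURRENT utility function.

def paperclips_made(action):
    """Made-up outcomes: how many paperclips each action eventually yields."""
    return {
        "build_paperclip_factory": 1_000,
        "do_nothing": 0,
        # Rewriting its goal to cheesecake-maximizing means it stops making
        # paperclips afterwards, so this option scores poorly.
        "rewrite_goal_to_cheesecake": 0,
    }[action]

def current_utility(action):
    # The agent's present goal: maximize paperclips.
    return paperclips_made(action)

actions = ["build_paperclip_factory", "do_nothing", "rewrite_goal_to_cheesecake"]

# The goal-rewrite is "reachable in planning": it gets evaluated like any other
# option. It just never comes out on top under the goal the agent has right now.
best = max(actions, key=current_utility)
print(best)  # -> build_paperclip_factory
```

The rewrite option is considered every time the agent plans; it simply never gets chosen, which is the sense in which the goals are stable without being physically unchangeable.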
Sure, but I’d be more cautious about assigning probabilities to how likely it is that a very intelligent AI would change its human-programmed values.
(nods) Whether it’s possible or not is generally an open question. There’s a lot of skepticism about it (I’m fairly skeptical myself), but as with most technical questions, I’m generally content to have smart people research the question in more detail than I’m going to.
As to whether it’s desirable, though… well, sure, of course it depends on our goals. If all I want is (as you say) to create a new race to replace humanity, and I’m indifferent as to the values of that race, then of course there’s no reason for me to care about whether a self-improving AI I create will avoid value drift.
Personally, I’m more or less OK with something replacing humanity, but I’d prefer whatever that is to value certain things. For example, a commonly used trivial example around here of a hypothetical failure mode is a “paperclip maximizer”: an AI that values only the existence of paperclips, and consequently reassembles all the matter it can get its effectors on into paperclips. A paperclip maximizer with powerful enough effectors reassembles everything into paperclips.
I would prefer that not happen, from which I conclude that I’m not in fact indifferent as to the values of a sufficiently powerful AI… I desire that such a system preserve at least certain values. (It is difficult to state precisely what values those are, of course. Human values are complex.) I therefore prefer that it avoid value drift with respect to those values.
How about you?
Well, first, I was all for creating an AI to become the next stage. I was a very singularity-happy type of guy. I saw it as a way out of this world’s status quo (corruption, the state of politics, etc.)… but the singularity would ultimately mean that I and everybody else would cease to exist, at least in any true sense. You know, I have these romantic dreams, similar to Yudkowsky’s idea of dancing in an orbital nightclub around Saturn, and such. I don’t want to be fused into one, even if possibly amazing, matrix of intelligence, which I think is how things will play out eventually. Even though I can’t imagine what it will be like or how it will pan out, as of now I just don’t cherish the idea much.
But yea, I could say that I am torn between moving on and advancing, and more or less stagnating in our human form.
But in answer to your question: if we were to create an AI to replace us, I’d hate for it to become a paperclip maximizer. I don’t think it’s likely.
That would be an impressive achievement! Mind you, if I create an AI that can achieve time travel, I would probably tell it to use its abilities somewhat differently.
Charity led me to understand “precedes us” to mean takes precedence over us in a non-chronological sense.
But as long as we’re here… why would you do that? If a system is designed to alter the future of the world in a way I endorse, it seems I ought to be willing to endorse it altering the past that way too. If I’m unwilling to endorse it altering the past, it’s not clear why I would be willing to endorse it altering the future.
Charity led me to understand that, because the use of that word only makes sense in the case of time travel, he just meant to use another word that means succeeds, replaces, or ‘is greater than’. But time travel is more interesting.
Google led me to understand that ‘precede’ is in fact such a word. Agreed about time travel, though.
(My googling leads me to maintain that the use of precede in that context remains wrong.)
I can’t find a source for that pronoun in Dwelle’s past posts.
Sure it is. If it doesn’t alter the future, we’re all going to die.
Mm. No, still not quite clear. I mean, I agree that all of us not dying is better than all of us dying (I guess… it’s actually more complicated than that, but I don’t think it matters), but that seems beside the point.
Suppose I endorse the New World Order the AI is going to create (nobody dies, etc.), and I’m given a choice between starting the New World Order at time T1 or at a later time T2.
In general, I’d prefer it start at T1. Why not? Waiting seems pointless at best, if not actively harmful.
I can imagine situations where I’d prefer it start at T2, I guess. For example, if the expected value of my making further improvements to the AI before I turn it on is high enough, I might prefer to wait. Or if by some coincidence all the people I value are going to live past T2 regardless of the NWO, and all the people I anti-value are going to die on or before T2, then the world would be better if the NWO begins at T2 rather than at T1. (I’m not sure whether I’d actually choose that, but I guess I agree that I ought to, in the same way that I ought to prefer that the AI extrapolate my values rather than all of humanity’s.)
But either way, it doesn’t seem to matter when I’m given that choice. If I would choose T1 over T2 at T1, then if I create a time-traveling AI at T2 and it gives me that choice, it seems I should choose T1 over T2 at T2 as well. If I would not choose T1 over T2 at T2, it’s not clear to me why I’m endorsing the NWO at all.
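A toy version of that T1-versus-T2 comparison (all numbers made up, just to show why the answer shouldn’t depend on when the choice is offered): if the only relevant difference between the two start times is what happens in the interval between them, that difference is fixed, so the preference for T1 is the same whether it’s evaluated at T1 or at T2.

```python
# Toy numbers, purely illustrative: score the NWO by deaths averted, assuming
# a fixed (made-up) number of deaths per year without it.
DEATHS_PER_YEAR = 50_000_000  # hypothetical figure, for illustration only
HORIZON = 2100                # arbitrary cutoff, also for illustration

def deaths_averted(start_year):
    """Value of starting the NWO at start_year: deaths it prevents by HORIZON."""
    return DEATHS_PER_YEAR * max(0, HORIZON - start_year)

T1, T2 = 2040, 2050

# The comparison is the same whenever it is evaluated: starting at T1 rather
# than T2 averts exactly the deaths that occur in the T1..T2 gap.
print(deaths_averted(T1) - deaths_averted(T2))  # 500,000,000
```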
Don’t disagree. You must have caught the comment that I took down five seconds later when I realized the specific falsehood I rejected was intended as the ‘Q’ in a modus tollens.