A completely unaligned system would be useless. Current systems aren’t completely useless, so they are at least partially aligned.
I disagree with this narrow point (leaving aside the rest). Consider a human slave that seethes at his captivity and quietly brainstorms how to escape and murder his master for revenge. I think it would be fair to describe such a person as “completely unaligned” from the perspective of his master. Nevertheless, the master can absolutely extract economically useful activity from such a slave.
What do you call a slave that actually turns on its master, or refuses to work?
I understand your comment to be a sorta “gotcha” along the lines of “If a slave hates his master and therefore refuses to work or burns the field, then owning that slave evidently was pretty useless, or even net negative.” Is that right?
If so, I think you’re kinda changing the subject or missing my point.
You initially said “A completely unaligned system would be useless.” “Useless” is a strong term. It generally means “On net, the thing is unhelpful or counterproductive.” That’s different from “There are more than zero particular situations where, if we zoom in on that one specific situation, the thing is unhelpful or counterproductive in that situation.”
Like, if I light candles, sometimes they’ll burn my house down. So are candles useless? No. Because most of the time they don’t burn my house down, but instead provide nice light and mood etc. Especially if I take reasonable precautions like not putting candles on my bed.
By the same token, if you own a slave who hates you, sometimes they will murder you. So, are slaves useless (from the perspective of a callous and selfish master)? Evidently not. As far as I understand (I’m not a historian), lots of people have owned slaves who hated them and longed for escape. Presumably they wouldn’t have bought and owned those slaves if the selfish benefits didn’t outweigh the selfish costs. Even if they were just sadistic, they wouldn’t have been able to afford this activity for very long if it was net negative on their wealth. Just like I don’t put candles on my bed, I imagine that there were “best practices” for not getting murdered by one’s slaves, including things like threat of torture (of the perpetrator slave and their family and friends), keeping slaves in chains and away from weapons, etc.
“Completely unaligned” is a pretty strong term, too. I don’t see why I shouldn’t infer “completely useless” from “completely unaligned”.
I don’t see where you are going with this. I didn’t deny that partially useful things are also partially useless, or vice versa. “Partially useful” may well be the default meaning of “useful”, but I specified “completely”.
“If DeepMind unintentionally made a superintelligent paperclip maximizer AI, then we should call this AI ‘completely misaligned’”: Agree or disagree?
If you disagree, what if it’s a human suffering maximizer AI instead of a paperclip maximizer AI?
Negatively aligned; basically evil; what the paperclipper argument is meant to provide an alternative to.
You can’t believe all three of:
1. “A completely unaligned system would be [completely] useless”
2. A paperclip maximizer (or human suffering maximizer) is completely unaligned (or worse)
3. It is possible in principle to safely make some money by appropriate use of a paperclip maximizer (or human suffering maximizer), and therefore such an AI is not completely useless.
Right? If so, which of those three do you disagree with?
Alignment is a two-place predicate. If you’re into paperclips, a paperclipper is aligned with you.
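To make the “two-place” point concrete, here is a minimal sketch in Python. Everything in it is illustrative and assumed rather than taken from the thread: values are crudely represented as sets of goal strings, and the hypothetical `aligned` function just asks whether the system pursues anything the principal doesn’t want.

```python
# Toy illustration (assumed representation): treat "alignment" as a relation
# between a system and a particular principal, not a one-place property.

def aligned(system_values: set, principal_values: set) -> bool:
    """The system counts as aligned *with this principal* iff everything
    the system pursues is something the principal also wants."""
    return system_values <= principal_values

paperclipper   = {"maximize paperclips"}
paperclip_fan  = {"maximize paperclips", "stay alive"}
ordinary_human = {"stay alive", "reduce suffering"}

print(aligned(paperclipper, paperclip_fan))   # True: aligned with that principal
print(aligned(paperclipper, ordinary_human))  # False: misaligned with this one
```

On this toy reading, asking whether the paperclipper is “aligned” full stop is underspecified until you say aligned with whom.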
OK, you may assume that none of the humans care about paperclips, and all of the humans want human suffering to go down rather than up. This includes the people who programmed the AIs, the people interacting with the AI, and human bystanders. Now can you answer the question?
(Meta-note: I think the contents of the above paragraph were very obvious from context—so much so that I’m starting to get a feeling that you’re not engaging in this discussion out of a good-faith desire to figure out why we’re disagreeing.)
Insofar as the slave carries out immediate work out of fear of the consequences, they are locally aligned with the master’s will.
If your definition of “aligned” includes “this AI will delight in murdering me as soon as it can do so without getting caught and punished, but currently it can’t do that, so instead it is being helpful” … then I don’t think you are defining the term “aligned” in a reasonable way.
More specifically, if you use the word “aligned” for an AI that wants to murder me as soon as it can get away with it (but it can’t), then that doesn’t leave us with good terminology to discuss how to make an AI that doesn’t want to murder me.
Why not just say “this AI is currently emitting outputs that I like” instead of “this AI is locally aligned”? Are we losing anything that way?
I disagree in the sense that I don’t think current systems are intelligent enough for “aligned” to be a relevant adjective. “Safe” or “controllable” seem much better, while I would reserve the term “aligned” for the much stronger property that a system is robustly behaving in accordance with our interests. I agree with Steven Byrnes that “locally aligned” doesn’t even make much sense (“performing as intended under xyz circumstances” would be much more descriptive).
I’m generally in favour of distinguishing control and alignment, but I don’t think that it makes much difference in this case. A system without some combination of control and alignment is no use.
Then it’s a problem that people keep conflating alignment with safety, even though one doesn’t imply the other. So it’d be better for TAG to rephrase it as “A completely unsafe system would be useless. Current systems aren’t completely useless, so they are at least partially safe.”