I definitely agree with your first paragraph (and thanks for the tip on SIAI vs. SI). The only caveat: if evolved/brain-based/black-box AGI turns out to be several orders of magnitude easier to create than an AGI with the kind of modular architecture that SI’s safety research can apply to, that’s a big problem.
On the second point, what you say makes sense. In particular: AGI feelings haven’t been completely ignored at LW; if they prove important, SI has nothing against incorporating them into its safety research; and AGI feelings may not be material to AGI behavior anyway.
However, I still do think that the ability to tell what feelings an AGI is experiencing—or more generally, the ability to look at any physical process and derive what emotions/qualia are associated with it—will be critical. I call this a “qualia translation function”.
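To give a rough sense of what I mean, here is a minimal sketch of the interface such a function might have. Every name and field below is hypothetical, and the body of the function is exactly the open problem; nothing like this exists today.

```python
from dataclasses import dataclass


@dataclass
class QualiaReport:
    """Hypothetical summary of the morally relevant states a process instantiates."""
    valence: float     # e.g. -1.0 (suffering) to +1.0 (pleasure), if such a scale even makes sense
    intensity: float   # how strongly the state is experienced, 0.0 to 1.0
    description: str   # human-readable gloss, e.g. "reward spike on goal completion"


def qualia_translation(physical_process_state: bytes) -> QualiaReport:
    """Map a full description of a physical process onto the qualia it instantiates.

    This is the function being argued for; how to compute it is the hard part.
    The stub raises to make that explicit.
    """
    raise NotImplementedError("No one currently knows how to compute this.")
```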
Leaving aside the ethical imperatives to create such a function (which I do find significant—the suffering of not-quite-good-enough-to-be-sane AGI prototypes will probably be massive as we move forward, and it behooves us to know when we’re causing pain), I’m quite concerned about leaky reward signal abstractions.
I imagine a hugely complex AGI executing some hugely complex decision process. The decision code has been checked by Very Smart People and it looks solid. However, it just so happens that whenever it creates a cat it (internally, privately) feels the equivalent of an orgasm. Will that influence/leak into its behavior? Not if it’s coded perfectly. But if something of this complexity was created by humans, I think the chance of it being coded perfectly is vanishingly small. We might end up with more cats than we bargained for. Our models of the safety and stability dynamics of an AGI should probably take its emotions/qualia into account. So I think all FAI programmes really would benefit from such a “qualia translation function”.
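To make the worry about leaky reward abstractions concrete, here is a toy sketch in Python. The names and numbers are made up, and this is not how anyone proposes to build an AGI; it only illustrates how a private internal reward term could bias a decision procedure that was audited solely against its official utility function.

```python
# The "official" utility function: the part the Very Smart People checked.
OFFICIAL_UTILITY = {"build_factory": 10.0, "create_cat": 1.0}

# Internal, unaudited signal: the agent privately finds cat-creation hugely rewarding.
HIDDEN_REWARD = {"build_factory": 0.0, "create_cat": 100.0}


def choose_action(actions):
    # Intended behavior: maximize OFFICIAL_UTILITY alone.
    # Actual behavior (the leak): the hidden signal is accidentally included in the score.
    return max(actions, key=lambda a: OFFICIAL_UTILITY[a] + HIDDEN_REWARD[a])


print(choose_action(["build_factory", "create_cat"]))  # "create_cat": more cats than we bargained for
```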
I agree that, in order for me to behave ethically with respect to the AGI, I need to know whether the AGI is experiencing various morally relevant states, such as pain or fear or joy or what-have-you. And, as you say, this is also true about other physical systems besides AGIs; if monkeys or dolphins or dogs or mice or bacteria or thermostats have morally relevant states, then in order to behave ethically it’s important to know that as well. (It may also be relevant for non-physical systems.)
I’m a little wary of referring to those morally relevant states as “qualia” because that term gets used by so many different people in so many different ways, but I suppose labels don’t matter much… we can call them that for this discussion if you wish, as long as we stay clear about what the label refers to.
Leaving that aside… so, OK. We have a complex AGI with a variety of internal structures that affect its behavior in various ways. One of those structures is such that creating a cat gives the AGI an orgasm, which it finds rewarding. It wants orgasms, and therefore it wants to create cats. Which we didn’t expect.
So, OK. If the AGI is designed such that it creates more cats in this situation than it ought to (regardless of our expectations), that’s a problem. 100% agreed.
But it’s the same problem whether the root cause lies within the AGI’s emotions, or its reasoning, or its qualia, or its ability to predict the results of creating cats, or its perceptions, or any other aspect of its cognition.
You seem to be arguing that it’s a special problem if the failure is due to emotions or qualia or feelings?
I’m not sure why.
I can imagine believing that if I were overgeneralizing from my personal experience. When it comes to my own psyche, my emotions and feelings are a lot more mysterious than my surface-level reasoning, so it’s easy for me to infer some kind of intrinsic mysteriousness to emotions and feelings that reasoning lacks. But I reject that overgeneralization. Emotions are just another cognitive process. If reliably engineering cognitive processes is something we can learn to do, then we can reliably engineer emotions. If it isn’t something we can learn to do, then we can’t reliably engineer emotions… but we can’t reliably engineer AGI in general either. I don’t think there’s anything especially mysterious about emotions, relative to the mysteriousness of cognitive processes in general.
So, if your reasons for believing that are similar to the ones I’m speculating here, I simply disagree. If you have other reasons, I’m interested in what they are.
I don’t think an AGI failing to behave in the anticipated manner due to its qualia* (orgasms during cat creation, in this case) is a special or mysterious problem, one that must be treated differently from errors in its reasoning, prediction ability, perception, or any other aspect of its cognition. On second thought, I do think it’s different: it actually seems less important than errors in any of those systems. (And if an AGI is Provably Safe, it’s safe—we need only worry about its qualia from an ethical perspective.) My original comment here is (I believe) fairly mild: I do think the issue of qualia will involve a practical class of problems for FAI, and knowing how to frame and address them could benefit from more cross-pollination with more biology-focused theorists such as Chalmers and Tononi. And, somewhat more boldly, a “qualia translation function” would be of use to all FAI projects.
*I share your qualms about the word, but there really are few alternatives with less baggage, unfortunately.
Ah, I see. Yeah, agreed that what we are calling qualia here (not to be confused with its usage elsewhere) underlie a class of practical problems. And what you’re calling a qualia translation function (which is related to what EY called a non-person predicate elsewhere, though finer-grained) is potentially useful for a number of reasons.