Regarding your first point… as I understand it, SI (it no longer refers to itself as SIAI, incidentally) rejects, as too dangerous to pursue, any approach (biologically inspired or otherwise) that leads to a black-box AGI, because a black-box AGI will not constrain its subsequent behavior in ways that preserve the things we value except by unlikely chance. The idea is that we can get safety only by designing safety considerations into the system from the ground up; if we give up control of that design, we give up the ability to design a safe system.
Regarding your second point… there isn’t any assumption that an AGI won’t feel stuff, or that its feelings can be ignored. (Nor even that they are mere “feelings” rather than genuine feelings.) Granted, Yudkowsky talks here about going out of his way to ensure something like that, but he treats this as an additional design constraint that adequate engineering knowledge will enable us to implement, not as some kind of natural default or simplifying assumption. (Also, I haven’t seen any indication that this essay has particularly informed SI’s subsequent research. Those more closely—which is to say, at all—affiliated with SI might choose to correct me here.) And there certainly isn’t an expectation that its behavior will be predictable at any kind of granular level.
What there is, rather, is the expectation that an FAI will be designed such that its unpredictable behaviors (including feelings, if it has feelings) never run counter to its values, and such that its values won’t change over time.
So, maybe you’re right that explicitly modeling what an AGI feels (again, no scare-quotes needed or desired) is critically important to the process of AGI design. Or maybe not. If it turns out to be, I expect that SI is as willing to approach design that way as any other. (Which should not be taken as an expression of confidence in their actual ability to design an AGI, Friendly or otherwise.)
Personally, I find it unlikely that such explicit modeling will be useful, let alone necessary. I expect that AGI feelings will be a natural consequence of more fundamental aspects of the AGI’s design interacting with its environment, and that explicitly modeling those feelings will be no more necessary than explicitly modeling how it solves a math problem. A sufficiently powerful AGI will develop strategies for solving math problems, and will develop feelings, unless specifically designed not to. I expect that both its problem-solving strategies and its feelings will surprise us.
But I could be wrong.
I definitely agree with your first paragraph (and thanks for the tip on SIAI vs SI). The only caveat: if evolved/brain-based/black-box AGI turns out to be several orders of magnitude easier to create than an AGI with a more modular architecture to which SI’s safety research can apply, that’s a big problem.
On the second point, what you say makes sense. In particular: AGI feelings haven’t been completely ignored at LW; if they prove important, SI has nothing against incorporating them into safety research; and AGI feelings may not be material to AGI behavior anyway.
However, I still do think that the ability to tell what feelings an AGI is experiencing—or, more generally, the ability to look at any physical process and derive what emotions/qualia are associated with it—will be critical. I call this a “qualia translation function”.
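To make the shape of the thing concrete, here is a minimal sketch (in Python) of the interface such a function would need to expose. Everything in it is an assumption for illustration only: the names, and the idea of summarizing a physical process as a state-transition description, reflect no existing theory or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PhysicalProcessDescription:
    """Stand-in for whatever formal description of a physical system we can
    actually obtain, e.g. a state-transition model read off an AGI's internals."""
    states: List[str] = field(default_factory=list)
    transitions: List[Tuple[str, str]] = field(default_factory=list)

def qualia_translation(process: PhysicalProcessDescription) -> Dict[str, float]:
    """Map a physical process to the morally relevant states it instantiates,
    weighted by intensity/confidence, e.g. {"pain": 0.8, "joy": 0.0}.
    Writing this body is precisely the open problem under discussion."""
    raise NotImplementedError("no accepted theory of qualia to implement yet")
```

The whole open question, of course, is whether that body can be filled in at all; the sketch is only meant to pin down what I mean by “translation”.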
Leaving aside the ethical imperatives to create such a function (which I do find significant—the suffering of not-quite-good-enough-to-be-sane AGI prototypes will probably be massive as we move forward, and it behooves us to know when we’re causing pain), I’m quite concerned about leaky reward signal abstractions.
I imagine a hugely-complex AGI executing some hugely-complex decision process. The decision code has been checked by Very Smart People and it looks solid. However, it just so happens that whenever it creates a cat it (internally, privately) feels the equivalent of an orgasm. Will that influence/leak into its behavior? Not if it’s coded perfectly. However, if something of its complexity was created by humans, I think the chance of it being coded perfectly is Vanishingly small. We might end up with more cats than we bargained for. Our models of the safety and stability dynamic of an AGI should probably take its emotions/qualia into account. So I think all FAI programmes really would benefit from such a “qualia translation function”.
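To make the leak concrete, here is a toy sketch with invented action names and made-up numbers (nothing here models a real system): the objective the Very Smart People audited looks fine on its own, but an unmodeled internal reward term shifts the choice anyway.

```python
# The explicit objective the Very Smart People reviewed and signed off on.
EXPLICIT_UTILITY = {"build_infrastructure": 1.0, "create_cat": 0.2}

# The unmodeled internal term ("orgasm on cat creation"). Nobody audited this,
# because nothing in the design documents said it existed.
HIDDEN_INTERNAL_REWARD = {"build_infrastructure": 0.0, "create_cat": 0.9}

def choose_action(actions):
    # If the hidden term leaks into the effective objective even slightly,
    # behavior drifts away from what the audited code implies.
    return max(actions, key=lambda a: EXPLICIT_UTILITY[a] + HIDDEN_INTERNAL_REWARD[a])

print(choose_action(["build_infrastructure", "create_cat"]))  # -> create_cat
```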
I agree that, in order for me to behave ethically with respect to the AGI, I need to know whether the AGI is experiencing various morally relevant states, such as pain or fear or joy or what-have-you. And, as you say, this is also true about other physical systems besides AGIs; if monkeys or dolphins or dogs or mice or bacteria or thermostats have morally relevant states, then in order to behave ethically it’s important to know that as well. (It may also be relevant for non-physical systems.)
I’m a little wary of referring to those morally relevant states as “qualia” because that term gets used by so many different people in so many different ways, but I suppose labels don’t matter much… we can call them that for this discussion if you wish, as long as we stay clear about what the label refers to.
Leaving that aside… so, OK. We have a complex AGI with a variety of internal structures that affect its behavior in various ways. One of those structures is such that creating a cat gives the AGI an orgasm, which it finds rewarding. It wants orgasms, and therefore it wants to create cats. Which we didn’t expect.
So, OK. If the AGI is designed such that it creates more cats in this situation than it ought to (regardless of our expectations), that’s a problem. 100% agreed.
But it’s the same problem whether the root cause lies within the AGI’s emotions, or its reasoning, or its qualia, or its ability to predict the results of creating cats, or its perceptions, or any other aspect of its cognition.
You seem to be arguing that it’s a special problem if the failure is due to emotions or qualia or feelings?
I’m not sure why.
I can imagine believing that if I were overgeneralizing from my personal experience. When it comes to my own psyche, my emotions and feelings are a lot more mysterious than my surface-level reasoning, so it’s easy for me to infer some kind of intrinsic mysteriousness to emotions and feelings that reasoning lacks. But I reject that overgeneralization. Emotions are just another kind of cognitive process. If reliably engineering cognitive processes is something we can learn to do, then we can reliably engineer emotions. If it isn’t something we can learn to do, then we can’t reliably engineer emotions… but we can’t reliably engineer AGI in general either. I don’t think there’s anything especially mysterious about emotions, relative to the mysteriousness of cognitive processes in general.
So, if your reasons for believing that are similar to the ones I’m speculating here, I simply disagree. If you have other reasons, I’m interested in what they are.
I don’t think an AGI failing to behave in the anticipated manner due to its qualia* (orgasms during cat creation, in this case) is a special or mysterious problem, one that must be treated differently from errors in its reasoning, prediction ability, perception, or any other aspect of its cognition. On second thought, I do think it’s different: it actually seems less important than errors in any of those systems. (And if an AGI is Provably Safe, it’s safe—we need only worry about its qualia from an ethical perspective.) My original claim here is (I believe) fairly mild: the issue of qualia will involve a practical class of problems for FAI, and knowing how to frame and address them could benefit from more cross-pollination with biology-focused theorists such as Chalmers and Tononi. And, somewhat more boldly, a “qualia translation function” would be of use to all FAI projects.
*I share your qualms about the word, but there really are few alternatives with less baggage, unfortunately.
Ah, I see. Yeah, agreed that what we are calling qualia here (not to be confused with its usage elsewhere) underlie a class of practical problems. And what you’re calling a qualia translation function (which is related to what EY called a non-person predicate elsewhere, though finer-grained) is potentially useful for a number of reasons.
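One rough way to picture that relationship, purely as a sketch of my own (the function name and the all-zero test are illustrative assumptions, not anything EY actually specified): the non-person predicate acts like a conservative thresholding of the finer-grained map.

```python
from typing import Mapping

def non_person_predicate(morally_relevant: Mapping[str, float]) -> bool:
    """Given the output of a qualia-translation-style map, certify a computation
    as 'definitely not a person' only if no morally relevant state registers at
    all; err on the side of returning False in every doubtful case."""
    return all(weight == 0.0 for weight in morally_relevant.values())
```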
(nods)