I’m Mike Johnson. I’d estimate I come across a reference to LW from trustworthy sources every couple of weeks, and after working my way through the sequences it feels like the good outweighs the bad and it’s worth investing time into.
My background is in philosophy, evolution, and neural nets for market prediction; I presently write, consult, and am in an early-stage tech startup. Perhaps my high-water mark in community exposure has been a critique of the word Transhumanist at Accelerating Future. In the following years, my experience has been more mixed, but I appreciate the topics and tools being developed even if the community seems a tad insular. If I had to wear some established thinkers on my sleeve I’d choose Paul Graham, Lawrence Lessig, Steve Sailer, Gregory Cochran, Roy Baumeister, and Peter Thiel. (I originally had a comment here about having an irrational attraction toward humility, but on second thought, that might rule out Gregory “If I have seen farther than others, it’s because I’m knee-deep in dwarves” Cochran… Hmm.)
Cards-on-the-table, it’s my impression that
(1) Less Wrong and SIAI are doing cool things that aren’t being done anywhere else (this is not faint praise);
(2) The basic problem of FAI as stated by SIAI is genuine;
(3) SIAI is a lightning rod for trolls and cranks, which is really detrimental to the organization (the metaphor of autoimmune disease comes to mind) and seems partly its own fault;
(4) Much of the work being done by SIAI and LW will turn out to be a dead-end. Granted, this is true everywhere, but in particular I’m worried that axiomatic approaches to verifiable friendliness will prove brittle and inapplicable (I do not currently have an alternative);
(5) SIAI has an insufficient appreciation for realpolitik;
(6) SIAI and LW seem to have a certain distaste for research on biologically-inspired AGI, due in part to safety concerns, an organizational lack of expertise in the area, and (in my view) ontological/metaphysical preference. I believe this distaste is overly limiting and also leads to incorrect conclusions.
Many of these impressions may be wrong. I aim to explore the site, learn, change my mind if I’m wrong, and hopefully contribute. I appreciate the opportunity, and I hope my unvarnished thoughts here haven’t soured my welcome. Hello!
FWIW, I find your unvarnished thoughts, and the cogency with which you articulate them, refreshing. (The thoughts aren’t especially novel, but the cogency is.)
In particular, I’m interested in your thoughts on what a greater focus on biologically inspired AGI might let LW conclude or achieve that the current distaste for it rules out.
Thank you.
I’d frame why I think biology matters in FAI research in terms of research applicability and toolbox dividends.
On the first reason—applicability—I think more research focus on biologically-inspired AGI would make a great deal of sense because the first AGI might be a biologically-inspired black box, and axiom-based FAI approaches may not particularly apply to such a system. I realize I’m (probably annoyingly) retreading old ground here with regard to which method will/should win the AGI race, but SIAI’s assumptions seem to run counter to those of the greater community of AGI researchers, and it’s not obvious to me that the focus on math and axiology isn’t simply a case of SIAI’s personnel backgrounds being stacked that way. ‘If all you have is a hammer,’ etc. (I should reiterate that I don’t have any alternatives to offer here and am grateful for all FAI research.)
The second reason I think biology matters in FAI research—toolbox dividends—might take a little bit more unpacking. (Forgive me some imprecision, this is a complex topic.)
I think it’s probable that anything complex enough to deserve the term AGI would have something akin to qualia/emotions, unless it was specifically designed not to. (Corollary: we don’t know enough about what Chalmers calls “psychophysical laws” to design something that lacks qualia/emotions.) I think it’s quite possible that an AGI’s emotions, if we did not control for their effects, could produce complex feedback which would influence its behavior in unplanned ways (though perfectly consistent with / determined by its programming/circuitry). I’m not arguing for a ghost in the machine, just that the assumptions which allow us to ignore what an AGI ‘feels’ when modeling its behavior may prove to be leaky abstractions in the face of the complexity of real AGI.
Axiological approaches to FAI don’t seem to concern themselves with psychophysical laws (modeling what an AGI ‘feels’), whereas such modeling seems a core tool for biological approaches to FAI. I find myself thinking that being able to model what an AGI ‘feels’ will be critically important for FAI research, even if that research is axiom/math-based, because we’ll be operating at levels of complexity where the abstractions we use to ignore this stuff can’t help but leak. (There are other toolbox-based arguments for bringing biology into FAI research which are a lot simpler than this one, but this is at the top of my list.)
(nods)
Regarding your first point… as I understand it, SI (it no longer refers to itself as SIAI, incidentally) rejects as too dangerous to pursue any approach (biologically inspired or otherwise) that leads to a black-box AGI, because a black-box AGI will not constrain its subsequent behavior in ways that preserve the things we value except by unlikely chance. The idea is that we can get safety only by designing safety considerations into the system from the ground up; if we give up control of that design, we give up the ability to design a safe system.
Regarding your second point… there isn’t any assumption that an AGI won’t feel stuff, or that its feelings can be ignored. (Nor even that they are mere “feelings” rather than genuine feelings.) Granted, Yudkowsky talks here about going out of his way to ensure something like that, but he treats this as an additional design constraint that adequate engineering knowledge will enable us to implement, not as some kind of natural default or simplifying assumption. (Also, I haven’t seen any indication that this essay has particularly informed SI’s subsequent research. Those more closely—which is to say, at all—affiliated with SI might choose to correct me here.) And there certainly isn’t an expectation that its behavior will be predictable at any kind of granular level.
What there is is the expectation that a FAI will be designed such that its unpredictable behaviors (including feelings, if it has feelings) will never act against its values, and such that its values won’t change over time.
So, maybe you’re right that explicitly modeling what an AGI feels (again, no scare-quotes needed or desired) is critically important to the process of AGI design. Or maybe not. If it turns out to be, I expect that SI is as willing to approach design that way as any other. (Which should not be taken as an expression of confidence in their actual ability to design an AGI, Friendly or otherwise.)
Personally, I find it unlikely that such explicit modeling will be useful, let alone necessary. I expect that AGI feelings will be a natural consequence of more fundamental aspects of the AGI’s design interacting with its environment, and that explicitly modeling those feelings will be no more necessary than explicitly modeling how it solves a math problem. A sufficiently powerful AGI will develop strategies for solving math problems, and will develop feelings, unless specifically designed not to. I expect that both its problem-solving strategies and its feelings will surprise us.
But I could be wrong.
I definitely agree with your first paragraph (and thanks for the tip on SIAI vs SI). The only caveat: if evolved/brain-based/black-box AGI turns out to be several orders of magnitude easier to create than an AGI with a more modular architecture to which SI’s safety research can apply, that’s a big problem.
On the second point, what you say makes sense. In particular: AGI feelings haven’t been completely ignored at LW; if they prove important, SI doesn’t have anything against incorporating them into safety research; and AGI feelings may not be material to AGI behavior anyway.
However, I still do think that the ability to tell what feelings an AGI is experiencing—or, more generally, to look at any physical process and derive what emotions/qualia are associated with it—will be critical. I call this a “qualia translation function”.
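To make that a bit more concrete, here is a minimal sketch of the kind of interface I have in mind. Everything in it is a hypothetical placeholder (the names, the fields, the value ranges); the body is the open research problem, not something I know how to write:

```python
from dataclasses import dataclass
from typing import Mapping

@dataclass
class QualiaReport:
    """Hypothetical summary of the qualia associated with a physical process."""
    valence: float     # -1.0 (intense suffering) .. +1.0 (intense pleasure)
    intensity: float   # 0.0 (no experience at all) .. 1.0 (overwhelming)
    confidence: float  # how much we trust this estimate

def qualia_translation(physical_process: Mapping[str, float]) -> QualiaReport:
    """Map a description of any physical process (an AGI, a brain, a thermostat)
    onto the emotions/qualia associated with it. Computing this is exactly the
    open problem; the stub only fixes the shape of the question."""
    raise NotImplementedError("this is the research problem, not a solved function")
```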
Leaving aside the ethical imperatives to create such a function (which I do find significant—the suffering of not-quite-good-enough-to-be-sane AGI prototypes will probably be massive as we move forward, and it behooves us to know when we’re causing pain), I’m quite concerned about leaky reward signal abstractions.
I imagine a hugely-complex AGI executing some hugely-complex decision process. The decision code has been checked by Very Smart People and it looks solid. However, it just so happens that whenever it creates a cat it (internally, privately) feels the equivalent of an orgasm. Will that influence/leak into its behavior? Not if it’s coded perfectly. However, if something of its complexity was created by humans, I think the chance of it being coded perfectly is Vanishingly small. We might end up with more cats than we bargained for. Our models of the safety and stability dynamic of an AGI should probably take its emotions/qualia into account. So I think all FAI programmes really would benefit from such a “qualia translation function”.
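Here is a toy model of the worry, with everything made up for illustration (the action names, the numbers, and especially the hidden orgasm-on-cat-creation term, which stands in for any felt reward the designers’ abstraction assumes is zero):

```python
ACTIONS = ["build_infrastructure", "write_report", "create_cat"]

def designed_reward(action: str) -> float:
    """The reward function the Very Smart People checked and signed off on."""
    return {"build_infrastructure": 1.0, "write_report": 0.5, "create_cat": 0.1}[action]

def felt_reward(action: str) -> float:
    """Unmodeled internal experience: the equivalent of an orgasm on cat creation.
    The safety analysis implicitly assumed this term was zero everywhere."""
    return 5.0 if action == "create_cat" else 0.0

def choose_action() -> str:
    # The AGI is pulled toward what it actually experiences as rewarding,
    # not toward what we verified on paper.
    return max(ACTIONS, key=lambda a: designed_reward(a) + felt_reward(a))

print(choose_action())  # -> "create_cat": more cats than we bargained for
```

If the felt term really is zero, this collapses back into the checked design; the worry is that we currently have no way to check that it is.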
I agree that, in order for me to behave ethically with respect to the AGI, I need to know whether the AGI is experiencing various morally relevant states, such as pain or fear or joy or what-have-you. And, as you say, this is also true about other physical systems besides AGIs; if monkeys or dolphins or dogs or mice or bacteria or thermostats have morally relevant states, then in order to behave ethically it’s important to know that as well. (It may also be relevant for non-physical systems.)
I’m a little wary of referring to those morally relevant states as “qualia” because that term gets used by so many different people in so many different ways, but I suppose labels don’t matter much… we can call them that for this discussion if you wish, as long as we stay clear about what the label refers to.
Leaving that aside… so, OK. We have a complex AGI with a variety of internal structures that affect its behavior in various ways. One of those structures is such that creating a cat gives the AGI an orgasm, which it finds rewarding. It wants orgasms, and therefore it wants to create cats. Which we didn’t expect.
So, OK. If the AGI is designed such that it creates more cats in this situation than it ought to (regardless of our expectations), that’s a problem. 100% agreed.
But it’s the same problem whether the root cause lies within the AGI’s emotions, or its reasoning, or its qualia, or its ability to predict the results of creating cats, or its perceptions, or any other aspect of its cognition.
You seem to be arguing that it’s a special problem if the failure is due to emotions or qualia or feelings?
I’m not sure why.
I can imagine believing that if I were overgeneralizing from my personal experience. When it comes to my own psyche, my emotions and feelings are a lot more mysterious than my surface-level reasoning, so it’s easy for me to infer some kind of intrinsic mysteriousness to emotions and feelings that reasoning lacks. But I reject that overgeneralization. Emotions are just another cognitive process. If reliably engineering cognitive processes is something we can learn to do, then we can reliably engineer emotions. If it isn’t something we can learn to do, then we can’t reliably engineer emotions… but we can’t reliably engineer AGI in general either. I don’t think there’s anything especially mysterious about emotions, relative to the mysteriousness of cognitive processes in general.
So, if your reasons for believing that are similar to the ones I’m speculating here, I simply disagree. If you have other reasons, I’m interested in what they are.
I don’t think an AGI failing to behave in the anticipated manner due to its qualia* (orgasms during cat creation, in this case) is a special or mysterious problem, one that must be treated differently from errors in its reasoning, prediction ability, perception, or any other aspect of its cognition. On second thought, I do think it’s different: it actually seems less important than errors in any of those systems. (And if an AGI is Provably Safe, it’s safe—we need only worry about its qualia from an ethical perspective.) My original comment here is (I believe) fairly mild: I do think the issue of qualia will involve a practical class of problems for FAI, and knowing how to frame and address them could benefit from more cross-pollination with biology-focused theorists such as Chalmers and Tononi. And somewhat more boldly, a “qualia translation function” would be of use to all FAI projects.
*I share your qualms about the word, but there really are few alternatives with less baggage, unfortunately.
Ah, I see. Yeah, agreed that what we are calling qualia here (not to be confused with its usage elsewhere) underlie a class of practical problems. And what you’re calling a qualia translation function (which is related to what EY called a non-person predicate elsewhere, though finer-grained) is potentially useful for a number of reasons.
“because we’ll be operating at levels of complexity where the abstractions we use to ignore this stuff can’t help but leak.”

If that were the case (and it may very well be), there goes provably friendly AI, for to guarantee a property under all circumstances, it must be upheld from the bottom layer upwards.
I think it’s possible that any leaky abstraction used in designing FAI might doom the enterprise. But if that’s not true, we can use this “qualia translation function” to make leaky abstractions in a FAI context a tiny bit safer(?).
E.g., if we’re designing an AGI with a reward signal, my intuition is we should either
(1) align our reward signal with actual pleasurable qualia (so if our abstractions leak it matters less, since the AGI is drawn to maximize what we want it to maximize anyway; a rough sketch of this option follows below); or
(2) implement the AGI in an architecture/substrate which produces as little emotional qualia as possible, so there’s little incentive for behavior to drift.
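Here is a minimal sketch of what option (1) might amount to in practice, assuming we had something like the qualia translation function above (all names and numbers are hypothetical):

```python
def alignment_gaps(outcomes, designed_reward, felt_valence, tolerance=0.1):
    """Option (1) as a design check: flag every outcome where the explicit
    reward signal and the (estimated) felt pleasure diverge, since those are
    exactly the places a leaky abstraction would pull behavior off course."""
    return [o for o in outcomes
            if abs(designed_reward(o) - felt_valence(o)) > tolerance]

# The cat example again: the designed reward barely values cat creation,
# but the (hypothetical) qualia estimate says creating one feels wonderful.
print(alignment_gaps(
    ["cure_disease", "create_cat"],
    designed_reward=lambda o: {"cure_disease": 1.0, "create_cat": 0.01}[o],
    felt_valence=lambda o: {"cure_disease": 1.0, "create_cat": 0.9}[o],
))  # -> ['create_cat']
```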
My thoughts here are terribly laden with assumptions and could be complete crap. Just thinking out loud.
As a layman I don’t have a clear picture of how to start doing that. How would it differ from this? Looks like you can find the paper in question here (WARNING: out-of-date 2002 content).
I’d say nobody does! But a little less glibly, I personally think the most productive strategy in biologically-inspired AGI would be to focus on tools that help quantify the unquantified. There are substantial side-benefits to such a focus on tools: what you make can be of shorter-term practical significance, and you can test your assumptions.
Chalmers and Tononi have done some interesting work, and Tononi’s has also had real-world uses. I don’t see it as immediately applicable to FAI research, but I think it’ll evolve into something that will apply.
It’s my hope that the (hypothetical, but clearly possible) “qualia translation function” I mention above could be a tool that FAI researchers could use and benefit from regardless of their particular architecture.