Stuart: The majority of people proposing the “bringing up baby AGI” approach to encouraging AGI ethics, are NOT making the kind of naive cognitive error you describe here. This approach to AGI ethics is not founded on naive anthropomorphism. Rather, it is based on the feeling of having a mix of intuitive and rigorous understanding of the AGI architectures in question, the ones that will be taught ethics.
For instance, my intuition is that if we taught an OpenCog system to be loving and ethical, then it would very likely be so, according to broad human standards. This intuition is NOT based on naively anthropomorphizing OpenCog systems, but rather based on my understanding of the actual OpenCog architecture (which has many significant differences from the human cognitive architecture).
No one, so far as I know, claims to have an airtight PROOF that this kind of approach to AGI ethics will work. However, the intuition that it will work is based largely on understanding of the specifics of the AGI architectures in question, not just on anthropomorphism.
If you want to counter-argue against this approach, you should argue about it in the context of the specific AGI architectures in question. Or else you should present some kind of principled counter-argument. Just claiming “anthropomorphism” isn’t very convincing.
Thanks for your answer, Ben!
First of all, all of these methods involve integrating the AGI into human society. So the AGI is forming its values, at least in part, through doing something (possibly talking) and getting a response from some human. That human will be interpreting the AGI’s answers, and selecting the right response, using their own theory of the AGI’s mind—nearly certainly an anthropomorphisation! Even if that human develops experience dealing with the AGI, their understanding will be limited (just as our understanding of other humans is limited, only more so here).
So the AGI programmer is taking a problem that they can’t solve through direct coding, and putting the AGI through interactions so that it will acquire the values that the programmer can’t specify directly, in settings where the other interactors will be prone to anthropomorphisation.
i.e.: “I can’t solve this problem formally, but I do understand its structure well enough to be reasonably sure that anthropomorphic interactions will solve it”.
If that’s the claim, I would expect the programmer to be well schooled in the properties and perils of anthropomorphisation, and to cast their arguments, as much as possible, in formal logic or code form. For instance, if we want the AGI to “love” us: what kind of behaviour would we expect this to entail, and why would this code acquire that behaviour from these interactions? If you couldn’t use the word love, or any close synonyms, could you still describe the process and show that it will perform well? If you can’t describe love without saying “love”, then you are counting on a shared non-formalised human understanding of what love is, and hoping that the AGI will stumble upon the same understanding—you don’t know the contours of the definition, and the potential pitfalls, but you’re counting on the AGI to avoid them.
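To make concrete the kind of thing I’m asking for, here is a deliberately crude sketch in Python, with every name, field and threshold purely illustrative, of what a behavioural specification of that property might look like without using the word or its synonyms. I’m not claiming this particular test is adequate; the point is that an argument for the upbringing approach ought to connect the architecture to something at least this explicit:

```python
# Purely illustrative sketch: a behavioural test for the property we
# informally call "loving", stated without using the word itself.
# Every name, field and threshold here is hypothetical.

from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    """Predicted consequences of a candidate action for one affected human."""
    human_id: str
    wellbeing_delta: float    # predicted change in that person's wellbeing
    consent_respected: bool   # whether the action respects their stated wishes


def behaviour_acceptable(predicted_outcomes: List[Outcome],
                         worst_case_floor: float = -0.1,
                         min_avg_delta: float = 0.0) -> bool:
    """A rough stand-in for 'treats humans the way we want': everyone's stated
    wishes are respected, nobody is predicted to be made much worse off, and on
    average the affected people end up better off."""
    if not predicted_outcomes:
        return True
    wishes_respected = all(o.consent_respected for o in predicted_outcomes)
    nobody_badly_harmed = all(o.wellbeing_delta >= worst_case_floor
                              for o in predicted_outcomes)
    avg_delta = (sum(o.wellbeing_delta for o in predicted_outcomes)
                 / len(predicted_outcomes))
    return wishes_respected and nobody_badly_harmed and avg_delta >= min_avg_delta
```

Even this toy version immediately surfaces the hard questions (whose notion of wellbeing? how are the predicted outcomes generated and audited?), which is exactly what the informal word hides.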
As for those four types of behaviour that I mentioned there, and that we need to separate: don’t just decry the use of anthropomorphisation in the description, but say which parts of the OpenCog system will be used to distinguish between them, and to select the friendly behaviour rather than the others. You know how your system works—reassure me! :-)
Stuart—Yeah, the line of theoretical research you suggest is worthwhile…
However, it’s worth noting that I and the other OpenCog team members are pressed for time, and have a lot of concrete OpenCog work to do. It would seem none of us really feels like taking a lot of time, at this stage, to carefully formalize arguments about what the system is likely to do in various situations once it’s finished. We’re too consumed with trying to finish the system, which is a long and difficult task in itself...
I will try to find some time in the near term to sketch a couple example arguments of the type you request… but it won’t be today...
As a very rough indication for the moment… note that OpenCog has explicit GoalNode objects in its AtomSpace knowledge store, and one can look at the explicit probabilistic ImplicationLinks pointing to these GoalNodes from various combinations of contexts and actions. So one can actually look, in principle, at the probabilistic relations between (context, action) pairs and goals that OpenCog is using to choose actions.
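To give a flavour of the structure I mean, here is a toy Python model. This is not OpenCog’s actual API, and all the node names, fields and numbers are schematic stand-ins, but it shows the shape of the data one would be inspecting:

```python
# Toy model of the structure described above -- NOT the real OpenCog API,
# just a schematic picture of GoalNodes and probabilistic ImplicationLinks
# from (context, action) pairs.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GoalNode:
    name: str


@dataclass
class ImplicationLink:
    context: str        # e.g. "owner_is_present"
    action: str         # e.g. "fetch_ball"
    goal: GoalNode      # the goal this (context, action) pair is believed to serve
    strength: float     # probability-like truth value: P(goal | context, action)
    confidence: float   # how much evidence backs that estimate


def relations_for_goal(atomspace: List[ImplicationLink],
                       goal_name: str) -> List[Tuple[str, str, float, float]]:
    """List every (context, action) pair the system believes serves a goal,
    with its strength and confidence: the raw material for the kind of
    inspection described above."""
    return [(link.context, link.action, link.strength, link.confidence)
            for link in atomspace if link.goal.name == goal_name]


# Example: inspect what the system currently thinks serves "please_owner".
please_owner = GoalNode("please_owner")
atomspace = [
    ImplicationLink("owner_is_present", "fetch_ball", please_owner, 0.8, 0.6),
    ImplicationLink("owner_is_sad", "approach_and_nuzzle", please_owner, 0.7, 0.4),
]
print(relations_for_goal(atomspace, "please_owner"))
```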
Now, for a quite complex OpenCog system, it may be hard to understand what all these probabilistic relations mean. But for a young OpenCog doing simple things, it will be easier. So one would want to validate, for a young OpenCog doing simple things, that the information in the system’s AtomSpace is compatible with behaviour type 1 rather than types 2–4… One would then want to validate that, as the system gets more mature and does more complex things, there is not a trend toward more of types 2–4 and less of type 1…
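Schematically, that second check might look something like the sketch below (again purely illustrative Python; classifying each goal-implication into one of the four behaviour types is the hard part, and is assumed here as an input rather than solved):

```python
# Illustrative sketch of the maturity-trend check: given per-snapshot counts
# of how many goal-implications fall into each of the four behaviour types
# (the classification itself is assumed, not solved here), flag any drift
# away from type 1 as the system matures.

from typing import Dict, List


def type1_fraction(counts: Dict[str, int]) -> float:
    """Fraction of classified relations that fall into behaviour type 1."""
    total = sum(counts.values())
    return counts.get("type_1", 0) / total if total else 0.0


def drifting_away_from_type1(snapshots: List[Dict[str, int]],
                             tolerance: float = 0.05) -> bool:
    """True if the type-1 fraction drops by more than `tolerance` between
    any earlier snapshot and a later one."""
    fractions = [type1_fraction(s) for s in snapshots]
    return any(later < earlier - tolerance
               for i, earlier in enumerate(fractions)
               for later in fractions[i + 1:])


# Example: three developmental snapshots, ordered young -> mature.
snapshots = [
    {"type_1": 90, "type_2": 5, "type_3": 3, "type_4": 2},
    {"type_1": 85, "type_2": 8, "type_3": 4, "type_4": 3},
    {"type_1": 70, "type_2": 15, "type_3": 8, "type_4": 7},
]
print(drifting_away_from_type1(snapshots))  # True -- this is the warning sign
```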
Interesting line of thinking indeed! …
Ben, your response is logical (if not necessarily correct), but the fact that many AI researchers advocate the “upbringing approach” (for other architectures) makes me very suspicious that they’re anthropomorphising after all.
An AGI that is not either deeply neuromorphic or possessing a well-defined and formally stable utility function sounds like… frankly, one of the worst ideas I’ve ever heard. I’m having difficulty imagining a way you could demonstrate the safety of such a system, or trust it enough at any point to give it the resources it needs to learn. Considering that the fate of intelligent life in our future light cone may hang in the balance, standards of safety must obviously be very high! Intuition is, I’m sorry, simply not an acceptable criterion on which to wager at least billions, and perhaps trillions, of lives. The expected utility math does not wash if you actually expect OpenCog to work.
On a more technical level, human values are broadly defined as some function over a typical human brain. There may be some (or many) optimizations possible, but not such that we can rely on them. So, for a really good model of human values, we should not expect to need less than the entropy of a human brain. In other words, nobody, whether they’re Eliezer Yudkowsky with his formalist approach or you, is getting away with less than about ten petabytes of good training samples. Those working on uploads can skip this step entirely, but neuromorphic AI is likely to be fundamentally less useful.
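To show roughly where a figure of that order comes from: the numbers below are loose assumptions rather than measurements, and tighter assumptions would give one or two orders of magnitude less, but the scale of the problem is the same either way.

```python
# Back-of-envelope only: every number below is a loose assumption, and the
# result is an order-of-magnitude guess, not a measurement.

synapses = 1e15          # assumed synapse count (estimates run ~1e14 to 1e15)
bits_per_synapse = 80    # assumed state needed per synapse; deliberately generous

brain_bits = synapses * bits_per_synapse
brain_petabytes = brain_bits / 8 / 1e15
print(f"~{brain_petabytes:.0f} PB")   # ~10 PB with these generous assumptions
```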
And this assumes that every bit of evidence can be mapped directly to a bit in a typical human brain map. In reality, for a non-FOOMed AI, the mapping is likely to be many orders of magnitude less efficient. I suspect, but cannot demonstrate right now, that a formalist approach starting with a clean framework along the lines of AIXI is going to be more efficient. Quite aside from that, even assuming you can acquire enough data to train your machine reliably, you still need it to do… something. Human values include a lot of unpleasant qualities. Simply giving it human values and then allowing it to grow to superhuman intellect is grossly unsafe. Ted Bundy had human values. If your plan is to train it on examples of only nice people, then you’ve got a really serious practical problem of how to track down >10 petabytes of really good data on the lives of saints. A formalist approach like CEV, for all the things that bug me about it, simply does not have that issue, because its utility function is defined as a function of the observed values of real humans.
In other words, for a system whose architecture is as alien as OpenCog’s, even if we assume that the software is powerful and general enough to work (which I’m in no way convinced of), attempting to inculcate it with human values is extremely difficult, dangerous, and just plain unethical.