I think that “CEV” is usually used as shorthand for “an AI that implements the CEV of Humanity”. This is what I am referring to when I say “CEV”. So, what I mean when I say that “CEV is a bad alignment target” is that, for any reasonable set of definitions, it is a bad idea to build an AI that does what “a Group” wants it to do (in expectation, from the perspective of essentially any human individual, compared to extinction). Since groups and individuals are completely different types of things, it should not be surprising to learn that doing what one type of thing wants (such as “a Group”) is bad for a completely different type of thing (such as a human individual). In other words, I think that “an AI that implements the CEV of Humanity” is a bad alignment target in the same sense that SRAI is a bad alignment target.
But I don’t think your comment uses “CEV” in this sense. I assume that we can agree that aiming for “the CEV of a chimp” can be discovered to be a bad idea (for example, by referring to facts about chimps and using thought experiments to see what those facts imply about likely outcomes). Similarly, it must be possible to discover that aiming for “the CEV of Humanity” is also a bad idea (for human individuals). Surely, discovering this cannot be impossible by definition. Thus, I think that you are in fact not using “CEV” as shorthand for “an AI that implements the CEV of Humanity”. (I am referring to your sentence: “If it’s not something to aim at, then it’s not a properly constructed CEV.”)
Your comment makes perfect sense if I read “CEV” as shorthand for “an AI that implements the CEV of a single human designer”. I was not expecting this terminology, but it is a perfectly reasonable terminology, and I am happy to make my argument using it. If we are using this terminology, then I think you are completely right that the problem I am trying to describe is a proxy issue (thus, if this was indeed your intended meaning, then I was completely wrong when I said that I was not referring to a proxy issue; in this terminology, it is indeed a proxy issue). So, using this terminology, I would describe my concern as: “an AI that implements the CEV of Humanity” is a predictably bad proxy for “an AI that implements the CEV of a single human designer”, because “an AI that implements the CEV of Humanity” is far, far worse than extinction from the perspective of essentially any human individual (which presumably disqualifies it as a proxy for “an AI that implements the CEV of a single human designer”; if it does not, then I think that this particular human designer is a very dangerous person, from the perspective of essentially any human individual). Using this terminology (and assuming a non-unhinged designer), I would say that using “an AI that implements the CEV of Humanity” as a proxy for “an AI that implements the CEV of a single human designer” constitutes a predictable proxy failure. Further, I would say that pushing ahead with such a project, despite this predictable failure, inflicts an unnecessary s-risk on everyone. Thus, I think it would be a bad idea to pursue such a project (from the perspective of essentially any human individual, presumably including the designer).
If we take the case of Bob and his Suffering Reducing AI (SRAI) project (and everyone has agreed to use this terminology), then we can tell Bob:
SRAI is not a good proxy for “an AI that implements the CEV of Bob” (assuming that you, Bob, do not want to kill everyone). Thus, you will run into a predictable issue when your project tries to use SRAI as a proxy for “an AI that implements the CEV of Bob”. If you implement a safety measure successfully, then this will still, at best, lead to your project failing safely. At worst, your safety measure will fail, and SRAI will kill everyone. So, please don’t proceed with your project, given that it would put everyone at risk of being killed by SRAI (and this would be an unnecessary risk, because your project will predictably fail due to a predictable proxy issue).
By making sufficient progress on the “what alignment target should be aimed at?” question before Bob gets started on his SRAI project, it is possible to avoid the unnecessary extinction risks associated with the proxy failure that Bob will predictably run into if his project uses SRAI as a proxy for “an AI that implements the CEV of Bob”. Similarly, it is possible to avoid the unnecessary s-risks associated with the proxy failure that Dave will predictably run into if Dave uses “an AI that implements the CEV of Humanity” as a proxy for “an AI that implements the CEV of Dave” (because any “Group AI” is very bad for human individuals, including Dave).
Mitigating the unnecessary extinction risks that are inherent in any SRAI project does not require an answer to the “what alignment target should be aimed at?” question. (It was a long time ago, but if I remember correctly, Yudkowsky did this around two decades ago. It seems likely that anyone careful and capable enough to hit an alignment target will be able to understand that old explanation of why SRAI is a bad alignment target. So, generating such an explanation was sufficient for mitigating the extinction risks associated with a successfully implemented SRAI, and generating it did not require an answer to the “what alignment target should be aimed at?” question. One can demonstrate that a given bad answer is a bad answer without having any good answer.) Similarly, avoiding the unnecessary s-risks that are inherent in any “Group AI” project does not require an answer to the “what alignment target should be aimed at?” question. (I strongly agree that finding an actual answer to this question is probably very, very difficult. I am simply pointing out that even partial progress on this question can be very useful.)
(I think that there are other issues related to AI projects whose purpose is to aim at “the CEV of a single human designer”. I will not get into them here, but I thought it made sense to at least mention that they exist.)
Since groups and individuals are completely different types of things,
I don’t think this is obviously justifiable. It seems to me that cells work together to be a person, together tracking and implementing the agency of the aggregate system according to their interest as part of that combined entity, and in the same way, people work together to be a group, together tracking and implementing the agency of the group. I’m pretty sure that if you try to calculate my CEV with me in a box, you end up with an error like “import error: the rest of the reachable social graph of friendships and caring”. I cannot know what I want without deliberating with the others I intend to be in a society with long term, because I will know that whatever answer I give for my CEV will very probably be misaligned with the rest of the people I care about. And I expect that the network of mutual utility across humanity is well connected enough that if I import my friends, it becomes a recursive import that requires evaluating everyone on earth.
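To make the “recursive import” point concrete, here is a minimal Python sketch that treats “whose values would need to be consulted” as plain reachability in a graph of caring relationships. The graph and the names in it are made up purely for illustration and are not from the comment above.

```python
# Toy sketch of the "recursive import" claim: in a connected caring graph,
# extrapolating one person transitively pulls in their whole connected component.

def consulted_set(person, cares_about):
    """Everyone transitively reachable from `person` via caring links."""
    seen = {person}
    stack = [person]
    while stack:
        current = stack.pop()
        for other in cares_about.get(current, ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen

# A small, well-connected caring graph (hypothetical names).
cares_about = {
    "me": ["alice", "bob"],
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["me"],
}

# Extrapolating "me" in isolation still requires evaluating everyone reachable.
print(sorted(consulted_set("me", cares_about)))
# ['alice', 'bob', 'carol', 'dave', 'me']
```

If the real network of mutual caring is as well connected as the comment suggests, the set reachable from almost any single person is close to all of humanity.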
(By the way, any chance you could use fewer commas? The reading speed I can reach on your comments is reduced by them, because I have to bump up to deliberate thinking to check whether I’ve joined the sentence fragments the way you meant. No worries if not, though.)
I think that extrapolation is a genuinely unintuitive concept. I would, for example, not be very surprised if it turns out that you are right, and that it is impossible to reasonably extrapolate you if the AI doing the extrapolation is cut off from all information about other humans. I don’t think that this fact is in tension with my statement that individuals and groups are completely different types of things. Taking your cell analogy: I think that implementing the CEV of you could lead to the death of every single cell in your body (for example if your mind is uploaded in a way that does not preserve information about any individual cell). I don’t think it is strange, in general, if an extrapolated version of a human individual is completely fine with the complete annihilation of every cell in her body (and this is true despite the fact that “hostility towards cells” is not a common thing). Such an outcome is no indication of any technical failure in an AI project that was aiming for the CEV of that individual. This shows why there is no particular reason to think that doing what a human individual wants would be good for any of her cells (for any reasonable definition of “doing what a human individual wants”). And this fact remains true even if it is also the case that a given cell would become impossible to understand if that cell were isolated from other cells.
A related tangent: the fact that extrapolation is a genuinely unintuitive concept has, I think, important implications for AI safety. It is, for example, central to my argument about “Last Judge” type proposals in my post:
The proposal to add a “Last Judge” to an AI does not remove the urgency of making progress on the “what alignment target should be aimed at?” question.
(I will try to reduce the commas. I see what you are talking about. I have in the past been forced to do something about an overuse of both footnotes and parentheses. Reading badly written academic history books seems to be making things worse. If one is analysing AI proposals where the AI is getting its goal from humans, then it makes sense to me to at least try to understand humans.)
I think that implementing the CEV of you could lead to the death of every single cell in your body (for example if your mind is uploaded in a way that does not preserve information about any individual cell)
I will take this bet at any amount. My cells are a beautiful work of art crafted by evolution, and I am a guest in their awesome society. Any future where my cells’ information is lost, rather than transmuted with the original stored, is unacceptable to me. Switching to another computational substrate without a deep translation of the information in my cells is effectively guaranteed to require examining the information in a significant fraction of my cells at a deep level, so that a generative model can be constructed which has significantly higher accuracy at cell-information reconstruction than any generative model of today. I suspect I am unusual only in having thought this through enough to identify the value explicitly, and that it is common in somewhat-less-transhumanist circles, usually manifesting as a resistance to augmentation rather than as a desire to augment in a way that maintains a biology-like substrate.
Now, to be clear, I do want to rewrite my cells at a deep level: a sort of highly advanced, dynamics-faithful “style transfer” into some much more advanced substrate, in particular one that operates smoothly between 2 kelvin and ~310 kelvin, or ideally much higher (though if it turns out that a long adaptation period is needed to switch between ultra-low and ultra-high temperatures, that’s fine; I expect that the chemicals that operate smoothly at the respective temperatures will look rather different). I also expect not to want to be stuck with carbon. I don’t currently understand chemistry well enough to confidently tell you that any of the things I’m asking for in this paragraph are definitely possible, but my hunch is that there are other atoms which form stronger bonds and have smaller fields that could be used instead, i.e. classic precise-nanotech sorts of stuff. It probably takes a lot of energy to construct them, if they’re possible.
But again, no uplift without being able to map the behaviors of my cells in high fidelity.
Interesting. I haven’t heard this perspective. Can you say a little more about why you want to preserve the precise information in your cells? Is it solely about their impact on your mind’s function? What level of approximation would you be okay with?
I’d be fine with having my mind simulated with a low-res body simulation, as long as that body felt more-or-less right and supported a range of moods and emotions similar to the ones I have now—but I’d be fine with a range of moods being not quite the same as the ones caused by the intricacies of my current body.
I was clearly wrong regarding how you feel about your cells. But surely the question of whether an AI implementing the CEV of Steve would result in any surviving cells is an empirical question? (It must be settled by referring to facts about Steve, and by trying to figure out what those facts mean in terms of how the CEV of Steve would treat his cells.) It cannot possibly be the case that it is impossible, by definition, to discover that any reasonable way of extrapolating Steve would result in all his cells dying?
Thank you for engaging on this. Reading your description of how you view your own cells was a very informative window into how a human mind can work. (I find it entirely possible that I am very wrong regarding how most people view their cells, or about how they would view their cells upon reflection. I will probably not try to introspect regarding how I feel about my own cells while this exchange is still fresh.)
Zooming out a bit and looking at this entire conversation, I notice that I am very confused. I will try to take a step back from LW and gain some perspective before I return to this debate.