The thing I’ve never understood about CEV is how the AI can safely read everyone’s brain. The whole point of CEV is that the AI is unsafe unless it has a human value system, but before it can get one, it has to open everyone’s heads and scan their brains!? That doesn’t sound like something I’d trust a UFAI to do properly.
I bring this up because, without knowing how the CEV process is supposed to occur, it is hard to analyse this post. I also agree with JoshuaZ that this didn’t deserve a top-level post.
Presumably by starting with some sort of prior, and incrementally updating on available information (the Web, conversation with humans, the psychology literature, etc.). At any point it would have to use its current model to navigate the tradeoff between acquiring new information about idealised human aims and fulfilling those aims.
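To make that tradeoff concrete, here is a minimal sketch in Python, assuming a toy setting I have made up (two candidate models of idealised human aims, two actions, and a fixed observation cost); none of these names or numbers come from the CEV literature:

```python
# Hypothetical toy setting: a few candidate models of "idealised human aims",
# each scoring the available actions differently.
candidate_utilities = {
    "model_A": {"act_1": 10.0, "act_2": 2.0},
    "model_B": {"act_1": 1.0, "act_2": 8.0},
}
posterior = {"model_A": 0.5, "model_B": 0.5}  # current beliefs over the candidates

def expected_value_of_acting_now(posterior):
    """Best achievable expected utility if the agent commits to an action now."""
    actions = {"act_1", "act_2"}
    return max(sum(p * candidate_utilities[m][a] for m, p in posterior.items())
               for a in actions)

def expected_value_after_observation(posterior, observation_cost=0.5):
    """Expected utility if the agent first learns which model is correct
    (an idealised, perfectly informative observation), minus its cost."""
    ev = sum(p * max(candidate_utilities[m].values()) for m, p in posterior.items())
    return ev - observation_cost

act_now = expected_value_of_acting_now(posterior)
learn_first = expected_value_after_observation(posterior)
print("act now:", act_now, "| gather information first:", learn_first)
# The agent gathers more information only when the value of that information
# exceeds its cost: the tradeoff described above, in miniature.
```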
This does point to another, more serious problem, which is that you can’t create an AI to “maximize the expected value of the utility function written in this sealed envelope” without a scheme for interpersonal comparison of utility functions: if you assign 50% probability to the envelope containing utility function A and 50% probability to it containing utility function B, you need an algorithm for selecting an action when the two functions each favor a different one. See this OB post by Bostrom.
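A small, made-up illustration of why such a scheme is needed: each candidate utility function is only defined up to a positive affine transformation, so until a common scale is fixed, the expected value of “the function in the envelope” can rank actions differently under an arbitrary rescaling. The numbers and names below are hypothetical:

```python
# Hypothetical payoffs for two actions under two candidate utility functions.
U_A = {"action_1": 2.0, "action_2": 0.0}
U_B = {"action_1": 0.0, "action_2": 1.0}
p_A, p_B = 0.5, 0.5  # 50/50 credence over which function is in the envelope

def best_action(u_a, u_b):
    return max(u_a.keys(), key=lambda a: p_A * u_a[a] + p_B * u_b[a])

print(best_action(U_A, U_B))  # action_1 wins at these scales

# Rescale U_B by a factor of 10: an equally valid representation of the very
# same preferences, since utility functions are only defined up to positive
# affine transformations.
U_B_rescaled = {a: 10 * v for a, v in U_B.items()}
print(best_action(U_A, U_B_rescaled))  # now action_2 wins

# The ranking changed even though neither candidate's preferences changed,
# which is why a principled way of putting the two functions on a common
# scale is needed before "maximize the expected utility in the envelope"
# is well-defined.
```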
The C in CEV stands for Coherent, not Collective. You should not think of CEV output as occurring through brute-force simulation of everyone on Earth. The key step is to understand the cognitive architecture of human decision-making in the abstract. The AI has to find the right concepts (the analogues of utility function, terminal values, etc). Then it is supposed to form a rational completion and ethical idealization of the actual architecture, according to criteria already implicit in that architecture. Only then does it apply the resulting decision procedure to the contingent world around us.
Still not following you. Does it rely on everyone’s preferences or not? If it does then it has to interact with everybody. It might not have to scan their brains and brute-force an answer, but it has to do something to find out what they want. And surely this means letting it loose before it has human values? Even if you plan to have it just go round and interview everyone, I still wouldn’t trust it.
> Does it rely on everyone’s preferences or not? If it does then it has to interact with everybody.
CEV is more like figuring out an ethical theory, than it is about running around fighting fires, granting wishes, and so on. The latter part is the implementation of the ethical theory. That part—the implementation—has to be consultative or otherwise responsive to individual situations. But the first part, CEV per se—deciding on principles—is not going to require peering into the mind of every last human being, or even very many of them.
It is basically an exercise in applied neuroscience. We want to understand the cognitive basis of human rationality and decision-making, including ethical and metaethical thought, and introduce that into an AI. And it’s going to be a fairly abstract thing. Although human beings love food, sex, and travel, there is no way that these are going to be axiomatic values for an AI, because we are capable of coming up with ideas about what amounts to good or bad treatment of organisms or entities with none of those interests. So even if our ethical AI looks at an individual human being and says, that person should be fed, it won’t be because its theory of the world says “every sentient being must be given food” as an ethical first principle. The ultimate basis of such judgments is going to be something a whole lot more abstract which doesn’t even refer to or presuppose human beings directly, but which, when applied to an entity like a human being, is capable of giving rise to such judgments.
(By the way, I don’t mean that such reasoning from abstract beginnings will be the basis of real-time judgments. You don’t go through life recomputing everything from first principles at every moment; as you discover important new implications of those principles, or effective new heuristics of action, you store them in memory and act directly on them later on. The ultimate basis of such complex decision-making in an AI would be pragmatically relevant only under certain circumstances, as when it was asked to justify a particular decision from first principles, or when it was faced with very novel situations, or when it was engaged in thought experiments.)
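Purely as an illustrative sketch of both points, the abstract principle giving rise to concrete judgments and the caching of derived judgments, here is a toy decision procedure; every principle, attribute, and function name in it is hypothetical, invented only to show the shape of the idea:

```python
# Toy sketch: an abstract principle, applied to a concrete entity, yields a
# concrete judgment; derived judgments are cached so they need not be
# rederived from first principles every time.

# A hypothetical abstract principle: promote the satisfaction of whatever
# interests an entity actually has (no mention of food, humans, etc.).
def abstract_principle(entity):
    return [f"ensure access to {interest}" for interest in entity["interests"]]

# Cache of judgments already derived from first principles.
derived_judgments = {}

def judge(entity):
    key = frozenset(entity["interests"])
    if key not in derived_judgments:
        # Novel kind of entity: fall back to first principles.
        derived_judgments[key] = abstract_principle(entity)
    return derived_judgments[key]

human = {"interests": ["food", "social contact"]}
uploaded_mind = {"interests": ["compute cycles"]}

print(judge(human))          # ['ensure access to food', 'ensure access to social contact']
print(judge(uploaded_mind))  # ['ensure access to compute cycles']
print(judge(human))          # answered from the cache, not rederived
```

Note that “feed the human” falls out of applying the principle to an entity that happens to have that interest; it is never an axiom.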
Although I have been insisting that the output of CEV—something like a theory of ethical action—must be independent, not only of what we all want from moment to moment, but even independent of many basic human realities (because it must have something to say about entities which don’t have those qualities or needs), it cannot be entirely independent of human nature. We all agree that certain outcomes are bad. That agreement is coming from something in us; a rock, for example, would not agree, or disagree. So some part of human decision-making cognition is necessary for a Friendly AI to be recognized as such. The whole idea of CEV is about extracting that part of how we work, and copying it across to silicon.
This is where the applied neuroscience comes in. We cannot trust pure thought to get this right. Even pure thought combined with psychological experiment is probably not enough; we need to understand what the brain is doing when we make these judgments. At the same time, pure neuroscience is not enough either; it would just give us a neutral causal description of how the brain works, without telling us how to normatively employ that information in making a Friendly AI. Thus, applied neuroscience. The human beings who set a CEV process in motion would need to avoid two things: wrong a priori normatives, and wrong implicit normatives. By an implicit normative, I mean some factor in their thinking and practice which isn’t explicitly recognized as helping to determine the CEV outcome, but which is doing so anyway.
I’m saying a lot here that isn’t in the existing expositions of CEV (e.g. Yudkowsky, Mijic, Nesov), but it follows just from taking the philosophy and adding the fact that all this information about human meta-preferences is meant to come from the study of the brain. That’s called neuroscience, and in principle even fallible human cognition, in the form of human neuroscience, may figure all this out before we ever have self-enhancing AIs. In other words, we may reach a point where the combination of psychology, philosophy, and neuroscience really is telling us: this is what humans actually want, or want to want, and so on. (Though it may be hard for non-experts to tell the difference between false, premature claims of such knowledge and the real thing.)
All that could happen even before there is a Singularity, and in that case the strategy for a Friendly outcome will be able to dispense with automating the deduction of neuroethical first principles, and concentrate on simply ensuring that the first transhuman AI operates according to those principles, rather than according to some utility function which, when pursued with superhuman intelligence, leads to disaster. But the CEV philosophy, and the idea of “reflective decision theory”, is meant to offer a way forward, if we do figure out artificial intelligence before we figure out artificial ethics.