I don’t yet have much of an opinion on what the best way to do it is; I’m just saying it needs doing. We need more brains on the problem. Eliezer’s meta-ethics is, I think, far from obviously correct. Moving toward normative ethics, CEV is also not obviously the correct solution for Friendly AI, though it is a good research proposal. The fate of the galaxy cannot rest on Eliezer’s moral philosophy alone.
We need critically-minded people to say, “I don’t think that’s right, and here are four arguments why.” And then Eliezer can argue back, or change his position. And then the others can argue back, or change their positions. This is standard procedure for solving difficult problems, but as of yet I haven’t seen much published dialectic like this in trying to figure out the normative foundations for the Friendly AI project.
Let me give you an explicit example. CEV takes extrapolated human values as the source of an AI’s eventually-constructed utility function. Is that the right way to go about things, or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function, whether or not they happen to be reasons for action arising from the brains of a particular species of primate on planet Earth? What if there are 5 other intelligent species in the galaxy whose interests will not at all be served when our Friendly AI takes over the galaxy? Is that really the right thing to do? How would we go about answering questions like that?
or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function
...this sentence makes me think that we really aren’t on the same page at all with respect to naturalistic metaethics. What is a reason for action? How would a computer program enumerate them all?
A ‘reason for action’ is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically—i.e. that they have intrinsic value apart from being valued by an agent.
A source of normativity (a reason for action) is anything that grounds/justifies an ‘ought’ or ‘should’ statement. Why should I look both ways before crossing the street? Presumably, this ‘should’ is justified by reference to my desires, which could be gravely thwarted if I do not look both ways before crossing the street. If I strongly desired to be run over by cars, the ‘should’ statement might no longer be justified. Some people might say I should look both ways anyway, because God’s command to always look before crossing a street provides me with reason for action to do that even if it doesn’t help fulfill my desires. But I don’t believe that proposed reason for action exists.
I wonder, since it’s important to stay pragmatic, if it would be good to design a “toy example” for this sort of ethics.
It seems like the hard problem here is to infer reasons for action, from an individual’s actions. People do all sorts of things; but how can you tell from those choices what they really value? Can you infer a utility function from people’s choices, or are there sets of choices that don’t necessarily follow any utility function?
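To make the last question concrete, here is a minimal sketch, assuming we only observe strict pairwise choices over a finite set of options (the function name and data format are hypothetical, not taken from any existing library). If the observed "chose X over Y" relation contains a cycle, no utility function can reproduce it; if it is acyclic, some ordering of the options, and hence some utility function, is consistent with it.

```python
from collections import defaultdict


def has_preference_cycle(choices):
    """Return True if the observed strict preferences contain a cycle.

    choices: iterable of (chosen, rejected) pairs, an assumed data format.
    A cycle (A over B, B over C, C over A) rules out any utility function.
    """
    graph = defaultdict(set)
    for chosen, rejected in choices:
        graph[chosen].add(rejected)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # edge back into the current path: a cycle
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


consistent = [("fruit", "pellet"), ("pellet", "nothing"), ("fruit", "nothing")]
cyclic = [("a", "b"), ("b", "c"), ("c", "a")]
```

Here `has_preference_cycle(consistent)` is false and `has_preference_cycle(cyclic)` is true. This ignores indifference, noise, and context-dependent choices, which is where the real difficulty presumably lies.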
The sorts of “toy” examples I’m thinking of here are situations where the agent has a finite number of choices. Let’s say you have Pac-Man in a maze. His choices are his movements in four cardinal directions. You watch Pac-Man play many games; you see what he does when he’s attacked by a ghost; you see what he does when he can find something tasty to eat; you see when he’s willing to risk the danger to get the food.
From this, I imagine you could do some hidden Markov stuff to infer a model of Pac-Man’s behavior—perhaps an if-then tree.
Could you guess from this tree that Pac-Man likes fruit and dislikes dying, and goes away from fruit only when he needs to avoid dying? Yeah, you could (though I don’t know how to systematize that more broadly.)
From this, could you do an “extrapolated” model of what Pac-Man would do if he knew when and where the ghosts were coming? Sure—and that would be, if I’ve understood correctly, CEV for Pac-Man.
It seems to me that, more subtle philosophy aside, this is what we’re trying to do. I haven’t read the literature lukeprog has, but it seems to me that Pac-Man’s “reasons for actions” are completely described by that if-then tree of his behavior. Why didn’t he go left that time? Because there was a ghost there. Why does that matter? Because Pac-Man always goes away from ghosts. (You could say: Pac-Man desires to avoid ghosts.)
It also seems to me, not that I really know this line of work, that one incremental thing that can be done towards CEV (or some other sort of practical metaethics) is this kind of toy model. Yes, ultimately understanding human motivation is a huge psychology and neuroscience problem, but before we can assimilate those quantities of data we may want to make sure we know what to do in the simple cases.
Could you guess from this tree that Pac-Man likes fruit and dislikes dying, and goes away from fruit only when he needs to avoid dying? Yeah, you could (though I don’t know how to systematize that more broadly.)
Something like:
Run simulations of agents that can choose randomly out of the same actions as the agent has. Look for regularities in the world state that occur more or less frequently in the sensible agent compared to the random agent. Those things could be said to be what it likes and dislikes respectively.
To determine terminal vs instrumental values look at the decision tree and see which of the states gets chosen when a choice is forced.
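A rough sketch of the first step, under heavy assumptions: the world is a made-up five-cell corridor with a ghost at one end and fruit at the other, the "sensible" agent is a stand-in policy that always walks toward the fruit, and all names are hypothetical. It only compares how often each state feature occurs for the agent versus a random baseline; it does not attempt the terminal-versus-instrumental step.

```python
import random
from collections import Counter

# Hypothetical toy world: a corridor of cells 0..4.
GHOST, FRUIT = 0, 4


def features(pos):
    """State regularities we choose to track (an assumption of this sketch)."""
    feats = set()
    if pos == FRUIT:
        feats.add("at_fruit")
    if pos == GHOST:
        feats.add("at_ghost")
    return feats


def run(policy, steps=2000, start=2):
    """Run one agent and return how often each feature held, per step."""
    pos, counts = start, Counter()
    for _ in range(steps):
        pos = min(4, max(0, pos + policy(pos)))  # walls clamp movement
        counts.update(features(pos))
    return {f: counts[f] / steps for f in ("at_fruit", "at_ghost")}


def sensible(pos):
    return +1  # stand-in for the observed agent: always heads for the fruit


rng = random.Random(0)  # seeded so the baseline is reproducible


def random_policy(pos):
    return rng.choice((-1, +1))


def infer_likes(agent_freq, baseline_freq):
    """Positive: more frequent than chance (liked). Negative: avoided."""
    return {f: agent_freq[f] - baseline_freq[f] for f in agent_freq}


likes = infer_likes(run(sensible), run(random_policy))
```

In this toy setup `likes["at_fruit"]` comes out positive and `likes["at_ghost"]` negative. The result depends entirely on which features you decide to track, which is arguably the hard part being assumed away.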
Perhaps the next step would be to add to the model a notion of second-order desire, or analyze a Pac-Man whose apparent terminal values can change when they’re exposed to certain experiences or moral arguments.
I think the reason you’re having trouble with the standard philosophical category of “reasons for action” is because you have the admirable quality of being confused by that which is confused. I think the “reasons for action” category is confused. At least, the only action-guiding norm I can make sense of is desire/preference/motive (let’s call it motive). I should eat the ice cream because I have a motive to eat the ice cream. I should exercise more because I have many motives that will be fulfilled if I exercise. And so on. All this stuff about categorical imperatives or divine commands or intrinsic value just confuses things.
How would a computer program enumerate all motives (which, according to me, is co-extensional with “all reasons for action”)? It would have to roll up its sleeves and do science. As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems (as it had done already with us), and thereby enumerate all the motives it encounters in the universe, their strengths, the relations between them, and so on.
But really, I’m not yet proposing a solution. What I’ve described above doesn’t even reflect my own meta-ethics. It’s just an example. I’m merely raising questions that need to be considered very carefully.
And of course I’m not the only one to do so. Others have raised concerns about CEV and its underlying meta-ethical assumptions. Will Newsome raised some common worries about CEV and proposed computational axiology instead. Tarleton’s 2010 paper compares CEV to an alternative proposed by Wallach & Collin.
The philosophical foundations of the Friendly AI project need more philosophical examination, I think. Perhaps you are very confident about your meta-ethical views and about CEV; I don’t know. But I’m not confident about them. And as you say, we’ve only got one shot at this. We need to make sure we get it right. Right?
As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems
Now, it’s just a wild guess here, but I’m guessing that a lot of philosophers who use the language “reasons for action” would disagree that “knowing the Baby-eaters evolved to eat babies” is a reason to eat babies. Am I wrong?
I’m merely raising questions that need to be considered very carefully.
I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don’t expect those two tracks to meet much.
I count myself among the philosophers who would say that “knowing the Baby-eaters want to eat babies” is not a reason (for me) to eat babies. Some philosophers don’t even think that the Baby-eaters’ desires to eat babies are reasons for them to eat babies, not even defeasible reasons.
I tend to be a bit gruff around people who merely raise questions
Interesting. I always assumed that raising a question was the first step toward answering it—especially if you don’t want yourself to be the only person who tries to answer it. The point of a post like the one we’re commenting on is that hopefully one or more people will say, “Huh, yeah, it’s important that we get this issue right,” and devote some brain energy to getting it right.
I’m sure the “figure it out and move on” track doesn’t meet much with the “I’m uncomfortable settling on an answer” track, but what about the “pose important questions so we can work together to settle on an answer” track? I see myself on that third track, engaging in both the ‘pose important questions’ and the ‘settle on an answer’ projects.
Interesting. I always assumed that raising a question was the first step toward answering it
Only if you want an answer. There is no curiosity that does not want an answer. There are four very widespread failure modes around “raising questions”—the failure mode of paper-writers who regard unanswerable questions as a biscuit bag that never runs out of biscuits, the failure mode of the politically savvy who’d rather not offend people by disagreeing too strongly with any of them, the failure mode of the religious who don’t want their questions to arrive at the obvious answer, the failure mode of technophobes who mean to spread fear by “raising questions” that are meant more to create anxiety by their raising than by being answered, and all of these easily sum up to an accustomed bad habit of thinking where nothing ever gets answered and true curiosity is dead.
So yes, if there’s an interim solution on the table and someone says “Ah, but surely we must ask more questions” instead of “No, you idiot, can’t you see that there’s a better way” or “But it looks to me like the preponderance of evidence is actually pointing in this here other direction”, alarms do go off inside my head. There’s a failure mode of answering too prematurely, but when someone talks explicitly about the importance of raising questions—this being language that is mainly explicitly used within the failure-mode groups—alarms go off and I want to see it demonstrated that they can think in terms of definite answers and preponderances of evidence at all besides just raising questions; I want a demonstration that true curiosity, wanting an actual answer, isn’t dead inside them, and that they have the mental capacity to do what’s needed to that effect—namely, weigh evidence in the scales and arrive at a non-balanced answer, or propose alternative solutions that are supposed to be better.
I’m impressed with your blog, by the way, and generally consider you to be a more adept rationalist than the above paragraphs might imply—but when it comes to this particular matter of metaethics, I’m not quite sure that you strike me as aggressive enough that if you had twenty years to sort out the mess, I would come back twenty years later and find you with a sheet of paper with the correct answer written on it, as opposed to a paper full of questions that clearly need to be very carefully considered.
Awesome. Now your reaction here makes complete sense to me. The way I worded my original article above looks very much like I’m in either the 1st category or the 4th category.
Let me, then, be very clear:
I do not want to raise questions so that I can make a living endlessly re-examining philosophical questions without arriving at answers.
I want me, and rationalists in general, to work aggressively enough on these problems so that we have answers by the time AI+ arrives. As for the fact that I don’t have answers yet, please remember that I was a fundamentalist Christian 3 years ago, with no rationality training at all, and a horrendous science education. And I didn’t discover the urgency of these problems until about 6 months ago. I’ve had to make extremely rapid progress from that point to where I am today. If I can arrange to work on these problems full time, I think I can make valuable contributions to the project of dealing safely with Friendly AI. But if that doesn’t happen, well, I hope to at least enable others who can work on this problem full time, like yourself.
I want to solve these problems in 15 years, not 20. This will make most academic philosophers, and most people in general, snort the water they’re drinking through their nose. On the other hand, the time it takes to solve a problem expands to meet the time you’re given. For many philosophers, the time we have to answer the questions is… billions of years. For me, and people like me, it’s a few decades.
Well, the part about you being a fundamentalist Christian three years ago is damned impressive and does a lot to convince me that you’re moving at a reasonable clip.
On the other hand, a good metaethical answer to the question “What sort of stuff is morality made out of?” is essentially a matter of resolving confusion; and people can get stuck on confusions for decades, or they can breeze past confusions in seconds. Comprehending the most confusing secrets of the universe is more like realigning your car’s wheels than like finding the Lost Ark. I’m not entirely sure what to do about the partial failure of the metaethics sequence, or what to do about the fact that it failed for you in particular. But it does sound like you’re setting out to heroically resolve confusions that, um, I kinda already resolved, and then wrote up, and then only some people got the writeup… but it doesn’t seem like the sort of thing where you spending years working on it is a good idea. 15 years to a piece of paper with the correct answer written on it is for solving really confusing problems from scratch; it doesn’t seem like a good amount of time for absorbing someone else’s solution. If you plan to do something interesting with your life requiring correct metaethics then maybe we should have a Skype videocall or even an in-person meeting at some point.
The main open moral question SIAI actually does need a concrete answer to is “How exactly does one go about construing an extrapolated volition from the giant mess that is a human mind?”, which takes good metaethics as a background assumption but is fundamentally a moral question rather than a metaethical one. On the other hand, I think we’ve basically got covered “What sort of stuff is this mysterious rightness?”
What did you think of the free will sequence as a template for doing naturalistic cognitive philosophy where the first question is always “What algorithm feels from the inside like my philosophical intuitions?”
I should add that I don’t think I will have meta-ethical solutions in 15 years, significantly because I’m not optimistic that I can get someone to pay my living expenses while I do 15 years of research. (Why should they? I haven’t proven my abilities.) But I think these problems are answerable, and that we are in a fantastic position to answer them if we want to do so. We know an awful lot about physics, psychology, logic, neuroscience, AI, and so on. Even experts that were active 15 years before now did not have all these advantages. More importantly, most thinkers today do not even take advantage of them.
Have you considered applying to the SIAI Visiting Fellows program? It could be worth a month or 3 of having your living expenses taken care of while you research, and could lead to something longer term.
Seconding JGWeissman — you’d probably be accepted as a Visiting Fellow in an instant, and if you turn out to be sufficiently good at the kind of research and thinking that they need to have done, maybe you could join them as a paid researcher.
I want to solve these problems in 15 years, not 20. … the time it takes to solve a problem expands to meet the time you’re given.
15 years is much too much; if you haven’t solved metaethics after 15 years of serious effort, you probably never will. The only things that’re actually time consuming on that scale are getting stopped with no idea how to proceed, and wrong turns into muck. I see no reason why a sufficiently clear thinker couldn’t finish a correct and detailed metaethics in a month.
I see no reason why a sufficiently clear thinker couldn’t finish a correct and detailed metaethics in a month.
I suppose if you let “sufficiently clear thinker” do enough work this is just trivial.
But it’s a sui generis problem… I’m not sure what information a time table could be based on other than the fact that it has been way longer than a month and no one has succeeded yet.
It is also worth keeping in mind that scientific discoveries routinely impact the concepts we use to understand the world. The computational model of the human brain was not generated as a hypothesis until after we had built computers and could see what they do, even though, in principle, that hypothesis could have been invented at nearly any point in history. So it seems plausible the crucial insight needed for a successful metaethics will come from a scientific discovery that someone concentrating on philosophy for a month wouldn’t make.
But it’s a sui generis problem… I’m not sure what information a time table could be based on other than the fact that it has been way longer than a month and no one has succeeded yet.
Supposing anyone had already succeeded, how strong an expectation do you think we should have of knowing about it?
Not all that strong. It may well be out there in some obscure journal but just wasn’t interesting enough for anyone to bother replying to. Hell, multiple people may have succeeded.
But I think “success” might actually be underdetermined here. Some philosophers may have had the right insights, but I suspect that if they had communicated those insights in the formal method necessary for Friendly AI the insights would have felt insightful to readers and the papers would have gotten attention. Of course, I’m not even familiar with cutting edge metaethics. There may well be something like that out there. It doesn’t help that no one here seems willing to actually read philosophy in non-blog format.
It may well be out there in some obscure journal but just wasn’t interesting enough for anyone to bother replying to. Hell, multiple people may have succeeded.
The computational model of the human brain was not generated as a hypothesis until after we had built computers and could see what they do, even though, in principle, that hypothesis could have been invented at nearly any point in history.
I think it’s correct, but it’s definitely not detailed; some major questions, like “how to weight and reconcile conflicting preferences”, are skipped entirely.
I think it’s correct, but it’s definitely not detailed;
What do you believe to be the reasons? Did he not try, or did he try and fail? I’m trying to fathom what kind of person is a sufficiently clear thinker. If not even EY is a sufficiently clear thinker, then your statement that such a person could come up with a detailed metaethics in a month seems self-evident. If someone is a sufficiently clear thinker to accomplish a certain task then they will complete it if they try. What’s the point? It sounds like you are saying that there are many smart people that could accomplish the task if they only tried. But if in fact EY is not one of them, that’s bad.
Yesterday I read In Praise of Boredom. It seems that EY also views intelligence as something proactive:
...if I ever do fully understand the algorithms of intelligence, it will destroy all remaining novelty—no matter what new situation I encounter, I’ll know I can solve it just by being intelligent...
No doubt I am a complete layman when it comes to what intelligence is. But as far as I am aware it is a kind of goal-oriented evolutionary process equipped with a memory. It is evolutionary insofar as it still needs to stumble upon novelty. Intelligence is not a meta-solution but an efficient searchlight that helps to discover unknown unknowns. Intelligence is also a tool that can efficiently exploit previous discoveries, combine and permute them. But claiming that you just have to be sufficiently intelligent to solve a given problem sounds like it is more than that. I don’t see that. I think that if something crucial is missing, something you don’t know that it is missing, you’ll have to discover it first and not invent it by the sheer power of intelligence.
A month sounds considerably overoptimistic to me. Wrong steps and backtracking are probably to be expected, and it would probably be irresponsible to commit to a solution before allowing other intelligent people (who really want to find the right answer, not carry on endless debate) to review it in detail. For a sufficiently intelligent and committed worker, I would not be surprised if they could produce a reliably correct metaethical theory within two years, perhaps one, but a month strikes me as too restrictive.
the failure mode of technophobes who mean to spread fear by “raising questions” that are meant more to create anxiety by their raising than by being answered
Of course, this one applies to scaremongers in general, not just technophobes.
I count myself among the philosophers who would say that “knowing the Baby-eaters want to eat babies” is not a reason (for me) to eat babies. Some philosophers don’t even think that the Baby-eaters’ desires to eat babies are reasons for them to eat babies, not even defeasible reasons.
Knowing the Baby-eaters want to eat babies is a reason for them to eat babies. It is not a reason for us to let them eat babies. My biggest problem with desirism in general is that it provides no reason for us to want to fulfill others’ desires. Saying that they want to fulfill their desires is obvious. Whether we help or hinder them is based entirely on our own reasons for action.
Desirism claims that moral value exists as a relation between desires and states of affairs.
Desirism claims that desires themselves are the primary objects of moral evaluation.
Thus, morality is the practice of shaping malleable desires: promoting desires that tend to fulfill other desires, and discouraging desires that tend to thwart other desires.
The moral thing to do is to shape my desires to fulfill others’ desires, insofar as they are malleable. This is what I meant by “we should want to fulfill others’ desires,” though I acknowledge that a significant amount of precision and clarity was lost in the original statement. Is this all correct?
The desirism FAQ needs updating, and is not a very clear presentation of the theory, I think.
One problem is that much of the theory is really just a linguistic proposal. That’s true for all moral theories, but it can be difficult to separate the linguistic from the factual claims. I think Alonzo Fyfe and I are doing a better job of that in our podcast. The latest episode is The Claims of Desirism, Part 1.
Unfortunately, we’re not making moral claims yet. In meta-ethics, there is just too much groundwork to lay down first. Kinda like how Eliezer took like 200 posts to build up to talking about meta-ethics.
Is knowing that Baby-eaters want babies to be eaten a reason, on your view, to design an FAI that optimizes its surroundings for (among other things) baby-eating?
I very much doubt it. Even if we assume my own current meta-ethical views are correct—an assumption I don’t have much confidence in—this wouldn’t leave us with reason to design an FAI that optimizes its surroundings for (among other things) baby-eating. Really, this goes back to a lot of classical objections to utilitarianism.
For the record, I currently think CEV is the most promising path towards solving the Friendly AI problem; I’m just not very confident about any solutions yet, and am researching the possibilities as quickly as possible, using my outline for Ethics and Superintelligence as a guide to research. I have no idea what the conclusions in Ethics and Superintelligence will end up being.
I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don’t expect those two tracks to meet much.
Eliezer-2007 quotes Robyn Dawes, saying that the below is “so true it’s not even funny”:
Norman R. F. Maier noted that when a group faces a problem, the natural tendency of its members is to propose possible solutions as they begin to discuss the problem. Consequently, the group interaction focuses on the merits and problems of the proposed solutions, people become emotionally attached to the ones they have suggested, and superior solutions are not suggested. Maier enacted an edict to enhance group problem solving: “Do not propose solutions until the problem has been discussed as thoroughly as possible without suggesting any.”
...
I have often used this edict with groups I have led—particularly when they face a very tough problem, which is when group members are most apt to propose solutions immediately. While I have no objective criterion on which to judge the quality of the problem solving of the groups, Maier’s edict appears to foster better solutions to problems.
Is this a change of attitude, or am I just not finding the synthesis?
Eliezer-2011 seems to want to propose solutions very quickly, move on, and come back for repairs if necessary. Eliezer-2007 advises that for difficult problems (one would think that FAI qualifies) we take our time to understand the relevant issues, questions, and problems before proposing solutions.
There’s a big difference between “not immediately” and “never”. Don’t propose a solution immediately, but do at least have a detailed working guess at a solution (which can be used to move to the next problem) in a year. Don’t “merely” raise a question; make sure that finding an answer is also part of the agenda.
It’s a matter of the twelfth virtue of rationality, the intention to cut through to the answer, whatever the technique. The purpose of holding off on proposing solutions is to better find solutions, not to stop at asking the question.
I suggest that he still holds both of those positions (at least, I know I do, so I do not see why he wouldn’t) but that they apply to slightly different contexts. Eliezer’s elaboration in the descendant comments from the first quote seemed to illustrate why fairly well. They also, if I recall, allowed that you do not fit into the ‘actually answering is unsophisticated’ crowd, which further narrows down just what he means.
The impression I get is that EY-2011 believes that he has already taken the necessary time to understand the relevant issues, questions, and problems and that his proposed solution is therefore unlikely to be improved upon by further up-front thinking about the problem, rather than by working on implementing the solution he has in mind and seeing what difficulties come up.
Whether that’s a change of attitude, IMHO, depends a lot on whether his initial standard for what counts as an adequate understanding of the relevant issues, questions, and problems was met, or whether it was lowered.
I’m not really sure what that initial standard was in the first place, so I have no idea which is the case. Nor am I sure it matters; presumably what matters more is whether the current standard is adequate.
The point of the Dawes quote is to hold off on proposing solutions until you’ve thoroughly comprehended the issue, so that you get better solutions. It doesn’t advocate discussing problems simply for the sake of discussing them. Between both quotes there’s a consistent position that the point is to get the right answer, and discussing the question only has a point insofar as it leads to getting that answer. If you’re discussing the question without proposing solutions ad infinitum, you’re not accomplishing anything.
Keep in mind that talking with regard to solutions is just so darn useful. Even if you propose an overly specific solution early, then it has a large surface area of features that can be attacked to prove it incompatible with the problem. You can often salvage and mutate what’s left of the broken idea. There’s not a lot of harm in that; rather, there is a natural give and take whereby dismissing a proposed solution requires identifying what part of the problem requirements are contradicted, and it may very well not have occurred to you to specify that requirement in the first place.
I believe it has been observed that experts almost always talk in terms of candidate solutions, and amateurs attempt to build up from a platform of the problem itself. Experts, of course, have objectively better performance. The algorithm for provably moral superintelligences might not have a lot of prior solutions to draw from, but you could, for instance, find some inspiration even from the outside view of how some human political systems have maintained generally moral dispositions.
There is a bias to associate your status with ideas you have vocalized in the past since they reflect on the quality of your thinking, but you can’t throw the baby out with the bathwater.
The Maier quote comes off as way too strong for me. And what’s with this conclusion:
While I have no objective criterion on which to judge the quality of the problem solving of the groups, Maier’s edict appears to foster better solutions to problems.
I think there’s a synthesis possible. There’s a purpose of finding a solid answer, but finding it requires a period of exploration rather than getting extremely specific in the beginning of the search.
If you don’t spend much time on the track where people just raise questions, how do you encounter the new insights that make it necessary to dance back for repairs on your track?
Just asking. :)
Though I do tend to admire your attitude of pragmatism and impatience with those who dither forever.
I presume you encounter them later on. Maybe while doing more ground-level thinking about how to actually implement your meta-ethics you realise that it isn’t quite coherent.
I’m not sure if this flying-by-the-seat-of-your-pants approach is best, but as has been pointed out before, there are costs associated with taking too long as well as with not being careful enough. There must come a point where the risk is too small and the time it would take to fix it too long.
Well, I’ll certainly agree that more potential problems are surfaced by moving ahead with the implementation than by going back to the customer with another round of questions about the requirements.
I can see that you might question the usefulness of the notion of a “reason for action” as something over and above the notion of “ought”, but I don’t see a better case for thinking that “reason for action” is confused.
The main worry here seems to have to do with categorical reasons for action. Diagnostic question: are these more troubling/confused than categorical “ought” statements? If so, why?
Perhaps I should note that philosophers talking this way make a distinction between “motivating reasons” and “normative reasons”. A normative reason to do A is a good reason to do A, something that would help explain why you ought to do A, or something that counts in favor of doing A. A motivating reason just helps explain why someone did, in fact, do A. One of my motivating reasons for killing my mother might be to prevent her from being happy. By saying this, I do not suggest that this is a normative reason to kill my mother. It could also be that R would be a normative reason for me to A, but R does not motivate me to do A. (ata seems to assume otherwise, since ata is getting caught up with who these considerations would motivate. Whether reasons could work like this is a matter of philosophical controversy. Saying this more for others than you, Luke.)
Back to the main point, I am puzzled largely because the most natural ways of getting categorical oughts can get you categorical reasons. Example: simple total utilitarianism. On this view, R is a reason to do A if R is the fact that doing A would cause someone’s well-being to increase. The strength of R is the extent to which that person’s well-being increases. One weighs one’s reasons by adding up all of their strengths. One then does the thing that one has most reason to do. (It’s pretty clear in this case that the notion of a reason plays an inessential role in the theory. We can get by just fine with well-being, ought, causal notions, and addition.)
Utilitarianism, as always, is a simple case. But it seems like many categorical oughts can be thought of as being determined by weighing factors that count in favor of and count against the course of action in question. In these cases, we should be able to do something like what we did for util (though sometimes that method of weighing the reasons will be different/more complicated; in some bad cases, this might make the detour through reasons pointless).
The reasons framework seems a bit more natural in non-consequentialist cases. Imagine I try to maximize aggregate well-being, but I hate lying to do it. I might count the fact that an action would involve lying as a reason not to do it, but not believe that my lying makes the world worse. To get oughts out of a utility function instead, you might model my utility function as the result of adding up aggregate well-being and subtracting a factor that scales with the number of lies I would have to tell if I took the action in question. Again, it’s pretty clear that you don’t HAVE to think about things this way, but it is far from clear that this is confused/incoherent.
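As a rough illustration of the lie-averse agent just described, here is a minimal Python sketch of that utility model. The names `Action`, `LIE_PENALTY`, `utility`, and `best_action`, and all the numbers, are made up for the example; nothing here is claimed to be how anyone in this discussion would actually formalize it.

```python
from dataclasses import dataclass

# Hypothetical weight: how much aggregate well-being one lie "costs" this agent.
LIE_PENALTY = 3.0


@dataclass(frozen=True)
class Action:
    name: str
    wellbeing_gain: float  # aggregate well-being the action would produce
    lies_required: int     # number of lies the agent would have to tell


def utility(action: Action, lie_penalty: float = LIE_PENALTY) -> float:
    """Aggregate well-being minus a factor that scales with the number of lies."""
    return action.wellbeing_gain - lie_penalty * action.lies_required


def best_action(actions, lie_penalty: float = LIE_PENALTY) -> Action:
    """Pick the action with the highest utility under this model."""
    return max(actions, key=lambda a: utility(a, lie_penalty))


options = [
    Action("tell a comforting lie", wellbeing_gain=10.0, lies_required=1),
    Action("tell the awkward truth", wellbeing_gain=8.0, lies_required=0),
]
```

With the penalty at 3.0, `best_action(options)` picks the truthful option even though the lie yields more well-being; setting `lie_penalty=0.0` recovers the plain well-being maximizer. The same behavior could be described in reasons talk as "lying counts against the action", which is the sense in which the two vocabularies can be translated into each other here.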
Perhaps the LW crowd is perplexed because people here take utility functions as primitive, whereas philosophers talking this way tend to take reasons as primitive and derive ought statements (and, on a very lucky day, utility functions) from them. This paper, which tries to help reasons folks and utility function folks understand/communicate with each other, might be helpful for anyone who cares much about this. My impression is that we clearly need utility functions, but don’t necessarily need the reason talk. The main advantage to getting up on the reason talk would be trying to understand philosophers who talk that way, if that’s important to you. (Much of the recent work in meta-ethics relies heavily on the notion of a normative reason, as I’m sure Luke knows.)
For the record, as a good old Humean I’m currently an internalist about reasons, which leaves me unable (I think) to endorse any form of utilitarianism, where utilitarianism is the view that we ought to maximize X. Why? Because internal reasons don’t always, and perhaps rarely, support maximizing X, and I don’t think external reasons for maximizing X exist. For example, I don’t think X has intrinsic value (in Korsgaard’s sense of “intrinsic value”).
Thanks for the link to that paper on rational choice theories and decision theories!
Categorical oughts and reasons have always confused me. What do you see as the difference, and which type of each are you thinking of? The types of categorical reasons or reasons with which I’m most familiar are Kant’s and Korsgaard’s.
R is a categorical reason for S to do A iff R counts in favor of doing A for S, and would so count for other agents in a similar situation, regardless of their preferences. If it were true that we always have reasons to benefit others, regardless of what we care about, that would be a categorical reason. I don’t use the term “categorical reason” any differently than “external reason”.
S categorically ought to do A just when S ought to do A, regardless of what S cares about, and it would still be true that S ought to do A in similar situations, regardless of what S cares about. The rule: always maximize happiness, would, if true, ground a categorical ought.
I see very little reason to be more or less skeptical of categorical reasons or categorical oughts than the other.
Hard to be confident about these things, but I don’t see the problem with external reasons/oughts. Some people seem to have some kind of metaphysical worry...harder to reduce or something. I don’t see it.
Tarleton’s 2010 paper compares CEV to an alternative proposed by Wallach & Collin.
Nitpick: Wallach & Collin are cited only for the term ‘artificial moral agents’ (and the paper is by myself and Roko Mijic). The comparison in the paper is mostly just to the idea of specifying object-level moral principles.
A ‘reason for action’ is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically—i.e. that they have intrinsic value apart from being valued by an agent.
Okay, but all of those (to the extent that they’re coherent) are observations about human axiology. Beware of committing the mind projection fallacy with respect to compellingness — you find those to be plausible sources of normativity because your brain is that of “a particular species of primate on planet Earth”. If your AI were looking for “reasons for action” that would compel all agents, it would find nothing, and if it were looking for all of the “reasons for action” that would compel each possible agent, it would spend an infinite amount of time enumerating stupid pointless motivations. It would eventually notice categorical imperatives, fairness, compassion, etc. but it would also notice drives based on the phase of the moon, based on the extrapolated desires of submarines (according to any number of possible submarine-volition-extrapolating dynamics), based on looking at how people would want to be treated and reversing that, based on the number of living cats in the world modulo 241, based on modeling people as potted plants and considering the direction their leaves are waving...
Sorry… what I said above is not quite right. There are norms that are not reasons for action. For example, epistemological norms might be called ‘reasons to believe.’ ‘Reasons for action’ are the norms relevant to, for example, prudential normativity and moral normativity.
This is either horribly confusing, or horribly confused. I think that what’s going on here is that you (or the sources you’re getting this from) have taken a bundle of incompatible moral theories, identified a role that each of them has a part playing, and generalized a term from one of those theories inappropriately.
The same thing can be a reason for action, a reason for inaction, a reason for belief and a reason for disbelief all at once, in different contexts depending on what consequences these things will have. This makes me think that “reason for action” does not carve reality, or morality, at the joints.
I’m sort of surprised by how people are taking the notion of “reason for action”. Isn’t this a familiar process when making a decision?
For all courses of action you’re thinking of taking, identify the features (consequences, if that’s how you think about things) that count in favor of taking that course of action and those that count against it.
Consider how those considerations weigh against each other. (Do the pros outweigh the cons, by how much, etc.)
Then choose the thing that does best in this weighing process.
The same thing can be a reason for action, a reason for inaction, a reason for belief and a reason for disbelief all at once, in different contexts depending on what consequences these things will have. This makes me think that “reason for action” does not carve reality, or morality, at the joints.
It is not a presupposition of the people talking this way that if R is a reason to do A in a context C, then R is a reason to do A in all contexts.
The people talking this way also understand that a single R might be both a reason to do A and a reason to believe X at the same time. You could also have R be a reason to believe X and a reason to cause yourself to not believe X. Why do you think these things make the discourse incoherent/non-perspicuous? This seems no more puzzling than the familiar fact that believing a certain thing could be epistemically irrational but prudentially rational to (cause yourself to) believe.
or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function, whether or not they happen to be reasons for action arising from the brains of a particular species of primate on planet Earth?
All the reasons for action that exist? Like, the preferences of all possible minds? I’m not sure that utility function would be computable...
Edit: Actually, if we suppose that all minds are computable, then there’s only a countably infinite number of possible minds, and for any mind with a utility function U(x), there is a mind somewhere in that set with the utility function -U(x). So, depending on how you weight the various possible utility functions, it may be that they’d all cancel out.
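A toy version of the cancellation point, as a hedged Python sketch: it uses a small finite family of utility functions closed under negation rather than the countably infinite set of computable minds, and the function `aggregate` and both weightings are invented for illustration.

```python
def aggregate(utilities, weights, outcome):
    """Weighted sum of every mind's utility for a single outcome."""
    return sum(w * u(outcome) for u, w in zip(utilities, weights))


# Two toy utility functions over numeric outcomes, followed by their negations.
base = [lambda x: x, lambda x: x * x - 4]
minds = base + [lambda x, u=u: -u(x) for u in base]

symmetric = [0.25, 0.25, 0.25, 0.25]  # each U weighted the same as its -U
skewed = [0.4, 0.3, 0.2, 0.1]         # each U weighted more than its -U
```

Under `symmetric`, `aggregate(minds, symmetric, x)` is zero for any outcome `x`, because every term is matched by its negation. Under `skewed` the sum no longer vanishes, which is the "depending on how you weight" caveat: the cancellation is a property of the weighting, not of the set of minds alone.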
What if there are 5 other intelligent species in the galaxy whose interests will not at all be served when our Friendly AI takes over the galaxy? Is that really the right thing to do? How would we go about answering questions like that?
Notice that you’re a human but you care about that. If there weren’t something in human axiology that could lead to sufficiently smart and reflective people concluding that nonhuman intelligent life is valuable, you wouldn’t have even thought of that — and, indeed, it seems that in general as you look at smarter, more informed, and more thoughtful people, you see less provincialism and more universal views of ethics. And that’s exactly the sort of thing that CEV is designed to take into account. Don’t you think that there would be (at least) strong support for caring about the interests of other intelligent life, if all humans were far more intelligent, knowledgeable, rational, and consistent, and heard all the arguments for and against it?
And if we were all much smarter and still largely didn’t think it was a good idea to care about the interests of other intelligent species… I really don’t think that’ll happen, but honestly, I’ll have to defer to the judgment of our extrapolated selves. They’re smarter and wiser than me, and they’ve heard more of the arguments and evidence than I have. :)
Notice that you’re a human but you care about that. If there weren’t something in human axiology that could lead to sufficiently smart and reflective people concluding that nonhuman intelligent life is valuable, you wouldn’t have even thought of that — and, indeed, it seems that in general as you look at smarter, more informed, and more thoughtful people, you see less provincialism and more universal views of ethics. And that’s exactly the sort of thing that CEV is designed to take into account.
The same argument applies to just using one person as the template and saying that their preference already includes caring about all the other people.
The reason CEV might be preferable to starting from your own preference (I now begin to realize) is that the decision to privilege yourself vs. grant other people fair influence is also subject to morality, so to the extent you can be certain about this being more moral, it’s what you should do. Fairness, also being merely a heuristic, is subject to further improvement, as can be inclusion of volition of aliens in the original definition.
Of course, you might want to fall back to a “reflective injunction” of not inventing overly elaborate plans, since you haven’t had the capability of examining them well enough to rule them superior to more straightforward plans, such as using volition of a single human. But this is still a decision point, and the correct answer is not obvious.
The reason CEV might be preferable to starting from your own preference (I now begin to realize) is that the decision to privilege yourself vs. grant other people fair influence is also subject to morality, so to the extent you can be certain about this being more moral, it’s what you should do.
This reminds me of the story of the people who encounter a cake, one of whom claims that what’s “fair” is that they get all the cake for themself. It would be a mistake for us to come to a compromise with them on the meaning of “fair”.
Does the argument for including everyone in CEV also argue for including everyone in a discussion of what fairness is?
Don’t you think that there would be (at least) strong support for caring about the interests of other intelligent life, if all humans were far more intelligent, knowledgeable, rational, and consistent, and heard all the arguments for and against it?
But making humans more intelligent and more rational would mean altering their volition. An FAI that would proactively make people become more educated would be similar to one that altered the desires of humans directly. If it told them that the holy Qur’an is not the word of God it would dramatically change their desires. But what if people actually don’t want to learn that truth? In other words, any superhuman intelligence will have a very strong observer effect and will cause a subsequent feedback loop that will shape the future according to the original seed AI, or the influence of its creators. You can’t expect to create a God and still be able to extrapolate the natural desires of human beings. Human desires are not just a fact about their evolutionary history but also a mixture of superstructural parts like environmental and cultural influences. If you have some AI God leading humans into the future then at some point you have altered all those structures and consequently changed human volition. The smallest bias in the original seed AI will be maximized over time by the feedback between the FAI and its human pets.
ETA You could argue that all that matters is the evolutionary template for the human brain. The best way to satisfy it maximally is what we want, what is right. But leaving aside the evolution of culture and the environment seems drastic. Why not go a step further and create a new better mind as well?
I also think it is a mistake to generalize from the people you currently know to be intelligent and reasonable as they might be outliers. Since I am a vegetarian I am used to people telling me that they understand what it means to eat meat but that they don’t care. We should not rule out the possibility that the extrapolated volition of humanity is actually something that would appear horrible and selfish to us “freaks”.
I really don’t think that’ll happen, but honestly, I’ll have to defer to the judgment of our extrapolated selves. They’re smarter and wiser than me, and they’ve heard more of the arguments and evidence than I have.
That is only reasonable if matters of taste are really subject to rational argumentation and judgement. If it really doesn’t matter if we desire pleasure or pain then focusing on smarts might either lead to an infinite regress or nihilism.
Judging from his posts and comments here, I conclude that EY is less interested in dialectic than in laying out his arguments so that other people can learn from them and build on them. So I wouldn’t expect critically-minded people to necessarily trigger such a dialectic.
That said, perhaps that’s an artifact of discussion happening with a self-selected crowd of Internet denizens… that can exhaust anybody. So perhaps a different result would emerge if a different group of critically-minded people, people EY sees as peers, got involved. The Hanson/Yudkowsky debate about FOOMing had more of a dialectic structure, for example.
With respect to your example, the discussion here might be a starting place for that discussion, btw. The discussions here and here and here might also be salient.
Incidentally: the anticipated relationship between what humans want, what various subsets of humans want, and what various supersets including humans want, is one of the first questions I asked when I encountered the CEV notion.
I haven’t gotten an explicit answer, but it does seem (based on other posts/discussions) that on EY’s view a nonhuman intelligent species valuing something isn’t something that should motivate our behavior at all, one way or another. We might prefer to satisfy that species’ preferences, or we might not, but either way what should be motivating our behavior on EY’s view is our preferences, not theirs. What matters on this view is what matters to humans; what doesn’t matter to humans doesn’t matter.
I’m not sure if I buy that, but satisfying “all the reasons for action that exist” does seem to be a step in the wrong direction.
Thanks for the links! I don’t know that “satisfying all the reasons for action that exist” is the solution, but I listed it as an example alternative to Eliezer’s theory. Do you have a preferred solution?
Rolling back to fundamentals: reducing questions about right actions to questions about likely and preferred results seems reasonable. So does treating the likely results of an action as an empirical question. So does approaching an individual’s interests empirically, and as distinct from their beliefs about their interests, assuming they have any. The latter also allows for taking into account the interests of non-sapient and non-sentient individuals, which seems like a worthwhile goal.
Extrapolating a group’s collective interests from the individual interests of its members is still unpleasantly mysterious to me, except in the fortuitous special case where individual interests happen to align neatly. Treating this as an optimization problem with multiple weighted goals is the best approach I know of, but I’m not happy with it; it has lots of problems I don’t know how to resolve.
Much to my chagrin, some method for doing this seems necessary if we are to account for individual interests in groups whose members aren’t peers (e.g., children, infants, fetuses, animals, sufferers of various impairments, minority groups, etc., etc., etc.), which seems good to address.
It’s also at least useful to addressing groups of peers whose interests don’t neatly align… though I’m more sanguine about marketplace competition as an alternative way of addressing that.
Something like this may also turn out to be critical for fully accounting for even an individual human’s interests, if it turns out that the interests of the various sub-agents of a typical human don’t align neatly, which seems plausible.
Accounting for the probable interests of probable entities (e.g., aliens) I’m even more uncertain about. I don’t discount them a priori, but without a clearer understanding of what such an accounting would actually look like I really don’t know what to say about them. I guess if we have grounds for reliably estimating the probability of a particular interest being had by a particular entity, then it’s just a subset of the general weighting problem, but… I dunno.
I reject accounting for the posited interests of counterfactual entities, although I can see where the line between that and probabilistic entities as above is hard to specify.
To respond to your example (while agreeing that it is good to have more intelligent people evaluating things like CEV and the meta-ethics that motivates it):
I think the CEV approach is sufficiently meta that if we would conclude on meeting and learning about the aliens, and considering their moral significance, that the right thing to do involves giving weight to their preferences, then an FAI constructed from our current CEV would give weight to their preferences once it discovers them.
then an FAI constructed from our current CEV would give weight to their preferences once it discovers them.
If they are to be given weight at all, then this could as well be done in advance, so prior to observing aliens we give weight to preferences of all possible aliens, conditionally on future observations of which ones turn out to actually exist.
From a perspective of pure math, I think that is the same thing, but in considering practical computability, it does not seem like a good use of computing power to figure what weight to give the preference of a particular alien civilization out of a vast space of possible civilizations, until observing that the particular civilization exists.
One such regularity comes to mind: most aliens would rather be discovered by a superintelligence that was friendly to them than not be discovered, so spreading and searching would optimize their preferences.
I don’t yet have much of an opinion on what the best way to do it is, I’m just saying it needs doing. We need more brains on the problem. Eliezer’s meta-ethics is, I think, far from obviously correct. Moving toward normative ethics, CEV is also not obviously the correct solution for Friendly AI, though it is a good research proposal. The fate of the galaxy cannot rest on Eliezer’s moral philosophy alone.
We need critically-minded people to say, “I don’t think that’s right, and here are four arguments why.” And then Eliezer can argue back, or change his position. And then the others can argue back, or change their positions. This is standard procedure for solving difficult problems, but as of yet I haven’t seen much published dialectic like this in trying to figure out the normative foundations for the Friendly AI project.
Let me give you an explicit example. CEV takes extrapolated human values as the source of an AI’s eventually-constructed utility function. Is that the right way to go about things, or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function, whether or not they happen to be reasons for action arising from the brains of a particular species of primate on planet Earth? What if there are 5 other intelligent species in the galaxy whose interests will not at all be served when our Friendly AI takes over the galaxy? Is that really the right thing to do? How would we go about answering questions like that?
...this sentence makes me think that we really aren’t on the same page at all with respect to naturalistic metaethics. What is a reason for action? How would a computer program enumerate them all?
A ‘reason for action’ is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically—i.e. that they have intrinsic value apart from being valued by an agent.
A source of normativity (a reason for action) is anything that grounds/justifies an ‘ought’ or ‘should’ statement. Why should I look both ways before crossing the street? Presumably, this ‘should’ is justified by reference to my desires, which could be gravely thwarted if I do not look both ways before crossing the street. If I strongly desired to be run over by cars, the ‘should’ statement might no longer be justified. Some people might say I should look both ways anyway, because God’s command to always look before crossing a street provides me with reason for action to do that even if it doesn’t help fulfill my desires. But I don’t believe that proposed reason for action exists.
Okay, see, this is why I have trouble talking to philosophers in their quote standard language unquote.
I’ll ask again: How would a computer program enumerate all reasons for action?
I wonder, since it’s important to stay pragmatic, if it would be good to design a “toy example” for this sort of ethics.
It seems like the hard problem here is to infer reasons for action, from an individual’s actions. People do all sorts of things; but how can you tell from those choices what they really value? Can you infer a utility function from people’s choices, or are there sets of choices that don’t necessarily follow any utility function?
The sorts of “toy” examples I’m thinking of here are situations where the agent has a finite number of choices. Let’s say you have Pac-Man in a maze. His choices are his movements in four cardinal directions. You watch Pac-Man play many games; you see what he does when he’s attacked by a ghost; you see what he does when he can find something tasty to eat; you see when he’s willing to risk the danger to get the food.
From this, I imagine you could do some hidden Markov stuff to infer a model of Pac-Man’s behavior—perhaps an if-then tree.
Could you guess from this tree that Pac-Man likes fruit and dislikes dying, and goes away from fruit only when he needs to avoid dying? Yeah, you could (though I don’t know how to systematize that more broadly.)
From this, could you do an “extrapolated” model of what Pac-Man would do if he knew when and where the ghosts were coming? Sure—and that would be, if I’ve understood correctly, CEV for Pac-Man.
It seems to me that, more subtle philosophy aside, this is what we’re trying to do. I haven’t read the literature lukeprog has, but it seems to me that Pac-Man’s “reasons for actions” are completely described by that if-then tree of his behavior. Why didn’t he go left that time? Because there was a ghost there. Why does that matter? Because Pac-Man always goes away from ghosts. (You could say: Pac-Man desires to avoid ghosts.)
It also seems to me, not that I really know this line of work, that one incremental thing that can be done towards CEV (or some other sort of practical metaethics) is this kind of toy model. Yes, ultimately understanding human motivation is a huge psychology and neuroscience problem, but before we can assimilate those quantities of data we may want to make sure we know what to do in the simple cases.
Something like:
Run simulations of agents that can choose randomly from the same actions the agent has. Look for regularities in the world state that occur more or less frequently for the sensible agent than for the random agent. Those things could be said to be what it likes and dislikes respectively.
To determine terminal vs instrumental values look at the decision tree and see which of the states gets chosen when a choice is forced.
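The first step above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the "world" is a stream of independent decision points with three cell types, `sensible_policy` is a hand-written stand-in for the Pac-Man being observed, and every name here (`random_situation`, `outcome_frequencies`, `infer_preferences`, the `margin` threshold) is invented for the example.

```python
import random
from collections import Counter

CELLS = ["fruit", "ghost", "empty"]


def random_situation(rng):
    """One decision point: the cell type each of the four moves leads to."""
    return {move: rng.choice(CELLS) for move in ("up", "down", "left", "right")}


def sensible_policy(situation, rng):
    """Stand-in for the observed Pac-Man: seeks fruit, avoids ghosts."""
    rank = {"fruit": 0, "empty": 1, "ghost": 2}
    return min(situation, key=lambda move: rank[situation[move]])


def random_policy(situation, rng):
    """Baseline agent: picks uniformly among the same moves."""
    return rng.choice(list(situation))


def outcome_frequencies(policy, situations, rng):
    """Fraction of decisions that end in each cell type under a policy."""
    counts = Counter(s[policy(s, rng)] for s in situations)
    return {cell: counts[cell] / len(situations) for cell in CELLS}


def infer_preferences(observed, baseline, margin=0.05):
    """Label cell types reached noticeably more (or less) often than chance."""
    likes = [c for c in CELLS if observed[c] - baseline[c] > margin]
    dislikes = [c for c in CELLS if baseline[c] - observed[c] > margin]
    return likes, dislikes


rng = random.Random(0)
situations = [random_situation(rng) for _ in range(2000)]
observed = outcome_frequencies(sensible_policy, situations, rng)
baseline = outcome_frequencies(random_policy, situations, rng)
likes, dislikes = infer_preferences(observed, baseline)
```

In this toy, fruit comes out as liked and ghosts as disliked. Empty cells can also land below the random baseline simply because fruit crowds them out, so frequency comparison alone may mislabel neutral states; that is one place where the forced-choice step (what the agent picks when only "empty" and "ghost" are on offer) would be needed to separate the cases.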
Thanks. Come to think of it that’s exactly the right answer.
Perhaps the next step would be to add to the model a notion of second-order desire, or analyze a Pac-Man whose apparent terminal values can change when they’re exposed to certain experiences or moral arguments.
Eliezer,
I think the reason you’re having trouble with the standard philosophical category of “reasons for action” is because you have the admirable quality of being confused by that which is confused. I think the “reasons for action” category is confused. At least, the only action-guiding norm I can make sense of is desire/preference/motive (let’s call it motive). I should eat the ice cream because I have a motive to eat the ice cream. I should exercise more because I have many motives that will be fulfilled if I exercise. And so on. All this stuff about categorical imperatives or divine commands or intrinsic value just confuses things.
How would a computer program enumerate all motives (which, according to me, is co-extensional with “all reasons for action”)? It would have to roll up its sleeves and do science. As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems (as it had done already with us), and thereby enumerate all the motives it encounters in the universe, their strengths, the relations between them, and so on.
But really, I’m not yet proposing a solution. What I’ve described above doesn’t even reflect my own meta-ethics. It’s just an example. I’m merely raising questions that need to be considered very carefully.
And of course I’m not the only one to do so. Others have raised concerns about CEV and its underlying meta-ethical assumptions. Will Newsome raised some common worries about CEV and proposed computational axiology instead. Tarleton’s 2010 paper compares CEV to an alternative proposed by Wallach & Collin.
The philosophical foundations of the Friendly AI project need more philosophical examination, I think. Perhaps you are very confident about your meta-ethical views and about CEV; I don’t know. But I’m not confident about them. And as you say, we’ve only got one shot at this. We need to make sure we get it right. Right?
Now, it’s just a wild guess here, but I’m guessing that a lot of philosophers who use the language “reasons for action” would disagree that “knowing the Baby-eaters evolved to eat babies” is a reason to eat babies. Am I wrong?
I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don’t expect those two tracks to meet much.
I count myself among the philosophers who would say that “knowing the Baby-eaters want to eat babies” is not a reason (for me) to eat babies. Some philosophers don’t even think that the Baby-eaters’ desires to eat babies are reasons for them to eat babies, not even defeasible reasons.
Interesting. I always assumed that raising a question was the first step toward answering it—especially if you don’t want yourself to be the only person who tries to answer it. The point of a post like the one we’re commenting on is that hopefully one or more people will say, “Huh, yeah, it’s important that we get this issue right,” and devote some brain energy to getting it right.
I’m sure the “figure it out and move on” track doesn’t meet much with the “I’m uncomfortable settling on an answer” track, but what about the “pose important questions so we can work together to settle on an answer” track? I see myself on that third track, engaging in both the ‘pose important questions’ and the ‘settle on an answer’ projects.
Only if you want an answer. There is no curiosity that does not want an answer. There are four very widespread failure modes around “raising questions”—the failure mode of paper-writers who regard unanswerable questions as a biscuit bag that never runs out of biscuits, the failure mode of the politically savvy who’d rather not offend people by disagreeing too strongly with any of them, the failure mode of the religious who don’t want their questions to arrive at the obvious answer, the failure mode of technophobes who mean to spread fear by “raising questions” that are meant more to create anxiety by their raising than by being answered, and all of these easily sum up to an accustomed bad habit of thinking where nothing ever gets answered and true curiosity is dead.
So yes, if there’s an interim solution on the table and someone says “Ah, but surely we must ask more questions” instead of “No, you idiot, can’t you see that there’s a better way” or “But it looks to me like the preponderance of evidence is actually pointing in this here other direction”, alarms do go off inside my head. There’s a failure mode of answering too prematurely, but when someone talks explicitly about the importance of raising questions—this being language that is mainly explicitly used within the failure-mode groups—alarms go off and I want to see it demonstrated that they can think in terms of definite answers and preponderances of evidence at all besides just raising questions; I want a demonstration that true curiosity, wanting an actual answer, isn’t dead inside them, and that they have the mental capacity to do what’s needed to that effect—namely, weigh evidence in the scales and arrive at a non-balanced answer, or propose alternative solutions that are supposed to be better.
I’m impressed with your blog, by the way, and generally consider you to be a more adept rationalist than the above paragraphs might imply—but when it comes to this particular matter of metaethics, I’m not quite sure that you strike me as aggressive enough that if you had twenty years to sort out the mess, I would come back twenty years later and find you with a sheet of paper with the correct answer written on it, as opposed to a paper full of questions that clearly need to be very carefully considered.
Awesome. Now your reaction here makes complete sense to me. The way I worded my original article above looks very much like I’m in either the 1st category or the 4th category.
Let me, then, be very clear:
I do not want to raise questions so that I can make a living endlessly re-examining philosophical questions without arriving at answers.
I want me, and rationalists in general, to work aggressively enough on these problems so that we have answers by the time AI+ arrives. As for the fact that I don’t have answers yet, please remember that I was a fundamentalist Christian 3 years ago, with no rationality training at all, and a horrendous science education. And I didn’t discover the urgency of these problems until about 6 months ago. I’ve had to make extremely rapid progress from that point to where I am today. If I can arrange to work on these problems full time, I think I can make valuable contributions to the project of dealing safely with Friendly AI. But if that doesn’t happen, well, I hope to at least enable others who can work on this problem full time, like yourself.
I want to solve these problems in 15 years, not 20. This will make most academic philosophers, and most people in general, snort the water they’re drinking through their nose. On the other hand, the time it takes to solve a problem expands to meet the time you’re given. For many philosophers, the time we have to answer the questions is… billions of years. For me, and people like me, it’s a few decades.
Any response to this, Eliezer?
Well, the part about you being a fundamentalist Christian three years ago is damned impressive and does a lot to convince me that you’re moving at a reasonable clip.
On the other hand, a good metaethical answer to the question “What sort of stuff is morality made out of?” is essentially a matter of resolving confusion; and people can get stuck on confusions for decades, or they can breeze past confusions in seconds. Comprehending the most confusing secrets of the universe is more like realigning your car’s wheels than like finding the Lost Ark. I’m not entirely sure what to do about the partial failure of the metaethics sequence, or what to do about the fact that it failed for you in particular. But it does sound like you’re setting out to heroically resolve confusions that, um, I kinda already resolved, and then wrote up, and then only some people got the writeup… but it doesn’t seem like the sort of thing where you spending years working on it is a good idea. 15 years to a piece of paper with the correct answer written on it is for solving really confusing problems from scratch; it doesn’t seem like a good amount of time for absorbing someone else’s solution. If you plan to do something interesting with your life requiring correct metaethics then maybe we should have a Skype videocall or even an in-person meeting at some point.
The main open moral question SIAI actually does need a concrete answer to is “How exactly does one go about construing an extrapolated volition from the giant mess that is a human mind?”, which takes good metaethics as a background assumption but is fundamentally a moral question rather than a metaethical one. On the other hand, I think we’ve basically got covered “What sort of stuff is this mysterious rightness?”
What did you think of the free will sequence as a template for doing naturalistic cognitive philosophy where the first question is always “What algorithm feels from the inside like my philosophical intuitions?”
I should add that I don’t think I will have meta-ethical solutions in 15 years, significantly because I’m not optimistic that I can get someone to pay my living expenses while I do 15 years of research. (Why should they? I haven’t proven my abilities.) But I think these problems are answerable, and that we are in a fantastic position to answer them if we want to do so. We know an awful lot about physics, psychology, logic, neuroscience, AI, and so on. Even experts that were active 15 years before now did not have all these advantages. More importantly, most thinkers today do not even take advantage of them.
Have you considered applying to the SIAI Visiting Fellows program? It could be worth a month or 3 of having your living expenses taken care of while you research, and could lead to something longer term.
Seconding JGWeissman — you’d probably be accepted as a Visiting Fellow in an instant, and if you turn out to be sufficiently good at the kind of research and thinking that they need to have done, maybe you could join them as a paid researcher.
15 years is much too much; if you haven’t solved metaethics after 15 years of serious effort, you probably never will. The only things that’re actually time consuming on that scale are getting stopped with no idea how to proceed, and wrong turns into muck. I see no reason why a sufficiently clear thinker couldn’t finish a correct and detailed metaethics in a month.
I suppose if you let “sufficiently clear thinker” do enough work this is just trivial.
But it’s a sui generis problem… I’m not sure what information a time table could be based on other than the fact that it has been way longer than a month and no one has succeeded yet.
It is also worth keeping in mind that scientific discoveries routinely impact the concepts we use to understand the world. The computational model of the human brain was not generated as a hypothesis until after we had built computers and could see what they do, even though, in principle, that hypothesis could have been invented at nearly any point in history. So it seems plausible the crucial insight needed for a successful metaethics will come from a scientific discovery that someone concentrating on philosophy for a month wouldn’t make.
Supposing anyone had already succeeded, how strong an expectation do you think we should have of knowing about it?
Not all that strong. It may well be out there in some obscure journal but just wasn’t interesting enough for anyone to bother replying to. Hell, multiple people may have succeeded.
But I think “success” might actually be underdetermined here. Some philosophers may have had the right insights, but I suspect that if they had communicated those insights in the formal method necessary for Friendly AI the insights would have felt insightful to readers and the papers would have gotten attention. Of course, I’m not even familiar with cutting edge metaethics. There may well be something like that out there. It doesn’t help that no one here seems willing to actually read philosophy in non-blog format.
Yep:
Related question: suppose someone handed us a successful solution, would we recognize it?
Yep.
So Yudkowsky came up with a correct and detailed metaethics but failed to communicate it?
I think it’s correct, but it’s definitely not detailed; some major questions, like “how to weight and reconcile conflicting preferences”, are skipped entirely.
What do you believe to be the reasons? Did he not try, or did he try and fail? I’m trying to fathom what kind of person is a sufficiently clear thinker. If not even EY is a sufficiently clear thinker, then your statement that such a person could come up with a detailed metaethics in a month seems self-evident. If someone is a sufficiently clear thinker to accomplish a certain task then they will complete it if they try. What’s the point? It sounds like you are saying that there are many smart people that could accomplish the task if they only tried. But if in fact EY is not one of them, that’s bad.
Yesterday I read In Praise of Boredom. It seems that EY also views intelligence as something proactive:
No doubt I am a complete layman when it comes to what intelligence is. But as far as I am aware it is a kind of goal-oriented evolutionary process equipped with a memory. It is evolutionary insofar as it still needs to stumble upon novelty. Intelligence is not a meta-solution but an efficient searchlight that helps to discover unknown unknowns. Intelligence is also a tool that can efficiently exploit previous discoveries, combine and permute them. But claiming that you just have to be sufficiently intelligent to solve a given problem sounds like it is more than that. I don’t see that. I think that if something crucial is missing, something you don’t know that it is missing, you’ll have to discover it first and not invent it by the sheer power of intelligence.
By “a sufficiently clear thinker” you mean an AI++, right? :)
Nah, an AI++ would take maybe five minutes.
A month sounds considerably overoptimistic to me. Wrong steps and backtracking are probably to be expected, and it would probably be irresponsible to commit to a solution before allowing other intelligent people (who really want to find the right answer, not carry on endless debate) to review it in detail. For a sufficiently intelligent and committed worker, I would not be surprised if they could produce a reliably correct metaethical theory within two years, perhaps one, but a month strikes me as too restrictive.
Of course, this one applies to scaremongers in general, not just technophobes.
Knowing the Baby-eaters want to eat babies is a reason for them to eat babies. It is not a reason for us to let them eat babies. My biggest problem with desirism in general is that it provides no reason for us to want to fulfill others’ desires. Saying that they want to fulfill their desires is obvious. Whether we help or hinder them is based entirely on our own reasons for action.
That’s not a bug, it’s a feature.
Are you familiar with desirism? It says that we should want to fulfill others’ desires, but, AFAI can tell, gives no reason why.
No. This is not what desirism says.
From your desirism FAQ:
The moral thing to do is to shape my desires to fulfill others’ desires, insofar as they are malleable. This is what I meant by “we should want to fulfill others’ desires,” though I acknowledge that a significant amount of precision and clarity was lost in the original statement. Is this all correct?
The desirism FAQ needs updating, and is not a very clear presentation of the theory, I think.
One problem is that much of the theory is really just a linguistic proposal. That’s true for all moral theories, but it can be difficult to separate the linguistic from the factual claims. I think Alonzo Fyfe and I are doing a better job of that in our podcast. The latest episode is The Claims of Desirism, Part 1.
I will listen to that.
Unfortunately, we’re not making moral claims yet. In meta-ethics, there is just too much groundwork to lay down first. Kinda like how Eliezer took like 200 posts to build up to talking about meta-ethics.
So, just to make sure, what I said in the grandparent is not what desirism says?
Ah, oops. I wasn’t familiar with it, and I misunderstood the sentence.
Is knowing that Baby-eaters want babies to be eaten a reason, on your view, to design an FAI that optimizes its surroundings for (among other things) baby-eating?
I very much doubt it. Even if we assume my own current meta-ethical views are correct—an assumption I don’t have much confidence in—this wouldn’t leave us with reason to design an FAI that optimizes its surroundings for (among other things) baby-eating. Really, this goes back to a lot of classical objections to utilitarianism.
For the record, I currently think CEV is the most promising path towards solving the Friendly AI problem, I’m just not very confident about any solutions yet, and am researching the possibilities as quickly as possible, using my outline for Ethics and Superintelligence as a guide to research. I have no idea what the conclusions in Ethics and Superintelligence will end up being.
Here’s an interesting juxtaposition...
Eliezer-2011 writes:
Eliezer-2007 quotes Robyn Dawes, saying that the below is “so true it’s not even funny”:
Is this a change of attitude, or am I just not finding the synthesis?
Eliezer-2011 seems to want to propose solutions very quickly, move on, and come back for repairs if necessary. Eliezer-2007 advises that for difficult problems (one would think that FAI qualifies) we take our time to understand the relevant issues, questions, and problems before proposing solutions.
There’s a big difference between “not immediately” and “never”. Don’t propose a solution immediately, but do at least have a detailed working guess at a solution (which can be used to move to the next problem) in a year. Don’t “merely” raise a question; make sure that finding an answer is also part of the agenda.
It’s a matter of the twelfth virtue of rationality, the intention to cut through to the answer, whatever the technique. The purpose of holding off on proposing solutions is to better find solutions, not to stop at asking the question.
I suggest that he still holds both of those positions (at least, I know I do, so I do not see why he wouldn’t) but that they apply to slightly different contexts. Eliezer’s elaboration in the descendant comments from the first quote seemed to illustrate why fairly well. They also, if I recall, allowed that you do not fit into the ‘actually answering is unsophisticated’ crowd, which further narrows down just what he means.
The impression I get is that EY-2011 believes that he has already taken the necessary time to understand the relevant issues, questions, and problems and that his proposed solution is therefore unlikely to be improved upon by further up-front thinking about the problem, rather than by working on implementing the solution he has in mind and seeing what difficulties come up.
Whether that’s a change of attitude, IMHO, depends a lot on whether his initial standard for what counts as an adequate understanding of the relevant issues, questions, and problems was met, or whether it was lowered.
I’m not really sure what that initial standard was in the first place, so I have no idea which is the case. Nor am I sure it matters; presumably what matters more is whether the current standard is adequate.
The point of the Dawes quote is to hold off on proposing solutions until you’ve thoroughly comprehended the issue, so that you get better solutions. It doesn’t advocate discussing problems simply for the sake of discussing them. Between both quotes there’s a consistent position that the point is to get the right answer, and discussing the question only has a point insofar as it leads to getting that answer. If you’re discussing the question without proposing solutions ad infinitum, you’re not accomplishing anything.
Keep in mind that talking with regard to solutions is just so darn useful. Even if you propose an overly specific solution early, then it has a large surface area of features that can be attacked to prove it incompatible with the problem. You can often salvage and mutate what’s left of the broken idea. There’s not a lot of harm in that; rather, there is a natural give and take whereby dismissing a proposed solution requires identifying what part of the problem requirements are contradicted, and it may very well not have occurred to you to specify that requirement in the first place.
I believe it has been observed that experts almost always talk in terms of candidate solutions, and amateurs attempt to build up from a platform of the problem itself. Experts of course having objectively better performance. The algorithm for provably moral superintelligences might not have a lot of prior solutions to draw from, but you could, for instance, find some inspiration even from the outside view of how some human political systems have maintained generally moral dispositions.
There is a bias to associate your status with ideas you have vocalized in the past since they reflect on the quality of your thinking, but you can’t throw the baby out with the bathwater.
The Maier quote comes off as way too strong for me. And what’s with this conclusion:
I think there’s a synthesis possible. There’s a purpose of finding a solid answer, but finding it requires a period of exploration rather than getting extremely specific in the beginning of the search.
If you don’t spend much time on the track where people just raise questions, how do you encounter the new insights that make it necessary to dance back for repairs on your track?
Just asking. :)
Though I do tend to admire your attitude of pragmatism and impatience with those who dither forever.
I presume you encounter them later on. Maybe while doing more ground-level thinking about how to actually implement your meta-ethics you realise that it isn’t quite coherent.
I’m not sure if this flying-by-the-seat-of-your-pants approach is best, but as has been pointed out before, there are costs associated with taking too long as well as with not being careful enough; there must come a point where the risk is too small and the time it would take to fix it too long.
Well, I’ll certainly agree that more potential problems are surfaced by moving ahead with the implementation than by going back to the customer with another round of questions about the requirements.
I can see that you might question the usefulness of the notion of a “reason for action” as something over and above the notion of “ought”, but I don’t see a better case for thinking that “reason for action” is confused.
The main worry here seems to have to do with categorical reasons for action. Diagnostic question: are these more troubling/confused than categorical “ought” statements? If so, why?
Perhaps I should note that philosophers talking this way make a distinction between “motivating reasons” and “normative reasons”. A normative reason to do A is a good reason to do A, something that would help explain why you ought to do A, or something that counts in favor of doing A. A motivating reason just helps explain why someone did, in fact, do A. One of my motivating reasons for killing my mother might be to prevent her from being happy. By saying this, I do not suggest that this is a normative reason to kill my mother. It could also be that R would be a normative reason for me to A, but R does not motivate me to do A. (ata seems to assume otherwise, since ata is getting caught up with who these considerations would motivate. Whether reasons could work like this is a matter of philosophical controversy. Saying this more for others than you, Luke.)
Back to the main point, I am puzzled largely because the most natural ways of getting categorical oughts can get you categorical reasons. Example: simple total utilitarianism. On this view, R is a reason to do A if R is the fact that doing A would cause someone’s well-being to increase. The strength of R is the extent to which that person’s well-being increases. One weighs one’s reasons by adding up all of their strengths. One then does the thing that one has most reason to do. (It’s pretty clear in this case that the notion of a reason plays an inessential role in the theory. We can get by just fine with well-being, ought, causal notions, and addition.)
Utilitarianism, as always, is a simple case. But it seems like many categorical oughts can be thought of as being determined by weighing factors that count in favor of and count against the course of action in question. In these cases, we should be able to do something like what we did for util (though sometimes that method of weighing the reasons will be different/more complicated; in some bad cases, this might make the detour through reasons pointless).
The reasons framework seems a bit more natural in non-consequentialist cases. Imagine I try to maximize aggregate well-being, but I hate lying to do it. I might count the fact that an action would involve lying as a reason not to do it, but not believe that my lying makes the world worse. To get oughts out of a utility function instead, you might model my utility function as the result of adding up aggregate well-being and subtracting a factor that scales with the number of lies I would have to tell if I took the action in question. Again, it’s pretty clear that you don’t HAVE to think about things this way, but it is far from clear that this is confused/incoherent.
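To make the utility-function modeling above concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the discussion: the function name `utility`, the `lie_penalty` value of 1.5, and the two toy actions with their made-up well-being numbers. It only shows the shape of "aggregate well-being minus a factor that scales with the number of lies".

```python
def utility(aggregate_wellbeing, lies_told, lie_penalty=1.5):
    """Hypothetical utility: total well-being minus a penalty per lie told.

    lie_penalty is an arbitrary illustrative constant, not a claim about
    how much anyone actually disvalues lying.
    """
    return aggregate_wellbeing - lie_penalty * lies_told


# Made-up actions: the lie produces slightly more aggregate well-being.
actions = {
    "tell the truth": {"aggregate_wellbeing": 10.0, "lies_told": 0},
    "white lie": {"aggregate_wellbeing": 11.0, "lies_told": 1},
}

# The "ought" falls out as whichever action maximizes the utility function.
chosen = max(actions, key=lambda name: utility(**actions[name]))
print(chosen)
```

With these numbers the agent tells the truth even though lying yields more aggregate well-being, which is the same verdict the reasons framing gives when lying counts as a reason against an action without making the world worse.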
Perhaps the LW crowd is perplexed because people here take utility functions as primitive, whereas philosophers talking this way tend to take reasons as primitive and derive ought statements (and, on a very lucky day, utility functions) from them. This paper, which tries to help reasons folks and utility function folks understand/communicate with each other, might be helpful for anyone who cares much about this. My impression is that we clearly need utility functions, but don’t necessarily need the reason talk. The main advantage to getting up on the reason talk would be trying to understand philosophers who talk that way, if that’s important to you. (Much of the recent work in meta-ethics relies heavily on the notion of a normative reason, as I’m sure Luke knows.)
utilitymonster,
For the record, as a good old Humean I’m currently an internalist about reasons, which leaves me unable (I think) to endorse any form of utilitarianism, where utilitarianism is the view that we ought to maximize X. Why? Because internal reasons don’t always, and perhaps rarely, support maximizing X, and I don’t think external reasons for maximizing X exist. For example, I don’t think X has intrinsic value (in Korsgaard’s sense of “intrinsic value”).
Thanks for the link to that paper on rational choice theories and decision theories!
So are categorical reasons any worse off than categorical oughts?
Categorical oughts and reasons have always confused me. What do you see as the difference, and which type of each are you thinking of? The types of categorical reasons or reasons with which I’m most familiar are Kant’s and Korsgaard’s.
R is a categorical reason for S to do A iff R counts in favor of doing A for S, and would so count for other agents in a similar situation, regardless of their preferences. If it were true that we always have reasons to benefit others, regardless of what we care about, that would be a categorical reason. I don’t use the term “categorical reason” any differently than “external reason”.
S categorically ought to do A just when S ought to do A, regardless of what S cares about, and it would still be true that S ought to do A in similar situations, regardless of what S cares about. The rule: always maximize happiness, would, if true, ground a categorical ought.
I see very little reason to be more or less skeptical of categorical reasons or categorical oughts than the other.
Agreed. And I’m skeptical of both. You?
Hard to be confident about these things, but I don’t see the problem with external reasons/oughts. Some people seem to have some kind of metaphysical worry...harder to reduce or something. I don’t see it.
Nitpick: Wallach & Collin are cited only for the term ‘artificial moral agents’ (and the paper is by myself and Roko Mijic). The comparison in the paper is mostly just to the idea of specifying object-level moral principles.
Oops. Thanks for the correction.
Okay, but all of those (to the extent that they’re coherent) are observations about human axiology. Beware of committing the mind projection fallacy with respect to compellingness — you find those to be plausible sources of normativity because your brain is that of “a particular species of primate on planet Earth”. If your AI were looking for “reasons for action” that would compel all agents, it would find nothing, and if it were looking for all of the “reasons for action” that would compel each possible agent, it would spend an infinite amount of time enumerating stupid pointless motivations. It would eventually notice categorical imperatives, fairness, compassion, etc. but it would also notice drives based on the phase of the moon, based on the extrapolated desires of submarines (according to any number of possible submarine-volition-extrapolating dynamics), based on looking at how people would want to be treated and reversing that, based on the number of living cats in the world modulo 241, based on modeling people as potted plants and considering the direction their leaves are waving...
If you want to be run over by cars, you should still look both ways.
You might miss otherwise!
One way might be enough, in that case.
That depends entirely on the street, and the direction you choose to look. ;)
Depends on how soon you insist it happen.
Sorry… what I said above is not quite right. There are norms that are not reasons for action. For example, epistemological norms might be called ‘reasons to believe.’ ‘Reasons for action’ are the norms relevant to, for example, prudential normativity and moral normativity.
This is either horribly confusing, or horribly confused. I think that what’s going on here is that you (or the sources you’re getting this from) have taken a bundle of incompatible moral theories, identified a role that each of them has a part playing, and generalized a term from one of those theories inappropriately.
The same thing can be a reason for action, a reason for inaction, a reason for belief and a reason for disbelief all at once, in different contexts depending on what consequences these things will have. This makes me think that “reason for action” does not carve reality, or morality, at the joints.
I’m sort of surprised by how people are taking the notion of “reason for action”. Isn’t this a familiar process when making a decision?
For all courses of action you’re thinking of taking, identify the features (consequences, if that’s how you think about things) that count in favor of taking that course of action and those that count against it.
Consider how those considerations weigh against each other. (Do the pros outweigh the cons, by how much, etc.)
Then choose the thing that does best in this weighing process.
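The three steps above can be sketched as a short Python procedure. This is only a toy under stated assumptions: considerations are represented as hypothetical `(label, weight)` pairs with positive weights counting in favor and negative weights counting against, weighing is plain addition, and the example courses of action and their numbers are invented for illustration. Real weighing may well be more complicated than a sum.

```python
def net_weight(considerations):
    """Add up signed weights: positive counts in favor, negative against."""
    return sum(weight for _label, weight in considerations)


def choose(courses_of_action):
    """Score every course of action and return the best one with all scores."""
    scores = {
        name: net_weight(considerations)
        for name, considerations in courses_of_action.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Invented example: step 1 lists the pros and cons for each option.
courses = {
    "take the job": [("higher pay", 3.0), ("longer commute", -1.0)],
    "stay put": [("familiar team", 2.0), ("no raise", -0.5)],
}

# Steps 2 and 3: weigh the considerations, then pick the winner.
best, scores = choose(courses)
print(best, scores)
```

Nothing here requires a consideration to carry the same weight in every context; the weights are supplied per decision.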
It is not a presupposition of the people talking this way that if R is a reason to do A in a context C, then R is a reason to do A in all contexts.
The people talking this way also understand that a single R might be both a reason to do A and a reason to believe X at the same time. You could also have R be a reason to believe X and a reason to cause yourself to not believe X. Why do you think these things make the discourse incoherent/non-perspicuous? This seems no more puzzling than the familiar fact that it could be epistemically irrational, but prudentially rational, to (cause yourself to) believe a certain thing.
All the reasons for action that exist? Like, the preferences of all possible minds? I’m not sure that utility function would be computable...
Edit: Actually, if we suppose that all minds are computable, then there’s only a countably infinite number of possible minds, and for any mind with a utility function U(x), there is a mind somewhere in that set with the utility function -U(x). So, depending on how you weight the various possible utility functions, it may be that they’d all cancel out.
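A finite toy version of the cancellation argument, in Python. The assumptions are loud ones: four hand-written utility functions stand in for the countably infinite set of minds, the outcomes and numbers are invented, and the names `negate` and `aggregate` are hypothetical. It says nothing about whether an infinite weighted sum would converge; it only shows that pairing each U with -U under equal weights sums to zero, and that a skewed weighting does not.

```python
def u1(x):
    return {"a": 1.0, "b": 0.0, "c": -2.0}[x]


def u2(x):
    return {"a": 0.5, "b": 3.0, "c": 1.0}[x]


def negate(u):
    """Return the mirror-image mind whose utility function is -U(x)."""
    return lambda x: -u(x)


# A tiny set of minds that is closed under negation.
minds = [u1, negate(u1), u2, negate(u2)]


def aggregate(x, weights):
    """Weighted sum of every mind's utility for outcome x."""
    return sum(w * u(x) for w, u in zip(weights, minds))


equal = [0.25, 0.25, 0.25, 0.25]
skewed = [0.4, 0.1, 0.25, 0.25]  # u1 counts for more than its mirror image

for outcome in ("a", "b", "c"):
    print(outcome, aggregate(outcome, equal), aggregate(outcome, skewed))
```

Under the equal weighting every outcome scores zero, so the aggregate gives no guidance at all; under the skewed weighting u1's preferences survive, which is the "depending on how you weight" caveat.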
Notice that you’re a human but you care about that. If there weren’t something in human axiology that could lead to sufficiently smart and reflective people concluding that nonhuman intelligent life is valuable, you wouldn’t have even thought of that — and, indeed, it seems that in general as you look at smarter, more informed, and more thoughtful people, you see less provincialism and more universal views of ethics. And that’s exactly the sort of thing that CEV is designed to take into account. Don’t you think that there would be (at least) strong support for caring about the interests of other intelligent life, if all humans were far more intelligent, knowledgeable, rational, and consistent, and heard all the arguments for and against it?
And if we were all much smarter and still largely didn’t think it was a good idea to care about the interests of other intelligent species… I really don’t think that’ll happen, but honestly, I’ll have to defer to the judgment of our extrapolated selves. They’re smarter and wiser than me, and they’ve heard more of the arguments and evidence than I have. :)
The same argument applies to just using one person as the template and saying that their preference already includes caring about all the other people.
The reason CEV might be preferable to starting from your own preference (I now begin to realize) is that the decision to privilege yourself vs. grant other people fair influence is also subject to morality, so to the extent you can be certain about this being more moral, it’s what you should do. Fairness, also being merely a heuristic, is subject to further improvement, as can be inclusion of volition of aliens in the original definition.
Of course, you might want to fall back to a “reflective injunction” of not inventing overly elaborate plans, since you haven’t had the capability of examining them well enough to rule them superior to more straightforward plans, such as using volition of a single human. But this is still a decision point, and the correct answer is not obvious.
This reminds me of the story of the people who encounter a cake, one of whom claims that what’s “fair” is that they get all the cake for themself. It would be a mistake for us to come to a compromise with them on the meaning of “fair”.
Does the argument for including everyone in CEV also argue for including everyone in a discussion of what fairness is?
But making humans more intelligent, more rational would mean altering their volition. An FAI that would proactively make people become more educated would be similar to one that altered the desires of humans directly. If it told them that the holy Qur’an is not the word of God it would dramatically change their desires. But what if people actually don’t want to learn that truth? In other words, any superhuman intelligence will have a very strong observer effect and will cause a subsequent feedback loop that will shape the future according to the original seed AI, or the influence of its creators. You can’t expect to create a God and still be able to extrapolate the natural desires of human beings. Human desires are not just a fact about their evolutionary history but also a mixture of superstructural parts like environmental and cultural influences. If you have some AI God leading humans into the future then at some point you have altered all those structures and consequently changed human volition. The smallest bias in the original seed AI will be maximized over time by the feedback between the FAI and its human pets.
ETA: You could argue that all that matters is the evolutionary template for the human brain. The best way to satisfy it maximally is what we want, what is right. But leaving aside the evolution of culture and the environment seems drastic. Why not go a step further and create a new, better mind as well?
I also think it is a mistake to generalize from the people you currently know to be intelligent and reasonable, as they might be outliers. Since I am a vegetarian I am used to people telling me that they understand what it means to eat meat but that they don’t care. We should not rule out the possibility that the extrapolated volition of humanity is actually something that would appear horrible and selfish to us “freaks”.
That is only reasonable if matters of taste are really subject to rational argumentation and judgement. If it really doesn’t matter if we desire pleasure or pain then focusing on smarts might either lead to an infinite regress or nihilism.
Judging from his posts and comments here, I conclude that EY is less interested in dialectic than in laying out his arguments so that other people can learn from them and build on them. So I wouldn’t expect critically-minded people to necessarily trigger such a dialectic.
That said, perhaps that’s an artifact of discussion happening with a self-selected crowd of Internet denizens… that can exhaust anybody. So perhaps a different result would emerge if a different group of critically-minded people, people EY sees as peers, got involved. The Hanson/Yudkowsky debate about FOOMing had more of a dialectic structure, for example.
With respect to your example, the discussion here might be a starting place for that discussion, btw. The discussions here and here and here might also be salient.
Incidentally: the anticipated relationship between what humans want, what various subsets of humans want, and what various supersets including humans want, is one of the first questions I asked when I encountered the CEV notion.
I haven’t gotten an explicit answer, but it does seem (based on other posts/discussions) that on EY’s view a nonhuman intelligent species valuing something isn’t something that should motivate our behavior at all, one way or another. We might prefer to satisfy that species’ preferences, or we might not, but either way what should be motivating our behavior on EY’s view is our preferences, not theirs. What matters on this view is what matters to humans; what doesn’t matter to humans doesn’t matter.
I’m not sure if I buy that, but satisfying “all the reasons for action that exist” does seem to be a step in the wrong direction.
TheOtherDave,
Thanks for the links! I don’t know that “satisfying all the reasons for action that exist” is the solution, but I listed it as an example alternative to Eliezer’s theory. Do you have a preferred solution?
Not really.
Rolling back to fundamentals: reducing questions about right actions to questions about likely and preferred results seems reasonable. So does treating the likely results of an action as an empirical question. So does approaching an individual’s interests empirically, and as distinct from their beliefs about their interests, assuming they have any. The latter also allows for taking into account the interests of non-sapient and non-sentient individuals, which seems like a worthwhile goal.
Extrapolating a group’s collective interests from the individual interests of its members is still unpleasantly mysterious to me, except in the fortuitous special case where individual interests happen to align neatly. Treating this as an optimization problem with multiple weighted goals is the best approach I know of, but I’m not happy with it; it has lots of problems I don’t know how to resolve.
Much to my chagrin, some method for doing this seems necessary if we are to account for individual interests in groups whose members aren’t peers (e.g., children, infants, fetuses, animals, sufferers of various impairments, minority groups, etc., etc., etc.), which seems good to address.
It’s also at least useful for addressing groups of peers whose interests don’t neatly align… though I’m more sanguine about marketplace competition as an alternative way of addressing that.
Something like this may also turn out to be critical for fully accounting for even an individual human’s interests, if it turns out that the interests of the various sub-agents of a typical human don’t align neatly, which seems plausible.
Accounting for the probable interests of probable entities (e.g., aliens) I’m even more uncertain about. I don’t discount them a priori, but without a clearer understanding of what such an accounting would actually look like, I really don’t know what to say about them. I guess if we have grounds for reliably estimating the probability of a particular interest being had by a particular entity, then it’s just a subset of the general weighting problem, but… I dunno.
I reject accounting for the posited interests of counterfactual entities, although I can see where the line between that and probabilistic entities as above is hard to specify.
Does that answer your question?
To respond to your example (while agreeing that it is good to have more intelligent people evaluating things like CEV and the meta-ethics that motivates it):
I think the CEV approach is sufficiently meta that if we would conclude on meeting and learning about the aliens, and considering their moral significance, that the right thing to do involves giving weight to their preferences, then an FAI constructed from our current CEV would give weight to their preferences once it discovers them.
If they are to be given weight at all, then this could as well be done in advance, so prior to observing aliens we give weight to preferences of all possible aliens, conditionally on future observations of which ones turn out to actually exist.
From a perspective of pure math, I think that is the same thing, but in considering practical computability, it does not seem like a good use of computing power to figure out what weight to give the preferences of a particular alien civilization out of a vast space of possible civilizations, until observing that the particular civilization exists.
Such considerations could have some regularities even across all the diverse possibilities, which are easy to notice with a Saturn-sized mind.
One such regularity comes to mind: most aliens would rather be discovered by a superintelligence that was friendly to them than not be discovered, so spreading and searching would optimize their preferences.