I think the reason you’re having trouble with the standard philosophical category of “reasons for action” is because you have the admirable quality of being confused by that which is confused. I think the “reasons for action” category is confused. At least, the only action-guiding norm I can make sense of is desire/preference/motive (let’s call it motive). I should eat the ice cream because I have a motive to eat the ice cream. I should exercise more because I have many motives that will be fulfilled if I exercise. And so on. All this stuff about categorical imperatives or divine commands or intrinsic value just confuses things.
How would a computer program enumerate all motives (which, according to me, is co-extensional with “all reasons for action”)? It would have to roll up its sleeves and do science. As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems (as it had done already with us), and thereby enumerate all the motives it encounters in the universe, their strengths, the relations between them, and so on.
But really, I’m not yet proposing a solution. What I’ve described above doesn’t even reflect my own meta-ethics. It’s just an example. I’m merely raising questions that need to be considered very carefully.
And of course I’m not the only one to do so. Others have raised concerns about CEV and its underlying meta-ethical assumptions. Will Newsome raised some common worries about CEV and proposed computational axiology instead. Tarleton’s 2010 paper compares CEV to an alternative proposed by Wallach & Collin.
The philosophical foundations of the Friendly AI project need more philosophical examination, I think. Perhaps you are very confident about your meta-ethical views and about CEV; I don’t know. But I’m not confident about them. And as you say, we’ve only got one shot at this. We need to make sure we get it right. Right?
As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems
Now, it’s just a wild guess here, but I’m guessing that a lot of philosophers who use the language “reasons for action” would disagree that “knowing the Baby-eaters evolved to eat babies” is a reason to eat babies. Am I wrong?
I’m merely raising questions that need to be considered very carefully.
I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don’t expect those two tracks to meet much.
I count myself among the philosophers who would say that “knowing the Baby-eaters want to eat babies” is not a reason (for me) to eat babies. Some philosophers don’t even think that the Baby-eaters’ desires to eat babies are reasons for them to eat babies, not even defeasible reasons.
I tend to be a bit gruff around people who merely raise questions
Interesting. I always assumed that raising a question was the first step toward answering it—especially if you don’t want yourself to be the only person who tries to answer it. The point of a post like the one we’re commenting on is that hopefully one or more people will say, “Huh, yeah, it’s important that we get this issue right,” and devote some brain energy to getting it right.
I’m sure the “figure it out and move on” track doesn’t meet much with the “I’m uncomfortable settling on an answer” track, but what about the “pose important questions so we can work together to settle on an answer” track? I see myself on that third track, engaging in both the ‘pose important questions’ and the ‘settle on an answer’ projects.
Interesting. I always assumed that raising a question was the first step toward answering it
Only if you want an answer. There is no curiosity that does not want an answer. There are four very widespread failure modes around “raising questions”—the failure mode of paper-writers who regard unanswerable questions as a biscuit bag that never runs out of biscuits, the failure mode of the politically savvy who’d rather not offend people by disagreeing too strongly with any of them, the failure mode of the religious who don’t want their questions to arrive at the obvious answer, the failure mode of technophobes who mean to spread fear by “raising questions” that are meant more to create anxiety by their raising than by being answered, and all of these easily sum up to an accustomed bad habit of thinking where nothing ever gets answered and true curiosity is dead.
So yes, if there’s an interim solution on the table and someone says “Ah, but surely we must ask more questions” instead of “No, you idiot, can’t you see that there’s a better way” or “But it looks to me like the preponderance of evidence is actually pointing in this here other direction”, alarms do go off inside my head. There’s a failure mode of answering too prematurely, but when someone talks explicitly about the importance of raising questions—this being language that is mainly explicitly used within the failure-mode groups—alarms go off and I want to see it demonstrated that they can think in terms of definite answers and preponderances of evidence at all besides just raising questions; I want a demonstration that true curiosity, wanting an actual answer, isn’t dead inside them, and that they have the mental capacity to do what’s needed to that effect—namely, weigh evidence in the scales and arrive at a non-balanced answer, or propose alternative solutions that are supposed to be better.
I’m impressed with your blog, by the way, and generally consider you to be a more adept rationalist than the above paragraphs might imply—but when it comes to this particular matter of metaethics, I’m not quite sure that you strike me as aggressive enough that if you had twenty years to sort out the mess, I would come back twenty years later and find you with a sheet of paper with the correct answer written on it, as opposed to a paper full of questions that clearly need to be very carefully considered.
Awesome. Now your reaction here makes complete sense to me. The way I worded my original article above looks very much like I’m in either the 1st category or the 4th category.
Let me, then, be very clear:
I do not want to raise questions so that I can make a living endlessly re-examining philosophical questions without arriving at answers.
I want me, and rationalists in general, to work aggressively enough on these problems so that we have answers by the time AI+ arrives. As for the fact that I don’t have answers yet, please remember that I was a fundamentalist Christian 3 years ago, with no rationality training at all, and a horrendous science education. And I didn’t discover the urgency of these problems until about 6 months ago. I’ve had to make extremely rapid progress from that point to where I am today. If I can arrange to work on these problems full time, I think I can make valuable contributions to the project of dealing safely with Friendly AI. But if that doesn’t happen, well, I hope to at least enable others who can work on this problem full time, like yourself.
I want to solve these problems in 15 years, not 20. This will make most academic philosophers, and most people in general, snort the water they’re drinking through their nose. On the other hand, the time it takes to solve a problem expands to meet the time you’re given. For many philosophers, the time we have to answer the questions is… billions of years. For me, and people like me, it’s a few decades.
Well, the part about you being a fundamentalist Christian three years ago is damned impressive and does a lot to convince me that you’re moving at a reasonable clip.
On the other hand, a good metaethical answer to the question “What sort of stuff is morality made out of?” is essentially a matter of resolving confusion; and people can get stuck on confusions for decades, or they can breeze past confusions in seconds. Comprehending the most confusing secrets of the universe is more like realigning your car’s wheels than like finding the Lost Ark. I’m not entirely sure what to do about the partial failure of the metaethics sequence, or what to do about the fact that it failed for you in particular. But it does sound like you’re setting out to heroically resolve confusions that, um, I kinda already resolved, and then wrote up, and then only some people got the writeup… but it doesn’t seem like the sort of thing where you spending years working on it is a good idea. 15 years to a piece of paper with the correct answer written on it is for solving really confusing problems from scratch; it doesn’t seem like a good amount of time for absorbing someone else’s solution. If you plan to do something interesting with your life requiring correct metaethics then maybe we should have a Skype videocall or even an in-person meeting at some point.
The main open moral question SIAI actually does need a concrete answer to is “How exactly does one go about construing an extrapolated volition from the giant mess that is a human mind?”, which takes good metaethics as a background assumption but is fundamentally a moral question rather than a metaethical one. On the other hand, I think we’ve basically got covered “What sort of stuff is this mysterious rightness?”
What did you think of the free will sequence as a template for doing naturalistic cognitive philosophy, where the first question is always “What algorithm feels from the inside like my philosophical intuitions?”
I should add that I don’t think I will have meta-ethical solutions in 15 years, significantly because I’m not optimistic that I can get someone to pay my living expenses while I do 15 years of research. (Why should they? I haven’t proven my abilities.) But I think these problems are answerable, and that we are in a fantastic position to answer them if we want to do so. We know an awful lot about physics, psychology, logic, neuroscience, AI, and so on. Even experts who were active 15 years ago did not have all these advantages. More importantly, most thinkers today do not even take advantage of them.
Have you considered applying to the SIAI Visiting Fellows program? It could be worth a month or 3 of having your living expenses taken care of while you research, and could lead to something longer term.
Seconding JGWeissman — you’d probably be accepted as a Visiting Fellow in an instant, and if you turn out to be sufficiently good at the kind of research and thinking that they need to have done, maybe you could join them as a paid researcher.
I want to solve these problems in 15 years, not 20. … the time it takes to solve a problem expands to meet the time you’re given.
15 years is much too much; if you haven’t solved metaethics after 15 years of serious effort, you probably never will. The only things that’re actually time consuming on that scale are getting stopped with no idea how to proceed, and wrong turns into muck. I see no reason why a sufficiently clear thinker couldn’t finish a correct and detailed metaethics in a month.
I see no reason why a sufficiently clear thinker couldn’t finish a correct and detailed metaethics in a month.
I suppose if you let “sufficiently clear thinker” do enough work this is just trivial.
But it’s a sui generis problem… I’m not sure what information a timetable could be based on other than the fact that it has been way longer than a month and no one has succeeded yet.
It is also worth keeping in mind that scientific discoveries routinely impact the concepts we use to understand the world. The computational model of the human brain was not generated as a hypothesis until after we had built computers and could see what they do, even though, in principle, that hypothesis could have been invented at nearly any point in history. So it seems plausible that the crucial insight needed for a successful metaethics will come from a scientific discovery that someone concentrating on philosophy for a month wouldn’t make.
But it’s a sui generis problem… I’m not sure what information a timetable could be based on other than the fact that it has been way longer than a month and no one has succeeded yet.
Supposing anyone had already succeeded, how strong an expectation do you think we should have of knowing about it?
Not all that strong. It may well be out there in some obscure journal but just wasn’t interesting enough for anyone to bother replying to. Hell, multiple people may have succeeded.
But I think “success” might actually be underdetermined here. Some philosophers may have had the right insights, but I suspect that if they had communicated those insights in the formal manner necessary for Friendly AI, the insights would have felt insightful to readers and the papers would have gotten attention. Of course, I’m not even familiar with cutting-edge metaethics. There may well be something like that out there. It doesn’t help that no one here seems willing to actually read philosophy in non-blog format.
It may well be out there in some obscure journal but just wasn’t interesting enough for anyone to bother replying to. Hell, multiple people may have succeeded.
The computational model of the human brain was not generated as a hypothesis until after we had built computers and could see what they do, even though, in principle, that hypothesis could have been invented at nearly any point in history.
I think it’s correct, but it’s definitely not detailed; some major questions, like “how to weight and reconcile conflicting preferences”, are skipped entirely.
I think it’s correct, but it’s definitely not detailed;
What do you believe to be the reasons? Did he not try, or did he fail? I’m trying to fathom what kind of person counts as a sufficiently clear thinker. If not even EY is a sufficiently clear thinker, then your statement that such a person could come up with a detailed metaethics in a month seems self-evident. If someone is a sufficiently clear thinker to accomplish a certain task, then they will complete it if they try. What’s the point? It sounds like you are saying that there are many smart people who could accomplish the task if they only tried. But if in fact EY is not one of them, that’s bad.
Yesterday I read In Praise of Boredom. It seems that EY also views intelligence as something proactive:
...if I ever do fully understand the algorithms of intelligence, it will destroy all remaining novelty—no matter what new situation I encounter, I’ll know I can solve it just by being intelligent...
No doubt I am a complete layman when it comes to what intelligence is. But as far as I am aware, it is a kind of goal-oriented evolutionary process equipped with a memory. It is evolutionary insofar as it still needs to stumble upon novelty. Intelligence is not a meta-solution but an efficient searchlight that helps to discover unknown unknowns. Intelligence is also a tool that can efficiently exploit previous discoveries, combine and permute them. But claiming that you just have to be sufficiently intelligent to solve a given problem sounds like claiming it is more than that. I don’t see that. I think that if something crucial is missing, something you don’t know is missing, you’ll have to discover it first, not invent it by the sheer power of intelligence.
A month sounds considerably overoptimistic to me. Wrong steps and backtracking are probably to be expected, and it would probably be irresponsible to commit to a solution before allowing other intelligent people (who really want to find the right answer, not carry on endless debate) to review it in detail. For a sufficiently intelligent and committed worker, I would not be surprised if they could produce a reliably correct metaethical theory within two years, perhaps one, but a month strikes me as too restrictive.
the failure mode of technophobes who mean to spread fear by “raising questions” that are meant more to create anxiety by their raising than by being answered
Of course, this one applies to scaremongers in general, not just technophobes.
I count myself among the philosophers who would say that “knowing the Baby-eaters want to eat babies” is not a reason (for me) to eat babies. Some philosophers don’t even think that the Baby-eaters’ desires to eat babies are reasons for them to eat babies, not even defeasible reasons.
Knowing the Baby-eaters want to eat babies is a reason for them to eat babies. It is not a reason for us to let them eat babies. My biggest problem with desirism in general is that it provides no reason for us to want to fulfill others’ desires. Saying that they want to fulfill their desires is obvious. Whether we help or hinder them is based entirely on our own reasons for action.
Desirism claims that moral value exists as a relation between desires and states of affairs.
Desirism claims that desires themselves are the primary objects of moral evaluation.
Thus, morality is the practice of shaping malleable desires: promoting desires that tend to fulfill other desires, and discouraging desires that tend to thwart other desires.
The moral thing to do is to shape my desires to fulfill others’ desires, insofar as they are malleable. This is what I meant by “we should want to fulfill others’ desires,” though I acknowledge that a significant amount of precision and clarity was lost in the original statement. Is this all correct?
The desirism FAQ needs updating, and is not a very clear presentation of the theory, I think.
One problem is that much of the theory is really just a linguistic proposal. That’s true for all moral theories, but it can be difficult to separate the linguistic from the factual claims. I think Alonzo Fyfe and I are doing a better job of that in our podcast. The latest episode is The Claims of Desirism, Part 1.
Unfortunately, we’re not making moral claims yet. In meta-ethics, there is just too much groundwork to lay down first. Kinda like how Eliezer took like 200 posts to build up to talking about meta-ethics.
Is knowing that Baby-eaters want babies to be eaten a reason, on your view, to design an FAI that optimizes its surroundings for (among other things) baby-eating?
I very much doubt it. Even if we assume my own current meta-ethical views are correct—an assumption I don’t have much confidence in—this wouldn’t leave us with reason to design an FAI that optimizes its surroundings for (among other things) baby-eating. Really, this goes back to a lot of classical objections to utilitarianism.
For the record, I currently think CEV is the most promising path towards solving the Friendly AI problem, I’m just not very confident about any solutions yet, and am researching the possibilities as quickly as possible, using my outline for Ethics and Superintelligence as a guide to research. I have no idea what the conclusions in Ethics and Superintelligence will end up being.
I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don’t expect those two tracks to meet much.
Eliezer-2007 quotes Robyn Dawes, saying that the below is “so true it’s not even funny”:
Norman R. F. Maier noted that when a group faces a problem, the natural tendency of its members is to propose possible solutions as they begin to discuss the problem. Consequently, the group interaction focuses on the merits and problems of the proposed solutions, people become emotionally attached to the ones they have suggested, and superior solutions are not suggested. Maier enacted an edict to enhance group problem solving: “Do not propose solutions until the problem has been discussed as thoroughly as possible without suggesting any.”
...
I have often used this edict with groups I have led—particularly when they face a very tough problem, which is when group members are most apt to propose solutions immediately. While I have no objective criterion on which to judge the quality of the problem solving of the groups, Maier’s edict appears to foster better solutions to problems.
Is this a change of attitude, or am I just not finding the synthesis?
Eliezer-2011 seems to want to propose solutions very quickly, move on, and come back for repairs if necessary. Eliezer-2007 advises that for difficult problems (one would think that FAI qualifies) we take our time to understand the relevant issues, questions, and problems before proposing solutions.
There’s a big difference between “not immediately” and “never”. Don’t propose a solution immediately, but do at least have a detailed working guess at a solution (which can be used to move to the next problem) in a year. Don’t “merely” raise a question; make sure that finding an answer is also part of the agenda.
It’s a matter of the twelfth virtue of rationality, the intention to cut through to the answer, whatever the technique. The purpose of holding off on proposing solutions is to better find solutions, not to stop at asking the question.
I suggest that he still holds both of those positions (at least, I know I do so do not see why he wouldn’t) but that they apply to slightly different contexts. Eliezer’s elaboration in the descendant comments from the first quote seemed to illustrate why fairly well. They also, if I recall, allowed that you do not fit into the ‘actually answering is unsophisticated’ crowd, which further narrows down just what he is meaning.
The impression I get is that EY-2011 believes that he has already taken the necessary time to understand the relevant issues, questions, and problems and that his proposed solution is therefore unlikely to be improved upon by further up-front thinking about the problem, rather than by working on implementing the solution he has in mind and seeing what difficulties come up.
Whether that’s a change of attitude, IMHO, depends a lot on whether his initial standard for what counts as an adequate understanding of the relevant issues, questions, and problems was met, or whether it was lowered.
I’m not really sure what that initial standard was in the first place, so I have no idea which is the case. Nor am I sure it matters; presumably what matters more is whether the current standard is adequate.
The point of the Dawes quote is to hold off on proposing solutions until you’ve thoroughly comprehended the issue, so that you get better solutions. It doesn’t advocate discussing problems simply for the sake of discussing them. Between both quotes there’s a consistent position that the point is to get the right answer, and discussing the question only has a point insofar as it leads to getting that answer. If you’re discussing the question without proposing solutions ad infinitum, you’re not accomplishing anything.
Keep in mind that talking with regard to solutions is just so darn useful. Even if you propose an overly specific solution early, it still has a large surface area of features that can be attacked to prove it incompatible with the problem. You can often salvage and mutate what’s left of the broken idea. There’s not a lot of harm in that; rather, there is a natural give and take whereby dismissing a proposed solution requires identifying which part of the problem requirements is contradicted, and it may very well not have occurred to you to specify that requirement in the first place.
I believe it has been observed that experts almost always talk in terms of candidate solutions, while amateurs attempt to build up from a platform of the problem itself, with experts, of course, having objectively better performance. The algorithm for provably moral superintelligences might not have a lot of prior solutions to draw from, but you could, for instance, find some inspiration even from the outside view of how some human political systems have maintained generally moral dispositions.
There is a bias to associate your status with ideas you have vocalized in the past since they reflect on the quality of your thinking, but you can’t throw the baby out with the bathwater.
The Maier quote comes off as way too strong for me. And what’s with this conclusion:
While I have no objective criterion on which to judge the quality of the problem solving of the groups, Maier’s edict appears to foster better solutions to problems.
I think there’s a synthesis possible. There’s a purpose of finding a solid answer, but finding it requires a period of exploration rather than getting extremely specific in the beginning of the search.
If you don’t spend much time on the track where people just raise questions, how do you encounter the new insights that make it necessary to dance back for repairs on your track?
Just asking. :)
Though I do tend to admire your attitude of pragmatism and impatience with those who dither forever.
I presume you encounter them later on. Maybe while doing more ground-level thinking about how to actually implement your meta-ethics you realise that it isn’t quite coherent.
I’m not sure if this flying-by-the-seat-of-your-pants approach is best, but as has been pointed out before, there are costs associated with taking too long as well as with not being careful enough; there must come a point where the risk is too small and the time it would take to fix it too long.
Well, I’ll certainly agree that more potential problems are surfaced by moving ahead with the implementation than by going back to the customer with another round of questions about the requirements.
I can see that you might question the usefulness of the notion of a “reason for action” as something over and above the notion of “ought”, but I don’t see a better case for thinking that “reason for action” is confused.
The main worry here seems to have to do with categorical reasons for action. Diagnostic question: are these more troubling/confused than categorical “ought” statements? If so, why?
Perhaps I should note that philosophers talking this way make a distinction between “motivating reasons” and “normative reasons”. A normative reason to do A is a good reason to do A, something that would help explain why you ought to do A, or something that counts in favor of doing A. A motivating reason just helps explain why someone did, in fact, do A. One of my motivating reasons for killing my mother might be to prevent her from being happy. By saying this, I do not suggest that this is a normative reason to kill my mother. It could also be that R would be a normative reason for me to A, but R does not motivate me to do A. (ata seems to assume otherwise, since ata is getting caught up with who these considerations would motivate. Whether reasons could work like this is a matter of philosophical controversy. Saying this more for others than you, Luke.)
Back to the main point, I am puzzled largely because the most natural ways of getting categorical oughts can get you categorical reasons. Example: simple total utilitarianism. On this view, R is a reason to do A if R is the fact that doing A would cause someone’s well-being to increase. The strength of R is the extent to which that person’s well-being increases. One weighs one’s reasons by adding up all of their strengths. One then does the thing that one has most reason to do. (It’s pretty clear in this case that the notion of a reason plays an inessential role in the theory. We can get by just fine with well-being, ought, causal notions, and addition.)
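The total-utilitarian weighing just described is mechanical enough to sketch in a few lines of code. To be clear, this is only an illustration of the decision rule: the action names, well-being numbers, and function names are my own hypothetical choices, not part of any theory’s formal statement.

```python
# Toy sketch of simple total utilitarianism as a reasons-weighing procedure:
# each reason's strength is the well-being increase it would cause for some
# person, reasons are weighed by summing their strengths, and the agent does
# whatever action it has most reason to do.

def choose(actions):
    """Return the action whose reasons (well-being increases) sum highest."""
    def total_reason_strength(action):
        # One reason per affected person; its strength is their well-being delta.
        return sum(action["wellbeing_deltas"])
    return max(actions, key=total_reason_strength)

actions = [
    {"name": "donate", "wellbeing_deltas": [3, 2, -1]},  # summed strength: 4
    {"name": "idle",   "wellbeing_deltas": [0]},         # summed strength: 0
]
print(choose(actions)["name"])  # -> donate
```

As the parenthetical above notes, the “reason” layer does no real work here: the same choice falls out of well-being numbers and addition alone.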
Utilitarianism, as always, is a simple case. But it seems like many categorical oughts can be thought of as being determined by weighing factors that count in favor of and count against the course of action in question. In these cases, we should be able to do something like what we did for util (though sometimes that method of weighing the reasons will be different/more complicated; in some bad cases, this might make the detour through reasons pointless).
The reasons framework seems a bit more natural in non-consequentialist cases. Imagine I try to maximize aggregate well-being, but I hate lying to do it. I might count the fact that an action would involve lying as a reason not to do it, but not believe that my lying makes the world worse. To get oughts out of a utility function instead, you might model my utility function as the result of adding up aggregate well-being and subtracting a factor that scales with the number of lies I would have to tell if I took the action in question. Again, it’s pretty clear that you don’t HAVE to think about things this way, but it is far from clear that this is confused/incoherent.
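The non-consequentialist case just described can be sketched the same way: model the lying-averse agent with a utility function that adds up aggregate well-being and subtracts a factor scaling with the number of lies told. The weight on lying and the data shapes here are my own illustrative assumptions.

```python
# Hedged sketch of the lying-averse agent described above: aggregate
# well-being minus a penalty that scales with the number of lies the
# action requires. LIE_WEIGHT is a hypothetical value, chosen only to
# make the example's trade-off visible.

LIE_WEIGHT = 5  # assumed strength of the agent's aversion to each lie

def utility(action):
    return sum(action["wellbeing_deltas"]) - LIE_WEIGHT * action["lies_required"]

honest  = {"wellbeing_deltas": [4],    "lies_required": 0}
lie_win = {"wellbeing_deltas": [4, 3], "lies_required": 1}

# The lie raises aggregate well-being (7 vs. 4), but the penalty flips
# the agent's choice back to honesty.
print(utility(honest), utility(lie_win))  # -> 4 2
```

Note the lie counts against the action in the agent’s utility without being modeled as making the world worse, which matches the reasons-style description above.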
Perhaps the LW crowd is perplexed because people here take utility functions as primitive, whereas philosophers talking this way tend to take reasons as primitive and derive ought statements (and, on a very lucky day, utility functions) from them. This paper, which tries to help reasons folks and utility function folks understand/communicate with each other, might be helpful for anyone who cares much about this. My impression is that we clearly need utility functions, but don’t necessarily need the reason talk. The main advantage to getting up on the reason talk would be trying to understand philosophers who talk that way, if that’s important to you. (Much of the recent work in meta-ethics relies heavily on the notion of a normative reason, as I’m sure Luke knows.)
For the record, as a good old Humean I’m currently an internalist about reasons, which leaves me unable (I think) to endorse any form of utilitarianism, where utilitarianism is the view that we ought to maximize X. Why? Because internal reasons don’t always, and perhaps rarely, support maximizing X, and I don’t think external reasons for maximizing X exist. For example, I don’t think X has intrinsic value (in Korsgaard’s sense of “intrinsic value”).
Thanks for the link to that paper on rational choice theories and decision theories!
Categorical oughts and reasons have always confused me. What do you see as the difference, and which type of each are you thinking of? The types of categorical reasons or reasons with which I’m most familiar are Kant’s and Korsgaard’s.
R is a categorical reason for S to do A iff R counts in favor of doing A for S, and would so count for other agents in a similar situation, regardless of their preferences. If it were true that we always have reasons to benefit others, regardless of what we care about, that would be a categorical reason. I don’t use the term “categorical reason” any differently than “external reason”.
S categorically ought to do A just when S ought to do A, regardless of what S cares about, and it would still be true that S ought to do A in similar situations, regardless of what S cares about. The rule: always maximize happiness, would, if true, ground a categorical ought.
I see very little reason to be more skeptical of categorical reasons than of categorical oughts, or vice versa.
Hard to be confident about these things, but I don’t see the problem with external reasons/oughts. Some people seem to have some kind of metaphysical worry...harder to reduce or something. I don’t see it.
Tarleton’s 2010 paper compares CEV to an alternative proposed by Wallach & Collin.
Nitpick: Wallach & Collin are cited only for the term ‘artificial moral agents’ (and the paper is by myself and Roko Mijic). The comparison in the paper is mostly just to the idea of specifying object-level moral principles.
Now, it’s just a wild guess here, but I’m guessing that a lot of philosophers who use the language “reasons for action” would disagree that “knowing the Baby-eaters evolved to eat babies” is a reason to eat babies. Am I wrong?
I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don’t expect those two tracks to meet much.
I count myself among the philosophers who would say that “knowing the Baby-eaters want to eat babies” is not a reason (for me) to eat babies. Some philosophers don’t even think that the Baby-eaters’ desires to eat babies are reasons for them to eat babies, not even defeasible reasons.
Interesting. I always assumed that raising a question was the first step toward answering it—especially if you don’t want yourself to be the only person who tries to answer it. The point of a post like the one we’re commenting on is that hopefully one or more people will say, “Huh, yeah, it’s important that we get this issue right,” and devote some brain energy to getting it right.
I’m sure the “figure it out and move on” track doesn’t meet much with the “I’m uncomfortable settling on an answer” track, but what about the “pose important questions so we can work together to settle on an answer” track? I see myself on that third track, engaging in both the ‘pose important questions’ and the ‘settle on an answer’ projects.
Only if you want an answer. There is no curiosity that does not want an answer. There are four very widespread failure modes around “raising questions”—the failure mode of paper-writers who regard unanswerable questions as a biscuit bag that never runs out of biscuits, the failure mode of the politically savvy who’d rather not offend people by disagreeing too strongly with any of them, the failure mode of the religious who don’t want their questions to arrive at the obvious answer, the failure mode of technophobes who mean to spread fear by “raising questions” that are meant more to create anxiety by their raising than by being answered, and all of these easily sum up to an accustomed bad habit of thinking where nothing ever gets answered and true curiosity is dead.
So yes, if there’s an interim solution on the table and someone says “Ah, but surely we must ask more questions” instead of “No, you idiot, can’t you see that there’s a better way” or “But it looks to me like the preponderance of evidence is actually pointing in this here other direction”, alarms do go off inside my head. There’s a failure mode of answering too prematurely, but when someone talks explicitly about the importance of raising questions—this being language that is mainly explicitly used within the failure-mode groups—alarms go off and I want to see it demonstrated that they can think in terms of definite answers and preponderances of evidence at all besides just raising questions; I want a demonstration that true curiosity, wanting an actual answer, isn’t dead inside them, and that they have the mental capacity to do what’s needed to that effect—namely, weigh evidence in the scales and arrive at a non-balanced answer, or propose alternative solutions that are supposed to be better.
I’m impressed with your blog, by the way, and generally consider you to be a more adept rationalist than the above paragraphs might imply—but when it comes to this particular matter of metaethics, I’m not quite sure that you strike me as aggressive enough that if you had twenty years to sort out the mess, I would come back twenty years later and find you with a sheet of paper with the correct answer written on it, as opposed to a paper full of questions that clearly need to be very carefully considered.
Awesome. Now your reaction here makes complete sense to me. The way I worded my original article above looks very much like I’m in either the 1st category or the 4th category.
Let me, then, be very clear:
I do not want to raise questions so that I can make a living endlessly re-examining philosophical questions without arriving at answers.
I want me, and rationalists in general, to work aggressively enough on these problems so that we have answers by the time AI+ arrives. As for the fact that I don’t have answers yet, please remember that I was a fundamentalist Christian 3 years ago, with no rationality training at all, and a horrendous science education. And I didn’t discover the urgency of these problems until about 6 months ago. I’ve had to make extremely rapid progress from that point to where I am today. If I can arrange to work on these problems full time, I think I can make valuable contributions to the project of dealing safely with Friendly AI. But if that doesn’t happen, well, I hope to at least enable others who can work on this problem full time, like yourself.
I want to solve these problems in 15 years, not 20. This will make most academic philosophers, and most people in general, snort the water they’re drinking through their nose. On the other hand, the time it takes to solve a problem expands to meet the time you’re given. For many philosophers, the time we have to answer the questions is… billions of years. For me, and people like me, it’s a few decades.
Any response to this, Eliezer?
Well, the part about you being a fundamentalist Christian three years ago is damned impressive and does a lot to convince me that you’re moving at a reasonable clip.
On the other hand, a good metaethical answer to the question “What sort of stuff is morality made out of?” is essentially a matter of resolving confusion; and people can get stuck on confusions for decades, or they can breeze past confusions in seconds. Comprehending the most confusing secrets of the universe is more like realigning your car’s wheels than like finding the Lost Ark. I’m not entirely sure what to do about the partial failure of the metaethics sequence, or what to do about the fact that it failed for you in particular. But it does sound like you’re setting out to heroically resolve confusions that, um, I kinda already resolved, and then wrote up, and then only some people got the writeup… but it doesn’t seem like the sort of thing where you spending years working on it is a good idea. 15 years to a piece of paper with the correct answer written on it is for solving really confusing problems from scratch; it doesn’t seem like a good amount of time for absorbing someone else’s solution. If you plan to do something interesting with your life requiring correct metaethics then maybe we should have a Skype videocall or even an in-person meeting at some point.
The main open moral question SIAI actually does need a concrete answer to is “How exactly does one go about construing an extrapolated volition from the giant mess that is a human mind?”, which takes good metaethics as a background assumption but is fundamentally a moral question rather than a metaethical one. On the other hand, I think we’ve basically got covered “What sort of stuff is this mysterious rightness?”
What did you think of the free will sequence as a template for doing naturalistic cognitive philosophy, where the first question is always “What algorithm feels from the inside like my philosophical intuitions?”
I should add that I don’t think I will have meta-ethical solutions in 15 years, significantly because I’m not optimistic that I can get someone to pay my living expenses while I do 15 years of research. (Why should they? I haven’t proven my abilities.) But I think these problems are answerable, and that we are in a fantastic position to answer them if we want to do so. We know an awful lot about physics, psychology, logic, neuroscience, AI, and so on. Even experts active 15 years ago did not have all these advantages. More importantly, most thinkers today do not even take advantage of them.
Have you considered applying to the SIAI Visiting Fellows program? It could be worth a month or 3 of having your living expenses taken care of while you research, and could lead to something longer term.
Seconding JGWeissman — you’d probably be accepted as a Visiting Fellow in an instant, and if you turn out to be sufficiently good at the kind of research and thinking that they need to have done, maybe you could join them as a paid researcher.
15 years is much too much; if you haven’t solved metaethics after 15 years of serious effort, you probably never will. The only things that’re actually time consuming on that scale are getting stopped with no idea how to proceed, and wrong turns into muck. I see no reason why a sufficiently clear thinker couldn’t finish a correct and detailed metaethics in a month.
I suppose if you let “sufficiently clear thinker” do enough work this is just trivial.
But it’s a sui generis problem… I’m not sure what information a time table could be based on other than the fact that it has been way longer than a month and no one has succeeded yet.
It is also worth keeping in mind that scientific discoveries routinely impact the concepts we use to understand the world. The computational model of the human brain was not generated as a hypothesis until after we had built computers and could see what they do, even though, in principle, that hypothesis could have been invented at nearly any point in history. So it seems plausible that the crucial insight needed for a successful metaethics will come from a scientific discovery that someone concentrating on philosophy for a month wouldn’t make.
Supposing anyone had already succeeded, how strong an expectation do you think we should have of knowing about it?
Not all that strong. It may well be out there in some obscure journal but just wasn’t interesting enough for anyone to bother replying to. Hell, multiple people may have succeeded.
But I think “success” might actually be underdetermined here. Some philosophers may have had the right insights, but I suspect that if they had communicated those insights in the formal manner necessary for Friendly AI, the insights would have felt insightful to readers and the papers would have gotten attention. Of course, I’m not even familiar with cutting-edge metaethics. There may well be something like that out there. It doesn’t help that no one here seems willing to actually read philosophy in non-blog format.
Yep:
Related question: suppose someone handed us a successful solution, would we recognize it?
Yep.
So Yudkowsky came up with a correct and detailed metaethics but failed to communicate it?
I think it’s correct, but it’s definitely not detailed; some major questions, like “how to weight and reconcile conflicting preferences”, are skipped entirely.
What do you believe to be the reasons? Did he not try, or did he fail? I’m trying to fathom what kind of person counts as a sufficiently clear thinker. If not even EY is a sufficiently clear thinker, then your statement that such a person could come up with a detailed metaethics in a month seems self-evident: if someone is a sufficiently clear thinker to accomplish a certain task, then they will complete it if they try. What’s the point? It sounds like you are saying that there are many smart people who could accomplish the task if they only tried. But if in fact EY is not one of them, that’s bad.
Yesterday I read In Praise of Boredom. It seems that EY also views intelligence as something proactive:
No doubt I am a complete layman when it comes to what intelligence is. But as far as I am aware, it is a kind of goal-oriented evolutionary process equipped with a memory. It is evolutionary insofar as it still needs to stumble upon novelty. Intelligence is not a meta-solution but an efficient searchlight that helps to discover unknown unknowns. Intelligence is also a tool that can efficiently exploit previous discoveries, combine them, and permute them. But claiming that you just have to be sufficiently intelligent to solve a given problem sounds like it is more than that. I don’t see that. I think that if something crucial is missing, something you don’t know is missing, you’ll have to discover it first and not invent it by the sheer power of intelligence.
By “a sufficiently clear thinker” you mean an AI++, right? :)
Nah, an AI++ would take maybe five minutes.
A month sounds considerably overoptimistic to me. Wrong steps and backtracking are probably to be expected, and it would probably be irresponsible to commit to a solution before allowing other intelligent people (who really want to find the right answer, not carry on endless debate) to review it in detail. For a sufficiently intelligent and committed worker, I would not be surprised if they could produce a reliably correct metaethical theory within two years, perhaps one, but a month strikes me as too restrictive.
Of course, this one applies to scaremongers in general, not just technophobes.
Knowing the Baby-eaters want to eat babies is a reason for them to eat babies. It is not a reason for us to let them eat babies. My biggest problem with desirism in general is that it provides no reason for us to want to fulfill others’ desires. Saying that they want to fulfill their desires is obvious. Whether we help or hinder them is based entirely on our own reasons for action.
That’s not a bug, it’s a feature.
Are you familiar with desirism? It says that we should want to fulfill others’ desires, but, as far as I can tell, gives no reason why.
No. This is not what desirism says.
From your desirism FAQ:
The moral thing to do is to shape my desires to fulfill others’ desires, insofar as they are malleable. This is what I meant by “we should want to fulfill others’ desires,” though I acknowledge that a significant amount of precision and clarity was lost in the original statement. Is this all correct?
The desirism FAQ needs updating, and is not a very clear presentation of the theory, I think.
One problem is that much of the theory is really just a linguistic proposal. That’s true for all moral theories, but it can be difficult to separate the linguistic from the factual claims. I think Alonzo Fyfe and I are doing a better job of that in our podcast. The latest episode is The Claims of Desirism, Part 1.
I will listen to that.
Unfortunately, we’re not making moral claims yet. In meta-ethics, there is just too much groundwork to lay down first. Kinda like how Eliezer took like 200 posts to build up to talking about meta-ethics.
So, just to make sure, what I said in the grandparent is not what desirism says?
Ah, oops. I wasn’t familiar with it, and I misunderstood the sentence.
Is knowing that Baby-eaters want babies to be eaten a reason, on your view, to design an FAI that optimizes its surroundings for (among other things) baby-eating?
I very much doubt it. Even if we assume my own current meta-ethical views are correct—an assumption I don’t have much confidence in—this wouldn’t leave us with reason to design an FAI that optimizes its surroundings for (among other things) baby-eating. Really, this goes back to a lot of classical objections to utilitarianism.
For the record, I currently think CEV is the most promising path towards solving the Friendly AI problem, I’m just not very confident about any solutions yet, and am researching the possibilities as quickly as possible, using my outline for Ethics and Superintelligence as a guide to research. I have no idea what the conclusions in Ethics and Superintelligence will end up being.
Here’s an interesting juxtaposition...
Eliezer-2011 writes:
Eliezer-2007 quotes Robyn Dawes, saying that the below is “so true it’s not even funny”:
Is this a change of attitude, or am I just not finding the synthesis?
Eliezer-2011 seems to want to propose solutions very quickly, move on, and come back for repairs if necessary. Eliezer-2007 advises that for difficult problems (one would think that FAI qualifies) we take our time to understand the relevant issues, questions, and problems before proposing solutions.
There’s a big difference between “not immediately” and “never”. Don’t propose a solution immediately, but do at least have a detailed working guess at a solution (which can be used to move to the next problem) in a year. Don’t “merely” raise a question; make sure that finding an answer is also part of the agenda.
It’s a matter of the twelfth virtue of rationality, the intention to cut through to the answer, whatever the technique. The purpose of holding off on proposing solutions is to better find solutions, not to stop at asking the question.
I suggest that he still holds both of those positions (at least, I know I do so do not see why he wouldn’t) but that they apply to slightly different contexts. Eliezer’s elaboration in the descendant comments from the first quote seemed to illustrate why fairly well. They also, if I recall, allowed that you do not fit into the ‘actually answering is unsophisticated’ crowd, which further narrows down just what he is meaning.
The impression I get is that EY-2011 believes that he has already taken the necessary time to understand the relevant issues, questions, and problems and that his proposed solution is therefore unlikely to be improved upon by further up-front thinking about the problem, rather than by working on implementing the solution he has in mind and seeing what difficulties come up.
Whether that’s a change of attitude, IMHO, depends a lot on whether his initial standard for what counts as an adequate understanding of the relevant issues, questions, and problems was met, or whether it was lowered.
I’m not really sure what that initial standard was in the first place, so I have no idea which is the case. Nor am I sure it matters; presumably what matters more is whether the current standard is adequate.
The point of the Dawes quote is to hold off on proposing solutions until you’ve thoroughly comprehended the issue, so that you get better solutions. It doesn’t advocate discussing problems simply for the sake of discussing them. Between both quotes there’s a consistent position that the point is to get the right answer, and discussing the question only has a point insofar as it leads to getting that answer. If you’re discussing the question without proposing solutions ad infinitum, you’re not accomplishing anything.
Keep in mind that talking in terms of solutions is just so darn useful. Even if you propose an overly specific solution early, it has a large surface area of features that can be attacked to prove it incompatible with the problem. You can often salvage and mutate what’s left of the broken idea. There’s not a lot of harm in that; rather, there is a natural give-and-take whereby dismissing a proposed solution requires identifying which part of the problem requirements is contradicted, and it may very well not have occurred to you to specify that requirement in the first place.
I believe it has been observed that experts almost always talk in terms of candidate solutions, while amateurs attempt to build up from a platform of the problem itself; experts, of course, have objectively better performance. The algorithm for provably moral superintelligences might not have a lot of prior solutions to draw from, but you could, for instance, find some inspiration even from the outside view of how some human political systems have maintained generally moral dispositions.
There is a bias to associate your status with ideas you have vocalized in the past since they reflect on the quality of your thinking, but you can’t throw the baby out with the bathwater.
The Maier quote comes off as way too strong for me. And what’s with this conclusion:
I think there’s a synthesis possible. There’s a purpose of finding a solid answer, but finding it requires a period of exploration rather than getting extremely specific in the beginning of the search.
If you don’t spend much time on the track where people just raise questions, how do you encounter the new insights that make it necessary to dance back for repairs on your track?
Just asking. :)
Though I do tend to admire your attitude of pragmatism and impatience with those who dither forever.
I presume you encounter them later on. Maybe while doing more ground-level thinking about how to actually implement your meta-ethics you realise that it isn’t quite coherent.
I’m not sure if this flying-by-the-seat-of-your-pants approach is best, but, as has been pointed out before, there are costs associated with taking too long as well as with not being careful enough; there must come a point where the risk is too small and the time it would take to fix it too long.
Well, I’ll certainly agree that more potential problems are surfaced by moving ahead with the implementation than by going back to the customer with another round of questions about the requirements.
I can see that you might question the usefulness of the notion of a “reason for action” as something over and above the notion of “ought”, but I don’t see a better case for thinking that “reason for action” is confused.
The main worry here seems to have to do with categorical reasons for action. Diagnostic question: are these more troubling/confused than categorical “ought” statements? If so, why?
Perhaps I should note that philosophers talking this way make a distinction between “motivating reasons” and “normative reasons”. A normative reason to do A is a good reason to do A, something that would help explain why you ought to do A, or something that counts in favor of doing A. A motivating reason just helps explain why someone did, in fact, do A. One of my motivating reasons for killing my mother might be to prevent her from being happy. By saying this, I do not suggest that this is a normative reason to kill my mother. It could also be that R would be a normative reason for me to A, but R does not motivate me to do A. (ata seems to assume otherwise, since ata is getting caught up with who these considerations would motivate. Whether reasons could work like this is a matter of philosophical controversy. Saying this more for others than you, Luke.)
Back to the main point, I am puzzled largely because the most natural ways of getting categorical oughts can get you categorical reasons. Example: simple total utilitarianism. On this view, R is a reason to do A if R is the fact that doing A would cause someone’s well-being to increase. The strength of R is the extent to which that person’s well-being increases. One weighs one’s reasons by adding up all of their strengths. One then does the thing that one has most reason to do. (It’s pretty clear in this case that the notion of a reason plays an inessential role in the theory. We can get by just fine with well-being, ought, causal notions, and addition.)
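The weighing procedure described here is concrete enough to sketch in code. Here is a toy illustration; the action names and strength values are made up for the example, not anything from the thread:

```python
# Toy model of the total-utilitarian "reasons" calculus: each reason's
# strength is the well-being increase it cites; one weighs reasons by
# summing their strengths, then does what one has most reason to do.

def best_action(reasons_for):
    """reasons_for maps each action to a list of reason strengths
    (well-being increases caused by that action)."""
    return max(reasons_for, key=lambda action: sum(reasons_for[action]))

# Hypothetical numbers, purely for illustration.
reasons = {
    "donate": [5.0, 2.0],  # two people's well-being rises by 5 and 2
    "relax":  [3.0],       # one person's well-being rises by 3
}
print(best_action(reasons))  # -> donate (total 7.0 beats 3.0)
```

Note that, just as the parenthetical says, the reasons play an inessential role: the same function could be written directly in terms of well-being totals, skipping the “reason” bookkeeping entirely.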
Utilitarianism, as always, is a simple case. But it seems like many categorical oughts can be thought of as being determined by weighing factors that count in favor of and count against the course of action in question. In these cases, we should be able to do something like what we did for util (though sometimes that method of weighing the reasons will be different/more complicated; in some bad cases, this might make the detour through reasons pointless).
The reasons framework seems a bit more natural in non-consequentialist cases. Imagine I try to maximize aggregate well-being, but I hate lying to do it. I might count the fact that an action would involve lying as a reason not to do it, but not believe that my lying makes the world worse. To get oughts out of a utility function instead, you might model my utility function as the result of adding up aggregate well-being and subtracting a factor that scales with the number of lies I would have to tell if I took the action in question. Again, it’s pretty clear that you don’t HAVE to think about things this way, but it is far from clear that this is confused/incoherent.
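The lying-averse agent just described can likewise be sketched as a utility function, with the lie penalty folded in as a subtracted factor. A minimal sketch, where the penalty weight and the numbers are hypothetical:

```python
# Toy model of the non-consequentialist case above: maximize aggregate
# well-being, minus a penalty that scales with the number of lies the
# agent would have to tell. The lie_weight value is a made-up assumption.

def utility(aggregate_well_being, lies_required, lie_weight=2.0):
    return aggregate_well_being - lie_weight * lies_required

honest_plan = utility(aggregate_well_being=10.0, lies_required=0)     # 10.0
deceitful_plan = utility(aggregate_well_being=12.0, lies_required=2)  # 8.0
print(honest_plan > deceitful_plan)  # True: the lie penalty flips the choice
```

The point stands that nothing forces this reasons-style decomposition; the agent’s behavior is fully captured by the single utility function.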
Perhaps the LW crowd is perplexed because people here take utility functions as primitive, whereas philosophers talking this way tend to take reasons as primitive and derive ought statements (and, on a very lucky day, utility functions) from them. This paper, which tries to help reasons folks and utility function folks understand/communicate with each other, might be helpful for anyone who cares much about this. My impression is that we clearly need utility functions, but don’t necessarily need the reason talk. The main advantage of getting up to speed on the reason talk would be trying to understand philosophers who talk that way, if that’s important to you. (Much of the recent work in meta-ethics relies heavily on the notion of a normative reason, as I’m sure Luke knows.)
utilitymonster,
For the record, as a good old Humean I’m currently an internalist about reasons, which leaves me unable (I think) to endorse any form of utilitarianism, where utilitarianism is the view that we ought to maximize X. Why? Because internal reasons don’t always, and perhaps rarely, support maximizing X, and I don’t think external reasons for maximizing X exist. For example, I don’t think X has intrinsic value (in Korsgaard’s sense of “intrinsic value”).
Thanks for the link to that paper on rational choice theories and decision theories!
So are categorical reasons any worse off than categorical oughts?
Categorical oughts and reasons have always confused me. What do you see as the difference, and which type of each are you thinking of? The types of categorical reasons or reasons with which I’m most familiar are Kant’s and Korsgaard’s.
R is a categorical reason for S to do A iff R counts in favor of doing A for S, and would so count for other agents in a similar situation, regardless of their preferences. If it were true that we always have reasons to benefit others, regardless of what we care about, that would be a categorical reason. I don’t use the term “categorical reason” any differently than “external reason”.
S categorically ought to do A just when S ought to do A, regardless of what S cares about, and it would still be true that S ought to do A in similar situations, regardless of what S cares about. The rule “always maximize happiness” would, if true, ground a categorical ought.
I see very little reason to be more skeptical of categorical reasons than of categorical oughts, or vice versa.
Agreed. And I’m skeptical of both. You?
Hard to be confident about these things, but I don’t see the problem with external reasons/oughts. Some people seem to have some kind of metaphysical worry...harder to reduce or something. I don’t see it.
Nitpick: Wallach & Collin are cited only for the term ‘artificial moral agents’ (and the paper is by myself and Roko Mijic). The comparison in the paper is mostly just to the idea of specifying object-level moral principles.
Oops. Thanks for the correction.