I’ve been meaning to write a post about how I think it’s a really, really bad idea to think about morality in terms of axioms. This seems to be a surprisingly (to me) common habit among LW types, especially since I would have thought it was a habit the metaethics sequence would have stomped out.
(You shouldn’t regard it as a strength of your moral framework that it can’t distinguish humans from non-human animals. That’s evidence that it isn’t capable of capturing complexity of value.)
I agree that thinking about morality exclusively in terms of axioms in a system of classical logic is likely to be a rather bad idea, since that makes one underestimate the complexity of morality and the strength of non-logical influences, and overestimate the extent to which morality resembles a system of classical logic in general. But I’m not sure if it’s that problematic as long as you keep in mind that “axioms” is really just shorthand for something like “moral subprograms” or “moral dynamics”.
I did always read the metaethics sequence as establishing the existence of something similar-enough-to-axioms-that-we-might-as-well-use-the-term-axioms-as-shorthand-for-them, with e.g. No Universally Compelling Arguments and Created Already In Motion arguing that you cannot convince a mind of the correctness of some action unless it already contains a dynamic which reacts to your argument in the way you wish—in other words, unless your argument builds on things that the mind’s decision-making system already cares about, and which could be described as axioms when composing a (static) summary of the mind’s preferences.
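To make the “axioms as shorthand for dynamics” reading concrete, here’s a minimal toy sketch (my own illustration, nothing from the sequence itself): an argument only moves a mind if the mind already contains a dynamic that responds to that kind of consideration.

```python
# Toy model (illustrative only): a "mind" is a collection of dynamics, i.e.
# functions mapping a consideration to a change in motivation. An argument
# that appeals to something the mind has no dynamic for simply does nothing.

def suffering_averse(claim):
    # responds only to claims about suffering
    return -1.0 if claim.get("causes_suffering") else 0.0

def fairness_sensitive(claim):
    # responds only to claims about unfairness
    return -0.5 if claim.get("is_unfair") else 0.0

mind = [suffering_averse]  # this particular mind lacks a fairness dynamic

def evaluate(mind, claim):
    # how compelling an argument is depends entirely on the existing dynamics
    return sum(dynamic(claim) for dynamic in mind)

print(evaluate(mind, {"causes_suffering": True}))  # -1.0: the argument lands
print(evaluate(mind, {"is_unfair": True}))         # 0.0: nothing responds
```

In a (static) summary of this mind’s preferences, the suffering-aversion dynamic is the kind of thing it seems reasonable to call an axiom, without implying that the mind runs on classical logic.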
You shouldn’t regard it as a strength of your moral framework that it can’t distinguish humans from non-human animals. That’s evidence that it isn’t capable of capturing complexity of value.
I’m not really sure of what you mean here. For one, I didn’t say that my moral framework can’t distinguish humans from non-humans—I do e.g. take a much more negative stance on killing humans than animals, because killing humans would have a destabilizing effect on society and people’s feelings of safety, which would contribute to the creation of much more suffering than killing animals would.
Also, whether or not my personal moral framework can capture complexity of value seems irrelevant—CoV is just the empirical thesis that people in general tend to care about a lot of complex things. My personal consciously-held morals are what I currently want to consciously focus on, not a description of what others want, nor something that I’d program into an AI.
Also, whether or not my personal moral framework can capture complexity of value seems irrelevant—CoV is just the empirical thesis that people in general tend to care about a lot of complex things. My personal consciously-held morals are what I currently want to consciously focus on [...]
Well, I don’t think I should care what I care about. The important thing is what’s right, and my emotions are only relevant to the extent that they communicate facts about what’s right. What’s right is too complex, both in definition and consequentialist implications, and neither my emotions nor my reasoned decisions are capable of accurately capturing it. Any consciously-held morals are only a vague map of morality, not morality itself, and so shouldn’t hold too much import, on pain of moral wireheading/acceptance of a fake utility function.
(Listening to moral intuitions, possibly distilled as moral principles, might give the best moral advice that’s available in practice, but that doesn’t mean that the advice is any good. Observing this advice might fail to give an adequate picture of the subject matter.)
I must be misunderstanding this comment somehow? One still needs to decide what actions to take during every waking moment of their lives, and “in deciding what to do, don’t pay attention to what you want” isn’t very useful advice. (It also makes any kind of instrumental rationality impossible.)
What you want provides some information about what is right, so you do pay attention. When making decisions, you can further make use of moral principles not based on what you want at a particular moment. In both cases, making use of these signals doesn’t mean that you expect them to be accurate; they are just the best you have available in practice.
An estimate of the accuracy of these moral intuitions/principles translates into an estimate of the value of information about morality. Overestimating their accuracy would lead to excessive exploitation, while an expectation of inaccuracy argues for valuing research about morality comparatively more than the pursuit of moral-in-current-estimation actions.
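To put toy numbers on that tradeoff (figures entirely made up, just to show the direction of the effect): the higher you estimate the accuracy of your current moral judgments to be, the more attractive acting on them looks compared to doing more research first.

```python
# Toy value-of-information sketch with made-up numbers (not a model proposed
# anywhere in this thread). "act_now" exploits the current moral estimate;
# "research_first" pays a cost but then acts on an improved estimate.

def compare(accuracy, improvement=0.25, research_cost=0.25):
    right, wrong = 1.0, -1.0
    act_now = accuracy * right + (1 - accuracy) * wrong
    improved = min(1.0, accuracy + improvement)
    research_first = improved * right + (1 - improved) * wrong - research_cost
    return act_now, research_first

for accuracy in (0.5, 0.7, 0.9):
    act, research = compare(accuracy)
    better = "research first" if research > act else "act now"
    print(f"estimated accuracy {accuracy}: act_now={act:.2f}, research_first={research:.2f} -> {better}")
```

If the true accuracy is lower than the estimate plugged in here, the same calculation recommends exploitation in cases where research would in fact have been the better move, which is the failure mode described above.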
I’m not a very well educated person in this field, but if I may:
I see my various squishy feelings (desires and what-is-right intuitions are in this list) as loyal pets. Sometimes, they must be disciplined and treated with suspicion, but for the most part, they are there to please you in their own dumb way. They’re no more enemies than one’s preference for foods. In my care for them, I train and reward them, not try to destroy or ignore them. Without them, I have no need to DO better among other people, because I would not be human—that is, some things are important only because I’m a barely intelligent ape-man, and they should STAY important as long as I remain a barely intelligent ape-man. Ignoring something going on in one’s mind, even when one KNOWS it is wrong, can be a source of pain, I’ve found—hypocrisy and indecision are not my friends.
Hope I didn’t make a mess of things with this comment.
I’m roughly in agreement, though I would caution that the exploration/exploitation model is a problematic one to use in this context, for two reasons:
1) It implies a relatively clear map/territory split: there are our real values, and our conscious model of them, and errors in our conscious model do not influence the actual values. But to some extent, our conscious models of our values do shape our unconscious values in that direction—if someone switches to an exploitation phase “too early”, then over time their values may actually shift toward what they thought they were (see the toy sketch below).
2) Exploration/exploitation also assumes that our true values correspond to something akin to an external reward function: if our model is mistaken, then the objectively correct thing to do would be to correct it. In other words, if we realize that our conscious values don’t match our unconscious ones, we should revise our conscious values. And sometimes this does happen. But on other occasions, what happens is that our conscious model has become installed as a separate and contradictory set of values, and we need to choose which of the values to endorse (in which situations). This happening is a bad thing if you tend to primarily endorse your unconscious values or a lack of internal conflict, but arguably a good thing if you tend to primarily endorse your conscious values.
The process of arriving at our ultimate values seems to be both an act of discovering them and an act of creating them, and we probably shouldn’t use terminology like exploration/exploitation that implies that it would be just one of those.
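A toy simulation of point 1 (made-up numbers, meant only to illustrate the feedback loop): if acting on the conscious model keeps nudging the underlying values toward it, then switching to exploitation early makes the model partly self-fulfilling rather than merely inaccurate.

```python
# Toy sketch of point 1 with made-up numbers: the underlying value is pulled
# slightly toward the consciously-held model each period it is acted on.

underlying_value = 0.2   # stand-in for some real but not consciously modeled value
conscious_model = 0.8    # what the person believes (and acts as if) they value
feedback_strength = 0.1  # how strongly acting on the model reshapes the value

for period in range(30):
    underlying_value += feedback_strength * (conscious_model - underlying_value)

print(round(underlying_value, 2))  # ~0.77: the modeling "error" has largely become the value
```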
But to some extent, our conscious models of our values do shape our unconscious values in that direction
This is value drift. At any given time, you should fix (i.e. notice, as a concept) the implicit idealized values at that time and pursue them even if your hardware later changes and starts implying different values (in the sense where your dog or your computer or an alien also should (normatively) pursue them forever, they are just (descriptively) unlikely to, but you should plot to make that more likely, all else equal). As an analogy, if you are interested in solving different puzzles on different days, then the fact that you are no longer interested in solving yesterday’s puzzle doesn’t address the problem of solving yesterday’s puzzle. And idealized values don’t describe valuation of you, the abstract personal identity, of your actions and behavior and desires. They describe valuation of the whole world, including future you with value drift as a particular case that is not fundamentally special. The problem doesn’t change, even if the tendency to be interested in a particular problem does. The problem doesn’t get solved because you are no longer interested in it. Solving a new, different problem does not address the original problem.
Exploration/exploitation also assumes that our true values correspond to something akin to an external reward function: if our model is mistaken, then the objectively correct thing to do would be to correct it
The nature of idealized values is irrelevant to this point: whatever they are, they are that thing that they are, so that any “correction” discards the original problem statement and replaces it with a new one. What you can and should correct are intermediate conclusions. (Alternatively, we are arguing about definitions, and you read in my use of the term “values” what I would call intermediate conclusions, but then again I’m interested in you noticing the particular idea that I refer to with this term.)
if we realize that our conscious values don’t match our unconscious ones
I don’t think “unconscious values” is a good proxy for abstract implicit valuation of the universe; consciously-inaccessible processes in the brain are at a vastly different level of abstraction compared to the idealization I’m talking about.
The process of arriving at our ultimate values seems to be both an act of discovering them and an act of creating them
This might be true in the sense that humans probably underdetermine the valuation of the world, so that there are some situations that our implicit preferences can’t compare even in principle. The choice between such situations is arbitrary according to our values. Or our values might just recursively determine the correct choice in every single definable distinction. Any other kind of “creation” will contradict the implicit answer, and so even if it is the correct thing to do given the information available at the time, later reflection can show it to be suboptimal.
(More constructively, the proper place for creativity is in solving problems, not in choosing a supergoal. The intuition is confused on this point because humans have never seen a supergoal: all sane goals that we formulate for ourselves are in one way or another motivated by other considerations; they are themselves solutions to different problems. Thus, creativity is helpful in solving those different problems in order to recognize which new goals are motivated. But this is experience about subgoals, not idealized supergoals.)
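One way to picture the underdetermination mentioned above (a toy sketch of my own, not something proposed in this thread): if implicit preferences form only a partial order over outcomes, some pairs simply have no defined comparison, and any procedure that ranks them anyway is adding information that the values themselves don’t contain.

```python
# Toy sketch: implicit preferences as a partial order over outcomes. Only the
# listed pairs (plus transitivity) are comparable; other pairs are genuinely
# undetermined rather than ranked equal.

from itertools import combinations

better_than = {("flourishing", "mediocrity"), ("mediocrity", "extinction")}

def prefers(a, b, relation):
    # is a preferred to b under the transitive closure of the relation?
    frontier, seen = [a], set()
    while frontier:
        x = frontier.pop()
        if x == b and x != a:
            return True
        for p, q in relation:
            if p == x and q not in seen:
                seen.add(q)
                frontier.append(q)
    return False

outcomes = ["flourishing", "mediocrity", "extinction", "alien_weirdtopia"]
for a, b in combinations(outcomes, 2):
    if prefers(a, b, better_than):
        print(f"{a} > {b}")
    elif prefers(b, a, better_than):
        print(f"{b} > {a}")
    else:
        print(f"{a} vs {b}: undetermined by the preferences")
```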
I think that the concept of idealized value is obviously important in an FAI context, since we need some way of formalizing “what we want” in order to have any way of ensuring that an AI will further the things we want. I do not understand why the concept would be relevant to our personal lives, however.
I think that the concept of idealized value is obviously important in an FAI context, since we need some way of formalizing “what we want” in order to have any way of ensuring that an AI will further the things we want.
The question of what is normatively the right thing to do (given the resources available) is the same for an FAI as it is in our personal lives. My understanding is that “implicit idealized value” is the shape of the correct answer to it, not just a tool restricted to the context of FAI. It might be hard for a human to proceed from this concept to concrete decisions, but this is a practical difficulty, not a restriction on the scope of applicability of the idea. (And to see how much of a practical difficulty it is, it is necessary to actually attempt to resolve it.)
I do not understand why the concept would be relevant to our personal lives, however.
If idealized value indicates the correct shape of normativity, the question should instead be, How are our personal lives relevant to idealized value? One way was discussed a couple of steps above in this conversation: exploitation/exploration tradeoff. In pursuit of idealized values, if in our personal lives we can’t get much information about them, a salient action is to perform/support research into idealized values (or relevant subproblems, such as preventing/evading global catastrophes).
What does this mean? It sounds like you’re talking about some kind of objective morality?