I think I still endorse essentially all of what I said in that thread. Is there anything in particular you wanted me to talk about?

Your central claim seemed to be that words like “good” have no associated anticipated experience, with which I disagreed in the comment linked above. You haven’t yet replied to that.

Well, the claim was the following:
That might well be evidence (in the Bayesian sense) that a given act, value, or person belongs to a certain category which we slap the label “good” onto. But it has little to do with my initial question. We have no reason to care about the property of “goodness” at all if we do not believe that knowing something is “good” gives us powerful evidence that allows us to anticipate experiences and to constrain the territory around us. Otherwise, “goodness” is just an arbitrary bag of things, no more useful than the category of “bleggs” generated for no coherent reason whatsoever, or the random category of “r398t”s that I just made up, which contains only apples, weasels, and Ron Weasley. Indeed, we would not even have reason to raise the question of what “goodness” is in the first place.
Yes, knowing that something is (in the moral-cognitivist, moral-realist, observer-independent sense) “good” allows you to anticipate that it… fulfills the preconditions of being “good” (one of which is “increased welfare”, in this particular conception of it). At a conceptual level, that doesn’t provide you with relevant anticipated experiences that go beyond the category of “good and everything it contains”; it doesn’t constrain the territory beyond statements that ultimately refer back to goodness itself. It holds the power of anticipated experience only insofar as it is, in the end, self-referential, which doesn’t provide meaningful evidence that it’s a concept which carves reality at the joints.
It’s helpful to recall how the entire discussion began. You said, in response to Steven Byrnes’s post:
This is tangential to the point of the post, but “moral realism” is a much weaker claim than you seem to think. Moral realism only means that some moral claims are literally true. Popular uncontroversial examples: “torturing babies for fun is wrong” or “ceteris paribus, suffering is bad”. It doesn’t mean that someone is necessarily motivated by those claims if they believe they are true. It doesn’t imply that anyone is motivated to be good just from believing that something is good.
When Seth Herd questioned what you meant by “good” and “moral claims”, you said that you “don’t think anyone needs to define what words used in ordinary language mean.”
Now, in standard LW thought, the meaning of “X is true”, as explained by Eliezer a long time ago, is that it represents the correspondence between reality (the territory) and an observer’s beliefs about reality (the map). Beliefs which are thought to be true pay rent in anticipated experiences about the world. Taking the example of a supposed “moral fact” X, labeling it a “fact” (because it fulfills some conditions of membership in this category) implies it must pay rent.
But if the only way it does that is by allowing you to claim that “X fulfills the conditions of membership”, then this is not a useful category. It is precisely an arbitrary subset, analogous to the examples I gave in the comment I quoted above. If moral realism is instead viewed through the lens mentioned by Roko, which does imply specific factual anticipated experiences about the world (ones that go beyond the definition of “moral realism” itself), namely that “All (or perhaps just almost all) beings, human, alien or AI, when given sufficient computing power and the ability to learn science and get an accurate map-territory morphism, will agree on what physical state the universe ought to be transformed into, and therefore they will assist you in transforming it into this state,” then it’s no longer arbitrary.
But you specifically disavowed this interpretation, even going so far as to say that “I can believe that I shouldn’t eat meat, or that eating meat is bad, without being motivated to stop eating meat.” So your version of “moral realism” is just choosing a specific set of things you define to be “moral”, without requiring anyone who agrees that this is moral to act in accordance with it (which would indeed be an anticipated experience about the outside world) and without any further explanation of why this choice pays any rent in experiences about the world that’s not self-referential. This is a narrow and shallow definition of realism, and by itself it doesn’t explain why these ideas were even brought up in the first place.
I really don’t know if what I’ve written here is going to be helpful for this conversation. Look, if someone tells me that “X is a very massive star,” which they define as “a star that’s very massive,” then what I mean by anticipated experiences [1] is not “X is very massive” or “X is a star”, because these are already strictly included in (and logically implied by, at a tautological level) the belief about X, but rather something like “if there is any planet Y in the close vicinity of X, I expect to see Y rotating around a point inside or just slightly outside X.” The latter contains a reason to care about whether “X is very massive.”

[1] In this specific context.
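To make that orbital-mechanics anticipation concrete, here is a minimal sketch; the masses, separation, and radius are illustrative assumptions, not values from the discussion. For a star much heavier than its planet, the two-body barycenter falls inside the star itself, which is exactly the anticipated observation.

```python
# Minimal sketch: where the star-planet barycenter falls.
# All numbers are illustrative assumptions, not values from the discussion.

M_SUN = 1.989e30   # kg
R_SUN = 6.957e8    # m
AU = 1.496e11      # m

def barycenter_offset(m_star, m_planet, separation):
    """Distance from the star's center to the two-body barycenter."""
    return separation * m_planet / (m_star + m_planet)

m_star = 20 * M_SUN    # a "very massive" star (assumed)
m_planet = 1.9e27      # roughly a Jupiter mass, in kg (assumed)
a = 5 * AU             # orbital separation (assumed)

offset = barycenter_offset(m_star, m_planet, a)
star_radius = 10 * R_SUN  # rough radius for such a star (assumed)

# The belief "X is very massive" pays rent here: it predicts the planet
# orbits a point inside (or just slightly outside) the star.
print(f"barycenter offset from star's center: {offset:.2e} m")
print(f"point lies inside the star: {offset < star_radius}")
```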
Yes, knowing that something is (in the moral-cognitivist, moral-realist, observer-independent sense) “good” allows you to anticipate that it… fulfills the preconditions of being “good” (one of which is “increased welfare”, in this particular conception of it). At a conceptual level, that doesn’t provide you with relevant anticipated experiences that go beyond the category of “good and everything it contains”; it doesn’t constrain the territory beyond statements that ultimately refer back to goodness itself. It holds the power of anticipated experience only insofar as it is, in the end, self-referential, which doesn’t provide meaningful evidence that it’s a concept which carves reality at the joints.
I disagree with that. When we expect something to be good, we have some particular set of anticipated experiences (e.g. about increased welfare, extrapolated desires) that are consistent with our expectation, and some other set that is inconsistent with it. We do not merely “expect” a tautology, like “expecting” that good things are good (or that chairs are chairs, etc.). We can see this from the fact that we may very well see evidence that is inconsistent with our expectation, e.g. evidence that something instead leads to suffering and thus doesn’t increase welfare, and hence isn’t good. Believing something to be good therefore pays rent in anticipated experiences.
Moreover, we can wonder (ask ourselves) whether some particular thing is good or not (e.g., recycling plastic), and this is not like “wondering” whether chairs are chairs. We are asking a genuine question, not a tautological one.
When Seth Herd questioned what you meant by “good” and “moral claims”, you said that you “don’t think anyone needs to define what words used in ordinary language mean.”
To be clear, what I said was this: “I don’t think anyone needs to define what words used in ordinary language mean, because the validity of any attempt at such a definition would itself have to be checked against the intuitive meaning of the word in common usage.”
But if the only way it does that is by allowing you to claim that “X fulfills the conditions of membership”, then this is not a useful category.
I think I have identified the confusion here. Assume you don’t know what “bachelor” means, and you ask me what evidence I associate with that term. And I reply: If I believe something is a bachelor, I anticipate evidence that confirms that it is an unmarried man. Now you could reply that this is simply saying “‘bachelor’ fulfills the conditions of membership”. But no, I have given you a non-trivial definition of the term, and if you already knew what “unmarried” and “man” meant (what evidence to expect if those terms apply), you now also know what to anticipate for “bachelor”—what the term “bachelor” means. Giving a definition for X is not the same as merely saying “X fulfills the conditions of membership”.
If moral realism is instead viewed through the lens mentioned by Roko, which does imply specific factual anticipated experiences about the world (ones that go beyond the definition of “moral realism” itself), namely that “All (or perhaps just almost all) beings, human, alien or AI, when given sufficient computing power and the ability to learn science and get an accurate map-territory morphism, will agree on what physical state the universe ought to be transformed into, and therefore they will assist you in transforming it into this state,” then it’s no longer arbitrary.

Roko relies here on the assumption that moral beliefs are inherently motivating (“moral internalism”, as discussed by EY here), which is not a requirement for moral realism.
But you specifically disavowed this interpretation, even going so far as to say that “I can believe that I shouldn’t eat meat, or that eating meat is bad, without being motivated to stop eating meat.” So your version of “moral realism”
It is not just my interpretation; that is how the term “moral realism” is commonly defined in philosophy, e.g. in the SEP.
is just choosing a specific set of things you define to be “moral”
Well, I specifically don’t need to propose any definition. What matters for any proposal for a definition (such as EY’s “good ≈ maximizes extrapolated volition”) is that it captures the natural language meaning of the term.
without requiring anyone who agrees that this is moral to act in accordance with it (which would indeed be an anticipated experience about the outside world)
I say that’s confused. If I believe, for example, that raising taxes is bad, then I do have anticipated experiences associated with this belief. I may expect that raising taxes is followed by a weaker economy, more unemployment, less overall wealth; in short, decreased welfare. This expectation does not at all require that anyone agrees with me, nor that anyone is motivated to not raise taxes.
I really don’t know if what I’ve written here is going to be helpful for this conversation.
The central question here is whether (something like) EY’s ethical theory is sound. If it is, CEV could make sense as an alignment target, even if it is not clear how we get there.
I will try, one more[1] time, and I will keep this brief.
I think I have identified the confusion here. Assume you don’t know what “bachelor” means, and you ask me what evidence I associate with that term. And I reply: If I believe something is a bachelor, I anticipate evidence that confirms that it is an unmarried man. Now you could reply that this is simply saying “‘bachelor’ fulfills the conditions of membership”. But no, I have given you a non-trivial definition of the term, and if you already knew what “unmarried” and “man” meant (what evidence to expect if those terms apply), you now also know what to anticipate for “bachelor”—what the term “bachelor” means. Giving a definition for X is not the same as merely saying “X fulfills the conditions of membership”.

But why do you care about the concept of a bachelor? What makes you pick it out of the space of ideas and concepts as worthy of discussion and consideration? In my conception, it is the fact that you believe it carves reality at the joints by allowing you to have relevant and useful anticipated experiences about the world outside of what is contained inside the very definition or meaning of the word. If we did not know, due to personal experience, that it was useful to know whether someone was a bachelor[2], we would not talk about it; it would be just as arbitrary and useless a subset of idea-space as “the category of “bleggs” generated for no coherent reason whatsoever, or the random category of “r398t”s that I just made up, which contains only apples, weasels, and Ron Weasley.”
It is not just my interpretation; that is how the term “moral realism” is commonly defined in philosophy, e.g. in the SEP.
The SEP entry for “moral realism” is, unfortunately, not sufficient to resolve issues regarding what it means or how useful a concept it is. I would point you to the very introduction of the SEP entry on moral anti-realism:
It might be expected that it would suffice for the entry for “moral anti-realism” to contain only some links to other entries in this encyclopedia. It could contain a link to “moral realism” and stipulate the negation of the view described there. Alternatively, it could have links to the entries “anti-realism” and “morality” and could stipulate the conjunction of the materials contained therein. The fact that neither of these approaches would be adequate—and, more strikingly, that following the two procedures would yield substantively non-equivalent results—reveals the contentious and unsettled nature of the topic.
“Anti-realism,” “non-realism,” and “irrealism” may for most purposes be treated as synonymous. Occasionally, distinctions have been suggested for local pedagogic reasons (see, e.g., Wright 1988; Dreier 2004), but no such distinction has generally taken hold. (“Quasi-realism” denotes something very different, to be described below.) All three terms are to be defined in opposition to realism, but since there is no consensus on how “realism” is to be understood, “anti-realism” fares no better. Crispin Wright (1992: 1) comments that “if there ever was a consensus of understanding about ‘realism’, as a philosophical term of art, it has undoubtedly been fragmented by the pressures exerted by the various debates—so much so that a philosopher who asserts that she is a realist about theoretical science, for example, or ethics, has probably, for most philosophical audiences, accomplished little more than to clear her throat.”
[1] and possibly final

[2] because of reasons that go beyond knowing how to answer the question “is he a bachelor?” or “does he have the properties tautologically contained within the status of bachelors?”
But why do you care about the concept of a bachelor? What makes you pick it out of the space of ideas and concepts as worthy of discussion and consideration?
Well, “bachelor” was just an example of a word whose meaning you don’t know but want to know. The important thing here is that it has a meaning, not how useful the concept is.
But I think you actually want to talk about the meaning of terms like “good”. Apparently you now concede that they are meaningful (are associated with anticipated experiences) and instead claim that the concept of “good” is useless. That is surprising. There is arguably nothing more important than ethics, than the world being in a good state or on a good trajectory. So it is obvious that the term “good” is useful, especially because it is exactly what an aligned superintelligence should be targeted at. After all, it’s not an accident that EY came up with extrapolated volition as an ethical theory for solving the problem of what a superintelligence should be aligned to. An ASI shouldn’t do bad things and should do good things, and the problem is making the ASI care about being good rather than about something else, like making paperclips.
Regarding the SEP quote: it doesn’t argue that moral internalism is part of moral realism, which was what you were originally objecting to. But we need not even use the term “moral realism”; we only need the claim that statements about what is good or bad have non-trivial truth values, i.e. aren’t purely subjective, or mere expressions of applause, or meaningless, or the like. This is a semantic question about what terms like “good” mean.
For moral realism to be true in the sense most people mean when they talk about it, “good” would have to have an observer-independent meaning. That is, it would have to not only be the case that you personally feel that it means some particular thing, but also that people who feel it to mean some other thing are objectively mistaken, for reasons that exist outside of your personal judgement of what is or isn’t good.
(Also, throughout this discussion and the previous one you’ve misunderstood what it means for beliefs to pay rent in anticipated experiences. For a belief to pay rent, it should not only predict some set of sensory experiences but predict a different set of sensory experiences than would a model not including it. Let me bring in the opening paragraphs of the post:
Thus begins the ancient parable:
If a tree falls in a forest and no one hears it, does it make a sound? One says, “Yes it does, for it makes vibrations in the air.” Another says, “No it does not, for there is no auditory processing in any brain.”
If there’s a foundational skill in the martial art of rationality, a mental stance on which all other technique rests, it might be this one: the ability to spot, inside your own head, psychological signs that you have a mental map of something, and signs that you don’t.
Suppose that, after a tree falls, the two arguers walk into the forest together. Will one expect to see the tree fallen to the right, and the other expect to see the tree fallen to the left? Suppose that before the tree falls, the two leave a sound recorder next to the tree. Would one, playing back the recorder, expect to hear something different from the other? Suppose they attach an electroencephalograph to any brain in the world; would one expect to see a different trace than the other?
Though the two argue, one saying “No,” and the other saying “Yes,” they do not anticipate any different experiences. The two think they have different models of the world, but they have no difference with respect to what they expect will happen to them; their maps of the world do not diverge in any sensory detail.
If you call increasing-welfare “good” and I call honoring-ancestors “good”, our models do not make different predictions about what will happen, only about which things should be assigned the label “good”. That is what it means for a belief to not pay rent.)
For moral realism to be true in the sense most people mean when they talk about it, “good” would have to have an observer-independent meaning. That is, it would have to not only be the case that you personally feel that it means some particular thing, but also that people who feel it to mean some other thing are objectively mistaken, for reasons that exist outside of your personal judgement of what is or isn’t good.
That would only be a case of ambiguity (one word used with two different meanings). If by “good” you mean what people usually mean by “chair”, this doesn’t imply anti-realism, just likely misunderstandings.
Assume you are a realist about rocks, but call them trees. That wouldn’t be a contradiction. Realism has nothing to do with “observer-independent meaning”.
For a belief to pay rent, it should not only predict some set of sensory experiences but predict a different set of sensory experiences than would a model not including it.
This doesn’t make sense. A model doesn’t have beliefs, and if there is no belief, there is nothing it (the belief) predicts. Instead, for a belief to “pay rent” it is necessary and sufficient that it makes different predictions than believing its negation would.
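A minimal sketch of that criterion in Bayesian terms, with made-up numbers (the scenario is an illustrative assumption, not something from the thread): a belief pays rent exactly when some observation is more or less likely under it than under its negation, so that the observation can shift the odds.

```python
# Illustrative sketch: a belief B "pays rent" iff some observation E has a
# different likelihood under B than under not-B. All numbers are made up.

def posterior_odds(prior_odds: float, p_e_given_b: float, p_e_given_not_b: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_e_given_b / p_e_given_not_b)

# A belief that constrains anticipation, like "raising taxes decreases welfare"
# with E = "unemployment rose after the tax increase":
print(posterior_odds(1.0, 0.7, 0.4))  # 1.75 -> the odds move; rent is paid

# A belief that doesn't, like the two arguers about the fallen tree, who
# assign E the same probability no matter who is "right":
print(posterior_odds(1.0, 0.5, 0.5))  # 1.0 -> the odds never move; no rent
```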
If you call increasing-welfare “good” and I call honoring-ancestors “good”, our models do not make different predictions about what will happen, only about which things should be assigned the label “good”. That is what it means for a belief to not pay rent.
Compare:
If you call a boulder a “tree” and I call a plant with a woody trunk a “tree”, our models do not make different predictions about what will happen, only about which things should be assigned the label “tree”. That is what it means for a belief to not pay rent.
Of course our beliefs pay rent here; they just pay different rent. If we both express our beliefs with “There is a tree behind the house”, then we have two different beliefs, because we expect different experiences. That has nothing to do with anti-realism about trees.