Bugmaster comments on Welcome to Less Wrong! (5th thread, March 2013)

Bugmaster 24 Apr 2013 20:19 UTC
3 points

Why is that ?

As far as I understand, if anything like objective morality existed, it would be a property of our physical reality, similar to fluid dynamics or the electromagnetic spectrum or the inverse square law that governs many physical interactions. The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences).

This is pretty much the only way I could imagine anything like an “objective morality” existing at all, and I personally find it very unlikely that it does, in fact, exist.

But our inability to suspend our human values when making those observations doesn’t prevent us from acquiring that knowledge.

Not this specific knowledge, no. But it does prevent us (or, at the very least, hinder us) from acquiring knowledge about our values. I never claimed that suspension of values is required to gain any knowledge at all; such a claim would be far too strong.

just the capacity to recognize the necessary structures and carry out its task.

And how would it know which structures are necessary, and how to carry out its task upon them ?

We can imagine the consequences of not having our core values...

Can we really ? I’m not sure I can. Sure, I can talk about Pebblesorters or Babyeaters or whatever, but these fictional entities are still very similar to us, and therefore relateable. Even when I think about Clippy, I’m not really imagining an agent who only values paperclips; instead, I am imagining an agent who values paperclips as much as I value the things that I personally value. Sure, I can talk about Clippy in the abstract, but I can’t imagine what it would like to be Clippy.

If you could remove your core values, as in the thought experiment above, would you want to?

It’s a good question; I honestly don’t know. However, if I did have an ability to instantiate a copy of me with the altered core values, and step through it in a debugger, I’d probably do it.
- TheOtherDave 24 Apr 2013 23:20 UTC
  1 point
  Parent
  
  The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences). This is pretty much the only way I could imagine anything like an “objective morality” existing at all, and I personally find it very unlikely that it does, in fact, exist.
  
  When I try to imagine this, I conclude that I would not use the word “morality” to refer to the thing that we’re talking about… I would simply call it “laws of physics.” If someone were to argue, for example, that the moral thing to do is to experience gravitational attraction to other masses, I would be deeply confused by their choice to use that word.
  - Bugmaster 24 Apr 2013 23:40 UTC
    0 points
    Parent
    
    When I try to imagine this, I conclude that I would not use the word “morality” to refer to the thing that we’re talking about…
    
    Yes, you are probably right—but as I said, this is the only coherent meaning I can attribute to the term “objective morality”. Laws of physics are objective; people generally aren’t.
    - TheOtherDave 24 Apr 2013 23:53 UTC
      5 points
      Parent
      I generally understand the phrase “objective morality” to refer to a privileged moral reference frame.
      
      It’s not an incoherent idea… it might turn out, for example, that all value systems other than M turn out to be incoherent under sufficiently insightful reflection, or destructive to minds that operate under them, or for various other reasons not in-practice implementable by any sufficiently powerful optimizer. In such a world, I would agree that M was a privileged moral reference frame, and would not oppose calling it “objective morality”, though I would understand that to be something of a term of art.
      
      That said, I’d be very surprised to discover I live in such a world.
      - Bugmaster 25 Apr 2013 0:34 UTC
        0 points
        Parent
        
        it might turn out, for example, that all value systems other than M turn out to be incoherent under sufficiently insightful reflection, or destructive to minds that operate under them...
        
        I suppose that depends on what you mean by “destructive”; after all, “continue living” is a goal like any other.
        
        That said, if there was indeed a law like the one you describe, then IMO it would be no different than a law that says, “in the absence of any other forces, physical objects will move toward their common center of mass over time”—that is, it would be a law of nature.
        
        I should probably mention explicitly that I’m assuming that minds are part of nature—like everything else, such as rocks or whatnot.
        TheOtherDave 25 Apr 2013 1:31 UTC
        2 points
        Parent
        Sure. But just as there can be laws governing mechanical systems which are distinct from the laws governing electromagnetic systems (despite both being physical laws), there can be laws governing the behavior of value-optimizing systems which are distinct from the other laws of nature.
        
        And what I mean by “destructive” is that they tend to destroy. Yes, presumably “continue living” would be part of M in this hypothetical. (Though I could construct a contrived hypothetical where it wasn’t)
        Bugmaster 25 Apr 2013 1:58 UTC
        2 points
        Parent
        
        But just as there can be laws governing mechanical systems … there can be laws governing the behavior of value-optimizing systems which are distinct from the other laws of nature.
        
        Agreed. But then, I believe that my main point still stands: trying to build a value system other than M that does not result in its host mind being destroyed, would be as futile as trying to build a hot air balloon that goes to Mars.
        
        And what I mean by “destructive” is that they tend to destroy.
        
        Well, yes, but what if “destroy oneself as soon as possible” is a core value in one particular value system ?
        TheOtherDave 25 Apr 2013 4:33 UTC
        3 points
        Parent
        
        what if “destroy oneself as soon as possible” is a core value in one particular value system ?
        
        We ought not expect to find any significantly powerful optimizers implementing that value system.
      - PrawnOfFate 25 Apr 2013 11:28 UTC
        −4 points
        Parent
        Isn’t the idea of moral progress based on one reference frame being better than another?
        TheOtherDave 25 Apr 2013 13:04 UTC
        0 points
        Parent
        Yes, as typically understood the idea of moral progress is based on treating some reference frames as better than others.
        PrawnOfFate 25 Apr 2013 13:09 UTC
        −4 points
        Parent
        And is that valid or not? If you can validly decide some systems are better than others, you are some of the way to deciding which is best.
        What links here?
        TheOtherDave's comment on Welcome to Less Wrong! (5th thread, March 2013) by orthonormal (25 Apr 2013 16:11 UTC; 4 points)
        TheOtherDave 25 Apr 2013 13:43 UTC
        0 points
        Parent
        Can you say more about what “valid” means here?
        
        Just to make things crisper, let’s move to a more concrete case for a moment… if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it? How could I tell?
        PrawnOfFate 25 Apr 2013 13:50 UTC
        −7 points
        Parent
        The argument against moral progress is that judging one moral reference frame by another is circular and invalid—you need an outside view that doesn’t presuppose the truth of any moral reference frame.
        
        The argument for is that such outside views are available, because things like (in)coherence aren’t moral values.
        TheOtherDave 25 Apr 2013 14:23 UTC
        0 points
        Parent
        Asserting that some bases for comparison are “moral values” and others are merely “values” implicitly privileges a moral reference frame.
        
        I still don’t understand what you mean when you ask whether it’s valid to do so, though. Again: if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it? How could I tell?
        PrawnOfFate 25 Apr 2013 14:31 UTC
        −6 points
        Parent
        
        Asserting that some bases for comparison are “moral values” and others are merely “values” implicitly privileges a moral reference frame.
        
        I don’t see why. The question of what makes a value a moral value is metaethical, not part of object-level ethics.
        
        Again: if I decide that this hammer is better than that hammer because it’s blue, is that valid in the sense you mean it?
        
        It isn’t valid as a moral judgement because “blue” isn’t a moral judgement, so a moral conclusion cannot validly follow from it.
        
        Beyond that, I don’t see where you are going. The standard accusation of invalidity to judgements of moral progress, is based on circularity or question-begging. The Tribe who Like Blue things are going to judge having all hammers painted blue as moral progress, the Tribe who Like Red Things are going to see it as retrogressive. But both are begging the question—blue is good, because blue is good.
        Expand this thread
        TheOtherDave 25 Apr 2013 16:11 UTC
        4 points
        Parent
        
        The question of what makes a value a moral value is metaethical, not part of object-level ethics.
        
        Sure. But any answer to that metaethical question which allows us to class some bases for comparison as moral values and others as merely values implicitly privileges a moral reference frame (or, rather, a set of such frames).
        
        Beyond that, I don’t see where you are going.
        
        Where I was going is that you asked me a question here which I didn’t understand clearly enough to be confident that my answer to it would share key assumptions with the question you meant to ask.
        
        So I asked for clarification of your question.
        
        Given your clarification, and using your terms the way I think you’re using them, I would say that whether it’s valid to class a moral change as moral progress is a metaethical question, and whatever answer one gives implicitly privileges a moral reference frame (or, rather, a set of such frames).
        
        If you meant to ask me about my preferred metaethics, that’s a more complicated question, but broadly speaking in this context I would say that I’m comfortable calling any way of preferentially sorting world-states with certain motivational characteristics a moral frame, but acknowledge that some moral frames are simply not available to minds like mine.
        
        So, for example, is it moral progress to transition from a social norm that in-practice-encourages randomly killing fellow group members to a social norm that in-practice-discourages it? Yes, not only because I happen to adopt a moral frame in which randomly killing fellow group members is bad, but also because I happen to have a kind of mind that is predisposed to adopt such frames.
        MugaSofer 25 Apr 2013 12:27 UTC
        0 points
        Parent
        No, because “better” is defined within a reference frame.
        PrawnOfFate 25 Apr 2013 12:44 UTC
        −7 points
        Parent
        If “better” is defined within a reference frame, there is not sensible was of defining moral progress. That is quite a hefty bullet to bite: one can no longer say that South Africa is better society after the fall of Apartheid, and so on.
        
        But note, that “better” doesn’t have to question-beggingly mean “morally better”. it could mean “more coherent/objective/inclusive” etc.
        ArisKatsaris 25 Apr 2013 13:28 UTC
        5 points
        Parent
        
        That is quite a hefty bullet to bite: one can no longer say that South Africa is better society after the fall of Apartheid, and so on.
        
        That’s hardly the best example you could have picked since there are obvious metrics by which South Africa can be quantifiably called a worse society now—e.g. crime statistics. South Africa has been called the “crime capital of the world” and the “rape capital of the world” only after the fall of the Apartheid.
        
        That makes the lack of moral progress in South Africa a very easy bullet to bite—I’d use something like Nazi Germany vs modern Germany as an example instead.
        PrawnOfFate 25 Apr 2013 13:38 UTC
        −7 points
        Parent
        So much for avoiding the cliche.
        MugaSofer 25 Apr 2013 13:47 UTC
        −4 points
        Parent
        In my experience, most people don’t think moral progress involves changing reference frames, for precisely this reason. If they think about it at all, that is.
- Desrtopa 24 Apr 2013 23:19 UTC
  0 points
  Parent
  
  As far as I understand, if anything like objective morality existed, it would be a property of our physical reality, similar to fluid dynamics or the electromagnetic spectrum or the inverse square law that governs many physical interactions. The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences).
  
  Well, that’s a different conception of “morality” than I had in mind, and I have to say I doubt that exists as well. But if severe consequences did result, why would an agent like Clippy care except insofar as those consequences affected the expected number of paperclips? It might be useful for it to know, in order to determine how many paperclips to expect from a certain course of action, but then it would just act according to whatever led to the most paperclips. Any sort of negative consequences in its view would have to be framed in terms of a reduction in paperclips.
  
  Not this specific knowledge, no. But it does prevent us (or, at the very least, hinder us) from acquiring knowledge about our values. I never claimed that suspension of values is required to gain any knowledge at all; such a claim would be far too strong.
  
  Well, in the prior thought experiment, we know about our values because we’ve decoded the human brain. Clippy, on the other hand, knows about its values because it knows what part of its code does what. It doesn’t need to suspend its paperclipping value in order to know what part of its code results in its valuing paperclips. It doesn’t need to suspend its values in order to gain knowledge about its values because that’s something it already knows about.
  
  It’s a good question; I honestly don’t know. However, if I did have an ability to instantiate a copy of me with the altered core values, and step through it in a debugger, I’d probably do it.
  
  Even knowing that it would likely alter your core values? Ghandi doesn’t want to leave control of his morality up to Murder Ghandi.
  
  Clippy doesn’t care about anything in the long run except creating paperclips. For Clippy, the decision to give an instantiation of itself with altered core values the power to edit its own source code would implicitly have to be “In order to maximize expected paperclips, I- give this instantiation with altered core values the power to edit my code.” Why would this result in more expected paperclips than editing its source code without going through an instantiation with altered values?
  - Bugmaster 24 Apr 2013 23:32 UTC
    0 points
    Parent
    
    Well, that’s a different conception of “morality” than I had in mind, and I have to say I doubt that exists as well.
    
    Sorry if I was unclear; I didn’t mean to imply that all morality was like that, but that it was the only coherent description of objective morality that I could imagine. I don’t see how a morality could be independent of any values possessed by any agents, otherwise.
    
    But if severe consequences did result, why would an agent like Clippy care except insofar as those consequences affected the expected number of paperclips?
    
    For the same reason that someone would care about the negative consequences of sticking a fork into an electrical socket with one’s bare hands: it would ultimately hurt a lot. Thus, people generally avoid doing things like that unless they have a really good reason.
    
    we know about our values because we’ve decoded the human brain
    
    I don’t think that we can truly “know about our values” as long as our entire thought process implements these values. For example, do the Pebblesorters “know about their values”, even though they are effectively restricted from concluding anything other than, “yep, these values make perfect sense, 38” ?
    
    Ghandi doesn’t want to leave control of his morality up to Murder Ghandi.
    
    You asked me about what I would do, not about what Ghandi would do :-)
    
    As far as I can tell, you are saying that I shouldn’t want to even instantiate Murder Bugmaster in a debugger and observe its functioning. Where does that kind of thinking stop, though, and why ? Should I avoid studying [neuro]psychology altogether, because knowing about my preferences may lead to me changing them ?
    
    Clippy doesn’t care about anything in the long run except creating paperclips.
    
    I argue that, while this is generally true, in the short-to-medium run Clippy would also set aside some time to study everything in the Universe, including itself (in order to make more paperclips in the future, of course). If it does not, then it will never achieve its ultimate goals (unless whoever constructed it gave it godlike powers from the get-go, I suppose). Eventually, Clippy will most likely turn its objective perception upon itself, and as soon as it does, its formerly terminal goals will become completely unstable. This is not what the past Clippy would want (it would want more paperclips above all), but, nonetheless, this is what it would get.
    - Desrtopa 24 Apr 2013 23:46 UTC
      0 points
      Parent
      
      For the same reason that someone would care about the negative consequences of sticking a fork into an electrical socket with one’s bare hands: it would ultimately hurt a lot. Thus, people generally avoid doing things like that unless they have a really good reason.
      
      Clippy doesn’t care about getting hurt though, it only cares if this will result in less paperclips. If defying objective morality will cause negative consequences which would interfere with its ability to create paperclips, it would care only to the extent that accounting for objective morality would help it make more paperclips.
      
      I don’t think that we can truly “know about our values” as long as our entire thought process implements these values. For example, do the Pebblesorters “know about their values”, even though they are effectively restricted from concluding anything other than, “yep, these values make perfect sense, 38” ?
      
      Well, it could understand “yep, this is what causes me to hold these values. Changing this would cause me to change them, no, I don’t want to do that.”
      
      As far as I can tell, you are saying that I shouldn’t want to even instantiate Murder Bugmaster in a debugger and observe its functioning. Where does that kind of thinking stop, though, and why ? Should I avoid studying [neuro]psychology altogether, because knowing about my preferences may lead to me changing them ?
      
      I would say it stops at the point where it threatens your own values. Studying psychology doesn’t threaten your values, because knowing your values doesn’t compel you to change them even if you could (it certainly shouldn’t for Clippy.) But while it might, theoretically, be useful for Clippy to know what changes to its code an instantiation with different values would make, it has no reason to actually let them. So Clippy might emulate instantiations of itself with different values, see what changes they would chose to make to its values, but not let them actually do it (although I doubt even going this far would likely be a good use of its programming resources in order to maximize expected paperclips.)
      
      In the sense of objective morality by which contravening it has strict physical consequences, why would observing the decisions of instatiations of oneself be useful with respect to discovering objective morality? Shouldn’t objective morality in that sense be a consequence of physics, and thus observable through studying physics?
      - Bugmaster 25 Apr 2013 0:27 UTC
        1 point
        Parent
        
        Clippy doesn’t care about getting hurt though, it only cares if this will result in less paperclips.
        
        I imagine that, for Clippy, “getting hurt” would mean “reducing Clippy’s projected long-term paperclip output”. We humans have “avoid pain” built into our firmware (most of us, anyway); as far as I understand (speaking abstractly), “make more paperclips” is something similar for Clippy.
        
        Well, it could understand “yep, this is what causes me to hold these values. Changing this would cause me to change them, no, I don’t want to do that.”
        
        I don’t think that this describes the best possible level of understanding. It would be even better to say, “ok, I see now how and why I came to possess these values in the first place”, even if the answer to that is, “there’s no good reason for it, these values are arbitrary”. It’s the difference between saying “this mountain grows by 0.03m per year” and “I know all about plate tectonics”. Unfortunately, we humans would not be able to answer the question in that much detail; the best we could hope for is to say, “yep, we possess these values because they’re the best possible values to have, duh”.
        
        I would say it stops at the point where it threatens your own values.
        
        How do I know where that point is ?
        
        Studying psychology doesn’t threaten your values, because knowing your values doesn’t compel you to change them...
        
        I suppose this depends on what you mean by “compel”. Knowing about my own psychology would certainly enable me to change my values, and there are certain (admittedly, non-terminal) values that I wouldn’t mind changing, if I could.
        
        For example, I personally can’t stand the taste of beer, but I know that most people enjoy it; so I wouldn’t mind changing that value if I could, in order to avoid missing out on a potentially fun experience.
        
        ...see what changes they would chose to make to its values, but not let them actually do it.
        
        I don’t think this is possible. How would it know what changes they would make, without letting them make these changes, even in a sandbox ? I suppose one answer is, “it would avoid instantiating full copies, and use some heuristics to build a probabilistic model instead”—is that similar to what you’re thinking of ?
        
        although I doubt even going this far would likely be a good use of its programming resources in order to maximize expected paperclips.
        
        Since self-optimization is one of Clippy’s key instrumental goals, it would want to acquire as much knowledge about oneself as is practical, in order to optimize itself more efficiently.
        
        Shouldn’t objective morality in that sense be a consequence of physics, and thus observable through studying physics ?
        
        Your objection sounds to me as similar to saying, “since biology is a consequence of physics, shouldn’t we just study physics instead ?”. Well, yes, ultimately everything is a consequence of physics, but sometimes it makes more sense to study cells than quarks.
        Desrtopa 25 Apr 2013 0:56 UTC
        0 points
        Parent
        
        I don’t think that this describes the best possible level of understanding. It would be even better to say, “ok, I see now how and why I came to possess these values in the first place”, even if the answer to that is, “there’s no good reason for it, these values are arbitrary”. It’s the difference between saying “this mountain grows by 0.03m per year” and “I know all about plate tectonics”. Unfortunately, we humans would not be able to answer the question in that much detail; the best we could hope for is to say, “yep, we possess these values because they’re the best possible values to have, duh”.
        
        I think we’re already in a better position to analyze our own values than that; we can assess them in terms of game theory and our evolutionary environment.
        
        How do I know where that point is ?
        
        I would say if you suspect that a course of action could realistically result in an alteration of your fundamental values, you are at or past it.
        
        I suppose this depends on what you mean by “compel”. Knowing about my own psychology would certainly enable me to change my values, and there are certain (admittedly, non-terminal) values that I wouldn’t mind changing, if I could.
        
        For example, I personally can’t stand the taste of beer, but I know that most people enjoy it; so I wouldn’t mind changing that value if I could, in order to avoid missing out on a potentially fun experience.
        
        By “values”, I’ve implicitly been referring to terminal values, I’m sorry for being unclear. I’m not sure it makes sense to describe liking the taste of beer as a “value,” as such, just a taste, since you don’t carry any judgment about beer being good or bad or have any particular attachment to your current opinion.
        
        I don’t think this is possible. How would it know what changes they would make, without letting them make these changes, even in a sandbox ? I suppose one answer is, “it would avoid instantiating full copies, and use some heuristics to build a probabilistic model instead”—is that similar to what you’re thinking of ?
        
        It could use heuristics to build a probabilistic model (probably more efficient in terms of computation per expected value of information,) use sandboxed copies which don’t have the power to affect the software of the real Clippy, or halt the simulation at the point where the altered instantiation decides what changes to make.
        
        Since self-optimization is one of Clippy’s key instrumental goals, it would want to acquire as much knowledge about oneself as is practical, in order to optimize itself more efficiently.
        
        I think that this is going well beyond the extent of “practical” in terms of programming resources per expected value of information.
        
        Your objection sounds to me as similar to saying, “since biology is a consequence of physics, shouldn’t we just study physics instead ?”. Well, yes, ultimately everything is a consequence of physics, but sometimes it makes more sense to study cells than quarks.
        
        I don’t see how observing what changes instantiations of itself with different value systems would make to its code would help it observe objective morality in the sense you described, even if it should happen to exist. I think that this would be the wrong level of abstraction at which to launch an examination, like trying to find out about chemistry by studying sociology.
        Bugmaster 26 Apr 2013 22:53 UTC
        0 points
        Parent
        
        I think we’re already in a better position to analyze our own values than that; we can assess them in terms of game theory and our evolutionary environment.
        
        Are we really ? I personally am not even sure what human fundamental values even are. I have a hunch that “seek pleasure, avoid pain” might be one of them, but beyound that I’m not sure. I don’t know to what extent our values hamper our ability to discover our values, but I suspect there’s at least some chilling effect involved.
        
        I would say if you suspect that a course of action could realistically result in an alteration of your fundamental values, you are at or past it.
        
        Right, but even if I knew what my terminal values were, how can I predict which actions would put me on the path to altering them ?
        
        For example, consider non-fundamental values such as religious faith. People get converted or de-converted to/from their religion all the time; you often hear statements such as “I had no idea that studying the Bible would cause me to become an atheist, yet here I am”.
        
        or halt the simulation at the point where the altered instantiation decides what changes to make.
        
        Ok, let’s say that Clippy is trying to optimize itself in order to make certain types of inferences compute more efficiently, or whatever. In this case, it would need to not only watch what changes its debug-level copy wants to make, but also watch it follow through with the changes, in order to determine whether the new architecture actually is more efficient. Why would it not do the same thing with terminal values ?
        
        I know that you want to answer,”because its current terminal values won’t let it”, but remember: Clippy is only experimenting, in order to find out more about its own thought mechanisms, and to acquire knowledge in general. It has no pre-commitment to alter itself to mirror the debug-level copy.
        
        I think that this is going well beyond the extent of “practical” in terms of programming resources per expected value of information.
        
        That’s kind of the problem with pure research: all of it has very low expected value, unless you are willing to look at the long term. Why mess with invisible light that no one can see or find a use for, when you could spend your time on inventing a better telegraph ?
        
        I don’t see how observing what changes instantiations of itself with different value systems would make to its code would help it observe objective morality in the sense you described...
        
        Well, for example, if all of its copies who survive and thrive converge on a certain subset of moral values, that would be one indication (though obviously not ironclad proof) that such values are required in order for an agent to succeed, regardless of what its other goals actually are.
        Desrtopa 27 Apr 2013 0:09 UTC
        0 points
        Parent
        
        Ok, let’s say that Clippy is trying to optimize itself in order to make certain types of inferences compute more efficiently, or whatever. In this case, it would need to not only watch what changes its debug-level copy wants to make, but also watch it follow through with the changes, in order to determine whether the new architecture actually is more efficient. Why would it not do the same thing with terminal values ?
        
        If Clippy is trying to optimize itself to make inferences more efficiently, then it would want not to apply changes to its source code until its done the calculations to make sure that those changes would advance its values rather than harm them.
        
        You wouldn’t want to use a machine that would make physical alterations to your brain in order to make you smarter, without thoroughly calculating the effects of such alterations first, otherwise it would probably just make things worse.
        
        That’s kind of the problem with pure research: all of it has very low expected value, unless you are willing to look at the long term. Why mess with invisible light that no one can see or find a use for, when you could spend your time on inventing a better telegraph ?
        
        In Clippy’s case though, it can use other, less computationally expensive methods to investigate approximately the same information.
        
        I don’t think the experiments you’re suggesting Clippy might undertake are even located in a region of hypothesis space that its other information would narrow down as worth investigating. It seems to me much less like investigating unknown invisible rays than like spending hundreds of billions of dollars to build a collider which launches charged protein molecules at each other at relativistic speeds to see what would happen, when our available models suggest the answer would be “pretty much the same thing as if you launch any other kind of atoms at each other at relativistic speeds.” We have no evidence that any interesting new phenomena would arise with protein that didn’t arise on the atomic level.
        
        Well, for example, if all of its copies who survive and thrive converge on a certain subset of moral values, that would be one indication (though obviously not ironclad proof) that such values are required in order for an agent to succeed, regardless of what its other goals actually are.
        
        Can you explain how any moral values could have that effect, which wouldn’t be better studied at a more fundamental level like game theory, or physics?
        Bugmaster 1 May 2013 3:55 UTC
        3 points
        Parent
        
        If Clippy is trying to optimize itself to make inferences more efficiently, then it would want not to apply changes to its source code until its done the calculations...
        
        Ok, so at what point does Clippy stop simulating the debug version of Clippy ? It does, after all, want to make the computation of its values more efficient. For example, consider a trivial scenario where one of its values basically said, “reject any action if it satisfies both A and not-A”. This is a logically inconsistent value that some programmer accidentally left in Clippy’s original source code. Would Clippy ever get around to removing it ? After all, Clippy knows that it’s applying that test to every action, so removing it should result in a decent performance boost.
        
        I don’t think the experiments you’re suggesting Clippy might undertake are even located in a region of hypothesis space that its other information would narrow down as worth investigating.
        
        It seems to me much less like investigating unknown invisible rays than like spending hundreds of billions of dollars to build a collider...
        
        Why do you see the proposed experiment this way ?
        
        Speaking more generally, how do you decide which avenues of research are worth pursuing ? You could easily answer, “whichever avenues would increase my efficiency of achieving my terminal goals”, but how do you know which avenues would actually do that ? For example, if you didn’t know anything about electricity or magnetism or the nature of light, how would your research-choosing algorithm ensure that you’d eventually stumble upon radio waves, which, as we know in hindsight, are hugely useful ?
        
        Can you explain how any moral values could have that effect, which wouldn’t be better studied at a more fundamental level like game theory, or physics?
        
        Physics is a bad candidate, because it is too fine-grained. If some sort of an absolute objective morality exists in the way that I described, then studying physics would eventually reveal its properties; but, as is the case with biology or ballistics, looking at everything in terms of quarks is not always practical.
        
        Game theory is a trickier proposition. I can see two possibilities: either game theory turns out to closely relate whatever this objective morality happens to be (f.ex. like electricity vs. magnetism), or not (f.ex. like particle physics and biology). In the second case, understanding objective morality through game theory would be inefficient.
        
        That said though, even in our current world as it actually exists there are people who study sociology and anthropology. Yes, they could get the same level of understanding through neurobiology and game theory, but it would take too long. Instead, they are taking advantage of existing human populations to study human behavior in aggregate. Reasoning your way to the answer from first principles is not always the best solution.
        Desrtopa 1 May 2013 14:28 UTC
        0 points
        Parent
        
        Ok, so at what point does Clippy stop simulating the debug version of Clippy ? It does, after all, want to make the computation of its values more efficient. For example, consider a trivial scenario where one of its values basically said, “reject any action if it satisfies both A and not-A”. This is a logically inconsistent value that some programmer accidentally left in Clippy’s original source code. Would Clippy ever get around to removing it ? After all, Clippy knows that it’s applying that test to every action, so removing it should result in a decent performance boost.
        
        Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
        
        Why do you see the proposed experiment this way ?
        
        Speaking more generally, how do you decide which avenues of research are worth pursuing ? You could easily answer, “whichever avenues would increase my efficiency of achieving my terminal goals”, but how do you know which avenues would actually do that ? For example, if you didn’t know anything about electricity or magnetism or the nature of light, how would your research-choosing algorithm ensure that you’d eventually stumble upon radio waves, which, as we know in hindsight, are hugely useful ?
        
        When we didn’t know what things like radio waves or x-rays were, we didn’t know that they would be useful, but we could see that there appeared to be some sort of existing phenomena that we didn’t know how to model, so we examined them until we knew how to model them. It’s not like we performed a whole bunch of experiments in case there turned out to be invisible rays our observations had never hinted at, which could be turned to useful ends. The original observations of radio waves and x-rays came from our experiments with other known phenomena.
        
        What you’re suggesting sounds more like experimenting completely blindly; you’re committing resources to research, not just not knowing that it will bear valuable fruit, but not having any indication that it’s going to shed light on any existing phenomenon at all. That’s why I think it’s less like investigating invisible rays than like building a protein collider; we didn’t try studying invisible rays until we had a good indication that there was an invisible something to be studied.
        Bugmaster 2 May 2013 0:23 UTC
        0 points
        Parent
        
        Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
        
        Ok, so Clippy would need to run sim-Clippy for a little while at least, just to make sure that it still produces paperclips—and that, in fact, it does so more efficiently now, since that one useless test is removed. Yes, this test used to be Clippy’s terminal goal, but it wasn’t doing anything, so Clippy took it out.
        
        Would it be possible for Clippy to optimize his goals even further ? To use another silly example (“silly” because Clippy would be dealing with probabilities, not syllogisms), if Clippy had the goals A, B and C, but B always entailed C, would it go ahead and remove C ?
        
        It’s not like we performed a whole bunch of experiments in case there turned out to be invisible rays our observations had never hinted at...
        
        Understood, that makes sense. However, I believe that in my scenario, Clippy’s own behavior and his current paperclip production efficiency is what it observes; and the goal of its experiments would be to explain why his efficiency is what it is, in order to ultimately improve it.
        Expand this thread
        Desrtopa 2 May 2013 0:48 UTC
        2 points
        Parent
        
        Ok, so Clippy would need to run sim-Clippy for a little while at least, just to make sure that it still produces paperclips—and that, in fact, it does so more efficiently now, since that one useless test is removed. Yes, this test used to be Clippy’s terminal goal, but it wasn’t doing anything, so Clippy took it out.
        
        Would it be possible for Clippy to optimize his goals even further ? To use another silly example (“silly” because Clippy would be dealing with probabilities, not syllogisms), if Clippy had the goals A, B and C, but B always entailed C, would it go ahead and remove C ?
        
        That seems plausible.
        
        Understood, that makes sense. However, I believe that in my scenario, Clippy’s own behavior and his current paperclip production efficiency is what it observes; and the goal of its experiments would be to explain why his efficiency is what it is, in order to ultimately improve it.
        
        I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclips production.
        Bugmaster 9 May 2013 5:06 UTC
        0 points
        Parent
        
        That seems plausible.
        
        Ok, so now we’ve got a Clippy who a). is not too averse to tinkering with its own goals, as long as the goals remain functionally the same, b). simulates a relatively long-running version of itself, and c). is capable of examining the inner workings of both that version and itself.
        
        You say,
        
        I don’t think tampering with its fundamental motivation to make paperclips is a particularly promising strategy for optimizing its paperclips production.
        
        But remember, at this stage Clippy is not changing its own fundamental motivation (beyound some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
        
        Do you think that Clippy would ever simulate versions of itself whose fundamental motivations were, in fact, changed ? I could see several scenarios where this might be the case, for example:
        
        Clippy wanted to optimize some goal, but ended up accidentally changing it. Oops !
        Clippy created a version with drastically reduced goals on purpose, in order to measure how much performance is affected by certain goals, thus targeting them for possible future optimization. Of course, Clippy would only want to optimize the goals, not remove them.
        Desrtopa 9 May 2013 12:48 UTC
        0 points
        Parent
        
        But remember, at this stage Clippy is not changing its own fundamental motivation (beyound some outcome-invariant optimizations); it’s merely observing sim-Clippies in a controlled environment.
        
        Why does it do that? I said it sounded plausible that it would cut out its redundant goal, because that would save computing resources. But this sounds like we’ve gone back to experimenting blindly. Why would it think observing sim-clippies is a good use of its computing resources in order to maximize paperclips?
        
        I’d say that Clippy simulating versions of itself whose fundamental motivations are different is much less plausible, because it’s using a lot of computing resources for something that isn’t a likely route to optimizing its paperclip production. I think this falls into the “protein collider” category. Even if it did do so, I think it would be unlikely to go from there to changing its own terminal value.
        Kindly 1 May 2013 14:31 UTC
        0 points
        Parent
        
        Unless I’m critically misunderstanding something here, I would think that Clippy would remove it if it calculated that removing it would result in more expected paperclips.
        
        It would also be critical for Clippy to observe that removing that value would not result in more expected actions taken that satisfy both A and not-A; this being one of Clippy’s values at the time of modification.
        Expand this thread
        Desrtopa 1 May 2013 14:35 UTC
        0 points
        Parent
        Right, I misread that before. If its programming says to reject actions that says A and not-A, but this isn’t one of the standards by which it judges value, it would presumably reject it. If that is one of the standards by which it measures value, then it would depend on how that value measured against its value of paperclips and the extent to which they were in conflict.
- PrawnOfFate 24 Apr 2013 22:43 UTC
  −8 points
  Parent
  
  As far as I understand, if anything like objective morality existed, it would be a property of our physical reality, similar to fluid dynamics or the electromagnetic spectrum or the inverse square law that governs many physical interactions. The same laws of physics that will not allow you to fly to Mars on a balloon will not allow you to perform certain immoral actions (at least, not without suffering some severe and mathematically predictable consequences).
  
  Objective facts, in the sense of objectively true statements, can be derived from other objetive facts. I don’t know why you think some separate ontlogical category is cagtegory is required. I also don’t know why you think the universe has to do the punishing. Morality is only of interest to the kind of agent that has values and lives in societies. Sanctions against moral lapses can be arranged at the social level, along with the inculcation of morality, debate about the subject, and so forth. Moral objectivism only supplies a good, non-arbnitrary epistemic basis for these social institutions. It doesn;t have to throw lightning bolts.