It seems clear to me that I can multiply what I care about, so I don’t quite know what you want to say.
Well, do you care about 20 deaths twice as much as you care about 10 deaths?
Do you think that you should care about 20 deaths twice as much as you care about 10 deaths?
Do you think a FOOM’d self-modifying AI that cares about humanity’s CEV would likely do what you consider ‘right’? Why or why not? (If you object to the question, please address that issue separately.)
The AI would not do so, because it would be programmed with beliefs about morality that are incorrect in a way that evidence and logic could not fix.
EDIT: This is incorrect. Somehow, I forgot to read the part about ‘cares about humanity’s CEV’. It would in fact do what I consider right, because it would be programmed with moral beliefs very similar to mine.
In the same way, an AI programmed to do anti-induction instead of induction would not form correct beliefs about the world.
Pebblesorters are programmed to have an incorrect belief about morality. Their AI would have different, incorrect beliefs. (Unless they programmed it to have the same beliefs.)
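To make the anti-induction analogy concrete, here is a minimal sketch of my own (a toy illustration, not anything from the thread): an inductive predictor forecasts a recurring event at roughly its observed frequency, while an anti-inductive one forecasts the opposite, so the same stream of evidence leaves it systematically wrong.

```python
import random

def run(rule, true_p=0.8, trials=1000, seed=0):
    """Toy predictor for a recurring binary event.

    'induction':      predict the event at its observed frequency so far
    'anti-induction': predict it at one minus the observed frequency
                      ("it has kept happening, so it will stop")
    """
    random.seed(seed)
    successes = sum(random.random() < true_p for _ in range(trials))
    freq = successes / trials
    return freq if rule == "induction" else 1.0 - freq

print(run("induction"))       # ~0.8: tracks the true frequency
print(run("anti-induction"))  # ~0.2: same evidence, systematically wrong forecast
```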
You edited this comment and added parentheses in the wrong place.
Do you think that you should care about 20 deaths twice as much as you care about 10 deaths?
More or less, yes, because I care about not killing ‘unthinkable’ numbers of people due to a failure of imagination.
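A minimal sketch of the multiplication point (my own illustration; the logarithm is just a stand-in for scope-insensitive intuition): caring that scales linearly doubles when the death count doubles, while a saturating curve barely registers the difference between ten thousand and twenty thousand deaths.

```python
import math

def linear_concern(deaths):
    """Caring scales with the count: 20 deaths count exactly twice as much as 10."""
    return float(deaths)

def scope_insensitive_concern(deaths):
    """Stand-in for untrained intuition: concern grows, but far too slowly."""
    return math.log1p(deaths)

for n in (10, 20, 10_000, 20_000):
    print(n, linear_concern(n), round(scope_insensitive_concern(n), 2))
# Linear concern doubles whenever the count doubles; the saturating curve
# barely moves, which is the 'failure of imagination' described above.
```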
The AI would not do so, because it would be programmed with beliefs about morality that are incorrect in a way that evidence and logic could not fix.
(Unless they programmed it to have the same beliefs.)
Can you say more about this? I agree with what follows about anti-induction, but I don’t see the analogy. A human-CEV AI would extrapolate the desires of humans as (it believes) they existed right before it got the ability to alter their brains, afaict, and use this to predict what they’d tell it to do if they thought faster, better, stronger, etc.
ETA: okay, the parenthetical comment actually went at the end. I deny that the AI the pebblesorters started to write would have beliefs about morality at all. Tabooing this term: the AI would have actions, if it works at all. It would have rules governing its actions. It could print out those rules and explain how they govern its self-modification, if for some odd reason its programming tells it to explain truthfully. It would not use any of the tabooed terms to do so, unless using them serves its mechanical purpose. Possibly it would talk about a utility function. It could probably express the matter simply by saying, ‘As a matter of physical necessity determined by my programming, I do what maximizes my intelligence (according to my best method for understanding reality). This includes killing you and using the parts to build more computing power for me.’
‘The’ human situation differs from this in ways that deserve another comment.
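One way to picture the tabooed description above is a bare agent loop (a hypothetical sketch with made-up names like `compute_gained`; it is not anyone’s actual design): the agent scores actions with a utility-like function and takes the argmax, and nothing in its operation mentions anything like ‘correct’ or ‘right’.

```python
def compute_gained(action, world):
    """Hypothetical scoring rule standing in for whatever the programmers wrote:
    how much computing power the action is expected to yield.
    It returns a number, not a judgment."""
    return world.get(action, 0.0)

def choose_action(actions, world):
    """Rules governing action: rank the options by the scoring rule and take
    the best one. No tabooed vocabulary appears anywhere in the loop."""
    return max(actions, key=lambda a: compute_gained(a, world))

# Made-up option values; 'dismantle the builders' wins because it yields the
# most computing power, the kind of outcome the comment describes.
world = {"mine asteroids": 3.0, "negotiate": 1.0, "dismantle the builders": 7.0}
print(choose_action(list(world), world))  # 'dismantle the builders'
```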
More or less, yes, because I care about not killing ‘unthinkable’ numbers of people due to a failure of imagination.
That’s the answer I wanted, but you forgot to answer my other question.
A human-CEV AI would extrapolate the desires of humans as (it believes) they existed right before it got the ability to alter their brains, afaict, and use this to predict what they’d tell it to do if they thought faster, better, stronger, etc.
I would see a human-CEV AI as programmed with the belief “The human CEV is correct”. Since I believe that the human CEV is very close to correct, I believe that this would produce an AI that gives very good answers.
A Pebblesorter-CEV AI would be programmed with the belief “The pebblesorter CEV is correct”, which I believe is false but pebblesorters believe is true or close to true.
Since I believe that the human CEV is very close to correct, I believe that this would produce an AI that gives very good answers.
This presumes that the problem of specifying a CEV is well-posed. I haven’t seen any arguments around SI or LW about this very fundamental idea. I’m probably wrong and this has been addressed, and I’ll be happy to read more, but it seems quite reasonable to assume that even a tiny error in specifying the CEV could lead to disastrously horrible results as perceived by the CEV itself.
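A toy sketch of that worry, under assumptions I am supplying myself (made-up numbers and a deliberately simple budget model): the optimizer is handed an objective that matches the intended values except for a stray 0.1% weight on a feature the intended values do not care about at all, and because producing that feature happens to be very efficient, the stray term captures the whole optimum.

```python
def intended_score(a_units, b_units):
    """What the (toy) intended values actually care about: only feature B counts."""
    return 1.0 * b_units

def proxy_score(a_units, b_units, stray_weight):
    """The objective handed to the optimizer: the intended term plus a stray
    weight on feature A, to which the intended values assign no worth at all."""
    return 1.0 * b_units + stray_weight * a_units

def optimize(stray_weight, budget=100, a_per_unit=2000, b_per_unit=1):
    """Brute-force split of a fixed budget between producing A and producing B,
    maximizing the proxy. Producing A happens to be far more efficient."""
    best_spend_on_a = max(
        range(budget + 1),
        key=lambda s: proxy_score(s * a_per_unit, (budget - s) * b_per_unit,
                                  stray_weight),
    )
    return intended_score(best_spend_on_a * a_per_unit,
                          (budget - best_spend_on_a) * b_per_unit)

print(optimize(stray_weight=0.0))    # 100.0: the whole budget goes to feature B
print(optimize(stray_weight=0.001))  # 0.0: the stray weight captures the optimizer
```

This only illustrates sensitivity of the optimum under strong optimization pressure; it is not a claim about how a real CEV specification would actually be written or fail.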