Houshalter comments on Open thread, Oct. 10 - Oct. 16, 2016

Houshalter 25 Oct 2016 6:11 UTC
0 points

No, I’m asking you to specify it. My point is that you can’t build X if you can’t even recognize X.

And I don’t agree with that. I’ve presented some ideas on how an FAI could be built, and how CEV would work. None of them require “recognizing” FAI. What would it even mean to “recognize” FAI, except to see that it values the kinds of things we value and makes the world better for us?

Learning what humans want is pretty easy. However it’s an inconsistent mess which involves many things contemporary people find unsavory. Making it all coherent and formulating a (single) policy on the basis of this mess is the hard part.

I’ve written about one method to accomplish this, though there may be better methods.

Why would CEV eliminate things I find negative? This is just a projected typical mind fallacy. Things I consider positive and negative are not (necessarily) things many or most people consider positive and negative.

Humans are 99.999% identical. We have the same genetics, the same brain structures, and mostly the same environments. The only reason this isn’t obvious is that we spend almost all our time focusing on the differences between people, because that’s what’s useful in everyday life.

I should expect CEV to eliminate some things I believe are positive and impose some things I believe are negative.

That may be the case, but that’s still not a bad outcome. In the example I used, the values dropped from ISIS members were dropped for two reasons: either they were based on false beliefs, or they hurt other people. If you have values based on false beliefs, you should want them to be eliminated. If you have values that hurt other people, then it’s only fair that they be eliminated. Or else you risk the values of people that want to hurt you.

Later you say that CEV will average values. I don’t have average values.

Well I think it’s accurate, but it’s somewhat nonspecific. Specifically, CEV will find the optimal compromise of values. The values that satisfy the most people the most amount. Or at least dissatisfy the fewest people the least. See the post I just linked for more details, on one example of how that could be implemented. That’s not necessarily “average values”.

In the worst case, people with totally incompatible values will just be allowed to go separate ways, or whatever the most satisfying compromise is. Muslims live on one side of the dyson sphere, Christians on the other, and they never have to interact and can do their own thing.

You are essentially saying that religious people are idiots and if only you could sit them down and explain things to them, the scales would fall from their eyes and they will become atheists. This is a popular idea, but it fails real-life testing very very hard.

My exact words were “If they were more intelligent, informed, and rational… If they knew all the arguments for and against...” Real world problems of persuading people don’t apply. Most people don’t research all the arguments against their beliefs, and most people aren’t rational and seriously consider the hypothesis that they are wrong.

For what it’s worth, I was deconverted like this. Not overnight by any means. But over time I found that the arguments against my beliefs were correct and I updated my belief.

Changing world views is really really hard. There’s no one piece of evidence or one argument to dispute. Religious people believe that there is tons of evidence of God. To them it just seems obviously true. From miracles, to recorded stories, to their own personal experiences, etc. It takes a lot of time to get at every single pillar of the belief and show its flaws. But it is possible. It’s not like Muslims were born believing in Islam. Islam is not encoded in genetics. People deconvert from religions all the time; entire societies have even done it.

In any case, my proposal does not require literally doing this. It’s just a thought experiment, to show that the ideal set of values is what you would choose if you had all the correct beliefs.
- Lumifer 25 Oct 2016 15:01 UTC
  0 points
  
  What would it even mean to “recognize” FAI
  
  It means that when you look at an AI system, you can tell whether it’s FAI or not.
  
  If you can’t tell, you may be able to build an AI system, but you still won’t know whether it’s FAI or not.
  
  I’ve written about one method to accomplish this
  
  I don’t see what voting systems have to do with CEV. The “E” part means you don’t trust what the real, current humans say, so making them vote on anything is pointless.
  
  Humans are 99.999% identical.
  
  That’s a meaningless expression without a context. Notably, we don’t have the same genes or the same brain structures. I don’t know about you, but it is really obvious to me that humans are not identical.
  
  ...false beliefs … it’s only fair …
  
  How do you know what’s false? You are a mere human, you might well be mistaken. How do you know what’s fair? Is it an objective thing, something that exists in the territory?
  
  The values that satisfy the most people the most amount.
  
  Right, so the fat man gets thrown under the train… X-)
  
  Muslims live on one side of the dyson sphere, Christians on the other
  
  Hey, I want to live on the inside. The outside is going to be pretty gloomy and cold :-/
  
  Real world problems of persuading people don’t apply.
  
  LOL. You’re just handwaving then. “And here, in the difficult part, insert magic and everything works great!”
  - Houshalter 25 Oct 2016 20:42 UTC
    0 points
    
    It means that when you look at an AI system, you can tell whether it’s FAI or not.
    
    Look at it how? Look at its source code? I argued that we can write source code that will result in FAI, and you could recognize that. Look at the weights of its “brain”? Probably not, any more than we can look at human brains and recognize what they do. Look at its actions? Definitely, FAI is an AI that doesn’t destroy the world etc.
    
    I don’t see what voting systems have to do with CEV. The “E” part means you don’t trust what the real, current humans say, so making them vote on anything is pointless.
    
    The voting doesn’t have to actually happen. The AI can predict what we would vote for, if we had plenty of time to debate it. And you can get even more abstract than that and have the FAI just figure out the details of E itself.
    
    The point is to solve the “coherent” part. That you can find a set of coherent values from a bunch of different agents or messy human brains. And to show that mathematicians have actually extensively studied a special case of this problem, voting systems.
    
    That’s a meaningless expression without a context. Notably, we don’t have the same genes or the same brain structures. I don’t know about you, but it is really obvious to me that humans are not identical.
    
    Compared to other animals, compared to aliens, yes we are incredibly similar. We do have 99.99% identical DNA, and our brains all have the same structure with minor variations.
    
    How do you know what’s false?
    
    Did I claim that I did?
    
    How do you know what’s fair? Is it an objective thing, something that exists in the territory?
    
    I gave a precise algorithm for doing that actually.
    
    Right, so the fat man gets thrown under the train… X-)
    
    Which is the best possible outcome, vs killing 5 other people. But I don’t think these kinds of scenarios are realistic once we have incredibly powerful AI.
    
    LOL. You’re just handwaving then. “And here, in the difficult part, insert magic and everything works great!”
    
    I’m not handwaving anything… There is no magic involved at all. The whole scenario of persuading people is counterfactual and doesn’t need to actually be done. The point is to define more exactly what CEV is. It’s the values you would want if you had the correct beliefs. You don’t need to actually have the correct beliefs, to give your CEV.
    - Lumifer 26 Oct 2016 14:28 UTC
      0 points
      Parent
      I think we have, um, irreconcilable differences and are just spinning wheels here. I’m happy to agree to disagree.