I disagree with all four of these claims.
They know each other, and so can predict each other’s CEV better than that of the whole of humanity
I believe the idea is that the AI will need to calculate the CEV, not the programmers (or it’s not CEV). And the AI will have a whole lot more statistical data to calculate the CEV of humanity than the CEV of individual contributors. Unless we’re talking uploaded personalities, which is a whole different discussion.
They can explicitly trade utility with each other and encode compromises into the utility function (so that it won’t be a pure CEV)
So you want hard-coded compromises that oppose and override what these people would collectively prefer to do if they were more intelligent, more competent and more self-aware?
I don’t think that’s a good idea at all.
The fact they were in this project together indicates a certain commonality of interests and ideas, and may serve to exclude memes that AI-builders would likely consider dangerous (e.g., fundamentalist religion)
Do you believe that fundamentalist religion would exist if fundamentalist religionists believed that their religion was false, and were also completely self-aware? Why do you think a CEV (which essentially means what people would want if they were as intelligent as the AI) would support a dangerous meme?
They have had the opportunity of excluding people they don’t like from participating in the project to begin with
I don’t think that the first 9,999 contributors get to vote on whether they’ll accept a donation from the 10,000th one. And unless you believe these 10,000 people can create and defend their own country BEFORE the AI gets created, I’d urge against being vocal about them excluding everyone else once developments in AI get close enough that the whole world starts paying serious attention.
Also, Putin and Ahmadinejad are much more likely than the average human to influence the first AI’s utility function, simply because they have a lot of money and power.
I believe the idea is that the AI will need to calculate the CEV, not the programmers (or it’s not CEV). And the AI will have a whole lot more statistical data to calculate the CEV of humanity than the CEV of individual contributors.
The programmers want the AI to calculate CEV(humanity) because they expect CEV(humanity) to be something they will like. We can’t calculate it ourselves, but that doesn’t mean we don’t know any of its (expected) properties.
However, we might be wrong about what CEV(humanity) will turn out to be like, and we may come to regret pre-committing to it. That’s why I think we should prefer CEV(contributors), because we can predict it better.
So you want hard-coded compromises that oppose and override what these people would collectively prefer to do if they were more intelligent, more competent and more self-aware?
What I meant was that they might oppose and override some of the input to the CEV from the rest of humanity.
However, it might also be a good idea to override some of your own CEV results, because we don’t know in advance what the CEV will be. We define the desired result as “the best possible extrapolation”, but our implementation may produce something different. It’s very dangerous to precommit the whole future universe to something you don’t yet know at the moment of precommitment (my point number 1). So, you’d want to include overrides for anything you’re certain should not be in the CEV.
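To make the “overrides” idea concrete, here is a toy sketch. It is purely illustrative: the names (HARD_VETOES, the property labels), and the idea of representing an extrapolated outcome as a set of labelled properties, are my own invented assumptions and not part of any actual CEV proposal.

```python
# Toy illustration only: treat an "extrapolated outcome" as a set of
# labelled properties and apply hard-coded vetoes before acting on it.

HARD_VETOES = {
    "forced_conversion",      # properties we are certain should never appear,
    "involuntary_suffering",  # no matter what the extrapolation says
}

def apply_overrides(extrapolated_outcome: set) -> set:
    """Strip vetoed properties from an extrapolated outcome.

    If the extrapolation is as good as hoped, the vetoes never fire
    and this is a no-op; if it is not, they act as a safety net.
    """
    violations = extrapolated_outcome & HARD_VETOES
    if violations:
        # A real design would presumably halt and ask for review here;
        # silently filtering is just the simplest stand-in.
        return extrapolated_outcome - violations
    return extrapolated_outcome

# Example with a made-up outcome that trips one veto:
print(apply_overrides({"cures_disease", "forced_conversion"}))
# prints {'cures_disease'}
```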
Do you believe that fundamentalist religion would exist if fundamentalist religionists believed that their religion was false, and were also completely self-aware?
This is a misleading question.
If you are certain that the CEV will decide against fundamentalist religion, you should not oppose precommitting the AI to oppose fundamentalist religion, because you’re certain this won’t change the outcome. If you don’t want to include this modification to the AI, that means you 1) accept there is a possibility of religion being part of the CEV, and 2) want to precommit to living with that religion if it is part of the CEV.
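Stated a bit more formally (my own restatement of the argument above, treating the CEV output as a set of endorsed outcomes, which is itself an assumption):

```latex
% Let R stand for "fundamentalist religion ends up endorsed by the CEV",
% and let p be your credence that this happens:
\[
  p \;=\; \Pr[\, R \in \mathrm{CEV} \,].
\]
% Being certain that the CEV rejects R means p = 0, so adding a hard
% override against R changes the final outcome with probability 0: the
% override is free. Refusing the override therefore only matters when
% p > 0, which is exactly points (1) and (2) above: you accept some
% chance that R is part of the CEV, and you precommit to living with it
% if so.
```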
Why do you think a CEV (which essentially means what people would want if they were as intelligent as the AI) would support a dangerous meme?
Maybe intelligent people like dangerous memes. I don’t know, because I’m not yet that intelligent. I do know, though, that having high intelligence doesn’t imply anything about goals or morals.
Broadly, this question is similar to “why do you think this brilliant AI-genie might misinterpret our request to alleviate world hunger?”
I don’t think that the first 9,999 contributors get to vote on whether they’ll accept a donation from the 10,000th one.
Why not? If they’re controlling the project at that point, they can make that decision.
And unless you believe these 10,000 people can create and defend their own country BEFORE the AI gets created, I’d urge against being vocal about them excluding everyone else once developments in AI get close enough that the whole world starts paying serious attention.
I’m not being vocal about any actual group I may know of that is working on AI :-)
I might still want to be vocal about my approach, and might want any competing groups to adopt it. I don’t have good probability estimates on this, but it might be the case that I would prefer CEV(competing group) to CEV(humanity).
That’s why CEV(contributors) is far better than CEV(humanity).
Why are you certain of this? At the very least it depends on who the person contributing money is.
“Humanity” includes a huge variety of different people. Depending on how the CEV is specified, it may also include an even wider variety of people who lived in the past and counterfactual people who might live in the future. And the CEV, as far as I know, is vastly underspecified right now: we don’t even have a good conceptual test that would tell us whether a given scenario is a probable outcome of CEV, let alone a generative way to calculate that outcome.
Saying that the CEV “will best please everyone” is just handwaving this aside. Precommitting the whole future lightcone to the result of a process we don’t know in advance is very dangerous, and very scary. It might be the best possible compromise between all humans, but it is not the case that all humans have equal input into the behavior of the first AI. I have not seen any good arguments claiming that implementing CEV(humanity) is a better strategy than just trying to build the first AI before anyone else does and then making it implement a narrower CEV.
Suppose that the first AI is fully general, and can do anything you ask of it. What reason is there for its builders, whoever they are, to ask it to implement CEV(humanity) rather than CEV(builders)?
In an idealized form, I agree with you.
That is, if I really take the CEV idea seriously as proposed, there simply is no way I can prefer CEV(me + X) to CEV(me)… if it turns out that I would, if I knew enough and thought about it carefully enough and “grew” enough and etc., care about other people’s preferences (either in and of themselves, as in “I hadn’t thought of that but now that you point it out I want that too”, or by reference to their owners, as in “I don’t care about that but if you do then fine let’s have that too,” for which distinction I bet there’s a philosophical term of art that I don’t know), then the CEV-extraction process will go ahead and optimize for those preferences as well, even if I don’t actually know what they are, or currently care about them; even if I currently think they are a horrible evil bad no-good idea. (I might be horrified by that result, but presumably I should endorse it anyway.)
This works precisely because the CEV-extraction process as defined depends on an enormous amount of currently-unavailable data in the course of working out the target’s “volition” given its current desires, including entirely counterfactual data about what the target would want if exposed to various idealized and underspecified learning/”growing” environments.
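One way to formalize that idealized claim (my own notation and assumptions; nothing here is part of any official CEV specification): suppose the idealized extractor, pointed at a group, returns an outcome maximizing that group’s fully extrapolated preferences.

```latex
% Assumption (mine): outcomes o range over a set O, and the idealized
% extractor pointed at a group G returns a maximizer of that group's
% fully extrapolated preference function U*_G.
\[
  \mathrm{CEV}(\mathrm{me}) \in \arg\max_{o \in O} U^{*}_{\mathrm{me}}(o),
  \qquad
  \mathrm{CEV}(\mathrm{me}+X) \in \arg\max_{o \in O} U^{*}_{\mathrm{me}+X}(o).
\]
% Any weight I would give, on full reflection, to X's preferences is by
% construction already folded into U*_me. So, judged by my own
% extrapolated preferences,
\[
  U^{*}_{\mathrm{me}}\bigl(\mathrm{CEV}(\mathrm{me})\bigr)
  \;\ge\;
  U^{*}_{\mathrm{me}}\bigl(\mathrm{CEV}(\mathrm{me}+X)\bigr),
\]
% i.e. under the idealization I cannot strictly prefer CEV(me + X) to
% CEV(me); at best the two coincide.
```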
That said, the minute we start talking instead about some actual realizable thing in the world, some approximation of CEV(me) computable by a not-yet-godlike intelligence, it stops being quite so clear that all of the above is true.
An approximate-CEV extractor might find things in your brain that I would endorse if I knew about them (given sufficient time and opportunity to discuss it with you and “grow” and so forth) but that it wasn’t able to actually compute based on just my brain as a target, in which case pointing it at both of us might be better (in my own terms!) than pointing it at just me.
It comes down to a question of how much we trust the seed AI that’s doing the extraction to actually solve the problem.
It’s also perhaps worth asking what happens if I build the CEV-extracting seed AI and point it at my target community and it comes back with “I don’t have enough capability to compute CEV for that community. I will have to increase my capabilities in order to solve that problem.”