Would a FAI reward us for helping create it?

TwistingFingersDec 30, 2011, 7:33 AM

−11 points

We expect that post-singularity there will still be limited resources in the form of available computational resources until heat death.

Those resources do not necessarily need to be allocated fairly. In fact, I would guess that if they were allocated unfairly the most like beneficiaries would be those people that helped contribute to the creation of a friendly AI.

Now for some open questions:

What probability distribution of extra resources do you expect with respect to various possible contributions to the creation of friendly AI?

Would donating to the SIAI suffice for acquiring these extra resources?

TwistingFingersDec 30, 2011, 7:33 AM

−11 points

22 comments1 min readLW link Archive

wedrifid Dec 30, 2011, 9:47 AM
5 points


Would a FAI reward us for helping create it?

Iff we program it to.

Trite but true. This isn’t a question about fundamental behavior of AIs. It’s a question of what the preferences the GAI creators wanted to impart on their AI and how well they managed to implement them. An AI that rewards to some degree could qualify as friendly but it doesn’t seem to be a requirement.

Here’s another question: If a group of people cooperated to save you and your species from near certain death and gave you and those dear to you an unbounded life of general awesomeness would you reward them? If so then an FAI may just well reward them too. If most people would reward in that circumstance then an FAI could plausibly also reward. But I don’t pretend to know what people’s extrapolated volition looks like or how the most likely FAI would be implemented.
- Vladimir_Nesov Dec 30, 2011, 10:35 AM
  10 points
  Parent
  
  Compare:
  
  Q: Would a calculator answer “59” if asked “7*8=”?
  A: Iff we program it to. Trite but true. This isn’t a question about fundamental behavior of calculators.
  
  “FAI” is a rather specific kind of program, and it won’t do any given thing just because its programmers wanted it to, its behavior isn’t controlled by what its programmers want, not in any reasonably direct way; just as the correct answer to what 7*8 is, isn’t controlled by what calculator’s designers want. If it does answer “7*8=59″, it’s not a calculator.
  - wedrifid Dec 30, 2011, 10:52 AM
    −3 points
    Parent
    
    
    Compare:
    
    Q: Would a calculator answer “59” if asked “7*8=”? A: Iff we program it to. Trite but true. This isn’t a question about fundamental behavior of calculators.
    
    Comparison result: NOT EQUAL. (For multiple reasons, come to think of it. Those being multiple results, parameterisation, and currently being ambiguously specified.)
    
    “FAI” is a rather specific kind of program
    
    My comment rather clearly assumed for that and further asserted that:
    
    An AI that rewards to some degree could qualify as friendly but it doesn’t seem to be a requirement.
    
    That is, there is a class of artificial intelligence algorithms which can be considered ‘friendly’ and within that class there are algorithms that would reward and other algorithms which would not reward. This is in stark contrast to other behaviors which could be output by algorithms which would necessarily exclude them from being in the class ‘friendly’ - such as torturing or killing anyone I cared about.
    - Vladimir_Nesov Dec 30, 2011, 11:10 AM
      2 points
      Parent
      
      My point is that specific behaviors is not the kind of thing that we can make decisions about in programming a FAI, so I don’t see how “iff we program it to” applies to a question of plausibility of a specific behavior. Rather, we can talk of which behaviors seem more or less plausible given what abstract properties the idea of “FAI” assumes, and depending on other parameters that influence a particular variant of its implementation (such as whether it optimizes human or chimp values). So on that level, it’s not plausible that FAI would start torturing people or maximizing paperclips, and these properties are not within variation of what the concept includes.
      
      there is a class of artificial intelligence algorithms which can be considered ‘friendly’ and within that class there are algorithms that would reward and other algorithms which would not reward.
      
      “Things that are mostly Friendly” is a huge class in which humanly constructible FAIs are a tiny dot (I expect we can either do a perfect job or none at all, while it’s theoretically but not humanly possible to create almost-perfect-but-not-quite FAIs). I’m talking about that dot, and I expect within that dot, the answer to this question is determined one way or the other, and we don’t know which. Is it actually the correct decision to “reward FAI’s creators”? If it is, FAI does it, if it’s not, FAI doesn’t do it. Whether programmers want it to be done doesn’t plausibly influence whether it’s the correct thing to do, and FAI does the correct thing, or it’s not a FAI.
      
      (More carefully, it’s not even clear what the question means, since it compares counterfactuals, and there is still no reliable theory of counterfactual reasoning. Like, “What do you mean, if we did that other thing? Look at what actually happened.” More usefully, the question is probably wrong in the sense that it poses a false dilemma, assumes things some of which will likely break.)
      What links here?
      Vladimir_Nesov's comment on Would a FAI reward us for helping create it? by TwistingFingers (Dec 30, 2011, 11:29 AM; 0 points)
      - wedrifid Dec 30, 2011, 11:20 PM
        1 point
        Parent
        
        
        My point is that specific behaviors is not the kind of thing that we can make decisions about in programming a FAI, so I don’t see how “iff we program it to” applies to a question of plausibility of a specific behavior.
        
        There is more than one way to program an FAI—see for example CEV which is currently ambiguous. There are also different individuals or groups of individuals which an AI can be friendly to and still qualify as “Friendly Enough” to warrant the label. It is likely that the actual (and coherently extrapolatable) preferences of humans differ with respect to whether rewarding AI-encouragers is a good thing.
        
        “Things that are mostly Friendly” is a huge class in which humanly constructible FAIs are a tiny dot (I expect we can either do a perfect job or none at all, while it’s theoretically but not humanly possible to create almost-perfect-but-not-quite FAIs). I’m talking about that dot, and I expect within that dot, the answer to this question is determined one way or the other, and we don’t know which.
        
        I’m pleasantly surprised. It seems that we disagree with respect to actual predictions about the universe rather than the expected, and more common “just miscommunication/responding to a straw man”. Within that dot the answer is not determined!
        
        Whether programmers want it to be done doesn’t plausibly influence whether it’s the correct thing to do, and FAI does the correct thing, or it’s not a FAI.
        
        I’m familiar with the point—and make it myself rather frequently. It does not apply here—due to the aforementioned rejection of the “determined within the dot” premise.
      - Will_Newsome Jan 3, 2012, 12:14 PM
        0 points
        Parent
        
        How likely do you think it is that all humanly-buildable AGIs converge on whatever FAI converges on in less time than it takes for a typical black hole to evaporate? (Eghggh. Time breaks down around singularities (at least from a human perspective) so I can’t phrase this right, but maybe you get my gist.)
    - Vladimir_Nesov Dec 30, 2011, 11:29 AM
      0 points
      Parent
      
      (Just finished updating my reply, hopefully resolving some ambiguities present in its original form.)
- timtyler Dec 30, 2011, 12:54 PM
  0 points
  Parent
  
  
  Would a FAI reward us for helping create it?
  
  Iff we program it to.
  
  Trite but true. [...]
  
  This question is perhaps best reimagined by asking whether this is a universal instrumental value.
  
  Then the answer is, fairly clearly, yes.
[ ]

[deleted]
- ArisKatsaris Dec 30, 2011, 7:53 PM
  3 points
  Parent
  
  So you decided to bring it up yourself, seriously what the hell? Isn’t that like pushing someone over a cliff while saying “Careful, you were getting near the edge”?
Manfred Dec 30, 2011, 8:10 AM
3 points

Nah, I don’t think there’s any percentage in acting like that for the AI. You’ll just have to do it for the not-getting-turned-into -smileyfaces :P
[deleted]Dec 30, 2011, 8:49 AM
2 points

You’re getting into basilisk territory here. It probably makes no difference, at least at this point in time, when nobody has any clue what the most probable superintelligences will look like, and what their exact policies will be on this sort of thing. Although I guess we can assume that a superintelligence will use whatever policy has the highest likelihood of making itself come into existence, which as far as I can tell, is not necessarily the same as whatever policy has the highest likelihood of getting you to contribute to its creation.

Be thankful for that and pre-commit to always refuse extortion, before anyone does figure this out.
Shmi Dec 30, 2011, 8:07 AM
2 points


We expect that post-singularity there will still be limited resources in the form of available computational resources until heat death.

Personally, I expect the FAI to give a baby universe to everyone who wants one, so the question is moot.

If not, I do not expect the FAI to care about past contributions, since its goal would be to maximize something like integral of (fun*population) over time, so the people with the highest fun/resource ratio would be rewarded, most likely those with the lowest IQ, as they would be happy to be injected with the fun drug and kept in suspended animation for as long as possible.
- orthonormal Dec 30, 2011, 6:00 PM
  5 points
  Parent
  
  
  its goal would be to maximize something like integral of (fun*population) over time
  
  That’s not what LW refers to as an FAI, but instead a failed FAI. See posts like this one and this one, and this wiki entry.
  - Shmi Dec 30, 2011, 9:04 PM
    −1 points
    Parent
    
    I mean it in this sense.
    - orthonormal Dec 30, 2011, 9:24 PM
      6 points
      Parent
      
      I would bet US$100 that, if asked, Eliezer would say that
      
      the people with the highest fun/resource ratio would be rewarded, most likely those with the lowest IQ, as they would be happy to be injected with the fun drug and kept in suspended animation for as long as possible.
      
      shows a complete misinterpretation of Fun Theory.
      
      I’m not dismissing the possibility of your scenario, just pointing out that SIAI is explicitly excluding that type of outcome from their definition of “Friendly”.
      - Shmi Dec 30, 2011, 9:59 PM
        −1 points
        Parent
        
        
        I’m not dismissing the possibility of your scenario, just pointing out that SIAI is explicitly excluding that type of outcome from their definition of “Friendly”.
        
        Only under the unlimited resources assumption, which is not the case here.
        Eliezer Yudkowsky Oct 8, 2012, 10:04 AM
        1 point
        Parent
        
        I am explicitly calling that unFriendly given bounded resources.
- wedrifid Dec 30, 2011, 9:37 AM
  3 points
  Parent
  
  
  Personally, I expect the FAI to give a baby universe to everyone who wants one, so the question is moot.
  
  I don’t know what I expect but that is certainly what I want it to do.
James_Miller Dec 30, 2011, 8:21 AM
1 point


If they were allocated unfairly the most like beneficiaries would be those people that helped contribute to the creation of a friendly AI.

No, fairly.
- endoself Dec 30, 2011, 7:13 PM
  0 points
  Parent
  
  I think they meant ‘equally’.
Estarlio Dec 30, 2011, 7:34 PM
0 points

You made a certain prediction about the AI’s likely behaviour, and you either did or didn’t contribute to the AI’s creation based on that. However, whether or not it rewards you wont change that prediction, nor will it change whether you’re the sort of thing that will act in a certain manner based on that prediction.
Morendil Dec 30, 2011, 9:37 AM
0 points


I would guess that if they were allocated unfairly the most like beneficiaries would be those people that helped contribute

Why?