It just seems likely, based on my understanding of what people like and approve of.
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do. I just meant it as a proof that there are less controversial principles that will block a lot of bullshit, not as a speculation about something that will actually end up in CEV.
religion, bigots, conservatism.
Why do you think it very likely these people’s CEV will contradict their current values and beliefs? Please consider that:
These values are based on false beliefs, inconsistent memes, and fear. None of those things will survive CEV. “If we knew more, thought faster, grew closer together, etc”.
We emphatically don’t know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV’s task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.
That would take a whole hell of a lot of certainty. I have nowhere near that level of confidence in anything I believe.
In these examples, you expect other people’s extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity’s CEV? Can you think of probable examples?
I think CEV will end up more like transhumanism than like Islam (which means I mostly accept transhumanism). I think I’m too far outside the morally-certain-but-ignorant-human reference class to make outside view judgements on this.
Not an equal amount, but many of my current values will be contradicted in CEV. I can only analogize to physics: I accept relativity, but expect it to be wrong. (I think my current beliefs are the closest approximation to CEV that I know of).
Likely candidates? That’s like asking “which of your beliefs are false”. All I can say is which are most uncertain; I can’t say which way they will go. I am uncertain about optimal romantic organization (monogamy, polyamory, ???). I am uncertain of the moral value of closed simulations. I am uncertain about the moral value of things like duplicating people, or making causally-identical models. I am quite certain that existing lives have high value. I am unsure about lives that don’t yet exist.
But you contradict yourself a little. If you really believed humanity’s CEV looked a lot like the transhumanists’ CEV, you would have no reason to consider the latter safer. If you (correctly) think it’s safer, that must be because you fear humanity’s CEV will contain some pretty repugnant conclusions that the transhumanists’ CEV won’t.
Not quite. Let’s imagine two bootstrap scenarios: some neo-enlightenment transhumanists, and some religious nuts. Even just the non-extrapolated values of the transhumanists will produce a friendly-enough AI that can (and will want to) safely research better value-extrapolation methods. Bootstrapping it with Islam will get you an angry punishing god that may or may not care about extrapolating further. Running the final, ideal CEV process with either seed should produce the same good value set, but we may not have the final ideal CEV process, and a dangerous genie running the process may not do safe things if you start it with the wrong seed.
I doubt any one human’s values are reflectively consistent. At the very least, every human’s values contradict one another in the sense that they compete among themselves for the human’s resources, and the human in different moods and at different points in time prefers to spend on different values.
Sorry, I made that too specific. I didn’t mean to imply that only the Islamists are inconsistent; I just meant them as an obvious example.
a non-CEV process which more directly relies on my and other people’s non-extrapolated preferences.
This is what I think would be good as a seed value system, so that the FAI can go and block x-risk and stop death and such without having to philosophize too much first. But I’d want the CEV philosophizing to be done eventually (ASAP, actually).
Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do.
Right according to whose values? The problem is precisely that people disagree pre-extrapolation about when it’s right to interfere, and therefore we fear their individual volitions will disagree even post extrapolation. I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
I think CEV will end up more like transhumanism than like Islam.
Again why? CEV is very much underspecified. To me, the idea that our values and ideals will preferentially turn out to be the ones all humans would embrace “if they were smarter etc” looks like mere wishful thinking. Values are arational and vary widely. If you specify a procedure (CEV) whereby they converge to a compatible set which also happens to resemble our actual values today, then it should be possible to give different algorithms (which you can call CEV or not, it doesn’t matter) which converge on other value-sets.
In the end, as the Confessor said, “you have judged: what else is there?” I have judged, and where I am certain enough about my judgement I would rather that other people’s CEV not override me.
Other than that I agree with you about using a non-CEV seed etc. I just don’t think we should later let CEV decide anything it likes without the seed explicitly constraining it.
Humanity’s CEV’s. Where by an unqualified “CEV” I take nyan to be referring to the CEV of humanity (“the Coherently Extrapolated Values of Humanity”). I assume he also means it as a normative assertion of the slightly-less-extrapolated kind that means something like “all properly behaving people of my tribe would agree, and if they don’t we may need to beat them with sticks until they do.”
The problem is precisely that people disagree pre-extrapolation about [when it’s right to interfere], and therefore we fear their individual volitions will disagree even post extrapolation.
And the bracketed condition is generalisable to all sorts of things—including those preferences that we haven’t even considered the possibility of significant disagreement about. Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.
I note that any assertion that “intervention is strictly not the right thing to do” that is not qualified necessarily implies a preference for the worst things that could possibly happen in an FAI-free world happening rather than a single disqualified intervention occurring. That means, for example, that rather than a minimalist intervention you think the ‘right’ behavior for the FAI is to allow everyone on the planet to be zapped by The Pacifier and constantly raped by pedophiles until they are 10, whereupon they are forced to watch repeats of the first season of Big Brother until they reach 20 and are zapped again, the process repeating until the heat death of the universe. That’s pretty bad but certainly not the worst thing that could happen. It is fairly trivially not “right” to let that happen if you can easily stop it.
Note indicating partial compatibility of positions: there can be reasons to advocate the implementation of ethical injunctions in a created AGI, but this still doesn’t allow us to say that non-intervention in a given extreme circumstance is ‘right’.
Partially replacing one’s own preferences with preferences that are not one’s own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly ‘right’.
That’s exactly what I think. And especially if you precommit to the values output by a certain process before the process is actually performed, and can’t undo it later.
I note that any assertion that “intervention is strictly not the right thing to do” that is not qualified [...]
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI; it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of noninterference.
Gah. I had to read through many paragraphs of plot drivel, and all I came up with was “a device that zaps people, making them into babies, but that is reversible”. You should have just said so. (Not that the idea makes sense on any level.) Anyway, my above comment applies; people would not want it done to them and so would request the AI to prevent it.
I’m certainly not advocating absolute unqualified non-intervention. I wrote “a value of noninterference in certain matters”. Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.
Nonintervention doesn’t just mean non-intervention by the AI; it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of noninterference.
I like both these caveats. The scenario becomes something far more similar to what a CEV could plausibly be without the artificial hack. Horror stories become much harder to construct.
Off the top of my head, one potential remaining weakness is the inability to prevent a rival, less crippled AGI from taking over without interfering pre-emptively with an individual who is not themselves interfering with anyone. Getting absolute power requires intervention (or universally compliant subjects). Not getting absolute power means something else can get it, and outcomes are undefined.
That’s a good point. The AI’s ability to not interfere is constrained by its need to monitor everything that’s going on. Not just to detect someone building a rival AI, but to detect simpler cases like someone torturing a simulated person, or even just a normal flesh and bone child who wasn’t there nine months earlier. To detect people who get themselves into trouble without yet realizing it, or who are going to attack other people nonconsensually, and give these people help before something bad actually happens to them, all requires monitoring.
And while a technologically advanced AI might monitor using tools we humans couldn’t even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g., you might have to expose everything your megaton-of-computronium brain calculates to the AI, because such a brain is enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It does appear that universal surveillance is the cost of universally binding promises (you won’t be tortured no matter where you go and what you do in AI-controlled space). To reduce costs and increase trust, the AI should itself be transparent to everyone, and should be publicly and verifiably committed to being a perfectly honest and neutral party that never reveals the secrets and private information it monitors to anyone.
I’d like to note that all of this also applies to any FAI singleton that implements some policies that we today consider morally required—like making sure no-one is torturing simulated people or raising their baby wrong. If there’s no generally acceptable FAI behavior that doesn’t include surveillance, then all else is equal and I still prefer my AI to a pure CEV implementation.
And while a technologically advanced AI might monitor using tools we humans couldn’t even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g., you might have to expose everything your megaton-of-computronium brain calculates to the AI, because such a brain is enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.
It would seem that the FAI should only require exposure to the complete state of your brain at a point in time where it can reliably predict or prove that you are ‘safe’, using the kind of reasoning we often assume as a matter of course when describing UDT decision problems. Such an FAI would have information about what you are thinking—and in particular a great big class of what it knows you are not thinking—but not necessarily detailed knowledge of what you are thinking specifically.
For improved privacy the inspection could be done by a spawned robot AI programmed to self destruct after analyzing you and returning nothing but a boolean safety indicator back to the FAI.
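To make the information flow concrete, here is a minimal toy sketch in Python of the design described above, under the (obviously enormous) assumption that “read the complete brain state” and “prove safety” are available as black boxes. The names `BrainSnapshot`, `read_brain_state`, and `is_provably_safe` are all hypothetical; nothing like this exists. The only point the sketch illustrates is the privacy property: the disposable inspector alone handles the raw snapshot, and nothing but a single boolean ever flows back to the FAI.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BrainSnapshot:
    """Hypothetical stand-in for the complete state of a mind at one instant."""
    state: bytes


def spawned_inspector(
    read_brain_state: Callable[[], BrainSnapshot],
    is_provably_safe: Callable[[BrainSnapshot], bool],
) -> bool:
    """Toy model of the disposable inspector: it alone handles the raw
    snapshot, and nothing but a one-bit verdict is returned to the FAI."""
    snapshot = read_brain_state()
    try:
        return is_provably_safe(snapshot)
    finally:
        # "Self-destruct": discard the private data before returning;
        # only the boolean verdict leaves this function.
        del snapshot


if __name__ == "__main__":
    # Dummy stand-ins for the two black boxes (no claim either is implementable).
    fake_reader = lambda: BrainSnapshot(state=b"...")
    fake_prover = lambda snapshot: True  # placeholder "proof of safety"
    print(spawned_inspector(fake_reader, fake_prover))  # the FAI sees only: True
```

The design choice being modeled is information minimization: the FAI never sees the snapshot itself, only the verdict, which is what makes the “self-destructing inspector” more privacy-preserving than direct surveillance.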
Prediction has some disadvantages compared to constant observation:
Some physical systems are hard to model well with simplified approximations; even for the AI it might be necessary to use simulations composed of amounts of matter proportional to the thing simulated. If about one half of all matter has to be given over to the AI, instead of being used to create more people and things, that is a significant loss of opportunity. (Maybe the AI should tax people in simulation-resources, and those who opt in to surveillance have much lower taxes :-)
Simulations naturally have a rising risk of divergence over time. The AI is not literally Omega. It will have to come in and take periodic snapshots of everyone’s state to correct the simulations.
Simulations have a chance of being wrong. However small the chance, if the potential result is someone building a UFAI challenger, it might be unacceptable to take that chance.
OTOH, surveillance might be much cheaper (I don’t know for sure) and also allows destroying the evidence close to the site of observation once it is analyzed, preserving a measure of privacy.