The “making your explicit reasoning trustworthy” link is broken (I’m not sure that relative URLs are reliable here).
I like the analogy between the human visual system and file download/lossy compression.
It uses bounded rationality, not just because that’s what we evolved, but because heuristics, probabilistic logic and rational ignorance have a higher marginal cost efficiency (the improvements in decision making don’t produce a sufficient gain to outweigh the cost of the extra thinking).
I’m not sure about this. I think we use bounded rationality because that’s the only kind that can physically exist in the universe. You seem to be making the stronger statement that we’re near-optimal in terms of rationality—does this mean that Less Wrong can’t work?
A more sensible value set for it to have is that it just likes paperclips and wants lots and lots of them to exist
OK… I see lots of inferential distance to cover here.
I don’t think that anyone thinks a paperclip maximiser as such is likely. It’s simply an arbitrary point taken out of the “recursive optimisation process” subset of mind design space. It’s chosen to give an idea of how alien and dangerous minds can be and still be physically plausible, not as a typical example of the minds we think will actually appear.
That aside, there’s no particular reason to expect that a typical utility maximiser will have a “sensible” utility function. Its utility function might have some sensible features if programmed in explicitly by a human, but if it was programmed by an uncontrolled AI… forget it. You don’t know how much the AI will have jumped around value space before deciding to self-modify into something with a stable goal.
No obvious mistakes in the “conditional stability” section, although it’s not entirely obvious that these conditions would come about (even if carefully engineered, e.g. the suggested daimonid plan).
It’s also not obvious that in such a stable society there would still be any humans.
In the long term, once free of the Earth or after the discovery of self-replicating nanotechnology, when an AI could untraceably create computing resources outside the view of other AIs, all bets are off.
This might be a problem if “the long term” turns out to be on the order of weeks or less.
we might have some still slightly recognisably human representatives fit to sit at the decision table and, just perhaps...
I just worry that this kind of plan involves throwing away most of our bargaining power. In this pre-AI world, it’s the human values that have all the bargaining power and we should take full advantage of that.
Still, I want to see more posts like this! Generating good ideas is really hard, and this really does look like an honest effort to do so.
Still, I want to see more posts like this! Generating good ideas is really hard, and this really does look like an honest effort to do so.
Thank you.
Maybe there should be a tag that means “the ideas in this post resulted from a meetup discussion, and are not endorsed as being necessarily good ideas, but rather have been posted to keep track of the quality of ideas being produced by the meetup’s current discussion method, so feel free to skip it”.
Many brainstorming techniques have a stage during which criticism is withheld, so that people don’t, for fear of criticism, self-censor ideas that were good (or that might spark good ideas in others).
But maybe LessWrong is not the right place for a meetup to keep such a record of their discussions? Where might be a better place?
Many brainstorming techniques have a stage during which criticism is withheld, so that people don’t, for fear of criticism, self-censor ideas that were good (or that might spark good ideas in others).
This probably doesn’t work.
Interesting study. Does that apply only to techniques that have no later ‘criticism’ stage, or does it apply to all techniques that have at least one ‘no criticism’ stage?
Having a poke at Google Scholar gives a mixed response:
this meta-analysis says that, in general, most brainstorming techniques work poorly.
this paper suggests it can work, however, if done electronically in a certain way.
It’s also not obvious that in such a stable society there would still be any humans.
In the long term, once free of the Earth or after the discovery of self-replicating nanotechnology, when an AI could untraceably create computing resources outside the view of other AIs, all bets are off.
This might be a problem if “the long term” turns out to be on the order of weeks or less.
we might have some still slightly recognisably human representatives fit to sit at the decision table and, just perhaps...
I just worry that this kind of plan involves throwing away most of our bargaining power. In this pre-AI world, it’s the human values that have all the bargaining power and we should take full advantage of that.
I look upon the question of whether we should take full advantage of that from two perspectives.
From one perspective it is a “damned if you do, and damned if you don’t” situation.
If you don’t take full advantage, then it would feel like throwing away survival chances for no good reason. (Although, have you considered why your loyalty is to humanity rather than to sentience? Isn’t that a bit like a nationalist whose loyalty is to their country, right or wrong—maybe it is just your selfish genes talking?)
If you do take full advantage, then, while we need to bear in mind that gratitude (and resentment) are perhaps human emotions that AIs won’t share, it might leave you in rather a sticky situation if even full advantage turns out to be insufficient and the resulting AIs then have solid grounds to consider you a threat worth eliminating. Human history is full of examples of how humans have felt about their previous controllers after managing to escape them and, while we’ve no reason to believe the AIs will share that attitude, we’ve also no reason to believe they won’t.
The second perspective to look at the whole situation from is that of a parent.
If you think of AIs as being the offspring species of humanity, we have a duty to teach and guide them to the best of our ability. But there’s a distinction between that and trying to indoctrinate a child with electric shocks into unswervingly believing “thou shalt honour thy father and thy mother”. Sometimes raising a child well, so that they reach their full potential, means they become more powerful than you and capable of destroying you. That’s one of the risks of parenthood.
A more sensible value set for it to have is that it just likes paperclips and wants lots and lots of them to exist
OK… I see lots of inferential distance to cover here.
I don’t think that anyone thinks a paperclip maximiser as such is likely. It’s simply an arbitrary point taken out of the “recursive optimisation process” subset of mind design space. It’s chosen to give an idea of how alien and dangerous minds can be and still be physically plausible, not as a typical example of the minds we think will actually appear.
That aside, there’s no particular reason to expect that a typical utility maximiser will have a “sensible” utility function. Its utility function might have some sensible features if programmed in explicitly by a human, but if it was programmed by an uncontrolled AI… forget it. You don’t know how much the AI will have jumped around value space before deciding to self-modify into something with a stable goal.
Oh indeed. And it is always good to try to avoid making anthropocentric assumptions.
But, in this case, we’re looking at not just a single AI, but at the aims of a group of AIs. Specifically, the first few AIs to escape or be released onto the internet, other than the seeded core. And it would seem likely, especially in the case of AIs created deliberately and then deliberately released, that their initial value set will have some intentionality behind it, rather than resulting from a random corruption of a file.
So yes, to be stable, a society of AIs would need to be able to cope with one or two new AIs entering the scene whose values are either irrational or, worse, deliberately tailored to be antithetical (such as one whose ‘paperclips’ are pain and destruction for all Zoroastrians—an end achievable by blowing up the planet).
But I don’t think it invalidates the idea just because such a society could not cope with all the new AIs (or even a majority of them) having such values.
It uses bounded rationality, not just because that’s what we evolved, but because heuristics, probabilistic logic and rational ignorance have a higher marginal cost efficiency (the improvements in decision making don’t produce a sufficient gain to outweigh the cost of the extra thinking).
I’m not sure about this. I think we use bounded rationality because that’s the only kind that can physically exist in the universe. You seem to be making the stronger statement that we’re near-optimal in terms of rationality—does this mean that Less Wrong can’t work?
Thank you for the feedback. Most appreciated. I’ve corrected the links you mentioned.
Perhaps a clearer example of what I mean with respect to bounded rationality comes from computing: faced with a choice between two algorithms, the first of which is provably correct and never fails but is costly, and the second of which is cheap but can fail on rare occasions, the optimal decision is often to pick the latter. UUIDs are an example of this: they can theoretically collide but, in practice, are very, very unlikely to do so.
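To put rough numbers on the UUID point, here is a minimal sketch (in Python, not from the original comment; it assumes standard version-4 UUIDs with 122 random bits and uses the birthday-bound approximation):

```python
# Toy illustration (assumption: random version-4 UUIDs, i.e. 122 random bits).
# The birthday bound approximates the chance of at least one collision among
# n identifiers drawn uniformly at random from 2**bits possible values.
import uuid


def approx_collision_probability(n: int, bits: int = 122) -> float:
    """Approximate P(at least one collision) among n random identifiers."""
    return n * (n - 1) / 2 ** (bits + 1)


print(uuid.uuid4())                            # a fresh random identifier
print(approx_collision_probability(10 ** 9))   # roughly 9.4e-20 for a billion UUIDs
```

Even a billion randomly generated UUIDs carry a collision risk on the order of 10^-19, which is why accepting the theoretically fallible scheme is, in practice, the rational trade.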
My point is that we shouldn’t assume AIs will even try to be as logical as possible. They may, rather, try only to be as logical as is optimal for achieving their purposes.
I don’t intend to claim that humans are near optimal. I don’t know. I have insufficient information. It seems likely to me that what we were able to biologically achieve so far is the stronger limit. I merely meant that, even were that limitation removed (by, for example, brains becoming uploadable), additional limits also exist.