skeptical_lurker comments on Will AGI surprise the world?

skeptical_lurker 22 Jun 2014 11:41 UTC
3 points

(Nobody was ever tempted to say, “But as the nuclear chain reaction grows in power, it will necessarily become more moral!”)

Apologies for asking an off-topic question that has certainly been discussed somewhere before, but if advanced decision theories are logically superior, then they are in some sense universal, in that a large subspace of mindspace will adopt them when the minds become intelligent enough (“Three worlds collide” seems to indicate that this is EYs opinion, at least for minds that evolved), then even a paperclip maximiser would assign some nontrivial component of its utility function to match humanity’s, iff we would have done the same in the counterfactual case that FAI came first (I think this does also have to assume that at least one party has a sublinear utility curve).

In this sense, it seems that as entities grow in intelligence, they are at least likely to become more cooperative/moral.

Of course, FAI is vastly preferable to an AI that might be partially cooperative, so I am not trying to diminish the importance of FAI. I’d still like to know whether the consensus opinion is that this is plausible.

Actually I think I know one place has been discussed before—Clippy promised friendliness and someone else promised him a lot of paperclips. But I don’t know of a serious discussion.
- Squark 22 Jun 2014 13:34 UTC
  6 points
  Parent
  Cooperative play (as opposed to morality) strongly depends on the position from which you’re negotiating. For example if the FAI scenario is much less likely (a priori) than a Clippy scenario, then there’s no reason for Clippy to make strong concessions.
  - XiXiDu 22 Jun 2014 14:40 UTC
    4 points
    Parent
    
    For example if the FAI scenario is much less likely (a priori) than a Clippy scenario, then there’s no reason for Clippy to make strong concessions.
    
    But if a “paperclips” maximizer, as opposed to “tables”, “cars”, or “alien sex toys” maximizer, is just one of many unfriendly maximizers, then maximizing “human values” is just one of many unlikely outcomes. In other words, you can’t just say that unfriendly AIs are more likely than friendly AIs when it comes to cooperation. Since the opposition between a paperclip maximizer and an “alien sex toy” maximizer is the same as the opposition between the former and an alien or human friendly AI. Since all of them want to maximize their opposing values. And even if there turns out to be a subset of values shared by some AIs, other groups could cooperate to outweigh their leverage.
    - skeptical_lurker 22 Jun 2014 17:42 UTC
      4 points
      Parent
      But since there is an exponentially huge set of random maximisers, the probability of each individual one is infinitesimal. OTOH, human values have a high probability density in mindspace because people are actually working towards it.
      - Viliam_Bur 22 Jun 2014 21:30 UTC
        2 points
        Parent
        
        human values have a high probability density in mindspace because people are actually working towards it
        
        Depends on how high probability density have humans (and alien life forms so similar to humans that they share our values) in mindspace. Maybe very low. Maybe a society ruled by intelligent ants according to their values would make us very unhappy… and on a cosmic scale, ants are our cousins; alien life should be much more different.
    - Squark 22 Jun 2014 15:56 UTC
      2 points
      Parent
      I don’t understand what point you’re trying to make. My point was that cooperative game theory doesn’t magically guarantee a UFAI will treat us nicely. It might work but only if there is a sufficiently substantial Everett branch with a FAI. The probability of that branch probably strongly depends on effort invested into FAI research.
      - skeptical_lurker 22 Jun 2014 17:39 UTC
        0 points
        Parent
        If the probability of FAI (and friendly uploads etc) is near zero, then we’re doomed either way. But even though I believe the probability of provably friendly AI coming first is <50% , its definitely not 10^-11!
        Squark 22 Jun 2014 17:54 UTC
        0 points
        Parent
        Fair enough, but there’s still an enormous incentive to work on FAI.
        skeptical_lurker 22 Jun 2014 19:15 UTC
        0 points
        Parent
        Of course, I was not trying to suggest otherwise.
  - TheAncientGeek 22 Jun 2014 15:18 UTC
    1 point
    Parent
    But the we might be able achieve AI safety in a relatively easy way by creating networks of interacting agents (including interacting with us)
    - Gunnar_Zarncke 22 Jun 2014 20:23 UTC
      2 points
      Parent
      I think you points out the conclusion of the assumption that
      
      Cooperative play strongly depends on the position from which you’re negotiating.
      
      but if you have multiple AIs then none of them is much stronger than the other.
    - Squark 22 Jun 2014 15:52 UTC
      0 points
      Parent
      Sorry, didn’t follow that. Can you elaborate?
      - TheAncientGeek 22 Jun 2014 16:14 UTC
        −1 points
        Parent
        I mean you don’t have to assume a singleton AI becoming very powerful very quickly. You can assume intelligence and friendliness developing in parallel.[and incrementally]
        MugaSofer 23 Jun 2014 13:46 UTC
        4 points
        Parent
        Hmm.
        
        Are you suggesting (super)intelligence would be a result of direct human programming, like Friendliness presumably would be?
        
        Or that Friendliness would be a result of self-modification, like SIAI is predicted to be ’round these parts?
        TheAncientGeek 23 Jun 2014 14:01 UTC
        0 points
        Parent
        I am talking about SIRI. I mean that human engineers are /will make multiple efforts at simultaneously improving AI and friendliness, and the ecosystem of AIs and AI users are/will select for friendliness that works.
        skeptical_lurker 22 Jun 2014 17:31 UTC
        0 points
        Parent
        Is the idea that the network develops at roughly the same rate, with no single entity undergoing a hard takeoff?
        TheAncientGeek 23 Jun 2014 14:05 UTC
        0 points
        Parent
        Yes.
        Squark 22 Jun 2014 16:46 UTC
        0 points
        Parent
        I what sense I don’t have to assume it? I think singleton AI happens to be a likely scenario and this has little to do with cooperation.
        TheAncientGeek 22 Jun 2014 17:07 UTC
        0 points
        Parent
        The more alternative scenarios there are, the less likelihood iof the MIRI scenario, and the less need for the MIRI solutiion.
        Squark 22 Jun 2014 17:56 UTC
        0 points
        Parent
        I don’t understand what it has to do with cooperative game theory.
  - skeptical_lurker 22 Jun 2014 14:36 UTC
    0 points
    Parent
    Also, cooperation seems to be at least a large component of morality, while some believe morality should be derived entirely from game theory.
    - Squark 22 Jun 2014 16:00 UTC
      3 points
      Parent
      I think this is a confusion. Game theory is only meaningful after you specified the utility functions of the players. If these utility functions don’t already include caring about other agents, the result is not what I’d call “morality”, it is just cooperation between selfish entities. Surely the evolutionary reasons for morality have to do with cooperative game theory, so what? The evolutionary reason for sex is reproduction, it doesn’t mean we shouldn’t be doing sex with condoms. Morality should not be derived from anything except human brains.
      - skeptical_lurker 22 Jun 2014 18:16 UTC
        0 points
        Parent
        I think this disagreement is purely a matter of semantics: ‘morality’ is an umbrella term which is often used to cover several distinct concepts, such as empathy, group allegiance and cooperation. In this case, the AI would be moral according to one dimension of morality, but not the others.
  - skeptical_lurker 22 Jun 2014 14:33 UTC
    0 points
    Parent
    True, but since the universe is quite big, even a share of 10^-11 (many orders of magnitude lower than P(FAI)) would be sufficient for humanity to have a galaxy while Clippy clips the rest of the universe. If the laws of physics permit an arbitrary large amount of whatever compromises utility, than all parties can achieve arbitrary large amounts of utility, providing the utility functions do not actually involve impeding the other parties activities.
    - Squark 22 Jun 2014 16:09 UTC
      0 points
      Parent
      It might be Clippy will let us have our galaxy. However “arbitrary large amounts of utility” sounds completely infeasible since for one thing the utility function has time discount and for another our universe is going to hit heat death (maybe it is escapable by destabilizing the vacuum into a state with non-positive cosmological constant but I’m not at all sure).
      - skeptical_lurker 22 Jun 2014 17:26 UTC
        0 points
        Parent
        Utility functions do not have to have a time discount—in fact, while it might be useful when dealing with inflation, I don’t see why there should be time discounts in general. As far as circumventing the second law of thermodynamics goes, there are several proposed methods, and given that humanity doesn’t have a complete understanding of physics I don’t think we can have a high degree of confidence one way or the other.
        Squark 22 Jun 2014 17:48 UTC
        0 points
        Parent
        
        Utility functions do not have to have a time discount...
        
        Without time discount you run into issues like the procrastination paradox and Boltzmann brains. UDT also runs into trouble since arbitrarily tight bounds on utility become impossible to prove due to Goedel incompleteness. If your utility function is unbounded it gets worse: your expectation values fail to converge (as exemplified by Pascal mugging).
        
        As far as circumventing the second law of thermodynamics goes, there are several proposed methods...
        
        Are there?
        
        ...given that humanity doesn’t have a complete understanding of physics I don’t think we can have a high degree of confidence one way or the other.
        
        Well, we can’t have complete confidence, but I think our understanding is not so bad. We’re missing a theory of heterogeneous nucleation of string theoretic vacua (as far as I know).
        skeptical_lurker 23 Jun 2014 6:45 UTC
        0 points
        Parent
        
        Without time discount you run into issues like the procrastination paradox and Boltzmann brains. UDT also runs into trouble since arbitrarily tight bounds on utility become impossible to prove due to Goedel incompleteness.
        
        Could you provide links? A google search turned up many different things, but I think you mean this procrastination paradox. Is it possible that one’s utility function does not discount, but given uncertainty about the future one should kind of behave as if it does? (e.g. I value life tomorrow exactly as much as I value life today, but maybe we should party hard now because we cannot be absolutely certain that we will survive until tomorrow)
        
        If your utility function is unbounded it gets worse: your expectation values fail to converge (as exemplified by Pascal mugging).
        
        What if I maximise measure. or maximise the probability of attaining an unbounded amount of utility?
        
        WRT circumventing the second law of thermodynamics, there is the idea of creating a basement universe to escape into, some form of hypercomputation that can experience subjective infinite time in a finite amount of real time, and time crystals which apparently is a real thing and not what powers the TARDIS.
        
        I think our understanding is not so bad. We’re missing a theory of heterogeneous nucleation of string theoretic vacua (as far as I know).
        
        AFAIK humanity does not know what the dark matter/ derk energy is that 96% of the universe is made of. This alone seems like a pretty big gap in our understanding, although you seem to know more physics than I do.
        Squark 24 Jun 2014 7:23 UTC
        0 points
        Parent
        
        Could you provide links?
        
        Boltzmann brains were discussed in many places, not sure what the best link would be. The idea is that when the universes reaches thermodynamic equilibrium, after humongous amount of time you get Poincare recurrences: that is, any configuration of matter will randomly appear an infinite number of times. This means there’s an infinite number of “conscious” brains coalescing from randomly floating junk, living for a brief moment and perishing. In the current context this calls for time discount because we don’t want the utility function to be dominated by the well being of those guys. You might argue we can’t influence their well being anyway but you would be wrong. According to UDT, you should behave as if you’re deciding for all agents in the same state. Since you have an infinite number of Boltzmann clones, w/o time discount you should be deciding as if you’re one of them. Which means, extreme short term optimization (since your chances to survive the next t seconds decline very fast with t). I wouldn’t bite this bullet.
        
        UDT is sort-of “cutting edge FAI research”, so there are no very good references. Basically, UDT works by counting formal proofs. If your utility function involves an infinite time span it would be typically impossible to prove arbitrarily tight bounds on it since logical sentences that contain unbounded quantifiers can be undecidable.
        
        ...I think you mean this procrastination paradox.
        
        Yes.
        
        Is it possible that one’s utility function does not discount, but given uncertainty about the future one should kind of behave as if it does?
        
        Well, you can try something like this but for one it doesn’t sound consistent with “all parties can achieve arbitrary large amounts of utility” because the latter requires arbitrarily high confidence about the future and for another I think you need unbounded utility to make it work which opens a different can of worms.
        
        What if I maximise measure. or maximise the probability of attaining an unbounded amount of utility?
        
        I don’t understand what you mean by maximizing measure. Regarding maximizing the probability of attaining an unbounded (actually infinite) amount of utility, well, that would make you a satisficing agent that only cares about the asymptotically far future (since apparently anything happening in a finite time interval only carries finite utility). I don’t think it’s a promising approach, but if you want to pursue it, you can recast it in terms of finite utility (by assigning new utility “1” when old utility is “infinity” and new utility “0” in other cases). Of course, this leaves you with the problems mentioned before.
        
        ...there is the idea of creating a basement universe to escape into...
        
        If I understand you correctly it’s the same as destabilizing the vacuum which I mentioned earlier.
        
        ...some form of hypercomputation that can experience subjective infinite time in a finite amount of real time...
        
        This is a nice fantasy but unfortunately strongly incompatible with what we know about physics. By “strongly” I mean that it would take a very radical update to make it work.
        
        ...and time crystals which apparently is a real thing and not what powers the TARDIS...
        
        To me it looks the journalist is misrepresenting what has actually been achieved. I think that this is a proposal for computing in extremely low temperatures, not for violating the second law of thermodynamics. Indeed the latter would require actual new physics which is not the case here at all.
        
        AFAIK humanity does not know what the dark matter/ dark energy is that 96% of the universe is made of. This alone seems like a pretty big gap in our understanding...
        
        You’re right, of course. There’s a lot we don’t know yet, what I meant is that we already know enough to begin discussing whether heat death is escapable because the answer might turn out to be universal or nearly universal across a very wide range of models.
        skeptical_lurker 27 Jun 2014 18:58 UTC
        0 points
        Parent
        
        Boltzmann brains were discussed in many places, not sure what the best link would be.
        
        Sorry, I should have been more precise—I’ve read about Boltzmann brains, I just didn’t realise the connection to UDT.
        
        In the current context this calls for time discount because we don’t want the utility function to be dominated by the well being of those guys.
        
        This is the bit I don’t understand—if these agents are identical to me, then it follows that I’m probably a Boltzmann brain too, as if I have some knowledge that I am not a Boltzmann brain, this would be a point of difference. In which case, surely I should optimise for the very near future even under old-fashioned causal decision theory. Like you, I wouldn’t bite this bullet.
        
        If your utility function involves an infinite time span it would be typically impossible to prove arbitrarily tight bounds on it since logical sentences that contain unbounded quantifiers can be undecidable.
        
        I didn’t know that—I’ve studied formal logic, but not to that depth unfortunately.
        
        I don’t understand what you mean by maximizing measure.
        
        I was meaning in the sense of measure theory. I’ve seen people discussing maximising the measure of a utility function over all future Everett branches, although from my limited understanding of quantum mechanics I’m unsure whether this makes sense.
        
        I don’t think it’s a promising approach, but if you want to pursue it, you can recast it in terms of finite utility (by assigning new utility “1” when old utility is “infinity” and new utility “0” in other cases).
        
        Yeah, I doubt this would be a good approach either, in that if it does turn out to be impossible to achieve unboundedly large utility I would still want to make the best of a bad situation and maximise the utility achievable by the finite amount of negentropy available. I imagine a better approach would be to add the satisfying function to the time-discounting function, scaled in some suitable manner. This doesn’t intuitively strike me as a real utility function, as its adding apples and oranges so to speak, but perhaps useful as a tool?
        
        If I understand you correctly it’s the same as destabilizing the vacuum which I mentioned earlier.
        
        Well, I’m approaching the limit of my understanding of physics here, but actually I was talking about alpha-point computation which I think may involve the creation of daughter universes inside black holes.
        
        This is a nice fantasy but unfortunately strongly incompatible with what we know about physics. By “strongly” I mean that it would take a very radical update to make it work.
        
        It does seem incompatible with e.g. the plank time, I just don’t know enough to dismiss it with a very high level of confidence, although I’m updating wrt your reply.
        
        Your reply has been very interesting, but I must admit I’m starting to get seriously point out that I’m starting to get out of my depth here, in physics and formal logic.
        Squark 1 Jul 2014 18:57 UTC
        2 points
        Parent
        
        This is the bit I don’t understand—if these agents are identical to me, then it follows that I’m probably a Boltzmann brain too...
        
        In UDT you shouldn’t consider yourself to be just one of your clones. There is no probability measure on the set of your clones: you are all of them simultaneously. CDT is difficult to apply to situations with clones, unless you supplement it by some anthropic hypothesis like SIA or SSA. If you use an anthropic hypothesis, Boltzman brains will still get you in trouble. In fact, some cosmologists are trying to find models w/o Boltzman brains precise to avoid the conclusion that you are likely to be a Boltzman brain (although UDT shows the effort is misguided). The problem with UDT and Goedel incompleteness is a separate issue which has no relation to Boltzman brains.
        
        I was meaning in the sense of measure theory. I’ve seen people discussing maximising the measure of a utility function over all future Everett branches...
        
        I’m not sure what you mean here. Sets have measure, not functions.
        
        I imagine a better approach would be to add the satisfying function to the time-discounting function, scaled in some suitable manner. This doesn’t intuitively strike me as a real utility function, as its adding apples and oranges so to speak, but perhaps useful as a tool?
        
        Well, you still got all of the abovementioned problems except divergence.
        
        ...actually I was talking about alpha-point computation which I think may involve the creation of daughter universes inside black holes.
        
        Hmm, baby universes are a possibility to consider. I thought the case for them is rather weak but a quick search revealed this. Regarding performing an infinite number of computations I’m pretty sure it doesn’t work.
        skeptical_lurker 4 Jul 2014 18:18 UTC
        0 points
        Parent
        
        CDT is difficult to apply to situations with clones, unless you supplement it by some anthropic hypothesis like SIA or SSA.
        
        While I can see why there intuitive cause to abandon the “I am person #2, therefore there are probably not 100 people” reasoning, abandoning “There are 100 clones, therefore I’m probably not clone #1″ seems to be simply abandoning probability theory altogether, and I’m certainly not willing to bite that bullet.
        
        Actually, looking back through the conversation, I’m also confused as to how time discounting helps in the case that one is acting like a Boltzmann brain—someone who knows they are a B-brain would discount quickly anyway due to short lifespan, wouldn’t extra time discounting make the situation worse? Specifically, if there are X B-brains for each ‘real’ brain, then if the real brain can survive more than X times as long as a B-brain, and doesn’t time discount, then the ‘real’ brain utility still is dominant.
        
        I’m not sure what you mean here. Sets have measure, not functions.
        
        I wasn’t being very precise with my wording—I meant that one would maximise the measure of whatever it is one values.
        
        Hmm, baby universes are a possibility to consider. I thought the case for them is rather weak but a quick search revealed this. Regarding performing an infinite number of computations I’m pretty sure it doesn’t work.
        
        Well, I have only a layman’s understanding of string theory, but if it were possible to ‘escape’ into a baby universe by creating a clone inside the universe, then the process can be repeated, leading to an uncountably infinite (!) tree of universes.
        Expand this thread
        Squark 8 Jul 2014 19:00 UTC
        0 points
        Parent
        
        While I can see why there intuitive cause to abandon the “I am person #2, therefore there are probably not 100 people” reasoning, abandoning “There are 100 clones, therefore I’m probably not clone #1″ seems to be simply abandoning probability theory altogether, and I’m certainly not willing to bite that bullet.
        
        I’m not entirely sure what you’re saying here. UDT suggests that subjective probabilities are meaningless (thus taking the third horn of the anthropic trilemma although it can be argued that selfish utility functions are still possible). “What is the probability I am clone #n” is not a meaningful question. “What is the (updated/posteriori) probability I am in a universe with property P” is not a meaningful question in general but has approximate meaning in contexts where anthropic considerations are irrelevant. “What is the a priori probability the universe has property P” is a question that might be meaningful but is probably also approximate since there is a freedom of redefining the prior and the utility function simultaneously (see this). The single fully meaningful type of question is “what is the expected utility I should assign to action A?” which is OK since it is the only question you have to answer in practice.
        
        Actually, looking back through the conversation, I’m also confused as to how time discounting helps in the case that one is acting like a Boltzmann brain—someone who knows they are a B-brain would discount quickly anyway due to short lifespan, wouldn’t extra time discounting make the situation worse?
        
        Boltzmann brains exist very far in the future wrt “normal” brains, therefore their contribution to utility is very small. The discount depends on absolute time.
        
        I wasn’t being very precise with my wording—I meant that one would maximise the measure of whatever it is one values.
        
        If “measure” here equals “probability wrt prior” (e.g. Solomonoff prior) then this is just another way to define a satisficing agent (utility equals either 0 or 1).
        
        Well, I have only a layman’s understanding of string theory, but if it were possible to ‘escape’ into a baby universe by creating a clone inside the universe, then the process can be repeated, leading to an uncountably infinite (!) tree of universes.
        
        Good point. Surely we need to understand these baby universes better.
        FeepingCreature 30 Jun 2014 0:45 UTC
        2 points
        Parent
        
        In the current context this calls for time discount because we don’t want the utility function to be dominated by the well being of those guys.
        
        This is the bit I don’t understand—if these agents are identical to me, then it follows that I’m probably a Boltzmann brain too, as if I have some knowledge that I am not a Boltzmann brain, this would be a point of difference. In which case, surely I should optimise for the very near future even under old-fashioned causal decision theory. Like you, I wouldn’t bite this bullet.
        
        I think Boltzmann brains in the classical formulation of random manifestation in vacuum are a non-issue, as neither can they benefit from our reasoning (being random, while reason assumes a predictable universe) nor from our utility maximization efforts (since maximizing our short-term utility will make it no more or less likely that a Boltzmann brain with the increased utility manifests).
- cousin_it 23 Jun 2014 16:36 UTC
  5 points
  Parent
  Just a historical note, I think Rolf Nelson was the earliest person to come up with that idea, back in 2007. Though it was phrased in terms of simulation warfare rather than acausal bargaining at first.