David Scott Krueger (formerly: capybaralet) comments on Coherence arguments do not entail goal-directed behavior

David Scott Krueger (formerly: capybaralet) 17 Aug 2019 3:59 UTC
LW: 1 AF: 1
0
AF
we haven’t seen any examples of them trying to e.g. kill other processes on your computer so they can have more computational resources and play a better game.
It’s a good point, but… we won’t see examples like this if the algorithms that produce this kind of behavior take longer to produce the behavior than the amount of time we’ve let them run.
I think there are good reasons to view the effective horizon of different agents as part of their utility function. Given that, I think a lot of the risk we incur comes from humans acting as if we have short effective horizons. But I don’t think we *actually* do have such short horizons. In other words, our revealed preferences are more myopic than our considered preferences.
Now, one can say that this actually means we don’t care that much about the long-term future, but I don’t agree with that conclusion; I think we *do* care (at least, I do), but aren’t very good at acting as if we(/I) do.
Anyways, if you buy this line of argument about effective horizons, then you should be worried that we will easily be outcompeted by some process/entity that behaves as if it has a much longer effective horizon, so long as it also finds a way to make a “positive-sum” trade with us (e.g. “I take everything after 2200 A.D., and in the meanwhile, I give you whatever you want”).
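To make the effective-horizon argument concrete, here is a toy sketch. It assumes geometric discounting, which the comment above does not specify; the discount factors, the 180-year handover (roughly 2019 to 2200), the 1000-year window, and the payoff sizes are all hypothetical numbers chosen for illustration. The helper names `effective_horizon`, `discounted_value`, and `accepts_trade` are invented for this sketch.

```python
from typing import List


def effective_horizon(gamma: float) -> float:
    """Rough planning horizon of a geometric discounter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def discounted_value(rewards: List[float], gamma: float) -> float:
    """Present value of a reward stream, one entry per year."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def accepts_trade(gamma: float, handover: int = 180, years: int = 1000,
                  bonus: float = 1.0) -> bool:
    """True if the discounter prefers the trade to the status quo.

    Status quo: 1 unit of value every year.
    Trade: 1 + bonus units per year before `handover`, nothing afterwards.
    All numbers are hypothetical and chosen only for illustration.
    """
    status_quo = [1.0] * years
    trade = [1.0 + bonus] * handover + [0.0] * (years - handover)
    return discounted_value(trade, gamma) > discounted_value(status_quo, gamma)


if __name__ == "__main__":
    # Compare an agent acting on a short horizon with one acting on a long horizon.
    for gamma in (0.95, 0.999):
        print(gamma, round(effective_horizon(gamma)), accepts_trade(gamma))
```

Under these assumptions, the agent that behaves as if gamma = 0.95 (a horizon of about 20 years) accepts the trade, while the agent with gamma = 0.999 (a horizon of about 1000 years) rejects it. If the first gamma describes revealed preferences and the second considered preferences, the trade looks positive-sum in the moment and like a loss on reflection.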
===========================
I view the chess-playing algorithm as either *not* fully goal-directed, or somehow fundamentally limited in its understanding of the world or in its level of rationality. Intuitively, it seems easy to make agents that are ignorant or indifferent(/“irrational”) in such a way that they will only seek to optimize things within the ontology we’ve provided (in this case, of the chess game), instead of outside it (e.g. by seizing additional compute). However, our understanding of such things doesn’t seem mature… at least I’m not satisfied with my current understanding. I think Stuart Armstrong and Tom Everitt are the main people who’ve done work in this area, and their work on this stuff seems quite underappreciated.
- TurnTrout 17 Aug 2019 4:28 UTC
  LW: 4 AF: 2
  0
  AF Parent
  
  Intuitively, it seems easy to make agents that are ignorant or indifferent(/“irrational”) in such a way that they will only seek to optimize things within the ontology we’ve provided (in this case, of the chess game), instead of outside (i.e. seizing additional compute)
  
  It isn’t obvious to me that specifying the ontology is significantly easier than specifying the right objective. I have an intuition that ontological approaches are doomed. As a simple case, I’m not aware of any fundamental progress on building something that actually maximizes the number of diamonds in the physical universe, nor do I think that such a thing has a natural, simple description.
  - John_Maxwell 17 Aug 2019 9:35 UTC
    LW: 2 AF: 1
    0
    AF Parent
    Diamond maximization seems pretty different from winning at chess. In the chess case, we’ve essentially hardcoded a particular ontology related to a particular imaginary universe, the chess universe. This isn’t a feasible approach for the diamond problem.
    
    In any case, the reason this discussion is relevant, from my perspective, is that it’s related to the question of whether you could have a system which constructs its own superintelligent understanding of the world (e.g. using self-supervised learning) and engages in self-improvement (using some process analogous to e.g. neural architecture search) without being goal-directed. If so, you could presumably pinpoint human values/corrigibility/etc. in the model of the world that was created (using labeled data, active learning, etc.) and use that as an agent’s reward function. (Or just use the self-supervised learning system as a tool to help with FAI research/make a pivotal act/etc.)
    
    It feels to me as though the thing I described in the previous paragraph is amenable to the same general kind of ontological whitelisting approach that we use for chess AIs. (To put it another way, I suspect most insights about meta-learning can be encoded without referring to a lot of object level content about the particular universe you find yourself building a model of.) I do think there are some safety issues with the approach I described, but they seem fairly possible to overcome.
  - David Scott Krueger (formerly: capybaralet) 17 Aug 2019 5:16 UTC
    LW: 1 AF: 1
    0
    AF Parent
    I strongly agree.
    I should’ve been more clear.
    I think this is a situation where our intuition is likely wrong.
    This sort of thing is why I say “I’m not satisfied with my current understanding”.
- John_Maxwell 17 Aug 2019 9:11 UTC
  LW: 2 AF: 1
  0
  AF Parent
  
  we won’t see examples like this if the algorithms that produce this kind of behavior take longer to produce the behavior than the amount of time we’ve let them run.
  
  Are you suggesting that Deep Blue would behave in this way if we gave it enough time to run? If so, can you explain the mechanism by which this would occur?
  
  I think Stuart Armstrong and Tom Everitt are the main people who’ve done work in this area, and their work on this stuff seems quite underappreciated.
  
  Can you share links?
  - David Scott Krueger (formerly: capybaralet) 19 Aug 2019 4:58 UTC
    LW: 3 AF: 2
    0
    AF Parent
    I don’t know how Deep Blue worked. My impression was that it didn’t use learning, so the answer would be no.
    A starting point for Tom and Stuart’s work: https://scholar.google.com/scholar?rlz=1C1CHBF_enCA818CA819&um=1&ie=UTF-8&lr&cites=1927115341710450492
    - David Scott Krueger (formerly: capybaralet) 21 Aug 2019 2:48 UTC
      1 point
      0
      Parent
      BoMAI is in this vein as well (https://arxiv.org/pdf/1905.12186.pdf).