The theorem is consistent with the aliens causing trouble any finite number of times. But each time they cause the agent to do something weird their model loses some probability, so there will be some episode after which they stop causing trouble (if we manage to successfully run enough episodes without in fact having anything bad happen in the meantime, which is an assumption of the asymptotic arguments).
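The weight-loss dynamic above can be illustrated with a toy Bayesian mixture (my construction for illustration, not the paper's actual setup): every episode in which the malign model's prediction differs from what actually happens, a Bayes update multiplies its posterior weight by a factor less than 1, so the number of episodes in which it retains any non-negligible share is logarithmic in the chosen threshold.

```python
# Toy illustration (not the paper's construction): a "malign" model loses
# a constant factor of posterior weight every episode in which it predicts
# differently from what actually happens, so it can only cause trouble a
# finite number of times. All numbers here are arbitrary illustrative values.

w_malign = 0.01   # hypothetical prior weight on the malign model
w_true = 0.99     # prior weight on the model matching reality
q = 0.5           # probability the malign model assigned to what actually
                  # occurred in a "trouble" episode (the true model assigns 1)

trouble_episodes = 0
# The malign model can sway the agent only while its posterior share is
# non-negligible; 1e-6 is an arbitrary cutoff for "negligible".
while w_malign / (w_malign + w_true) > 1e-6:
    w_malign *= q   # Bayes update after a mispredicted episode
    trouble_episodes += 1

print(trouble_episodes)  # 14 -- grows like log(1/threshold), not exponentially
```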
Thanks. Is there a way to derive a concrete bound on how long it will take for BoMAI to become “benign”, e.g., is it exponential or something more reasonable? (Although if even a single “malign” episode could lead to disaster, this may be only of academic interest.) Also, to comment on this section of the paper:
“We can only offer informal claims regarding what happens before BoMAI is definitely benign. One intuition is that eventual benignity with probability 1 doesn’t happen by accident: it suggests that for the entire lifetime of the agent, everything is conspiring to make the agent benign.”
If BoMAI can be effectively controlled by alien superintelligences before it becomes “benign”, that would suggest “everything is conspiring to make the agent benign” is misleading as far as reasoning about what BoMAI might do in the meantime.
(if we manage to successfully run enough episodes without in fact having anything bad happen in the meantime, which is an assumption of the asymptotic arguments)
Is this noted somewhere in the paper, or just implicit in the arguments? I guess what we actually need is either a guarantee that all episodes are “benign” or a bound on utility loss that we can incur through such a scheme. (I do appreciate that “in the absence of any other algorithms for general intelligence which have been proven asymptotically benign, let alone benign for their entire lifetimes, BoMAI represents meaningful theoretical progress toward designing the latter.”)
Is there a way to derive a concrete bound on how long it will take for BoMAI to become “benign”, e.g., is it exponential or something more reasonable?
The closest thing to a discussion of this so far is Appendix E, but I have not yet thought through this very carefully. When you ask if it is exponential, what exactly are you asking if it is exponential in?
When you ask if it is exponential, what exactly are you asking if it is exponential in?
I guess I was asking if it’s exponential in anything that would make BoMAI impractically slow to become “benign”, so basically just using “exponential” as a shorthand for “impractically large”.
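For what it's worth, one toy way to make the question concrete (my framing, not the paper's analysis): if the malign model starts with 2^m times the relative posterior weight of the true model, can only cause trouble while it dominates, and loses a factor c < 1 of relative weight per trouble episode, then the number of trouble episodes is roughly m / log2(1/c), i.e. linear in the log of the prior-weight advantage rather than exponential.

```python
import math

def max_trouble_episodes(m: float, c: float) -> int:
    """Toy bound (not from the paper): smallest k with 2**m * c**k < 1,
    i.e. episodes needed to erase a 2**m relative-weight head start when
    each trouble episode multiplies the malign model's weight by c < 1."""
    return math.ceil(m / math.log2(1 / c))

print(max_trouble_episodes(40, 0.4))  # 31 -- linear in m, not exponential
```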
If BoMAI can be effectively controlled by alien superintelligences before it becomes “benign”, that would suggest “everything is conspiring to make the agent benign” is misleading as far as reasoning about what BoMAI might do in the meantime.
Agreed that would be misleading, but I don’t think it would be controlled by alien superintelligences.
I guess I was asking if it’s exponential in anything that would make BoMAI impractically slow to become “benign”, so basically just using “exponential” as a shorthand for “impractically large”.
I don’t think it is, thank you for pointing this out.