evolution’s objective of “maximize inclusive genetic fitness” is quite simple, but it is still not represented explicitly because figuring out how actions affect the objective is computationally hard
This doesn’t seem like the bottleneck in many situations in practice. For example, a lot of young men feel like they want to have as much sex as possible, but not to father as many kids as possible. I’m not sure exactly what the reason is, but I don’t think it’s the computational difficulty of representing having kids vs. having sex, because humans already build a world model containing the concept of “my kids”.
It seems to me that one under-appreciated aspect of Inner Alignment is that, even if one had the one-true-utility-function-that-is-all-you-need-to-program-into-AI, this would not, in fact, solve the alignment problem, nor even the intent-alignment part. It would merely solve outer alignment (provided the utility function can be formalized).
Damn, yep I for one under-appreciated this for the past 12 years.
What else have people said on this subject? Do folks think that scenarios where we solve outer alignment most likely involve us not having to struggle much with inner alignment? Because fully solving outer alignment implies a lot of deep progress in alignment.
This doesn’t seem like the bottleneck in many situations in practice. For example, a lot of young men feel like they want to have as much sex as possible, but not father as many kids as possible. I’m not sure exactly what the reason is, but I don’t think it’s the computational difficulty of representing having kids vs. having sex, because humans already build a world model containing the concept of “my kids”.
In this case, I would speculate that the kids objective wouldn’t work that well because the reward is substantially delayed. The sex happens immediately; the kids arrive only after nine months. Humans tend to discount the future.
Also, how exactly would the kids objective even be implemented?
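To put a rough number on the delay point, here is a toy sketch (my own illustration, not anything from the discussion, using a made-up per-day discount factor) of how much weaker a nine-month-delayed reward looks under simple exponential discounting:

```python
# Toy illustration: how much a 9-month-delayed reward is worth relative to an
# immediate one under exponential discounting. The per-day discount factor
# gamma = 0.99 is an arbitrary, made-up value for illustration only.
gamma = 0.99       # hypothetical per-day discount factor
delay_days = 270   # roughly nine months

immediate_value = gamma ** 0            # 1.0
delayed_value = gamma ** delay_days     # ~0.066

print(f"immediate reward, discounted:       {immediate_value:.3f}")
print(f"9-month-delayed reward, discounted: {delayed_value:.3f}")
```

Under these made-up numbers the delayed signal retains only about 7% of its value, so a reward tied to the actual arrival of kids would shape behavior far more weakly than one tied to the act itself.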
What else have people said on this subject?
I believe that MIRI was aware of this problem for a long time, but that it didn’t have the nice, comparatively non-confused and precise handle of “Inner Alignment” until Evan published the ‘Risks from Learned Optimization’ paper. But I’m not the right person to say anything else about this.
Do folks think that scenarios where we solve outer alignment most likely involve us not having to struggle much with inner alignment? Because fully solving outer alignment implies a lot of deep progress in alignment.
Probably not. I think Inner Alignment is, if anything, the harder problem. It strikes me as reasonably plausible that Debate is a proposal that solves outer alignment, but very unlikely that it automatically solves Inner Alignment.
Hm, yeah, I guess the causal link between sex and babies (even between sex and visible pregnancy) is so stretched out in time that it’s tough to make a brain want to “make babies”.
But I don’t think the computational intractability of figuring out how actions affect inclusive genetic fitness is quite why evolution made such crude heuristics. If a brain understood that it was trying to maximize that quantity, I think it could figure out “have a lot of sex” as a heuristic on its own, without evolution hard-coding it in. And I think humans actually do have some in-brain goals about having more descendants, beyond just having more sex. So I think things like the pleasure of sex are just performance optimizations for what is already a mentally tractable challenge.
Thanks for the ELI12, much appreciated.
E.g., snakes quickly triggering a fear reflex is another one of those hard-coded heuristics.