Rafael Harth comments on Inner Alignment: Explain like I’m 12 Edition

Rafael Harth 9 Aug 2020 9:14 UTC
16 points
This doesn’t seem like the bottleneck in many situations in practice. For example, a lot of young men feel like they want to have as much sex as possible, but not father as many kids as possible. I’m not sure exactly what the reason is, but I don’t think it’s the computational difficulty of representing having kids vs. having sex, because humans already build a world model containing the concept of “my kids”.
In this case, I would speculate that the kids objective wouldn’t work that well because the reward is substantially delayed. The sex happens immediately; the kids arrive only after 9 months. Humans tend to discount future rewards.
Also, how exactly would the kids objective even be implemented?
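To make the discounting point concrete, here is a minimal sketch assuming simple exponential discounting. The function name, the 0.99 daily discount factor, and the 270-day delay are illustrative assumptions of mine, not measured values, and real human discounting may well follow a different curve.

```python
def discounted_value(reward: float, delay_days: float, daily_discount: float) -> float:
    """Present value of a reward that arrives after `delay_days`.

    Assumes exponential discounting: each day of delay multiplies
    the value by `daily_discount` (a hypothetical parameter).
    """
    if not 0.0 < daily_discount <= 1.0:
        raise ValueError("daily_discount must be in (0, 1]")
    return reward * daily_discount ** delay_days


# Same nominal reward, one immediate and one delayed by roughly 9 months.
immediate = discounted_value(1.0, delay_days=0, daily_discount=0.99)
delayed = discounted_value(1.0, delay_days=270, daily_discount=0.99)

print(f"immediate: {immediate:.3f}, delayed: {delayed:.3f}")
```

Under these assumed numbers the delayed reward is worth less than a tenth of the immediate one, so a delayed signal would have to be much stronger to compete.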
What else have people said on this subject?
I believe that MIRI was aware of this problem for a long time, but that it didn’t have the nice, comparatively non-confused and precise handle of “Inner Alignment” until Evan published the ‘Risks from Learned Optimization’ paper. But I’m not the right person to say anything else about this.
Do folks think that scenarios where we solve outer alignment most likely involve us not having to struggle much with inner alignment? Because fully solving outer alignment implies a lot of deep progress in alignment.
Probably not. I think Inner Alignment is, if anything, probably the harder problem. It strikes me as reasonably plausible that Debate is a proposal which solves outer alignment, but as very unlikely that it automatically solves Inner Alignment.
- Liron 9 Aug 2020 11:36 UTC
  6 points
  Hm ya I guess the causality between sex and babies (even sex and visible pregnancy) is so far away in time that it’s tough to make a brain want to “make babies”.
  
  But I don’t think the computational intractability of how actions affect inclusive genetic fitness is quite why evolution made such crude heuristics. Because if a brain understood that it was trying to maximize that quantity, I think it could figure out “have a lot of sex” as a heuristic approach without evolution hard-coding it in. And I think humans actually do have some level of in-brain goals to have more descendants beyond just having more sex. So I think things like sexual pleasure are just performance optimizations for a mentally tractable challenge.
  
  E.g. snakes quickly triggering a fear reflex