Nora Belrose comments on Evolution is a bad analogy for AGI: inner alignment

Nora Belrose Jan 19, 2024, 2:41 AM
10 points
It’s very difficult to get any agent to robustly pursue something like IGF because it’s an inherently sparse and beyond-lifetime goal. Human values have been pre-densified for us: they are precisely the kinds of things it’s easy to get an intelligence to pursue fairly robustly. We get dense, repeated, in-lifetime feedback about stuff like sex, food, love, revenge, and so on. A priori, if you’re an agent built by evolution, you should expect to have values that are easy to learn; it would be surprising if it turned out that evolution did things the hard way. So evolution suggests alignment should be easy.
- Wei Dai Jan 19, 2024, 3:25 AM
  3 points
  What if some humans actually value something that’s sparse and beyond-lifetime like IGF? For example, Nick Bostrom seems to value avoiding astronomical waste. How to explain that, if our values only come from “dense, repeated, in-lifetime feedback”?
  
  See also this top-level comment which may be related. If some people value philosophy and following correct philosophical conclusions, that would explain Nick Bostrom, but I’m not sure what “valuing philosophy” is about exactly, or how to align AI to do that. Any thoughts on this?
  - Nora Belrose Jan 19, 2024, 3:46 AM
    7 points
    People come to have sparse and beyond-lifetime goals through mechanisms that are unavailable to biological evolution: it took thousands of years of memetic evolution for people to even develop the concept of a long future that we might be able to affect with our short lives. We’re in a much better position to instill long-range goals into AIs, if we choose to do so. We can simply train them to imitate the human thought processes that give rise to long-term-oriented behaviors.