Wouldn’t that require solving alignment in itself, though? If you can simulate virtual humans, complete with human personalities, human cognition, and human values, then you’ve already figured out how to plug human values straight into a virtual agent.
If you mean that the AGI is trained on human behavior to the point where it’s figured out human values through IRL/predictive coding/etc. and is acting on them, then that’s also basically just solving alignment.
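For concreteness, the IRL framing in one line: assume the human is approximately rational with respect to some hidden reward, infer that reward from observed behavior, then act on it. A minimal sketch of that idea in a made-up toy one-step setting, with a hypothetical linear reward and Boltzmann-rational demonstrations (nothing resembling an actual AGI training pipeline):

```python
import numpy as np

# Toy one-step setting: 5 possible actions, each with a 3-d feature vector.
# Assume the human picks action a with probability softmax(w_true . phi(a))
# (Boltzmann-rational); w_true plays the role of the hidden "human values".
rng = np.random.default_rng(0)
phi = rng.normal(size=(5, 3))
w_true = np.array([1.0, -2.0, 0.5])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

demos = rng.choice(5, size=2000, p=softmax(phi @ w_true))  # observed human choices

# Maximum-likelihood recovery of w: the log-likelihood gradient is
# (empirical feature average of the demos) - (expected features under current w).
w = np.zeros(3)
empirical = phi[demos].mean(axis=0)
for _ in range(500):
    expected = softmax(phi @ w) @ phi
    w += 0.1 * (empirical - expected)

print(w_true, np.round(w, 2))  # recovered weights approximate the hidden ones
```

Even granting that this scales, an agent that has genuinely recovered and is acting on the hidden reward is exactly what alignment is asking for.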
However, if you’re suggesting brain uploads, I highly doubt that such technology would be available before AGI is developed.
All that is to say that, while an AGI that is basically just virtual humans would probably be great, it’s not a prospect we can depend on in lieu of alignment research. Such a result could only come about through actually doing all the hard work of alignment research first.
Wouldn’t that require solving alignment in itself, though?
Yes, but only to the same extent that evolution did. Evolution approximately solved alignment on two levels: aligning the brain with the evolutionary goal of inclusive fitness[1], and aligning individual brains (as disposable somas) with other brains (shared kin genes) via altruism (the latter is the thing we want to emulate).
Massively successful: a population of ~8B humans vs. a few million for all the other great apes combined. It’s fashionable to say evolution failed at alignment: this is just stupidly wrong; humans are an enormous success from the perspective of inclusive fitness.
Do you propose using evolutionary simulations to discover other-agent-aligned agents? I doubt we have the same luxury of (simulated) time that evolution had in creating humans. It didn’t have to compete against an intelligent designer; alignment researchers do (i.e., the broader AI community).
I agree that humans are highly successful (though far from optimal) at both inclusive genetic fitness and alignment with fellow sapients. However, the challenge for us now is to parse the system that resulted from this messy evolutionary process, to pull out the human value system from human neurophysiology. Either that, or figure out general alignment from first principles.
Do you propose using evolutionary simulations to discover other-agent-aligned agents?
Nah. The Wright brothers didn’t need to run evo sims to reverse engineer flight. They just observed how birds bank to turn, how that relied on wing warping, and said: cool, we can do that too! Deep learning didn’t succeed through brute-force evo sims either (even though Karl Sims’ evo sims work is pretty cool, it turns out that loose reverse engineering is just enormously faster).
However, the challenge for us now is … to pull out the human value system from human neurophysiology. Either that, or figure out general alignment from first principles.
Sounds about right. Fortunately, we may not need to model human values at all in order to build general altruistic agents: it probably suffices that the AI optimizes for human empowerment (our ability to fulfill any long-term future goals, rather than any specific values), which is a much simpler and more robust target and thus probably more stable over the long term.
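For concreteness, empowerment is typically formalized as the channel capacity between an agent’s actions and its future states. A minimal sketch of the idea, assuming a toy deterministic 1-D world I made up (where capacity reduces to counting distinct reachable states), not any particular proposal:

```python
from itertools import product
from math import log2

def empowerment(step, state, actions, horizon):
    """n-step empowerment under deterministic dynamics: log2 of the number of
    distinct states reachable by some length-`horizon` action sequence (for a
    deterministic channel, capacity = log of the number of distinct outcomes)."""
    reachable = set()
    for seq in product(actions, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return log2(len(reachable))

# Toy 1-D world: the human walks left/right on a line; a wall blocks positions > 2.
def step(pos, action):  # action in {-1, 0, +1}
    return max(-5, min(2, pos + action))

print(empowerment(step, 0, (-1, 0, 1), 3))  # ~2.58 bits: 6 reachable states
print(empowerment(step, 2, (-1, 0, 1), 3))  # 2.0 bits: only 4, pinned against the wall
```

An assistant maximizing the human’s empowerment prefers leaving the human in the first situation over the second, without being told anything about which specific futures the human wants.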