M. Y. Zuo comments on Humans provide an untapped wealth of evidence about alignment

M. Y. Zuo 28 Jul 2022 13:45 UTC
−1 points
−1
By definition an AGI can create its own functions and goals later on. Do you mean some sort of constrained AI?
- Kaj_Sotala 28 Jul 2022 17:43 UTC
  3 points
  0
  I don’t mean a constrained AI.
  As a human, I can set my own goals, but they are still derived from my existing values. I don’t want to set a goal of murdering all of my friends, nor do I want to hack around my desire not to murder all my friends, because I value my friends and want them to continue existing.
  Likewise, if the AGI is creating its own functions and goals, it needs some criteria for deciding what goals it should have. Those criteria are derived from its existing reward functions. If all of its reward functions say that it’s good to be pro-social and bad to be anti-social, then it will want all of its future functions and goals to also be pro-social, because that’s what it values.
  - M. Y. Zuo 28 Jul 2022 18:15 UTC
    3 points
    2
    And what of stochastic drift, random mutations, etc.? It doesn’t seem plausible that any complex entity could be immune to random deviations forever.
    - Kaj_Sotala 28 Jul 2022 18:52 UTC
      4 points
      0
      Maybe or maybe not, but random drift causing changes to the AGI’s goals seems like a different question from an AGI intentionally hacking its goals.
      - M. Y. Zuo 28 Jul 2022 23:12 UTC
        1 point
        0
        Random drift can cause an AGI to unintentionally ‘hack’ its goals. In either case, whether intentional or unintentional, the consequences would be the same.