This seems like it’s only true if the humans would truly cling to their belief in spite of all evidence (i.e., if they believed in ghosts dogmatically), which seems untrue of most beliefs (although I grant that some humans may hold some beliefs this way). I believe the idea of the ghost example is to point at cases where there’s an ontological crisis, not cases where the ontology is so dogmatic that there can be no crisis (though, obviously, both cases are theoretically important).
However, I agree with you in either case—it’s not clear there’s “nothing to be done” for the ghost case (in either interpretation).
I don’t understand what the purported ontological crisis is. If ghosts exist, then I want them to be happy. That doesn’t require a dogmatic belief that there are ghosts at all. In fact, it can even be true when I believe ghosts don’t exist!
I mean, that’s fair. But what if your belief system justified almost everything ultimately in terms of “making ancestors happy”, and relied on a belief that ancestors are still around to be happy/sad? There are several possible responses which a real human might be tempted to make:
1. Give up on those values which were justified via ancestor worship, and only pursue the few values which weren’t justified that way.
2. Value all the same things, just not based on ancestor worship any more.
3. Value all the same things, just with a more abstract notion of “making ancestors happy” rather than thinking the ancestors are literally still around.
4. Value mostly the same things, but with some updates in places where ancestor worship was really warping your view of what’s valuable rather than merely serving as a pleasant justification for what you already think is valuable.
So we can amend the scenario to produce a more genuine ontological crisis.
It also bears mentioning that the reason to be concerned about ontological crisis is, mostly, a worry that almost none of the things we express our values in terms of are “real” in a reductionist sense. So an AI could view the world through very different concepts and still be predictively accurate. The question then is: what would it mean for such an AI to pursue our values?
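To make the “different concepts, same predictive accuracy” point concrete, here is a toy sketch. It is my own illustration rather than anything from the discussion, and every name and number in it is invented:

```python
# Toy illustration: two models that make the same predictions about the same
# observations while using different internal concepts.

# Raw observations: (mass_kg, volume_m3) of an object, and whether it sank.
data = [
    ((1.0, 0.0005), True),
    ((0.2, 0.0010), False),
    ((5.0, 0.0040), True),
    ((0.3, 0.0020), False),
]

def human_ontology_model(mass, volume):
    """Predict 'sinks' via an explicit, human-legible concept: density."""
    density = mass / volume
    return density > 1000.0  # denser than water => sinks

def alien_ontology_model(mass, volume):
    """Predict 'sinks' by nearest-neighbour lookup over raw observations,
    with no intermediate variable resembling 'density' anywhere inside."""
    nearest = min(
        data,
        key=lambda row: (row[0][0] - mass) ** 2 + (row[0][1] - volume) ** 2,
    )
    return nearest[1]

# Both models are equally accurate on these observations...
for (mass, volume), sank in data:
    assert human_ontology_model(mass, volume) == sank
    assert alien_ontology_model(mass, volume) == sank

# ...but a value expressed in terms of "density" (or "tables", or "happiness")
# points at a variable that exists inside the first model and has no
# counterpart inside the second. That is the worry in miniature.
```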
Why isn’t the answer just that the AI should:
1. Figure out what concepts we have;
2. Adjust those concepts in ways that we’d reflectively endorse;
3. Use those concepts?
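A minimal sketch of that three-step recipe, just to pin down the order of operations. Every callable here is a hypothetical stand-in (none of them is a real API):

```python
def align_concepts_then_act(human, world_model, candidate_plans,
                            elicit_concepts, propose_refinements,
                            endorses, evaluate_in_concepts):
    """Pick a plan using human concepts, refined only with human endorsement.

    All of the passed-in callables are assumed, hypothetical interfaces; the
    function only encodes the order of the three steps above.
    """
    # 1. Figure out what concepts the human has.
    concepts = elicit_concepts(human)

    # 2. Adjust those concepts, but only in ways the human would
    #    reflectively endorse.
    for proposed in propose_refinements(concepts, world_model):
        if endorses(human, proposed):
            concepts = proposed

    # 3. Use those (refined, human-endorsed) concepts to score plans, rather
    #    than the world model's native, possibly alien, ontology.
    return max(candidate_plans,
               key=lambda plan: evaluate_in_concepts(plan, concepts))
```

The only substantive choice the sketch encodes is that concept refinements in step 2 are gated on the human’s reflective endorsement, and plans in step 3 are scored in the resulting human concepts rather than in the world model’s native ontology.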
The idea that almost none of the things we care about could be adjusted to fit into a more accurate worldview seems like a very strong skeptical hypothesis. Tables (or happiness) don’t need to be “real in a reductionist sense” for me to want more of them.
Agreed. The problem is with AI designs which don’t do that, and that perspective seems to be quite rare. For example, my post Policy Alignment was about something similar to this, but I got a ton of pushback in the comments; it seems to me like a lot of people really think the AI should use better AI concepts, not human concepts. At least they did back in 2018.
As you mention, this is partly due to overly reductionist world-views. If tables/happiness aren’t reductively real, the fact that the AI is using those concepts is evidence that it’s dumb/insane, right?
Illustrative excerpt from a comment there:

From an “engineering perspective”, if I was forced to choose something right now, it would be an AI “optimizing human utility according to AI beliefs” but asking for clarification when such choice diverges too much from the “policy-approval”.
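A minimal sketch of the decision rule in that excerpt, under assumed interfaces (none of these functions or parameters come from the post or the comment): act on the AI-beliefs optimum, but ask for clarification when that pick loses too much “policy-approval” relative to what the human’s own policy would endorse.

```python
def choose_action(actions, ai_expected_utility, policy_approval, ask_human,
                  divergence_threshold=0.1):
    """Hypothetical sketch; every argument is an assumed stand-in.

    ai_expected_utility(a): AI-estimated expected human utility of action a.
    policy_approval(a):     how strongly the human's own policy endorses a.
    ask_human(a, b):        clarification query comparing the two candidates.
    """
    best_by_ai_beliefs = max(actions, key=ai_expected_utility)
    best_by_policy = max(actions, key=policy_approval)

    # If the AI-beliefs optimum loses too much approval relative to what the
    # human's own policy would pick, treat that as the "diverges too much"
    # case and ask for clarification instead of acting unilaterally.
    gap = policy_approval(best_by_policy) - policy_approval(best_by_ai_beliefs)
    if gap > divergence_threshold:  # threshold is arbitrary in this sketch
        return ask_human(best_by_ai_beliefs, best_by_policy)
    return best_by_ai_beliefs
```

Reading “diverges too much” as an approval gap is just one possible operationalization; the threshold and the comparison are placeholders, not anything proposed in the original post.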
Probably most of the problem was that my post didn’t frame things that well—I was mainly talking in terms of “beliefs”, rather than emphasizing ontology, which makes it easy to imagine AI beliefs are about the same concepts but just more accurate. John’s description of the pointers problem might be enough to re-frame things to the point where “you need to start from human concepts, and improve them in ways humans endorse” is bordering on obvious.
(Plus I arguably was too focused on giving a specific mathematical proposal rather than the general idea.)