zhukeepa comments on Metaphilosophical competence can’t be disentangled from alignment

zhukeepa 2 Apr 2018 8:08 UTC
5 points
I think most humans achieving what they currently consider their goals would end up being catastrophic for humanity, even if they succeed. (For example, I think an eternal authoritarian regime is pretty catastrophic.)
- Rohin Shah 9 Apr 2018 20:53 UTC
  4 points
  I agree that an eternal authoritarian regime is pretty catastrophic.
  I don’t think that a human in this scenario would be pursuing what they currently consider their goals; I think they would think more, learn more, and eventually settle on a different set of goals. (Maybe initially they pursue their current goals, but these change over time.) But it’s an open question to me whether the final set of goals they settle upon is actually reasonably aligned with “humanity’s goals”: it may or may not be. So it could be catastrophic, from the perspective of humanity, to amplify a current human in this way. But it would not be catastrophic to the human that you amplified. (I think you disagree with the last statement, though maybe I’m wrong about that.)
  - zhukeepa 9 Apr 2018 22:01 UTC
    6 points
    I’d say that it wouldn’t appear catastrophic to the amplified human, but might be catastrophic for that human anyway (e.g. if their values-on-reflection actually look a lot like humanity’s values-on-reflection, but they fail to achieve their values-on-reflection).
    - Rohin Shah 10 Apr 2018 16:58 UTC
      2 points
      Yeah, I think that’s where we disagree. I think that humans are likely to achieve their values-on-reflection; I just don’t know what a human’s “values-on-reflection” would actually be (e.g., it could be that they want an authoritarian regime with them in charge).
      It’s also possible that we have different concepts of values-on-reflection. E.g., maybe you mean that I have found my values-on-reflection only if I’ve cleared out all epistemic pits somehow and then thought for a long time with the explicit goal of figuring out what I value, whereas I would use a looser criterion. (I’m not sure what exactly.)
      - zhukeepa 12 Apr 2018 8:10 UTC
        5 points
        Yeah, what you described indeed matches my notion of “values-on-reflection” pretty well. So for example, I think a religious person’s values-on-reflection should include valuing logical consistency and coherent logical arguments (because they do implicitly care about those in their everyday lives, even if they explicitly deny it). This means their values-on-reflection should include having true beliefs, and thus be atheistic. But I also wouldn’t generally trust religious people to update away from religion if they reflected a bunch.