Wei Dai comments on Being nicer than Clippy

Wei Dai 20 Sep 2024 5:03 UTC
2 points
0
The main asymmetries I see are:
1. Other people not trusting the group to not be corrupted by power and to reflect correctly on their values, or not trusting that they’ll decide to share power even after reflecting correctly. Thus “programmers” who decide to not share power from the start invite a lot of conflict. (In other words, CEV is partly just trying to not take power away from people, whereas I think you’ve been talking about giving AIs more power than they already have: “the sort of influence we imagine intentionally giving to AIs-with-different-values that we end up sharing the world with”.)
2. The “programmers” not trusting themselves. I note that individuals or small groups trying to solve morality by themselves don’t have very good track records. They seem to become wildly overconfident and/or get stuck in intellectual dead-ends too easily. Arguably the only group that we have evidence for being able to make sustained philosophical progress is humanity as a whole.
To the extent that these considerations don’t justify giving every human equal power/weight in CEV, I may just disagree with Eliezer about that. (See also Hacking the CEV for Fun and Profit.)
- Tamsin Leake 20 Sep 2024 6:07 UTC
  2 points
  0
  
  trying to solve morality by themselves
  
  It doesn’t have to be by themselves; they can defer to others inside CEV, or come up with better schemes than their initial CEV inside CEV and then defer to that. Whatever other solutions than “solve everything on your own inside CEV” might exist, they can figure those out and defer to them from inside CEV. At least that’s the case in my own attempts at implementing CEV in math (eg QACI).
  - Wei Dai 20 Sep 2024 6:28 UTC
    2 points
    0
    Once they get into CEV, they may not want to defer to others anymore, or may set things up with a large power/status imbalance between themselves and everyone else, which may be detrimental to moral/philosophical progress. History has plenty of seemingly idealistic people who refused to give up or share power once they got it. The prudent thing to do seems to be to never get that much power in the first place, or to share it as soon as possible.
    If you’re pretty sure you will defer to others once inside CEV, then you might as well do it outside CEV due to #1 in my grandparent comment.
    - Tamsin Leake 20 Sep 2024 6:38 UTC
      4 points
      0
      
      I wonder how many of those seemingly idealistic people retained power when it was available because they were indeed only pretending to be idealistic. Assuming one is actually initially idealistic but then gets corrupted by having power in some way, one thing someone can do in CEV that they can’t do in real life is reuse the CEV process to come up with even better CEV processes which will be even more likely to retain/recover their just-before-launching-CEV values. Yes, many people would mess this up or fail in some other way in CEV; but we only need one person or group who we’d be somewhat confident would do alright in CEV. Plausibly there are at least a few eg MIRIers who would satisfy this. Importantly, to me, this reduces outer alignment to “find someone smart and reasonable and likely to have good goal-content integrity”, which is a social & psychological matter that seems to be much smaller than the initial full problem of formal outer alignment / alignment target design.
      
      One of the main reasons to do CEV is because we’re gonna die of AI soon, and CEV is a way to have infinite time to solve the necessary problems. Another is that even if we don’t die of AI, we get eaten by various moloch instead of being able to safely solve the necessary problems at whatever pace is necessary.
      - Wei Dai 26 Sep 2024 22:45 UTC
        4 points
        8
        
        but we only need one person or group who we’d be somewhat confident would do alright in CEV. Plausibly there are at least a few eg MIRIers who would satisfy this.
        
        Why do you think this, and how would you convince skeptics? And there are two separate issues here. One is how to know their CEV won’t be corrupted relative to what their values really are or should be, and the other is how to know that their real/normative values are actually highly altruistic. It seems hard to know both of these, and perhaps even harder to persuade others who may be very distrustful of such a person/group from the start.
        
        Another is that even if we don’t die of AI, we get eaten by various moloch instead of being able to safely solve the necessary problems at whatever pace is necessary.
        
        Would be interested in understanding your perspective on this better. I feel like aside from AI, our world is not being eaten by molochs very quickly, and I prefer something like stopping AI development and doing (voluntary and subsidized) embryo selection to increase human intelligence for a few generations, then letting the smarter humans decide what to do next. (Please contact me via PM if you want to have a chat about this.)
        - the gears to ascension 27 Sep 2024 8:39 UTC
          4 points
          0
          some fragments:
          
          What hunches do you currently have surrounding orthogonality, its truth or not, or things near it?
          
          re: hard to know—it seems to me that we can’t get a certifiably-going-to-be-good result from a CEV based ai solution unless we can make it certifiable that altruism is present. I think figuring out how to write down some form of what altruism is, especially altruism in contrast to being-a-pushover, is necessary to avoid issues—because even if any person considers themselves for CEV, how would they know they can trust their own behavior?
          
          as far as I can tell humans should by default see themselves as having the same kind of alignment problem as AIs do, where amplification can potentially change what’s happening in a way that corrupts thoughts which previously implemented values. can we find a CEV-grade alignment solution that solves the self-and-other alignment problems in humans as well, such that this CEV can be run on any arbitrary chunk of matter and discover its “true wants, needs, and hopes for the future”?
          - Wei Dai 27 Sep 2024 16:38 UTC
            2 points
            0
            
            What hunches do you currently have surrounding orthogonality, its truth or not, or things near it?
            
            I’m very uncertain about it. Have you read Six Plausible Meta-Ethical Alternatives?
            
            as far as I can tell humans should by default see themselves as having the same kind of alignment problem as AIs do, where amplification can potentially change what’s happening in a way that corrupts thoughts which previously implemented values.
            
            Yeah, agreed that how to safely amplify oneself and reflect for long periods of time may be hard problems that should be solved (or extensively researched/debated if we can’t definitely solve them) before starting something like CEV. This might involve creating the right virtual environment, social rules, epistemic norms, group composition, etc. A few things that seem easy to miss or get wrong:
            
            Is it better to have no competition or some competition, and what kind? (Past “moral/philosophical progress” might have been caused or spread by competitive dynamics.)
            How should social status work in CEV? (Past “progress” might have been driven by people motivated by certain kinds of status.)
            No danger or some danger? (Could a completely safe environment / no time pressure cause people to lose motivation or some other kind of value drift? Related: What determines the balance between intelligence signaling and virtue signaling?)
            
            can we find a CEV-grade alignment solution that solves the self-and-other alignment problems in humans as well, such that this CEV can be run on any arbitrary chunk of matter and discover its “true wants, needs, and hopes for the future”?
            
            I think this is worth thinking about as well, as a parallel approach to the above. It seems related to metaphilosophy in that if we can discover what “correct philosophical reasoning” is, we can solve this problem by asking “What would this chunk of matter conclude if it were to follow correct philosophical reasoning?”