My current intuition is that there are under 100 people who, if 1,000,000,000,000,000x’d, would end up avoiding irreversible catastrophes with > 50% probability. (I haven’t thought too much about this question, and wouldn’t be surprised if I update to thinking there are fewer than 10 such people, or even 0 such people.)
I’ve asked this before but don’t feel like I got a solid answer: (a) do you think that giving the 100th person a lot of power is a lot worse than the status quo (w.r.t. catastrophic risk), and (b) why?
If you think it’s a lot worse, the explanations I can imagine are along the lines of: “the ideas that win in the marketplace of ideas are systematically good,” or maybe “if people are forced to reflect by thinking some, growing older, being replaced by their children, etc., that’s way better than having them reflect in the way that they’d choose to given unlimited power,” or something like that.
But those seem inconsistent with your position in at least two ways:
If this is the case, then people don’t need metaphilosophical competence to be fine, they just need a healthy respect for business as usual and whatever magic it is that causes the status quo to arrive at good answers. Indeed, there seem to be many people (>> 100) who would effectively abdicate their power after being greatly empowered, or who would use it in a narrow way to avoid catastrophes but not to change the basic course of social deliberation.
The implicit claim about the magic of the status quo is itself a strong metaphilosophical claim, and I don’t see why you would have so much confidence in this position while thinking that we should have no confidence in other metaphilosophical conclusions.
If you think that the status quo is even worse, then I don’t quite understand what you mean by a statement like:
Once humanity makes enough metaphilosophical progress (which might require first solving agent foundations), I might feel comfortable 1,000,000,000,000,000x’ing the most metaphilosophically competent person alive, though it’s possible I’ll decide I wouldn’t want to 1,000,000,000,000,000x anyone running on current biological hardware. I’d also feel good 1,000,000,000,000,000x’ing someone if we’re in the endgame and the default outcome is clearly self-annihilation.
Other questions: why can we solve agent foundations, but the superintelligent person can’t? What are you imagining happening after you empower this person? Why are you able to foresee so many difficulties that they predictably won’t see?
Oh, I actually think that giving the 100th best person a bunch of power is probably better than the status quo, assuming there are ~100 people who pass the bar (I also feel pessimistic about the status quo). The only reason why I think the status quo might be better is that more metaphilosophy would develop, and then whoever gets amplified would have more metaphilosophical competence to begin with, which seems safer.
What about the 1000th person?
(Why is us making progress on metaphilosophy an improvement over the empowered person making progress on metaphilosophy?)
I think the world will end up in a catastrophic epistemic pit. For example, if any religious leader got massively amplified, I think it’s pretty likely (>50%) the whole world will just stay religious forever.
Us making progress on metaphilosophy isn’t an improvement over the empowered person making progress on metaphilosophy, conditioning on the empowered person making enough progress on metaphilosophy. But in general I wouldn’t trust someone to make enough progress on metaphilosophy unless they had a strong enough metaphilosophical base to begin with.
(I assume you mean that the 1000th person is much worse than the status quo, because they will end up in a catastrophic epistemic pit. Let me know if that’s a misunderstanding.)
Is your view:
People can’t make metaphilosophical progress, but they can recognize and adopt it. The status quo is OK because there is a large diversity of people generating ideas (the best of which will be adopted).
People can’t recognize metaphilosophical progress when they see it, but better views will systematically win in memetic competition (or in biological/economic competition because their carriers are more competent).
“Metaphilosophy advances one funeral at a time”: the way we get out of epistemic traps is by creating new humans who start out with less baggage.
Something completely different?
I still don’t understand how any of those views could imply that it is so hard for individuals to make progress if amplified. For each of those three views about why the status quo is good, I think that more than 10% of people would endorse that view and use their amplified power in a way consistent with it (e.g. by creating lots of people who can generate lots of ideas; by allowing competition amongst people who disagree, and accepting the winners’ views; by creating a supportive and safe environment for the next generation and then passing off power to that generation...) If you amplify people radically, I would strongly expect them to end up with better versions of these ideas, more often, than humanity at large.
My normal concern would be that people would drift too far too fast, so we’d end up with e.g. whatever beliefs were most memetically fit regardless of their accuracy. But again, I think that amplifying someone leaves us in a way better situation with respect to memetic competition unless they make an unforced error.
Even more directly: I think more than 1% of people would, if amplified, have the world continue on the same deliberative trajectory it’s on today. So it seems like the fraction of people you can safely amplify must be more than 1%. (And in general those people will leave us much better off than we are today, since lots of them will take safe, easy wins like “Avoid literally killing ourselves in nuclear war.”)
I can totally understand why you’d say “lots of people would mess up if amplified due to being hasty and uncareful.” But I still don’t see what could possibly make you think “99.99999% of people would mess up most of the time.” I’m pretty sure that I’m either misunderstanding your view, or it isn’t coherent.
It seems to me the difficulty is likely to lie in assessing whether someone would have a good enough start, and being able to do this probably requires enough ability to assess metaphilosophical competence now that we could pick such a person to make progress later.
(I’m not zhukeepa; I’m just bringing up my own thoughts.)
This isn’t quite the same as an improvement, but one thing that is more appealing about normal-world metaphilosophical progress than empowered-person metaphilosophical progress is that the former has a track record of working*, while the latter is untried and might not work.
*Slowly and not without reversals.