What fraction of people are nice in the way we want an AI to be nice? 1 / 100? 1 / 1,000? What n is large enough that selecting the nicest human out of n would give you a sufficiently nice human?
Whatever your answer, that equates to saying that human learning processes are ~log₂(n) bits of optimization pressure away from satisfying the “nice in the way we want an AI to be nice” criterion.
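To put concrete numbers on that, here's a quick sketch in plain Python (the values of n are just illustrative):

```python
import math

# Selecting the nicest of n candidates applies ~log2(n) bits of pressure.
for n in [100, 1_000, 1_000_000, 1_000_000_000]:
    print(f"nicest of {n:>13,} -> {math.log2(n):4.1f} bits")

# nicest of           100 ->  6.6 bits
# nicest of         1,000 -> 10.0 bits
# nicest of     1,000,000 -> 19.9 bits
# nicest of 1,000,000,000 -> 29.9 bits
```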
Another way to think about this: selecting the nicest out of n humans is essentially doing a single step of random search optimization over human learning processes, optimizing purely for niceness. Random search is a pretty terrible optimization method, and one-step random search is even worse.
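As a toy illustration of how little one step of random search buys you, assuming (purely for illustration) that niceness is a standard normal trait across human learning processes:

```python
import math
import random
import statistics

random.seed(0)

# Toy model: each human learning process draws a "niceness" score from N(0, 1).
# Selecting the nicest of n is one step of random search: sample n, keep the max.
def best_of_n(n: int) -> float:
    return max(random.gauss(0.0, 1.0) for _ in range(n))

for n in [1, 100, 10_000]:
    mean_best = statistics.mean(best_of_n(n) for _ in range(300))
    print(f"n = {n:>6,}: expected best niceness ~ {mean_best:+.2f} sigma")

# The expected maximum of n standard normals is at most sqrt(2 * ln n),
# so even a 10,000-fold selection moves you under ~4.3 standard deviations.
print(f"sqrt(2 ln 10_000) = {math.sqrt(2 * math.log(10_000)):.2f}")
```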
You can object that it’s not necessarily easy to apply optimization pressure towards niceness directly (as opposed to some more accessible proxies for niceness), which is true. But still, I think it’s telling that so few total bits of optimization pressure lead to such big differences in human niceness.
Edit: there are also lots of ways in which bird flight is non-optimal for us. E.g., birds can’t carry very much. But if you don’t know how to build a flying machine, studying birds is still valuable. Once you understand the underlying principles, you can then think about adapting them to better fit your specific use case. Before we understand why humans are nice to each other, we can’t know how easy it will be to adapt those underlying generators of niceness to better suit our own needs for AIs. How many bits of optimization pressure do you have to apply to birds before they can carry a cargo plane’s worth of stuff?
I would say roughly 1 in 10 to 1 in 100 million people can be trusted to be reliably nice to less powerful beings, and maybe at the high end 1 in 1 billion people can be relied on not to abuse less powerful beings like animals, conditional on the animal not attacking them. That’s my answer for how many bits of optimization pressure reliable niceness towards less powerful beings requires in humans.
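Taking those odds at face value and applying the same log₂ conversion as above:

```python
import math

# Converting the stated selection odds into bits of optimization pressure.
for label, n in [("1 in 10", 10),
                 ("1 in 100 million", 100_000_000),
                 ("1 in 1 billion", 1_000_000_000)]:
    print(f"{label:<17} -> {math.log2(n):4.1f} bits")

# 1 in 10           ->  3.3 bits
# 1 in 100 million  -> 26.6 bits
# 1 in 1 billion    -> 29.9 bits
```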