This is a really cool idea and I’m glad you made the post! Here are a few comments/thoughts:
H1: “If you give a human absolute power, there is a small subset of humans that actually cares and will try to make everyone’s life better according to their own wishes”
How confident are you in this premise? Power and a person’s values/incentives/preferences may not be orthogonal (and my intuition is that they aren’t). Also, I feel a little skeptical about the usefulness of thinking about whether the trait shows up more or less in various intelligence strata within humans. It seems like what we’re worried about is in a different reference class. Not sure.
H4 is something I’m super interested in, and I’d be happy to talk about it in conversations/calls if you want to : )
I saw this note in another thread, but the gist of it is that power doesn’t corrupt. Rather,
Evil people seek power, and are willing to be corrupt (shared cause correlation)
Being corrupt helps to get more power—in the extreme statement of this, maintaining power requires corruption
The process of gaining power creates murder-Gandhis.
People with power attract and/or need advice on how and for what goal to wield it, and that leads to misalignment with the agent’s pre-power values.
Can you add a link to the other thread please?
No, I don’t remember exactly where on LW I saw it. I just wanted to acknowledge that I was amplifying someone else’s thoughts.
My college writing instructor was taken aback when I asked her how to cite something I could quote but couldn’t remember the source of. Her answer was “then you can’t use it,” which seemed harsh. There should be a way to acknowledge plagiarism without knowing or stating who is being plagiarized, and if the original author shows up, you’ve basically pre-conceded any question of originality to them.
Thx for being clear about it.
Are you aware of any research into this? I struggle to think of any research designs that would make it through an ethics board.
I don’t know that anyone has done the studies, but you could look at how winners of large lotteries behave. That is a natural example of someone suddenly gaining a lot of money (and therefore power). Do they tend to keep their previous goals and just scale up their efforts, or do they start doing power-retaining things? I have no idea what the data will show; thought experiments and anecdotes could go either way.
Let me Google that for you.
Thank you!
If they are not orthogonal then presumably prosociality and power are inversely related, which is worse?
In this case, I’m hoping intelligence and prosociality-that-is-robust-to-absolute-power would be positively correlated. However, I struggle to think how this might actually be tested… My intuitions may be born from the Stanford Prison Experiment, which I think has been refuted since. So maybe we don’t actually have as much data on prosociality in extreme circumstances as I initially intuited. I’m mostly reasoning this out now on the fly by zooming in on where my thoughts may have originally come from.
That said, it doesn’t matter very much how frequent robust prosociality traits are, as long as they do exist and can be recreated in AGI.
I’ll DM you my discord :)