It’s possible that it “wouldn’t use all its potential power” in the same sense that a high-IQ neurotic mess of a person wouldn’t use all of their potential power either if they’re too poorly aligned internally to get out of bed and get things done. And while still not harmless, crazy people aren’t as scary as coherently ruthless people optimized for doing harm.
But “people aren’t ruthless” isn’t true in any meaningful sense. If you’re an ant colony and the humans pave over you to build a house, the fact that they aren’t completely coherent in their optimization for future states over feelings doesn’t change the fact that, in successfully optimizing for having a house where your colony was, they destroyed everything you care about.
People generally aren’t in a position of so much power over other people that reality stops strongly suggesting that being ruthful will help them with their goals. When they do perceive that to be the case, you see an awful lot of ruthless behavior. Whether the guy in power is completely ruthless matters much less than whether you have enough threat of power to keep him feeling ruthful towards your existence and values.
When you start positing a superintelligence smart enough that it actually can take over the world regardless of what stupid humans want, that becomes a real problem to grapple with. So it makes sense that it gets a lot of attention, and we’d have to figure it out even if it were just a massively IQ- and internal-coherence-boosted human.
With respect to the “smart troubled person, dumb therapist” thing, I think you have some very fundamental misconceptions about human aims and therapy. It’s by no means trivial to explain in a tangent of a LW comment, but “if the person knew how to feel better in the future, they would just do that” is simply untrue. We do “optimize for feelings” in a sense, but not that one. People choose their unhappiness and their suffering because the alternative is subjectively worse (as a trivial example, would you take a pill that made you blissfully happy for the rest of your life if it came at the cost of happily watching your loved ones get tortured to death?). In the course of doing “therapy-like stuff”, sometimes you have to make this explicit so that they can reconsider their choice. I had one client, for example, whom I led to the realization that his suffering was a result of his unthinking refusal to give up hope on a (seemingly) impossible goal. Once he could see that this was his choice, he did in fact choose to suffer less and give up on that goal. However, that was because the goal was actually impossible to achieve, and there’s no way in hell he’d have given up and chosen happiness if it were at all possible for him to succeed in his hopes.
It’s possible for “dumb therapists” to play a useful role, but either those “dumb” therapists are still wiser than the hyperintelligent fool, or else it’s the smart one leading the whole show.
Sure, humans are effectively ruthless in wiping out individual ant colonies. We’ve even wiped out more than a few entire species of ant. But our ruthfulness about our ultimate goals — well, I guess it’s not exactly ruthfulness that I’m talking about...
...The fact that it’s not in our nature to simply define an easy-to-evaluate utility function and then optimize means that it’s not mere coincidence that we don’t want anything radical enough to imply the elimination of all ant-kind. In fact, I’m pretty sure that for a large majority of people, there’s no utopian ideal you could pitch that they’d buy into that’s radical enough that getting there would imply or even suggest actions that would kill all ants. Not because humanity wouldn’t be capable of doing that, just that we’re not capable of wanting that, and that fact may be related to our (residual) ruthfulness and to our intelligence itself. And metaphorically, from a superintelligence’s perspective, I think that humanity-as-a-whole is probably closer to being Formicidae than it is to being one species of ant.
...
This post, and its line of argument, is not about saying “AI alignment doesn’t matter”. Of fucking course it does. What I’m saying is: “it may not be the case that any tiny misalignment of a superintelligence is fatal/permanent”. Because yes, a superintelligence can and probably will change the world to suit its goals, but it won’t ruthlessly change the whole world to perfectly suit its goals, because those goals will not, themselves, be perfectly coherent. And in that gap, I believe there will probably still be room for some amount of humanity or posthumanity-that’s-still-commensurate-with-extrapolated-human-values having some amount of say in their own fates.
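(To make that “gap” concrete, here is a minimal toy sketch of my own, not anything from the thread: a single coherent utility maximizer converts every region of a toy world, while an agent whose sub-goals don’t fully agree leaves the contested regions alone. The region names and the “only act where the sub-goals agree” rule are hypothetical illustrations, not a model of any real system.)

```python
# Toy sketch (my own illustration, not from the original comments):
# contrasting a coherent optimizer with an agent whose goals are only
# partially coherent. Everything here is a made-up example.

WORLD = ["meadow", "ant_colony", "forest", "riverbank"]

def coherent_agent(world):
    """One easy-to-evaluate utility function: more 'house' is always better.
    It converts every region, ant colonies included."""
    return {region: "house" for region in world}

def incoherent_agent(world):
    """Several sub-goals that don't agree on every region. Under the toy
    rule that it only acts where its sub-goals agree, contested or
    ignored regions are left alone -- that's the 'gap' in question."""
    wants_houses = {"meadow", "ant_colony"}
    wants_wilderness = {"ant_colony", "forest"}
    plan = {}
    for region in world:
        build = region in wants_houses
        preserve = region in wants_wilderness
        if build and not preserve:
            plan[region] = "house"
        elif preserve and not build:
            plan[region] = "untouched (valued as wilderness)"
        else:
            plan[region] = "untouched (sub-goals conflict or are indifferent)"
    return plan

if __name__ == "__main__":
    print(coherent_agent(WORLD))    # every region becomes 'house'
    print(incoherent_agent(WORLD))  # the ant colony survives the conflict
```

The point of the toy is only that incoherence between the agent’s own goals, not kindness, is what leaves parts of the world unconverted.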
The response I’m looking for is not at all “well, that’s all OK then, we can stop worrying about alignment”. Because there’s a huge difference between future (post)humans living meagerly under sufferance in some tiny remnant of the world that a superintelligence doesn’t happen to care about coherently enough to change, them thriving as an integral part of the future that it does care about and is building, and other possibilities better or worse than those. But what I am arguing is that the “win big or lose big are the only options” attitude I see as common in alignment circles (I know that Eliezer isn’t really cutting edge anymore, but look at his recent April Fools’ “joke” for an example) may be misguided. Not every superintelligence that isn’t perfectly friendly is terrifyingly unfriendly, and I think that admitting other possibilities (without being complacent about them) might help us make useful progress in pursuing alignment.
...
As for your points about therapy: yes, of course, my off-the-cuff one-paragraph just-so-story was oversimplified. And yes, you seem to know a lot more about this than I do. But I’m not sure the metaphor is strong enough to make all that complexity matter here.