I don’t understand why anybody would want anything that involved leaving humans in control, unless there were absolutely no alternative whatsoever.
I’m not joking or being hyperbolic; I genuinely don’t get it. A lot of people seem to think that humans being in control is obviously good, but it seems really, really obvious to me that it’s a likely path to horrible outcomes.
Humans haven’t had access to all that much power for all that long, and we’ve already managed to create a number of conditions that look unstable and likely to go bad in catastrophic ways.
We’re on a climate slide to who-knows-where. The rest of the environment isn’t looking that good either. We’ve managed to avoid large-scale nuclear war for like 75 whole years after developing the capability, but that’s not remotely long enough to call “stable”. Those same 75 years have seen some reduction in war in general, but that looks like it’s turning around as the political system evolves. Most human governments (and other institutions) are distinctly suboptimal on a bunch of axes, including willingness to take crazy risks, and, although you can argue that they’ve gotten better in maybe the last 100 to 150 years, a large number of them now seem to have stopped getting better and started getting worse. Humans in general are systematically rotten to each other, and most of the advancement we’ve gotten against that seems to come from probably unsustainable institutional tricks that limit anybody’s ability to get the decisive upper hand.
If you gave humans control over more power, then why wouldn’t you expect all of that to get even worse? And even if you could find a way to make such a situation stably not-so-bad, how would you manage the transition, where some humans would have more power than others, and all humans, including the currently advantaged ones, would feel threatened?
It seems to me that the obvious assumption is that humans being in control is bad. And trying to think out the mechanics of actual scenarios hasn’t done anything to change that belief. How can anybody believe otherwise?
There’s a difference between “AI putting humans in control is bad”, and “AI putting humans in control is better than other options we seem to have for alignment.” For many people, it may be as you mentioned:
I don’t understand why anybody would want anything that involved leaving humans in control, unless there were absolutely no alternative whatsoever.
(I’m somewhat less pessimistic than you are, I think, but I agree it could go pretty damn poorly, for many ways the AI could “leave us in control.”)
I don’t have an alternative, and no I’m not very happy about that. I definitely don’t know how to build a friendly AI. But, on the other hand, I don’t see how “corrigibility” could work either, so in that sense they’re on an equal footing. Nobody seems to have any real idea how to achieve either one, so why would you want to emphasize the one that seems less likely to lead to a non-sucky world?
Anyway, what I’m reacting to is this sense I get that some people assume that keeping humans in charge is good, and that humans not being in charge is in itself an unacceptable outcome, or at least weighs very heavily against the desirability of an outcome. I don’t know if I’ve seen very many people say that, but I see lots of things that seem to assume it. Things people write seem to start out with “If we want to make sure humans are still in charge, then...”, like that’s the primary goal. And I do not think it should be a primary goal. Not even a goal at all, actually.
Nobody seems to have any real idea how to achieve either one
I think that’s not true and we in fact have a much better idea of how to achieve corrigibility / intent alignment. (Not going to defend that here. You could see my comment here, though that one only argues why it might be easier rather than providing a method.)
Others will disagree with me on this.
humans not being in charge is in itself an unacceptable outcome, or at least weighs very heavily against the desirability of an outcome
The usual argument I’d give is “if humans aren’t in charge, then we can’t course correct if something goes wrong”. It’s instrumental, not terminal. If we ended up in a world like this where humans were not in charge, that seems like it could be okay depending on the details.
Another possibility is the Posthuman Technocapital Singularity: everything goes in the same approximate direction, there are a lot of competing agents but without sharp destabilization or power concentration, and Moloch wins. Probably wins, idk
What TurnTrout said. What’s the alternative to which you’re comparing?
https://docs.osmarks.net/hypha/posthuman_technocapital_singularity