I don’t have an alternative, and no, I’m not very happy about that. I definitely don’t know how to build a friendly AI. But, on the other hand, I don’t see how “corrigibility” could work either, so in that sense they’re on an equal footing. Nobody seems to have any real idea how to achieve either one, so why would you want to emphasize the one that seems less likely to lead to a non-sucky world?
Anyway, what I’m reacting to is this sense I get that some people assume that keeping humans in charge is good, and that humans not being in charge is in itself an unacceptable outcome, or at least weighs very heavily against the desirability of an outcome. I don’t know that I’ve seen very many people say that explicitly, but I see lots of things that seem to assume it. Things people write seem to start out with “If we want to make sure humans are still in charge, then...”, as though that’s the primary goal. And I do not think it should be a primary goal. Not even a goal at all, actually.
Nobody seems to have any real idea how to achieve either one
I think that’s not true, and that we in fact have a much better idea of how to achieve corrigibility / intent alignment. (I’m not going to defend that here. You could see my comment here, though that one only argues for why it might be easier rather than providing a method.)
Others will disagree with me on this.
humans not being in charge is in itself an unacceptable outcome, or at least weighs very heavily against the desirability of an outcome
The usual argument I’d give is “if humans aren’t in charge, then we can’t course-correct if something goes wrong”. It’s instrumental, not terminal. If we ended up in a world like this where humans were not in charge, that seems like it could be okay depending on the details.