Okay, let’s operationalize this.
Button A: The state of alignment technology is unchanged, but all the world’s governments develop a strong commitment to coordinate on AGI. Solving the alignment problem becomes the number one focus of human civilization, and everyone just groks how important it is and sets aside their differences to work together.
Button B: The minds and norms of humans are unchanged, but you are given a program by an alien that, if combined with an AGI, will align that AGI in some kind of way that you would ultimately find satisfying.
World B may sound like LW’s dream come true, but the question looms: “Now what?” Wait for Magma Corp to build their superintelligent profit maximizer, and then kindly ask them to let you walk in and take control over it?
I would rather live in World A. If I were a billionaire or dictator, I would consider B more seriously. Perhaps the question lurking in the background is this: do you want an unrealistic Long Reflection or a tiny chance to commit a Pivotal Act? I don’t believe there’s a third option, but I hope I’m wrong.
I actually think either A or B would be a large improvement over the world as it exists today, but B wins because of the stakes and the fact that World B already has the solution in hand, while World A doesn’t have the solution pre-loaded; with extremely important decisions, that difference is what settles it.
World A is much better than today, to the point that a civilizational-scale effort would probably succeed about 95–99.9% of the time, primarily because such a civilization would understand deceptive alignment.
World B has a chance of 1, maybe minus epsilon, of solving alignment, since the solution is already there.
Both, of course, are far better than our world.
That is totum pro parte. It’s not World B that has a solution at hand; it’s you who have a solution at hand, and a world that you have to convince to come to a screeching halt. Meanwhile, people are raising millions of dollars to build AGI and don’t believe it’s a risk in the first place. The solution you have in hand has no significance for them. In fact, you are a threat to them, since there’s very little chance that your utopian vision will match up with theirs.
You say World B has chance 1 minus epsilon. I would say epsilon is a better ballpark, unless the whole world is already at your mercy for some reason.
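One way to make the crux of this exchange explicit (the notation below is an illustrative gloss, not something either commenter wrote): both sides are implicitly applying the same decision rule and disagree only about a single probability.

\[
\text{Press } B \iff \Pr(\text{solve} \mid B) > \Pr(\text{solve} \mid A),
\qquad \Pr(\text{solve} \mid A) \approx 0.95\text{--}0.999 .
\]
\[
\text{Optimist: } \Pr(\text{solve} \mid B) = 1 - \epsilon \;\Rightarrow\; \text{press } B.
\qquad
\text{Pessimist: } \Pr(\text{solve} \mid B) = \epsilon \;\Rightarrow\; \text{press } A.
\]

Under the first estimate B dominates; under the second, epsilon is far below 0.95 and A dominates, so the disagreement reduces to how likely the rest of the world is to actually adopt the solution you are holding.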
I do not think a pivotal act is necessary, primarily because it’s much easier to coordinate around negative goals, like preventing everyone’s deaths, than around positive goals. That’s why I’m so optimistic: it is easy to cooperate on the shared goal of not dying even if value differences after that are large.
“it is easy to cooperate on the shared goal of not dying”

Were you here for Petrov Day? /snark
But I’m confused about what you mean when you say a Pivotal Act is unnecessary. Although both you and a megacorp want to survive, you each have very different priors about what is risky. Even if the megacorp believes your alignment program will work as advertised, that only compels them to cooperate with you if they (1) are genuinely concerned about risk in the first place, (2) believe alignment is so hard that they will need your solution, and (3) actually possess the institutional coordination abilities needed.
And this is just for one org.