While writing this, I realized that it sounds a bit similar to the unilateralist’s curse. It’s not the same, but there are parallels, and it’s worth discussing briefly because it’s relevant to other aspects of the situation. The unilateralist’s curse does not occur specifically due to multiple samplings; it occurs because different actors have different beliefs about the value/disvalue of an action, and this variance in beliefs makes it more likely that at least one actor’s belief lands above the “do it” threshold. If every draw from the AGI urn had the same outcome, this would look a lot like a unilateralist’s curse situation, where what matters is the variance in the actors’ beliefs. But I instead think that draws from the AGI urn are somewhat independent, and the problem is simply that we should incur, e.g., a 5% misalignment risk as few times as possible.
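To make the independence point concrete, here is a minimal sketch (purely illustrative numbers; the cumulative_risk helper is mine, not anything from the post): if each draw from the urn carries an independent risk p of catastrophe, the chance of at least one catastrophe across n draws is 1 − (1 − p)^n, which compounds quickly.

```python
# Chance of at least one catastrophic outcome across n independent draws,
# each carrying risk p. The numbers below are illustrative, not estimates.
def cumulative_risk(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

for n in (1, 3, 5):
    print(n, round(cumulative_risk(0.05, n), 3))
# 1 0.05   -> a single deployment
# 3 0.143  -> three independent deployments
# 5 0.226  -> five independent deployments
```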
Interestingly, a similar look at variance is part of what makes the infosecurity situation much worse for multiple projects than for a centralized AGI project: variance is bad here. I expect a single government AGI project to care about and invest in security at least as much as the average AGI company. AGI companies vary in how much they care about and invest in security, and the ones at the low end will be easier to steal from. If you assume these multiple projects have similar AGI capabilities (this is a dubious assumption, but similar capabilities are basically the reason to like multiple projects on Power Concentration grounds, so it’s worth assuming here; if the projects don’t have similar capabilities, power is not very balanced), you might then think that any one company getting its models stolen is about as bad as the centralized project getting its models stolen (with a time lag, I suppose, because the centralized project reaches a given capability level faster).
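A toy simulation of why variance hurts, assuming (my assumption, not the post’s) that each project’s ability to repel a given attack is an independent draw from the same distribution and the attacker simply targets the softest project:

```python
import random

random.seed(0)
TRIALS = 100_000

def weakest_defense(n_projects: int) -> float:
    """Average defense level of the softest of n projects, where each
    project's defense is uniform on [0.3, 0.7] (a made-up spread)."""
    return sum(
        min(random.uniform(0.3, 0.7) for _ in range(n_projects))
        for _ in range(TRIALS)
    ) / TRIALS

print(round(weakest_defense(1), 2))  # ~0.5: the average lone project
print(round(weakest_defense(3), 2))  # ~0.4: the weakest of three projects
```

The mean security level is the same either way; the attacker only needs the minimum, and the minimum drops as you add draws.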
If you are hacking a centralized AGI project, say you have a 50% chance of success. If you are hacking 3 different AGI projects, you have 3 different, roughly independent 50% chances of success; they’re independent because these projects have different security measures in place. Now sure, as one of the points in this blog post indicates, maybe less effort goes into hacking each of the 3 projects (because you have to split your resources, and because there’s less overall interest in stealing any one set of model weights), and maybe that pushes each individual chance down to 33%. These numbers are obviously made up, but even so the attacker ends up with a 1 − (0.67^3) ≈ 70% chance of succeeding against at least one project.
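This is the same 1 − (1 − p)^n calculation as in the sketch above: cumulative_risk(0.33, 3) ≈ 0.699.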
The unilateralist’s curse is about variance in beliefs about the value of some action. The parent comment is about taking multiple independent actions that each carry a risk of a very bad outcome.