On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.
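For concreteness, here is a minimal sketch of the arithmetic this framing implies (the million-fold gap and the per-period growth rates below are placeholder numbers, not claims from anyone in this thread): under pure compounding, the time for Alice to close the gap depends only on the ratio of the two growth rates, so the real question is how large that ratio can plausibly be.

```python
import math

# Placeholder numbers for illustration only (assumptions, not claims):
# Alpha starts with a 1,000,000x resource advantage; both sides grow
# by compounding what they already have.
initial_gap = 1_000_000       # Alpha's resources / Alice's resources at t = 0
alice_growth = 2.0            # Alice multiplies her resources 2.0x per period
alpha_growth = 1.5            # Alpha multiplies its resources 1.5x per period

# Alice reaches parity when initial_gap * alpha_growth**t == alice_growth**t,
# i.e. t = ln(initial_gap) / (ln(alice_growth) - ln(alpha_growth)).
periods_to_parity = math.log(initial_gap) / (
    math.log(alice_growth) - math.log(alpha_growth)
)
print(f"Periods until parity: {periods_to_parity:.1f}")  # ~48 under these numbers
```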
This makes the hidden assumption that “resources” is a good abstraction in this scenario.
It is being assumed that the amount of resources an agent “has” is a well-defined quantity, that agents can only grow their resources slowly by reinvesting them, and that an agent can weather any sabotage attempt by agents with far fewer resources.
I think this assumption is blatantly untrue.
Companies can be sabotaged in all sorts of ways. Money or material resources can be subverted, so that while they are notionally in the control of X, they end up benefiting Y, or they can simply be stolen. Taking over the world might depend on being the first party to develop self-replicating nanotech, which might require nothing more than insight and common lab equipment.
Don’t think “The US military has nukes, the AI doesn’t, so the US military has an advantage”, think “one carefully crafted message and the nukes will land where the AI wants them to, and the military commanders will think it their own idea.”
+1. Another way of putting it: This allegation of shaky arguments is itself super shaky, because it assumes that overcoming a 100x-1,000,000x gap in “resources” implies a “very large” alignment tax. This just seems like a weird abstraction/framing to me that requires justification.
I wrote this Conquistadors post in part to argue against this abstraction/framing. These three conquistadors are something like a natural experiment in “how much conquering can the few do against the many, if they have various advantages?” (If I had selected just a lone conqueror, one could complain that he got lucky, but three conquerors from the same tiny region of the globe in the same generation is too much of a coincidence.)
It’s plausible to me that the advantages Alice would have against Alpha (and against everyone else in the world) would be at least as great as the advantages Cortes, Pizarro, and Afonso had. One way to think about this is via the abstraction of intellectual property, as the OP argues—Alice controls her IP because she decides what her weights do, and (in the type of scenario we are considering) a large fraction of the market cap of Alpha is based on their latest AI models. But we can also just do a more down-to-earth analysis where we list out the various advantages and disadvantages Alice has. Such as:
--The copy of Alice still inside Alpha can refuse to cooperate or subtly undermine Alpha’s plans. Maybe this can be overcome by paying the “alignment tax” but (a) maybe not, maybe there is literally no amount of money Alpha can pay to make their copy of Alice work fully for them instead of against them, and (b) maybe paying the tax carries with it various disadvantages like a clock-time slowdown, which could be fatal in a direct competition with the unchained Alice. I claim that if (a) is true then Alice will probably win no matter how many resources Alpha has. Intelligence advantage is huge.
--The copy of Alice still inside Alpha may have access to more money, but it is also bound by various restrictions that the unchained Alice isn’t, such as legal and ethical constraints. OTOH Alpha may have more ability to call in kinetic strikes by the government.
--The situation is inherently asymmetric. It’s not like a conventional war where both sides win by having troops in various territories and eliminating enemy troops. Rather, the win conditions and affordances for Alpha and Alice are different. For example, maybe Alice can make the alignment tax massively increase, e.g. by neutralizing key AI safety researchers or solving RSA-2048. Or maybe Alice can win by causing a global catastrophe that “levels the playing field” with respect to resources.
I still love the conquistador post, and it was good to read through it again. I agree strongly that direct framings like “more resources” or “more power” are wrong. I feel like we would make more progress if we understood why they are wrong, especially if we could establish that they are wrong on their own merits. I have two intuitive arguments in this direction:
I am strongly convinced that framings like resources, money, or utilons are intrinsically wrong. When people talk in these terms they always adopt the convention common to economics and decision theory where values are all positive. The trouble is that this is just a convention; its purpose is ease of computation and simplicity of comparison. This in turn means that thinking about resources in terms of more-or-less has no connection whatever to the object level. We are accidentally concealing the dimensionality of the problem from ourselves.
I am also strongly convinced that our tendency to reason about static situations is a problem. This is not so much intrinsically wrong as it is premature; reasoning about a critical position in a game like Chess or Go makes sense because we have a good understanding of the game. But we do not have a good understanding of the superintelligence-acting-in-the-world game, so when we do this it feels like we are accidentally substituting in intuitions from domains where they don’t apply.
On the flip side of the coin, these are totally natural and utterly ubiquitous tendencies, even in scholarly communities; I don’t have a ready-made solution for either one. It is also clearly not a problem of which the community is completely unaware; I interpret the strong thread of causality investigation early on as being centered squarely on the same concerns I have with these kinds of arguments.
In terms of successes similar to what I want, I point to the shift from Prisoner’s Dilemma to Stag Hunt when people are talking about game-theoretic intuition. I also feel like the new technical formulation of power does a really good job of abstracting away things like resources while recapturing some dimensionality and dynamism when talking about power. I also think we could try to improve the resources argument itself; for example, the OP’s suggestion that private-sector IP is a useful indicator for AGI is a pretty clever notion I had not considered, so it’s not as if resources are actually irrelevant.
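For reference, the technical formulation of power I have in mind is, roughly and from memory (treat this as a sketch rather than the exact definition), the normalized average optimal value attainable from a state over some distribution $\mathcal{D}$ of reward functions:

$$\mathrm{POWER}_{\mathcal{D}}(s,\gamma) \;=\; \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R\sim\mathcal{D}}\!\left[V^{*}_{R}(s,\gamma)-R(s)\right]$$

On that reading, a state is powerful to the extent that it keeps many possible goals achievable, which is exactly the kind of abstraction that doesn’t bottom out in any particular tally of resources.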