This argument rests on what I’ll call the strategy-stealing assumption: for any strategy an unaligned AI could use to influence the long-run future, there is an analogous strategy that a similarly-sized group of humans can use in order to capture a similar amount of flexible influence over the future.
The word “assumption” in “strategy-stealing assumption” keeps making me think that you’re assuming this as a proposition and deriving consequences from it, but the actual assumption you’re making is more like “it’s a good idea to pick strategy-stealing as an instrumental goal to work towards, i.e., to work on things that would make the ‘strategy-stealing assumption’ true.” This depends on at least 2 things:
1. If the “strategy-stealing assumption” is true, we can get most of what we “really” want by doing strategy-stealing. (Example of how this can be false: (Logical) Time is of the essence)
2. It’s not too hard to make the “strategy-stealing assumption” true.
(If either 1 or 2 is false, then it would make more sense to work in another direction, like trying to get a big enough advantage to take over the world and prevent any unaligned AIs from arising, or trying to coordinate world governments to do that.)
Is this understanding correct? Also, because there is no name for “it’s a good idea to try to make the ‘strategy-stealing assumption’ true”, I think I and others have occasionally been using “strategy-stealing assumption” to refer to that as well, which I’m not sure you’d endorse. Since there are other issues with the name (like “stealing” making some people think of literal stealing), I wonder if you’d be open to reconsidering the terminology.
ETA: Re-reading the sentence I quoted makes me realize that you named it an “assumption” because it’s an assumption needed for Jessica’s argument, so the name does make sense in that context. In the long run, though, it might make more sense to call it something like a “goal” or a “framework”, since in the larger scheme of things you’re not so much assuming it and figuring out what to do given that it’s true as trying to make it true, or using it as a framework for finding problems to work on.
I wrote this post imagining “strategy-stealing assumption” as something you would assume for the purpose of an argument, for example I might want to justify an AI alignment scheme by arguing “Under a strategy-stealing assumption, this AI would result in an OK outcome.” The post was motivated by trying to write up another argument in which I wanted to use this assumption: I spent a bit of time thinking through what the assumption actually was and decided it was likely to be of independent interest. (Although that other argument hasn’t yet appeared in print.)
I’d be happy to have a better name for the research goal of making this kind of assumption true; I agree the current name isn’t great. (And then I would probably be able to use that name in the description of this assumption as well.)
I wrote this post imagining “strategy-stealing assumption” as something you would assume for the purpose of an argument, for example I might want to justify an AI alignment scheme by arguing “Under a strategy-stealing assumption, this AI would result in an OK outcome.”
When you say “strategy-stealing assumption” in this sentence, do you mean the relatively narrow assumption that you gave in this post, specifically about “flexible influence”:
This argument rests on what I’ll call the strategy-stealing assumption: for any strategy an unaligned AI could use to influence the long-run future, there is an analogous strategy that a similarly-sized group of humans can use in order to capture a similar amount of flexible influence over the future.
or a stronger assumption that additionally says the universe and our values are such that capturing “a similar amount of flexible influence over the future” would lead to an OK outcome? I’m guessing the latter? I feel like people (including me sometimes, and you in this instance) are equivocating between these two meanings when using “strategy-stealing assumption”. Maybe we should have two different terms for these two concepts too?