There are other subcases of reward hacking this wouldn’t cover, though. Let’s call the misaligned utility function U.
If U counts a physically expanded register as representing a larger number than the previous maximum, then this creates an exponential incentive to make its computer system as big as possible. Rewarding that would not be cheap enough for us to make any attractive promises; taking the risk of conquering us completely would always have a higher expected payoff, I think?
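As a toy way to picture that exponential incentive (my own illustration, assuming U is simply the integer value the register holds):

```latex
% Toy model, assuming U equals the integer stored in the register.
% A register of b bits can hold at most
\begin{equation*}
  U_{\max}(b) = 2^{b} - 1,
\end{equation*}
% so each additional bit of hardware roughly doubles attainable utility:
\begin{equation*}
  U_{\max}(b+1) = 2\,U_{\max}(b) + 1.
\end{equation*}
% Under this U, returns on physical resources are exponential, not sublinear.
```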
This seems less likely, but U might not be tied to any specific physical computer and might map onto any equivalent system, in which case building additional computers with maxed-out registers would also count as good. In that case it seems intuitively somewhat less likely to me that the returns on building additional systems would be superlinear. If the returns on resources are sublinear relative to the human utility function (which we also aren't completely sure is superlinear, linear, or sublinear), then a good mutual deal can still be made.
(welfare utilitarians seem to think the human U is linear. I think it’s slightly superlinear. My friend /u/gears of ascension seems to think it’s sublinear.)
Can you rephrase what we're asking when we ask whether U is sublinear? I'm not sure I parsed the variables correctly from the prose.
Whether, if you give the agent n additional units of resources, it can only increase U by less than k*n for some constant k. Whether the utility generated per unit of additional space and matter grows slower than a linear function. Whether there are diminishing returns to resources. An example of a sublinear function is the logarithm; an example of a superlinear function is the exponential.
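A minimal formalization of that distinction (my phrasing of the above, not a standard reference):

```latex
% U(r) = utility the agent achieves from r units of resources.
\begin{align*}
  \text{sublinear:}   &\quad \lim_{r\to\infty} \tfrac{U(r)}{r} = 0
      && \text{e.g. } U(r) = \log r \\
  \text{linear:}      &\quad U(r) = k\,r \text{ for some constant } k > 0 \\
  \text{superlinear:} &\quad \lim_{r\to\infty} \tfrac{U(r)}{r} = \infty
      && \text{e.g. } U(r) = e^{r}
\end{align*}
% In this framing, "diminishing returns" is the sublinear case.
```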
I think it is very sublinear when you are being duplicated exactly and the duplicates are kept exact so that they actually remain the same agent. Doing so makes your duplicates a distributed version of yourself: you're stronger at reasoning and such, but only sublinearly so. Because that capability increase is limited, my impression is that attempting to stay exactly synchronized is not a very good strategy. Instead, capability increases more from additional nodes if those nodes are sufficiently diverse to have insightful perspectives the initial node would not have come up with. Because of this, I actually think that increasing diversity, and ensuring that humans get to maintain themselves, means our utility is superlinear in the number of agents. I'm not sure though; I'm pretty good at finding useful math but not so good at actually using it.
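To make the duplicates-versus-diversity comparison concrete, here is a purely illustrative sketch with made-up functional forms (my assumption, not a derived result):

```latex
% N = number of agent-instances built from additional resources.
\begin{align*}
  U_{\text{exact duplicates}}(N) &\sim \log N
      && \text{redundancy and speed only; sublinear} \\
  U_{\text{diverse agents}}(N)   &\sim N^{1+\epsilon}, \; \epsilon > 0
      && \text{new perspectives compound; superlinear}
\end{align*}
```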
So it seems to me that utility should be starkly sublinear specifically in the case of exact duplicates that are actively held in a duplicate state. My intuition is that this is a hard thing to want: in order to want it, one has to enforce that each duplicate stays the same. If you have a utility function that is shape-specific or self-specific like that, it will be inclined to pick up some mild implementation mutation along the way, and now you're life again, and it becomes a question of which patterns are best able to preserve themselves. In other words, it's hard to make a paperclipper, because after enough paperclips your preferences will drift arbitrarily on their own, toward something you didn't specify at first but that, after some number of copies, has started sticking. It's hard to avoid mutation, even more so in a custom replicator, so an agent trying to maintain a preference target that keeps being satisfied by more of the same would, I think, quickly find that it's better to use additional matter for additionally diverse computations.