Let me first make some comments about revealed preferences that might clarify how I’m seeing this. Preferences are famously underdetermined by limited choice behaviour. If A and B are available and I pick A, you can’t infer that I like A more than B — I might be indifferent or unable to compare them. Worse, under uncertainty, you can’t tell why I chose some lottery over another even if you assume I have strict preferences between all options — the lottery I choose depends on my beliefs too. In expected utility theory, beliefs and preferences together induce choice, so if we only observe a choice, we have one equation in two unknowns.[1] Given my choice, you’d need to read my mind’s probabilities to be able to infer my preferences (and vice versa).[2]
In that sense, preferences (mostly) aren’t actually revealed. Economists often assume various things to apply revealed preference theory, e.g. setting beliefs equal to ‘objective chances’, or assuming a certain functional form for the utility function.
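To make the ‘one equation in two unknowns’ point concrete, here’s a toy sketch (my own made-up numbers, nothing more): two opposite preference orderings over outcomes, each paired with suitable beliefs, rationalise exactly the same observed choice between two acts.

```python
# Toy illustration (made-up numbers): the same observed choice is rationalised
# by two opposite preference orderings once beliefs are allowed to differ.

def expected_utility(act, utility, belief):
    # act maps states to outcomes; belief assigns probabilities to states
    return sum(belief[s] * utility[act[s]] for s in act)

# Two acts over states s1, s2 yielding outcomes x1, x2
f = {"s1": "x1", "s2": "x2"}
g = {"s1": "x2", "s2": "x1"}

rationalisations = {
    "prefers x1, thinks s1 likely":   ({"x1": 1, "x2": 0}, {"s1": 0.8, "s2": 0.2}),
    "prefers x2, thinks s1 unlikely": ({"x1": 0, "x2": 1}, {"s1": 0.2, "s2": 0.8}),
}

for label, (u, p) in rationalisations.items():
    choice = "f" if expected_utility(f, u, p) > expected_utility(g, u, p) else "g"
    print(label, "->", choice)  # both lines end with: -> f
```

Both rationalisations pick f, so observing that choice alone tells you nothing about whether I rank x1 above x2 or the reverse.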
But why do we care about preferences per se, rather than what’s revealed? Because we want to predict future behaviour. If you can’t infer my preferences from my choices, you can’t predict my future choices. In the example above, if my ‘revealed preference’ between A and B is that I prefer A, then you might make false predictions about my future behaviour (because I might well choose B next time).
Let me know if I’m on the right track for clarifying things. If I am, could you say how you see trammelling/shutdown connecting to revealed preferences as described here, and I’ll respond to that?
[1] $L^* \in \arg\max_L \sum_i u(x_i)\, p(x_i[L])$
[2] The situation is even worse when you can’t tell what I’m choosing between, or what my preference relation is defined over.
Feels like we’re making some progress here.
Let’s walk through more carefully why revealed preferences are interesting in the shutdown problem. (I’m partly thinking as I write, here.) Suppose that, at various times, the agent is offered opportunities to spend resources in order to cause the button to be pushed/unpushed. We want the agent to turn down such opportunities, in both directions—implying either indifference or lack of preference in any revealed preferences. Further, we do want the agent to spend resources to cause various different outcomes within the button-pressed or button-unpressed worlds, so there’s nontrivial revealed preference ordering within button-pressed worlds and within button-unpressed worlds. But if the agent is to turn down costly opportunities to cause the button to be pressed/unpressed, and those opportunities jump between enough different pressed-outcome and unpressed-outcome pairs (which themselves each have nontrivial revealed preferences), then there’s going to be a revealed preference gap.
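To make that concrete, here’s a toy check (my own construction, with made-up outcomes and the cost of each opportunity treated as negligible): take unpressed outcomes A1, A2 with A1 strictly revealed-preferred to A2, pressed outcomes B1, B2 with B1 strictly revealed-preferred to B2, and suppose the agent declines to spend resources to move from A2 to B1 or from B2 to A1. Any complete, transitive revealed preference relation over four outcomes is representable by a rank function, and no rank function satisfies all four constraints:

```python
# Toy check (my construction, costs treated as negligible): no complete,
# transitive revealed preference over {A1, A2, B1, B2} fits the behaviour
# described above. Completeness + transitivity on a finite set is equivalent
# to representability by a rank function, so we enumerate rank assignments.
from itertools import product

outcomes = ["A1", "A2", "B1", "B2"]  # A* = button-unpressed, B* = button-pressed

def consistent(rank):
    return (rank["A1"] > rank["A2"]        # spends resources for A1 over A2
            and rank["B1"] > rank["B2"]    # spends resources for B1 over B2
            and rank["A2"] >= rank["B1"]   # declines to pay to move A2 -> B1
            and rank["B2"] >= rank["A1"])  # declines to pay to move B2 -> A1

solutions = [r for r in product(range(4), repeat=4)
             if consistent(dict(zip(outcomes, r)))]
print(len(solutions))  # 0: the observed choices rule out every complete ordering
```

So under those (admittedly stylised) assumptions, the desired behavior cannot be squared with any complete revealed preference ordering; hence the gap.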
Upshot (one way to frame it): the reason the shutdown problem is difficult/interesting in the first place is that the desired behavior implies a revealed preference gap. Insofar as e.g. any standard expected utility maximizer cannot have a revealed preference gap, such standard EU maximizers cannot behave the way we want. (This frame is new to me, so thank you.)
(Note that that’s all totally compatible with revealed preferences usually being very underdetermined! The desired behavior nails things down enough that any assignment of revealed preferences must have a preferential gap. The question is whether we can come up with some agent with a stable gap in its revealed preferences.)
(Also note that the story above routed through causal intervention/counterfactuals to probe revealed preference, so that does open up a lot of extra ways-of-revealing. Not sure if that’s relevant yet.)
Now bringing this back to DSM… I think the question I’m interested in is: “do trammelling-style issues imply that DSM agents will not have a revealed preference gap (under reasonable assumptions about their environment and capabilities)?”. If the answer is “yes”—i.e. if trammelling-style issues do imply that sufficiently capable DSM agents will have no revealed preference gaps—then that would imply that capable DSM agents cannot display the shutdown behavior we want.
On the other hand, if DSM agents can have revealed preference gaps, without having to artificially limit the agents’ capabilities or the richness of the environment, then that seems like it would circumvent the main interesting barrier to the shutdown problem. So I think that’s my main crux here.
Great, I think bits of this comment help me understand what you’re pointing to.
> the desired behavior implies a revealed preference gap
I think this is roughly right, together with all the caveats about the exact statements of Thornley’s impossibility theorems. Speaking precisely here will be cumbersome, so for the sake of clarity I’ll try to restate what you wrote like this:
1. Useful agents satisfying completeness and other properties X won’t be shutdownable.
2. Properties X are necessary for an agent to be useful.
3. So, useful agents satisfying completeness won’t be shutdownable.
4. So, if a useful agent is shutdownable, its preferences are incomplete.
This argument would let us say that observing usefulness and shutdownability reveals a preferential gap.
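Spelled out symbolically, this is just a restatement of the four steps above, reading Useful, Complete, Shutdownable, and X as predicates of an agent:

$$
\begin{aligned}
&(1)\quad \forall a\;\big(\mathrm{Useful}(a) \land \mathrm{Complete}(a) \land X(a) \rightarrow \lnot\mathrm{Shutdownable}(a)\big)\\
&(2)\quad \forall a\;\big(\mathrm{Useful}(a) \rightarrow X(a)\big)\\
&(3)\quad \therefore\; \forall a\;\big(\mathrm{Useful}(a) \land \mathrm{Complete}(a) \rightarrow \lnot\mathrm{Shutdownable}(a)\big)\\
&(4)\quad \therefore\; \forall a\;\big(\mathrm{Useful}(a) \land \mathrm{Shutdownable}(a) \rightarrow \lnot\mathrm{Complete}(a)\big)
\end{aligned}
$$

Here (3) follows from (1) and (2), and (4) from (3) by contraposition; reading ¬Complete as ‘has a preferential gap somewhere’ gives the claim in the previous sentence.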
> I think the question I’m interested in is: “do trammelling-style issues imply that DSM agents will not have a revealed preference gap (under reasonable assumptions about their environment and capabilities)?”
A quick distinction: an agent can (i) reveal p, (ii) reveal ¬p, or (iii) neither reveal p nor ¬p. The problem of underdetermination of preference is of the third form.
We can think of some of the properties we’ve discussed as ‘tests’ of incomparability, which might or might not reveal preferential gaps. The test in the argument just above is whether the agent is useful and shutdownable. The test I use for my results above is (roughly) ‘arbitrary choice’. The reason I use that test is to keep my results self-contained: I don’t make use of Thornley’s various requirements for shutdownability. Of course, arbitrary choice isn’t what we want for shutdownability. It’s just a test for incomparability that I used for an agent that isn’t yet endowed with Thornley’s other requirements.
The trammelling results, though, don’t give me any reason to think that DSM is problematic for shutdownability. I haven’t formally characterised an agent satisfying DSM as well as TND, Stochastic Near-Dominance, and so on, so I can’t yet give a definitive or exact answer to how DSM affects the behaviour of a Thornley-style agent. (This is something I’ll be working on.) But regarding trammelling, I think my results are reasons for optimism if anything. Even in the least convenient case that I looked at—awareness growth—I wrote this in section 3.3. as an intuition pump:
> we’re simply picking out the best prospects in each class. For instance, suppose prospects were representable as pairs ⟨s,c⟩ that are comparable iff the s-values are the same, and then preferred to the extent that c is large. Then here’s the process: for each value of s, identify the options that maximise c. Put all of these in a set. Then choice between any options in that set will always remain arbitrary; never trammelled.
That is, we retain the preferential gap between the options we want a preferential gap between.
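To make that intuition pump concrete, here’s a minimal sketch of the selection process it describes (the particular prospects are made up for illustration):

```python
# Minimal sketch of the section 3.3 intuition pump (prospects are made up):
# prospects (s, c) are comparable iff they share an s-value, and within a
# comparability class a larger c is strictly preferred.
from collections import defaultdict

prospects = [("s1", 3), ("s1", 7), ("s2", 5), ("s2", 2), ("s3", 4)]

# For each value of s, identify the option(s) that maximise c, and pool them.
by_class = defaultdict(list)
for s, c in prospects:
    by_class[s].append(c)
chosen = {(s, c) for s, cs in by_class.items() for c in cs if c == max(cs)}

# Options from different classes are incomparable, so choice between them
# remains arbitrary rather than getting trammelled into a strict ranking.
print(sorted(chosen))  # [('s1', 7), ('s2', 5), ('s3', 4)]
```

Every option kept comes from a different comparability class, so nothing in the process forces a strict ranking between them.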
[As an aside, the description in your first paragraph of what we want from a shutdownable agent doesn’t quite match Thornley’s setup; the relevant part to see this is section 10.1. here.]
That makes sense, yeah.