Kind of tangential, but I’d be interested in your take on how strong an argument money-pumping etc. actually is against full-on cyclical preferences. One way to think about why getting money-pumped is bad is that you have an additional preference not to pay money to go nowhere. But it feels like all this tells us is that “something has to go”, and if an agent is rationally permitted to modify its own preferences to avoid these situations, then it seems a priori acceptable for it to instead just say something like “well actually I weight my cyclical preferences more highly, so I’ll modify the preference against arbitrarily paying”.
In other words, it feels like the money-pumping arguments presume this other preference that in a sense takes “precedence” over the cyclical ones, and I’m still not sure how to think about that.
(I’m not EJT, but for what it’s worth:)
I find the money-pumping arguments compelling not as normative arguments about what preferences are “allowed”, but as engineering/security/survival arguments about what properties of preferences are necessary for them to be stable against an adversarial environment (which is distinct from what properties are sufficient for them to be stable, and possibly distinct from questions of self-modification).
Yeah, I agree that even if they fall short of normative constraints there’s some empirical content around what happens in adversarial environments. I doubt that this stuff translates much to thinking about AGIs, though, in the sense that there’s an obvious story of how an adversarial environment selected for (partial) coherence in us, but I don’t see the same kinds of selection pressures being a force on AGIs. Unless you assume that they’ll want to modify themselves in anticipation of adversarial environments, which kinda begs the question.
Hmm, I was going to reply with something like “money-pumps don’t just say something about adversarial environments, they also say something about avoiding leaking resources” (e.g. if you have circular preferences between proximity to apples, bananas, and carrots, then if you encounter all three of them in a single room you might get trapped walking between them forever) but that’s also begging your original question—we can always just update to enjoy leaking resources, transmuting a “leak” into an “expenditure”.
Another frame here is that if you make/encounter an agent, and that agent self-modifies into/starts off as something which is happy to leak pretty fundamental resources like time and energy and material-under-control, then you’re not as worried about it? It’s certainly not competing as strongly for the same resources as you whenever it’s “under the influence” of its circular preferences.
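For concreteness, here is a rough sketch of the money-pump / resource-leak dynamic with the apples, bananas, and carrots example. Everything in it is an illustrative assumption rather than anything specified above: the `PREFERS_OVER` cycle, the `run_money_pump` helper, the flat integer fee, and the rule that the agent always accepts a trade up to something it prefers.

```python
# Hypothetical cyclic preferences: each key is strictly preferred to its value.
# banana > apple, carrot > banana, apple > carrot (so the cycle never bottoms out).
PREFERS_OVER = {"banana": "apple", "carrot": "banana", "apple": "carrot"}


def run_money_pump(start_item, money, fee, max_trades):
    """Repeatedly offer the agent the item it prefers to its current one, for a fee.

    Assumes the agent accepts every such offer while it can afford the fee.
    Returns (final_item, remaining_money, trades_made).
    """
    item = start_item
    trades = 0
    for _ in range(max_trades):
        if money < fee:
            break  # the agent can no longer pay, so the pump stops
        # Look up the item that the agent strictly prefers to what it holds.
        item = next(better for better, worse in PREFERS_OVER.items() if worse == item)
        money -= fee
        trades += 1
    return item, money, trades


# Three trades take the agent once around the cycle.
print(run_money_pump("apple", money=10, fee=1, max_trades=3))
# -> ('apple', 7, 3): same item as at the start, three units poorer
```

Note that the sketch only exhibits the leak. Whether that counts as a loss or as an “expenditure” the agent is happy to make is exactly the question being debated above, and the code does not settle it.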