I happen to agree with this generalization, provided we also respect the constraint “if you can predict that you’ll want something in the future, then you want it now”. (There might also be other coherence constraints I would want to impose! But this is a central one.)
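To gesture at how I’d write that constraint down (my own notation here, not anything from the article): let $U_t$ be the agent’s time-$t$ valuation of outcomes and $\mathbb{P}_t$, $\mathbb{E}_t$ its time-$t$ beliefs and expectations. Then the conditional form is

$$\mathbb{P}_t\big(U_{t+k}(o) = u\big) = 1 \;\Longrightarrow\; U_t(o) = u,$$

and the natural generalization, when the prediction is uncertain, is $U_t(o) = \mathbb{E}_t\big[U_{t+k}(o)\big]$ — a value analogue of the reflection principle for beliefs.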
On the one hand, an agent who violates this will usually prefer to self-modify to remove the violation. It might not entirely stop its preferences from changing, but it would certainly want to change the method of change, at least. This is very much like a philosopher who doesn’t trust their own deliberation process. They might not want to entirely stop thinking (some ways of changing your mind are good), but they would want to modify their reasoning somehow.
(Furthermore, an agent who sees this kind of thing coming, but does not yet inhabit either conflicting camp, would probably want to self-modify in some way to avoid the conflict.)
On the other hand, suppose an agent passes through this kind of belief change without having an opportunity to self-modify. The agent will think its past self was wrong to want to resist the change, and it will want to avoid that type of mistake in the future. If we assume that learning tends to make the modifications which would have ‘helped’ the agent’s past self, then such an agent will learn to predict its value changes and learn to agree with those predictions.
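As a toy illustration of that learning dynamic (entirely my own sketch; the class and update rule are invented for the example, and aren’t meant to be logical induction proper), here is an agent whose current valuations simply defer to its predictions of its future valuations, with those predictions trained on the value changes that actually happen:

```python
# Toy sketch: an agent that "wants now" whatever it predicts it will want later,
# and trains that prediction on the value changes it actually goes through.

class SelfPredictingAgent:
    def __init__(self, options, learning_rate=0.1):
        # Predicted future utility for each option; start neutral.
        self.predicted_future_utility = {o: 0.0 for o in options}
        self.learning_rate = learning_rate

    def current_utility(self, option):
        # "If you can predict that you'll want it later, you want it now":
        # the current valuation just defers to the prediction.
        return self.predicted_future_utility[option]

    def observe_future_values(self, realized_utility):
        # After a value change actually happens, move each prediction toward
        # what the future self turned out to want, so similar changes stop
        # being a surprise (and stop being resisted).
        for option, utility in realized_utility.items():
            error = utility - self.predicted_future_utility[option]
            self.predicted_future_utility[option] += self.learning_rate * error
```

The point is just the shape of the feedback loop: the future self’s actual valuations are the training signal, so over time the current valuations stop fighting the predictable changes.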
This gives us something similar to logical induction.
You mentioned in the article that you intuitively want some kind of “dominance” argument which Dutch books/money pumps don’t give you. I would propose logical-induction-style dominance. What you have is essentially the guarantee that someone with cognitive powers comparable to yours can’t come in and do a better job of satisfying your (future) values.
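To make the shape of that guarantee concrete (again in notation I’m making up for this comment): the logical induction criterion says no efficiently computable trader can exploit the market; the value-flavored analogue I have in mind is something like

$$\text{for every advisor } A \in \mathcal{C}:\quad \limsup_{t \to \infty}\; \Big( V_{\mathrm{future}}\big(A\text{'s recommended actions through } t\big) - V_{\mathrm{future}}\big(\text{your actions through } t\big) \Big) < \infty,$$

where $\mathcal{C}$ is the class of advisors with cognitive powers comparable to yours and $V_{\mathrm{future}}$ scores action histories by your future values. So an advisor in your own weight class can’t keep racking up an unbounded advantage over you by your own (future) lights.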
Why do we want that guarantee?
The usefulness of the current action to future preferences is what’s important for learning, since future preferences are the ones which get to decide how to modify things. So this is a notion of “doing the best we can” with respect to learning: we couldn’t benefit from the advice of someone with similar cognitive strength to us.
Relatedly, this is important for tiling agents: if (it looks to you like) a different configuration of a similar amount of processing power would do a better job, then you’d prefer to self-modify to that configuration.
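In the same made-up notation, the tiling preference is just: for any alternative configuration $\pi'$ using a similar amount of processing power,

$$\mathbb{E}_t\big[V_{\mathrm{future}} \mid \text{self-modify to } \pi'\big] \;>\; \mathbb{E}_t\big[V_{\mathrm{future}} \mid \text{keep current configuration}\big] \;\Longrightarrow\; \text{prefer to self-modify to } \pi'.$$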