There are two definitions of rationality to keep in mind: epistemic rationality and instrumental rationality. An agent is epistemically rational to the extent that it updates its beliefs about the world based on the evidence and in accordance with probability theory—notably Bayes’ rule.
On the other hand, an agent is instrumentally rational to the extent that it maximizes its utility function (i.e., satisfies its preferences).
There is no such thing as “rational preferences,” though much ink has been spilled trying to argue for them. Clearly preferences can’t be rational in the epistemic sense because, well, preferences aren’t beliefs. Can preferences be rational in the instrumental sense? Actually, yes, but only in the sense that having a certain set of preferences may maximize the preferences you actually care about—not in the sense of some sort of categorical imperative. Suppose a rational agent has the ability to modify its own utility function (i.e., its preferences), say an AI that can rewrite its own source code. Would it do it? Only if doing so maximizes that agent’s utility function. In other words, a rational agent will change its utility function if and only if the change maximizes expected utility according to that same utility function—which is unlikely under most normal circumstances.
As for Bob, presumably he’s a human. Humans aren’t rational, so all bets are off as far as what I said above. However, let’s assume that, at least with respect to utility-function-changing behavior, Bob is rational. Will he change his utility function? Again, only if he expects it to better help him maximize that same utility function. Now then, what do we make of him editing out his alcoholism? Isn’t that a case of editing his utility function? Actually, it isn’t—it’s more a constraint of the hardware that Bob is running on. There are lots of programs running inside Bob’s head (and yours), but only a subset of them are Bob. The difficult part is figuring out which parts of Bob’s head are Bob and which aren’t.
Thank you for your response. I believe I understand you correctly; I made a reply to Manfred’s comment in which I reference your response as well. Do you believe I interpreted you correctly?
An agent that has an empathetic utility function will edit its own code if and only if doing so maximizes the expected utility of that same empathetic utility function. Do I get your drift?
I think that’s right, though just to be clear, an empathetic utility function isn’t required for this behavior: just a utility function, a high enough degree of rationality, and the ability to edit its own source code.
Put another way:
Suppose an agent has a utility function X. It can modify its utility function to become Y. It will only make the switch from “X” to “Y” if it believes that switching will ultimately maximize X. It will not switch to Y simply because it believes it can get a higher amount of Y than it can of X.
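A minimal sketch of that decision rule in Python (the toy outcomes, the apple/orange utility functions, and the lotteries are all hypothetical illustrations): the key point is that the proposed switch is scored by the current utility function X, not by the candidate Y.

    # Hypothetical sketch: the agent evaluates a proposed change of utility
    # function using its *current* utility function X, never the candidate Y.

    def expected_utility(utility, lottery):
        # `lottery` is a list of (probability, outcome) pairs.
        return sum(p * utility(outcome) for p, outcome in lottery)

    def should_switch(current_utility, lottery_if_keep, lottery_if_switch):
        # Switch if and only if switching looks better by the *current* utility function.
        return (expected_utility(current_utility, lottery_if_switch)
                > expected_utility(current_utility, lottery_if_keep))

    # Toy preferences: X counts apples, Y counts oranges.
    X = lambda outcome: outcome["apples"]
    Y = lambda outcome: outcome["oranges"]

    lottery_if_keep = [(1.0, {"apples": 10, "oranges": 0})]    # keep X, keep gathering apples
    lottery_if_switch = [(1.0, {"apples": 2, "oranges": 50})]  # adopt Y, start gathering oranges

    # Switching yields far more Y (50 oranges vs. 0), but the agent still declines,
    # because by its current function X the switch is worse (2 apples vs. 10).
    print(should_switch(X, lottery_if_keep, lottery_if_switch))  # False
    # Scoring by the candidate instead would say yes, which is exactly the mistake avoided:
    print(should_switch(Y, lottery_if_keep, lottery_if_switch))  # True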
This is correct, if the agent has perfect knowledge of itself, if X is self-consistent, if X is cheap to compute, etc.
The article supposes that “Bob is a perfect rationalist”. What exactly does that mean? In my opinion, it does not mean that he is always right. He is “merely” able to choose the best possible bet based on his imperfect information. In a few branches of a quantum multiverse his choice will be wrong (and he anticipates this), because even his perfect reasoning could be misled by a large set of very improbable events.
Bob may be aware that some of his values are inconsistent, and he may choose to sacrifice some of them to create the best possible coherent approximation (an intra-personal CEV of Bob).
In theory, X can be very expensive to compute, so Bob must spend significant resources to calculate X precisely, and these resources cannot be used for increasing X directly. If there is a function Y that gives very similar results to X but is much cheaper to compute, then Bob may take the calculated risk of replacing X with Y, assuming that maximizing Y will give him near-maximum X, and he can spend the saved resources on increasing Y, thereby paradoxically (probably) obtaining a higher X than if he had tried to increase X directly.
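A toy numerical sketch of that tradeoff (all the numbers, costs, and the linear forms of X and Y are invented purely for illustration): steering by a cheap proxy leaves more of a fixed compute budget for actually improving the outcome, so the true X achieved can end up higher.

    # Invented numbers for illustration: Bob has a fixed compute budget.
    # Evaluating the true utility X is expensive; evaluating the proxy Y is cheap.
    # Whatever budget is left after the evaluations goes into actually improving things.

    BUDGET = 100        # total compute available
    COST_X = 30         # cost of one evaluation of X
    COST_Y = 1          # cost of one evaluation of the proxy Y
    EVALS_NEEDED = 3    # evaluations needed to steer toward a good outcome

    def true_X(effort):
        # The true utility grows with the effort actually spent on improvement.
        return effort

    def proxy_Y(effort):
        # Y tracks X closely (by assumption) but is much cheaper to evaluate.
        return 0.95 * effort

    effort_steering_by_X = BUDGET - EVALS_NEEDED * COST_X   # 100 - 90 = 10
    effort_steering_by_Y = BUDGET - EVALS_NEEDED * COST_Y   # 100 - 3  = 97

    print(true_X(effort_steering_by_X))  # X achieved when steering by X directly: 10
    print(true_X(effort_steering_by_Y))  # X achieved when steering by the cheap proxy Y: 97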