Thank you for your response. I believe I understand you correctly; I made a response to Manfred’s comment in which I reference your response as well. Do you believe I interpreted you correctly?
An agent that has an empathetic utility function will edit its own code if and only if doing so maximizes the expected value of that same empathetic utility function. Do I get your drift?
I think that’s right, though just to be clear, an empathetic utility function isn’t required for this behavior. Just a utility function and a high enough degree of rationality (and the ability to edit its own source code).
Put another way:
Suppose an agent has a utility function X. It can modify its utility function to become Y. It will only make the switch from X to Y if it believes that switching will ultimately maximize X. It will not switch to Y simply because it believes it can get a higher amount of Y than it can of X.
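To make the decision rule concrete, here is a minimal sketch in Python (the function names, toy utility, and outcome distributions are hypothetical constructions for illustration, not anything from the discussion above): both options, keeping X and switching to Y, are scored by the agent’s current utility function X, and the amount of Y obtainable after the switch never enters the comparison.

```python
def expected_utility(utility_fn, outcomes):
    """Expected utility over (probability, outcome) pairs."""
    return sum(p * utility_fn(o) for p, o in outcomes)

def should_switch(current_X, outcomes_if_keep_X, outcomes_if_switch_to_Y):
    # Both futures are judged by X, the utility function the agent has *now*.
    # How much Y the agent would get after switching never enters the decision.
    eu_keep = expected_utility(current_X, outcomes_if_keep_X)
    eu_switch = expected_utility(current_X, outcomes_if_switch_to_Y)
    return eu_switch > eu_keep

if __name__ == "__main__":
    X = lambda outcome: outcome          # toy linear utility
    keep_X = [(0.5, 10), (0.5, 20)]      # expected X = 15
    switch_to_Y = [(0.5, 5), (0.5, 21)]  # expected X = 13
    print(should_switch(X, keep_X, switch_to_Y))  # False: switching loses X
```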
This is correct if the agent has perfect knowledge of itself, if X is self-consistent, if X is cheap to compute, etc.
The article supposes that “Bob is a perfect rationalist”. What exactly does that mean? In my opinion, it does not mean that he is always right. He is “merely” able to choose the best possible bet based on his imperfect information. In a few branches of the quantum multiverse his choice will be wrong (and he anticipates it), because even his perfect reasoning could be misled by a large set of very improbable events.
Bob may be aware that some of his values are inconsistent, and he may choose to sacrifice some of them to create the best possible coherent approximation (an intra-personal CEV of Bob).
In theory, X can be very expensive to compute, so Bob must spend significant resources to calculate X precisely, and these resources cannot be used for increasing X directly. If there is a function Y that gives very similar results to X but is much cheaper to compute, then Bob may take a calculated risk and replace X with Y, assuming that maximum Y will give him near-maximum X, and that he can spend the saved resources on increasing Y, thereby paradoxically (probably) obtaining higher X than if he had tried to increase X directly.
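A toy calculation may help illustrate the trade-off (every number here is an assumption chosen purely for illustration): if evaluating X exactly eats a large share of Bob’s budget, a slightly miscalibrated but cheap proxy Y can leave enough extra resources that the realized X ends up higher.

```python
# Toy model: Bob has a fixed resource budget. Whatever he spends on
# evaluating his utility function is unavailable for actually improving
# the world according to it.

BUDGET = 100

# Option 1: optimize X directly.
COST_OF_COMPUTING_X = 40            # assumed: exact evaluation is expensive
resources_for_action_x = BUDGET - COST_OF_COMPUTING_X
x_achieved_directly = 1.0 * resources_for_action_x       # perfect targeting

# Option 2: optimize the cheap proxy Y.
COST_OF_COMPUTING_Y = 5             # assumed: proxy is cheap
PROXY_ACCURACY = 0.9                # assumed: maximizing Y captures 90% of X
resources_for_action_y = BUDGET - COST_OF_COMPUTING_Y
x_achieved_via_proxy = PROXY_ACCURACY * resources_for_action_y

print(x_achieved_directly)   # 60.0
print(x_achieved_via_proxy)  # 85.5 -> higher realized X despite optimizing Y
```

Whether the swap is actually worth it depends, of course, on how well maximum Y tracks maximum X and on how large the computational savings really are.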