You are about to be forced into the brainwashing machine. You know that the machine is very effective, and when you exit the machine, you will believe the moon is made of green cheese. You update to believe the moon is made of green cheese now.
Your new password is long and complicated. You confidently predict that by this time next week, you will have forgotten it and will have no idea what your password was. Thus you update now and have no idea what your password is, even though you remember it.
This rule about your future credences being unpredictable only works if there are no forces making your beliefs less accurate over time.
Is there a precise / formal statement of this, or an analysis of some good concept of the sort of agent that can have this unpredictability? What counts as a “force making your beliefs” go one way or another, as opposed to an “endorsed” update?
There’s some sort of reflection here, where what it means to think the brainwashing machine is a brainwashing machine, rather than just a store of knowledge about the composition of the moon, is to think that it breaks the correspondence between your beliefs and the world....
If you are doing perfect Bayesian updates and nothing else, that is an endorsed update.
Note that in a Bayesian framework, you can assign 50% probability to some statement and be confident that you will still assign 50% probability next year, if you don’t expect any new evidence. You can also assign 50% probability with the expectation that it will turn into 0% or 100% in 5 minutes. You can also expect your probabilities to change conditional on future survival: if you are alive next year, you will have updated towards the tumour being non-malignant. But in the toy model where you never die and always do perfect Bayesian updates, like AIXI, the expected future probability is equal to the current probability. (The distribution over future probabilities can be anything with this expectation.)
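In symbols (a sketch of the standard conservation-of-expected-evidence identity, nothing specific to AIXI): for a hypothesis H and upcoming evidence E,

\[ \mathbb{E}\big[P(H \mid E)\big] \;=\; \sum_{e} P(E = e)\, P(H \mid E = e) \;=\; \sum_{e} P(H, E = e) \;=\; P(H). \]

Any spread of future credences is allowed, so long as its probability-weighted average equals today’s credence.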
Actually, there is also the possibility of strange loops if you try to use this rule to determine your current beliefs. You correctly predict that this whole process happens: you predict you will update your belief in X to 93%, so you update your belief in X to 93%, as predicted. You can start believing any nonsense today, just because you predict that you will believe it tomorrow.
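One way to see why the loop fails to pin anything down (a sketch, using the identity above): if your forecast of tomorrow’s credence is just whatever you decide to believe today, the consistency condition collapses to

\[ p_{\text{now}} = \mathbb{E}[p_{\text{future}}] = p_{\text{now}}, \]

which every value of \(p_{\text{now}}\) satisfies, 93% belief in nonsense included. The rule only constrains you when the forecast of your future credence comes from something other than your current choice of belief.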
In everyday rationality, the noise is usually fairly small compared to the updates, and this rule can be used the way Eliezer is using it fairly well. The strange loops aren’t something a human would usually fall into (as opposed to an AI). If, for some reason, your expected future beliefs are easier to calculate than your current beliefs, use those. If you have a box, and you know that for all the things you can imagine seeing when you open it you would end up believing X, then you should go ahead and believe X now (if you have a good imagination).
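A minimal numerical sketch of the box argument (the joint distribution below is invented purely for illustration): your current credence is a probability-weighted average of the credences you would have after each imaginable observation, so it can never fall below the smallest of them.

```python
# Toy joint distribution over (what you see in the box, whether X is true).
# The numbers are made up for illustration only.
joint = {
    # observation: (P(observation and X), P(observation and not-X))
    "red ball":  (0.28, 0.02),
    "blue ball": (0.45, 0.05),
    "empty box": (0.18, 0.02),
}

# Credence in X before opening the box.
prior = sum(p_x for p_x, _ in joint.values())

# Credence in X after each imaginable observation.
posteriors = {obs: p_x / (p_x + p_not_x)
              for obs, (p_x, p_not_x) in joint.items()}

# The prior is the probability-weighted average of the posteriors,
# so it is at least as large as the smallest posterior you can imagine.
expected_posterior = sum((p_x + p_not_x) * posteriors[obs]
                         for obs, (p_x, p_not_x) in joint.items())

print(f"credence in X now:            {prior:.2f}")
print(f"expected credence after look: {expected_posterior:.2f}")  # equals the prior
print(f"lowest imaginable posterior:  {min(posteriors.values()):.2f}")
```

If every imaginable posterior were at least 0.93, say, the prior would already have to be at least 0.93; “believe X now” is just that observation.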
There’s often not just one possible future self. First you choose which future self you wish to become, and then you update based on the credence of that self. Eliezer wants to become the future-Eliezer who knows the topic the homework taught, so he updates accordingly. Before doing the homework, there’s also a possible-future-Eliezer who blew off the class to watch cartoons and learned nothing. Eliezer has to choose which of those future-Eliezers he wants to update toward becoming, just as you have to choose which of your possible future selves you would prefer to be.
If you are about to be forced into the brainwashing machine, there’s a high probability that you will be brainwashed if you do nothing about it. But if you want to become the possible future self who does not desire to hold inaccurate beliefs indefinitely, you can update toward that later future self, the one who eventually rediscovers that the moon is made of rock wrapped in regolith.
Your new password is long and complicated. You confidently predict that, if you do nothing about it, you will have forgotten the password by this time next week. Since you desire not to forget your password, you choose to draw a picture that reminds you of what the password was, and you type it out a few extra times. There’s a possible future self who has memorized the long and complex password, and you update yourself to more closely resemble that future self, perhaps by taking the actions they would have had to take.
In the least convenient possible world, the brainwashing machine is very effective and you will be living the rest of your life underground with no access to any info about the moon.
The password won’t be needed after tomorrow; there is no point remembering it longer than that.
First you choose which future self you wish to become, and then you update based on the credence of that self.
I wish to become a future self that knows everything. Wait, I can’t just update my beliefs to match a hypothetical omniscient future version of myself and thereby become omniscient.
If you want your future beliefs to be more accurate, update your beliefs to be more accurate. True advice, but not very useful.
I wish to become a future self that is confident they are living in a post-ASI utopia. But deluding myself into thinking I am in a post-ASI utopia despite strong evidence to the contrary isn’t a good idea.
I think this makes a useful distinction, but it argues against something the article didn’t quite say. If the “brainwashing machine” convinces you of its agenda by just giving you evidence, then of course you should believe what it will teach you. If it instead replaces your brain with one that believes in its agenda, then this isn’t a belief that was “arrived at using your own reasoning process” (though that wording may have been added as a response to what you said).
A similar move works to disarm your password example. I don’t gain any new information that changes my beliefs about the password; I just forget. There is no reasoning process that changes what I’m going to type into the box on the login page.