Your Future Self’s Credences Should Be Unpredictable to You
My exposition of Yudkowsky’s idea, quoted below, about skipping unnecessary studying:
I think that to contain the concept of Utility as it exists in me, you would have to do homework exercises I don’t know how to prescribe. Maybe one set of homework exercises like that would be showing you an agent, including a human, making some set of choices that allegedly couldn’t obey expected utility, and having you figure out how to pump money from that agent (or present it with money that it would pass up).
Like, just actually doing that a few dozen times.
Maybe it’s not helpful for me to say this? If you say it to Eliezer, he immediately goes, “Ah, yes, I could see how I would update that way after doing the homework, so I will save myself some time and effort and just make that update now without the homework”, but this kind of jumping-ahead-to-the-destination is something that seems to me to be… dramatically missing from many non-Eliezers.
If you already know how your future self’s credences (to be arrived at using your own reasoning process) will differ from your current credences, you should just trust your expected-future-self’s reasoning and immediately adopt his expected conclusions. If you can foresee how a reasoner like yourself will be led to think in some particular way, then you also have the ability to put yourself in that agent’s shoes and be so led right now. Whatever considerations will foreseeably later motivate you are equally moving considerations now.
In deep stock-markets, many traders are trying to make a buck by knowing something that no one else yet does. Because the current prices of various assets already take into consideration all public knowledge about them, you can’t expect to reliably make money as a trader unless you know something that the other traders don’t. Here, “knowing something” is basically synonymous with “being able to predict how public knowledge about the assets in question will evolve.” If a trader can foresee what everyone will believe in the near future, he can beat everyone else to it by betting on those expected beliefs now. Because of this, whenever a trader in a deep stock-market comes to know something about a relevant future belief, prices today are pushed around to reflect the belief that would otherwise only have been held tomorrow. The deep stock-market is thus epistemically efficient: it factors in every relevant credence that any trader can presently scry, well before those credences become widely held.
You want to be like the epistemically efficient stock-market. Don’t leave any actionable knowledge on the table by turning a deaf ear to what your future self will think; don’t dutifully go through the motions of studying when you already know what you’ll think at the end of the process. Incorporate everything any good reasoning process would conclude right now; if you can currently see it, you’re already in a position to believe it. And so long as you keep doing that, the way your future self will update his credences should remain completely unpredictable to you.
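To make that precise (assuming your future credence will come from conditioning your current credence on whatever evidence you see; this is my gloss, and the conservation-of-expected-evidence post linked in the comments below states the same thing formally): writing H for a hypothesis and E for the evidence you might observe,

$$P(H) \;=\; \sum_{e} P(E = e)\, P(H \mid E = e) \;=\; \mathbb{E}\big[\,P(H \mid E)\,\big].$$

Any predictable direction of drift in P(H | E) relative to P(H) is therefore already a reason to move P(H) now; once you’ve moved it, whatever drift remains is unpredictable to you.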
You are about to be forced into the brainwashing machine. You know that the machine is very effective, and when you exit the machine, you will believe the moon is made of green cheese. You update to believe the moon is made of green cheese now.
Your new password is long and complicated. You confidently predict that by this time next week, you will have forgotten it and have no idea what your password was. Thus you update now and have no idea what your password is even though you remember it.
This rule about your future credences being unpredictable only works if there are no forces making your beliefs less accurate over time.
Great comment.
Is there a precise or formal statement of this, or an analysis of what sort of agent can have this unpredictability? What counts as a “force making your beliefs” go some way, as opposed to an “endorsed” update?
There’s some sort of reflection here, where what it means to think the brainwashing machine is a brainwashing machine, rather than just a store of knowledge about the composition of the moon, is to think that it breaks the correspondence between your beliefs and the world….
If you are doing perfect Bayesian updates and nothing else, those are endorsed updates.
Note that in a Bayesian framework, you can assign 50% probability to some statement, and be confident that you will still assign 50% probability next year, if you don’t expect any new evidence. You can also assign 50% probability, with the expectation that it will turn into 0% or 100% in 5 minutes. You can also expect your probabilities to change conditional on future survival. If you are alive next year, you will have updated towards the tumour being non-malignant. But in the toy model where you never die and always do perfect Bayesian updates like AIXI, the expected future probability is equal to the current probability. (The distribution over future probabilities can be anything with this expectation.)
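A minimal sketch of that martingale property in Python (the coin-with-two-possible-biases setup is my own illustrative example, not something from the thread): each possible observation moves the posterior a lot, but the probability-weighted average of the possible posteriors equals the prior.

```python
# Hypothetical setup: a coin whose bias is either 0.9 or 0.1, with prior
# probability 0.5 on H = "the bias is 0.9".
prior_H = 0.5
p_heads_given_H = 0.9
p_heads_given_not_H = 0.1

# Probability of seeing heads on the next flip, marginalizing over H.
p_heads = prior_H * p_heads_given_H + (1 - prior_H) * p_heads_given_not_H

# Posterior on H in each branch, via Bayes' rule.
post_if_heads = prior_H * p_heads_given_H / p_heads
post_if_tails = prior_H * (1 - p_heads_given_H) / (1 - p_heads)

# Each individual posterior moves a long way from 0.5 ...
print(post_if_heads, post_if_tails)   # -> about 0.9 and 0.1

# ... but their probability-weighted average is exactly the prior.
expected_posterior = p_heads * post_if_heads + (1 - p_heads) * post_if_tails
print(expected_posterior)             # -> 0.5
```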
Actually, there is also the possibility of strange loops if you try to use this rule to determine your current beliefs. You correctly predict that this whole thing happens: you predict you will update your belief in X to 93%, so you update your belief in X to 93%, as predicted. You can start believing any nonsense today, just because you predict that you will believe it tomorrow.
In everyday rationality, the noise is usually fairly small compared to the updates, so this rule can be used the way Eliezer is using it fairly well. The strange loops aren’t something a human would usually fall into (as opposed to an AI). If, for some reason, your expected future beliefs are easier to calculate than your current beliefs, use those. If you have a box, and you know that for every possible thing you can imagine seeing when you open it you would believe X, then you should go ahead and believe X now (if you have a good imagination).
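A sketch of the box point under the same assumptions (the branch names and numbers here are hypothetical, chosen only for illustration): if every observation you can imagine would leave you with a high credence in X, your current credence in X is already at least as high as the worst branch, because it is just the probability-weighted average of the branch credences.

```python
# Hypothetical box model: the possible things you might see on opening the
# box, with your probability of each and your credence in X in that branch.
branches = {
    "coin":   {"p_obs": 0.5, "p_X_given_obs": 0.95},
    "button": {"p_obs": 0.3, "p_X_given_obs": 0.92},
    "note":   {"p_obs": 0.2, "p_X_given_obs": 0.97},
}

# Current credence in X is the probability-weighted average of the branch
# credences (law of total probability), so it can't fall below the worst branch.
p_X_now = sum(b["p_obs"] * b["p_X_given_obs"] for b in branches.values())
worst_branch = min(b["p_X_given_obs"] for b in branches.values())

assert p_X_now >= worst_branch
print(p_X_now)   # -> 0.945: you can already believe X before opening the box
```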
There’s often not just one possible future self. First you choose which future self you wish to become, and then you update based on the credence of that self. Eliezer wants to become the future-Eliezer who knows the topic which the homework taught, so he updates accordingly. Before doing the homework, there’s a possible-future-Eliezer who blew off the class to watch cartoons and learned nothing. Eliezer has to choose which of those future-Eliezers he wants to update toward becoming, just as you have to choose which of your possible future selves you would prefer to be.
If you are about to be forced into the brainwashing machine, there’s a high probability that you will be brainwashed if you do nothing about it. But if you want to become the possible future self who does not desire to hold inaccurate beliefs indefinitely, you can update to that later future self’s rediscovery that the moon is made of rock wrapped in regolith.
Your new password is long and complicated. You confidently predict that, if you do nothing about it, you will have forgotten the password by this time next week. Since you desire to not forget your password, you choose to draw a picture that reminds you of what the password was, and type it out a few extra times. There’s a possible future self who has memorized the long and complex password, and you update yourself to more closely resemble that future self, perhaps by taking actions which they would have to have taken.
In the least convenient possible world, the brainwashing machine is very effective and you will be living the rest of your life underground with no access to any info about the moon.
The password won’t be needed after tomorrow, so there is no point remembering it longer than that.
I wish to become a future self that knows everything. Wait, I can’t just update my beliefs to match a hypothetical omniscient future version of myself, thus becoming omniscient.
If you want your future beliefs to be more accurate, update your beliefs to be more accurate. True advice, but not very useful.
I wish to become a future self that is confident they are living in a post-ASI utopia. But deluding myself into thinking I am in a post-ASI utopia despite strong evidence to the contrary isn’t a good idea.
I think this makes a useful distinction, but it’s against something this article didn’t quite say. If the “brainwashing machine” convinces you of its agenda by just giving you evidence, then of course you should believe what it will teach you. If it just replaces your brain with one that believes in its agenda, then this isn’t a belief that was “arrived at using your own reasoning process” (though this qualification may have been added in response to what you said).
A similar move works to disarm your password example. I don’t gain any new information that changes my beliefs about the password, I just forget. There is no reasoning process that changes what I’m going to type into the box on the login page.
See also https://www.lesswrong.com/posts/jiBFC7DcCrZjGmZnJ/conservation-of-expected-evidence . You are correct, of course—if you are a rational agent and you believe your future self has remained rational, then https://en.wikipedia.org/wiki/Aumann%27s_agreement_theorem actually applies here—you know you have the same priors, and mutual (across time, but so what?) knowledge of rationality. You cannot rationally disagree with your future self.
Of course, for fallible biological agents the assumption of rationality doesn’t hold—you are free to predict that future-you becomes stupid or biased in ways you don’t currently accept. In those cases, it can be reasonable to try to bind your future self based on your current self’s beliefs and preferences.