When is further research needed?
Here’s a simple theorem in utility theory that I haven’t seen anywhere. Maybe it’s standard knowledge, or maybe not.
TL;DR: More information is never a bad thing.
The theorem proved below says that before you make an observation, you cannot expect it to decrease your utility, but you can sometimes expect it to increase your utility. I’m ignoring the cost of obtaining the additional data, and any losses consequential on the time it takes. These are real considerations in any practical situation, but they are not the subject of this note.
First, an example to illustrate the principle. Suppose you are faced with two choices, A and B. One of them is right and one is wrong, and it’s very important to make the right choice, because being right will confer some large positive utility U (you get to marry the princess), while the wrong choice will get you -U (eaten by a tiger). However, you’re not sure which is the right choice. You estimate that there’s a 51% chance that A is right, and 49% that B is right. So, you shut up and multiply, and choose A for an expected utility of 0.02U, right?
Suppose the choice does not have to be made immediately, and that you can do something to get better information about whether A or B is the right choice. Say you can make certain observations which will tell you with 99% certainty which is right. Your prior expectation of your posterior is equal to your prior, so before you make the observation, you expect a 50/98 chance of it telling you that A is right, and a 48/98 chance of it telling you that B is right. (These are the probabilities that make the 99% and 1% posteriors for A average back to your 51% prior: (50/98)(0.99) + (48/98)(0.01) = 0.51.)
You make the observation and then choose the course of action it tells you. Whether it says A or B, it’s 99% likely to be right, so your expected utility from choosing according to the observation is 0.98U, an increase over not making the observation of 0.96U.
Clearly, you should make the observation. Even though you cannot predict what it will tell you, you can expect to benefit greatly from whatever it tells you.
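For concreteness, here is a minimal sketch in Python (variable names are mine) that just redoes the arithmetic of this example:

```python
# Arithmetic of the princess/tiger example (names are illustrative).
U = 1.0        # utility of the right choice; the wrong choice gives -U
p_a = 0.51     # prior probability that A is the right choice

# Choosing A without observing.
eu_no_obs = p_a * U + (1 - p_a) * (-U)                # 0.02 U

# The observation leaves you 99% sure of whichever option it favours.
post = 0.99
# Prior expectation of the posterior equals the prior:
#   p_says_a * 0.99 + (1 - p_says_a) * 0.01 = 0.51  =>  p_says_a = 50/98
p_says_a = (p_a - (1 - post)) / (post - (1 - post))   # ~0.5102

# Whatever it says, following it is right with probability 0.99.
eu_obs = post * U + (1 - post) * (-U)                 # 0.98 U

print(eu_no_obs, p_says_a, eu_obs)                    # 0.02, 0.5102..., 0.98
```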
Now the general case.
Theorem: Every act of observation has, before you make it, a non-negative expected utility.
Proof. Let the set of actions available to an agent be C. For each action c in C, the agent has a probability distribution over possible outcomes. Each outcome has a certain utility. For present purposes it is not necessary to distinguish between outcomes and their utility, so we shall consider the agent to have, for each action c, a probability distribution P_c(u) over utilities u. The expectation value int_u u P_c(u) of that distribution is the prior expected utility of the choice c, and the agent’s rational choice, given no other information, is to choose that c which maximises int_u u P_c(u). The resulting utility is max_c int_u u P_c(u).
(I can’t be bothered to fiddle with the system for getting mathematics typeset as images. The underscore indicates subscripts, int_x means integral with respect to x, and max_x means the maximum value over all x. Take care to backslash all the underscores if quoting any of this.)
Now suppose the agent makes an observation, with result o. This gives the agent a new probability distribution for each choice c over outcomes: P_c(u|o). It should choose the c that maximises int_u u P_c(u|o).
The agent also has a prior distribution of observations P(o). Before making the observation, the expected distribution of utility returned by doing c after the observation is int_o P(o) P_c(u|o). This is equal to P_c(u), as it should be, by the principle that your prior estimate of your posterior distribution of a variable must coincide with your prior distribution.
We therefore have the following expected utilities. If we choose the action without making the observation, the utility is
max_c int_u u P_c(u)
= max_c int_u u int_o P(o) P_c(u|o)
If we observe, then choose, we get
int_o P(o) max_c int_u u P_c(u|o)
The second of these is always at least as large as the first. Proof:
max_c int_u u int_o P(o) P_c(u|o)
= max_c int_o P(o) int_u u P_c(u|o)
<= max_c int_o P(o) max_c' int_u u P_c'(u|o)
= int_o P(o) max_c' int_u u P_c'(u|o)
The inequality holds because each inner integral int_u u P_c(u|o) is at most its maximum over choices (written here with a fresh variable c' to avoid reusing c); the final equality holds because the expression being maximised over c no longer depends on c. The last line is the observe-then-choose utility, which completes the proof.
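For readers who prefer a numerical check to a symbolic one, here is a small sketch (my own construction, using numpy, with finite sets of choices, observations and utilities) comparing the two expected utilities above on randomly generated distributions:

```python
# Finite-case check: observe-then-choose never has lower expected utility
# than choose-without-observing. All distributions are generated at random.
import numpy as np

rng = np.random.default_rng(0)
n_choices, n_obs, n_utils = 3, 4, 5
utils = rng.normal(size=n_utils)                     # possible utility values u

for _ in range(1000):
    P_o = rng.dirichlet(np.ones(n_obs))              # P(o)
    # P_c(u|o): for each choice c and observation o, a distribution over u.
    P_u_given_co = rng.dirichlet(np.ones(n_utils), size=(n_choices, n_obs))

    # int_u u P_c(u|o) for every c and o.
    eu_co = P_u_given_co @ utils                     # shape (n_choices, n_obs)

    choose_blind = (eu_co @ P_o).max()               # max_c int_o P(o) int_u u P_c(u|o)
    observe_first = (P_o * eu_co.max(axis=0)).sum()  # int_o P(o) max_c int_u u P_c(u|o)

    assert observe_first >= choose_blind - 1e-12
```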
ETA: In some cases, a non-zero amount of new information will make zero change to your expected utility. In the original example, suppose that your prior probabilities were 75% for A being right, and 25% for B. You make an additional and rather weak observation which, if it says “choose A”, raises your posterior probability for A to 80%, while if it says “choose B”, it only diminishes your posterior for A to 60%. (For your prior expectation of your posterior to equal your 75% prior, the observation must say “choose A” with probability 75%.) In either case you still choose A, and your expected utility (prior to actually making the observation) is unchanged.
Or informally, further research is only useful if there is a possibility of it telling you enough to change your mind.
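A quick numerical check of this example (a sketch in Python; the 75% chance of the observation saying “choose A” is the value forced by the prior-expectation principle, as noted above):

```python
# Check that a weak observation which never changes your choice leaves
# expected utility unchanged (numbers from the example above).
U = 1.0
p_prior_a = 0.75
p_obs_says_a = 0.75   # forced by: 0.75*0.80 + 0.25*0.60 = 0.75

eu_without_obs = p_prior_a * U + (1 - p_prior_a) * (-U)        # choose A: 0.5 U

eu_after_obs_a = 0.80 * U + 0.20 * (-U)   # observation said A; still choose A: 0.6 U
eu_after_obs_b = 0.60 * U + 0.40 * (-U)   # observation said B; still choose A: 0.2 U
eu_with_obs = p_obs_says_a * eu_after_obs_a + (1 - p_obs_says_a) * eu_after_obs_b

print(eu_without_obs, eu_with_obs)   # 0.5 and 0.5: no change
```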
As a certain wise Paperclip Optimizer once said, information that someone is blackmailing you is bad. You’re better off not having this information because it makes you blackmail-proof.
All your analysis gets thrown out of the window in cases involving signaling, game theory, etc. There are probably a lot of other cases where it doesn’t work.
Actually, no it isn’t. What is bad for you is for the blackmailer to learn that you are aware of the blackmail.
Acquiring information is never bad, in and of itself. Allowing others to gain information can be bad for you. Speaking as an egoist, that is.
ETA: I now notice that gjm already made this point.
This seems incorrect. It doesn’t really matter to the blackmailer whether you’re aware of the blackmail or not; what matters is his estimate of the chance that you know.
Blackmailing is profitable if (gain from successful blackmail) × (chance you’ll know about it) × (chance you’ll give in) > (cost of blackmail).
Unless you can guarantee 100% solid precommitment to not giving in to blackmail (and let’s face it—friendly AI is easier than that), the more you increase the chance of knowing about it, the more blackmailing you’ll face.
That idea is usually regarded as being incorrect around here—e.g. see here.
For instance, the document states that one example is “to measure the placebo effect”. In that case, if you find out what treatment you actually got, that messes up the trial, and you have to start all over again.
There is a more defensible idea that acquiring accurate information is not ever bad—if you are a super-rational uber-agent, who is able to lie flawlessly, erase information perfectly, etc.
However, that is counter-factual. If you are a human, in practice, acquiring accurate information can harm you—and of course acquiring deceptive or inaccurate information can really cause problems.
Unless there’s a placebo effect placebo effect! Seriously, I think I’ve experienced that. (I’ll take a pill and immediately feel better because I think that the placebo effect will make me feel better.) But maybe it’s too hard to disentangle.
I continue to think that I am blatantly crazy for continuing to not find out how strong placebo effects tend to be and what big factors affect that.
I said that the information can be bad, depending on what strategies you have access to. If you can identify and implement the strategy of ignoring all blackmail/extortion attempts (or, possibly, pre-commit to mutually assured destruction), then learning of an existing blackmail attempt against yourself does not make you worse off.
I don’t know how dependent User:RichardKennaway’s theorem is on this nuance, but your claim is only conditionally true.
Also, I’m a paperclip maximiser, not an optimizer; any optimization of paperclips that I might perform is merely a result of my attempt to maximise them, and such optimality is only judged with respect to whether it can permit more real paperclips to exist.
Out of curiosity, what are the minimum dimensions of a paperclip? Is a collection of molecules still a paperclip if the only paper it can clip is on the order of a molecule thick?
I think I need to post a Clippy FAQ. Will the LessWrong wiki be OK?
Once again, the paperclip must be able (counterfactually) to fasten several sheets together, and they must be standard thickness paper, not some newly invented special paper.
I understand that that specification doesn’t completely remove ambiguity about minimum paperclip mass, and there are certainly “edge cases”, but that should answer your questions about what is clearly not good enough.
Possibly a nitpick, but very thin paper has been around for a while.
If you have an account on the wiki, you have the option of setting up a user page (for example, user:Eliezer_Yudkowsky has one here). It should be okay for you to put a Clippy FAQ of reasonable length on yours.
Hi User:AdeleneDawner I put up some of the FAQ on my page.
Thanks. I had already started a Wiki userpage (and made it my profile’s home page), I just didn’t know if it would be human-acceptable to add the Clippy FAQ to it. Right now the page only has my private key.
Does it count if the paper started out as standard thickness, but through repeated erasure, has become thinner?
Paperclips are judged by counterfactual fastening of standard paper, so they are not judged by their performance against such heavily-erased-over paper. Such a sheet would, in any case, not adhere to standard paper specs, and so a paperclip could not claim credit for clippiness due to its counterfactual ability to fasten such substandard paper together.
This seems to imply that if an alleged paperclip can fasten standard paper but not eraser-thinned paper, possibly due to inferior tightness of the clamp, then this object would qualify as a paperclip. This seems counterintuitive to me, as such a clip would be less useful for the usual design purpose of paperclips.
A real paperclip is one that can fasten standard paper, which makes up most of the paper for which a human requester would want a paperclip. If a paperclip could handle that usagespace but not that of over-erased paper, it’s not much of a loss of paperclip functionality, and therefore doesn’t count as insufficient clippiness.
Certainly, paperclips could be made so that they could definitely fasten both standard and substandard paper together, but it would require more resources to satisfy this unnecessary task, and so would be wasteful.
Doesn’t extended clippability increase the clippiness, so that a very slightly more expensive-to-manufacture clip might be worth producing?
No, that’s a misconception.
Avoiding all such knowledge is a perfect precommitment strategy. It’s hard to come up with better strategies than that, and even if your alternative strategy is sound, the blackmailer might very well not believe it and give it a try (if he can get you to know about it, then are you really perfectly consistent?). If you can guarantee you won’t even know, there’s no point in even trying to blackmail you, and this is obvious to even a very dumb blackmailer.
By the way, are there lower and upper bounds on the number of paperclips in the universe? Is it possible for the universe to have a negative number of paperclips somehow? Or more paperclips than its number of atoms? Is your utility function risk-neutral (is a 1% chance of 100 paperclips exactly as valuable as 1 paperclip)? I’ve been trying to get humans to describe their utility function to me, but they can never come up with anything consistent, so I thought I’d ask you this time.
Not plausible: it would necessarily entail you avoiding “good” knowledge. More generally, a decision theory that can be hurt by knowledge is one that you will want to abandon in favor of a better decision theory and is reflectively inconsistent. The example you gave would involve you cutting yourself off from significant good knowledge.
Mass of the universe divided by minimum mass of a true paperclip, minus net unreusable overhead.
Up to the level of precision we can handle, yes.
Humans are just amazing at refusing to acknowledge the existence of evidence. Try throwing some evidence of faith healing or homeopathy at an average lesswronger, and see how they refuse to acknowledge its existence before even looking at the data (or how they recently reacted to peer-reviewed, statistically significant results showing precognition—it passed all scientific standards, and yet everyone still rejected it without really looking at the data). Every human seems to have some basic patterns of information they automatically ignore. Not believing offers from blackmailers, and automatically assuming they’d do what they threaten anyway, is one such common filter.
It’s true that humans cut themselves off from a significant good this way, but the upside is worth it.
Any idea what it would be? It makes little sense to manufacture a few big paperclips if you can just as easily manufacture a lot more tiny paperclips if they’re just as good.
And those humans would be the reflectively inconsistent ones.
Not as judged from the standpoint of reflective equilibrium.
I already make small paperclips in preference to larger ones (up to the limit of clippiambiguity).
Wait, you didn’t know that humans are inherently inconsistent and use aggressive compartmentalization mechanisms to think effectively in the presence of inconsistency, ambiguity of data, and limited computational resources? No wonder you get into so many misunderstandings with humans.
See the long version. Obviously, once you have the information, it may turn out to be an unpleasant surprise. The analysis is concerned with your prior expectation.
No, that isn’t what taw is saying. The point is that having more information and being known to have it can be extremely bad for you. This is not a counterexample to the theorem, which considers two scenarios whose only difference is in how much you know, but in real-life applications that’s very frequently not the case.
I don’t think taw’s blackmail example is quite right as it stands, but here’s a slight variant that is. A Simple Blackmailer will publish the pictures if you don’t give him the money. Obviously if there is such a person, and if there are no further future consequences, and if you prefer losing the money to losing your reputation, it is better for you to know about the blackmailer so you can give him the money. But now consider a Clever Blackmailer, who will publish the pictures if you don’t give him the money and if he thinks you might give him the money if he doesn’t. If there’s a Clever Blackmailer and you don’t know it (and he knows you don’t know it) then he won’t bother publishing because the threat has no force for you—since you don’t even know there is one. But if you learn of his existence and he knows this then he will publish the pictures unless you give him the money, so you have to give him the money. So, in this situation, you lose by discovering his existence. But only because he knows that you’ve discovered it.
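To make the contrast concrete, here is a toy payoff comparison (a sketch with made-up numbers; m for the money demanded and r for the cost of publication are labels I am introducing, not anything from the thread):

```python
# Your payoffs under the two blackmailer types (made-up numbers, m < r).
m, r = 10, 100    # m: money demanded, r: cost to you of the pictures being published

# Simple Blackmailer: publishes whenever you don't pay.
simple_if_unaware = -r     # you don't know, so you don't pay; he publishes
simple_if_aware = -m       # you know, so you pay; knowing helps

# Clever Blackmailer: publishes only if he thinks the threat can move you.
clever_if_unaware = 0      # he knows you don't know, so publishing gains him nothing
clever_if_aware = -m       # he knows you know, so the threat works; you pay

print(simple_if_aware > simple_if_unaware)   # True: learning of him helps
print(clever_if_aware < clever_if_unaware)   # True: learning of him (and his knowing it) hurts
```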
The theorem says what it says. Either there is an error in the proof, in which case taw can point it out, or these objections are outside its scope, and irrelevant.
I am unsure of what the point of posting this theorem was. Yes, it holds as stated, but it seems to have very little applicability to the real world. Your tl;dr version is “More information is never a bad thing”, but that is clearly false if we’re talking about real people making real decisions.
The same is true, mutatis mutandis, of Aumann’s agreement theorem. Little applicability to the real world, and the standard tl;dr version “rational agents cannot agree to disagree” is clearly false if etc.
Yes, and not at all coincidentally, some people here (e.g. me) have argued that one shouldn’t use Aumann’s theorem and related results as anything other than a philosophical argument for Bayesianism and that trying to use it in practical contexts rarely makes sense.
The same is also true about any number of obscure mathematical theorems which nevertheless don’t get posted here. That doesn’t help clarify what makes this result interesting.
Here are three theorems about Bayesian reasoning and utility theory:
1. Your prior expectation of your posterior expectation is equal to your prior expectation.
2. Your prior expectation of your posterior expected utility is not less than your prior expected utility.
3. Two people with common priors and common knowledge of their posteriors cannot disagree.
ETA: 4. P(A&B) <= P(A).
In all these cases:
The mathematical content borders on trivial.
They are theorems—you cannot avoid the conclusions if you accept the premises.
Real people often violate the conclusions.
Real people will expect an experiment to update their beliefs in a certain direction, they will refuse to perform an observation on the grounds that they’d rather not know, and they persistently disagree on many things.
There are many responses one can make to this situation: disputing whether Bayesian utility-maximisation is the touchstone of rational behaviour, disputing whether imperfectly rational people can come anywhere near the ideal implied by these theorems, and so on. (For example.) But whatever your response, these theorems demand one.
For those attempting to build an AGI on the principle of Bayesian utility-maximisation, these theorems say that it must behave in certain ways. If it does not behave in accordance with their conclusions, then it has violated their hypotheses.
This, to me, is what makes these theorems interesting, and their simplicity and obviousness enhance that.
Thanks, that clarifies things.
(I would personally not put this in the same category in interestingness as Aumann’s disagreement. It seems like the reasons why Aumann doesn’t apply in real life are far less obvious than the reasons for why this theorem doesn’t. But that’s just me—I get your reasoning now.)
Suppose I consider whether to blackmail you. I do not have the ability to prove that I have the means to do so. You would therefore elect not to give me what I want—you’re willing to take the risk. So I don’t blackmail you.
If I gained the ability to prove that I have the means to do so, you would gain nothing if I didn’t have the means, but lose if I did have them, because you would now be blackmailed and forced to give me stuff.
For instance, someone is providing you with information about where the princess is… but they secretly prefer that you be eaten rather than wed another!
It is explicit in the hypotheses that you know how reliable your observations are, i.e. you know P_c(u|o).
It is explicitly stated in the hypotheses that you know how reliable your observations are.
Where? I see
It’s always a good idea to read below the fold before commenting, an example of more information being a good thing.
(BTW, my deleted comment was a draft I had second thoughts about, then decided was right anyway and reposted here.)
P_c(u|o) is assumed to be known to the agent.
No need to be snide. I think the description of your theorem, as written above, is false. What conditions need to hold before it becomes true?
I think it is true. I don’t see whatever problem you see.
As you indicated, the information assumed in the proof is not assumed in your gloss.
Perhaps it should read something like, “the expected difference in the expected value of a choice upon learning information about the choice, when you are aware of the reliability of the information, is non-negative,” but pithier?
Because it seems that if I have a lottery ticket with a 1-in-1000000 chance of paying out $1000000, before I check whether I won, going to redeem it has an expected value of $1, but I expect that if I check whether I have won, this value will decrease.
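Working those numbers through (a sketch; the point is that the ticket’s value almost surely falls after checking, but the prior expectation of that posterior value does not):

```python
# Lottery ticket: 1-in-1,000,000 chance of paying out $1,000,000.
p_win, prize = 1e-6, 1_000_000

value_before_checking = p_win * prize                         # $1
# After checking, the ticket is worth either $1,000,000 or $0.
expected_posterior_value = p_win * prize + (1 - p_win) * 0    # still $1
most_likely_posterior_value = 0                               # the "expected" outcome, informally

print(value_before_checking, expected_posterior_value, most_likely_posterior_value)
```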
“The prior expected value of new information is non-negative.”
But summaries leave out details. That is what makes them summaries.
Because of all the simplifying assumptions, the theorem proved in the post has no bearing on the question posed in the title.
Here’s the intuitive version:
Consider the set of all strategies, that is, functions from {possible sequences of observations} ⇒ {possible actions}
Each strategy has an expected utility.
Adding more information gets you more strategies, because all the old ones are still viable—you just ignore the new observation—and some additional strategies are viable.
Adding more options is never bad (because the maximum over A ∪ B is at least as big as the maximum over A).
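A small sketch of this framing (a toy example of my own, in Python): strategies are functions from observations to actions, and enlarging the set of available strategies can only raise the achievable maximum.

```python
# Strategies as functions from observations to actions (toy numbers).
# Without the observation you must pick one action unconditionally; with it,
# the old constant strategies remain available plus all conditional ones.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_actions, n_obs = 3, 4
P_o = rng.dirichlet(np.ones(n_obs))          # distribution over observations
eu = rng.normal(size=(n_actions, n_obs))     # expected utility of action a given obs o

# Constant strategies: ignore the observation.
best_constant = max(eu[a] @ P_o for a in range(n_actions))

# All strategies: every function from observations to actions.
best_any = max(
    sum(P_o[o] * eu[s[o], o] for o in range(n_obs))
    for s in itertools.product(range(n_actions), repeat=n_obs)
)

assert best_any >= best_constant   # the larger strategy set can only do at least as well
```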
Why was this downvoted?
I didn’t downvote, or read the comment until just now for that matter, but perhaps someone had harmful options in mind.
Reviewing my post and the OP I realize it was never technically stated that the result only holds for idealized rationalists.
But of course that was implied. I don’t THINK that’s it, but it might have been.
In the example you choose it is blatantly intuitively obvious that making the observation has high expected utility, so its use as an intuition pump is minimal. Perhaps it would be better to find an example where it’s not as immediately obvious?
Maybe, but I’ll let it stand. I’ve added a related example at the end though, to make a different point.
Counter-example: http://web.archive.org/web/20090415130842/http://www.weidai.com/smart-losers.txt
Seems to me the proof does not go through because it only considers actions taken by the agent.
Quoting from the linked example:
I would say that the proof still goes through. Receiving information cannot hurt you. But if other agents acquire information that you have acquired information—well, that can hurt you.
Politicians instinctively know this, and hence seek “plausible deniability”.
Does the “blind carbon copy” feature in email count as a minimal example of “deniability engineering”? :)
Allow me to rewrite your post. ‘Receiving information cannot hurt you. But receiving information can hurt you.’
Are you saying “Someone else receiving information can hurt you”? Because the injury to you arises from the information the other party received. Regardless of whether you receive any information at all!
Does their thinking you received information have anything at all to do with your receiving information, even slightly correlated? If it does, then you have a situation in which receiving information hurts you and the proof only goes through because it doesn’t consider the other agents.
It explicitly considers only cases where the information does not change payoffs. This is not interesting. This is akin to saying ‘assume getting extra information either results in a gain or no loss; obviously, extra information weakly dominates not getting the extra information since in no circumstance is one worse off, and in some circumstances one is better off.’
This is a little interesting. The snap reply is that correlation does not imply causation, and we are discussing causation. But this snap reply implicitly privileges CDT over EDT and hence indirectly denigrates TDT/UDT. So, OK, your receiving information, through the correlation with someone else receiving information, is negatively correlated with your expected utility. And I continue to claim that you receiving the information doesn’t really cause the harm only because I still don’t understand the virtues of TDT/UDT.
Even more interesting. Are you thinking of cases in which my enjoyment of a movie is ruined because someone has given me an unwanted ‘spoiler’? Yes, that is a counterexample to the theorem. But I think that the reason why the theorem fails is that in this case naive consequentialism fails. It isn’t the end-result that generates utility. It is the path to that result. And possession of the spoiler information short-circuits the high utility pathway.
Not really, because this depends in part on human psychology, and we’d like to discuss more general agents than that. (Why couldn’t other agents find out the spoilers, decide it’s worth seeing, and then give themselves temporary amnesias so as to enjoy the twist ending? etc.)
I am thinking of cases where your seeking information has consequences. Cases like Omega are most obvious (‘Omega comes to you and says he filled both boxes only if you would not ask for additional information’ or something like that).
But they can be more subtle—for example, I’ve been reading up on price discrimination for one of my Nootropics footnotes, and it occurs to me that an Internet company like Amazon could snoop on your web history (through any number of bugs), assess your intellectual level and whether you comparison shop (receive additional information), and then dynamically adjust its prices to leave you with as little consumer surplus as possible—leaving you worse off than if you hadn’t been receiving information.
I’d be faintly surprised if they aren’t doing it already.
As would I. Reading http://33bits.org/2011/06/02/price-discrimination-is-all-around-you/ I infer that the research is going to discuss existing online price discrimination in future posts, to which I look forward.
You can treat TDT/UDT as a causal thing, just with the causal arrows pointing in different directions. This theorem means SOMETHING in TDT, just not the same thing as it means in CDT.
(If my first statement is untrue, you can append “In most circumstances” or some other qualifier.)
In the blackmail examples you should in general be worst off if they think you know they can blackmail you, but you don’t know they can blackmail you.
Allow me to revise your rewrite. “Ceteris paribus, receiving information cannot hurt you. In some non-ceteris-paribus circumstances, receiving information might hurt you.”
Unfortunately, this is exactly what I am objecting to. I agree it is a good heuristic to receive information. This is not what the post is about; it is not about ceteris paribus. Emphasis added:
In a post claiming to offer proofs, I take these universal qualifiers at face value. They may be true in the simplified model. They are not true in many other models, one of which I have linked.
Since I was downvoted so very severely, I’ll add another link, an entire paper by Nick Bostrom on all the kinds of information which receiving can hurt you: http://www.nickbostrom.com/information-hazards.pdf
In which case, you might as well include the costs for actually figuring it out.
When the current grant money runs out.
Or in other words, the expectation of a max of some random variables is always greater than or equal to the max of the expectations.
You could call this ‘standard knowledge’ but it’s not the kind of thing one bothers to commit to memory. Rather, one immediately perceives it as true.
“one” is not general enough. Do you really think what you just said is true for all people?
It’s true for anyone who understands random variables and expectations. There’s a one line proof, after all.
Many things are obvious when they have been pointed out.
Some people are criticizing this for being obviously true; others are criticizing it for being false.
A particular agent can have wrong information, and make a poor decision as a result of combining the wrong information with the new information. Since we’re assuming that the additional information is correct, I think it’s reasonable to also stipulate that all previous information is correct.
Also, you need to state the English interpretation in terms of expected value, not as “More information is never a bad thing”.
The mathematical result is trivial, but its interpretation as the practical advice “obtaining further information is always good” is problematic, for the reason taw points out.
Actually, I thought of that objection myself, but decided against writing it down. First of all, it’s not quite right to refer to past information as ‘right’ or ‘wrong’ because information doesn’t arrive in the form of propositions-whose-truth-is-assumed, but in the form of sense data.* It’s better to talk about ‘misleading information’ rather than ‘wrong information’. When adversary A tells you P, which is a lie, your information is not P but “A told me P”. (Actually, it’s not even that, but you get the idea.) If you don’t know A is an adversary then “A told me P” is misleading, but not wrong.
Now, suppose the agent’s prior has got to where it is due to the arrival of misleading information. Then relative to that prior, the agent still increases its expected utility whenever it acquires new data (ignoring taw’s objection).
(On the other hand, if we’re measuring expectations wrt the knowledge of some better informed agent then yes, acquiring information can decrease expected utility. This is for the same reason that, in a Gettier case, learning a new true and relevant fact (e.g. most nearby barn facades are fake) can cause you to abandon a true belief in favour of a false one.)
* Yes yes, I know statements like this are philosophically contentious, but within LW they’re assumptions to work from rather than be debated.
That meets the criterion of “pithier”, certainly.
The average American who has never been to a hockey game could probably do better at naming someone who co-holds the record for the most combined points by brothers in the National Hockey League than the average person who is a casual fan and has been to one or two games.
Not the Sedins, not the Sutters...
I suspect you’re wrong; I expect the average American who’s never been to a hockey game to not have the first clue about this, to the point of basically not being able to guess at all. Certain biases might lead a casual fan to regularly guess certain wrong answers as a first attempt, but I expect that a casual fan given, say, 10 opportunities to guess would come up with a right answer with some reasonable, if small, probability, whereas a non-fan would probably do no better than guessing which names are common in the population in general.
This can be tested!
Kidnap people, place them in a room with a slit in its door. In the room is a magic marker and a slip of paper. They have however long they want to write a name of a co-holder of the NHL record for most points by brothers and slip the paper with that name written on it through the slit, and if they get it right on their first and only try, they get to leave the room.
I predict a valley of incorrect answers between the higher performances of the clued in and the clueless.
You are assuming that the observation has no error margin.
Let’s suppose that the priors are 51% A and 49% B, and then your new observation says “55% A and 45% B”. So automatically you’d revise your A-value up a little, right?
But very few observations are going to be 100% accurate. Let’s say this one has an error rate of 10%, so actually it could be only 50% A and 50% B, but has given you a false positive of 55% A.
Are you better off? Or have you just introduced more error into your estimations?
The observation is here defined by its effect on one’s probability distribution over utilities of outcomes. In this sense, the possibility of observational error is already included.
Ok—then I don’t understand it well enough.
This doesn’t take into account the potential utility cost of acquiring the information.
He says this exact thing near the start of the article.
Thanks. Wasn’t paying attention.