Thanks for clearing up the countability. It’s clear that there are some cases where taking limits will fail (like when the utility is discontinuous at infinity), but I don’t have an intuition about how that issue is related to countability.
In the above example, the number of people and the number of days they live were uncountable, if I’m not mistaken. The take-home message is that you do not get an answer if you just evaluate the problem for sets like that, but you might if you take a limit.
Conclusions that involve infinity don't map uniquely onto finite solutions because they don't supply enough information. Above, "infinite immortal people" refers to a concept that encapsulates three different answers. We had to invent a new parameter, alpha, which was not supplied in the original problem, to come up with a well-defined result. In essence, we didn't actually answer the question. We made up our own problem that was similar to the original one.
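As a toy illustration (not the original problem) of how "infinity" underdetermines the answer, consider a double limit whose value depends on the path along which it is taken:

```latex
\lim_{\substack{n,\,m \to \infty \\ n = \alpha m}} \frac{n}{n+m} = \frac{\alpha}{1+\alpha}
```

"Let n and m go to infinity" yields any value in (0, 1) depending on the alpha we choose, so the problem statement alone does not pick out a unique result.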
Here is some clarification from Zinsser himself (ibid.):
“Who am I writing for? It’s a fundamental question, and it has a fundamental answer: You’re writing for yourself. Don’t try to visualize the great mass audience. There is no such audience—every reader is a different person.
This may seem to be a paradox. Earlier I warned that the reader is… impatient… . Now I’m saying you must write for yourself and not be gnawed by worry over whether the reader is tagging along. I’m talking about two different issues. One is craft, the other is attitude. The first is a question of mastering a precise skill. The second is a question of how you use the skill to express your personality.
In terms of craft, there’s no excuse for losing readers through sloppy workmanship. … But on the larger issue of whether the reader likes you, or likes what you are saying or how you are saying it, or agrees with it, or feels an affinity for your sense of humor or your vision of life, don’t give him a moment’s worry. You are who you are, he is who he is, and either you’ll get along or you won’t.
N.B.: These paragraphs are not contiguous in the original text.
On Writing Well, by William Zinsser
Every word should do useful work. Avoid cliché. Edit extensively. Don’t worry about people liking it. There is more to write about than you think.
It makes no sense to call something “true” without specifying prior information. That would imply that we could never update on evidence, which we know not to be the case for statements like “2 + 3 = 5.” Much of the confusion comes from different people meaning different things by the proposition “2 + 3 = 5,” which we can resolve as usual by tabooing the symbols.
Consider the propositions:

A = "The next time I put two sheep and three sheep in a pen, I will end up with five sheep in the pen."
B = "The universe works as if in all cases, combining two of something with three of something results in five of that thing."
C = "The symbolic expression '2 + 3 = 5' is consistent with mathematical formalism."

These are a few examples of what we might mean when we ask "Is '2+3=5' true?" In all cases, we can in principle perform the computation of P(A|Q), or P(B|Q), etc., where Q represents prior information including what I know about sheep and mathematical formalism.
As usual, I’m late to the discussion.
The probability that a counterfactual is true should be handled with the same probabilistic machinery we always use. Once the set of prior information is defined, it can be computed as usual with Bayes. The confusing point seems to be that the prior information is contrary to what actually occurred, but there's no reason this should be different from any other case with limited prior information.
For example, suppose I drop a glass above a marble floor. Define:
sh = “my glass shattered”
f = “the glass fell to the floor under the influence of gravity”
and define sh_0 and f_0 as the negations of these statements. We wish to compute
P(sh_0|f_0,Q) = P(sh_0|Q)P(f_0|sh_0,Q)/P(f_0|Q),
where Q is all other prior information, including my understanding of physics. As long as these terms exist, we have no problem. The confusion seems to stem from the assumption that P(f_0|sh_0,Q) = P(f_0|Q) = 0, since f_0 is contrary to our observations, and in this case seemingly mutually exclusive with Q.
But probability is in the mind. From the perspective of an observer at the moment the glass is dropped, P(f_0|Q) at least includes cases in which she is living in the Matrix, or aliens have harnessed the glass in a tractor beam. Both of these cases hold finite probability consistent with Q. From the perspective of someone remembering the observed event, P(f_0|Q) might include cases in which her memory is not trustworthy.
In the usual colloquial case, we're taking the perspective of someone running a thought experiment on a historical event with limited information about history and physics. The glass-dropping case limits the possible cases covered by P(f_0|Q) considerably, but the Kennedy-assassination case leaves a good many of them open. All terms are well defined in Bayes' rule above, and I see no problem with computing in principle the probability of the counterfactual being true.
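To make this concrete, here is a minimal sketch of the computation. All the numbers are made-up placeholders standing in for whatever Q would actually imply:

```python
# Bayes' rule for the counterfactual:
#   P(sh_0|f_0,Q) = P(sh_0|Q) * P(f_0|sh_0,Q) / P(f_0|Q)
# All probabilities below are illustrative placeholders, not derived from any real Q.

p_sh0 = 0.02            # P(sh_0|Q): prior that the glass does not shatter
p_f0_given_sh0 = 0.40   # P(f_0|sh_0,Q): no fall, given no shattering
p_f0 = 0.01             # P(f_0|Q): prior that the glass does not fall
                        # (Matrix scenarios, tractor beams, faulty memory, ...)

p_sh0_given_f0 = p_sh0 * p_f0_given_sh0 / p_f0
print(p_sh0_given_f0)   # 0.8
```

Nothing breaks just because f_0 contradicts what we observed; the terms are small but nonzero, and the machinery goes through.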
I’m confused about why this problem is different from other decision problems.
Given the problem statement, this is not an acausal situation. No physics is being disobeyed—Kramers-Kronig still works, relativity still works. It's completely reasonable that my choice could be predicted from my source code. Why isn't this just another example of prior information being appropriately applied to a decision?
Am I dodging the question? Does EY’s new decision theory account for truly acausal situations? If I based my decision on the result of, say, a radioactive decay experiment performed after Omega left, could I still optimize?
Ha—thanks. Fixed. But I guess if other people want to Skype in from around the world, they're welcome to.
Yes, we are running on corrupted hardware at about 100 Hz, and I agree that defining broad categories to make first-cut decisions is necessary.
But if we were designing a morality program for a super-intelligent AI, we would want to be as mathematically consistent as possible. As shminux implies, we can construct pathological situations that exploit the particular choice of discontinuities to yield unwanted or inconsistent results.
I think it would be possible to have an anti-Occam prior if the total complexity of the universe is bounded.
Suppose we list integers according to an unknown rule, and we favor rules with high complexity. Given the problem statement, we should take an anti-Occam prior to determine the rule given the list of integers. It doesn’t diverge because the list has finite length, so the complexity is bounded.
Scaling up, the universe presumably has a finite number of possible configurations given any prior information. If we additionally had information that led us to take an anti-Occam prior, it would not diverge.
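A minimal sketch of what I mean, with a made-up complexity assignment and weighting (the rule names and the 2**k weight are assumptions for illustration): as long as complexity is bounded above, the anti-Occam prior normalizes.

```python
# Toy anti-Occam prior over a *finite* hypothesis space.
# Complexity k is capped at K_MAX (finite list / finite universe), and
# prior weight *grows* with complexity: w(k) = 2**k.
# With unbounded k, the sum of weights diverges and no such prior exists.

K_MAX = 20
complexity = {f"rule_{k}": k for k in range(1, K_MAX + 1)}  # hypothetical rules

weights = {name: 2.0 ** k for name, k in complexity.items()}
total = sum(weights.values())
prior = {name: w / total for name, w in weights.items()}

assert abs(sum(prior.values()) - 1.0) < 1e-12  # normalizes: a proper prior
```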
I’m also looking for a discussion of the symmetry related to conservation of probability through Noether’s theorem. A quick Google search only finds quantum mechanics discussions, which relate it to spatial invariances, etc.
If there’s no symmetry, it’s not a conservation law. Surely someone has derived it carefully. Does anyone know where?
The idea that the utility should be continuous is mathematically equivalent to the idea that an infinitesimal change on the discomfort/pain scale should give an infinitesimal change in utility. If you don't use that axiom to derive your utility function, you can have sharp jumps at arbitrary pain thresholds. That's perfectly OK—but then you have to choose where the jumps are.
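For instance (with an arbitrary threshold p* and jump size J, both chosen by hand), a monotonic utility over pain level p that violates continuity:

```latex
U(p) =
\begin{cases}
  -p, & p < p^{*} \\
  -p - J, & p \ge p^{*}
\end{cases}
\qquad J > 0
```

Everything stays well ordered; the only cost is having to justify the particular p* and J.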
I think that in physics we would deal with this as a mapping problem. John's and Mary's beliefs about the planet live in different spaces, and we need to pick a basis on which to project them in order to compare them. We use language as the basis. But then when we try to map between concepts, we find that the problem is ill posed: it doesn't have a unique solution because the maps are not all 1:1.
Nice job writing the survey—fun times. I kind of want to hand it out to my non-LW friends, but I don’t want to corrupt the data.
Thanks, I’ll check it out.
Bravo, Eliezer. Anyone who says the answer to this is obvious is either WAY smarter than I am, or isn’t thinking through the implications.
Suppose we want to define Utility as a function of pain/discomfort on the continuum of [dust speck, torture] and including the number of people afflicted. We can choose whatever desiderata we want (e.g. positive real valued, monotonic, commutative under addition).
But what if we choose as one desideratum, "There is no number n large enough such that Utility(n dust specks) > Utility(50 yrs torture)." What does that imply about the function? It can't grow without bound in n (even if n were continuous). That rules out multiplicative functions trivially.
Would it have singularities? If so, how would we combine utility functions at singular values? Take limits? How, exactly?
Or must dust specks and torture live in different spaces, and is there no basis that can be used to map one to the other?
The bottom line: is it possible to consistently define utility using the above desideratum? It seems like it must be so, since the answer is obvious. It seems like it must not be so, because of the implications for the utility function as the arguments change.
Edit: After discussing with my local meetup, this is somewhat resolved. The above desiderata require the utility to be bounded in the number of people, n. For example, it could be a saturating exponential function. This is self-consistent, but inconsistent with the notion that because experiences are independent, utilities should add.
Interestingly, it puts strict mathematical rules on how utility can scale with n.
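A minimal instance of such a saturating function, with made-up constants D (the disutility bound) and n_0 (the scale over which it saturates):

```latex
U_{\text{specks}}(n) = -D\left(1 - e^{-n/n_0}\right), \qquad U_{\text{torture}} < -D
```

Then U_specks(n) > U_torture for every n, and for n much smaller than n_0 the disutility is approximately additive (about -Dn/n_0), but for large n it stops adding, which is exactly the inconsistency with independent experiences noted above.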
Also, I suggest you read Torture vs Dust Specks. I found it to be very troubling, and would love to talk about it at the meeting.
Is this the same as Jaynes’ method for construction of a prior using transformation invariance on acquisition of new evidence?
Does conservation of expected evidence always uniquely determine a probability distribution? If so, it should eliminate a bunch of extraneous methods of construction of priors. For example, you would immediately know if an application of MaxEnt was justified.
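For reference, the law in question is the statement that the prior must equal the expectation of the posterior:

```latex
P(H) = P(H \mid E)\,P(E) + P(H \mid \neg E)\,P(\neg E)
```

That is one constraint per hypothesis-evidence pair, which is why it isn't obvious whether it suffices to pin down a unique distribution.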
That thought occurred to me too, and then I decided that EY was using "entropy" as "the state to which everything naturally tends." But in any case, I think it's possible to usefully extend the metaphor.
There are more possible cultish microstates than non-cultish microstates, because there are fewer logically consistent explanations for a phenomenon than logically inconsistent ones. In each non-cultish group, rational argument and counter-argument should naturally push the group toward an explanation describing observed reality. By contrast, cultish groups can fill up the rest of concept-space.
No, I mean a function whose limit doesn’t equal its defined value at infinity. As a trivial example, I could define a utility function to be 1 for all real numbers in [-inf,+inf) and 0 for +inf. The function could never actually be evaluated at infinity, so I’m not sure what it would mean, but I couldn’t claim that the limit was giving me the “correct” answer.