The claim is generally that EDT chooses not to chew gum.
preferences:decision theory :: data:code
No, it can’t. If you use a given decision theory, your actions are entirely determined by your preferences and your sensory inputs.
But that’s not how EDT works—your modification amounts to a totally different algorithm, which you’ve conveniently named “EDT”.
EDT measures expected value after the action has been taken, but there is no reason for EDT to ignore its own output if it is relevant to the calculation.
...then Omega’s prediction is that EDT will two-box and oops—goodbye prize.
It loses, but it is generally claimed that EDT one-boxes.
This case is handled in the previous sentence. If this is your actual decision, and your actual decision is the product of a decision algorithm, then your decision algorithm is not EDT.
To put it another way, is your decision to chew gum determined by EDT or by your genes? Pick one.
As has been pointed out, this is not an anthropic problem; however, there still is a paradox. I may be stating the obvious, but the root of the problem is that you’re doing something fishy when you say that the other people will think the same way and that your decision will determine theirs.
The proper way to make a decision is to have a probability distribution on the code of the other agents (which will include their prior on your code). From this I believe (but can’t prove) that you will take the correct course of action.
Newcomb-like problems fall in the same category; the trick is that there is always a belief about someone’s decision-making hidden in the problem.
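To make that concrete, here is a minimal sketch (my own illustrative numbers; the belief over the predictor’s code is collapsed into a single accuracy parameter, which is of course a simplification):

```python
# Sketch: once the belief about the predictor's code is explicit, Newcomb's
# problem becomes an ordinary expected-value calculation.

def expected_value(action, p_accurate):
    """Expected payoff, given a belief p_accurate that the predictor
    correctly anticipated our action."""
    BOX_B, BOX_A = 1_000_000, 1_000
    if action == "one-box":
        # Box B is full iff the predictor foresaw one-boxing.
        return p_accurate * BOX_B
    # Two-boxing: we always get box A; box B is full only if the predictor erred.
    return BOX_A + (1 - p_accurate) * BOX_B

for p in (0.5, 0.9, 0.999):
    print(p, expected_value("one-box", p), expected_value("two-box", p))
```

Once the hidden belief is made explicit, the answer simply depends on it: at low accuracy two-boxing wins, at high accuracy one-boxing does.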
Hmm, no, you haven’t. The approximation depends on the scale, of course.
Indeed.
But I may have gotten “scale” wrong here. If we scale the error at the same time as we scale the part we’re looking at, then differentiability is necessary and sufficient. If we’re concerned about approximating the function on a smallish part, then continuity is what we’re looking for.
ok, but with this definition of “approximate”, a piecewise linear function with finitely many pieces cannot approximate the Weierstrass function.
The original question is whether a continuous function can be approximated by a linear function at a small enough scale. The answer is yes.
If you want the error to decrease linearly with scale, then continuity is not sufficient, of course.
I defined “approximate” in another comment.
Approximate around x: for every epsilon > 0, there is a neighborhood of x over which the absolute difference between the approximation and the approximated function is always lower than epsilon.
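In symbols (restating the definition above, with f the function and ℓ the approximation):

```latex
\ell \text{ approximates } f \text{ around } x
\iff
\forall \epsilon > 0,\ \exists \delta > 0:\quad
|f(y) - \ell(y)| < \epsilon \ \text{ whenever } |y - x| < \delta.
```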
Adding a slope to a small segment doesn’t help or hurt the ability to make a local approximation, so continuity is both sufficient and necessary.
that is because our eyes cannot see nowhere differentiable functions
That is because they are approximated by piecewise linear functions.
Consider that when you look at a “picture” of the Weierstrass function and pick a point on it, you would swear to yourself that the curve happens to be “going up” at that point. Think about that for a second: the function isn’t differentiable—it isn’t “going” anywhere at that point!
It means that at no point can you make a linear approximation whose precision increases like the inverse of the scale; it doesn’t mean you can’t approximate.
No, he’s right. The Weierstrass function can be approximated with a piecewise linear function. It’s obvious: pick N equally spaced points and join them linearly. For N big enough, you won’t see the difference. The error becomes arbitrarily small as N gets bigger.
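Here is a rough numerical version of that argument (my own construction: a partial sum stands in for the Weierstrass function, with the neglected tail bounded by a^terms / (1 − a)):

```python
import numpy as np

def weierstrass(x, a=0.5, b=13, terms=30):
    """Partial sum of sum_n a^n * cos(b^n * pi * x), a stand-in for the
    Weierstrass function (the neglected tail is at most a^terms / (1 - a))."""
    total = np.zeros_like(x)
    for n in range(terms):
        total += a**n * np.cos(b**n * np.pi * x)
    return total

x_fine = np.linspace(0.0, 1.0, 20001)      # dense grid to estimate the sup norm
f_fine = weierstrass(x_fine)

for N in (10, 100, 1000, 10000):
    knots = np.linspace(0.0, 1.0, N + 1)
    interp = np.interp(x_fine, knots, weierstrass(knots))   # join N+1 points linearly
    print(N, np.max(np.abs(f_fine - interp)))               # sup error shrinks with N
```

The error estimate is crude (the grid undersamples the highest-frequency terms), but the shrinking trend is exactly what uniform continuity guarantees.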
Question is, what do you mean by “approximately”?
If you mean: for any error size, the supremum of the distance between the linear approximation and the function is lower than this error for all scales smaller than some given scale, then a necessary and sufficient condition is continuity. Differentiability is merely sufficient.
When the function is differentiable, you can make claims on how fast the error decreases asymptotically with scale.
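For reference, the standard Taylor estimates behind that claim:

```latex
f'(x) \text{ exists} \;\Rightarrow\; f(x+h) = f(x) + f'(x)\,h + o(h),
\qquad
f \in C^2 \;\Rightarrow\; \bigl|f(x+h) - f(x) - f'(x)\,h\bigr|
\le \tfrac{1}{2}\,h^2 \sup_{|t| \le |h|} |f''(x+t)|.
```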
An explanation cannot increase your knowledge. Your knowledge can only increase by observation. Increasing your knowledge is a decision theory problem (exploration/exploitation, for example).
Phlogiston explains why some categories of things burn and some don’t. Phlogiston predicts that dry wood will always burn when heated to a certain temperature. Phlogiston explains why different kinds of things burn, as opposed to sometimes burning and sometimes not. It explains that if you separate a piece of wood into smaller pieces, every smaller piece will also burn.
To clarify my original point, the problem isn’t the narrative. The narrative is a heuristic: a method to update from an observation by remembering a simple unimodal distribution centered on the narrative (what I think most likely happened, and how confident I am).
It seems to me that a narrative is generally a maximum likelihood explanation behind an event. If you observe two weird events, an explanation that links them is more likely than an explanation that doesn’t. That’s why causality is such a great explanation mechanism. I don’t think making narratives is a bug. The bug is discarding the rest of the probability distribution… we are bad at remembering complex multimodal distributions.
Sometimes, a narrative will even add unnecessary details, and it looks like a paradox (the explanation would be more likely without the details). However, the explanation without the detail is a zone while the explanation with the detail is a point. If we try to remember modes, it makes perfect sense to add the details.
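For what it’s worth, the “zone vs. point” tension can be written down directly:

```latex
P(A) \;=\; \sum_b P(A \wedge B{=}b) \;\ge\; P(A \wedge B{=}b_0)
\quad \text{for any particular detail } b_0,
```

so the detailed story is never more probable than the vague one, yet the most probable single b_0 is still the best point to remember if all you keep is a mode.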
Coming from here, I don’t really understand the advice to
“In other words, concentrate your probability mass”
It seems that concentrating the probability mass would reinforce the belief in the most likely explanation which is often a narrative.
Yes, if there are two or more options and the score function depends only on the probability assigned to the correct outcome, then the only proper function is the log. You can see that with the equation I gave:
f0′(x) = (k − x·f1′(x)) / (1 − x)
For f0 = 0, this means x·f1′(x) = k, thus f1(x) = k·ln(x) + c (necessary condition).
Then you have to check that k·ln(x) + c indeed works for some k and c; that is left as an exercise for the reader ^^
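Filling in the exercise (with p the student’s belief, q the reported distribution, f1(x) = k·ln(x) + c, f0 = 0, and k > 0), the expected score is

```latex
\mathbb{E}[S(q)] \;=\; \sum_j p_j \bigl(k \ln q_j + c\bigr)
\;=\; k \sum_j p_j \ln p_j \;-\; k\, D_{\mathrm{KL}}(p \,\|\, q) \;+\; c,
```

which is maximized exactly at q = p, since the KL divergence is nonnegative and zero only at q = p.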
No.
I am assuming the student has a distribution in mind and we want to design a scoring rule where the best strategy to maximize the expected score is to write in the distribution you have in mind.
If there are n options and the right answer is i* and you give log(n·p_i*) / log(n) points to the student, then his incentive is to write in the exact distribution. On the other hand, if you give him, say, p_i* points, his incentive would be to write in “1” for the most likely answer and “0” otherwise.
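A small numerical check of that claim (my own illustrative numbers, n = 4):

```python
import numpy as np

p = np.array([0.4, 0.3, 0.2, 0.1])            # the student's actual belief
n = len(p)
overconfident = np.array([0.97, 0.01, 0.01, 0.01])

def e_log(q):
    """Expected score under belief p for the log(n * q_i) / log(n) rule."""
    return float(np.sum(p * np.log(n * q) / np.log(n)))

def e_linear(q):
    """Expected score under belief p for the linear q_{i*} rule."""
    return float(np.sum(p * q))

print("log rule:    truthful", e_log(p), "vs overconfident", e_log(overconfident))
print("linear rule: truthful", e_linear(p), "vs overconfident", e_linear(overconfident))
# The log rule pays more for the truthful report; the linear rule pays more
# for piling (nearly) all the mass on the single most likely answer.
```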
Another way to score is not to give points only on p_i* but also to take away points on the p_i where i != i*, by using a function f1 for p_i* and f0 otherwise. I gave a necessary condition on f1 and f0 for the student’s belief to be a local maximum of the expected score. The technique is simply Lagrange multipliers.
The number of options drops out of the equation, which is beautiful, so you can extend to any number of answers or even a continuous question. (When asked what the population of Zimbabwe is, the student could describe any parametric distribution and be scored on that: histograms, Gaussians… there are many ways a student could write in his answer.)
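For the continuous case, one natural (hypothetical) implementation is to score the log-density of the reported distribution at the true value; the logarithmic rule stays proper for densities. A sketch, assuming the student reports a Gaussian:

```python
import math

def gaussian_log_score(mu, sigma, true_value):
    """Log-density of N(mu, sigma^2) at true_value; up to an affine
    rescaling, this is the log rule for a continuous report."""
    z = (true_value - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))

true_pop = 15_000_000                            # placeholder "true" answer
print(gaussian_log_score(14e6, 2e6, true_pop))   # decent guess, honest spread
print(gaussian_log_score(14e6, 1e5, true_pop))   # same guess, overconfident: much worse
```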
The equivalence class of the utility function should be the set of monotone increasing transforms of a canonical element.
However, what von Neumann-Morgenstern shows under mild assumptions is that for each class of utility functions, there is a subset of utility functions, generated by the affine transforms of a single canonical element, for which you can make decisions by computing expected utility. Therefore, looking at the set of all affine transforms of such a utility function really is the same as looking at the whole class. Still, it doesn’t make utility commensurable.
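A quick illustration of why affine is the right equivalence for expected utility while general monotone transforms are not (my own numbers):

```python
import numpy as np

u = np.array([0.0, 1.0, 2.0])             # utilities of three outcomes
lottery_A = np.array([0.0, 1.0, 0.0])     # outcome 1 for sure
lottery_B = np.array([0.4, 0.0, 0.6])     # 40% outcome 0, 60% outcome 2

def prefers_B(utility):
    """True iff lottery B has higher expected utility than lottery A."""
    return float(lottery_B @ utility) > float(lottery_A @ utility)

print(prefers_B(u))              # True:  EU(B) = 1.2 > EU(A) = 1.0
print(prefers_B(2 * u + 3))      # True:  affine transform preserves the ranking
print(prefers_B(np.sqrt(u)))     # False: monotone but non-affine transform flips it
```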
A speck in Adam’s eye vs. Eve being tortured is not a utility comparison but a happiness comparison. Happiness is hard to compare, but it can be compared, because it is a state; utility is an ordering function. There is no utility meter.
You’re saying EDT causes you not to chew gum because cancer gives you EDT? Where does the gum appear in the equation?