A Request for Open Problems
Open problems are clearly defined problems[1] that have not been solved. In older fields, such as Mathematics, the list is rather intimidating. Rationality, on the other hand, seems to have no such list.
While we are all here together to crunch on problems, let’s shoot higher than thinking of solutions first and then finding problems that match them. What things are unsolved questions? Is it reasonable to assume those questions have concrete, absolute answers?
The catch is that these problems cannot be inherently fuzzy problems. “How do I become less wrong?” is not a problem that can be clearly defined. As such, it does not have a concrete, absolute answer. Does Rationality have a set of problems that can be clearly defined? If not, how do we work toward getting our problems clearly defined?
See also: Open problems at LW:Wiki
[1]: “Clearly defined” essentially means a formal, unambiguous definition. “Solving” such a problem would constitute a formal proof.
Suppose someone offered you a bet on P = NP. How should you go about deciding whether or not to take it?
Does it make sense to assign probabilities to unproved mathematical conjectures? If so, what is the probability of P = NP? How should we compute this probability, given what we know so far about the problem? If the answer is no, what is a rational way to deal with mathematical uncertainty?
It only took 7 years to make substantial progress on this problem: Logical Induction by Garrabrant et al.
Scott Aaronson and many others in his field have essentially bet their careers that P ≠ NP (at least, I think the whole hierarchy of complexity classes would collapse if P = NP), while only a bunch of cranks (so far as I’ve heard) have bet a substantial amount of their life’s efforts on P = NP. That’s as good a statement of betting odds as any.
As for whether it’s legitimate to do so, well, that’s Bayesian probability for you. You’ve got to have some way of dealing with subjective uncertainty, and you get better results in the long run if your method of dealing with it is like unto the laws of probability.
Let’s say you’re the first person to work on P = NP. What then? (Assume that you have enough mathematical ability to produce most of the results on it that we have so far.)
Well, I’m pretty sure that the smart money is all on one side of the question because of certain heuristic evidence that becomes very clear when you’ve worked with these complexity classes for a long while.
It’s the same reason the smart money was on Fermat’s Last Theorem being true (prior to the proof being found); not only would it have been very unusual in mathematics for this simple Diophantine equation to have its first nontrivial solution only when the numbers became absurdly large, but it is equivalent to a beautiful conjecture in elliptic curves which seemed to admit of no counterexamples.
There’s plenty of inductive evidence found and used in the actual doing of mathematics; it just doesn’t make the textbooks (since you generally don’t publish on a subject unless you’ve found a proof, at which point the inductive evidence is utterly redundant). Yet it guides the intuition of all mathematicians when they decide what to try proving next.
Compare Einstein’s Arrogance.
Suppose you test Fermat’s Last Theorem for n up to 10^10, and don’t find a counterexample. How much evidence does that give you for FLT being true? In other words, how do you compute P(a counterexample exists with n<=10^10 | FLT is false), since that’s what’s needed to do a Bayesian update with this inductive evidence? (Assume this is before the proof was found.)
I don’t dispute that mathematicians do seem to reason in ways that are similar to using probabilities, but I’d like to know where these “probabilities” are coming from and whether the reasoning process really is isomorphic to probability theory. What you call “heuristic” and “intuition” are the results of computations being done by the brains of mathematicians, and it would be nice to know what the algorithms are (or should be), but we don’t have them even in an idealized form.
The most influential books on the topic I know of are Mathematics and Plausible Reasoning Volume I: Induction and Analogy in Mathematics, and Mathematics and Plausible Reasoning Volume II: Patterns of Plausible Reasoning (amazon link) by George Pólya.
There’s a difficulty here involving the fact that every finite set of possible counterexamples has measure zero in the set {(a, b, c, n) | a, b, c, n in N} equipped with the probability measure that assigns each possible counterexample an equal probability.
So all the usual biases and cognitive gotchas involving probability and infinity come into play (even when the people doing the thinking are mathematicians!), and all bets are off.
My hypothesis is that the commonly used prior for counterexample distributions is exponential. As the lower bound K on possible counterexamples (i.e., a, b, c, n >= K) increases, the exponential is updated into something close to uniform on the rest of {(a, b, c, n) | a, b, c, n in N}.
This does somewhat dodge the question, but it does make a difference that an infinite set of counterexamples can be associated with each counterexample. That is, if (a,b,c,n) is not a solution to the Fermat equation, then (ka,kb,kc,n) isn’t either for any positive integer k.
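For concreteness, here is a minimal sketch of the kind of update being discussed, assuming (purely for illustration) that, conditional on FLT being false, the exponent of the smallest counterexample follows a geometric prior with parameter r. The prior, the value of r, and the function names are invented for the example rather than taken from the thread.

```python
# Toy Bayesian update for "no counterexample with exponent n <= N was found".
# Illustrative assumption: conditional on FLT being false, the exponent of the
# smallest counterexample is geometric, P(exponent = n) proportional to r**n
# for n >= 3, and checking exponents up to N conclusively rules them out.

def p_no_counterexample_up_to(N, r=0.9):
    """P(smallest counterexample has exponent > N | FLT false) under the toy prior."""
    # Geometric tail ratio: sum_{n > N} r^n / sum_{n >= 3} r^n = r^(N - 2)
    return r ** (N - 2)

def posterior_flt_true(prior_true, N, r=0.9):
    """P(FLT true | no counterexample with exponent <= N)."""
    likelihood_if_false = p_no_counterexample_up_to(N, r)
    # If FLT is true, the absence of counterexamples is certain (likelihood 1).
    return prior_true / (prior_true + (1 - prior_true) * likelihood_if_false)

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, round(posterior_flt_true(prior_true=0.5, N=N), 6))
```

The point is only that the strength of the update is driven entirely by the assumed prior over where a counterexample would live, which is exactly the quantity the parent comments are arguing about.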
And, mid-2010, Scott Aaronson also literally put down a large bet that P != NP.
If you’re thinking of this, you’re misremembering—that bet (of $200,000) was that Vinay Deolalikar’s recent P != NP paper would not win the Clay Millennium prize. (In the comments, Aaronson says “P≠NP is exactly the ‘expected’ answer!”; the bet was his way of keeping people from accusing him of rushing to judgment when he said that he didn’t think the paper would turn out to successfully prove that expected answer even though he hadn’t read the whole thing.)
Ah, you’re entirely right. I didn’t misremember—I read his blog rather religiously. I just apparently wasn’t quite awake when I wrote what he was betting on.
I should also clarify that he didn’t have anyone matching even a lesser amount in the case that the paper was indeed unsuccessful (which it appears to be as it stands, but Aaronson’s bet gives it a while to correct problems). His goal, which didn’t exactly work, was to get people to stop asking him about the paper. I say it didn’t work, because he probably got even more people commenting on the bet, and still a large number asking about the paper.
Here are some papers I found recently that seem to represent the state of the art on the issue of how to deal with uncertainty in mathematics. None of them really get very far beyond “let’s try to apply probability theory”, but at least the problem is not being completely ignored by mathematicians and philosophers.
Bayesianism in mathematics, a book chapter in Towards a Philosophy of Real Mathematics by David Corfield
Non-deductive Logic in Mathematics, by James Franklin
Bayesian Networks for Logical Reasoning, by Jon Williamson
One could say that most of math is already about uncertainty: when you have a system and ways of refining it, it is in a way a form of applying knowledge to resolve uncertainty. For example, applying a function to a parameter, or combining morphisms. A lot of analysis is about approximation or representing systems that expect future observations. It is a very narrow sense of “dealing with uncertainty” that would require going to the fringe.
I don’t understand the point of your comment. It should have been clear from the context that by “dealing with uncertainty in mathematics” I did not mean things like proving or disproving a conjecture, thus resolving its uncertainty, but rather how to make bets involving mathematical statements that we don’t know how to either prove or disprove. Are you saying that the latter is not an important problem, or just that you don’t like that I’m using the phrase “dealing with uncertainty in mathematics” to refer to it?
You don’t have to resolve all of uncertainty in one go. For example, you could restrict a function to part of a domain, thus deciding that it is only this part that you are interested in, instead of the whole thing.
What you seem to mean is non-rigorous methods for making uncertain conclusions about mathematical structures. It is about dealing with uncertainty about mathematics at a non-mathematical level of rigor. Correct?
Yes, something like that, except that “non-rigorous” seems too prejudicial. Why not just “methods for making uncertain conclusions about mathematical structures”, or “dealing with uncertainty about mathematics”?
(Retracted; useless speculation.)
Is there a best language in which to express complexity for use in the context of Occam’s razor?
If there is a best language in which to express complexity for use in the context of Occam’s razor, what is that language?
I suspect the answer is no, at least not the kind of formal languages that have been suggested so far. The problem is this: as soon as you define a formal language, I can say “the lexicographically first object which can’t be described in less than a million bits in your language”. Given the uniqueness of this object, why should it be a priori as unlikely as a random million-bit string?
Note that in general, this sort of specification is uncomputable. To find the lexicographically first object which can’t be described in less than a million bits, we would have to make a list of all objects which can be described in less than a million bits (and return the first thing that’s not on the list). But we don’t know if our UTM will halt for any given string with less than a million bits, so we can never know if our list is complete.
Yes, and that’s sort of intentional. I was trying to come up with a mathematical model of an agent that can deal with uncomputable physics. The physics of our universe seems likely to be computable, but there is no a priori reason to assume that it must be. We may eventually discover a law of physics that’s not computable, or find out that we are in a simulation running inside a larger universe that has uncomputable physics. Agents using UTM-based priors can’t deal with these scenarios.
So I tried to find a “better”, i.e., more expressive, language for describing objects, but then realized that any fixed formal language has a similar problem. Here’s my current idea for solving this: make the language extensible instead of fixed. That is, define a base language, and a procedure for extending the language. Then, when the agent encounters some object that can’t be described concisely using his current language, he recursively extends it until a short description is possible. What the extension procedure should be is still unclear.
I agree that there are very interesting questions here. We have quite natural ways of describing uncomputable functions very far up the arithmetical hierarchy, and it seems that they can be described in some kind of recursive language even if the things they describe are not recursive (using recursive in the recursion theory sense both times). Turing tried something like this in Systems of Logic Based on Ordinals (Turing, 1939), but that was with formal logic and systems where you repeatedly add the Gödel sentence of a system into the system as an axiom, repeating this into the transfinite. A similar thing could be done describing levels of computability transfinitely far up the arithmetical hierarchy using recursively represented ordinals to index them. However, then people like you and me will want to use certain well defined but non-recursive ordinals to do the indexing, and it seems to degenerate in the standard kind of way, just a lot further up the hierarchy than before.
And then there are objects that are completely outside the arithmetical hierarchy, but we probably shouldn’t assign them zero prior either. Things like large cardinals, perhaps.
Another mystery is, why did evolution create minds capable of thinking about these issues, given that agents equipped with a fixed UTM-based prior would have done perfectly fine in our place, at least up to now?
We aren’t really Bayesian reasoning machines at all, and it isn’t really accurate to speak of us having a prior. We choose a prior in order to use Bayesian reasoning to analyze a situation, and we seek to bend our natural reasoning to a Bayesian template in order to improve its accuracy, but we cannot wholly succeed in doing so. So the problem you raise should worry someone building AGI, but it’s not realistic to imagine a human agent becoming so Bayesian that they swallow the Solomonoff prior whole and are literally unable to contemplate super-Turing Universes.
I don’t think it’s unreasonable, therefore, to adopt the Solomonoff prior as a useful model to aid reasoning and discussion, and rely on our human ability to make and adopt a new, super-Turing model if some more general prior would have favoured it.
Then how can you deal with these scenarios? Did the idiot God make you better equipped for this task, Oh uncomputable ape-brain?
The idea of agents using UTM-based priors is a human invention, and therefore subject to human error. I’m not claiming to have an uncomputable brain, just that I’ve found such an error.
For a specific example of how human beings might deal with such scenarios, compared to agents using UTM-based priors, see “is induction unformalizable?”.
The model of environment values observations and behaviors, not statements about “uncomputability” and such. No observation should be left out, declared impossible. If you, as a human, decide to trust in something you label “halting oracle”, that’s your decision, and this is a decision you’d want any trusted AI to carry through as well.
I suspect that the roots of this confusion are something not unlike mind projection fallacy, with magical properties attributed to models, but I’m not competent to discuss domain-specific aspects of this question.
Not a serious problem—just consider objects which can’t be described in less than a million bits after a million timesteps instead.
If a problem is so serious that you have to change your prior, then I’d call that a serious problem.
Did you understand why the problem is not serious? Your comment was nit-picking the example. Other similar examples make exactly the same point—but are not vulnerable to such nit-picking.
It doesn’t sound too serious—if all the candidate languages have the same problem.
It is hard to deny that some languages are better than others, and therefore that there is a set of “best” ones. The intent of my first question was more along the lines of: is one formulation of Occam’s razor optimal for all the different situations in which Occam’s razor is used, or should we be using different descriptive languages under different circumstances?
That depends on what you mean by “best”.
Is speed of calculation important? What about suitability for humans? I guess you want one where complexities are as small as possible.
Given 2 languages, L1 & L2, and their complexity measures, K1 & K2.
If K1(L2) < K2(L1) then I take that as a sign that L1 is better for use in the context of Occam’s razor. It is also a sign that L1 is more complex than L2, but that effect can be removed by doing lots of comparisons like that, so the unnecessarily complex languages lose against those that are actually good. Or one can use 3 languages in a comparison: K1(L3) < K2(L3) is a sign that L1 is better.
The idea here is that a good language should more easily represent systems, such as other languages.
What sort of languages are best in this measure? Not the simplest ones, but rather the more expressive ones. Programs for Turing machines use some complexity for tape traversing and storage systems, so I guess they are not so good. Cellular automata have the same problems. Combinatory Logic is very simple, with complex operators even for simple stuff, but its system for storage and access can be simpler, since it is a tree. I guess Lambda Calculus is one of the better languages, as it is both simple and very expressive, because of its more direct tree structure.
I guess the best language would use an even more expressive structure, such as graphs, perhaps bi-graphs, since those are maximally expressive. The Scheme language can be used as a general graph language thanks to the set-car! and set-cdr! procedures.
“Best”: most accurate—i.e. when Occam’s razor says A is a more likely hypothesis than B, then that is actually true.
Then I would go for Turing machines, Lambda calculus, or similar. These languages are very simple, and can easily handle input and output.
Even simpler languages, like cellular automaton rule 110 or Combinatory Logic, might be better, but those are quite difficult to get to handle input and output correctly.
The reason simple languages, or universal machines, should be better, is that the upper bound for error in estimating probabilities is 2 to the power of the complexity of the program simulating one language in another, according to algorithmic information theory.
So, the simpler the language is, the more correct relative probabilities it will give.
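The bound being referred to is, I believe, the invariance theorem of algorithmic information theory. Writing K_1 and K_2 for the complexity measures induced by two universal languages, and c_12 for the length of the shortest L_1-program that interprets L_2 (c_21 for the reverse), a rough statement is:

```latex
K_1(x) \le K_2(x) + c_{12}, \qquad K_2(x) \le K_1(x) + c_{21},
\qquad\text{so}\qquad
2^{-c_{12}} \;\le\; \frac{2^{-K_1(x)}}{2^{-K_2(x)}} \;\le\; 2^{c_{21}}.
```

So the priors induced by any two universal languages agree up to a constant factor set by the interpreter sizes, which is the sense in which simpler (easier-to-interpret) languages give more trustworthy relative probabilities.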
The answer I gave before this one was for the question: Maximize the likelihood that the language will produce the output corresponding to the observed events.
You however want to compare 2 hypotheses, getting 2 probabilities, and compare them. In this case the absolute size of the probabilities do not matter, just their relative size, so you can just go for the simplest language that is practical to use.
What do you want to use this for?
Using simple languages is the conventional approach. However, simple languages typically result in more complex programs. The game of life is very simple—yet try writing a program in it. If you are trying to minimise the size of an emulator of other languages, then highly simple languages don’t seem to fit the bill.
Why would one want a decent formulation of Occam’s razor? To help solve the problem of the priors.
I agree. We seem to have the same goal, so my first advice stands, not my second.
I am currently trying to develop a language that is both simple and expressive, and making some progress. The overall design is finished, and I am now down to what instructions it should have. It is a general bi-graph, but with a sequential program structure, and no separation of program and data.
It is somewhat different from what you want, as I also need something that has measurable use of time and memory, and is provably able to run fast.
Could I have a link, or maybe some more information about this language? It’s something I’m very interested in (merging expressiveness and guarantees about resource usage).
I’m sure that’s not possible to do in general (in an algorithmic fashion), since it would solve certain mathematical problems in a flash that aren’t supposed to be solved in polynomial time or even efficiently (like particular halting problems).
You’ll have to hope for something less.
I think Tim’s intended meaning was more along the lines of “a language L such that when A is a more likely hypothesis than B, L’s MDL for A is shorter than its MDL for B more often than for any other language”.
Here are some more problems that have come up on LW:
What is the right probability/decision theory for running on error-prone hardware?
What is my utility function?
Meta-philosophy
How to think about “all mathematical structures” when no formal language is sufficiently expressive?
What is the nature of anticipation/surprise/disappointment/good and bad news? (For example, surprise wouldn’t be a cognitive feature of an ideal utility maximizer. What purpose, if any, does it serve in us?)
What does it mean to deal rationally with moral uncertainty? If Nick Bostrom and Toby Ord’s solution is right, how do we apply it in practice?
ETA: this isn’t a “clearly defined” question in the sense you mentioned, but I’ll leave it up anyway; apologies
As I pointed out in that thread, their solution doesn’t work. You would need to choose an aggregation mechanism to combine votes. Different mechanisms will cause different systematic outcomes. Notably, some mechanisms will result in always choosing actions from one category; some mechanisms will result in sampling from different categories proportionally to their votes (much as, e.g., the American system always chooses the most popular candidate, resulting in a 2-party system equilibrium; many European systems allocate seats proportionally to votes, allowing equilibria with more than 2 parties).
You need to choose which kind of outcome you prefer in order to choose your aggregation mechanism, in order to implement their solution. But if you could do that, you wouldn’t need their solution in the first place!
Is this really the case? It doesn’t seem true of axiomatic approaches to decision theory in general, so is there a special reason to think it should be true here?
I guess I would view the parliamentary mechanism more as an intuition pump than a “solution” per se. It may well be that, having thought through its implications, it will turn out that the results can be represented in (say) the standard vNM framework. Nonetheless, the parliamentary model could still be helpful in getting a handle on the nature of the “utility” functions involved.
As an aside, it seems as though their parliamentary approach could potentially be modeled more effectively using co-operative game theory than the more standard non-cooperative version.
I just gave the reason. “Some mechanisms will result in always choosing actions from one category; some mechanisms will result in sampling from different categories proportionally to their votes.”
The aggregation mechanism is a lot like the thread priority system in a computer operating system. Some operating systems will always give the CPU to the highest-priority task. Some try to give tasks CPU time proportional to their priority. Likewise, some aggregation mechanisms will choose the most popular option, never giving any voice to minority opinions; some choose options with probability proportional to their popularity. You have to choose which type of aggregation mechanism to use. But this choice is exactly the sort of choice that the parliament is supposed to be producing as output, not requiring as input.
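A toy sketch of the two families of mechanisms being contrasted; the option names and vote tallies are invented for illustration:

```python
import random

# Hypothetical vote tallies from a moral parliament: three candidate actions.
votes = {"act_A": 55, "act_B": 30, "act_C": 15}

def plurality(votes):
    """Winner-take-all: always returns the most popular option."""
    return max(votes, key=votes.get)

def proportional_sample(votes, rng=random.Random(0)):
    """Returns an option with probability proportional to its vote share."""
    options = list(votes)
    weights = [votes[o] for o in options]
    return rng.choices(options, weights=weights, k=1)[0]

print(plurality(votes))                                 # always 'act_A'
print([proportional_sample(votes) for _ in range(8)])   # typically mixes in 'act_B' and 'act_C'
```

The parent comment’s point is that choosing between these two functions already encodes a substantive normative judgment.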
I wonder if it would work to renormalize utility so that the total of everything that’s “at stake” (in some sense that would need to be made more precise) is always worth the same?
Probably this gives too much weight to easy-to-achieve moralities, like the morality that says all that matters is whether you’re happy tomorrow? It also doesn’t accommodate non-consequentialist moralities.
But does it ever make sense to respond to new moral information by saying, “huh, I guess existence as a whole doesn’t matter as much as I thought it did”? It seems counterintuitive somehow.
I can’t follow your comment. I would need some inferential steps filled in, between the prior comment, and the first sentence of your comment, and between every sentence of your comment.
It’s still a great question. I do not know how far we can get formalizing it, but my hunches tell me we could get pretty close.
2. Is there a practical method to reach Aumann agreement?
Scott Aaronson has a STOC paper on this: http://arxiv.org/abs/cs.CC/0406061
Here’s an open problem that’s been on my mind this past week:
Take some controversial question on which there are a small number of popular opinions. Draw a line going from 0 on the left to 1 on the right. Divide that line into segments for each opinion that holds > 1% of opinion-space.
Now stratify the population by IQ into 10-point intervals. Redo the process, drawing a new line from 0 to 1 for each IQ range and dividing it into segments. Then stack your 0-1 line segments up vertically. Connect the sections for the same opinions in each IQ group.
What does the resulting picture look like? Questions include:
Is there a wider range of popular opinions (technically, a larger entropy) at the bottom or at the top?
Are there opinions that have maximum popularity at an intermediate IQ level?
Are there opinions that have minimum popularity at an intermediate IQ level?
Other variables could be used on the vertical axis. In general, continuous variables (IQ, educational level, year of survey, income) are more interesting to me than category variables (race, sex). I’m trying to get at the question of how systematic misunderstandings are, how reliably increasing understanding increases accuracy, and whether increasing understanding of a topic increases or decreases the chances of agreement on it.
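A sketch of the bookkeeping this stratification would involve, assuming a hypothetical list of (IQ, opinion) survey records; the data and function names are invented for illustration:

```python
from collections import Counter
import math

def opinion_shares_by_iq(records, band_width=10, min_share=0.01):
    """Map each IQ band to the share of each opinion holding > min_share of that band."""
    bands = {}
    for iq, opinion in records:
        band = (iq // band_width) * band_width
        bands.setdefault(band, Counter())[opinion] += 1
    shares = {}
    for band, counts in sorted(bands.items()):
        total = sum(counts.values())
        shares[band] = {op: n / total for op, n in counts.items() if n / total > min_share}
    return shares

def entropy_bits(share_dict):
    """Entropy of the opinion distribution within one band, in bits."""
    return -sum(p * math.log2(p) for p in share_dict.values() if p > 0)

# Toy data, invented purely to show the shape of the output.
records = [(95, "A"), (97, "B"), (105, "A"), (108, "A"), (113, "B"), (122, "C"), (124, "B")]
for band, shares in opinion_shares_by_iq(records).items():
    print(band, shares, round(entropy_bits(shares), 2))
```

The per-band entropy is a direct answer to the first question above; plotting the shares stacked by band gives the picture described.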
I noticed recently my impression that certain wrong opinions reliably attract people of a certain level of understanding of a problem. In some domains, increasing someone’s intelligence or awareness seems to decrease their chances of correct action. This is probably because people have evolved correct behaviors, and figure out incorrect behaviors when they’re smart enough to notice what they’re doing but not smart enough to figure out why. But it’s then interesting that their errors would be correlated rather than random.
If there were a common pattern, you could sample the opinions in a population, or over time, and estimate how much longer it would take, or how much knowledge would be needed, to arrive at a correct opinion.
Another problem, although to what degree it is currently both “open” and relevant is debatable: finding new loopholes in Arrow’s Theorem.
Can you even have a clearly defined problem in any field other than Mathematics, or one that doesn’t reduce to a mathematical problem regardless of the field where it originated?
Some people like mathematics so much that they define it as encompassing all clearly defined problems. Some people like philosophy so much that they define it as encompassing all problems whatsoever. Both of these definitions are clearly wrong, and the product of affective death spirals.
You say that as if a problem can only belong to one field.
See Wikipedia’s list for a few examples.
The distinction between clearly defined and otherwise is somewhat subjective. I have not heard anyone talking about the subject yet, so I brought it up.
Since Rationality seems to be strongly related to Bayes’ theorem, it makes some sense that a lot of problems could be presented in a fashion where we only have to answer a few questions about priors to understand which actions to take.
I don’t know if this answers your question.
It’s the best possible kind of answer to my question—a link to a load of interesting stuff—thanks!
I see where I went wrong, in missing out the entire physical universe as a source of questions that can be clearly stated but are about real things rather than mathematical descriptions of them.
I’ve been lurking here for a while now and thought I had something to add for the first time, so Hi all, thanks for all the great content and concepts; on to the good stuff:
I think a good open problem for the list would be: a formal (or a good solid) definition of rationality. I know of things like BDI architecture and Pareto optimality, but how do these things apply to a human rational being? For that matter, how do you reason logically/formally about a human being? What would be a good abstraction/structure, and are there any guidelines?
well, just my 2 formal cents.
compare to: http://lesswrong.com/lw/x/define_rationality/
I think that problem is too hard for us. As a general rule, sweeping definitions made without respect to a particular purpose are of limited utility. But thanks for commenting, and please don’t let my response deter you from commenting again.
Well, I can give some classes of problems. For instance, many of the biases that we know about, we don’t really know good ways for humans to reliably correct for. So right there is a whole bunch of open problems. (I know of some specific ones with known debiasing techniques, but many are “okay, great, I know I make this error… but other than occasionally being lucky enough to directly catch myself in the act of doing so, it’s not really obvious how to correct for these”)
Another, I guess vaguer one would be “general solution that allows people to solve their akrasia problems”
How do we handle the existence of knowledge which is reliable but cannot be explained? As an example, consider the human ability to recognize faces (or places, pieces of music, etc). We have nearly total confidence in our ability to recognize people by their faces (given enough time, good lighting, etc). However, we cannot articulate the process by which we perform face recognition.
Imagine you met a blind alien, and for whatever reason needed to convince it of your ability to recognize people by face. Since you cannot provide a reasonable description of your face recognition process, you are essentially in the position of saying “I’m totally sure of the identity of the person I saw, but I cannot explain why I am so certain”.
Quite a bit is known about the neurology behind face recognition. No one understands the algorithm well enough to build a fusiform gyrus from scratch, but that doesn’t mean the fact that there is an algorithm is mysterious.
Even if we did not have any understanding of the neurology, I’m not sure why pointing to an empirical record of successful face recognition shouldn’t be fairly convincing. Is the point that we could be lying about our record?
(In the specific example given, you could probably get a fair bit of mileage from explaining the nature of vision, even without the specifics of face-recognition. I’m not really sure what broader lesson that might have though, as I don’t fully understand the nature of the question you’re asking.)
This one’s well-known in certain quarters (so it’s not really open), but it may provide training for those unfamiliar with it.
Suppose that you observe two random samples* from the uniform distribution centered at unknown location c with width 1. Label the samples x_max and x_min. The random interval (x_min, x_max) is a 50% confidence interval for c because it contains c with 50% probability.
What is the practical problem with this confidence interval?
Do better.
* changed from “deviates” per comment below
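A quick Monte Carlo check of the 50% coverage claim (a sketch using only the standard library; the function name is ours):

```python
import random

def coverage(trials=100_000, c=0.0, rng=random.Random(0)):
    """Fraction of trials in which (x_min, x_max) straddles c, for two
    samples from the uniform distribution of width 1 centered at c."""
    hits = 0
    for _ in range(trials):
        x1 = c + rng.uniform(-0.5, 0.5)
        x2 = c + rng.uniform(-0.5, 0.5)
        if min(x1, x2) < c < max(x1, x2):
            hits += 1
    return hits / trials

print(coverage())  # ~0.5: one sample lands on each side of c half the time
```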
“The uniform distribution centered at c” does not seem to make sense. Did you perchance mean the Gaussian distribution? Further, ‘deviates’ looks like jargon to me. Can we use ‘samples’? I would therefore rephrase as follows, with specific example to hang one’s visualisation on:
Heights of male humans are known to have a Gaussian distribution of width 10 cm around some central value c; unfortunately you have forgotten what the central value is. Joe is 180 cm, Stephen is 170 cm. The probability that c is between these two heights is 50%; explain why. Then find a better confidence interval for c.
I mean the continuous uniform distribution. “Centered at c” is intended to indicate that the mean of the distribution is c.
ETA: Let me be specific—I’ll use the notation of the linked Wikipedia article.
You know that b - a = 1.
c = (a + b)/2 is unknown, and the confidence interval is supposed to help you infer it.
If exactly half of all men have a height less than the central value c, then a randomly picked sample will have a 50% chance of being below c. Picking two samples (A and B) results in four possible scenarios:
(1) A is less than c; B is greater than c
(2) A is less than c; B is less than c
(3) A is greater than c; B is greater than c
(4) A is greater than c; B is less than c
The interval created by (A, B) contains c in scenarios (1) and (4) and does not contain c in scenarios (2) and (3). Since each scenario has an equal chance of occurring, c is in (A, B) 50% of the time.
That is as far as I got just thinking about it. If I am on the right path I can keep plugging away.
In the Gaussian case, you can do better than (A, B) but the demonstration of that fact won’t smack you in the face the way it does in the case of the uniform distribution.
One thing you can do in the uniform case is shorten the interval to at most length 1⁄2. Not sure if that’s face-smacking enough.
You can do better than that. If the distance between the two data points is 7⁄4, you can shrink the 100% confidence interval to 1⁄4, etc. (The extreme case is as the distance between the two data points approaches 2, your 100% confidence interval approaches size zero.)
EDIT: whoops, I was stupid. Corrected 3⁄4 to 7⁄4 and 1 to 2. There, now it should be right
Do we know the heights of the men A and B? If so, we can get a better estimate of whether c lies between their heights by taking into account the difference between A and B...
That’s the basic idea. Now apply it in the case of the uniform distribution.
If all men are (say) within 10 cm of each other, and the heights are uniformly distributed...
… if we have two men who are 8 cm apart, then c lies between their heights with 80% probability?
Getting there… 80% is too low.
Wait, what? It must be 100%...
That’s it. The so-called 50% confidence interval sometimes contains c with certainty. Also, when x_max - x_min is much smaller than 0.5, 50% is a lousy summary of the confidence (ETA: common usage confidence, not frequentist confidence) that c lies between them.
If it’s less than 0.5, is the confidence simply that value times 2?
“Confidence” in the statistics sense doesn’t always have much to do with how confident you are in the conclusion. Something that’s the real line in half of all cases and the empty set in the other half of all cases is a 50% confidence interval, but that doesn’t mean you’re ever 50% confident (in the colloquial sense) that the parameter is on the real line or that the parameter is in the empty set.
The Credible interval article on Wikipedia describes the distinction between frequentist and Bayesian confidence intervals.
The general pattern here is that there’s something you do care about and something you don’t care about, and frequentism doesn’t allow you to talk about the thing you do care about, so it renames the thing you don’t care about in such a way as to suggest that it’s the thing you do care about, and everyone who doesn’t understand statistics well interprets it as such.
The interesting thing about the confidence interval I’m writing about is that it has some frequentist optimality properties. (“Uniformly most accurate”, if anyone cares.)
Well. So if all men were within 10 cm of each other, and uniformly distributed, and we plucked 2 random men out, and they were 4cm apart, would c be between them with 80% probability? Or some other value?
The shorter man can be between c-5 and c+1, with all values equally probable. If he’s between c-5 and c-4, or between c and c+1, then c is not between them; if he’s between c-4 and c, then c is between them. So, assuming a uniform prior for c, the probability is 2⁄3 if I’m not mistaken.
Ah, I see what I did wrong. I think.
Yup. Under the uniform prior the posterior probability that c is between the two values is d/(1 - d), 0 < d < 0.5, where d = x_max - x_min (and the width of the uniform data distribution is 1).
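For readers who want the intermediate step: both observations must lie in [c - 1/2, c + 1/2], which pins c to an interval of length 1 - d, and under the flat prior the posterior for c is uniform on that interval. For d < 1/2 the whole of (x_min, x_max) sits inside it, so

```latex
P\bigl(x_{\min} < c < x_{\max} \mid x_{\min}, x_{\max}\bigr)
  \;=\; \frac{x_{\max} - x_{\min}}{\bigl(x_{\min} + \tfrac12\bigr) - \bigl(x_{\max} - \tfrac12\bigr)}
  \;=\; \frac{d}{1 - d}.
```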
The answer to that depends on what you know about c beforehand—your prior probability for c.
It’s not between them if the shorter man is 4-5 cm shorter than average or 0-1 cm taller than average, so yes, 80% assuming a uniform prior for c.
Whoops—“confidence” is frequentist jargon. I’ll just say that any better method ought to take x_max - x_min into account.
One such problem has already come up:
1. How can we train rationality?
A lot of our irrationality seems to be rationality tuned for small sample sizes. When you live in a tribe of <200 people, any given event or opinion has a lot of weight. We evolved to do science on a small scale. How do we get around Dunbar’s limit?
This isn’t a formal problem that can be “solved” with a formal solution. I am specifically talking about problems like the Angel problem or P = NP.
Examples I can think of off the top of my head are Newcomb’s problem and the Prisoner’s dilemma. Both of these can be expressed formally in Bayesian terms. Have the problems been solved? I assumed so, or I would have brought them up in my post.
For fun, I am starting to work out what is needed to tackle Newcomb’s problem and it certainly seems doable. I figured it is a good test of my new Bayesian skillz. Game theory claims to have “solved” one-shot PDs but not in a way that makes sense in helping someone decide what to do in a real life example. Newcomb’s seemed easier so I am starting with that.
Ok, I had interpreted the scope more widely than you intended.
I believe Eliezer has a formal analysis of Newcomb’s problem, but I don’t know if he’s published it anywhere.
There are a fair number of formal analyses of Newcomb’s problem. I particularly like this one:
D.H. Wolpert and G. Benford, What does Newcomb’s paradox teach us? (showing that the standard approaches to the paradox encode fundamentally different—and inconsistent—views about the nature of the decision problem, and clearing up a number of other confusions.)
Newcomb’s problem seems to disappear under any known formalization, and as far as I can tell from that thread and all others, Eliezer doesn’t have any good formalization which preserves its paradoxical nature.
This is not a game question, but it may be an interesting question regarding decision making for humans:
What is the total Shannon entropy of the variables controlling whether or not a human will do what it consciously believes will lead to the most desirable outcome?
If all humans currently alive collectively represent every possible variable combination in this regard, the maximum value for the answer is 32.7 bits[1]. That is, 33 on/off switches completely decide whether or not you will put off doing your homework[2]. Is the correct value higher or lower?
Some variables might be aggregated from analog sources, such as adrenaline level, in combination with a series of thresholds specific to the individual.
[1] Two to the power of 32.7 is the current world population.
[2] Substitute “homework” with whatever you desire.
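For reference, footnote [1]’s figure is the base-2 logarithm of the world population at the time (roughly 6.9 billion in 2010):

```latex
\log_2\!\left(6.9 \times 10^{9}\right) \approx 32.7 \text{ bits}.
```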
I made up this question just now, and suspect it may be insanely stupid.
Much, much higher. The humans currently alive represent only a very sparse sampling of possible combinations of genes, and an even sparser sampling of possible combinations of life experiences. I don’t see any obvious reason why the answer to this question shouldn’t be greater than the number of subatomic particles in your body.
Clarification: I am only talking about direct inputs to the decision making process, not what they’re aggregated from (which would be the observable universe).
Due to chaotic / non-linear effects, you’re not going to get anywhere near the compression you need for 33 bits to be enough... I’m very confident the answer is much much higher...
How do you build a smart, synthetic goal-seeking agent? I believe there are some associated problems as well.
Well, this is unspeakably late, but better late than never.
I was beginning to think in all seriousness that none of you would ever begin asking what questions you should be asking. It’s nice to be proven wrong.
Well, it doesn’t seem all that long to me. We’re not in one of Eliezer’s stories after all.
And if you get enough philosophically-minded folks around a table, we spend the first hour or so talking about the table, and then the next hour or so talking about our discussion of the table.
EDIT: changed ‘they’ and ‘their’ to ‘we’ and ‘our’