Infinitely many hypotheses increase in probability. What good is that? You have infinite possibilities before you and haven’t made progress towards picking between them.
When you say “this infinite set over here, its probability increases” you aren’t reaching an answer. You aren’t even getting any further than pure deduction would have gotten you.
Look, there are two infinite sets: those contradicted by the evidence, and those not (deal with theories with “maybes” in them however you like; it does not matter to my point). The first set we don’t care about; we all agree to reject it. The second set is all that’s left to consider. If you increase the probability of every theory in it, that doesn’t help you choose between them. It’s not useful. When you “confirm” or increase the probability of every theory logically consistent with the data, you aren’t reaching an answer; you aren’t making progress.
The progress is in the theories that are ruled out. When playing cards, you could consider all possible histories of the motions of the cards that are compatible with the evidence. Would you have any problem with making bets based on these probabilities? Solomonoff induction is very similar. While there are an infinite number of possibilities, both cases involve proving general properties of the distribution rather than considering each possibility individually.
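To make the card analogy concrete, here is a minimal Python sketch under assumed toy conditions: a hypothetical six-card deck with two aces, two of which have already been seen. It enumerates every ordering of the deck that is compatible with the evidence, then checks that this brute-force answer matches a general property of the distribution (aces left divided by cards left), which needs no enumeration at all. The deck, the names `DECK` and `SEEN`, and both helper functions are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy deck: 2 aces ("A...") and 4 other cards ("x..."), made distinct by index.
DECK = ["A0", "A1", "x0", "x1", "x2", "x3"]
SEEN = ["x0", "A1"]  # evidence: the first two cards dealt


def brute_force_next_is_ace(deck, seen):
    """Consider every history (ordering) consistent with the evidence, one by one."""
    compatible = [p for p in permutations(deck) if list(p[: len(seen)]) == seen]
    aces_next = sum(1 for p in compatible if p[len(seen)].startswith("A"))
    return Fraction(aces_next, len(compatible))


def general_property_next_is_ace(deck, seen):
    """Same answer from a general property of the distribution: aces left / cards left."""
    remaining = [c for c in deck if c not in seen]
    aces_left = sum(1 for c in remaining if c.startswith("A"))
    return Fraction(aces_left, len(remaining))


print(brute_force_next_is_ace(DECK, SEEN))       # 1/4
print(general_property_next_is_ace(DECK, SEEN))  # 1/4
```

With a real 52-card deck the enumeration becomes hopeless, while the counting argument still works; that is the sense in which reasoning about an infinite space of hypotheses can proceed through general properties.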
In the future please capitalize your sentences; it improves readability (especially in large paragraphs).
If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.
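The “eliminate and rescale” reading of Bayes’ theorem can be sketched in a few lines of Python. This assumes a finite, hypothetical set of theories (`T1` to `T4`), each making a definite prediction about one observation, with made-up prior weights; real hypothesis spaces are infinite, so this only illustrates the arithmetic.

```python
from fractions import Fraction


def update(prior, consistent):
    """Bayes' theorem for theories that make definite predictions:
    drop theories contradicted by the evidence, then rescale the survivors to sum to 1."""
    survivors = {h: p for h, p in prior.items() if consistent(h)}
    total = sum(survivors.values())
    return {h: p / total for h, p in survivors.items()}


# Hypothetical theories, each predicting a definite outcome for one observation.
predictions = {"T1": "heads", "T2": "heads", "T3": "tails", "T4": "heads"}
prior = {"T1": Fraction(1, 2), "T2": Fraction(1, 4), "T3": Fraction(1, 8), "T4": Fraction(1, 8)}

# The observation comes up heads, so T3 is eliminated.
posterior = update(prior, lambda h: predictions[h] == "heads")
print(posterior)
```

Every surviving theory is multiplied by the same factor (here 8/7), so the ratios among survivors are exactly the prior ratios. That is why the update alone cannot choose between the remaining theories and why the prior carries the rest of the work.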
The Solomonoff prior is really just a form of the principle of insufficient reason, which states that if there is no reason to think that one thing is more probable than another, they should be assigned the same probability. Since there are an infinite number of theories, we need to take some kind of limit. If we encode them as self-delimiting computer programs, we can write them as strings of digits (usually binary). Start with some maximum length and increase it toward infinity. Some programs will proceed normally, either looping infinitely or encountering a stop instruction; this makes many programs equivalent, because changing bits that the hypothesis never uses does not change the theory. Other programs will run past the maximum length, but this will be fixed as that length is taken to infinity.
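The limiting process can be illustrated with a toy Python sketch. It assumes a hypothetical, hand-picked prefix-free set of “programs” (`0`, `10`, `110`) rather than a real universal machine. Every bit string of the maximum length is treated as equally likely and is credited to the program it begins with. Bits after a program ends are never read, so all their settings count toward the same program.

```python
from fractions import Fraction
from itertools import product

# Hypothetical prefix-free "programs": no program is a prefix of another,
# so a machine reading bits knows where each program ends (self-delimiting).
PROGRAMS = ["0", "10", "110"]


def prior_by_counting(programs, max_len):
    """Pick a max_len-bit string uniformly at random and credit it to the program
    it starts with. Trailing bits are never read, so their settings don't matter."""
    counts = {p: 0 for p in programs}
    for bits in product("01", repeat=max_len):
        s = "".join(bits)
        for p in programs:
            if s.startswith(p):
                counts[p] += 1
                break  # prefix-free: at most one program can match
    return {p: Fraction(c, 2 ** max_len) for p, c in counts.items()}


for n in (3, 6, 10):
    print(n, prior_by_counting(PROGRAMS, n))
```

For every maximum length at least as long as the program, a program of length ℓ gets weight 2^-ℓ, so the limit is well defined and shorter programs get more weight. The leftover 1/8 belongs to strings that begin with none of the listed programs; in this finite toy it does not vanish, which is one way it falls short of the full construction.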
This obviously isn’t a complete justification, but it is better than Popperian induction. Both rule out falsified theories, and both penalize theories for unfalsifiability and complexity. Only Solomonoff induction allows us to quantify the size of these penalties in terms of probability. Popper would agree that a simpler theory, when compared with a more complex one, is more likely but not guaranteed to be true, but he could not give the numbers.
If you are still worried about the foundational issues of the Solomonoff prior, I’ll answer your questions, but it would be better if you asked me again in however long progress takes (that was supposed to sound humorous, as if I were describing a specific, known amount of time, but I really doubt that that is noticeable in text). http://lesswrong.com/r/discussion/lw/534/where_does_uncertainty_come_from/ writes up some of the questions I’m thinking about now. It’s not by me, but Paul seems to wonder about the same issues. This should all be significantly more solid once some of these questions are answered.
“If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.”
That’s it? That is trivial, and doesn’t solve the major problems in epistemology. It’s correct enough (I’m not convinced theories have probabilities, but I think that’s a side issue) but it doesn’t get you very far. Any old non-Bayesian epistemology could tell you this.
Epistemology has harder problems than figuring out that you should reject things contradicted by evidence. For example, what do you do about the remaining possibilities?
I think with Solomonoff what you are doing is ordering all theories (by length) and saying the ones earlier in the ordering are better. This ordering has nothing empirical about it. Your approach here is not based on evidence or probabilities, just an ordering. Correct me if I got that wrong. That raises the question: why is the Solomonoff ordering correct? Why not some other ordering? Here’s one objection: “God did everything” is a short theory which is compatible with all evidence. You can make separate versions of it for all possible sets of predictions if you want. Doesn’t that mean we’re either stuck with some kind of “God did everything” or the final truth is even shorter?
You mention “Popperian induction”. Please don’t speak for Popper. The idea that Popper advocated induction is a myth. A rather crass one; he refuted induction and published a lot of material against it. Instead, ask me about his positions, OK? Popper would not agree that the simpler theory is “more likely”. There are many issues here. One is that Popper said we should prefer low probability theories because they say more about the world.
You seem to present “Popperian induction” as an incomplete justification. Maybe you are unaware that Popper’s epistemology rejects the concept of justification itself. It is thus a mistake to criticize it on justificationist grounds. It isn’t any type of justification and doesn’t want to be.
In order to quote people, you can use a single greater than sign ‘>’ at the beginning of a line.
Epistemology has harder problems than figuring out that you should reject things contradicted by evidence. For example, what do you do about the remaining possibilities?
Note I said that and a prior. The important concept here is that we must always assign probabilities to all theories, because otherwise we would have no way to act. From Wikipedia: ‘Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures’, where a statistical procedure may be taken as a guide for optimal action.
Sorry about saying ‘Popperian induction’. I only have a basic knowledge of Popper. Would Popper say that predicting the results of actions is (one of) the goals of science? This is, of course, slightly more general than induction.
Wikipedia quotes Popper as saying simpler theories are to be preferred ‘because their empirical content is greater; and because they are better testable’. Does this mean that he would bet something important on this? If there were two possible explanations for a plague, and if the simpler one were true then, with medicine, we could save 100 lives, but if the more complex one were true we could save 200 lives, how would you decide which cure the factories should manufacture (assuming it takes a long time to prepare the factories, or something, so you can only make one type of cure)?
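One way to see what a consequentialist answer to the plague question needs is a small expected-value sketch. The example does not say how likely each explanation is, so this Python sketch treats P(simpler explanation is true) as a free parameter and assumes, hypothetically, that the wrong cure saves nobody. The function names are illustrative.

```python
from fractions import Fraction

LIVES_IF_RIGHT = {"simple": 100, "complex": 200}  # lives saved if the cure matches the truth


def expected_lives(cure, p_simple):
    """Expected lives saved by manufacturing the cure for one explanation,
    assuming (hypothetically) that the wrong cure saves nobody."""
    p = p_simple if cure == "simple" else 1 - p_simple
    return p * LIVES_IF_RIGHT[cure]


def best_cure(p_simple):
    """Pick the cure with the larger expected number of lives saved."""
    return max(LIVES_IF_RIGHT, key=lambda cure: expected_lives(cure, p_simple))


for p in (Fraction(1, 2), Fraction(3, 4)):
    print(p, best_cure(p))
```

Under these assumptions the choice flips at P(simple) = 2/3, where both cures save 200/3 lives in expectation, so some number for that probability is needed to decide.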
I think with Solomonoff what you are doing is ordering all theories (by length) and saying the ones earlier in the ordering are better.
It is exactly not about this. The reason to prefer simpler theories is that more possible universes correspond to them. For a simple universe, axioms 1 through 10 have to come out the right way, but the rest can be anything, as they are meaningless since the universe is already fully specified. For a more complex theory, axioms 11-15 must also turn out a certain way, so fewer possible universes are compatible with this theory. I would also add the principle of sufficient reason, which I think is likely, as further justification for Occam’s razor, but that is irrelevant here.
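The counting argument can be checked directly. This Python sketch assumes a hypothetical encoding in which a “universe” is a string of N binary axioms. The simple theory fixes axioms 1-10 and the complex theory also fixes axioms 11-15 (the particular bit values are arbitrary).

```python
from fractions import Fraction
from itertools import product

# Hypothetical encoding: a "universe" is a string of N binary axioms.
SIMPLE = "1011001110"         # theory fixing axioms 1-10 (arbitrary values)
COMPLEX = SIMPLE + "01101"    # theory that also fixes axioms 11-15


def fraction_compatible(theory, n_axioms):
    """Fraction of all n_axioms-bit universes that agree with the theory's fixed axioms."""
    total = compatible = 0
    for bits in product("01", repeat=n_axioms):
        total += 1
        compatible += "".join(bits).startswith(theory)
    return Fraction(compatible, total)


for n in (15, 16):
    s, c = fraction_compatible(SIMPLE, n), fraction_compatible(COMPLEX, n)
    print(n, s, c, s / c)
```

The simple theory is compatible with 2^5 = 32 times as many universes as the complex one, and that ratio does not depend on how many axioms a universe has in total.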
Popper said we should prefer low probability theories because they say more about the world.
This seems wrong. Should I play the lottery because the low-probability theory that I will win is preferred to the high-probability theory that I will lose?
The important concept here is that we must always assign probabilities to all theories, because otherwise we would have no way to act.
Popperian epistemology doesn’t assign probabilities like that, and has a way to act. So would you agree that, if you fail to refute Popperian epistemology, then one of your major claims is wrong? Or do you have a backup argument: you don’t have to, but you should anyway because..?
Prediction is a goal of science, but it is not the primary one. The primary goal is explanation/understanding.
Secondary sources about Popper, like Wikipedia, are not trustworthy. Popper would not bet anything important on that simpler theories thing. That fragment is misleading because Popper means “preferred” in a methodological sense, not considered to have a higher probability of being true, or considered more justified. It’s not a preference about which theory is actually, in fact, better.
The way to make decisions is by making conjectures about what to do, and criticizing those conjectures. We learn by critical, imaginative argument (including within one mind). Explanations should be given for why each possibility is a good idea; the hypothetical you give doesn’t have enough details to actually reach an answer.
About Solomonoff, if I understand you correctly now you are starting with theories which don’t say much (that isn’t what I expected simpler or shorter to mean). So at any point Solomonoff induction will basically be saying the minimal theory to account for the data and specify nothing else at all. Is that right? If that is the case, then it doesn’t deal with choosing between the various possibilities which are all compatible with the data (except in so far as it tells you to choose the least ambitious) and can make no predictions: it simply leaves everything we don’t already know unspecified. Have I misunderstood again?
I thought the theories were supposed to specify everything (not, as you say, “the rest can be anything”) so that predictions could be made.
I’m not totally sure what your concept of a universe or axiom is here. Also I note that the real world is pretty complicated.
Should I play the lottery because the low-probability theory that I will win is preferred to the high-probability theory that I will lose?
No, he means they are more important and more interesting. His point is basically that a theory which says nothing has a 100% prior probability. Quantum Mechanics has a very low prior probability. The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?) They have what Popper called high “content” because they exclude many possibilities. That is a good trait. But it’s certainly not a guarantee that arbitrary theories excluding stuff will be correct.
Popperian epistemology doesn’t assign probabilities like that, and has a way to act.
My first wikipedia quote (Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.) was somewhat technical, but it basically meant that any consistent set of actions is either describable in terms of probabilities or nonconsequentialist. How would you choose the best action in a Popperian framework? Would you be forced to consider aspects of a choice other than its consequences? Otherwise, your choices must be describable in terms of a prior probability and Bayesian updating (and, while we already agree that the latter is obvious, here we are using it to update a set of probabilities and, on pain of inconsistency, our new probabilities must have that relationship to our old ones).
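One concrete consequence of requiring new probabilities to stand in the Bayesian relationship to old ones is that the result does not depend on how the evidence is batched or ordered. This Python sketch uses two hypothetical hypotheses and made-up likelihoods for two observations, assumed independent given each hypothesis.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times likelihood, renormalized to sum to 1."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}


# Hypothetical hypotheses and likelihoods of two observations,
# assumed independent given each hypothesis.
prior = {"H1": Fraction(1, 3), "H2": Fraction(2, 3)}
lik_e1 = {"H1": Fraction(9, 10), "H2": Fraction(1, 2)}
lik_e2 = {"H1": Fraction(1, 5), "H2": Fraction(3, 5)}

step_by_step = bayes_update(bayes_update(prior, lik_e1), lik_e2)
reversed_order = bayes_update(bayes_update(prior, lik_e2), lik_e1)
all_at_once = bayes_update(prior, {h: lik_e1[h] * lik_e2[h] for h in prior})
print(step_by_step == reversed_order == all_at_once)  # True
```

Exact fractions are used so the three routes can be compared with `==` rather than up to rounding error.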
Explanations should be given for why each possibility is a good idea; the hypothetical you give doesn’t have enough details to actually reach an answer.
Definitely use all the evidence when making decisions. I didn’t mean for my example to be complete. I was wondering how a question like that could be addressed in general. What pieces of information would be important and how would they be taken into account? You can assume that the less relevant variables, like which disease is more painful, are equal in both cases.
Prediction is a goal of science, but it is not the primary one. The primary goal is explanation/understanding.
I may have been unclear here. I meant prediction in a very broad sense, including, e.g., predicting which experiments will be best at refining our knowledge and predicting which technologies will best improve the world. Was it clear that I meant that? If you seek understanding beyond this, you are allowed but, at least for the present era, I only care about an epistemology if it can help me make the world a better place.
About Solomonoff, if I understand you correctly now you are starting with theories which don’t say much (that isn’t what I expected simpler or shorter to mean). So at any point Solomonoff induction will basically be saying the minimal theory to account for the data and specify nothing else at all. Is that right? If that is the case, then it doesn’t deal with choosing between the various possibilities which are all compatible with the data (except in so far as it tells you to choose the least ambitious) and can make no predictions: it simply leaves everything we don’t already know unspecified.
No, not at all. The more likely theories are those that include small amounts of theory, not small amounts of prediction. Eliezer discusses this in the sequences here, here, and here. Those don’t really cover Solomonoff induction directly, but they will probably give you a better idea of what I’m trying to say than I did. I think Solomonoff induction is better covered in another post, but I can’t find it right now.
I thought the theories were supposed to specify everything (not, as you say, “the rest can be anything”) so that predictions could be made.
Sorry, I was abusing one word, ‘theories’, to mean both ‘individual descriptions of the universe’ and ‘sets of descriptions that make identical predictions in some realm (possibly in all realms)’. It is a very natural place for definitions to slip because, for example, when discussing biology, we often don’t care about the distinction between ‘Classical physics is true and birds are descended from dinosaurs.’ and ‘Quantum physics is true and birds are descended from dinosaurs.’ Once enough information is specified to make predictions, a theory (in the second sense) is on equal ground with two things: another theory that contains the same amount of information and makes different predictions only in realms where it has not been tested; and a set of theories that can be specified, as a set, with the same amount of information, even though picking out one theory from that set would take more information.
That fragment is misleading because Popper means “preferred” in a methodological sense, not considered to have a higher probability of being true, or considered more justified. It’s not a preference about which theory is actually, in fact, better.
I’m not sure how one would act based on this. Should one conduct new experiments differently given this knowledge of which theories are preferred? Should one write papers about how awesome the theory is?
No, he means they are more important and more interesting. His point is basically that a theory which says nothing has a 100% prior probability. Quantum Mechanics has a very low prior probability. The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?) They have what Popper called high “content” because they exclude many possibilities. That is a good trait. But it’s certainly not a guarantee that arbitrary theories excluding stuff will be correct.
All of this is present in Bayesian epistemology.
Consider Bayes’ theorem, with theories A and B and evidence E:
P(A|E) = P(E|A) P(A) / P(E)
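Let’s look at how the probability of a theory increases upon learning E, using a ratio. Rearranging the theorem above, P(A|E) / P(A) = P(E|A) / P(E), and dividing this by the same expression for B cancels P(E):
[P(A|E) / P(A)] / [P(B|E) / P(B)] = P(E|A) / P(E|B)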
The greater P(E|A) is compared to P(E|B), the more A benefits compared to B. This means that the more theory A narrowly predicts E, the actual observation, to the exclusion of other possible observations, the more probability we assign to it. This is a quantitative form of Popper’s preference for more specific and more easily falsifiable theories, as proven by Bayes’ theorem.
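A small numerical illustration, assuming two hypothetical theories over ten possible observations. Theory A “sticks its neck out” by putting 9/10 of its probability on one outcome, while theory B spreads its probability evenly. Priors are taken as equal.

```python
from fractions import Fraction

OUTCOMES = range(1, 11)

# Hypothetical theory A: 9/10 of its probability on outcome 7, the rest spread thinly.
p_given_a = {o: Fraction(9, 10) if o == 7 else Fraction(1, 90) for o in OUTCOMES}
# Hypothetical theory B: every outcome equally likely.
p_given_b = {o: Fraction(1, 10) for o in OUTCOMES}


def posterior_odds(prior_odds, observed):
    """P(A|E)/P(B|E) = [P(E|A)/P(E|B)] * [P(A)/P(B)]; P(E) cancels."""
    return p_given_a[observed] / p_given_b[observed] * prior_odds


print(posterior_odds(Fraction(1), 7))  # 9
print(posterior_odds(Fraction(1), 3))  # 1/9
```

The narrow theory gains a factor of 9 when its prediction comes true and loses the same factor when it fails, which is the trade-off between content and risk described above.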
The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?)
That’s basically what Solomonoff means by prior probability.
My first wikipedia quote (Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.) was somewhat technical, but it basically meant that any consistent set of actions is either describable in terms of probabilities or nonconsequentialist. How would you choose the best action in a Popperian framework? Would you be forced to consider aspects of a choice other than its consequences?
Yes, Popper is non-consequentialist.
Consequentialism is a bad theory. It says ideas should be evaluated by their consequences (only). This does not address the question of how to determine what are good or bad consequences.
If you try to evaluate methods of determining what are good or bad consequences, by their consequences, you’ll end up with serious regress problems. If you don’t, you’ll have to introduce something other than consequences.
You may want to be a little more careful with how you formulate this. Saying that a good idea is one that has good consequences, and a bad idea is one that has bad consequences, doesn’t invite regress… it may be that you have a different mechanism for evaluating whether a consequence is good/bad than you do for evaluating whether an idea is good/bad.
For example, I might assert that a consequence is good if it makes me happy, and bad if it makes me unhappy. (I don’t in fact assert this.) I would then conclude that an idea is good if its consequences make me happy, and bad if its consequences make me unhappy. No regress involved.
(And note that this is different from saying that an idea is good if the idea makes me happy. If it turns out that the idea “I could drink drain cleaner” makes me happy, but that actually drinking drain cleaner makes me unhappy, then it’s a bad idea by the first theory but a good idea by the second theory.)
A certain amount of precision is helpful when thinking about these sorts of things.
You may want to be a little more careful with how you formulate this. Saying that a good idea is one that has good consequences, and a bad idea is one that has bad consequences, doesn’t invite regress… it may be that you have a different mechanism for evaluating whether a consequence is good/bad than you do for evaluating whether an idea is good/bad.
...
A certain amount of precision is helpful when thinking about these sorts of things.
If you reread the sentence in which I discuss a regress, you will notice it begins with “if” and says that a certain method would result in a regress, the point being you have to do something else. So it was your mistake.
That is not what I meant by consequentialism, and I agree that that theory entails an infinite regress. The theory I was referring to, which is the first google result for consequentialism, states that actions should be judged by their consequences.
That theory is bad too. For one thing, you might do something really dumb (say, shoot at a cop) and the consequence might be something good, e.g. you might accidentally hit the robber behind him who was about to kill him. You might end up declared a hero.
For another thing, “judge by consequences” does not answer the question of what are good or bad consequences. It tells us almost nothing. The only content is don’t judge by anything else. Why not? Beats me.
If you mean judge by rationally expected consequences, or something like that, you could drop the first objection but I still don’t see the use of it. If you merely want to exclude mysticism I think we can do that with a lighter restriction.
Sorry, I didn’t explain this very well. I don’t use consequentialism to judge people, I use it to judge possible courses of action. I (try to) make choices with the best consequences, this fully determines actions, so judgments of, for example, who is a bad person, do not add anything.
You are right that this is very broad. My point is that all consequentialist decision rules are either Bayesian decision rules or limits of Bayesian decision rules, according to a theorem.
I didn’t discuss who is a bad person. An action might be bad but have a good result (this time) by chance. And you haven’t said a word about what kinds of consequences of actions are good or bad … I mean desirable or undesirable. And you haven’t said why everything but consequences is inadmissible.
In your example of someone shooting a police officer, I would say that it is good that the police officer’s life was saved, but it is bad that there is a person who would shoot people so irresponsibly, and I would not declare that person a hero, as that would neither help save more police officers nor reduce the number of people shooting recklessly; in fact, it would probably increase the number of reckless people.
I don’t want to get into the specifics of morality, because it is complex. The only reason that I specified consequentialist decision making is that it is a condition of the theorem that proves Bayesian decision making to be optimal. Entirely nonconsequentialist systems don’t need to learn about the universe to make decisions and partially consequentialist systems are more complicated. For the latter, Bayesianism is often necessary if there are times when nonconsequentialist factors have little import to a decision.
it is bad that there is a person who would shoot people so irresponsibly
You are here judging a non-action by a non-consequence.
Yes, this is a non-action; I often say ‘it is bad that X’ as shorthand for ‘ceteris paribus, I would act so as to make X not be the case’. However, it is a consequence of what happened before (though you may have just meant that it is not a consequence of my action). Judgements are often attached to consequences without specifying which action they are consequences of, just for convenience.
I think you mean systems which ignore all consequences. Popper’s system does not do that.
OK. I don’t recall hearing any Bayesian praising low probability theories, but no doubt you’ve heard more of them than me.
The greater P(E|A) is compared to P(E|B), the more A benefits compared to B.
Yes but that only helps you deal with wishy washy theories. There’s plenty of theories which predict stuff with 100% probability. Science has to deal with those. This doesn’t help deal with them.
Examples include Newton’s Laws and Quantum Theory. They don’t say they happen sometimes but always, and that’s important. Good scientific theories are always like that. Even when they have a restricted, non-universal domain, it’s 100% within the domain.
Physics is currently thought to be deterministic. And even if physics was random, we would say that e.g. motion happens randomly 100% of the time, or whatever the law is. We would expect a law of motion with a call to a random function to still always be what happens.
PS Since you seem to have an interest in math, I’d be curious about your thoughts on this:
The article you sent me is mathematically sound, but Popper draws the wrong conclusion from it. He has already accepted that P(H|E) can be greater than P(H). That’s all that’s necessary for induction: updating a probability distribution. The stuff he says at the end about H ← E being countersupported by E does not prevent decision making based on the new distribution.
Setting aside Popper’s point for a minute, p(h|e) > p(h) is not sufficient for induction.
The reason it is not sufficient is that infinitely many h gain probability for any e. The problem of dealing with those remains unaddressed. And it would be incorrect and biased to selectively pick some pet theory from that infinite set and talk about how it’s supported.
OK. I don’t recall hearing any Bayesian praising low probability theories, but no doubt you’ve heard more of them than me.
It seems obvious that low probability theories are good. Since probabilities must add up to 100%, there can be only a few high-probability theories and, when one is true, there is not much work to be done in finding it, since it is already so likely. Telling someone to look among low-probability theories is like telling them to look among nonapples when looking for possible products to sell, and it provides no way of distinguishing good low-prior theories, like quantum mechanics, from bad ones, like astrology.
Unfortunately, I cannot read that article, as it is behind a paywall. If you have access to it, perhaps you could email it to me at endoself (at) yahoo (dot) com .
ETA:
Yes but that only helps you deal with wishy washy theories. There’s plenty of theories which predict stuff with 100% probability. Science has to deal with those. This doesn’t help deal with them.
I was only talking about Popper’s idea of theories with high content. That particular analysis was not meant to address theories that predicted certain outcomes with probability 1.
I’m not sure how one would act based on this. Should one conduct new experiments differently given this knowledge of which theories are preferred? Should one write papers about how awesome the theory is?
It’s a loose guideline for people about where it may be fruitful to look. It can also be used in critical arguments if/when people think of arguments that use it.
One of the differences between Popper and Bayesian Epistemology is that Popper thinks being overly formal is a fault not a merit. Much of Popper’s philosophy does not consist of formal, rigorous guidelines to be followed exactly. Popper isn’t big on rules of procedure. A lot is explanation. Some is knowledge to use on your own. Some is advice.
The more likely theories are those that include small amounts of theory, not small amounts of prediction.
So, “God does everything”, plus a definition of “everything” which makes predictions about all events, would rate very highly with you? It’s very low on theory and very high on prediction.
Define theories of that type for all possible sets of predictions. Then at any given time you will have infinitely many of them that predict all your data with 100% probability.
So, “God does everything”, plus a definition of “everything” which makes predictions about all events, would rate very highly with you? It’s very low on theory and very high on prediction.
No, it has tons of theory. God is a very complex concept. Note that ‘God did everything’ is more complex and therefore less likely than ‘everything happened’. Did you read http://lesswrong.com/lw/jp/occams_razor/ ?
How do you figure God is complex? God as I mean it simply can do anything, no reason given. That is its only attribute: that it arbitrarily does anything the theory it’s attached to cares to predict. We can even stop calling it “God”. We could even not mention it at all so there is no theory and merely give a list of predictions. Would that be good, in your view?
If ‘God’ is meaningless and can merely be attached to any theory, then the theory is the same with and without God. There is nothing to refute, since there is no difference. If you defined ‘God’ to mean a being who created all species or who commanded a system of morality, I would have both reason to care about and means to refute God. If you defined ‘God’ to mean ‘quantum physics’, there would be applications and means of proving that ‘God’ is a good approximation, but this definition is nonsensical, since it is not what is usually meant by ‘God’. If the theory of ‘God’ has no content, there is nothing to discuss, but that is again a very unusual definition.
If there is no simpler description, then a list of predictions is better but, if an explanation simpler than merely a list of predictions is at all possible, then that would be more likely.
How do you decide if an explanation is simpler than a list of predictions? Are you thinking in terms of data compression?
Do you understand that the content of an explanation is not equivalent to the predictions it makes? It offers a different kind of thing than just predictions.
How do you decide if an explanation is simpler than a list of predictions? Are you thinking in terms of data compression?
Essentially. It is simpler if it has a higher Solomonoff prior.
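The data-compression intuition can be illustrated with an off-the-shelf compressor, with one important caveat: zlib is a swapped-in, computable proxy for description length. It is not the Solomonoff prior, which is uncomputable. The two observation records below are hypothetical. One follows a simple rule. The other is pseudo-random, so it has no description much shorter than the list itself.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length after zlib compression at maximum level: a crude, computable
    stand-in for description length (Solomonoff/Kolmogorov complexity is uncomputable)."""
    return len(zlib.compress(data, 9))


# A hypothetical record of 1000 observations that follows a simple rule...
patterned = b"01" * 500
# ...versus 1000 observations with no rule shorter than the list itself.
rng = random.Random(0)
patternless = bytes(rng.randrange(256) for _ in range(1000))

print(compressed_size(patterned), compressed_size(patternless))
```

The patterned record shrinks to a few dozen bytes because a short rule regenerates it. The patternless one does not shrink, so for it the list of predictions is already about as short as any description.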
Do you understand that the content of an explanation is not equivalent to the predictions it makes? It offers a different kind of thing than just predictions.
Yes, there is more than just predictions. However, predictions are the only things that tell us how to update our probability distributions.
Quoting from The Fabric of Reality, chapter 1, by David Deutsch.
Yet some philosophers — and even some scientists — disparage the role of explanation in science. To them, the basic purpose of a scientific theory is not to explain anything, but to predict the outcomes of experiments: its entire content lies in its predictive formulae. They consider that any consistent explanation that a theory may give for its predictions is as good as any other — or as good as no explanation at all — so long as the predictions are true. This view is called instrumentalism (because it says that a theory is no more than an ‘instrument’ for making predictions). To instrumentalists, the idea that science can enable us to understand the underlying reality that accounts for our observations is a fallacy and a conceit. They do not see how anything a scientific theory may say beyond predicting the outcomes of experiments can be more than empty words.
[Cut a quote of Steven Weinberg clearly advocating instrumentalism. The particular explanation he says doesn’t matter is that spacetime is curved. Spacetime curvature is an example of a non-predictive explanation.]
imagine that an extraterrestrial scientist has visited the Earth and given us an ultra-high-technology ‘oracle’ which can predict the outcome of any possible experiment, but provides no explanations. According to instrumentalists, once we had that oracle we should have no further use for scientific theories, except as a means of entertaining ourselves. But is that true? How would the oracle be used in practice? In some sense it would contain the knowledge necessary to build, say, an interstellar spaceship. But how exactly would that help us to build one, or to build another oracle of the same kind — or even a better mousetrap? The oracle only predicts the outcomes of experiments. Therefore, in order to use it at all we must first know what experiments to ask it about. If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction — even perfect, universal prediction — is simply no substitute for explanation.
Similarly, in scientific research the oracle would not provide us with any new theory. Not until we already had a theory, and had thought of an experiment that would test it, could we possibly ask the oracle what would happen if the theory were subjected to that test. Thus, the oracle would not be replacing theories at all: it would be replacing experiments. It would spare us the expense of running laboratories and particle accelerators.
[cut elaboration]
The oracle would be very useful in many situations, but its usefulness would always depend on people’s ability to solve scientific problems in just the way they have to now, namely by devising explanatory theories. It would not even replace all experimentation, because its ability to predict the outcome of a particular experiment would in practice depend on how easy it was to describe the experiment accurately enough for the oracle to give a useful answer, compared with doing the experiment in reality. After all, the oracle would have to have some sort of ‘user interface’. Perhaps a description of the experiment would have to be entered into it, in some standard language. In that language, some experiments would be harder to specify than others. In practice, for many experiments the specification would be too complex to be entered. Thus the oracle would have the same general advantages and disadvantages as any other source of experimental data, and it would be useful only in cases where consulting it happened to be more convenient than using other sources. To put that another way: there already is one such oracle out there, namely the physical world. It tells us the result of any possible experiment if we ask it in the right language (i.e. if we do the experiment), though in some cases it is impractical for us to ‘enter a description of the experiment’ in the required form (i.e. to build and operate the apparatus). But it provides no explanations.
In a few applications, for instance weather forecasting, we may be almost as satisfied with a purely predictive oracle as with an explanatory theory. But even then, that would be strictly so only if the oracle’s weather forecast were complete and perfect. In practice, weather forecasts are incomplete and imperfect, and to make up for that they include explanations of how the forecasters arrived at their predictions. The explanations allow us to judge the reliability of a forecast and to deduce further predictions relevant to our own location and needs. For instance, it makes a difference to me whether today’s forecast that it will be windy tomorrow is based on an expectation of a nearby high-pressure area, or of a more distant hurricane. I would take more precautions in the latter case.
[“wind due to hurricane” and “wind due to high-pressure area” are different explanations for a particular prediction.]
So knowledge is more than just predictive because it also lets us design things?
Here’s a solution to the problem with the oracle—design a computer that inputs every possible design to the oracle and picks the best. You may object that this would be extremely time-consuming and therefore impractical. However, you don’t need to build the computer; just ask the oracle what would happen if you did.
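The brute-force proposal can be written out as a short sketch, assuming (as the thread later does) a finite maximum design size so that the enumeration terminates. The `oracle` argument is a hypothetical stand-in for the alien device; `toy_oracle`, `fewest_crashes`, and the choice to model designs as fixed-length bit tuples are made-up illustrations, not anything specified in the thought experiment.

```python
from itertools import product
from typing import Callable, Tuple

# A "design" is modeled as a fixed-length tuple of bits (an assumption).
Design = Tuple[int, ...]


def best_design(
    oracle: Callable[[Design], int],
    criterion: Callable[[int], float],
    size: int,
) -> Design:
    """Ask a (hypothetical) predictive oracle about every design of the
    given size and return the one whose predicted outcome scores best."""
    # Finitely many candidates: 2**size designs.
    candidates = product((0, 1), repeat=size)
    # The oracle only predicts outcomes; the ranking comes from `criterion`.
    return max(candidates, key=lambda d: criterion(oracle(d)))


def toy_oracle(design: Design) -> int:
    # Hypothetical stand-in: "predicts" crashes in a test flight as the
    # number of bits that differ from a design that happens to work.
    working = (1, 0, 1, 1, 0)
    return sum(a != b for a, b in zip(design, working))


def fewest_crashes(crashes: int) -> float:
    # A predetermined criterion: fewer predicted crashes is better.
    return -crashes


print(best_design(toy_oracle, fewest_crashes, size=5))  # (1, 0, 1, 1, 0)
```

Both the encoding of designs and the scoring criterion are parameters the caller chooses; the oracle is only ever asked to predict outcomes.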
What can we learn from this? This kind of knowledge can be seen as predictive, but only incidentally, because the computer happens to be implemented in the physical world. If it were implemented mathematically, as an abstract algorithm, we would recognize this as deductive, mathematical knowledge. But math is all about tautologies; nothing new is learned. Okay, I apologize for that. I think I’ve been changing my definition of knowledge repeatedly to include or exclude such things. I don’t really care as much about consistent definitions as I should. Hopefully it is clear from context. I’ll go back to your original question.
Would a list of predictions with no theory/explanation be good or bad, in your view?
The difference between the two cases is not the same as the crucial difference here. Having a theory as opposed to a list of predictions for every possible experiment does not necessarily make the theorems easier to prove. When it does, which is almost always, this is simply because that theory is more concise, so it is easier to deduce things from. This seems more like a matter of computing power than one of epistemology.
According to some predetermined criteria. “How well does this spaceship fly?” “How often does it crash?” Making a computer evaluate machines is not hard in principle, and is beside the point.
And wouldn’t the oracle predict that the computer program would never halt, since it would attempt to enter infinitely many questions into the oracle?
I was assuming a finite maximum size with only finitely many distinguishable configurations in that size, but, again, this is irrelevant; whatever trick you use to make this work, you will not change the conclusions.
According to some predetermined criteria. “How well does this spaceship fly?” “How often does it crash?” Making a computer evaluate machines is not hard in principle, and is beside the point.
I think figuring out what criteria you want is an example of a non-predictive issue. That makes it not beside the point. And if the computer picks the best according to criteria we give it, they will contain mistakes. We won’t actually get the best answer. We’ll have to learn stuff and improve our knowledge all in order to set up your predictive thing. So there is this whole realm of non-predictive learning.
I was assuming a finite maximum size with only finitely many distinguishable configurations in that size,
So you make assumptions like a spaceship is a thing made out of atoms. If your understanding of physics (and therefore your assumptions) is incorrect then your use of the oracle won’t work out very well. So your ability to get useful predictions out of the oracle depends on your understanding, not just on predicting anything.
I think figuring out what criteria you want is an example of a non-predictive issue.
So I just give it my brain and tell it to do what it wants. Of course, there are missing steps, but they should be purely deductive. I believe that is what Eliezer is working on now :)
So you make assumptions like a spaceship is a thing made out of atoms. If your understanding of physics (and therefore your assumptions) is incorrect then your use of the oracle won’t work out very well.
Good point. I guess you can’t bootstrap an oracle like this; some things possible mathematically, like calculating a function over an infinity of points, just can’t be done physically. My point still stands, but this illustration definitely dies.
So I just give it my brain and tell it to do what it wants. Of course, there are missing steps, but they should be purely deductive. I believe that is what Eliezer is working on now :)
That’s it? That’s just not very impressive by my standards. Popper’s epistemology is far more advanced, already. Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
By ‘what Eliezer is working on now’ I meant AI, which would probably be necessary to extract my desires from my brain in practice. In principle, we could just use Bayes’ theorem a lot, assuming we had precise definitions of these concepts.
Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
Popperian epistemology is incompatible with Bayesian epistemology, which I accept from its own justification, not from a lack of any other theory. I disliked what I had heard about Popper before I started reading LessWrong, but I forget my exact argument, so I do not know if it was valid. From what I do remember, I suspect it was not.
So, you reject Popper’s ideas without having any criticism of them that you can remember?
That’s it?
You don’t care that Popper’s ideas have criticisms of Bayesian epistemology which you haven’t answered. You feel you don’t need to answer criticisms because Bayesian epistemology is self-justifying and thus all criticisms of it must be wrong. Is that it?
So, you reject Popper’s ideas without having any criticism of them that you can remember?
No, I brought up my past experience with Popper because you asked if my opinions on him came from Eliezer.
You feel you don’t need to answer criticisms because Bayesian epistemology is self-justifying and thus all criticisms of it must be wrong. Is that it?
No, I think Bayesian epistemology has been mathematically proven. I don’t spend a lot of time investigating alternatives for the same reason I don’t spend time investigating alternatives to calculus.
If you have a valid criticism, “this is wrong” or “you haven’t actually proved this” as opposed to “this has a limited domain of applicability” (actually, that could be valid if Popperian epistemology can answer a question that Bayesianism can’t), I would love to know. You did bring up some things of this type, like that paper by Popper, but none of them have logically stood up, unless I am missing something.
If Bayesian epistemology is mathematically proven, why have I been told in my discussions here various things such as: there is a regress problem which isn’t fully solved (Yudkowsky says so), that circular arguments for induction are correct, that foundationalism is correct, been linked to articles to make Bayesian points and told they have good arguments with only a little hand waving, and so on? And I’ve been told further research is being done.
It seems to me that saying it’s proven, the end, is incompatible with it having any flaws or unsolved problems or need for further research. So, which is it?
If you have a valid criticism, “this is wrong” or “you haven’t actually proved this” as opposed to “this has a limited domain of applicability” (actually, that could be valid if Popperian epistemology can answer a question that Bayesianism can’t), I would love to know.
All of the above. It is wrong b/c, e.g., it is instrumentalist (has not understood the value of explanatory knowledge) and inductivist (induction is refuted). It is incomplete b/c, e.g. it cannot deal with non-observational knowledge such as moral knowledge. You haven’t proved much to me however I’ve been directed to two books, so judgment there is pending.
I don’t know how you concluded that none of my arguments stood up logically. Did you really think you’d logically refuted every point? I don’t agree; I think most of your arguments were not pure logic, and I thought that various issues were pending further discussion of sub-points. As I recall, some points I raised have not been answered. I’m having several conversations in parallel so I don’t recall which in particular you didn’t address which were replies to you personally, but for example I quoted an argument by David Deutsch about an oracle. The replies I got about how to try to cheat the oracle did not address the substantive point of the thought experiment, and did not address the issue (discussed in the quote) that oracles have user interfaces and entering questions isn’t just free and trivial, and did not address the issue that physical reality is a predictive oracle meeting all the specified characteristics of the alien oracle (we already have an oracle and none of the replies I got about using the oracle would actually work with the oracle we have). As I saw it, my (quoted) points on that issue stood. The replies were some combination of incomplete and missing the point. They were also clever, which is a bit of fun. I thought of what I think is a better way to try to cheat the rules, which is to ask the oracle to predict the contents of philosophy books that would be written if philosophy was studied for trillions of years by the best people. However, again, the assumption that any question which is easily described in English can be easily entered into the oracle and get a prediction was not part of the thought experiment. And the reason I hadn’t explained all this yet is that there were various other points pending anyway, so shrug, it’s hard to decide where to start when you have many different things to say.
If Bayesian epistemology is mathematically proven, why have I been told in my discussions here various things such as: there is a regress problem which isn’t fully solved (Yudkowsky says so), that circular arguments for induction are correct, that foundationalism is correct, been linked to articles to make Bayesian points and told they have good arguments with only a little hand waving, and so on? And I’ve been told further research is being done.
It is proven that the correct epistemology, meaning one that is necessary to achieve general goals, is isomorphic to Bayesianism with some prior (for beings that know all math). What that prior is requires more work. While the constraint of knowing all math is extremely unrealistic, do you agree that the theory of what knowledge would be had in such situations is a useful guide to action until we have a more general theory? Popperian epistemology cannot tell me how much money to bet at what odds for or against P = NP any more than Bayesian epistemology can, but at least Bayesian epistemology sets this as a goal.
it is instrumentalist (has not understood the value of explanatory knowledge)
oracles have user interfaces and entering questions isn’t just free and trivial, and did not address the issue that physical reality is a predictive oracle meeting all the specified characteristics of the alien oracle
This is all based on our limited mathematical ability. A theory does have an advantage over an oracle or the reality-oracle: we can read it. Would you agree that all the benefits of a theory come from this plus knowing all math? The difference is one of mathematical knowledge, not of physical knowledge. How does Popper help with this? Are there guidelines for what ‘equivalent’ formulations of a theory are mathematically better? If so, this is something that Bayesianism does not try to cover, so this may have value. However, this is unrelated to the question of the validity of “don’t assign probabilities to theories”.
inductivist (induction is refuted)
I thought I addressed this but, to recap:
p(h, eb) > p(h, b) [bottom left of page 1]
That (well and how much bigger) is all I need to make decisions.
All this means: that factor that contains all of h that does not follow deductively from e is strongly countersupported by e.
So what? I already have my new probabilities.
[T]he calculus of probability reveals that probabilistic support cannot be inductive support.
What is induction if not the calculation of new probabilities for hypotheses? Should I care about these ‘inductive truths’ that Popper disproves the existence of? I already have an algorithm to calculate the best action to take. It seems like Bayesianism isn’t inductivist by Popper’s definition.
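The inequality p(h, eb) > p(h, b) and the Popper–Miller claim can both be checked on a toy joint distribution. The numbers below are arbitrary assumptions chosen so that e raises the probability of h. `H_OR_NOT_E` is the factor h ∨ ¬e, the part of h that does not follow deductively from e. For any distribution with p(e) > 0, p(h ∨ ¬e | e) − p(h ∨ ¬e) = −(1 − p(h | e))(1 − p(e)), which is never positive.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over worlds (h, e). The numbers are made up;
# they only need to make e raise the probability of h.
joint = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(2, 10),
    (False, False): F(4, 10),
}


def prob(event) -> F:
    """Probability of the worlds (h, e) in which event(h, e) holds."""
    return sum((p for (h, e), p in joint.items() if event(h, e)), F(0))


def cond(event, given) -> F:
    """Conditional probability p(event | given)."""
    both = prob(lambda h, e: event(h, e) and given(h, e))
    return both / prob(given)


def H(h, e):
    return h


def E(h, e):
    return e


def H_OR_NOT_E(h, e):
    # The factor of h that does not follow deductively from e.
    return h or not e


support_h = cond(H, E) - prob(H)  # p(h|e) - p(h)
support_rest = cond(H_OR_NOT_E, E) - prob(H_OR_NOT_E)
print(support_h, support_rest)  # 1/5 -1/5
```

The sketch only verifies the arithmetic: e raises p(h) by 1/5 while lowering p(h ∨ ¬e) by 1/5. Whether the second number matters for decisions is the point being argued.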
moral knowledge
I’d like to be sure that we are using the same definitions of our terms, so please give an example.
You mean proven given some assumptions about what an epistemology should be, right?
Would you agree that all the benefits of a theory come from this [can read it] plus knowing all math?
No. We need explanations to understand the world. In real life, it is only when we have explanations that we can make good predictions at all. For example, suppose you have a predictive theory about dice and you want to make bets. I chose that example intentionally to engage with areas of your strength. OK, now you face the issue: does a particular real world situation have the correct attributes for my predictive theory to apply? You have to address that to know if your predictions will be correct or not. We always face this kind of problem to do much of anything. How do we figure out when our theories apply? We come up with explanations about what kinds of situations they apply to, and what situation we are in, and we then come up with explanations about why we think we are/aren’t in the right kind of situation, and we use critical argument to improve these explanations. Bayesian epistemology does not address all this.
p(h, eb) > p(h, b) [bottom left of page 1]
I replied to that. Repeating: if you increase the probability of infinitely many theories, the problem of figuring out a good theory is not solved. So that is not all you need.
Further, I’m still waiting on an adequate answer about what support is (inductive or otherwise) and how it differs from consistency.
I gave examples of moral knowledge in another comment to you. Morality is knowledge about how to live, what is a good life. e.g. murder is immoral.
You mean proven given some assumptions about what an epistemology should be, right?
Yes, I stated my assumptions in the sentence, though I may have missed some.
We always face this kind of problem to do much of anything. How do we figure out when our theories apply?
This comes back to the distinction between one complete theory that fully specifies the universe and a set of theories that are considered to be one because we are only looking at a certain domain. In the former case, the domain of applicability is everywhere. In the latter, we have a probability distribution that tells us how likely it is to fail in every domain. So, this kind of thing is all there in the math.
I replied to that. Repeating: if you increase the probability of infinitely many theories, the problem of figuring out a good theory is not solved. So that is not all you need.
What do you mean by ‘a good theory’? Bayesians never select one theory as ‘good’ and follow that; we always consider the possibility of being wrong. When theories have higher probability than others, I guess you could call them good. I don’t see why this is hard; just calculate P(H | E) for all the theories and give more weight to the more likely ones when making decisions.
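One concrete reading of “give more weight to the more likely ones” is Bayesian model averaging: compute P(H | E) for each hypothesis, then weight each hypothesis’s prediction by it. The dice hypotheses, the prior, the rolls, and the 4-to-1 payout below are all invented for illustration; exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction as F

# Hypothetical hypotheses about a six-sided die: face -> probability.
fair = {face: F(1, 6) for face in range(1, 7)}
loaded_six = {**{face: F(1, 10) for face in range(1, 6)}, 6: F(1, 2)}
hypotheses = {"fair": fair, "loaded_six": loaded_six}
prior = {"fair": F(1, 2), "loaded_six": F(1, 2)}  # assumed prior


def posterior(prior, hypotheses, rolls):
    """Bayes' theorem over a finite set: P(H|E) = P(H) P(E|H) / P(E)."""
    unnormalized = {}
    for name, dist in hypotheses.items():
        likelihood = F(1)
        for r in rolls:
            likelihood *= dist[r]
        unnormalized[name] = prior[name] * likelihood
    p_evidence = sum(unnormalized.values())  # P(E), by total probability
    return {name: w / p_evidence for name, w in unnormalized.items()}


def predictive(post, hypotheses, face):
    """Average every hypothesis's prediction, weighted by its posterior."""
    return sum(post[name] * hypotheses[name][face] for name in post)


rolls = [6, 6, 3, 6]  # made-up observations
post = posterior(prior, hypotheses, rolls)
p_six = predictive(post, hypotheses, 6)
# Expected gain per unit staked on a bet paying 4-to-1 if the next roll is a six.
expected_gain = 4 * p_six - (1 - p_six)
print(post, p_six, expected_gain)
```

With these rolls the fair-die hypothesis ends at 5/86 and the loaded one at 81/86, and the averaged probability of a six (62/129) is what the bet is priced from; no single hypothesis has to be declared the winner.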
Further, I’m still waiting on an adequate answer about what support is (inductive or otherwise) and how it differs from consistency.
Evidence supports a hypothesis if P(H | E) > P(H). Two statements, A, B, are consistent if ¬(A&B → ⊥). I think I’m missing something.
Evidence supports a hypothesis if P(H | E) > P(H). Two statements, A, B, are consistent if ¬(A&B → ⊥). I think I’m missing something.
Let’s consider only theories which make all their predictions with 100% probability for now. And theories which cover everything.
Then:
If H and E are consistent, then it follows that P(H | E) > P(H).
For any given E, consider how much greater the probability of H is, for all consistent H. That amount is identical for all H considered.
We can put all the Hs in two categories: the consistent ones which gain equal probability, and the inconsistent ones for which P(H|E) = 0. (Assumption warning: we’re relying on getting it right which H are consistent with which E.)
This means:
1) consistency and support coincide.
2) there are infinitely many equally supported theories. There are only and exactly two amounts of support that any theory has given all current evidence, one of which is 0.
3) The support concept plays no role in helping us distinguish between the theories with more than 0 support.
4) The support concept can be dropped entirely because it has no use at all. The consistency concept does everything.
5) All mention of probability can be dropped too, since it wasn’t doing anything.
6) And we still have the main problem of epistemology left over, which is dealing with the theories that aren’t refuted by evidence.
Similar arguments can be made without my initial assumptions/restrictions. For example introducing theories that make predictions with less than 100% probability will not help you because they are going to have lower probability than theories which make the same predictions with 100% probability.
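The argument in points 1) to 6) can be checked with a small sketch under its own assumptions: every theory is deterministic and predicts a complete history. Here the hypothetical “theories” are all 3-bit histories, the evidence is an observed prefix, and the non-uniform prior is an arbitrary assumption. Updating zeroes the inconsistent theories and multiplies every consistent one by the same factor, 1/P(E).

```python
from fractions import Fraction as F
from itertools import product

# Each hypothetical "theory" deterministically predicts a whole 3-step history.
theories = list(product((0, 1), repeat=3))

# An arbitrary, deliberately non-uniform prior (weights 1..8, normalized).
weights = {t: F(i + 1) for i, t in enumerate(theories)}
total = sum(weights.values())
prior = {t: w / total for t, w in weights.items()}


def consistent(theory, evidence):
    """Evidence is an observed prefix of the history."""
    return theory[: len(evidence)] == evidence


def update(prior, evidence):
    """Bayes with 0/1 likelihoods: drop refuted theories, rescale the rest."""
    p_e = sum(p for t, p in prior.items() if consistent(t, evidence))
    return {
        t: (p / p_e if consistent(t, evidence) else F(0))
        for t, p in prior.items()
    }


evidence = (1,)
post = update(prior, evidence)
# Ratio P(H|E) / P(H) for every theory that survives the evidence.
ratios = {post[t] / prior[t] for t in theories if post[t] > 0}
print(ratios)  # {Fraction(18, 13)}
```

With this prior P(E) = 13/18, so every surviving theory is multiplied by 18/13. Because the factor is shared, the ranking among survivors is exactly the prior’s ranking, which is why the exchange turns next to the choice of prior.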
For any given E, consider how much greater the probability of H is, for all consistent H. That amount is identical for all H considered.
Well the ratio is the same, but that’s probably what you meant.
5) All mention of probability can be dropped too, since it wasn’t doing anything.
6) And we still have the main problem of epistemology left over, which is dealing with the theories that aren’t refuted by evidence.
Have a prior. This reintroduces probabilities and deals with the remaining theories. You will converge on the right theory eventually no matter what your prior is. Of course, that does not mean that all priors are equally rational.
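The convergence claim can be illustrated with a finite hypothesis set, assuming the true hypothesis gets nonzero prior probability (a zero prior can never be revised upward). The candidate biases, the two priors, and the deterministic 70%-heads data standing in for a coin of bias 0.7 are all illustrative assumptions.

```python
import math

biases = [0.3, 0.5, 0.7]  # hypothetical candidate values of P(heads)


def posterior(prior, flips):
    """Posterior over `biases` after observing flips (1 = heads).

    Priors must be strictly positive: math.log(0) raises ValueError, and a
    zero prior could never be revised upward anyway.
    """
    heads = sum(flips)
    tails = len(flips) - heads
    # Work in log space: a product of 1,000 likelihoods can underflow a float.
    logs = [
        math.log(p) + heads * math.log(b) + tails * math.log(1 - b)
        for p, b in zip(prior, biases)
    ]
    top = max(logs)
    unnormalized = [math.exp(x - top) for x in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Deterministic data with exactly 70% heads, standing in for a 0.7 coin.
flips = ([1] * 7 + [0] * 3) * 100

skeptical = [0.98, 0.01, 0.01]  # almost sure the coin favors tails
uniform = [1 / 3, 1 / 3, 1 / 3]
for prior in (skeptical, uniform):
    print([round(q, 4) for q in posterior(prior, flips)])
```

After 1,000 flips both priors put essentially all their probability on 0.7. For hypotheses the prior does not rule out, the prior affects how fast this happens, not where it ends up.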
If they all have the same prior probability, then their probabilities are the same and stay that way. If you use a prior which arbitrarily (in my view) gives some things higher prior probabilities in a 100% non-evidence-based way, I object to that, and it’s a separate issue from support.
How does having a prior save the concept of support? Can you give an example? Maybe the one here, currently near the bottom:
If they all have the same prior probability, then their probabilities are the same and stay that way.
Well shouldn’t they? If you look at it from the perspective of making decisions rather than finding one right theory, it’s obvious that they are equiprobable and this should be recognized.
If you use a prior which arbitrarily (in my view) gives some things higher prior probabilities in a 100% non-evidence-based way, I object to that, and it’s a separate issue from support.
Solomonoff does not give “some things higher prior probabilities in a 100% non-evidence-based way”. All hypotheses have the same probability, many just make similar predictions.
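The limiting construction described earlier (encode theories as self-delimiting programs, fix a maximum length, and treat every bit string of that length as equally likely) can be sketched with a toy prefix-free code. The code used here, where a program is every bit up to and including the first 0, is a hypothetical stand-in chosen for simplicity; Solomonoff’s actual prior uses programs for a universal prefix machine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def program_prefix(bits):
    """Return the self-delimiting program at the start of `bits`.

    Hypothetical toy code: a program is every bit up to and including the
    first 0 ("0", "10", "110", ...). Returns None if no 0 appears, i.e. the
    program runs past the length bound.
    """
    for i, b in enumerate(bits):
        if b == 0:
            return bits[: i + 1]
    return None


def induced_weights(max_len):
    """Give every bit string of length max_len equal weight, then total the
    weight landing on each program (bits after the program are never read)."""
    counts = Counter(program_prefix(s) for s in product((0, 1), repeat=max_len))
    return {prog: Fraction(n, 2**max_len) for prog, n in counts.items()}


w = induced_weights(10)
print(w[(0,)], w[(1, 0)], w[(1, 1, 0)], w[None])  # 1/2 1/4 1/8 1/1024
```

Every 10-bit string gets the same weight, yet a program of length k collects 2^−k, because it is shared by the 2^(N−k) strings that differ only in bits it never reads. The strings that run past the bound (here only the all-ones string) carry weight 2^−N, which vanishes as N grows.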
It seems someone has downvoted you for not being familiar with Eliezer’s work on AI. Basically, this is overly anthropomorphic. It is one of our goals to ensure that an AI can progress from a ‘seed AI’ to a superintelligent AI without anything going wrong, but, in practice, we’ve observed that using metaphors like ‘parenting’ confuses people too much to make progress, so we avoid it.
I wasn’t using parenting as a metaphor. I meant it quite literally (only the educational part, not the diaper changing).
One of the fundamental attributes of an AI is that it’s a program which can learn new things.
Humans are also entities that learn new things.
But humans, left alone, don’t fare so well. Helping people learn is important, especially children. This avoids having everyone reinvent the wheel.
The parenting issue therefore must be addressed for AI. I am familiar with the main ideas of the kind of AI work you guys do, but I have not found the answer to this.
One possible way to address it is to say the AI will reinvent the wheel. It will have no help but just figure everything out from scratch.
Another approach would be to program some ideas into the AI (changeable, or not, or some of each), and then leave it alone with that starting point.
Another approach would be to talk with the AI, answer its questions, lecture it, etc… This is the approach humans use with their children.
Each of these approaches has various problems with it which are non-trivial to solve.
I wasn’t using parenting as a metaphor. I meant it quite literally (only the educational part, not the diaper changing).
When humans hear parenting, they think of the human parenting process. Describe the AI as ‘learning’ and the humans as ‘helping it learn’. This gets us closer to the idea of humans learning about the universe around them, rather than being raised as generic members of society.
Don’t worry about downvotes, they do not matter.
Well, the point of downvotes is to discourage certain behaviour, and I agree that you should use terminology that we have found less likely to cause confusion.
This is definitely an important problem, but we’re not really at the stage where it is necessary yet. I don’t see how we could make much progress on how to get an AI to learn without knowing the algorithms that it will use to learn.
When humans hear parenting, they think of the human parenting process.
Not all humans. Not me. Is that not a bias?
Well, the point of downvotes is to discourage certain behaviour
I don’t discourage without any argument being given, just on the basis of someone’s judgement without knowing the reason. I don’t think I should. I think that would be irrational. I’m surprised that this community wants to encourage people to conform to the collective opinion of others as expressed by votes.
I don’t see how we could make much progress on how to get an AI to learn without knowing the algorithms that it will use to learn.
OK, I think I see where you are coming from. However, there is only one known algorithm that learns (creates knowledge). It is, in short, evolution. We should expect an AI to use it, we shouldn’t expect a brand new solution to this hard problem (historically there have been very few candidate solutions proposed, most not at all promising).
The implementation details are not very important because the result will be universal, just like people are. This is similar to how the implementation details of universal computers are not important for many purposes.
Are you guys familiar with these concepts? There is important knowledge relevant to creating AIs which your statement seems to me to overlook.
I don’t discourage without any argument being given, just on the basis of someone’s judgement without knowing the reason. I don’t think I should. I think that would be irrational. I’m surprised that this community wants to encourage people to conform to the collective opinion of others as expressed by votes.
As a general rule, if I downvote, I either reply to the post, or it is something that should be obvious to someone who has read the main sequences.
OK, I think I see where you are coming from. However, there is only one known algorithm that learns (creates knowledge). It is, in short, evolution.
No, there is another: the brain. It is also much faster than evolution, an advantage I would want a FAI to have.
You’re conflating two things. Biological evolution is a very specific algorithm, with well-studied mathematical properties. ‘Evolution’ in general just means any change over time. You seem to be using it in an intermediate sense, as any change that proceeds through reproduction, variation, and selection, which is also a common meaning. This, however, is still very broad, so there’s very little that you can learn about an AI just from knowing “it will come up with many ideas, mostly based on previous ones, and reject most of them”. This seems less informative than “it will look at evidence and then rationally adjust its understanding”.
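The intermediate sense of “evolution” (copying with variation, followed by selection) can be sketched as a loop. This is the standard toy (1+λ) evolutionary algorithm on the OneMax problem, where fitness is simply the number of 1 bits; the problem, mutation rate, and population sizes are illustrative choices, not a model of how minds or AIs learn.

```python
import random


def evolve(length=20, offspring=10, generations=50, rate=0.05, seed=0):
    """Minimal (1+lambda) evolutionary algorithm on the toy OneMax problem.

    Fitness is the number of 1 bits. Returns the final parent and the
    parent's fitness after each generation.
    """
    rng = random.Random(seed)
    fitness = sum
    parent = [rng.randint(0, 1) for _ in range(length)]
    history = [fitness(parent)]
    for _ in range(generations):
        # Copying with variation: each child flips each bit with prob `rate`.
        children = [
            [1 - b if rng.random() < rate else b for b in parent]
            for _ in range(offspring)
        ]
        # Selection: keep the fittest; max() returns the first maximum,
        # so the parent (listed first) survives ties.
        parent = max([parent] + children, key=fitness)
        history.append(fitness(parent))
    return parent, history


best, history = evolve()
print(history[0], history[-1])
```

Because the parent survives unless a child beats it, the recorded fitness never decreases from one generation to the next.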
Why is it that you guys want to make AI but don’t study relevant topics like this?
Eliezer has studied cognitive science. Those of us not working directly with him have very little to do with AI design. Even Eliezer’s current work is slightly more background theory than AI itself.
I’m not conflating them. I did not mean “change over time”.
There are many things we can learn from evolutionary epistemology. It seeming broad to you does not prevent that. You would do better to ask what good it is instead of guessing it is no good.
For one thing it connects with meme theory.
A different example is that it explains misunderstandings when people communicate. Misunderstandings are extremely common because communication involves 1) guessing what the other person is trying to say 2) selecting between those guesses with criticism 3) making more guesses which are variants of previous guesses 4) more selection 5) etc
This explanation helps us see how easily communication can go wrong. It raises interesting issues like why so much communication doesn’t go wrong. It refutes various myths like that people absorb their teacher’s lectures a little like sponges.
It matters. And other explanations of miscommunication are worse.
Eliezer has studied cognitive science.
But that isn’t the topic I was speaking of. I meant evolutionary epistemology. Which btw I know that Eliezer has not studied much because he isn’t familiar with one of its major figures (Popper).
Evolution is a largely philosophical theory (distinct from the scientific theory about the history of life on earth). It is a theory of epistemology. Some parts of epistemology technically depend on the laws of physics, but it is generally researched separately from physics. There has not been any science experiment to test it which I consider important, but I could conceive of some, because if you specified different and perverse laws of physics you could break evolution. In a different sense, evolution is tested constantly, in that the laws of physics and the evidence we see around us every day are not the perverse but conceivable physics that would break evolution.
The reason I accept evolution (again I refer to the epistemological theory about how knowledge is created) is that it is a good explanation, and it solves an important philosophical problem, and I don’t know anything wrong with it, and I also don’t know any rivals which solve the problem.
The problem has a long history. Where does “apparent design” come from? Paley gave an example of finding a watch in nature, which he said you know can’t have gotten there by chance. That’s correct—the watch has knowledge (aka apparent design, or purposeful complexity, or many other terms). The watch is adapted “to a purpose” as some people put it (I’m not really a fan of the purpose terminology. But it’s adapted! And I think it gets the point across ok.)
Paley then guessed as follows: there is no possible solution to the origins of knowledge other than “A designer (God) created it”. This is a very bad solution even pre-Darwin because it does not actually solve the problem. The designer itself has knowledge, adaptation to a purpose, whatever. So where did it come from? The origin is not answered.
Since then, the problem has been solved by the theory of evolution and nothing else. And the solution applies to more than just watches found in nature, or plants and animals. It also applies to human knowledge. The answer “intelligence did it” is no better than “God did it”. How does intelligence do it? The only known answer is: by evolution.
The best thing to read on this topic is The Beginning of Infinity by David Deutsch which discusses Popperian epistemology, evolution, Paley’s problem and its solution, and also has two chapters about meme theory which give important applications.
Also here: http://fallibleideas.com/tradition (Deutsch discusses static and dynamic memes and societies. I discuss “traditions” rather than “memes”. It’s quite similar stuff.)
Evolution is a largely philosophical theory (distinct from the scientific theory about the history of life on earth). It is a theory of epistemology. Some parts of epistemology technically depend on the laws of physics, but it is generally researched separately from physics.
What? Epistemological evolution seems to be about how the mind works, independent of what philosophical status is accorded to the thoughts. Surely it could be tested just by checking if the mind actually develops ideas in accordance with the way it is predicted to.
If you want to check how minds work, you could do that. But that’s very hard. We’re not there yet. We don’t know how.
How minds work is a separate issue from evolutionary epistemology. Epistemology is about how knowledge is created (in abstract, not in human minds specifically). If it turns out there is another way, that wouldn’t upset the claim that evolution would create knowledge if done in minds.
There’s no reason to think there is another way. No argument that there is. No explanation of why to expect there to be. No promising research on the verge of working one out. Shrug.
Epistemology is about how knowledge is created (in abstract, not in human minds specifically).
I see. I thought that evolutionary epistemology was a theory of human minds, though I know that that technically isn’t epistemology. Does evolutionary epistemology describe knowledge about the world, mathematical knowledge, or both (I suspect you will say both)?
So, you’re saying that in order to create knowledge, there has to be copying, variation, and selection. I would agree with the first two, but not necessarily the third. Consider a formal axiomatic system. It produces an ever-growing list of theorems, but none are ever selected any more than others. Would you still consider this system to be learning?
With deduction, all the consequences are already contained in the premises and axioms. Abstractly, that’s not learning.
When human mathematicians do deduction, they do learn stuff, because they also think about stuff while doing it, they don’t just mechanically and thoughtlessly follow the rules of math.
So induction (or probabilistic updating, since you said that Popper proved it not to be the same as whatever philosophers call ‘induction’) isn’t learning either because the conclusions are contained in the priors and observations?
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
The idea of induction is that the conclusions are NOT logically contained in the observations (that’s why it is not deduction).
If you make up a prior from which everything deductively follows, and everything else is mere deduction from there, then all of your problems and mistakes are in the prior.
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
No. Learning is creating new knowledge. That would simply be human programmers putting their own knowledge into a prior, and then the machine not creating any new knowledge that wasn’t in the prior.
The correct method of updating one’s probability distributions is contained in the observations. P(H|E) = P(H)P(E|H)/P(E) .
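As a worked illustration of P(H|E) = P(H)P(E|H)/P(E), here is a minimal sketch over a finite set of hypotheses, with P(E) computed by the law of total probability. The hypothesis names and numbers are invented for illustration. When every hypothesis either predicts the evidence (likelihood 1) or rules it out (likelihood 0), the update reduces to eliminating the contradicted hypotheses and rescaling the rest so they sum to 1.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior P(H|E) for each hypothesis H.

    prior: dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(E|H)
    """
    # P(E) by the law of total probability: sum over H of P(H) * P(E|H).
    p_e = sum(prior[h] * likelihood[h] for h in prior)
    if p_e == 0:
        raise ValueError("evidence has zero probability under the prior")
    # Bayes' theorem: P(H|E) = P(H) * P(E|H) / P(E).
    return {h: prior[h] * likelihood[h] / p_e for h in prior}


# Hypothetical example: H1 is contradicted by E, H2 and H3 both predict E.
prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 4), "H3": Fraction(1, 4)}
likelihood = {"H1": Fraction(0), "H2": Fraction(1), "H3": Fraction(1)}
posterior = bayes_update(prior, likelihood)
print(posterior)  # H1 eliminated; H2 and H3 rescaled to 1/2 each
```

Exact fractions are used so the rescaling is visible without floating-point noise.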
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
No. Learning is creating new knowledge. That would simply be human programmers putting their own knowledge into a prior, and then the machine not creating any new knowledge that wasn’t in the prior.
So how could evolutionary epistemology be relevant to AI design?
AIs are programs that create knowledge. That means they need to do evolution. That means they need, roughly, a conjecture generator, a criticism generator, and a criticism evaluator. The conjecture generator might double as the criticism generator since a criticism is a type of conjecture, but it might not.
The conjectures need to be based on the previous conjectures (not necessarily all of them, but some). That makes it replication with variation. The criticism is the selection.
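A toy sketch of the loop described above, only to make the replicate/vary/select structure concrete. It assumes conjectures are integers, variation is a small random tweak of an earlier conjecture, and "criticism" is a hypothetical programmer-supplied function that refutes some conjectures. On the view argued in this thread, a fixed, hard-coded criticism like this one is exactly the kind of programmer-supplied knowledge that would not count as the program creating knowledge, so this is a sketch of the structure, not of an AGI.

```python
import random


def evolve(seed_conjectures, vary, criticize, generations, rng):
    """Replication with variation, plus selection by criticism.

    vary(conjecture, rng) -> a new conjecture based on an old one (hypothetical)
    criticize(conjecture) -> True if some criticism refutes it (hypothetical)
    """
    pool = list(seed_conjectures)
    for _ in range(generations):
        # Variation: each new conjecture is based on a previous one.
        parent = rng.choice(pool)
        child = vary(parent, rng)
        # Selection: keep the new conjecture only if no criticism refutes it.
        if not criticize(child):
            pool.append(child)
    return pool


rng = random.Random(0)
survivors = evolve(
    seed_conjectures=[0],
    vary=lambda c, r: c + r.choice([-1, 1]),
    criticize=lambda c: c < 0,  # hypothetical criticism: negative numbers are refuted
    generations=50,
    rng=rng,
)
print(f"{len(survivors)} conjectures survived; largest is {max(survivors)}")
```

Whether the conjecture generator doubles as the criticism generator is left open, as in the text; here they are simply two separate functions.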
Any AI design that completely ignores this is, imo, hopeless. I think that’s why the AI field hasn’t really gotten anywhere. They don’t understand what they are trying to make, because they have the wrong philosophy (in particular the wrong explanations; I don’t mean math or logic).
AIs are programs that create knowledge. That means they need to do evolution. That means they need, roughly, a conjecture generator, a criticism generator, and a criticism evaluator. The conjecture generator might double as the criticism generator since a criticism is a type of conjecture, but it might not.
Note that there are AI approaches which do do something close to what you think an AI “needs”. For example, some of Simon Colton’s work can be thought of in a way roughly like what you want. But it is a mistake to think that such an entity needs to do that. (Some of the hardcore Bayesians make the same mistake in assuming that an AI must use a Bayesian framework. That something works well as a philosophical approach is not the same claim as that it should work well in a specific setting where we want an artificial entity to produce certain classes of systematic reliable results.)
Those aren’t AIs. They do not create new knowledge. They do not “learn” in my sense—of doing more than they were programmed to. All the knowledge is provided by the human programmer—they are designed by an intelligent person and to the extent they “act intelligent” it’s all due to the person providing the thinking for it.
Those aren’t AIs. They do not create new knowledge. They do not “learn” in my sense—of doing more than they were programmed to.
I’m not sure this is at all well-defined. I’m curious, what would make you change your mind? If for example, Colton’s systems constructed new definitions, proofs, conjectures, and counter-examples in math would that be enough to decide they were learning?
Could you explain how this is connected to the issue of making new knowledge?
Or: show me the code, and explain to me how it works, and how the code doesn’t contain all the knowledge the AI creates.
This seems a bit like showing a negative. I will suggest you look for a start at Simon Colton’s paper in the Journal of Integer Sequences which uses a program that operates in a way very close to the way you think an AI would need to operate in terms of making conjectures and trying to refute them. I don’t know if the source code is easily available. It used to be on Colton’s website but I don’t see it there anymore; if his work seems at all interesting to you you can presumably email him requesting a copy. I don’t know how to show that the AI “doesn’t contain all the knowledge the AI creates” aside from the fact that the system constructed concepts and conjectures in number theory which had not previously been constructed. Moreover, Colton’s own background in number theory is not very heavy, so it is difficult to claim that he’s importing his own knowledge into the code. If you define more precisely what you mean by the code containing the knowledge I might be able to answer that further. Without a more precise notion it isn’t clear to me how to respond.
Holding a conversation requires creating knowledge of what the other guy is saying.
In deduction, you agree that the conclusions are logically contained in the premises and axioms, right? They aren’t something new.
In a spam filter, a programmer figures out how he wants spam filtered (he has the idea), then he tells the computer to do it. The computer doesn’t figure out the idea or any new idea.
With biological evolution, for example, we see something different. You get stuff out, like cats, which weren’t specified in advance. And they aren’t a trivial extension; they contain important knowledge such as the knowledge of optics that makes their eyes work. This is why “Where can cats come from?” has been considered an important question (people want an explanation of the knowledge, which is sometimes called “apparent design”), while “Where can rocks come from?” is not in the same category of question (it does have some interest for other reasons).
With people, people create ideas that aren’t in their genes, and weren’t told to them by their parents or anyone else. That includes abstract ideas that aren’t the summation of observation. They sometimes create ideas no one ever thought of before. They create new ideas.
An AI (AGI, you call it?) should be like a person: it should create new ideas which are not in its “genes” (programming). If someone actually writes an AI they will understand how it works and they can explain it, and we can use their explanation to judge whether they “cheated” or not (whether they, e.g., hard coded some ideas into the program and then said the AI invented them).
In deduction, you agree that the conclusions are logically contained in the premises and axioms, right? They aren’t something new.
Ok. So to make sure I understand this claim. You are asserting that mathematicians are not constructing anything “new” when they discover proofs or theorems in set axiomatic systems?
With biological evolution, for example, we see something different. You get stuff out, like cats, which weren’t specified in advance. And they aren’t a trivial extension;
Are genetic algorithm systems then creating something new by your definition?
In an AI (AGI you call it?)
Different concepts. An artificial intelligence is not (necessarily) a well-defined notion. An AGI is an artificial general intelligence, essentially something that passes the Turing test. Not the same concept.
If someone actually writes an AI they will understand how it works and they can explain it, and we can use their explanation to judge whether they “cheated” or not (whether they, e.g., hard coded some ideas into the program and then said the AI invented them).
I see no reason to assume that a person will necessarily understand how an AGI they constructed works. To use the most obvious hypothetical, someone might make a neural net modeled very closely after the human brain that functions as an AGI without any understanding of how it works.
Ok. So to make sure I understand this claim. You are asserting that mathematicians are not constructing anything “new” when they discover proofs or theorems in set axiomatic systems?
When you “discover” that 2+1 = 3, given premises and axioms, you aren’t discovering something new.
But working mathematicians do more than that. They create new knowledge. It includes:
1) They learn new ways to think about the premises and axioms.
2) They do not publish deductively implied facts unselectively or randomly. They choose the ones that they consider important. By making these choices they are adding content not found in the premises and axioms.
3) They make choices between different possible proofs of the same thing. Again, where they make choices they are adding stuff, based on their own non-deductive understanding.
4) When mathematicians work on proofs, they also think about stuff as they go, just as experimental scientists doing fairly mundane tasks in a lab will at the same time think and make the work interesting with their thoughts.
Are genetic algorithm systems then creating something new by your definition?
They could be. I don’t think any exist yet that do. For example I read a Dawkins paper about one. In the paper he basically explained how he tweaked the code in order to get the results he wanted. He didn’t, apparently, realize that it was him, not the program, creating the output.
By “AI” I mean AGI. An intelligence (like a person) which is artificial. Please read all my prior statements in light of that.
I see no reason to assume that a person will necessarily understand how an AGI they constructed works. To use the most obvious hypothetical, someone might make a neural net modeled very closely after the human brain that functions as an AGI without any understanding of how it works.
Well, OK, but they’d understand how it was created, and could explain that. They could explain what they know about why it works (it copies what humans do). And they could also make the code public and discuss what it doesn’t include (e.g. hard-coded special cases, except for the 3 he included on purpose, whose presence he explains). That’d be pretty convincing!
I don’t think this is true. While he probably wouldn’t announce it if he was working on AI, he has indicated that he’s working on two books (HPMoR and a rationality book), and has another book queued. He’s also indicated that he doesn’t think anyone should work on AI until the goal system stability problem is solved, which he’s talked about thinking about but hasn’t published anything on, which probably means he’s stuck.
I more meant “he’s probably thinking about this in the back of his mind fairly often”, as well as trying to be humorous.
He’s also indicated that he doesn’t think anyone should work on AI until the goal system stability problem is solved, which he’s talked about thinking about but hasn’t published anything on, which probably means he’s stuck.
Do you know what he would think of work that has a small chance of solving goal stability and a slightly larger chance of helping with AI in general? This seems like a net plus to me, but you seem to have heard what he thinks should be studied from a slightly clearer source than I did.
I meant prediction in a very broad sense, including, eg., predicting which experiments will be best at refining our knowledge and predicting which technologies will best improve the world. Was it clear that I meant that? If you seek understanding beyond this, you are allowed but, at least for the present era, I only care about an epistemology if it can help me make the world a better place.
I do not consider it possible to predict the growth of knowledge. That means you cannot predict, for example, the consequences of a scientific discovery that you have not yet discovered.
The reason is that if you could predict this you would in effect already have made the discovery.
Understanding is not primarily predictive and it is useful in a practical way. For example, you have to understand issues to address critical arguments offered by your peers. Merely predicting that they are wrong isn’t a good approach. It’s crucial to understand what their point is and to reason with them.
Understanding ideas helps us improve on them. It’s crucial to making judgments about what would be an improvement or not. It lets us judge changes better b/c e.g. we have some conception of why it works, which means we can evaluate what would break it or not.
I meant prediction in a very broad sense, including, eg., predicting which experiments will be best at refining our knowledge.
I do not consider it possible to predict the growth of knowledge. That means you cannot predict, for example, the consequences of a scientific discovery that you have not yet discovered.
That is not what I meant. If we could predict that the LHC will discover superparticles then yes, we would already know that. However, since we don’t know whether it will produce superparticles, we can predict that it will give us a lot of knowledge, since we will either learn that superparticles in the mass range detectable by the LHC exist or that they do not exist, so we can predict that we will learn a lot more about the universe by continuing to run the LHC than by filling in the tunnel where it is housed.
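The LHC argument can be made quantitative with a standard measure: if an experiment settles a yes/no question whose answer we are uncertain about, the expected information gained is the entropy of that outcome, while an experiment we never run (filling in the tunnel) yields nothing. A minimal sketch, assuming the run settles the question outright; the probabilities are illustrative, not real estimates.

```python
import math


def binary_entropy(p):
    """Expected information (in bits) from learning a yes/no outcome with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0  # no uncertainty, so nothing to learn
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Illustrative prior probabilities that the particles are in the detectable range.
for p in (0.5, 0.1, 0.01):
    print(p, round(binary_entropy(p), 3))
```

The expected gain is positive whenever we are genuinely unsure, which is the sense in which one can predict that knowledge will be gained without predicting its content.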
So if new knowledge doesn’t come from prediction, what creates it? Answering this is one of epistemology’s main tasks. If you are focussing on prediction then you aren’t addressing it. Am I missing something?
New knowledge comes from observation. If you are referring to knowledge of what a theory says rather than of which theory is true, then this is assumed to be known. The math of how to deal with a situation where a theory is known but its consequences cannot be fully understood due to mathematical limitations is still in its infancy, but this has never posed a problem in practice.
That is a substantive and strong empiricist claim which I think is false.
For example, we have knowledge of things we never observed. Like stars. Observation is always indirect and its correctness always depends on theories such as our theories about whether the chain of proxies we are observing with will in fact observe what we want to observe.
Do you understand what I’m talking about and have a reply, or do you need me to explain further?
then this is assumed to be known
Could you understand why I might object to making a bunch of assumptions in one’s epistemology?
Could you understand why I might object to making a bunch of assumptions in one’s epistemology?
It is assumed in practice, applied epistemology being a rather important thing to have. In ‘pure’ epistemology, it is just labelled incomplete; we definitely don’t have all the answers yet.
it is just labelled incomplete; we definitely don’t have all the answers yet.
It seems to me that you’re pretty much conceding that your epistemology doesn’t work. (All flaws can be taken as “incomplete” parts where, in the future, maybe a solution will be found.)
That would leave the following important disagreement: Popper’s epistemology is not incomplete in any significant way. There is room for improvement, sure, but not really any flaws worth complaining about. No big unsolved problems marring it. So, why not drop this epistemology that doesn’t have the answers yet for one that does?
It seems to me that you’re pretty much conceding that your epistemology doesn’t work.
Would you describe quantum mechanics’ incompatibility with general relativity as “the theory doesn’t work”? For a being with unlimited computing power in a universe that is known to be computable (except for the being itself obviously), we are almost entirely done. Furthermore, many of the missing pieces to get from that to something much more complete seem related.
Popper’s epistemology is not incomplete in any significant way.
No, it is just wrong. Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories. Any consistent consequentialist decision rule must be basically equivalent to that. The statement that there is no way to assign probabilities to theories therefore implies that there is no algorithm that a consequentialist can follow to reliably achieve their goals. Note that even if Popper’s values are not consequentialist, a consequentialist should still be able to act based on the knowledge obtained by a valid epistemology.
I suspect you are judging Popperian epistemology by standards it states are mistaken. Would you agree that doing that would be a mistake?
Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories.
Note the givens. There’s more givens which you didn’t mention too, e.g. some assumptions about people’s utilities having certain mathematical properties (you need this for, e.g., comparing them).
I don’t believe these givens are all true. If you think otherwise could we start with you giving the details more? I don’t want to argue with parts you simply omitted b/c I’ll have to guess what you think too much.
As a separate issue, “given my preferences” is such a huge given. It means that your epistemology does not deal in moral knowledge. At all. It simply takes preferences as givens and doesn’t tell you which to have. So in practice in real life it cannot be used for a lot of important issues. That’s a big flaw. And it means a whole entire second epistemology is needed to deal in moral knowledge. And if we have one of those, and it works, why not use it for all knowledge?
The rest of the paragraph was what I meant by this. You agree that Popperian epistemology states that theories should not be assigned probabilities.
I suspect you are judging Popperian epistemology by standards it states are mistaken. Would you agree that doing that would be a mistake?
Depends. If its standards make it useless, then, while internally consistent, I can judge it to be pointless. I just want an epistemology that can help me actually make decisions based on what I learn about reality.
Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories.
Note the givens. There’s more givens which you didn’t mention too, e.g. some assumptions about people’s utilities having certain mathematical properties (you need this for, e.g., comparing them).
I don’t think I was clear. A utility here just means a number I use to say how good a possible future is, so I can decide whether I want to work toward that future. In this context, it is far more general than anything composed of a bunch of terms, each of which describes some properties of a person.
It simply takes preferences as givens and doesn’t tell you which to have.
I can learn more about my preferences from observation of my own brain using standard Bayesian epistemology.
I just want an epistemology that can help me actually make decisions based on what I learn about reality.
Popperian epistemology does this. What’s the problem? Do you think that assigning probabilities to theories is the only possible way to do this?
Overall you’ve said almost nothing that’s actually about Popperian epistemology. You just took one claim (which has nothing to do with what it’s about, it’s just a minor point about what it isn’t) and said it’s wrong (without detailed elaboration).
I don’t think I was clear. A utility here just means a number I use to say how good a possible future is, so I can decide whether I want to work toward that future.
I understood that. I think you are conflating “utility” the mathematical concept with “utility” the thing people in real life have. The second may not have the convenient properties the first has. You have not provided an argument that it does.
I can learn more about my preferences from observation of my own brain using standard Bayesian epistemology.
How do you learn what preferences are good to have, in that way?
Popperian epistemology does this. What’s the problem? Do you think that assigning probabilities to theories is the only possible way to do this?
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules. Even if the probabilities are not mentioned when constructing the rule, they can be inferred from its final form.
I understood that. I think you are conflating “utility” the mathematical concept with “utility” the thing people in real life have. The second may not have the convenient properties the first has. You have not provided an argument that it does.
I don’t know what you mean by ‘“utility” the thing people in real life have’.
How do you learn what preferences are good to have, in that way?
Can we please not get into this. If it helps, assume I am an expected paperclip maximizer. How would I decide then?
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules.
What was the argument for that?
And what is the argument that actions should be judged ONLY by consequences? What is the argument for excluding all other considerations?
I don’t know what you mean by ‘“utility” the thing people in real life have’.
People have preferences and values. e.g. they might want a cat or an iPhone and be glad to get it. The mathematical properties of these real life things are not trivial or obvious. For example, suppose getting the cat would add 2 happiness and the iPhone would add 20. Would getting both add 22 happiness? Answer: we cannot tell from the information available.
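A minimal sketch of why the 22 cannot be inferred: if utility is assigned to whole outcomes (bundles of things), nothing forces the utility of getting both to equal the sum of the utilities of getting each alone. The bundle utilities below, including the value for getting both, are invented for illustration.

```python
# Hypothetical utilities assigned to whole outcomes (bundles), not to individual items.
utility = {
    frozenset(): 0,
    frozenset({"cat"}): 2,
    frozenset({"iPhone"}): 20,
    # Invented value: e.g. time spent on the iPhone might reduce enjoyment of the cat.
    frozenset({"cat", "iPhone"}): 21,
}

additive_guess = utility[frozenset({"cat"})] + utility[frozenset({"iPhone"})]
actual = utility[frozenset({"cat", "iPhone"})]
print(additive_guess, actual, additive_guess == actual)
```

Any value for the combined bundle is consistent with the 2 and the 20; additivity is an extra assumption, not something that follows from the single-item numbers.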
Can we please not get into this.
But the complete amorality of your epistemology, its total inability to create entire categories of knowledge, is a severe flaw in it. There’s plenty of other examples I could use to make the same point, however in general they are a bit less clear. One example is epistemology: epistemology is also not an empirical field. But I imagine you may argue about that a bunch, while with morality I think it’s clearer.
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules.
What was the argument for that?
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
And what is the argument that actions should be judged ONLY by consequences? What is the argument for excluding all other considerations?
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
People have preferences and values. e.g. they might want a cat or an iPhone and be glad to get it. The mathematical properties of these real life things are not trivial or obvious. For example, suppose getting the cat would add 2 happiness and the iPhone would add 20. Would getting both add 22 happiness? Answer: we cannot tell from the information available.
Agreed, and agreed that this is a common mistake. If you thought I was making this error, I was being far less clear than I thought.
But the complete amorality of your epistemology, its total inability to create entire categories of knowledge, is a severe flaw in it. There’s plenty of other examples I could use to make the same point, however in general they are a bit less clear. One example is epistemology: epistemology is also not an empirical field. But I imagine you may argue about that a bunch, while with morality I think it’s clearer.
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
Agreed, and agreed that this is a common mistake. If you thought I was making this error, I was being far less clear than I thought.
I thought you didn’t address the issue (and need to): you did not say what mathematical properties you think that real utilities have and how you deal with them.
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
Using what premises?
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
What about explanations about whether it was a reasonable decision for the person to make that action, given the knowledge he had before making it?
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
Ordered. But I think you should be more cautious asserting things that other people told you were true, which you have not checked up on.
Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
Every possible universe is associated with a utility.
Any two utilities can be compared.
These comparisons are transitive.
Weighted averages of utilities can be taken.
For any three possible universes, L, M, and N, with L < M, a weighted average of L and N is less than a weighted average of M and N, if N is accorded the same weight in both cases.
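The five properties listed above resemble the completeness, transitivity, and independence axioms of von Neumann-Morgenstern utility theory. A minimal sketch, assuming utilities combine linearly under weighted averaging (as in expected-utility theory), that spot-checks the last property on random samples. It only shows that a linear rule satisfies property 5; it says nothing about whether real people's preferences do, which is the point in dispute.

```python
import random


def mix(u_a, u_b, w):
    """Utility of a weighted average giving weight w to A and 1 - w to B,
    assuming utilities combine linearly (the expected-utility assumption)."""
    return w * u_a + (1 - w) * u_b


rng = random.Random(1)
for _ in range(10_000):
    L, M, N = (rng.uniform(-10, 10) for _ in range(3))
    if M - L < 1e-6:  # need L < M; skip near-ties to avoid float rounding artifacts
        continue
    w = rng.uniform(0.01, 1.0)  # weight on L or M; N gets 1 - w in both mixtures
    # Property 5: L < M implies mix(L, N, w) < mix(M, N, w).
    assert mix(L, N, w) < mix(M, N, w)
print("property 5 held in all sampled cases")
```

The weight on L or M must be strictly positive; with zero weight both averages reduce to N and the strict inequality fails.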
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
Using what premises?
Basically just definitions. I’m currently trying to enumerate them, which is why I wanted to find the proof of the theorem we were discussing.
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
What about explanations about whether it was a reasonable decision for the person to make that action, given the knowledge he had before making it?
Care about in the sense of when I’m deciding whether to make it. I don’t really care about how reasonable other people’s decisions are unless it’s relevant to my interactions with them, where I will need that knowledge to make my own decisions.
Ordered. But I think you should be more cautious asserting things that other people told you were true, which you have not checked up on.
Wait, you bought the book just for that proof? I don’t even know if it’s the best proof of it (in terms of making assumptions that aren’t necessary to get the result). I’m confident in the proof because of all the other similar proofs I’ve read, though none seem as widely applicable as that one. I can almost sketch a proof in my mind. Some simple ones are explained well at http://en.wikipedia.org/wiki/Coherence_%28philosophical_gambling_strategy%29 .
For your first 5 points, how is that a reply about Popper? Maybe you meant to quote something else.
I don’t think that real people’s way of considering utility is based on entire universes at a time. So I don’t think your math here corresponds to how people think about it.
Wait, you bought the book just for that proof?
No, I used inter library loan.
I don’t really care about how reasonable other people’s decisions are
Then put yourself in as the person under consideration. Do you think it matters whether you make decisions using rational thought processes, or do only the (likely?) consequences matter?
Basically just definitions.
How do you judge whether you have the right ones? You said “entirely deductive” above, so are you saying you have a deductive way to judge this?
For your first 5 points, how is that a reply about Popper? Maybe you meant to quote something else.
Yes, I did. Oops.
I don’t think that real people’s way of considering utility is based on entire universes at a time. So I don’t think your math here corresponds to how people think about it.
But that is what a choice is between: the universe where you choose one way and the universe where you choose another. Often large parts of the universe are ignored, but only because the action’s consequences for those parts are not distinguishable from how those parts would be if a different action was taken. A utility function may be a sum (or more complicated combination) of parts referring to individual aspects of the universe, but, in this context, let’s not call those ‘utilities’; we’ll reserve that word for the final thing used to make decisions. Most of this is not consciously invoked when people make decisions, but a choice that does not stand when you consider its expected effects on the whole universe is a wrong choice.
I don’t really care about how reasonable other people’s decisions are
Then put yourself in as the person under consideration. Do you think it matters whether you make decisions using rational thought processes, or do only the (likely?) consequences matter?
If I could achieve better consequences using an ‘irrational’ process, I would, but this sounds nonsensical because I am used to defining ‘rational’ as that which reliably gets the best consequences.
How do you judge whether you have the right ones? You said “entirely deductive” above, so are you saying you have a deductive way to judge this?
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
I don’t think I understand. This would rely on your conception of the real life situation (if you want it to apply to real life), of what makes sense, being correct. That goes way beyond deductive or definitions into substantive claims.
About decisions, if a method like “choose by whim” gets you a good result in a particular case, you’re happy with it? You don’t care that it doesn’t make any sense if it works out this time?
But that is what a choice is between—the universe where you choose one way and the universe where you choose another.
So what? I think you’re basically saying that your formulation is equivalent to what people (should) do. But that doesn’t address the issue of what people actually do—it doesn’t demonstrate the equivalence. As you guys like to point out, people often think in ways that don’t make sense, including violating basic logic.
But also, for example, I think a person might evaluate getting a cat, and getting an iphone, and then they might (incorrectly) evaluate both by adding the benefits instead of by considering the universe with both based on its own properties.
Another issue is that I don’t think any two utilities people have can be compared. They are sometimes judged with different, contradictory standards. This leads to two major issues when trying to compare them 1) the person doesn’t know how 2) it might not be possible to compare even in theory because one or both contain some mistakes. the mistakes might need to be fixed before comparing, but that would change it.
a choice that does not stand when you consider its expected effects on the whole universe is a wrong choice
I’m not saying people are doing it correctly. Whether they are right or wrong has no bearing on whether “utility” the mathematical object with the 5 properties you listed corresponds to “utility” the thing people do.
If you want to discuss what people should do, rather than what they do do, that is a moral issue. So it leads to questions like: how does Bayesian epistemology create moral knowledge, and how does it evaluate moral statements?
If you want to discuss what kind of advice is helpful to people (which people?), then I’m sure you can see how talking about entire universes could easily confuse people, and how some other procedure being a special case of it may not be very good advice, since it does not address the practical problems they are having.
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
I don’t think I understand. This would rely on your conception of the real life situation (if you want it to apply to real life), of what makes sense, being correct. That goes way beyond deductive or definitions into substantive claims.
Do you think that the Dutch book arguments go “way beyond deductive or definitions”? Well, I guess that would depend on what you conclude from them. For now, let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1 and probabilities of mutually exclusive events should add”.
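To make the additivity condition concrete, here is a minimal sketch (not from the thread) of how a bookie exploits betting prices that violate it. It assumes the agent posts a price for a ticket paying 1 unit if each of several mutually exclusive, exhaustive outcomes occurs, and will buy or sell any ticket at that price; the function name `dutch_book` and the outcome labels are hypothetical.

```python
from math import isclose


def dutch_book(prices):
    """Find a sure-profit strategy against posted betting prices.

    prices maps each outcome to the price at which the agent will buy or sell
    a ticket paying 1 unit if that outcome occurs. The outcomes are assumed
    mutually exclusive and exhaustive, so exactly one ticket pays out.
    Returns (plan, guaranteed_profit), or None if the prices are coherent
    (non-negative and summing to 1)."""
    for outcome, price in prices.items():
        if price < 0:
            # Buying this ticket from the agent pays us -price up front,
            # and the ticket can only add a payout on top of that.
            return (f"buy the {outcome} ticket", -price)
    total = sum(prices.values())
    if isclose(total, 1.0):
        return None
    if total > 1:
        # Sell every ticket: collect `total`, pay out exactly 1 whatever happens.
        return ("sell every ticket", total - 1)
    # Buy every ticket: pay `total`, collect exactly 1 whatever happens.
    return ("buy every ticket", 1 - total)


# An agent whose prices for three exclusive outcomes sum to 1.1.
print(dutch_book({"rain": 0.5, "snow": 0.3, "dry": 0.3}))
```

This sketch only checks the simple strategies of buying or selling a whole set of tickets; with non-negative prices summing to 1 it returns `None`.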
About decisions, if a method like “choose by whim” gets you a good result in a particular case, you’re happy with it? You don’t care that it doesn’t make any sense if it works out this time?
The confusion here is that we’re not judging an action. If I make a mistake and happen to benefit from it, there were good consequences, but there was no choice involved. I don’t care about this; it already happened. What I do care about, and what I can accomplish, is avoiding similar mistakes in the future.
If you want to discuss what people should do, rather than what they do do, that is a moral issue.
Yes, that is what I was discussing. I probably don’t want to actually get into my arguments here. Can you give an example of what you mean by “moral knowledge”?
Applying Dutch book arguments to real life situations always goes way beyond deduction and definitions, yes.
let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1 and probabilities of mutually exclusive events should add”.
A need? Are you talking about morality now?
Why are we saying this? You now speak of probabilities of events. Previously we were discussing epistemology, which is about ideas. I object to assigning probabilities to the truth of ideas. Assigning them to events is OK when:
1) the laws of physics are indeterministic (never, as far as we know)
2) we have incomplete information and want to make a prediction that would be deterministic except that we have to put several possibilities in some places, which leads to several possible answers, and probability is a reasonable way to organize thoughts about that.
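Case (2) can be sketched as running a deterministic model once per possibility that cannot be ruled out and tallying the results. A minimal sketch, assuming each possibility gets equal weight unless told otherwise; the function `predict` and the card example are hypothetical illustrations, not anything proposed in the thread.

```python
from collections import Counter
from fractions import Fraction


def predict(outcome_of, possibilities, weights=None):
    """Turn a deterministic model with unknown inputs into a distribution.

    outcome_of: deterministic function from one possible input to a result.
    possibilities: every input we cannot rule out.
    weights: relative weight of each possibility; equal weights are assumed
    when none are given."""
    if weights is None:
        weights = [1] * len(possibilities)
    total = Fraction(sum(weights))
    dist = Counter()
    for x, w in zip(possibilities, weights):
        dist[outcome_of(x)] += Fraction(w) / total
    return dict(dist)


# Hypothetical example: the rank (1-13) of the next card from a full deck,
# four cards per rank. Is it a face card (rank 11-13)?
deck = [rank for rank in range(1, 14) for _ in range(4)]
print(predict(lambda r: "face" if r > 10 else "other", deck))
```

Exact fractions are used so the resulting probabilities sum to exactly 1.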
So what?
Can you give an example of what you mean by “moral knowledge”?
Murder is immoral.
Being closed minded makes one’s life worse because it sabotages improvement.
Can you give an example of what you mean by “moral knowledge”?
Murder is immoral.
Are you saying Popper would evaluate “Murder is immoral.” in the same way as “Atoms are made up of electrons and a nucleus.”? How would you test this? What would you consider a proof of it?
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means. I am a moral realist to some, a relativist to others, and an error theorist to other others. I could prove the statement for many common non-confused definitions, though not for, for example, people who say ‘morality’ is synonymous with ‘that which is commanded by God’ (a definition based on confusion, though at least everyone can agree on when the statement is or isn’t true under it), and not for error theorists, as both groups’ definitions make the sentence false.
Being closed minded makes one’s life worse because it sabotages improvement.
In theory I could prove this sentence, but in practice I could not do this clearly, especially over the internet. It would probably be much easier for you to read the sequences, which get to this toward the end, but, depending on your answers to some of my questions, there may be an easier way to explain this.
Are you saying Popper would evaluate “Murder is immoral.” in the same way as “Atoms are made up of electrons and a nucleus.”?
Yes. One epistemology. All types of knowledge. Unified!
How would you test this?
You would not.
What would you consider a proof of it?
We don’t accept proofs of anything; we are fallibilists. We consider mathematical proofs to be good arguments, though. I don’t really want to argue about those (unless you’re terribly interested; BTW, this is covered in the math chapter of The Fabric of Reality by David Deutsch). But the point is we don’t accept anything as providing certainty or even probableness. In our terminology, nothing provides justification.
What we do instead is explain our ideas, and criticize mistakes, and in this way improve our ideas. This, BTW, creates knowledge in the same way as evolution (replication of ideas, with variation, and selection by criticism). That’s not a metaphor or analogy but literally true.
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means.
Wouldn’t it be nice if you had an epistemology that helped you deal with all kinds of knowledge, so you didn’t have to simply give up on applying reason to important issues like what is a good life, and what are good values?
This, BTW, creates knowledge in the same way as evolution (replication of ideas, with variation, and selection by criticism). That’s not a metaphor or analogy but literally true.
Well, biological evolution is a much smaller part of conceptspace than “replication, variation, selection”. Now I’m realizing that you probably haven’t read A Human’s Guide to Words, which is extremely important and interesting. While you’ll know much of it, it has things that are unique and original and that you’ll learn a lot from. Please read it.
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means.
Wouldn’t it be nice if you had an epistemology that helped you deal with all kinds of knowledge, so you didn’t have to simply give up on applying reason to important issues like what is a good life, and what are good values?
I do apply reason to those things; I just don’t use the word ‘morality’ in my reasoning process because too many people get confused. It is only a word, after all.
On a side note, I am starting to like what I hear of Popper. It seems to embody an understanding of the brain and a bunch of useful advice for it. I think I disagree with some things, but on grounds that seem like the sort of thing that is accepted as motivation for the theory to self-modify. Does that make sense? Anyway, it’s not Popper’s fault that there is a set of theorems that in principle remove the need for other types of thought and in practice cause big changes in the way we understand and evaluate the heuristics that are necessary because the brain is fallible and computationally limited.
Wei Dai likes thinking about how to deal with questions outside of Bayesianism’s current domain of applicability, so he might be interested in this.
let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1 and probabilities of mutually exclusive events should add”.
A need? Are you talking about morality now?
Interpret this as a need in order to achieve some specified goal, to keep this part of the debate out of morality. A paperclip maximizer, for example, would obviously need to not pay 200 paperclips for a lottery with a maximum payout of 100 paperclips in order to achieve its goals. Furthermore, this applies to any consequentialist set of preferences.
Why are we saying this? You now speak of probabilities of events.
So you assume morality (the “specified goal”). That makes your theory amoral.
Well there’s a bit more than this, but it’s not important right now. One can work toward any goal just by assuming it as a goal.
Why is there a need to assign probabilities to theories? Popperian epistemology functions without doing that.
Because of the Dutch book arguments. The probabilities can be inferred from the choices. I’m not sure if the agent’s probability distribution can be fully determined from a finite set of wagers, but it can definitely be inferred to an arbitrary degree of precision by adding enough wagers.
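The “arbitrary degree of precision” claim can be sketched as a bisection over ticket prices. A minimal sketch, assuming the agent, offered a ticket paying 1 unit if the event happens at price q, buys exactly when q is below its (hidden) probability for the event; `accepts` stands in for asking the agent and, like `infer_probability`, is hypothetical.

```python
def infer_probability(accepts, wagers=30):
    """Bracket an agent's probability for an event using yes/no wagers.

    accepts(q) is assumed to return True exactly when the agent would pay q
    for a ticket worth 1 if the event happens (i.e. q is below its probability).
    Each wager halves the interval known to contain that probability, so
    n wagers narrow it to width 2**-n."""
    lo, hi = 0.0, 1.0
    for _ in range(wagers):
        q = (lo + hi) / 2
        if accepts(q):
            lo = q  # the agent pays q, so its probability is above q
        else:
            hi = q  # the agent refuses q, so its probability is at most q
    return lo, hi


# Hypothetical agent whose (hidden) probability for the event is 0.3172.
print(infer_probability(lambda q: q < 0.3172, wagers=20))
```

Any finite number of wagers leaves an interval rather than a single number, which matches the uncertainty expressed above about full determination from finitely many wagers.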
Can you give an example of how you use a Dutch book argument on a non-gambling topic? For example, if I’m considering issues like whether to go swimming today, and what nickname to call my friend, and I don’t assign probabilities like “80% sure that calling her Kate is the best option”, how do I get Dutch Booked?
First you hypothetically ask what would happen if you were asked to make bets on whether calling her Kate would result in world X (with utility U(X)). Do this for all choices and all possible worlds. This gives you probabilities and utilities. You then take a weighted average, as per the VNM theorem.
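The weighted-average step can be sketched in a few lines: score each choice by the probability-weighted average of the utilities of the worlds it might lead to, and pick the highest. The nickname options, world labels, probabilities and utilities below are invented for illustration; the sketch says nothing about how those numbers would be elicited.

```python
def expected_utility(world_probs, utility):
    """Probability-weighted average of the utility of each possible world."""
    return sum(p * utility[world] for world, p in world_probs.items())


def best_choice(choices, utility):
    """choices maps each option to {world: P(world | option)}.
    Returns the option with the highest expected utility and all the scores."""
    scores = {c: expected_utility(dist, utility) for c, dist in choices.items()}
    return max(scores, key=scores.get), scores


# Illustrative numbers only.
choices = {
    "Kate": {"pleased": 0.8, "annoyed": 0.2},
    "Katherine": {"pleased": 0.5, "annoyed": 0.5},
}
utility = {"pleased": 10, "annoyed": -5}
best, scores = best_choice(choices, utility)
print(best, scores)
```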
You don’t get to decide utilities so much as you have to figure out what they are. You already have a utility function, and you do your best to describe it. How do you weight the things you value relative to each other?
This takes observation, because what we think we value often turns out not to be a good description of our feelings and behavior.
By criticizing them. And conjecturing improvements which meet the challenges of the criticism. It is the same method as for improving all other knowledge.
In outline it is pretty simple. You may wonder things like what would be a good moral criticism. To that I would say: there’s many books full of examples, why dismiss all that? There is no one true way of arguing. Normal arguments are ok, I do not reject them all out of hand but try to meet their challenges. Even the ones with some kind of mistake (most of them), you can often find some substantive point which can be rescued. It’s important to engage with the best versions of theories you can think of.
BTW, once upon a time I was vaguely socialist. Now I’m a (classical) liberal. People do change their fundamental moral values for the better in real life. I attended a speech by a former Muslim terrorist who is now a pro-Western Christian (Walid Shoebat).
I’ve changed my social values plenty of times, because I decided different policies better served my terminal values. If you wanted to convince me to support looser gun control, for instance, I would be amenable to that because my position on gun control is simply an avenue for satisfying my core values, which might better be satisfied in a different way.
If you tried to convince me to support increased human suffering as an end goal, I would not be amenable to that, unless it turns out I have some value I regard as even more important that would be served by it.
This is what Popper called the Myth of the Framework and refuted in his essay by that name. It’s just not true that everyone is totally set in their ways and extremely closed minded, as you suggest. People with different frameworks learn from each other.
One example is that children learn. They are not born sharing their parents’ framework.
You probably think that frameworks are genetic, so they are. Dealing with that would take a lengthy discussion. Are you interested in this stuff? Would you read a book about it? Do you want to take it seriously?
I’m somewhat skeptical b/c e.g. you gave no reply to some of what I said.
I think a lot of the reason people don’t learn other frameworks, in practice, is merely that they choose not to. They think it sounds stupid (before they understand what it’s actually saying) and decide not to try.
When did I suggest that everyone is set in their ways and extremely closed minded? As I already pointed out, I’ve changed my own social values plenty of times. Our social frameworks are extremely plastic, because there are many possible ways to serve our terminal values.
I have responded to moral arguments with regards to more things than I could reasonably list here (economics, legal codes, etc.) I have done so because I was convinced that alternatives to my preexisting social framework better served my values.
Valuing strict gun control, to pick an example, is not genetically coded for. A person might have various inborn tendencies which will affect how they’re likely to feel about gun control; they might have innate predispositions towards authoritarianism or libertarianism, for instance, that will affect how they form their opinion. A person who valued freedom highly enough might support little or no gun control even if they were convinced that it would result in a greater loss of life. You would have a hard time finding anyone who valued freedom so much that they would support looser gun control if they were convinced it would destroy 90% of the world population, which gives you a bit of information about how they weight their preferences.
If you wanted to convince me to support more human suffering instead of more human happiness, you would have to appeal to something else I value even more that would be served by this. If you could argue that my preference for happiness is arbitrary, that preference for suffering is more natural, even if you could demonstrate that the moral goodness of human suffering is intrinsically inscribed on the fabric of the universe, why should I care? To make me want to make humans unhappy, you’d have to convince me there’s something else I want enough to make humans unhappy for its sake.
I also don’t feel I’m being properly understood here; I’m sorry if I’m not following up on everything, but I’m trying to focus on the things that I think meaningfully further the conversation, and I think some of your arguments are based on misapprehensions about where I’m coming from. You’ve already made it clear that you feel the same, but you can take it as assured that I’m both trying to understand you and make myself understood.
When did I suggest that everyone is set in their ways and extremely closed minded?
You suggested it about a category of ideas which you called “core values”.
If you wanted to convince me to support more human suffering instead of more human happiness, you would have to appeal to something else I value even more
You are saying that you are not open to new values which contradict your core values. Ultimately you might replace all but the one that is the most core, but never that one.
That’s more or less correct. To quote one of Eliezer’s works of ridiculous fanfiction, “A moral system has room for only one absolute commandment; if two unbreakable rules collide, one has to give way.”
If circumstances force my various priorities into conflict, some must give way to others, and if I value one thing more than anything else, I must be willing to sacrifice anything else for it. That doesn’t necessarily make it my only terminal value; I might have major parts of my social framework which ultimately reduce to service to another value, and they’d have to bend if they ever came into conflict with a more heavily weighted value.
Well in the first half, you get Dutch booked in the usual way. It’s not necessarily actually happening, but there still must be probabilities that you would use if it were. In the second half, if you don’t follow the procedure (or an equivalent one) you violate at least one VNM axiom.
If you violate axiom 1, there are situations in which you don’t have a preferred choice: not as in “both are equally good/bad” but as in your decision process does not give an answer or gives more than one answer. I don’t think I’d call this a decision process.
If you violate axiom 2, there are outcomes L, M and N such that you’d want to switch from L to M and then from M to N, but you would not want to switch from L to N.
Axiom 3 is unimportant and is just there to simplify the math.
For axiom 4, imagine a situation where a statement with unknown truth-value, X, determines whether you get to choose between two outcomes, L and M, with L < M, or have no choice in accepting a third outcome, N. If you violate the axiom, there is a situation like this where, if you were asked for your choice before you know X (it will be ignored if X is false), you would pick L, even though L < M.
Do any of these situations describe your preferences?
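Axiom 4 (independence) can be checked numerically for an expected-utility chooser. A minimal sketch, assuming lotteries are dicts from outcome to probability and using invented ice-cream utilities: mixing both options with the same third outcome N, with the same probability p > 0, scales the utility gap by p and so never reverses it. The function names `mix` and `eu` are hypothetical.

```python
def mix(p, lottery_a, lottery_b):
    """Compound lottery: lottery_a with probability p, lottery_b otherwise."""
    out = {}
    for lottery, weight in ((lottery_a, p), (lottery_b, 1 - p)):
        for outcome, q in lottery.items():
            out[outcome] = out.get(outcome, 0) + weight * q
    return out


def eu(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(q * utility[o] for o, q in lottery.items())


# Invented utilities: chocolate (M) is preferred to vanilla (L).
utility = {"vanilla": 1.0, "chocolate": 2.0, "no ice cream": 0.0}
L, M, N = {"vanilla": 1.0}, {"chocolate": 1.0}, {"no ice cream": 1.0}

for p in (0.01, 0.5, 1.0):  # p = probability that the choice actually matters
    gap = eu(mix(p, M, N), utility) - eu(mix(p, L, N), utility)
    print(p, gap)  # gap is p * (eu(M) - eu(L)) up to rounding, so it stays positive
```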
And I’m still curious how the utilities are decided. By whim?
If your decision process is not equivalent to one that uses the previously described procedure, there are situations where something like one of the following will happen.
I ask you if you want chocolate or vanilla ice cream and you don’t decide. Not just you don’t care which one you get or you would prefer not to have ice cream, but you don’t output anything and see nothing wrong with that.
You prefer chocolate to vanilla ice cream, so you would willingly pay 1c to have the vanilla ice cream that you have been promised upgraded to chocolate. You also happen to prefer strawberry to chocolate, so you are willing to pay 1c to exchange a promise of a chocolate ice cream for a promise of a strawberry ice cream. Furthermore, it turns out you prefer vanilla to strawberry, so whenever you are offered a strawberry ice cream, you gladly pay a single cent to change that to an offer of vanilla, ad infinitum.
N/A
You like chocolate ice cream more than vanilla ice cream. Nobody knows if you’ll get ice cream today, but you are asked for your choice just in case, so you pick vanilla.
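The second scenario above is the classic money pump. A minimal simulation, assuming the agent always pays the fee to swap the flavor it has been promised for the one it prefers to it; the names `money_pump` and `upgrade` are hypothetical.

```python
def money_pump(upgrade, start, trades, fee_cents=1):
    """Simulate a bookie trading with an agent whose strict preferences cycle.

    upgrade maps the flavor the agent is promised to the flavor it prefers
    to it; the agent is assumed to pay fee_cents for every such swap.
    Returns the flavor promised at the end and the total cents paid."""
    promised, paid = start, 0
    for _ in range(trades):
        promised = upgrade[promised]
        paid += fee_cents
    return promised, paid


# The cyclic preferences from the example:
# chocolate > vanilla, strawberry > chocolate, vanilla > strawberry.
upgrade = {"vanilla": "chocolate", "chocolate": "strawberry", "strawberry": "vanilla"}
print(money_pump(upgrade, "vanilla", 300))
```

After every three trades the agent is promised the flavor it started with and is 3 cents poorer, so the loss grows for as long as the offers continue.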
Let’s consider (2). Suppose someone was in the process of getting Dutch Booked like this. It would not go on ad infinitum. They would quickly learn better. Right? So even if this happened, I think it would not be a big deal.
Let’s say they did learn better. How would they do this—changing their utility function? Someone with a utility function like this really does prefer B+1c to A, C+1c to B, and A+1c to C. Even if they did change their utility function, the new one would either have a new hole or it would obey the results of the VNM-theorem.
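The claim that a revised ordering either has a new hole or is consistent can be checked mechanically for a finite set of outcomes. A minimal sketch using Python’s standard `graphlib` module (3.9+): strict preferences are edges, and a utility ranking that agrees with every edge exists exactly when the edges contain no cycle. This only covers ordering over sure outcomes; the VNM theorem’s cardinal utilities over lotteries also need the other axioms. The function name is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def utility_from_preferences(strict_prefs):
    """strict_prefs: iterable of (better, worse) pairs over a finite set.

    Returns an ordinal utility {outcome: rank} that agrees with every pair,
    or None if the pairs contain a cycle, in which case no utility function
    can represent them."""
    worse_than = {}  # outcome -> outcomes it is strictly preferred to
    for better, worse in strict_prefs:
        worse_than.setdefault(better, set()).add(worse)
        worse_than.setdefault(worse, set())
    try:
        # Each outcome's "predecessors" are the outcomes it beats, so the
        # topological order runs from least to most preferred.
        order = list(TopologicalSorter(worse_than).static_order())
    except CycleError:
        return None
    return {outcome: rank for rank, outcome in enumerate(order)}


cyclic = [("chocolate", "vanilla"), ("strawberry", "chocolate"), ("vanilla", "strawberry")]
print(utility_from_preferences(cyclic))  # None
```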
So Bayes teaches: do not disobey the laws of logic and math.
Still wondering where the assigning probabilities to truths of theories is.
OK. So what? There’s more to life than that. That’s so terribly narrow. I mean, that part of what you’re saying is right as far as it goes, but it doesn’t go all that far. And when you start trying to apply it to harder cases—what happens? Do you have some Bayesian argument about who to vote for for president? Which convinced millions of people? Or should have convinced them, and really answers the questions much better than other arguments?
Still wondering where the assigning probabilities to truths of theories is.
Well the Dutch books make it so you have to pick some probabilities. Actually getting the right prior is incomplete, though Solomonoff induction is most of the way there.
OK. So what? There’s more to life than that. That’s so terribly narrow. I mean, that part of what you’re saying is right as far as it goes, but it doesn’t go all that far.
Where else are you hoping to go?
And when you start trying to apply it to harder cases—what happens? Do you have some Bayesian argument about who to vote for for president? Which convinced millions of people? Or should have convinced them, and really answers the questions much better than other arguments?
In principle, yes. There’s actually a computer program called AIXItl that does it. In practice I use approximations to it. It probably could be done to a very high degree of certainty. There are a lot of issues and a lot of relevant data.
Well the Dutch books make it so you have to pick some probabilities.
Can you give an example? Use the ice cream flavors. What probabilities do you have to pick to buy ice cream without being Dutch booked?
Where else are you hoping to go?
Explanatory knowledge. Understanding the world. Philosophical knowledge. Moral knowledge. Non-scientific, non-emprical knowledge. Beyond prediction and observation.
In principle, yes.
How do you know whether your approximations are OK to make or whether they ruin things? How do you work out what kinds of approximations are and aren’t safe to make?
The way I would do that is by understanding the explanation of why something is supposed to work. In that way, I can evaluate proposed changes to see whether they mess up the main point or not.
Endo, I think you are making things more confusing by combining issues of Bayesianism with issues of utility. It might help to keep them more separate or to be clear when one is talking about one, the other, or some hybrid.
I use the term Bayesianism to include utility because (a) they are connected and (b) a philosophy of probabilities as abstract mathematical constructs with no applications doesn’t seem complete; it needs an explanation of why those specific objects are studied. How do you think that any of this caused or could cause confusion?
Well, it empirically seems to be causing confusion. See curi’s remarks about the ice cream example. Also, one doesn’t need Bayesianism to include utility and that isn’t standard (although it is true that they do go very well together).
Let’s consider (2). Suppose someone was in the process of getting Dutch Booked like this. It would not go on ad infinitum. They would quickly learn better. Right? So even if this happened, I think it would not be a big deal.
So the argument is now not that suboptimal issues don’t exist but that they aren’t a big deal? Are you aware that the primary reason this involves small amounts of ice cream is the convenience of the example? There’s no reason these couldn’t happen with far more serious issues (such as what medicine to use).
I know. I thought it was strange that you said “ad infinitum” when it would not go on forever. And that you presented this as dire but made your example non-dire.
But OK. You say we must consider probabilities, or this will happen. Well, suppose that if I do something it will happen. I could notice that, criticize it, and thus avoid it.
How can I notice? I imagine you will say that involves probabilities. But in your ice cream example I don’t see the probabilities. It’s just preferences for different ice creams, and an explanation of how you get a loop.
And what I definitely don’t see is probabilities that various theories are true (as opposed to probabilities about events which are ok).
But OK. You say we must consider probabilities, or this will happen. Well, suppose that if I do something it will happen. I could notice that, criticize it, and thus avoid it.
Yes, but the Bayesian avoids having this step. For any step you can construct a “criticism” that will duplicate what the Bayesian will do. This is connected to a number of issues, including the fact that what constitutes valid criticism in a Popperian framework is far from clear.
But in your ice cream example I don’t see the probabilities. It’s just preferences for different ice creams, and an explanation of how you get a loop.
Ice cream is an analogy, and not a great one, since it is connected to preferences (which sometimes get confused with Bayesianism). It might make more sense to just go read Cox’s theorem and translate for yourself what the assumptions mean about an approach.
what constitutes valid criticism in a Popperian framework is far from clear.
Anything which is not itself criticized.
Ice cream is an analogy.
Could you pick any real world example you like, where the probabilities needed to avoid a Dutch book aren’t obvious, and point them out? To help concretize the idea for me.
Could you pick any real world example you like, where the probabilities needed to avoid a Dutch book aren’t obvious, and point them out
Well, I’m not sure, in that I’m not convinced that Dutch Booking really does occur much in real life other than in the obvious contexts. But there are a lot of contexts it does occur in. For example, a fair number of complicated stock maneuvers can be thought of essentially as attempts to Dutch book other players in the stock market.
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
Consequentialism is not in the index.
Decision rule is, a little bit.
I don’t think this book contains a proof mentioning consequentialism. Do you disagree? Give a page or section?
It looks like what they are doing is defining a decision rule in a special way. So, by definition, it has to be a mathematical thing to do with probability. Then after that, I’m sure it’s rather easy to prove that you should use Bayes’ theorem rather than some other math.
But none of that is about decision rules in the sense of methods human beings use for making decisions. It’s just that if you define them in a particular way (so that Bayes’ is basically the only option), then you can prove it.
See e.g. page 19, where they give a definition. A Popperian approach to making decisions simply wouldn’t fit within the scope of their definition, so the conclusion of any proof like the one you claimed exists (which I haven’t found in this book) would not apply to Popperian ideas.
Maybe there is a lesson here about believing stuff is proven when you haven’t seen the proof, listening to hearsay about what books contain, and trying to apply proofs you aren’t familiar with (they often have limits on scope).
It says a decision rule (their term) is a function of the sample space, mapping something like complete sets of possible data to things people do. (I think it needs to be complete sets of all your data to be applied to real world human decision making. They don’t explain what they are talking about in the type of way I think is good and clear. I think that’s due to having in mind different problems they are trying to solve than I have. We have different goals without even very much overlap. They both involve “decisions” but we mean different things by the word.)
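For reference, a decision rule in the statistical sense discussed here can be sketched as a table from each possible observation to an action, with a Bayes rule choosing, for each observation, the action with the lowest posterior expected loss. The test-and-treat setup, numbers and function name below are invented for illustration and are not taken from Bickel and Doksum. Over a finite sample space the rule is just a lookup table, so no smoothness is involved.

```python
def bayes_rule(prior, likelihood, loss, observations, actions):
    """Build a decision rule as a table from each possible observation to the
    action with the lowest posterior expected loss.

    prior[t] = P(theta = t); likelihood[t][x] = P(x | theta = t);
    loss[t][a] = loss of taking action a when theta = t."""
    rule = {}
    for x in observations:
        joint = {t: prior[t] * likelihood[t][x] for t in prior}
        evidence = sum(joint.values())
        posterior = {t: j / evidence for t, j in joint.items()}
        rule[x] = min(
            actions,
            key=lambda a: sum(posterior[t] * loss[t][a] for t in posterior),
        )
    return rule


# Invented test-and-treat example.
prior = {"sick": 0.1, "healthy": 0.9}
likelihood = {"sick": {"positive": 0.9, "negative": 0.1},
              "healthy": {"positive": 0.2, "negative": 0.8}}
loss = {"sick": {"treat": 1, "wait": 20},
        "healthy": {"treat": 2, "wait": 0}}
rule = bayes_rule(prior, likelihood, loss, ["positive", "negative"], ["treat", "wait"])
print(rule)
```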
In real life, people use many different decision rules (my term, not theirs). And people deal with clashes between them.
You may claim that my multiple decision rules can be combined into one mathematical function. That is so. But the result isn’t a smooth function so when they start talking about estimation they have big problems! And this is the kind of thing I would expect to get acknowledgement and discussion if they were trying to talk about how humans make decisions, in practice, rather than just trying to define some terms (chosen to sound like they have something to do with what humans do) and then proceed with math.
E.g. they try to talk about estimating amount of error. If you know error bars on your data, and you have a smooth function, you’re maybe kind of OK with imperfect data. But if your function has a great many jumps in it, what are you to do? What if, within the margin for error on something, there are several discontinuities? I think they are conceiving of the decision rule function as being smooth and not thinking about what happens when it’s very messy. Maybe they specified some assumptions so that it has to be smooth, which I missed, but anyway human beings have tons of contradictory and not-yet-integrated ideas in their heads (mistakes and separate topics they haven’t connected yet, and more), and so it’s not smooth.
On a similar note they talk about the median and mean which also don’t mean much when it’s not smooth. Who cares what the mean is over an infinitely large sample space where you get all sorts of unrepresentative results in large unrealistic portions of it? So again I think they are looking at the issues differently than me. They expect things like mathematically friendly distributions (for which means and medians are useful); I don’t.
Moving on to a different issue, they conceive of a decision rule which takes input and then gives output. I do not conceive of people starting with the input and then deciding the output. I think decision making is more complicated. While thinking about the input, people create more input: their thoughts. The input is constantly being changed during the decision process; it’s not a fixed quantity to have a function of. Also being changed during any significant decision is the decision rule itself; it too isn’t a static function, even for purposes of making one decision (at least in the normal sense. Maybe they would want to call every step in the process a decision, so that deciding on a flavor of ice cream might involve 50 decisions, with updates to the decision rules and inputs in between them. If they want to do something like that, they do not explain how it works.)
They conceive of the input to decisions as “data”. But I conceive of much thinking as not using much empirical data, if any. I would pick a term that emphasizes it. The input to all decision making is really ideas, some of which are about empirical data and some of which aren’t. Data is a special case, not the right term for the general case. From this I take that they are empiricists. You can find a refutation of empiricism in The Beginning of Infinity by David Deutsch but anyway it’s a difference between us.
A Popperian approach to decision making would focus more on philosophical problems, and their solutions. It would say things like: consider what problem you’re trying to solve, and consider what actions may solve it. And criticize your ideas to eliminate errors. And … well no short summary does it justice. I’ve tried a few times here. But Popperian ways of thinking are not intuitive to people with the justificationist biases dominant in our culture. Maybe if you like everything I said I’ll try to explain more, but in that case I don’t know why you wouldn’t read some books which are more polished than what I would type in. If you have a specific, narrow question I can see that answering that would make sense.
Thank you for that detailed reply. I just have a few comments:
“data” could be any observable property of the world
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
there’s no requirement that the decision function be smooth—it’s just useful to look at such functions first for pedagogical reasons. All of the math continues to work in the presence of discontinuities.
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
“data” could be any observable property of the world
Yes but using it to refer to a person’s ideas, without clarification, would be bizarre and many readers wouldn’t catch on.
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
Straight to the final, perfect truth? lol… That’s extremely unPopperian. We don’t expect progress to just end like that. We don’t expect you to get so far and then find there’s nothing further. We don’t think the scope for reason is so bounded, nor do we think fallibility is so easily defeated.
In practice, searches for optimal things of this kind always involve many premises which have substantial philosophical meaning. (Which is often, IMO, wrong.)
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
Does it use an infinite set of all possible actions? I would have thought it wouldn’t rely on knowing what each action actually is, but would just broadly specify the set of all actions and move on.
@smooth: what good is a mean or median with no smoothness? And for margins of error, with a non-smooth function, what do you do?
With a smooth region of a function, taking the midpoint of the margin of error region is reasonable enough. But when there is a discontinuity, there’s no way to average it and get a good result. Mixing different ideas is a hard process if you want anything useful to result. If you just do it in a simple way like averaging you end up with a result that none of the ideas think will work and shouldn’t be surprised when it doesn’t. It’s kind of like how if you have half an army do one general’s plan, and half do another, the result is worse than doing either one.
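A toy numerical sketch of the averaging point. The one-dimensional "plan parameter" and both payoff functions are made up for illustration and do not come from the thread. When the payoff has two separate good regions, the midpoint of two working plans lands in the gap between them. When the payoff is smooth and concave, the midpoint does at least as well as the average of the two plans.

```python
def two_plan_payoff(x: float) -> float:
    """Hypothetical discontinuous payoff: plans near x=0 or x=10 work,
    and everything in between fails outright."""
    if abs(x) <= 1 or abs(x - 10) <= 1:
        return 1.0
    return 0.0


def smooth_payoff(x: float) -> float:
    """Hypothetical smooth, concave payoff that peaks at x=4."""
    return -((x - 4.0) ** 2)


plan_a, plan_b = 0.0, 10.0
compromise = (plan_a + plan_b) / 2  # naive averaging of the two plans

print(two_plan_payoff(plan_a), two_plan_payoff(plan_b), two_plan_payoff(compromise))

# In the concave case, the midpoint of 2 and 6 scores at least their average.
lo, hi = 2.0, 6.0
print(smooth_payoff((lo + hi) / 2), (smooth_payoff(lo) + smooth_payoff(hi)) / 2)
```

The concave contrast rests on Jensen's inequality. A payoff that is smooth but not concave would not guarantee that the midpoint does as well.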
Can you give an example using a moral argument, or anything that would help illustrate how you take things that don’t look like they are Bayes’ law cases and apply it anyway?
The linked page says imperfectly efficient minds give off heat and that this is probabilistic (which is weird b/c the laws of physics govern it and they are not probabilistic but deterministic). Even if I accept this, I don’t quite see the relevance. Are you reductionists? I don’t think that the underlying physical processes tell us everything interesting about the epistemology.
What do you mean by “get anywhere”? I can update my probability estimates and use the new estimates to make decisions perfectly well.
What does this have to do with whether confirmation can be used as evidence?
Infinitely many hypotheses increase in probability. What good is that? You have infinite possibilities before you and haven’t made progress towards picking between them.
When you say “this infinite set over here, its probability increases” you aren’t reaching an answer. You aren’t even getting any further than pure deduction would have gotten you.
Look, there’s two infinite sets: those contradicted by the evidence, and those not (deal with theories with “maybes” in them however you like, it does not matter to my point). The first set we don’t care about—we all agree to reject it. The second set is all that’s left to consider. if you increase the probability of every theory in it that doesn’t help you choose between them. it’s not useful. when you “confirm” or increase the probability of every theory logically consistent with the data, you aren’t reaching an answer, you aren’t making progress.
The progress is in the theories that are ruled out. When playing cards, you could consider all possible histories of the motions of the cards that are compatible with the evidence. Would you have any problem with making bets based on these probabilities? Solomonoff induction is very similar. While there are an infinite number of possibilities, both cases involve proving general properties of the distribution rather than considering each possibility individually.
In the future please capitalize your sentences; it improves readability (especially in large paragraphs).
“The progress is in the theories that are ruled out.”
This is purely a matter of deduction, right? Bayes’ theorem doesn’t come into it.
One doesn’t have to be a Bayesian to rule out theories contradicted by the evidence.
Further, there are always infinitely many theories that aren’t ruled out. This is the hard part of epistemology. How do you deal with those?
If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.
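A minimal sketch of the "eliminate and rescale" reading, assuming every hypothesis is deterministic (it gives each observation probability 0 or 1). The coin-flip hypotheses and their prior are made up. The sketch runs full Bayes’ theorem and the eliminate-then-renormalize shortcut on the same prior and checks that the two give the same posterior.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Full Bayes' theorem: P(h|E) = P(E|h) P(h) / P(E)."""
    joint = {h: likelihood[h] * p for h, p in prior.items()}
    p_evidence = sum(joint.values())
    return {h: j / p_evidence for h, j in joint.items()}


def eliminate_and_rescale(prior, survives):
    """Drop hypotheses contradicted by E, then rescale the rest to sum to 1."""
    total = sum(p for h, p in prior.items() if survives(h))
    return {h: (p / total if survives(h) else Fraction(0)) for h, p in prior.items()}


# Hypothetical deterministic hypotheses about two coin flips, with a made-up prior.
prior = {"HH": Fraction(1, 2), "HT": Fraction(1, 4), "TH": Fraction(1, 8), "TT": Fraction(1, 8)}


def first_is_heads(h):
    """Evidence E: the first flip came up heads."""
    return h[0] == "H"


likelihood = {h: Fraction(1 if first_is_heads(h) else 0) for h in prior}

posterior = bayes_update(prior, likelihood)
assert posterior == eliminate_and_rescale(prior, first_is_heads)
print(posterior)
```

Both surviving hypotheses are multiplied by the same factor, 1/P(E) = 4/3. Deterministic evidence therefore leaves their relative weights exactly where the prior put them.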
The Solomonoff prior is really just a form of the principle of insufficient reason, which states that if there is no reason to think that one thing is more probable than another, they should be assigned the same probability. Since there are an infinite number of theories, we need to take some kind of limit. If we encode them as self-delimiting computer programs, we can write them as strings of digits (usually binary). Start with some maximum length and increase it toward infinity. Some programs will proceed normally, either looping infinitely or reaching a stop instruction. Many programs are then equivalent, because changing bits that are never used by the hypothesis does not change the theory. Other programs will run past the maximum length, but this is fixed as that length is taken to infinity.
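A toy sketch of the length-limit construction. The four "programs" are hypothetical bit strings, and no actual machine runs them. They are chosen to be prefix-free (self-delimiting), so a reader of a bit stream always knows where each one ends. The sketch spreads probability uniformly over all strings of a maximum length and credits each string to the program it begins with. Each program's share settles at 2^-length once the maximum length is large enough to fit it. The real Solomonoff prior uses a universal prefix machine and is uncomputable, so this only illustrates the counting argument.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy set of self-delimiting programs (none is a prefix of another).
PROGRAMS = ["0", "10", "110", "1110"]


def weights_at_length(programs, max_len):
    """Uniform distribution over all bit strings of length max_len; each string
    is credited to the program it begins with. Bits after the end of that
    program are never read, so they do not change which theory it is."""
    counts = {p: 0 for p in programs}
    unresolved = 0  # strings that begin with no listed program
    for bits in product("01", repeat=max_len):
        s = "".join(bits)
        match = next((p for p in programs if s.startswith(p)), None)
        if match is None:
            unresolved += 1
        else:
            counts[match] += 1
    total = 2 ** max_len
    return {p: Fraction(c, total) for p, c in counts.items()}, Fraction(unresolved, total)


for n in (3, 4, 8, 12):
    weights, leftover = weights_at_length(PROGRAMS, n)
    print(n, {p: str(w) for p, w in weights.items()}, str(leftover))
```

At maximum length 3 the program "1110" does not fit and gets no weight. From length 4 onward every program's weight stays fixed at 2^-len(p) however much longer the limit grows.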
This obviously isn’t a complete justification, but it is better than Popperian induction. Both rule out falsified theories and both penalize theories for unfalsifiability and complexity. Only Solomonoff induction allows us to quantify the size of these penalties in terms of probability. Popper would agree that a simpler theory, being compared to a more complex one, is more likely but not guaranteed to be true, but he could not give the numbers.
If you are still worried about the foundational issues of the Solomonoff prior, I’ll answer your questions, but it would be better if you asked me again in however long progress takes (that was supposed to sound humorous, as if I were describing a specific, known amount of time, but I really doubt that that is noticeable in text). http://lesswrong.com/r/discussion/lw/534/where_does_uncertainty_come_from/ writes up some of the questions I’m thinking about now. It’s not by me, but Paul seems to wonder about the same issues. This should all be significantly more solid once some of these questions are answered.
“If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.”
That’s it? That is trivial, and doesn’t solve the major problems in epistemology. It’s correct enough (I’m not convinced theories have probabilities, but I think that’s a side issue) but it doesn’t get you very far. Any old non-Bayesian epistemology could tell you this.
Epistemology has harder problems than figuring out that you should reject things contradicted by evidence. For example, what do you do about the remaining possibilities?
I think with Solomonoff what you are doing is ordering all theories (by length) and saying the ones earlier in the ordering are better. This ordering has nothing empirical about it. Your approach here is not based on evidences or probabilities, just an ordering. Correct me if I got that wrong. That raises the question: why is the Solomonoff ordering correct? Why not some other ordering? Here’s one objection: “God did everything” is a short theory which is compatible with all evidence. You can make separate versions of it for all possible sets of predictions if you want. Doesn’t that mean we’re either stuck with some kind of “God did everything” or the final truth is even shorter?
You mention “Popperian induction”. Please don’t speak for Popper. The idea that Popper advocated induction is a myth. A rather crass one; he refuted induction and published a lot of material against it. Instead, ask me about his positions, OK? Popper would not agree that the simpler theory is “more likely”. There are many issues here. One is that Popper said we should prefer low probability theories because they say more about the world.
You seem to present “Popperian induction” as an incomplete justification. Maybe you are unaware that Popper’s epistemology rejects the concept of justification itself. It is thus a mistake to criticize it on justificationist grounds. It isn’t any type of justification and doesn’t want to be.
In order to quote people, you can use a single greater than sign ‘>’ at the beginning of a line.
Note I said that and a prior. The important concept here is that we must always assign probabilities to all theories, because otherwise we would have no way to act. From Wikipedia: ‘Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures’, where a statistical procedure may be taken as a guide for optimal action.
Sorry about saying ‘Popperian induction’. I only have a basic knowledge of Popper. Would Popper say that predicting the results of actions is (one of) the goals of science? This is, of course, slightly more general than induction.
Wikipedia quotes Popper as saying simpler theories are to be preferred ‘because their empirical content is greater; and because they are better testable’. Does this mean that he would bet something important on this? Suppose there were two possible explanations for a plague. If the simpler one were true, then with medicine we could save 100 lives, but if the more complex one were true we could save 200 lives. How would you decide which cure the factories should manufacture (say it takes a long time to prepare the factories, so you can only make one type of cure)?
It is exactly not about this. The reason to prefer simpler theories is that more possible universes correspond to them. For a simple universe, axioms 1 through 10 have to come out the right way, but the rest can be anything, as they are meaningless since the universe is already fully specified. For a more complex theory, axioms 11-15 must also turn out a certain way, so fewer possible universes are compatible with this theory. I would also add the principle of sufficient reason, which I think is likely, as further justification for Occam’s razor, but that is irrelevant here.
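One way a Bayesian decision rule could treat this hypothetical is to weight the lives each cure saves by the probability of the explanation it depends on. The probabilities below are made up. The sketch assumes that exactly one of the two explanations is true and that each cure works only if its own explanation is the true one.

```python
from fractions import Fraction


def expected_lives_saved(p_simple, saved_if_simple=100, saved_if_complex=200):
    """Expected lives saved by each cure, assuming exactly one of the two
    explanations is true and each cure only works if its explanation is."""
    return {
        "simple cure": p_simple * saved_if_simple,
        "complex cure": (1 - p_simple) * saved_if_complex,
    }


def best_cure(p_simple):
    """Pick the cure with the larger expected number of lives saved."""
    ev = expected_lives_saved(p_simple)
    return max(ev, key=ev.get)


for p in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    print(p, expected_lives_saved(p), best_cure(p))
```

Under these assumptions the break-even point is P(simple) = 2/3. Above it the simple cure is the better bet, and below it the complex cure is. At exactly 2/3 the two tie, and `max` simply returns the first key.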
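A small sketch of the counting argument, assuming (for illustration only) that a "universe" is one yes/no value per axiom and a theory is a set of required values for some axioms. The sketch enumerates every universe and measures what share of them each theory allows. The required values are arbitrary, and both theories are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fraction_compatible(constraints, n_axioms):
    """Share of all 2**n_axioms universes (one yes/no value per axiom)
    that agree with a theory's constraints {axiom_index: required_value}."""
    universes = product((0, 1), repeat=n_axioms)
    hits = sum(all(u[i] == v for i, v in constraints.items()) for u in universes)
    return Fraction(hits, 2 ** n_axioms)


# Hypothetical theories: the simple one fixes axioms 1-10, the complex one also
# fixes axioms 11-15 (indices 0-9 and 0-14 here; the required values are arbitrary).
simple = {i: 1 for i in range(10)}
complex_theory = {i: 1 for i in range(15)}

for n in (15, 16):
    print(n, fraction_compatible(simple, n), fraction_compatible(complex_theory, n))
```

Each extra axiom a theory pins down halves its share of universes, and that share does not depend on how many further axioms are left free. The five extra constraints cost the complex theory a factor of 2^5 = 32.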
This seems wrong. Should I play the lottery because the low-probability theory that I will win is preferred to the high-probability theory that I will lose?
Popperian epistemology doesn’t assign probabilities like that, and has a way to act. So would you agree that, if you fail to refute Popperian epistemology, then one of your major claims is wrong? Or do you have a backup argument: you don’t have to, but you should anyway because..?
Prediction is a goal of science, but it is not the primary one. The primary goal is explanation/understanding.
Secondary sources about Popper, like wikipedia, are not trustworthy. Popper would not bet anything important on that simpler theories thing. That fragment is misleading because Popper means “preferred” in a methodological sense, not considered to have a higher probability of being true, or considered more justified. It’s not a preference about which theory is actually, in fact, better.
The way to make decisions is by making conjectures about what to do, and criticizing those conjectures. We learn by critical, imaginative argument (including within one mind). Explanations should be given for why each possibility is a good idea; the hypothetical you give doesn’t have enough details to actually reach an answer.
About Solomonoff, if I understand you correctly now you are starting with theories which don’t say much (that isn’t what I expected simpler or shorter to mean). So at any point Solomonoff induction will basically be saying the minimal theory to account for the data and specify nothing else at all. Is that right? If that is the case, then it doesn’t deal with choosing between the various possibilities which are all compatible with the data (except in so far as it tells you to choose the least ambitious) and can make no predictions: it simply leaves everything we don’t already know unspecified. Have I misunderstood again?
I thought the theories were supposed to specify everything (not, as you say, “the rest can be anything”) so that predictions could be made.
I’m not totally sure what your concept of a universe or axiom is here. Also I note that the real world is pretty complicated.
No, he means they are more important and more interesting. His point is basically that a theory which says nothing has a 100% prior probability. Quantum Mechanics has a very low prior probability. The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?) They have what Popper called high “content” because they exclude many possibilities. That is a good trait. But it’s certainly not a guarantee that arbitrary theories excluding stuff will be correct.
My first Wikipedia quote (Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.) was somewhat technical, but it basically meant that any consistent set of actions is either describable in terms of probabilities or nonconsequentialist. How would you choose the best action in a Popperian framework? Would you be forced to consider aspects of a choice other than its consequences? Otherwise, your choices must be describable in terms of a prior probability and Bayesian updating (and, while we already agree that the latter is obvious, here we are using it to update a set of probabilities and, on pain of inconsistency, our new probabilities must have that relationship to our old ones).
Definitely use all the evidence when making decisions. I didn’t mean for my example to be complete. I was wondering how a question like that could be addressed in general. What pieces of information would be important and how would they be taken into account? You can assume that the less relevant variables, like which disease is more painful, are equal in both cases.
I may have been unclear here. I meant prediction in a very broad sense, including, e.g., predicting which experiments will be best at refining our knowledge and predicting which technologies will best improve the world. Was it clear that I meant that? If you seek understanding beyond this, you are allowed but, at least for the present era, I only care about an epistemology if it can help me make the world a better place.
No, not at all. The more likely theories are those that include small amounts of theory, not small amounts of prediction. Eliezer discusses this in the sequences here, here, and here. Those don’t really cover Solomonoff induction directly, but they will probably give you a better idea of what I’m trying to say than I did. I think Solomonoff induction is better covered in another post, but I can’t find it right now.
Sorry, I was abusing one word, ‘theories’, to mean both ‘individual descriptions of the universe’ and ‘sets of descriptions that make identical predictions in some realm (possibly in all realms)’. It is a very natural place to slip definitions, because, for example, when discussing biology, we often don’t care about the distinction between ‘Classical physics is true and birds are descended from dinosaurs.’ and ‘Quantum physics is true and birds are descended from dinosaurs.’ Once enough information is specified to make predictions, a theory (in the second sense) is on equal ground with another theory that contains the same amount of information and that makes different predictions only in realms where it has not been tested, as well as with a set of theories for which the set can be specified with the same amount of information but for which specifying one theory out of the set would take more information.
I’m not sure how one would act based on this. Should one conduct new experiments differently given this knowledge of which theories are preferred? Should one write papers about how awesome the theory is?
All of this is present in Bayesian epistemology.
Consider Bayes’ theorem, with theories A and B and evidence E:
P(A|E) = P(E|A) P(A) / P(E)
Let’s look at how the probability of a theory increases upon learning E, using a ratio.
P(A|E) / P(A) = P(E|A) / P(E)
P(B|E) / P(B) = P(E|B) / P(E)
Which one increases by a larger ratio?
[P(A|E) / P(A)] / [P(B|E) / P(B)] = [P(E|A) / P(E)] / [P(E|B) / P(E)] = P(E|A) / P(E|B)
The greater P(E|A) is compared to P(E|B), the more A benefits compared to B. This means that the more narrowly theory A predicts E, the actual observation, to the exclusion of other possible observations, the more probability we assign to it. This is a quantitative form of Popper’s preference for more specific and more easily falsifiable theories, as proven by Bayes’ theorem.
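A quick numerical check of the ratio derivation, using exact fractions. The priors and likelihoods are made up. The theories A, B and C are assumed to be mutually exclusive and exhaustive, so that P(E) can be computed as the sum of P(E|H) P(H).

```python
from fractions import Fraction as F

# Hypothetical numbers: mutually exclusive, exhaustive theories A, B, C.
prior = {"A": F(1, 5), "B": F(3, 10), "C": F(1, 2)}
likelihood = {"A": F(9, 10), "B": F(3, 10), "C": F(1, 10)}  # P(E|H)

p_e = sum(likelihood[h] * prior[h] for h in prior)              # P(E)
posterior = {h: likelihood[h] * prior[h] / p_e for h in prior}  # Bayes' theorem

boost = {h: posterior[h] / prior[h] for h in prior}  # P(H|E) / P(H)
print(boost["A"] / boost["B"], likelihood["A"] / likelihood["B"])
```

Both printed ratios equal 3. P(E) cancels, so the relative boost depends only on how strongly each theory predicted E.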
That’s basically what Solomonoff means by prior probability.
Yes Popper is non-consequentialist.
Consequentialism is a bad theory. It says ideas should be evaluated by their consequences (only). This does not address the question of how to determine what are good or bad consequences.
If you try to evaluate methods of determining what are good or bad consequences, by their consequences, you’ll end up with serious regress problems. If you don’t, you’ll have to introduce something other than consequences.
You may want to be a little more careful with how you formulate this. Saying that a good idea is one that has good consequences, and a bad idea is one that has bad consequences, doesn’t invite regress… it may be that you have a different mechanism for evaluating whether a consequence is good/bad than you do for evaluating whether an idea is good/bad.
For example, I might assert that a consequence is good if it makes me happy, and bad if it makes me unhappy. (I don’t in fact assert this.) I would then conclude that an idea is good if its consequences make me happy, and bad if its consequences make me unhappy. No regress involved.
(And note that this is different from saying that an idea is good if the idea makes me happy. If it turns out that the idea “I could drink drain cleaner” makes me happy, but that actually drinking drain cleaner makes me unhappy, then it’s a bad idea by the first theory but a good idea by the second theory.)
A certain amount of precision is helpful when thinking about these sorts of things.
If you reread the sentence in which I discuss a regress, you will notice it begins with “if” and says that a certain method would result in a regress, the point being you have to do something else. So it was your mistake.
That is not what I meant by consequentialism, and I agree that that theory entails an infinite regress. The theory I was referring to, which is the first google result for consequentialism, states that actions should be judged by their consequences.
That theory is bad too. For one thing, you might do something really dumb (say, shoot at a cop) and the consequence might be something good, e.g. you might accidentally hit the robber behind him who was about to kill him. You might end up declared a hero.
For another thing, “judge by consequences” does not answer the question of what are good or bad consequences. It tells us almost nothing. The only content is don’t judge by anything else. Why not? Beats me.
If you mean judge by rationally expected consequences, or something like that, you could drop the first objection but I still don’t see the use of it. If you merely want to exclude mysticism I think we can do that with a lighter restriction.
Sorry, I didn’t explain this very well. I don’t use consequentialism to judge people, I use it to judge possible courses of action. I (try to) make choices with the best consequences, this fully determines actions, so judgments of, for example, who is a bad person, do not add anything.
You are right that this is very broad. My point is that all consequentialist decision rules are either Bayesian decision rules or limits of Bayesian decision rules, according to a theorem.
I didn’t discuss who is a bad person. An action might be bad but have a good result (this time) by chance. And you haven’t said a word about what kinds of consequences of actions are good or bad … I mean desirable or undesirable. And you haven’t said why everything but consequences is inadmissible.
In your example of someone shooting a police officer, I would say that it is good that the police officer’s life was saved, but it is bad that there is a person who would shoot people so irresponsibly. I would not declare that person a hero, as that will neither help save more police officers nor reduce the number of people shooting recklessly; in fact, it would probably increase the number of reckless people.
I don’t want to get into the specifics of morality, because it is complex. The only reason that I specified consequentialist decision making is that it is a condition of the theorem that proves Bayesian decision making to be optimal. Entirely nonconsequentialist systems don’t need to learn about the universe to make decisions and partially consequentialist systems are more complicated. For the latter, Bayesianism is often necessary if there are times when nonconsequentialist factors have little import to a decision.
You are here judging a non-action by a non-consequence.
I think you mean systems which ignore all consequences. Popper’s system does not do that.
Popper’s system incorporates observational evidence in the form of criticism: ideas can be criticized for contradicting it.
Yes, this is a non-action; I often say “it is bad that X” as shorthand for “ceteris paribus, I would act so as to make X not be the case”. However, it is a consequence of what happened before (though you may have just meant it is not a consequence of my action). Judgements are often attached to consequences without specifying which action they are consequences of, just for convenience.
Yes, that was what I meant.
OK. I don’t recall hearing any Bayesian praising low probability theories, but no doubt you’ve heard more of them than me.
Yes but that only helps you deal with wishy washy theories. There’s plenty of theories which predict stuff with 100% probability. Science has to deal with those. This doesn’t help deal with them.
Examples include Newton’s Laws and Quantum Theory. They don’t say they happen sometimes but always, and that’s important. Good scientific theories are always like that. Even when they have a restricted, non-universal domain, it’s 100% within the domain.
Physics is currently thought to be deterministic. And even if physics was random, we would say that e.g. motion happens randomly 100% of the time, or whatever the law is. We would expect a law of motion with a call to a random function to still always be what happens.
PS Since you seem to have an interest in math, I’d be curious about your thoughts on this:
http://scholar.google.com/scholar?cluster=10839009135739435828&hl=en&as_sdt=0,5
There’s an improved version in Popper’s book The World of Parmenides but that may be harder for you to get.
The article you sent me is mathematically sound, but Popper draws the wrong conclusion from it. He has already accepted that P(H|E) can be greater than P(H). That’s all that’s necessary for induction: updating probability distribution. The stuff he says at the end about H ← E being countersupported by E does not prevent decision making based on the new distribution.
Setting aside Popper’s point for a minute, p(h|e) > p(h) is not sufficient for induction.
The reason it is not sufficient is that infinitely many h gain probability for any e. The problem of dealing with those remains unaddressed. And it would be incorrect and biased to selectively pick some pet theory from that infinite set and talk about how it’s supported.
Do you see what I’m getting at?
Yes, that is what the Solomonoff prior is for. It gives numbers to all the P(H_i).
And what is the argument for that prior? Why is it not arbitrary and often incorrect?
And whatever argument you give, I’ll also be curious: what method of arguing are you using? Deduction? Induction? Something else?
I tried to present it, but was obviously very unclear. If you read http://lesswrong.com/lw/jk/burdensome_details/ , http://lesswrong.com/lw/jn/how_much_evidence_does_it_take/ , and http://lesswrong.com/lw/jp/occams_razor/ , it’s basically a formalization of those ideas, with a tiny amount of handwaving.
Deduction.
Deduction requires premises to function. Where did you get the premises?
It seems obvious that low probability theories are good. Since probabilities must add up to 100%, there can be only a few high-probability theories and, when one is true, there is not much work to be done in finding it, since it is already so likely. Telling someone to look among low-probability theories is like telling them to look among nonapples when looking for possible products to sell, and it provides no way of distinguishing good low-prior theories, like quantum mechanics, from bad ones, like astrology.
Unfortunately, I cannot read that article, as it is behind a paywall. If you have access to it, perhaps you could email it to me at endoself (at) yahoo (dot) com .
ETA:
I was only talking about Popper’s idea of theories with high content. That particular analysis was not meant to address theories that predicted certain outcomes with probability 1.
It’s a loose guideline for people about where it may be fruitful to look. It can also be used in critical arguments if/when people think of arguments that use it.
One of the differences between Popper and Bayesian Epistemology is that Popper thinks being overly formal is a fault not a merit. Much of Popper’s philosophy does not consist of formal, rigorous guidelines to be followed exactly. Popper isn’t big on rules of procedure. A lot is explanation. Some is knowledge to use on your own. Some is advice.
So, “God does everything”, plus a definition of “everything” which makes predictions about all events, would rate very highly with you? It’s very low on theory and very high on prediction.
Define theories of that type for all possible sets of predictions. Then at any given time you will have infinitely many of them that predict all your data with 100% probability.
Why is that wrong?
No, it has tons of theory. God is a very complex concept. Note that ‘God did everything’ is more complex and therefore less likely than ‘everything happened’. Did you read http://lesswrong.com/lw/jp/occams_razor/ ?
How do you figure God is complex? God as I mean it simply can do anything, no reason given. That is its only attribute: that it arbitrarily does anything the theory it’s attached to cares to predict. We can even stop calling it “God”. We could even not mention it at all, so there is no theory, and merely give a list of predictions. Would that be good, in your view?
If ‘God’ is meaningless and can merely be attached to any theory, then the theory is the same with and without God. There is nothing to refute, since there is no difference. If you defined ‘God’ to mean a being who created all species or who commanded a system of morality, I would have both reason to care about and means to refute God. If you defined ‘God’ to mean ‘quantum physics’, there would be applications and means of proving that ‘God’ is a good approximation, but this definition is nonsensical, since it is not what is usually meant by ‘God’. If the theory of ‘God’ has no content, there is nothing to discuss, but this is again a very unusual definition.
Would a list of predictions with no theory/explanation be good or bad, in your view?
If there is no simpler description, then a list of predictions is better but, if an explanation simpler than merely a list of predictions is at all possible, then that would be more likely.
How do you decide if an explanation is simpler than a list of predictions? Are you thinking in terms of data compression?
Do you understand that the content of an explanation is not equivalent to the predictions it makes? It offers a different kind of thing than just predictions.
Essentially. It is simpler if it has a higher Solomonoff prior.
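The Solomonoff prior itself is uncomputable. A common crude, computable stand-in is the compressed size of the data under an off-the-shelf compressor. The sketch below uses Python's `zlib`, which is a swapped-in technique and not something proposed in the thread. Compressed length is only a rough upper bound on description length. The two data sets are hypothetical: 1000 predicted observations that follow a simple rule, and 1000 pseudo-random bytes standing in for a patternless list of predictions.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of data after zlib compression at the highest level,
    used here only as a crude, computable upper bound on description length."""
    return 8 * len(zlib.compress(data, 9))


# Hypothetical data: 1000 predicted observations that follow a simple rule...
regular = b"01" * 500
# ...versus 1000 pseudo-random bytes standing in for a patternless list.
rng = random.Random(0)
patternless = bytes(rng.getrandbits(8) for _ in range(1000))

rule_bits = compressed_bits(regular)
pattern_bits = compressed_bits(patternless)
print(rule_bits, pattern_bits)

# Under a 2**-length weighting, the log2 prior ratio is just the length gap.
print("log2 prior ratio (approx.):", pattern_bits - rule_bits)
```

The rule-following data shrinks to a small fraction of its raw size, while the patternless bytes do not compress at all. Under a 2^-length weighting, this gap translates directly into prior odds.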
Yes, there is more than just predictions. However, predictions are the only things that tell us how to update our probability distributions.
So, your epistemology is 100% instrumentalist and does not deal with non-predictive knowledge at all?
Can you give an example of non-predictive knowledge and what role it should play?
Quoting from The Fabric of Reality, chapter 1, by David Deutsch.
Yet some philosophers — and even some scientists — disparage the role of explanation in science. To them, the basic purpose of a scientific theory is not to explain anything, but to predict the outcomes of experiments: its entire content lies in its predictive formulae. They consider that any consistent explanation that a theory may give for its predictions is as good as any other — or as good as no explanation at all — so long as the predictions are true. This view is called instrumentalism (because it says that a theory is no more than an ‘instrument’ for making predictions). To instrumentalists, the idea that science can enable us to understand the underlying reality that accounts for our observations is a fallacy and a conceit. They do not see how anything a scientific theory may say beyond predicting the outcomes of experiments can be more than empty words.
[Cut: a quote of Steven Weinberg clearly advocating instrumentalism. The particular explanation he says doesn’t matter is that spacetime is curved. Spacetime curvature is an example of a non-predictive explanation.]
imagine that an extraterrestrial scientist has visited the Earth and given us an ultra-high-technology ‘oracle’ which can predict the outcome of any possible experiment, but provides no explanations. According to instrumentalists, once we had that oracle we should have no further use for scientific theories, except as a means of entertaining ourselves. But is that true? How would the oracle be used in practice? In some sense it would contain the knowledge necessary to build, say, an interstellar spaceship. But how exactly would that help us to build one, or to build another oracle of the same kind — or even a better mousetrap? The oracle only predicts the outcomes of experiments. Therefore, in order to use it at all we must first know what experiments to ask it about. If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction — even perfect, universal prediction — is simply no substitute for explanation.
Similarly, in scientific research the oracle would not provide us with any new theory. Not until we already had a theory, and had thought of an experiment that would test it, could we possibly ask the oracle what would happen if the theory were subjected to that test. Thus, the oracle would not be replacing theories at all: it would be replacing experiments. It would spare us the expense of running laboratories and particle accelerators.
[cut elaboration]
The oracle would be very useful in many situations, but its usefulness would always depend on people’s ability to solve scientific problems in just the way they have to now, namely by devising explanatory theories. It would not even replace all experimentation, because its ability to predict the outcome of a particular experiment would in practice depend on how easy it was to describe the experiment accurately enough for the oracle to give a useful answer, compared with doing the experiment in reality. After all, the oracle would have to have some sort of ‘user interface’. Perhaps a description of the experiment would have to be entered into it, in some standard language. In that language, some experiments would be harder to specify than others. In practice, for many experiments the specification would be too complex to be entered. Thus the oracle would have the same general advantages and disadvantages as any other source of experimental data, and it would be useful only in cases where consulting it happened to be more convenient than using other sources. To put that another way: there already is one such oracle out there, namely the physical world. It tells us the result of any possible experiment if we ask it in the right language (i.e. if we do the experiment), though in some cases it is impractical for us to ‘enter a description of the experiment’ in the required form (i.e. to build and operate the apparatus). But it provides no explanations.
In a few applications, for instance weather forecasting, we may be almost as satisfied with a purely predictive oracle as with an explanatory theory. But even then, that would be strictly so only if the oracle’s weather forecast were complete and perfect. In practice, weather forecasts are incomplete and imperfect, and to make up for that they include explanations of how the forecasters arrived at their predictions. The explanations allow us to judge the reliability of a forecast and to deduce further predictions relevant to our own location and needs. For instance, it makes a difference to me whether today’s forecast that it will be windy tomorrow is based on an expectation of a nearby high-pressure area, or of a more distant hurricane. I would take more precautions in the latter case.
[“wind due to hurricane” and “wind due to high-pressure area” are different explanations for a particular prediction.]
So knowledge is more than just predictive because it also lets us design things?
Here’s a solution to the problem with the oracle—design a computer that inputs every possible design to the oracle and picks the best. You may object that this would be extremely time-consuming and therefore impractical. However, you don’t need to build the computer; just ask the oracle what would happen if you did.
What can we learn from this? This kind of knowledge can be seen as predictive, but only incidentally, because the computer happens to be implemented in the physical world. If it were implemented mathematically, as an abstract algorithm, we would recognize this as deductive, mathematical knowledge. But math is all about tautologies; nothing new is learned. Okay, I apologize for that. I think I’ve been changing my definition of knowledge repeatedly to include or exclude such things. I don’t really care as much about consistent definitions as I should. Hopefully it is clear from context. I’ll go back to your original question.
The difference between the two cases is not the same as the crucial difference here. Having a theory as opposed to a list of predictions for every possible experiment does not necessarily make the theorems easier to prove. When it does, which is almost always, this is simply because that theory is more concise, so it is easier to deduce things from. This seems more like a matter of computing power than one of epistemology.
How does it pick the best?
And wouldn’t the oracle predict that the computer program would never halt, since it would attempt to enter infinitely many questions into the oracle?
According to some predetermined criteria. “How well does this spaceship fly?” “How often does it crash?” Making a computer evaluate machines is not hard in principle, and is beside the point.
I was assuming a finite maximum size with only finitely many distinguishable configurations in that size, but, again, this is irrelevant; whatever trick you use to make this work, you will not change the conclusions.
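The brute-force proposal in the last few comments can be sketched as a toy, under loud assumptions: the `oracle` function, the 4-bit design encoding, and the scoring criteria below are all hypothetical stand-ins, and nothing here models a real oracle or spaceship. It only shows the structure being argued about: a finite design space, a predictive oracle queried once per design, and a predetermined criterion for picking the best.

```python
from itertools import product

# Hypothetical stand-in for the alien oracle: given a design and an experiment
# description, it predicts the outcome. Here it is just a made-up formula.
def oracle(design, experiment):
    """Return a predicted outcome for running `experiment` on `design`."""
    thrust = sum(design[:2])      # pretend the first two bits add thrust
    stability = sum(design[2:])   # pretend the last two bits add stability
    if experiment == "test flight":
        return {"distance": thrust * 10, "crashed": thrust > stability}
    # The oracle only understands experiments described in its own "language".
    raise ValueError(f"oracle cannot parse experiment: {experiment!r}")

# Predetermined criterion (also hypothetical): avoid crashing first, then fly far.
def score(outcome):
    return (not outcome["crashed"], outcome["distance"])

def best_design(n_bits=4):
    """Enumerate every design up to a finite size and keep the best-scoring one."""
    designs = product((0, 1), repeat=n_bits)
    return max(designs, key=lambda d: score(oracle(d, "test flight")))
```

Everything outside the loop (the encoding of designs, the experiment description the oracle accepts, and the scoring criteria) has to be supplied before the search can start; those are the parts the following replies dispute.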
I think figuring out what criteria you want is an example of a non-predictive issue. That makes it not beside the point. And if the computer picks the best according to criteria we give it, they will contain mistakes. We won’t actually get the best answer. We’ll have to learn stuff and improve our knowledge all in order to set up your predictive thing. So there is this whole realm of non-predictive learning.
So you make assumptions like a spaceship is a thing made out of atoms. If your understanding of physics (and therefore your assumptions) is incorrect then your use of the oracle won’t work out very well. So your ability to get useful predictions out of the oracle depends on your understanding, not just on predicting anything.
So I just give it my brain and tell it to do what it wants. Of course, there are missing steps, but they should be purely deductive. I believe that is what Eliezer is working on now :)
Good point. I guess you can’t bootstrap an oracle like this; some things possible mathematically, like calculating a function over an infinity of points, just can’t be done physically. My point still stands, but this illustration definitely dies.
That’s it? That’s just not very impressive by my standards. Popper’s epistemology is far more advanced, already. Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
By ‘what Eliezer is working on now’ I meant AI, which would probably be necessary to extract my desires from my brain in practice. In principle, we could just use Bayes’ theorem a lot, assuming we had precise definitions of these concepts.
Popperian epistemology is incompatible with Bayesian epistemology, which I accept from its own justification, not from a lack of any other theory. I disliked what I had heard about Popper before I started reading LessWrong, but I forget my exact argument, so I do not know if it was valid. From what I do remember, I suspect it was not.
So, you reject Popper’s ideas without having any criticism of them that you can remember?
That’s it?
You don’t care that Popper’s ideas have criticisms of Bayesian epistemology which you haven’t answered. You feel you don’t need to answer criticisms because Bayesian epistemology is self-justifying and thus all criticisms of it must be wrong. Is that it?
No, I brought up my past experience with Popper because you asked if my opinions on him came from Eliezer.
No, I think Bayesian epistemology has been mathematically proven. I don’t spend a lot of time investigating alternatives for the same reason I don’t spend time investigating alternatives to calculus.
If you have a valid criticism, “this is wrong” or “you haven’t actually proved this” as opposed to “this has a limited domain of applicability” (actually, that could be valid if Popperian epistemology can answer a question that Bayesianism can’t), I would love to know. You did bring up some things of this type, like that paper by Popper, but none of them have logically stood up, unless I am missing something.
If Bayesian epistemology is mathematically proven, why have I been told in my discussions here various things such as: that there is a regress problem which isn’t fully solved (Yudkowsky says so), that circular arguments for induction are correct, and that foundationalism is correct? Why have I been linked to articles making Bayesian points and told they have good arguments with only a little hand waving, and so on? And I’ve been told further research is being done.
It seems to me that saying it’s proven, the end, is incompatible with it having any flaws or unsolved problems or need for further research. So, which is it?
All of the above. It is wrong b/c, e.g., it is instrumentalist (has not understood the value of explanatory knowledge) and inductivist (induction is refuted). It is incomplete b/c, e.g., it cannot deal with non-observational knowledge such as moral knowledge. You haven’t proved much to me; however, I’ve been directed to two books, so judgment there is pending.
I don’t know how you concluded that none of my arguments stood up logically. Did you really think you’d logically refuted every point? I don’t agree. I think most of your arguments were not pure logic, and I thought that various issues were pending further discussion of sub-points. As I recall, some points I raised have not been answered. I’m having several conversations in parallel, so I don’t recall which of the unanswered points were in replies to you personally, but, for example, I quoted an argument by David Deutsch about an oracle. The replies I got about how to try to cheat the oracle did not address the substantive point of the thought experiment; did not address the issue (discussed in the quote) that oracles have user interfaces and entering questions isn’t just free and trivial; and did not address the issue that physical reality is a predictive oracle meeting all the specified characteristics of the alien oracle (we already have an oracle, and none of the replies I got about using the oracle would actually work with the oracle we have). As I saw it, my (quoted) points on that issue stood. The replies were some combination of incomplete and missing the point. They were also clever, which is a bit of fun. I thought of what I think is a better way to try to cheat the rules, which is to ask the oracle to predict the contents of philosophy books that would be written if philosophy was studied for trillions of years by the best people. However, again, the assumption that any question which is easily described in English can be easily entered into the oracle and get a prediction was not part of the thought experiment. And the reason I hadn’t explained all this yet is that there were various other points pending anyway, so, shrug, it’s hard to decide where to start when you have many different things to say.
It is proven that the correct epistemology, meaning one that is necessary to achieve general goals, is isomorphic to Bayesianism with some prior (for beings that know all math). What that prior is requires more work. While the constraint of knowing all math is extremely unrealistic, do you agree that the theory of what knowledge would be had in such situations is a useful guide to action until we have a more general theory? Popperian epistemology cannot tell me how much money to bet at what odds for or against P = NP any more than Bayesian epistemology can, but at least Bayesian epistemology sets this as a goal.
This is all based on our limited mathematical ability. A theory does have an advantage over an oracle or the reality-oracle: we can read it. Would you agree that all the benefits of a theory come from this plus knowing all math? The difference is one of mathematical knowledge, not of physical knowledge. How does Popper help with this? Are there guidelines for which ‘equivalent’ formulations of a theory are mathematically better? If so, this is something that Bayesianism does not try to cover, so this may have value. However, this is unrelated to the question of the validity of “don’t assign probabilities to theories”.
I thought I addressed this but, to recap:
That (well and how much bigger) is all I need to make decisions.
So what? I already have my new probabilities.
What is induction if not the calculation of new probabilities for hypotheses? Should I care about these ‘inductive truths’ that Popper disproves the existence of? I already have an algorithm to calculate the best action to take. It seems like Bayesianism isn’t inductivist by Popper’s definition.
I’d like to be sure that we are using the same definitions of our terms, so please give an example.
You mean proven given some assumptions about what an epistemology should be, right?
No. We need explanations to understand the world. In real life, it is only when we have explanations that we can make good predictions at all. For example, suppose you have a predictive theory about dice and you want to make bets. I chose that example intentionally to engage with areas of your strength. OK, now you face the issue: does a particular real-world situation have the correct attributes for my predictive theory to apply? You have to address that to know if your predictions will be correct or not. We always face this kind of problem to do much of anything. How do we figure out when our theories apply? We come up with explanations about what kinds of situations they apply to, and what situation we are in, and we then come up with explanations about why we think we are/aren’t in the right kind of situation, and we use critical argument to improve these explanations. Bayesian Epistemology does not address all this.
I replied to that. Repeating: if you increase the probability of infinitely many theories, the problem of figuring out a good theory is not solved. So that is not all you need.
Further, I’m still waiting on an adequate answer about what support is (inductive or otherwise) and how it differs from consistency.
I gave examples of moral knowledge in another comment to you. Morality is knowledge about how to live, what is a good life. e.g. murder is immoral.
Yes, I stated my assumptions in the sentence, though I may have missed some.
This comes back to the distinction between one complete theory that fully specifies the universe and a set of theories that are considered to be one because we are only looking at a certain domain. In the former case, the domain of applicability is everywhere. In the latter, we have a probability distribution that tells us how likely it is to fail in every domain. So, this kind of thing is all there in the math.
What do you mean by ‘a good theory’? Bayesians never select one theory as ‘good’ and follow it; we always consider the possibility of being wrong. When theories have higher probability than others, I guess you could call them good. I don’t see why this is hard; just calculate P(H | E) for all the theories and give more weight to the more likely ones when making decisions.
Evidence supports a hypothesis if P(H | E) > P(H). Two statements, A, B, are consistent if ¬(A&B → ⊥). I think I’m missing something.
Let’s consider only theories which make all their predictions with 100% probability for now. And theories which cover everything.
Then:
If H and E are consistent, then it follows that P(H | E) > P(H).
For any given E, consider how much greater the probability of H is, for all consistent H. That amount is identical for all H considered.
We can put all the Hs in two categories: the consistent ones which gain equal probability, and the inconsistent ones for which P(H|E) = 0. (Assumption warning: we’re relying on getting it right which H are consistent with which E.)
This means:
1) consistency and support coincide.
2) there are infinitely many equally supported theories. There are only and exactly two amounts of support that any theory has given all current evidence, one of which is 0.
3) The support concept plays no role in helping us distinguish between the theories with more than 0 support.
4) The support concept can be dropped entirely because it has no use at all. The consistency concept does everything.
5) All mention of probability can be dropped too, since it wasn’t doing anything.
6) And we still have the main problem of epistemology left over, which is dealing with the theories that aren’t refuted by evidence.
Similar arguments can be made without my initial assumptions/restrictions. For example introducing theories that make predictions with less than 100% probability will not help you because they are going to have lower probability than theories which make the same predictions with 100% probability.
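The argument above (deterministic hypotheses, support defined as P(H | E) > P(H)) can be checked on a toy example. The hypotheses `H1` to `H4`, their coin-flip predictions, and both priors are invented for illustration. Each hypothesis predicts every observation with probability 1, so P(E | H) is either 0 or 1, and Bayes’ theorem reduces to dropping the inconsistent hypotheses and renormalizing.

```python
from fractions import Fraction

# Toy hypotheses (hypothetical): each predicts three coin flips with certainty,
# so P(E | H) is 1 if the prediction agrees with E and 0 otherwise.
predictions = {
    "H1": "HHT",
    "H2": "HTT",
    "H3": "HHH",
    "H4": "TTT",
}

def consistent(hypothesis, evidence):
    """Evidence is the prefix of flips observed so far."""
    return predictions[hypothesis].startswith(evidence)

def posterior(prior, evidence):
    """Bayes' theorem with P(E|H) in {0, 1}: drop inconsistent H, renormalize."""
    p_e = sum(p for h, p in prior.items() if consistent(h, evidence))
    return {h: (p / p_e if consistent(h, evidence) else Fraction(0))
            for h, p in prior.items()}

def support_ratio(prior, evidence):
    """P(H|E) / P(H) for each hypothesis (all priors here are nonzero)."""
    post = posterior(prior, evidence)
    return {h: post[h] / prior[h] for h in prior}

uniform = {h: Fraction(1, 4) for h in predictions}
skewed = {"H1": Fraction(1, 2), "H2": Fraction(1, 4),
          "H3": Fraction(1, 8), "H4": Fraction(1, 8)}
```

With the uniform prior and evidence `"H"`, `H1`, `H2`, and `H3` each get a ratio of 4/3 and `H4` gets 0. With the skewed prior and evidence `"HH"`, `H1` and `H3` both get a ratio of 8/5, so their posteriors (4/5 and 1/5) stand in exactly the proportion their priors did. Every surviving hypothesis is multiplied by the same factor, 1/P(E); any ranking among survivors comes from the prior, which is the point the next few replies turn on.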
Well the ratio is the same, but that’s probably what you meant.
Have a prior. This reintroduces probabilities and deals with the remaining theories. You will converge on the right theory eventually no matter what your prior is. Of course, that does not mean that all priors are equally rational.
If they all have the same prior probability, then their probabilities are the same and stay that way. If you use a prior which arbitrarily (in my view) gives some things higher prior probabilities in a 100% non-evidence-based way, I object to that, and it’s a separate issue from support.
How does having a prior save the concept of support? Can you give an example? Maybe the one here, currently near the bottom:
http://lesswrong.com/lw/54u/bayesian_epistemology_vs_popper/3urr?context=3
Well shouldn’t they? If you look at it from the perspective of making decisions rather than finding one right theory, it’s obvious that they are equiprobable and this should be recognized.
Solomonoff does not give “some things higher prior probabilities in a 100% non-evidence-based way”. All hypotheses have the same probability, many just make similar predictions.
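The claim that all programs get the same probability, with many programs making the same predictions, can be illustrated with a toy prefix-free code. The codewords below are a hypothetical stand-in for self-delimiting programs; a real Solomonoff prior runs programs on a universal machine, which this sketch does not model. Every bitstring of length `n` gets equal weight, each string is read until it completes a codeword, and the remaining bits are ignored, so a codeword of length k is shared by 2^(n-k) strings and ends up with total weight 2^(-k).

```python
from fractions import Fraction
from itertools import product

# Hypothetical prefix-free "programs": no codeword is a prefix of another,
# so a reader can tell where each program ends (self-delimiting).
CODEWORDS = ["0", "10", "110", "111"]

def decode(bits):
    """Return the codeword that `bits` starts with, or None if none is complete."""
    for word in CODEWORDS:
        if bits.startswith(word):
            return word
    return None

def induced_weights(n):
    """Give every n-bit string equal weight and total that weight per codeword."""
    weights = {word: Fraction(0) for word in CODEWORDS}
    for tup in product("01", repeat=n):
        word = decode("".join(tup))
        if word is not None:  # strings that run past the length bound are dropped
            weights[word] += Fraction(1, 2 ** n)
    return weights
```

For any `n` of at least 3 this gives weights 1/2, 1/4, 1/8, 1/8 to `0`, `10`, `110`, `111`, and raising `n` further does not change them. With `n = 1`, the string `1` has not finished any codeword and contributes nothing, which mirrors the programs that leave the length bound until the limit is taken.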
Is anyone here working on the problem of parenting/educating AIs?
It seems someone has downvoted you for not being familiar with Eliezer’s work on AI. Basically, this is overly anthropomorphic. It is one of our goals to ensure that an AI can progress from a ‘seed AI’ to a superintelligent AI without anything going wrong, but, in practice, we’ve observed that using metaphors like ‘parenting’ confuses people too much to make progress, so we avoid it.
Don’t worry about downvotes, they do not matter.
I wasn’t using parenting as a metaphor. I meant it quite literally (only the educational part, not the diaper changing).
One of the fundamental attributes of an AI is that it’s a program which can learn new things.
Humans are also entities that learn new things.
But humans, left alone, don’t fare so well. Helping people learn is important, especially children. This avoids having everyone reinvent the wheel.
The parenting issue therefore must be addressed for AI. I am familiar with the main ideas of the kind of AI work you guys do, but I have not found the answer to this.
One possible way to address it is to say the AI will reinvent the wheel. It will have no help but just figure everything out from scratch.
Another approach would be to program some ideas into the AI (changeable, or not, or some of each), and then leave it alone with that starting point.
Another approach would be to talk with the AI, answer its questions, lecture it, etc… This is the approach humans use with their children.
Each of these approaches has various problems with it which are non-trivial to solve.
Make sense so far?
When humans hear parenting, they think of the human parenting process. Describe the AI as ‘learning’ and the humans as ‘helping it learn’. This gets us closer to the idea of humans learning about the universe around them, rather than being raised as generic members of society.
Well, the point of down votes is discourage certain behaviour, and I agree that you should use terminology that we have found less likely to cause confusion.
AIs don’t necessarily have so much of a problem with this. They learn very differently than humans: http://lesswrong.com/lw/jo/einsteins_arrogance/ , http://lesswrong.com/lw/qj/einsteins_speed/ , http://lesswrong.com/lw/qk/that_alien_message/
This is definitely an important problem, but we’re not really at the stage where it is necessary yet. I don’t see how we could make much progress on how to get an AI to learn without knowing the algorithms that it will use to learn.
Not all humans. Not me. Is that not a bias?
I don’t let myself be discouraged without any argument being given, just on the basis of someone’s judgement without knowing the reason. I don’t think I should. I think that would be irrational. I’m surprised that this community wants to encourage people to conform to the collective opinion of others as expressed by votes.
OK, I think I see where you are coming from. However, there is only one known algorithm that learns (creates knowledge). It is, in short, evolution. We should expect an AI to use it, we shouldn’t expect a brand new solution to this hard problem (historically there have been very few candidate solutions proposed, most not at all promising).
The implementation details are not very important because the result will be universal, just like people are. This is similar to how the implementation details of universal computers are not important for many purposes.
Are you guys familiar with these concepts? There is important knowledge relevant to creating AIs which your statement seems to me to overlook.
Yes, that would be a bias. Note that this kind of bias is not always explicitly noticed.
As a general rule, if I downvote, I either reply to the post, or it is something that should be obvious to someone who has read the main sequences.
No, there is another: the brain. It is also much faster than evolution, an advantage I would want a FAI to have.
You are unfamiliar with the basic concepts of evolutionary epistemology. The brain internally does evolution of ideas.
Why is it that you guys want to make AI but don’t study relevant topics like this?
You’re conflating two things. Biological evolution is a very specific algorithm, with well-studied mathematical properties. ‘Evolution’ in general just means any change over time. You seem to be using it in an intermediate sense, as any change that proceeds through reproduction, variation, and selection, which is also a common meaning. This, however, is still very broad, so there’s very little that you can learn about an AI just from knowing “it will come up with many ideas, mostly based on previous ones, and reject most of them”. This seems less informative than “it will look at evidence and then rationally adjust its understanding”.
There’s an article related to this: http://lesswrong.com/lw/l6/no_evolutions_for_corporations_or_nanodevices/
Eliezer has studied cognitive science. Those of us not working directly with him have very little to do with AI design. Even Eliezer’s current work is slightly more background theory than AI itself.
I’m not conflating them. I did not mean “change over time”.
There are many things we can learn from evolutionary epistemology. It seeming broad to you does not prevent that. You would do better to ask what good it is instead of guess it is no good.
For one thing it connects with meme theory.
A different example is that it explains misunderstandings when people communicate. Misunderstandings are extremely common because communication involves 1) guessing what the other person is trying to say 2) selecting between those guesses with criticism 3) making more guesses which are variants of previous guesses 4) more selection 5) etc.
This explanation helps us see how easily communication can go wrong. It raises interesting issues like why so much communication doesn’t go wrong. It refutes various myths like that people absorb their teacher’s lectures a little like sponges.
It matters. And other explanations of miscommunication are worse.
But that isn’t the topic I was speaking of. I meant evolutionary epistemology. Which btw I know that Eliezer has not studied much because he isn’t familiar with one of its major figures (Popper).
I don’t know enough about evolutionary epistemology to evaluate the usefulness and applicability of its ideas.
How was evolutionary epistemology tested? Are there experiments or just introspection?
Evolution is a largely philosophical theory (distinct from the scientific theory about the history of life on earth). It is a theory of epistemology. Some parts of epistemology technically depend on the laws of physics, but it is generally researched separately from physics. There has not been any science experiment to test it which I consider important, but I could conceive of some, because if you specified different and perverse laws of physics you could break evolution. In a different sense, evolution is tested constantly, in that the laws of physics and the evidence we see around us every day are not the perverse-but-conceivable physics that would break evolution.
The reason I accept evolution (again I refer to the epistemological theory about how knowledge is created) is that it is a good explanation, and it solves an important philosophical problem, and I don’t know anything wrong with it, and I also don’t know any rivals which solve the problem.
The problem has a long history. Where does “apparent design” come from? Paley gave an example of finding a watch in nature, which he said you know can’t have gotten there by chance. That’s correct—the watch has knowledge (aka apparent design, or purposeful complexity, or many other terms). The watch is adapted “to a purpose” as some people put it (I’m not really a fan of the purpose terminology. But it’s adapted! And I think it gets the point across ok.)
Paley then guessed as follows: there is no possible solution to the origins of knowledge other than “A designer (God) created it”. This is a very bad solution even pre-Darwin because it does not actually solve the problem. The designer itself has knowledge, adaptation to a purpose, whatever. So where did it come from? The origin is not answered.
Since then, the problem has been solved by the theory of evolution and nothing else. And it applies to more than just watches found in nature, or to plants and animals. It also applies to human knowledge. The answer “intelligence did it” is no better than “God did it”. How does intelligence do it? The only known answer is: by evolution.
The best thing to read on this topic is The Beginning of Infinity by David Deutsch which discusses Popperian epistemology, evolution, Paley’s problem and its solution, and also has two chapters about meme theory which give important applications.
You can also find some, e.g. here: http://fallibleideas.com/evolution-and-knowledge
Also here: http://fallibleideas.com/tradition (Deutsch discusses static and dynamic memes and societies. I discuss “traditions” rather than “memes”. It’s quite similar stuff.)
What? Epistemological evolution seems to be about how the mind works, independent of what philosophical status is accorded to the thoughts. Surely it could be tested just by checking if the mind actually develops ideas in accordance with the way it is predicted to.
If you want to check how minds work, you could do that. But that’s very hard. We’re not there yet. We don’t know how.
How minds work is a separate issue from evolutionary epistemology. Epistemology is about how knowledge is created (in the abstract, not in human minds specifically). If it turns out there is another way, that wouldn’t undermine the claim that evolution, if done in minds, would create knowledge.
There’s no reason to think there is another way. No argument that there is. No explanation of why to expect there to be. No promising research on the verge of working one out. Shrug.
I see. I thought that evolutionary epistemology was a theory of human minds, though I know that that technically isn’t epistemology. Does evolutionary epistemology describe knowledge about the world, mathematical knowledge, or both (I suspect you will say both)?
It describes the creation of any type of knowledge. It doesn’t tell you the specifics of any field itself, but doing it helps you learn them.
So, you’re saying that in order to create knowledge, there has to be copying, variation, and selection. I would agree with the first two, but not necessarily the third. Consider a formal axiomatic system. It produces an ever-growing list of theorems, but none are ever selected any more than others. Would you still consider this system to be learning?
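A mechanical formal system of the kind described can be sketched with Hofstadter’s MIU system, used here only as a familiar stand-in (the comment does not name a specific system). Starting from the axiom `MI`, four rewrite rules are applied to every theorem found so far; the set of theorems only grows, and nothing in the procedure ranks one theorem above another.

```python
def successors(theorem):
    """Apply each MIU rule everywhere it fits; return the set of results."""
    out = set()
    if theorem.endswith("I"):                    # Rule 1: xI -> xIU
        out.add(theorem + "U")
    if theorem.startswith("M"):                  # Rule 2: Mx -> Mxx
        out.add("M" + theorem[1:] * 2)
    for i in range(len(theorem) - 2):            # Rule 3: III -> U
        if theorem[i:i + 3] == "III":
            out.add(theorem[:i] + "U" + theorem[i + 3:])
    for i in range(len(theorem) - 1):            # Rule 4: UU -> (deleted)
        if theorem[i:i + 2] == "UU":
            out.add(theorem[:i] + theorem[i + 2:])
    return out

def theorems(axiom="MI", steps=3):
    """Breadth-first: every theorem reachable in at most `steps` rule applications."""
    found = {axiom}
    frontier = {axiom}
    for _ in range(steps):
        frontier = {t for s in frontier for t in successors(s)} - found
        found |= frontier
    return found
```

After two steps the theorems are `MI`, `MIU`, `MII`, `MIUIU`, `MIIU`, and `MIIII`; each further step only adds to that set, and every string is produced the same way regardless of whether anyone would find it interesting.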
With deduction, all the consequences are already contained in the premises and axioms. Abstractly, that’s not learning.
When human mathematicians do deduction, they do learn stuff, because they also think about stuff while doing it, they don’t just mechanically and thoughtlessly follow the rules of math.
So induction (or probabilistic updating, since you said that Popper proved it not to be the same as whatever philosophers call ‘induction’) isn’t learning either because the conclusions are contained in the priors and observations?
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
The idea of induction is that the conclusions are NOT logically contained in the observations (that’s why it is not deduction).
If you make up a prior from which everything deductively follows, and everything else is mere deduction from there, then all of your problems and mistakes are in the prior.
No. Learning is creating new knowledge. That would simply be human programmers putting their own knowledge into a prior, and then the machine not creating any new knowledge that wasn’t in the prior.
The correct method of updating one’s probability distributions is contained in the observations. P(H|E) = P(H)P(E|H)/P(E) .
So how could evolutionary epistemology be relevant to AI design?
AIs are programs that create knowledge. That means they need to do evolution. That means they need, roughly, a conjecture generator, a criticism generator, and a criticism evaluator. The conjecture generator might double as the criticism generator since a criticism is a type of conjecture, but it might not.
The conjectures need to be based on the previous conjectures (not necessarily all of the, but some). That makes it replication with variation. The criticism is the selection.
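The structure described here (conjectures varied from earlier conjectures, criticism acting as selection) can be sketched as a simple evolutionary loop. The problem, the criticism rules, and all function names are hypothetical: conjectures are 8-bit tuples, and each criticism names one required bit that a conjecture gets wrong. It is a minimal sketch of the replicate-vary-select skeleton only, not a proposal for how an AI should work.

```python
import random

# Hypothetical toy problem: each rule is (position, required value).
RULES = [(0, 1), (1, 0), (2, 1), (3, 1), (4, 0), (5, 1), (6, 0), (7, 1)]

def conjecture_variants(parent, rng, k=10):
    """Conjecture generator: copy the parent and flip one random bit in each copy."""
    variants = []
    for _ in range(k):
        child = list(parent)
        i = rng.randrange(len(child))
        child[i] ^= 1
        variants.append(tuple(child))
    return variants

def criticize(conj):
    """Criticism generator: one message per rule the conjecture violates."""
    return [f"bit {pos} should be {val}" for pos, val in RULES if conj[pos] != val]

def evolve(generations=500, seed=0):
    """Vary the current best conjecture and keep whichever survives criticism best."""
    rng = random.Random(seed)
    best = tuple(rng.randrange(2) for _ in range(len(RULES)))
    history = [len(criticize(best))]
    for _ in range(generations):
        if not criticize(best):
            break
        # Criticism evaluator: fewest criticisms wins; ties keep the parent.
        best = min([best] + conjecture_variants(best, rng),
                   key=lambda c: len(criticize(c)))
        history.append(len(criticize(best)))
    return best, history
```

Because the parent is always among the candidates, the number of criticisms never increases from one generation to the next. All of the problem-specific knowledge here sits in `RULES`, which a programmer wrote; whether a loop like this counts as creating knowledge is what the following comments debate.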
Any AI design that completely ignores this is, imo, hopeless. I think that’s why the AI field hasn’t really gotten anywhere. They don’t understand what they are trying to make, because they have the wrong philosophy (in particular the wrong explanations; I don’t mean math or logic).
Could you explain where AIXI does any of that?
Or could you explain where a Bayesian spam filter does that?
Note that there are AI approaches which do do something close to what you think an AI “needs”. For example, some of Simon Colton’s work can be thought of in a way roughly like what you want. But it is a mistake to think that such an entity needs to do that. (Some of the hardcore Bayesians make the same mistake in assuming that an AI must use a Bayesian framework. That something works well as a philosophical approach is not the same claim as that it should work well in a specific setting where we want an artificial entity to produce certain classes of systematic reliable results.)
Those aren’t AIs. They do not create new knowledge. They do not “learn” in my sense—of doing more than they were programmed to. All the knowledge is provided by the human programmer—they are designed by an intelligent person and to the extent they “act intelligent” it’s all due to the person providing the thinking for it.
I’m not sure this is at all well-defined. I’m curious, what would make you change your mind? If for example, Colton’s systems constructed new definitions, proofs, conjectures, and counter-examples in math would that be enough to decide they were learning?
How about it starts by passing the Turing test?
Or: show me the code, and explain to me how it works, and how the code doesn’t contain all the knowledge the AI creates.
Could you explain how this is connected to the issue of making new knowledge?
This seems a bit like showing a negative. I will suggest you look for a start at Simon Colton’s paper in the Journal of Integer Sequences which uses a program that operates in a way very close to the way you think an AI would need to operate in terms of making conjectures and trying to refute them. I don’t know if the source code is easily available. It used to be on Colton’s website but I don’t see it there anymore; if his work seems at all interesting to you you can presumably email him requesting a copy. I don’t know how to show that the AI “doesn’t contain all the knowledge the AI creates” aside from the fact that the system constructed concepts and conjectures in number theory which had not previously been constructed. Moreover, Colton’s own background in number theory is not very heavy, so it is difficult to claim that he’s importing his own knowledge into the code. If you define more precisely what you mean by the code containing the knowledge I might be able to answer that further. Without a more precise notion it isn’t clear to me how to respond.
Holding a conversation requires creating knowledge of what the other guy is saying.
In deduction, you agree that the conclusions are logically contained in the premises and axioms, right? They aren’t something new.
In a spam filter, a programmer figures out how he wants spam filtered (he has the idea), then he tells the computer to do it. The computer doesn’t figure out the idea or any new idea.
With biological evolution, for example, we see something different. You get stuff out, like cats, which weren’t specified in advance. And they aren’t a trivial extension; they contain important knowledge such as the knowledge of optics that makes their eyes work. This is why “Where can cats come from?” has been considered an important question (people want an explanation of the knowledge which i sometimes called “apparent design), while “Where can rocks come from?” is not in the same category of question (it does have some interest for other reasons).
People create ideas that aren’t in their genes and weren’t told to them by their parents or anyone else. That includes abstract ideas that aren’t the summation of observation. They sometimes create ideas no one ever thought of before. They create new ideas.
An AI (AGI, as you call it?) should be like a person: it should create new ideas which are not in its “genes” (programming). If someone actually writes an AI, they will understand how it works and they can explain it, and we can use their explanation to judge whether they “cheated” or not (whether they, e.g., hard-coded some ideas into the program and then said the AI invented them).
Ok. So to make sure I understand this claim. You are asserting that mathematicians are not constructing anything “new” when they discover proofs or theorems in set axiomatic systems?
Are genetic algorithm systems then creating something new by your definition?
Different concepts. An artificial intelligence is not (necessarily) a well-defined notion. An AGI is an artificial general intelligence, essentially something that passes the Turing test. Not the same concept.
I see no reason to assume that a person will necessarily understand how an AGI they constructed works. To use the most obvious hypothetical, someone might make a neural net modeled very closely after the human brain that functions as an AGI without any understanding of how it works.
When you “discover” that 2+1 = 3, given premises and axioms, you aren’t discovering something new.
But working mathematicians do more than that. They create new knowledge. It includes:
1) They learn new ways to think about the premises and axioms.
2) They do not publish deductively implied facts unselectively or randomly. They choose the ones that they consider important. By making these choices they are adding content not found in the premises and axioms.
3) They make choices between different possible proofs of the same thing. Again, where they make choices they are adding stuff, based on their own non-deductive understanding.
4) When mathematicians work on proofs, they also think about stuff as they go, just like experimental scientists doing fairly mundane tasks in a lab will, at the same time, think and make the work interesting with their thoughts.
They could be. I don’t think any exist yet that do. For example I read a Dawkins paper about one. In the paper he basically explained how he tweaked the code in order to get the results he wanted. He didn’t, apparently, realize that it was him, not the program, creating the output.
By “AI” I mean AGI. An intelligence (like a person) which is artificial. Please read all my prior statements in light of that.
Well, OK, but they’d understand how it was created, and could explain that. They could explain what they know about why it works (it copies what humans do). And they could also make the code public and discuss what it doesn’t include (e.g. hard-coded special cases, except for the 3 he included on purpose and explains the reasons for). That’d be pretty convincing!
I don’t think this is true. While he probably wouldn’t announce it if he was working on AI, he has indicated that he’s working on two books (HPMoR and a rationality book), and has another book queued. He’s also indicated that he doesn’t think anyone should work on AI until the goal system stability problem is solved, which he’s talked about thinking about but hasn’t published anything on, which probably means he’s stuck.
I more meant “he’s probably thinking about this in the back of his mind fairly often”, as well as trying to be humorous.
Do you know what he would think of work that has a small chance of solving goal stability and a slightly larger chance of helping with AI in general? This seems like a net plus to me, but you seem to have heard what he thinks should be studied from a slightly clearer source than I did.
I do not consider it possible to predict the growth of knowledge. That means you cannot predict, for example, the consequences of a scientific discovery that you have not yet discovered.
The reason is that if you could predict this you would in effect already have made the discovery.
Understanding is not primarily predictive and it is useful in a practical way. For example, you have to understand issues to address critical arguments offered by your peers. Merely predicting that they are wrong isn’t a good approach. It’s crucial to understand what their point is and to reason with them.
Understanding ideas helps us improve on them. It’s crucial to making judgments about what would be an improvement or not. It lets us judge changes better b/c e.g. we have some conception of why it works, which means we can evaluate what would break it or not.
That is not what I meant. If we could predict that the LHC will discover superparticles then yes, we would already know that. However, since we don’t know whether it will produce superparticles, we can predict that it will give us a lot of knowledge, since we will either learn that superparticles in the mass range detectable by the LHC exist or that they do not exist, so we can predict that we will learn a lot more about the universe by continuing to run the LHC than by filling in the tunnel where it is housed.
Eliezer proves that you cannot predict which direction science will go from a Bayesian perspective in http://lesswrong.com/lw/ii/conservation_of_expected_evidence/ .
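Both points in this exchange can be checked with a few lines of arithmetic. The sketch below assumes a single yes/no hypothesis and made-up likelihoods (0.3, 0.9 and 0.05 are illustrative numbers, not LHC physics). It shows that the expected posterior equals the prior, so the direction of the update cannot be predicted in advance, while the expected entropy of the belief still falls, which is the sense in which one can predict learning something by running the experiment.

```python
from math import log2

def posterior(prior, likelihood_h, likelihood_not_h):
    """Bayes' theorem for a binary hypothesis H given one observed outcome."""
    joint_h = prior * likelihood_h
    joint_not_h = (1 - prior) * likelihood_not_h
    return joint_h / (joint_h + joint_not_h)

def entropy(p):
    """Shannon entropy in bits of a belief with P(H) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

# Hypothetical numbers: H = "superparticles exist in the detectable mass range".
prior = 0.3
p_signal_h, p_signal_not_h = 0.9, 0.05  # assumed P(signal | H), P(signal | not H)

p_signal = prior * p_signal_h + (1 - prior) * p_signal_not_h
post_signal = posterior(prior, p_signal_h, p_signal_not_h)
post_no_signal = posterior(prior, 1 - p_signal_h, 1 - p_signal_not_h)

# Conservation of expected evidence: averaged over outcomes, the posterior equals the prior.
expected_posterior = p_signal * post_signal + (1 - p_signal) * post_no_signal

# Expected uncertainty after the experiment is lower than before it.
expected_entropy_after = (p_signal * entropy(post_signal)
                          + (1 - p_signal) * entropy(post_no_signal))

print(expected_posterior, prior)
print(entropy(prior), expected_entropy_after)
```

The entropy drop is not special to these numbers: entropy is concave, so averaging it over the possible posteriors can never exceed the entropy of their average, which is the prior.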
So if new knowledge doesn’t come from prediction, what creates it? Answering this is one of epistemology’s main tasks. If you are focussing on prediction then you aren’t addressing it. Am I missing something?
New knowledge comes from observation. If you are referring to knowledge of what a theory says rather than of which theory is true, then this is assumed to be known. The math of how to deal with a situation where a theory is known but its consequences cannot be fully understood due to mathematical limitations is still in its infancy, but this has never posed a problem in practice.
That is a substantive and strong empiricist claim which I think is false.
For example, we have knowledge of things we never observed. Like stars. Observation is always indirect and its correctness always depends on theories such as our theories about whether the chain of proxies we are observing with will in fact observe what we want to observe.
Do you understand what I’m talking about and have a reply, or do you need me to explain further?
Could you understand why I might object to making a bunch of assumptions in one’s epistemology?
The new knowledge that is obtained from an observation is not just the content of the observation, it is also the new probabilities resulting from the observation. This is discussed at http://lesswrong.com/lw/pb/belief_in_the_implied_invisible/ .
It is assumed in practice, applied epistemology being a rather important thing to have. In ‘pure’ epistemology, it is just labelled incomplete; we definitely don’t have all the answers yet.
It seems to me that you’re pretty much conceding that your epistemology doesn’t work. (All flaws can be taken as “incomplete” parts where, in the future, maybe a solution will be found.)
That would leave the following important disagreement: Popper’s epistemology is not incomplete in any significant way. There is room for improvement, sure, but not really any flaws worth complaining about. No big unsolved problems marring it. So, why not drop this epistemology that doesn’t have the answers yet for one that does?
Would you describe quantum mechanics’ incompatibility with general relativity as “the theory doesn’t work”? For a being with unlimited computing power in a universe that is known to be computable (except for the being itself obviously), we are almost entirely done. Furthermore, many of the missing pieces to get from that to something much more complete seem related.
No, it is just wrong. Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories. Any consistent consequentialist decision rule must be basically equivalent to that. The statement that there is no way to assign probabilities to theories therefore implies that there is no algorithm that a consequentialist can follow to reliably achieve their goals. Note that even if Popper’s values are not consequentialist, a consequentialist should still be able to act based on the knowledge obtained by a valid epistemology.
Can you be more specific?
I suspect you are judging Popperian epistemology by standards it states are mistaken. Would you agree that doing that would be a mistake?
Note the givens. There are more givens which you didn’t mention, too, e.g. some assumptions about people’s utilities having certain mathematical properties (you need this for, e.g., comparing them).
I don’t believe these givens are all true. If you think otherwise could we start with you giving the details more? I don’t want to argue with parts you simply omitted b/c I’ll have to guess what you think too much.
As a separate issue, “given my preferences” is such a huge given. It means that your epistemology does not deal in moral knowledge. At all. It simply takes preferences as givens and doesn’t tell you which to have. So in practice in real life it cannot be used for a lot of important issues. That’s a big flaw. And it means a whole entire second epistemology is needed to deal in moral knowledge. And if we have one of those, and it works, why not use it for all knowledge?
The rest of the paragraph was what I meant by this. You agree that Popperian epistemology states that theories should not be assigned probabilities.
Depends. If its standards make it useless, then, while it is internally consistent, I can judge it to be pointless. I just want an epistemology that can help me actually make decisions based on what I learn about reality.
I don’t think I was clear. A utility here just means a number I use to say how good a possible future is, so I can decide whether I want to work toward that future. In this context, it is far more general than anything composed of a bunch of terms, each of which describes some properties of a person.
I can learn more about my preferences from observation of my own brain using standard Bayesian epistemology.
Popperian epistemology does this. What’s the problem? Do you think that assigning probabilities to theories is the only possible way to do this?
Overall you’ve said almost nothing that’s actually about Popperian epistemology. You just took one claim (which has nothing to do with what it’s about, it’s just a minor point about what it isn’t) and said it’s wrong (without detailed elaboration).
I understood that. I think you are conflating “utility” the mathematical concept with “utility” the thing people in real life have. The second may not have the convenient properties the first has. You have not provided an argument that it does.
How do you learn what preferences are good to have, in that way?
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules. Even if the probabilities are not mentioned when constructing the rule, they can be inferred from its final form.
I don’t know what you mean by ′ “utility” the thing people in real life have’.
Can we please not get into this. If it helps, assume I am an expected paperclip maximizer. How would I decide then?
What was the argument for that?
And what is the argument that actions should be judged ONLY by consequences? What is the argument for excluding all other considerations?
People have preferences and values. e.g. they might want a cat or an iPhone and be glad to get it. The mathematical properties of these real life things are not trivial or obvious. For example, suppose getting the cat would add 2 happiness and the iPhone would add 20. Would getting both add 22 happiness? Answer: we cannot tell from the information available.
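The point that single-item values do not fix the value of the pair can be made concrete. In this sketch both utility functions are hypothetical: they agree on the cat (2) and the iPhone (20) from the example, but the second adds a made-up interaction term, so the bundle comes out at 17 instead of 22. Nothing in the single-item numbers distinguishes the two.

```python
# Single-item values from the example; everything else here is hypothetical.
ITEM_VALUES = {"cat": 2, "iphone": 20}

def additive(bundle):
    """Assumes utilities simply add: a bundle is worth the sum of its items."""
    return sum(ITEM_VALUES[item] for item in bundle)

def with_interaction(bundle):
    """Same single-item values, but owning both items interacts (assumed penalty)."""
    total = additive(bundle)
    if {"cat", "iphone"} <= set(bundle):
        total -= 5  # made-up penalty, e.g. less attention left over for each
    return total

for u in (additive, with_interaction):
    print(u.__name__, u({"cat"}), u({"iphone"}), u({"cat", "iphone"}))
# → additive 2 20 22
# → with_interaction 2 20 17
```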
But the complete amorality of your epistemology, its total inability to create entire categories of knowledge, is a severe flaw in it. There are plenty of other examples I could use to make the same point, though in general they are a bit less clear. One example is epistemology itself: epistemology is also not an empirical field. But I imagine you may argue about that a bunch, while with morality I think it’s clearer.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
Agreed, and agreed that this is a common mistake. If you thought I was making this error, I was being far less clear than I thought.
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
The original is Abraham Wald’s An Essentially Complete Class of Admissible Decision Functions.
Thank you!
I thought you didn’t address the issue (and need to): you did not say what mathematical properties you think that real utilities have and how you deal with them.
Using what premises?
What about explanations about whether it was a reasonable decision for the person to make that action, given the knowledge he had before making it?
Ordered. But I think you should be more cautious asserting things that other people told you were true, which you have not checked up on.
Every possible universe is associated with a utility.
Any two utilities can be compared.
These comparisons are transitive.
Weighted averages of utilities can be taken.
For any three possible universes, L, M, and N, with L < M, a weighted average of L and N is less than a weighted average of M and N, if N is accorded the same weight in both cases.
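If each universe is assigned a plain real number and a “weighted average” is an ordinary convex combination, all five properties in this list hold automatically. The sketch below spot-checks the last one (independence) on random samples. It assumes utilities are real numbers, which is exactly the step disputed later in the thread, so it illustrates what the properties say rather than settling whether real preferences have them.

```python
import random

def mix(u_first, u_second, weight_first):
    """Weighted average of two utilities; weight_first is the weight on u_first."""
    return weight_first * u_first + (1 - weight_first) * u_second

def independence_holds(trials=10_000, seed=0):
    """
    Spot-check the last property: if L < M, mixing each with N
    (N getting the same weight on both sides) preserves the order.
    Assumes utilities are plain real numbers.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        l, m, n = (rng.uniform(-100, 100) for _ in range(3))
        if l > m:
            l, m = m, l  # make sure l <= m
        if l == m:
            continue
        p = rng.uniform(0.01, 1.0)  # weight on L or M; N gets 1 - p in both mixtures
        if not mix(l, n, p) < mix(m, n, p):
            return False
    return True

print(independence_holds())
```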
Basically just definitions. I’m currently trying to enumerate them, which is why I wanted to find the proof of the theorem we were discussing.
Care about in the sense of when I’m deciding whether to make it. I don’t really care about how reasonable other people’s decisions are unless it’s relevant to my interactions with them, where I will need that knowledge to make my own decisions.
Wait, you bought the book just for that proof? I don’t even know if it’s the best proof of it (in terms of making assumptions that aren’t necessary to get the result). I’m confident in the proof because of all the other similar proofs I’ve read, though none seem as widely applicable as that one. I can almost sketch a proof in my mind. Some simple ones are explained well at http://en.wikipedia.org/wiki/Coherence_%28philosophical_gambling_strategy%29 .
For your first 5 points, how is that a reply about Popper? Maybe you meant to quote something else.
I don’t think that real people’s way of considering utility is based on entire universes at a time. So I don’t think your math here corresponds to how people think about it.
No, I used inter library loan.
Then put yourself in as the person under consideration. Do you think it matters whether you make decisions using rational thought processes, or do only the (likely?) consequences matter?
How do you judge whether you have the right ones? You said “entirely deductive” above, so are you saying you have a deductive way to judge this?
Yes, I did. Oops.
But that is what a choice is between: the universe where you choose one way and the universe where you choose another. Often large parts of the universe are ignored, but only because the action’s consequences for those parts are not distinguishable from how those parts would be if a different action was taken. A utility function may be a sum (or more complicated combination) of parts referring to individual aspects of the universe, but, in this context, let’s not call those ‘utilities’; we’ll reserve that word for the final thing used to make decisions. Most of this is not consciously invoked when people make decisions, but a choice that does not stand when you consider its expected effects on the whole universe is a wrong choice.
If I could achieve better consequences using an ‘irrational’ process, I would, but this sounds nonsensical because I am used to defining ‘rational’ as that which reliably gets the best consequences.
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
I don’t think I understand. This would rely on your conception of the real life situation (if you want it to apply to real life), of what makes sense, being correct. That goes way beyond deduction or definitions into substantive claims.
About decisions, if a method like “choose by whim” gets you a good result in a particular case, you’re happy with it? You don’t care that it doesn’t make any sense if it works out this time?
So what? I think you’re basically saying that your formulation is equivalent to what people (should) do. But that doesn’t address the issue of what people actually do—it doesn’t demonstrate the equivalence. As you guys like to point out, people often think in ways that don’t make sense, including violating basic logic.
But also, for example, I think a person might evaluate getting a cat, and getting an iPhone, and then they might (incorrectly) evaluate both by adding the benefits instead of by considering the universe with both based on its own properties.
Another issue is that I don’t think any two utilities people have can be compared. They are sometimes judged with different, contradictory standards. This leads to two major issues when trying to compare them: 1) the person doesn’t know how; 2) it might not be possible to compare them even in theory, because one or both contain some mistakes. The mistakes might need to be fixed before comparing, but that would change them.
I’m not saying people are doing it correctly. Whether they are right or wrong has no bearing on whether “utility” the mathematical object with the 5 properties you listed corresponds to “utility” the thing people do.
If you want to discuss what people should do, rather than what they do do, that is a moral issue. So it leads to questions like: how does Bayesian epistemology create moral knowledge and how does it evaluate moral statements?
If you want to discuss what kind of advice is helpful to people (which people?), then I’m sure you can see how talking about entire universes could easily confuse people, and how some other procedure being a special case of it may be poor advice that does not address the practical problems they are having.
Do you think that the Dutch book arguments go “way beyond deductive or definitions”? Well, I guess that would depend on what you conclude from them. For now, let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1, and probabilities of mutually exclusive events should add”.
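The standard Dutch book can be sketched for the simplest case of those conditions: an event A and its negation. The betting rule is an assumption of the sketch: the agent will buy or sell a ticket paying `stake` if the event occurs, at price credence × stake. If the two credences do not sum to 1, the bookie gains the same amount however A turns out.

```python
def bookie_profit(p_a, p_not_a, a_is_true, stake=1.0):
    """
    Bookie's net gain against an agent who trades a ticket paying `stake`
    if an event occurs at price credence * stake (assumed betting rule).
    """
    total_price = (p_a + p_not_a) * stake
    # Exactly one of the tickets "A" and "not A" pays out, whichever way A turns out.
    payout = (stake if a_is_true else 0.0) + (0.0 if a_is_true else stake)
    if p_a + p_not_a > 1:
        return total_price - payout  # bookie sold both tickets to the agent
    if p_a + p_not_a < 1:
        return payout - total_price  # bookie bought both tickets from the agent
    return 0.0  # credences sum to 1: this particular book yields no sure profit

for outcome in (True, False):
    print(outcome, bookie_profit(0.6, 0.6, outcome))
```

With credences 0.6 and 0.6 the bookie nets about 0.2 per unit stake in both branches; with 0.25 and 0.75 this book yields nothing.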
The confusion here is that we’re not judging an action. If I make a mistake and happen to benefit from it, there were good consequences, but there was no choice involved. I don’t care about this; it already happened. What I do care about, and what I can accomplish, is avoiding similar mistakes in the future.
Yes, that is what I was discussing. I probably don’t want to actually get into my arguments here. Can you give an example of what you mean by “moral knowledge”?
Applying Dutch book arguments to real life situations always goes way beyond deduction and definitions, yes.
A need? Are you talking about morality now?
Why are we saying this? You now speak of probabilities of events. Previously we were discussing epistemology which is about ideas. I object to assigning probabilities to the truth of ideas. Assigning them to events is OK when
1) the laws of physics are indeterministic (never, as far as we know)
2) we have incomplete information and want to make a prediction that would be deterministic, except that we have to put several possibilities in some places, which leads to several possible answers. Probability is a reasonable way to organize thoughts about that.
So what?
Murder is immoral.
Being closed-minded makes one’s life worse because it sabotages improvement.
Are you saying Popper would evaluate “Murder is immoral.” in the same way as “Atoms are made up of electrons and a nucleus.”? How would you test this? What would you consider a proof of it?
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means. I am a moral realist to some, a relativist to others, and an error theorist to other others. I could prove the statement for many common non-confused definitions, though not for two groups: people who say ‘morality’ is synonymous with ‘that which is commanded by God’ (a definition based on confusion, though at least everyone can agree on when it is or isn’t true), and error theorists. Both groups’ definitions make the sentence false.
In theory I could prove this sentence, but in practice I could not do this clearly, especially over the internet. It would probably be much easier for you to read the sequences, which get to this toward the end, but, depending on your answers to some of my questions, there may be an easier way to explain this.
Yes. One epistemology. All types of knowledge. Unified!
You would not.
We don’t accept proofs of anything, we are fallibilists. We consider mathematical proofs to be good arguments though. I don’t really want to argue about those (unless you’re terribly interested. btw this is covered in the math chapter of The Fabric of Reality by David Deutsch). But the point is we don’t accept anything as providing certainty or even probableness. In our terminology, nothing provides justification.
What we do instead is explain our ideas, and criticize mistakes, and in this way improve our ideas. This, btw, creates knowledge in the same way as evolution (replication of ideas, with variation, and selection by criticism). That’s not a metaphor or analogy but literally true.
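The bare replication/variation/selection loop can be sketched in a few lines. This is a toy over bit strings, not a model of human thought: the selection criterion (`score`, here “more 1 bits is better”) is a hypothetical stand-in for criticism and is supplied by hand, so the sketch shows only the mechanism, not where the criterion comes from. Because the better half always survives, the best score never goes down between generations.

```python
import random

def evolve(score, length=20, population=30, generations=200, mutation_rate=0.05, seed=1):
    """
    Toy replication/variation/selection loop over bit strings.

    `score` stands in for criticism and is chosen by hand (an assumption of
    this sketch). Returns the best final candidate and the best score at
    each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population)]
    history = []
    for _ in range(generations):
        # Selection: rank by the criterion and keep the better half.
        pop.sort(key=score, reverse=True)
        survivors = pop[: population // 2]
        history.append(score(survivors[0]))
        # Replication with variation: each survivor yields one copy with random bit flips.
        children = [
            [bit ^ 1 if rng.random() < mutation_rate else bit for bit in parent]
            for parent in survivors
        ]
        pop = survivors + children
    best = max(pop, key=score)
    return best, history

# Hypothetical criterion: more 1 bits is better.
best, history = evolve(score=sum)
print(sum(best), history[0], history[-1])
```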
Wouldn’t it be nice if you had an epistemology that helped you deal with all kinds of knowledge, so you didn’t have to simply give up on applying reason to important issues like what is a good life, and what are good values?
Fine, what would you consider an argument for it?
Eliezer and I probably agree with you.
Well, biological evolution is a much smaller part of conceptspace than “replication, variation, selection” and now I’m realizing that you probably haven’t read A Human’s Guide to Words which is extremely important and interesting and, while you’ll know much of it, has things that are unique and original and that you’ll learn a lot from. Please read it.
I do apply reason to those things, I just don’t use the word ‘morality’ in my reasoning process because too many people get confused. It is only a word after all.
On a side note, I am starting to like what I hear of Popper. It seems to embody an understanding of the brain and a bunch of useful advice for it. I think I disagree with some things, but on grounds that seem like the sort of thing that is accepted as motivation for the theory to self-modify. Does that make sense? Anyway, it’s not Popper’s fault that there is a set of theorems that in principle remove the need for other types of thought and in practice cause big changes in the way we understand and evaluate the heuristics that are necessary because the brain is fallible and computationally limited.
Wei Dai likes thinking about how to deal with questions outside of Bayesianism’s current domain of applicability, so he might be interested in this.
Interpret this as a need in order to achieve some specified goal in order to keep this part the debate out of morality. A paperclip maximizer, for example would obviously need to not pay 200 paperclips for a lottery with a maximum payout of 100 paperclips in order to achieve its goals. Furthermore, this applies to any consequentialist set of preferences.
Not sure why I wrote that. Substitute ‘theories’.
So you assume morality (the “specified goal”). That makes your theory amoral.
Why is there a need to assign probabilities to theories? Popperian epistemology functions without doing that.
Well there’s a bit more than this, but it’s not important right now. One can work toward any goal just by assuming it as a goal.
Because of the Dutch book arguments. The probabilities can be inferred from the choices. I’m not sure if the agent’s probability distribution can be fully determined from a finite set of wagers, but it can definitely be inferred to an arbitrary degree of precision by adding enough wagers.
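The “arbitrary precision by adding enough wagers” claim can be illustrated with bisection. This sketch assumes a hypothetical `accepts(price)` oracle and an agent who buys a ticket paying 1 if the event happens exactly when the price is at most its probability for the event (that is, it is risk-neutral in the payout unit). Each wager halves the interval that must contain the probability, so n wagers leave an interval of width 2^-n.

```python
def infer_credence(accepts, wagers=20):
    """
    Narrow down an agent's probability for an event from yes/no betting choices.

    `accepts(price)` is a hypothetical oracle: True if the agent would pay
    `price` for a ticket that pays 1 if the event happens. Each call is one wager.
    """
    low, high = 0.0, 1.0
    for _ in range(wagers):
        price = (low + high) / 2
        if accepts(price):
            low = price   # the agent values the ticket at least this much
        else:
            high = price  # the agent values the ticket at less than this
    return low, high  # the credence lies in [low, high], an interval of width 2**-wagers

hidden_credence = 0.37  # hypothetical agent's probability for the event
low, high = infer_credence(lambda price: price <= hidden_credence, wagers=20)
print(low, high)
```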
Can you give an example of how you use a Dutch book argument on a non-gambling topic? For example, if I’m considering issues like whether to go swimming today, and what nickname to call my friend, and I don’t assign probabilities like “80% sure that calling her Kate is the best option”, how do I get Dutch Booked?
First you hypothetically ask what would happen if you were asked to make bets on whether calling her Kate would result in world X (with utility U(X)). Do this for all choices and all possible worlds. This gives you probabilities and utilities. You then take a weighted average, as per the VNM theorem.
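The weighted-average step can be sketched as follows. The worlds, probabilities, and utilities are all hypothetical numbers invented for the nickname example; the sketch only shows the arithmetic of “probability-weight each world's utility, then pick the choice with the highest total”, not how those numbers would be elicited.

```python
def expected_utility(outcome_probs, utility):
    """Probability-weighted average of utilities over possible worlds."""
    return sum(p * utility[world] for world, p in outcome_probs.items())

def best_choice(choices, utility):
    """Pick the choice whose distribution over worlds has the highest expected utility."""
    return max(choices, key=lambda c: expected_utility(choices[c], utility))

# Hypothetical numbers for the nickname example; none come from the discussion.
utility = {"she_likes_it": 10, "she_is_indifferent": 0, "she_is_annoyed": -8}
choices = {
    "Kate":  {"she_likes_it": 0.5, "she_is_indifferent": 0.3, "she_is_annoyed": 0.2},
    "Katie": {"she_likes_it": 0.3, "she_is_indifferent": 0.6, "she_is_annoyed": 0.1},
}
print(best_choice(choices, utility))
# → Kate
```

Here “Kate” scores 0.5·10 + 0.3·0 + 0.2·(−8) = 3.4 against 2.2 for “Katie”.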
How do I get Dutch Booked for not doing that?
And I’m still curious how the utilities are decided. By whim?
You don’t get to decide utilities so much as you have to figure out what they are. You already have a utility function, and you do your best to describe it. How do you weight the things you value relative to each other?
This takes observation, because what we think we value often turns out not to be a good description of our feelings and behavior.
From our genes? And the goal is just to figure out what it is, but not change it for the better?
Can you explain how you would change your fundamental moral values for the better?
By criticizing them. And conjecturing improvements which meet the challenges of the criticism. It is the same method as for improving all other knowledge.
In outline it is pretty simple. You may wonder things like what would be a good moral criticism. To that I would say: there’s many books full of examples, why dismiss all that? There is no one true way of arguing. Normal arguments are ok, I do not reject them all out of hand but try to meet their challenges. Even the ones with some kind of mistake (most of them), you can often find some substantive point which can be rescued. It’s important to engage with the best versions of theories you can think of.
BTW once upon a time I was vaguely socialist. Now I’m a (classical) liberal. People do change their fundamental moral values for the better in real life. I attended a speech by a former Muslim terrorist who is now a pro-Western Christian (Walid Shoebat).
I’ve changed my social values plenty of times, because I decided different policies better served my terminal values. If you wanted to convince me to support looser gun control, for instance, I would be amenable to that because my position on gun control is simply an avenue for satisfying my core values, which might better be satisfied in a different way.
If you tried to convince me to support increased human suffering as an end goal, I would not be amenable to that, unless it turns out I have some value I regard as even more important that would be served by it.
This is what Popper called the Myth of the Framework and refuted in his essay by that name. It’s just not true that everyone is totally set in their ways and extremely closed minded, as you suggest. People with different frameworks learn from each other.
One example is children learn. They are not born sharing their parents framework.
You probably think that frameworks are genetic, so they are. Dealing with that would take a lengthy discussion. Are you interested in this stuff? Would you read a book about it? Do you want to take it seriously?
I’m somewhat skeptical b/c e.g. you gave no reply to some of what I said.
I think a lot of the reason people don’t learn other frameworks, in practice, is merely that they choose not to. They think it sounds stupid (before they understand what it’s actually saying) and decide not to try.
When did I suggest that everyone is set in their ways and extremely closed minded? As I already pointed out, I’ve changed my own social values plenty of times. Our social frameworks are extremely plastic, because there are many possible ways to serve our terminal values.
I have responded to moral arguments with regards to more things than I could reasonably list here (economics, legal codes, etc.) I have done so because I was convinced that alternatives to my preexisting social framework better served my values.
Valuing strict gun control, to pick an example, is not genetically coded for. A person might have various inborn tendencies which will affect how they’re likely to feel about gun control; they might have innate predispositions towards authoritarianism or libertarianism, for instance, that will affect how they form their opinion. A person who valued freedom highly enough might support little or no gun control even if they were convinced that it would result in a greater loss of life. You would have a hard time finding anyone who valued freedom so much that they would support looser gun control if they were convinced it would destroy 90% of the world population, which gives you a bit of information about how they weight their preferences.
If you wanted to convince me to support more human suffering instead of more human happiness, you would have to appeal to something else I value even more that would be served by this. If you could argue that my preference for happiness is arbitrary, that preference for suffering is more natural, even if you could demonstrate that the moral goodness of human suffering is intrinsically inscribed on the fabric of the universe, why should I care? To make me want to make humans unhappy, you’d have to convince me there’s something else I want enough to make humans unhappy for its sake.
I also don’t feel I’m being properly understood here; I’m sorry if I’m not following up on everything, but I’m trying to focus on the things that I think meaningfully further the conversation, and I think some of your arguments are based on misapprehensions about where I’m coming from. You’ve already made it clear that you feel the same, but you can take it as assured that I’m both trying to understand you and make myself understood.
You suggested it about a category of ideas which you called “core values”.
You are saying that you are not open to new values which contradict your core values. Ultimately you might replace all but the one that is the most core, but never that one.
That’s more or less correct. To quote one of Eliezer’s works of ridiculous fanfiction, “A moral system has room for only one absolute commandment; if two unbreakable rules collide, one has to give way.”
If circumstances force my various priorities into conflict, some must give way to others, and if I value one thing more than anything else, I must be willing to sacrifice anything else for it. That doesn’t necessarily make it my only terminal value; I might have major parts of my social framework which ultimately reduce to service to another value, and they’d have to bend if they ever came into conflict with a more heavily weighted value.
Well in the first half, you get Dutch booked in the usual way. It’s not necessarily actually happening, but there still must be probabilities that you would use if it were. In the second half, if you don’t follow the procedure (or an equivalent one) you violate at least one VNM axiom.
If you violate axiom 1, there are situations in which you don’t have a preferred choice, not as in “both are equally good/bad” but as in your decision process does not give an answer or gives more than one answer. I don’t think I’d call this a decision process.
If you violate axiom 2, there are outcomes L, M and N such that you’d want to switch from L to M and then from M to N, but you would not want to switch from L to N.
Axiom 3 is unimportant and is just there to simplify the math.
For axiom 4, imagine a situation where a statement with unknown truth-value, X, determines whether you get to choose between two outcomes, L and M, with L < M, or have no choice in accepting a third outcome, N. If you violate the axiom, there is a situation like this where, if you were asked for your choice before you know X (it will be ignored if X is false), you would pick L, even though L < M.
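To make axioms 1 and 2 concrete, here is a minimal sketch that checks a finite set of outcomes for the first two failure modes described above. It assumes, hypothetically, that a decision process can be queried pairwise through a `compare(a, b)` function; that function, the flavors, and the cyclic preferences are illustrations only, not anyone’s actual procedure.

```python
from itertools import permutations


def find_axiom_violations(outcomes, compare):
    """Report violations of completeness (axiom 1) and transitivity (axiom 2).

    compare(a, b) is a hypothetical stand-in for a decision process. It returns
    "<" if you would switch from a to b, ">" if you would switch from b to a,
    "=" if you are indifferent, and None if the process gives no answer.
    """
    violations = []
    for a, b in permutations(outcomes, 2):
        if compare(a, b) is None:
            violations.append(("completeness", a, b))
    for a, b, c in permutations(outcomes, 3):
        # You would switch a -> b and b -> c, but not a -> c.
        if compare(a, b) == "<" and compare(b, c) == "<" and compare(a, c) != "<":
            violations.append(("transitivity", a, b, c))
    return violations


# Hypothetical cyclic preferences: each pair is (worse, better).
BETTER = {("vanilla", "chocolate"), ("chocolate", "strawberry"), ("strawberry", "vanilla")}


def cyclic(a, b):
    if (a, b) in BETTER:
        return "<"
    if (b, a) in BETTER:
        return ">"
    return "="


flavors = ["vanilla", "chocolate", "strawberry"]
print(find_axiom_violations(flavors, cyclic))  # three transitivity violations
```

A process that never answers fails completeness for every ordered pair, while any fixed ranking of the flavors passes both checks.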
Do any of these situations describe your preferences?
I’ll let Desrtopa handle this.
Can you give a concrete example? What happens to me? Is it that I get an outcome which is less ideal than was available?
If your decision process is not equivalent to one that uses the previously described procedure, there are situations where something like one of the following will happen.
1. I ask you if you want chocolate or vanilla ice cream and you don’t decide. Not just that you don’t care which one you get, or that you would prefer not to have ice cream, but that you don’t output anything and see nothing wrong with that.
2. You prefer chocolate to vanilla ice cream, so you would willingly pay 1c to have the vanilla ice cream that you have been promised upgraded to chocolate. You also happen to prefer strawberry to chocolate, so you are willing to pay 1c to exchange a promise of a chocolate ice cream for a promise of a strawberry ice cream. Furthermore, it turns out you prefer vanilla to strawberry, so whenever you are offered a strawberry ice cream, you gladly pay a single cent to change that to an offer of vanilla, ad infinitum.
3. N/A
4. You like chocolate ice cream more than vanilla ice cream. Nobody knows if you’ll get ice cream today, but you are asked for your choice just in case, so you pick vanilla.
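The chocolate/strawberry/vanilla cycle is the classic “money pump.” A minimal simulation, assuming an agent who always pays a fixed fee to swap to a flavor it strictly prefers (`upgrade` is a hypothetical encoding of those preferences), shows where it ends up:

```python
def money_pump(start, upgrade, rounds, fee_cents=1):
    """Simulate an agent who pays fee_cents every time it is offered a swap
    to a flavor it strictly prefers. upgrade[f] is the flavor preferred to f
    (a hypothetical encoding of the agent's preferences)."""
    holding, paid = start, 0
    for _ in range(rounds):
        better = upgrade.get(holding)
        if better is None:  # nothing is preferred to the current flavor, so the pump stops
            break
        holding = better
        paid += fee_cents
    return holding, paid


cyclic = {"vanilla": "chocolate", "chocolate": "strawberry", "strawberry": "vanilla"}
transitive = {"vanilla": "chocolate", "chocolate": "strawberry"}

print(money_pump("vanilla", cyclic, rounds=300))      # ('vanilla', 300)
print(money_pump("vanilla", transitive, rounds=300))  # ('strawberry', 2)
```

With cyclic preferences the agent ends holding the same flavor it started with, 300 cents poorer. With transitive preferences the swaps stop once it reaches its top choice.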
Let’s consider (2). Suppose someone was in the process of getting Dutch Booked like this. It would not go on ad infinitum. They would quickly learn better. Right? So even if this happened, I think it would not be a big deal.
Let’s say they did learn better. How would they do this—changing their utility function? Someone with a utility function like this really does prefer B+1c to A, C+1c to B, and A+1c to C. Even if they did change their utility function, the new one would either have a new hole or it would obey the results of the VNM-theorem.
So Bayes teaches: do not disobey the laws of logic and math.
Still wondering where the assigning probabilities to truths of theories is.
OK. So what? There’s more to life than that. That’s so terribly narrow. I mean, that part of what you’re saying is right as far as it goes, but it doesn’t go all that far. And when you start trying to apply it to harder cases—what happens? Do you have some Bayesian argument about who to vote for for president? Which convinced millions of people? Or should have convinced them, and really answers the questions much better than other arguments?
Well the Dutch books make it so you have to pick some probabilities. Actually getting the right prior is incomplete, though Solomonoff induction is most of the way there.
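A minimal sketch of the Dutch book argument for one simple case: credences over mutually exclusive, exhaustive outcomes that do not sum to 1. It assumes the standard setup where the agent will both buy and sell, at price p × stake, a ticket paying the stake if an outcome with credence p occurs. The ice cream outcomes and numbers are hypothetical.

```python
def guaranteed_loss(credences, stake=1.0):
    """Return the sure loss a bookie can extract from an agent whose credences
    over mutually exclusive, exhaustive outcomes do not sum to 1.

    Assumes the agent will both buy and sell, at price p * stake, a ticket that
    pays `stake` if the outcome with credence p occurs.
    """
    total = sum(credences.values())
    # Exactly one outcome occurs, so a full set of tickets pays exactly `stake`.
    if total > 1:
        # Bookie sells the agent one ticket per outcome: agent pays total * stake, gets stake back.
        return (total - 1) * stake
    # Bookie buys one ticket per outcome: agent receives total * stake, pays out stake.
    return (1 - total) * stake


print(guaranteed_loss({"ice cream today": 0.6, "no ice cream today": 0.6}))  # about 0.2
print(guaranteed_loss({"ice cream today": 0.6, "no ice cream today": 0.4}))  # essentially 0
```

The loss is the same whichever outcome happens, which is what makes it a Dutch book rather than just a bad bet.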
Where else are you hoping to go?
In principle, yes. There’s actually a computer program called AIXItl that does it. In practice I use approximations to it. It probably could be done to a very high degree of certainty. There are a lot of issues and a lot of relevant data.
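For a feel of what a (very) rough approximation looks like, here is a toy simplicity-weighted predictor. It is NOT Solomonoff induction or AIXItl: the hypothesis class (“repeat this bit pattern forever”) and the weight 2^(−2·length) are assumptions chosen only so the sum stays small and shorter patterns are favored. Contradicted hypotheses are dropped and the survivors renormalized.

```python
from itertools import product


def predict_next_bit(observed, max_len=8):
    """Toy simplicity-weighted predictor (NOT Solomonoff induction or AIXItl).

    Hypotheses are "repeat this bit pattern forever" for every pattern up to
    max_len bits, each with prior weight 2 ** (-2 * len(pattern)), a stand-in
    for a prior that favors shorter descriptions. Hypotheses contradicted by
    the observed bits are discarded and the rest renormalized.
    Returns the probability that the next bit is "1".
    """
    weight_next = {"0": 0.0, "1": 0.0}
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            pattern = "".join(bits)
            # Enough repetitions to cover the observed bits plus one more.
            stream = pattern * (len(observed) // n + 2)
            if stream.startswith(observed):
                weight_next[stream[len(observed)]] += 2.0 ** (-2 * n)
    total = weight_next["0"] + weight_next["1"]
    if total == 0:
        raise ValueError("no hypothesis in the class is consistent with the data")
    return weight_next["1"] / total


print(predict_next_bit("010101"))  # small: the short pattern "01" dominates
print(predict_next_bit("1111"))    # close to 1
```

The real thing ranges over all programs and is uncomputable; this only shows the shape of “weight by simplicity, eliminate, renormalize, predict.”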
Can you give an example? Use the ice cream flavors. What probabilities do you have to pick to buy ice cream without being Dutch booked?
Explanatory knowledge. Understanding the world. Philosophical knowledge. Moral knowledge. Non-scientific, non-empirical knowledge. Beyond prediction and observation.
How do you know whether your approximations are OK to make or whether they ruin things? How do you work out which kinds of approximations are and aren’t safe to make?
The way I would do that is by understanding the explanation of why something is supposed to work. In that way, I can evaluate proposed changes to see whether they mess up the main point or not.
Endo, I think you are making things more confusing by combining issues of Bayesianism with issues of utility. It might help to keep them more separate or to be clear when one is talking about one, the other, or some hybrid.
I use the term Bayesianism to include utility because (a) they are connected and (b) a philosophy of probabilities as abstract mathematical constructs with no applications doesn’t seem complete; it needs an explanation of why those specific objects are studied. How do you think that any of this caused or could cause confusion?
Well, it empirically seems to be causing confusion. See curi’s remarks about the ice cream example. Also, one doesn’t need Bayesianism to include utility and that isn’t standard (although it is true that they do go very well together).
Yes I see what you mean.
I think it goes a bit beyond this. Utility considerations motivate the choice of definitions. I acknowledge that they are distinct things, though.
The consequences could easily be thousands of lives or more in case of sufficiently important decisions.
So the argument is now not that suboptimal outcomes don’t exist but that they aren’t a big deal? Are you aware that the primary reason this involves small amounts of ice cream is the convenience of the example? There’s no reason these couldn’t happen with far more serious issues (such as what medicine to use).
I know. I thought it was strange that you said “ad infinitum” when it would not go on forever. And that you presented this as dire but made your example non-dire.
But OK. You say we must consider probabilities, or this will happen. Well, suppose that if I do something it will happen. I could notice that, criticize it, and thus avoid it.
How can I notice? I imagine you will say that involves probabilities. But in your ice cream example I don’t see the probabilities. It’s just preferences for different ice creams, and an explanation of how you get a loop.
And what I definitely don’t see is probabilities that various theories are true (as opposed to probabilities about events which are ok).
I didn’t say that (I’m not endoself).
Yes, but the Bayesian avoids having this step. For any step you can construct a “criticism” that will duplicate what the Bayesian will do. This is connected to a number of issues, including the fact that what constitutes valid criticism in a Popperian framework is far from clear.
Ice cream is an analogy, and it might not be a great one since it is connected to preferences (which sometimes get confused with Bayesianism). It might make more sense to just go read Cox’s theorem and translate for yourself what its assumptions mean about an approach.
OK, my bad. So many people. I lose track.
Anything which is not itself criticized.
Could you pick any real-world example you like, where the probabilities needed to avoid a Dutch book aren’t obvious, and point them out? To help concretize the idea for me.
Well, I’m not sure, in that I’m not convinced that Dutch booking really occurs much in real life other than in the obvious contexts. But there are a lot of contexts it does occur in. For example, a fair number of complicated stock maneuvers can be thought of essentially as attempts to Dutch book other players in the stock market.
Koth already had an amusing response to that.
Someone here told me it does. Maybe you can go argue with him for me ;-)
I agree.
Consequentialism is not in the index.
Decision rule is, a little bit.
I don’t think this book contains a proof mentioning consequentialism. Do you disagree? Give a page or section?
It looks like what they are doing is defining a decision rule in a special way. So, by definition, it has to be a mathematical thing to do with probability. After that, I’m sure it’s rather easy to prove that you should use Bayes’ theorem rather than some other math.
But none of that is about decision rules in the sense of methods human beings use for making decisions. It’s just that if you define them in a particular way, so that Bayes’ theorem is basically the only option, then you can prove it.
See, e.g., page 19, where they give a definition. A Popperian approach to making decisions simply wouldn’t fit within the scope of their definition, so the conclusion of any proof like the one you claimed exists (which I haven’t found in this book) would not apply to Popperian ideas.
Maybe there is a lesson here about believing stuff is proven when you haven’t seen the proof, listening to hearsay about what books contain, and trying to apply proofs you aren’t familiar with (they often have limits on scope).
In what way would the Popperian approach fail to fit the decision rule approach on page 19 of Bickel and Doksum?
It says a decision rule (their term) is a function of the sample space, mapping something like complete sets of possible data to things people do. (I think it needs to be complete sets of all your data to be applied to real world human decision making. They don’t explain what they are talking about in the type of way I think is good and clear. I think that’s due to having in mind different problems they are trying to solve than I have. We have different goals without even very much overlap. They both involve “decisions” but we mean different things by the word.)
In real life, people use many different decision rules (my term, not theirs). And people deal with clashes between them.
You may claim that my multiple decision rules can be combined into one mathematical function. That is so. But the result isn’t a smooth function so when they start talking about estimation they have big problems! And this is the kind of thing I would expect to get acknowledgement and discussion if they were trying to talk about how humans make decisions, in practice, rather than just trying to define some terms (chosen to sound like they have something to do with what humans do) and then proceed with math.
E.g., they try to talk about estimating the amount of error. If you know the error bars on your data, and you have a smooth function, you’re maybe kind of OK with imperfect data. But if your function has a great many jumps in it, what are you to do? What if, within the margin for error on something, there are several discontinuities? I think they are conceiving of the decision rule function as being smooth and not thinking about what happens when it’s very messy. Maybe they specified some assumptions which require it to be smooth and I missed them, but anyway human beings have tons of contradictory and not-yet-integrated ideas in their heads (mistakes, separate topics they haven’t connected yet, and more), and so it’s not smooth.
On a similar note they talk about the median and mean which also don’t mean much when it’s not smooth. Who cares what the mean is over an infinitely large sample space where you get all sorts of unrepresentative results in large unrealistic portions of it? So again I think they are looking at the issues differently than me. They expect things like mathematically friendly distributions (for which means and medians are useful); I don’t.
Moving on to a different issue, they conceive of a decision rule which takes input and then gives output. I do not conceive of people starting with the input and then deciding the output. I think decision making is more complicated. While thinking about the input, people create more input: their thoughts. The input is constantly being changed during the decision process; it’s not a fixed quantity to have a function of. Also being changed during any significant decision is the decision rule itself. It too isn’t a static function, even for the purpose of making one decision (at least in the normal sense. Maybe they would want to call every step in the process a decision, so that deciding on a flavor of ice cream might involve 50 decisions, with updates to the decision rules and inputs in between them. If they want to do something like that, they do not explain how it works.)
They conceive of the input to decisions as “data”. But I conceive of much thinking as not using much empirical data, if any. I would pick a term that emphasizes it. The input to all decision making is really ideas, some of which are about empirical data and some of which aren’t. Data is a special case, not the right term for the general case. From this I take that they are empiricists. You can find a refutation of empiricism in The Beginning of Infinity by David Deutsch but anyway it’s a difference between us.
A Popperian approach to decision making would focus more on philosophical problems, and their solutions. It would say things like: consider what problem you’re trying to solve, and consider what actions may solve it. And criticize your ideas to eliminate errors. And … well no short summary does it justice. I’ve tried a few times here. But Popperian ways of thinking are not intuitive to people with the justificationist biases dominant in our culture. Maybe if you like everything I said I’ll try to explain more, but in that case I don’t know why you wouldn’t read some books which are more polished than what I would type in. If you have a specific, narrow question I can see that answering that would make sense.
Thank you for that detailed reply. I just have a few comments:
“data” could be any observable property of the world
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
there’s no requirement that the decision function be smooth—it’s just useful to look at such functions first for pedagogical reasons. All of the math continues to work in the presence of discontinuities.
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
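To illustrate the “go straight to the optimal mapping” idea, here is a minimal brute-force sketch in the statistical decision theory sense: a decision rule maps each possible observation to an action, and we pick the rule with the lowest Bayes risk. The coin problem, prior, and 0/1 loss are hypothetical, not taken from Bickel and Doksum, and enumerating every mapping only works for tiny finite problems.

```python
from itertools import product
from math import comb


def bayes_optimal_rule(sample_space, actions, states, prior, likelihood, loss):
    """Brute-force the decision rule (a map from observations to actions)
    with the lowest Bayes risk: sum over states s and observations x of
    prior[s] * P(x | s) * loss(s, rule[x])."""
    best_rule, best_risk = None, float("inf")
    for choice in product(actions, repeat=len(sample_space)):
        rule = dict(zip(sample_space, choice))
        risk = sum(prior[s] * likelihood(x, s) * loss(s, rule[x])
                   for s in states for x in sample_space)
        if risk < best_risk:
            best_rule, best_risk = rule, risk
    return best_rule, best_risk


# Hypothetical problem: 3 coin flips; is the coin fair (P(heads)=0.5) or biased (0.8)?
P_HEADS = {"fair": 0.5, "biased": 0.8}


def heads_likelihood(x, s):
    p = P_HEADS[s]
    return comb(3, x) * p**x * (1 - p) ** (3 - x)


rule, risk = bayes_optimal_rule(
    sample_space=[0, 1, 2, 3],
    actions=["fair", "biased"],
    states=["fair", "biased"],
    prior={"fair": 0.5, "biased": 0.5},
    likelihood=heads_likelihood,
    loss=lambda s, a: 0.0 if a == s else 1.0,
)
print(rule)  # {0: 'fair', 1: 'fair', 2: 'biased', 3: 'biased'}
print(risk)  # about 0.302
```

The optimal rule here jumps from one action to the other between 1 and 2 heads; the Bayes-risk calculation does not need it to be smooth. As noted above, the set of actions is fixed in advance.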
Yes but using it to refer to a person’s ideas, without clarification, would be bizarre and many readers wouldn’t catch on.
Straight to the final, perfect truth? lol… That’s extremely unPopperian. We don’t expect progress to just end like that. We don’t expect you get so far and then there’s nothing further. We don’t think the scope for reason is so bounded, nor do we think fallibility is so easily defeated.
In practice, searches for optimal things of this kind always involve many premises which have substantial philosophical meaning. (Which is often, IMO, wrong.)
Does it use an infinite set of all possible actions? I would have thought it wouldn’t rely on knowing what each action actually is, but would just broadly specify the set of all actions and move on.
@smooth: what good is a mean or median with no smoothness? And for margins of error, with a non-smooth function, what do you do?
With a smooth region of a function, taking the midpoint of the margin of error region is reasonable enough. But when there is a discontinuity, there’s no way to average it and get a good result. Mixing different ideas is a hard process if you want anything useful to result. If you just do it in a simple way like averaging you end up with a result that none of the ideas think will work and shouldn’t be surprised when it doesn’t. It’s kind of like how if you have half an army do one general’s plan, and half do another, the result is worse than doing either one.
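A minimal sketch of the averaging problem, assuming a hypothetical decision rule with a single jump and a margin of error that straddles it. Averaging the rule’s outputs over the margin produces an action that neither side of the jump recommends.

```python
def plan(x):
    """Hypothetical decision rule with a jump at x = 1.0: below it one
    "general" recommends action 0.0, at or above it another recommends 10.0."""
    return 0.0 if x < 1.0 else 10.0


def average_over_margin(rule, center, margin, samples=1001):
    """Average the rule's outputs over evenly spaced points in [center - margin, center + margin]."""
    step = 2 * margin / (samples - 1)
    xs = [center - margin + i * step for i in range(samples)]
    return sum(rule(x) for x in xs) / samples


blended = average_over_margin(plan, center=1.0, margin=0.1)
print(blended)                               # roughly 5.0
print(blended in {plan(0.95), plan(1.05)})   # False: neither branch recommends it
```

This averages the actions themselves; it does not address averaging expected losses, which is a different operation.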
Do you think of arguments and explanations as types of evidence? If so, how does that work? If not then I wasn’t talking about evidence.
In Bayesian epistemology, most arguments and explanations are just applications of Bayes’ law as discussed at http://lesswrong.com/lw/o7/searching_for_bayesstructure/ . Of course, ‘taking evidence into account’ is the same as using it in Bayes’ law.
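For reference, “using evidence in Bayes’ law” in its plainest form looks like the sketch below. The hypotheses and numbers are hypothetical; a hypothesis the evidence contradicts gets likelihood 0 and is eliminated, and the rest are rescaled. It leaves open how a non-empirical argument would be assigned a likelihood.

```python
def bayes_update(prior, likelihood):
    """Bayes' law: P(H | E) = P(E | H) * P(H) / P(E),
    with P(E) = sum over H of P(E | H) * P(H)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    p_evidence = sum(joint.values())
    if p_evidence == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: j / p_evidence for h, j in joint.items()}


posterior = bayes_update(
    prior={"H1": 0.5, "H2": 0.3, "H3": 0.2},
    likelihood={"H1": 0.0, "H2": 0.6, "H3": 0.3},  # H1 is contradicted by the evidence
)
print(posterior)  # H1 -> 0.0, H2 -> about 0.75, H3 -> about 0.25
```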
Can you give an example using a moral argument, or anything that would help illustrate how you take things that don’t look like they are Bayes’ law cases and apply it anyway?
The linked page says imperfectly efficient minds give off heat and that this is probabilistic (which is weird b/c the laws of physics govern it and they are not probabilistic but deterministic). Even if I accept this, I don’t quite see the relevance. Are you reductionists? I don’t think that the underlying physical processes tell us everything interesting about the epistemology.