No confirming evidence can prove T is true. You can see 5 white swans or 500 or 50 million. Still might be false. But if you see one black swan it is false. This is an asymmetry between confirmation and falsification when applied to universal theories. It does not hold for all theories.
That brings us to the second part of the Yudkowsky quote that you criticised:
Bayes’ Theorem shows that falsification is very strong evidence compared to confirmation, but falsification is still probabilistic in nature; it is not governed by fundamentally different rules from confirmation, as Popper argued.
The answer to this problem is: as implied by Hume, we certainly are not justified in reasoning from an instance to the truth of the corresponding law. But to this negative result a second result, equally negative, may be added: we are justified in reasoning from a counterinstance to the falsity of the corresponding universal law (that is, of any law of which it is a counterinstance). Or in other words, from a purely logical point of view, the acceptance of one counterinstance to ‘All swans are white’ implies the falsity of the law ‘All swans are white’ - that law, that is, whose counterinstance we accepted. Induction is logically invalid; but refutation or falsification is a logically valid way of arguing from a single counterinstance to—or, rather, against—the corresponding law.
Yet finding evidence against a theory is actually a probabilistic process—just like confirmation is. So, Yudkowsky is correct in saying that Popper was wrong about this. Popper really did believe and promote this kind of material.
Popper did not argue that confirmation and falsification have fundamentally different rules. They both obey the rules of logic.
Confirmation cannot be any evidence for universal theories. None, probabilistic or otherwise. Popper explained this and did the math. If you disagree, please provide the math that governs it and explain how it works.
As to the rest, you’re asking how Popper deals with fallible evidence. If you read his books you would find the answer. He does have one, and it isn’t probabilistic.
Let me ask you: how do you deal with the regress I asked Manfred about?
It says “When we see evidence, hypotheses that assigned a higher likelihood to that evidence, gain probability at the expense of hypotheses that assigned a lower likelihood to the evidence.”
This does not work. There are infinitely many possible hypotheses which assign a 100% probability to any given piece of evidence. So we can’t get anywhere like this. The probability of each remains infinitesimal.
Actually, it’s possible to have infinitely many hypotheses, each assigned a non-infinitesimal probability. For example, I could assign probability 50% to the simplest hypothesis, 25% to the second simplest, 12.5% to the third simplest and so on down (I wouldn’t necessarily recommend that exact assignment, it’s just the easiest example).
In general, all we need is a criterion of simplicity, such that there are only finitely many hypotheses simpler than any given hypothesis (Kolmogorov Complexity and Minimum Message Length both have this property) and an Occam’s razor style rule saying that simpler hypotheses get higher prior probabilities than more complex hypotheses. Solomonoff induction is a way of doing this.
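To make the “only finitely many simpler hypotheses” idea concrete, here is a toy sketch (my own illustration with made-up weights, not Solomonoff’s actual construction):

```python
# Give the n-th simplest hypothesis the prior 2^-(n+1): 1/2, 1/4, 1/8, ...
# The infinite series sums to 1, so every hypothesis gets a real,
# non-infinitesimal prior even though there are infinitely many of them.

def prior(n):
    """Prior probability of the (n+1)-th simplest hypothesis (toy weighting)."""
    return 2.0 ** -(n + 1)

print([prior(n) for n in range(5)])       # [0.5, 0.25, 0.125, 0.0625, 0.03125]
print(sum(prior(n) for n in range(50)))   # ~1.0; the remaining tail mass is 2^-50
```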
It seems like people are presenting a moving target. First I was directed to one essay. In response to my criticism of a statement from that essay, you suggest that a different technique other than the one I quoted could work. Do you think I was right that the section of the essay I quoted doesn’t solve the problem?
I am aware that you can assign probabilities to infinite sets in the way you mention. This is beside the point. If you get the probabilities above infinitesimal by doing that it’s simply a different method than the one I was commenting on. The one I commented on, in which “hypotheses that assigned a higher likelihood to that evidence, gain probability” does not get them above infinitesimal or do anything very useful.
You’re missing the point, which is that we need to act—we need to use the information we have as best we can in order to achieve ‘the greatest good’. (The question of what ‘the greatest good’ means is non-trivial but it’s orthogonal to present concerns.)
The agent chooses an action, and then depending on the state of world, the effects of the action are ‘good’ or ‘bad’. Here, the expression “the state of the world” incorporates both contingent facts about ‘how things are’ and the ‘natural laws’ describing how present causes have future effects.
Now, one very broad strategy for answering the question “what should we do?” is to try to break it down as follows:
We assign ‘weights’ p[i] to a wide variety of different ‘states of the world’, to represent the incomplete (but real) information we have thus far acquired.
For each such state, we calculate the effects that each of our actions a[j] would have, and assign ‘weights’ u[i,j] to the outcomes to represent how desirable we think they are.
We choose the action a[j] such that Sum(over i) p[i] * u[i,j] is maximized.
As a matter of terminology, we refer to the weights in step 1 as “probabilities” and those in step 2 as “utilities”.
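For concreteness, here is a minimal sketch of steps 1-3 with made-up numbers (the states, actions and weights are all invented for illustration):

```python
# Two possible states of the world and two actions; all weights are toy values.

p = {"rain": 0.3, "sun": 0.7}                      # step 1: probabilities p[i]

u = {                                              # step 2: utilities u[i, j]
    ("rain", "take umbrella"): 5, ("rain", "leave it"): -10,
    ("sun", "take umbrella"): 2,  ("sun", "leave it"): 4,
}

actions = ["take umbrella", "leave it"]

# step 3: pick the action maximizing Sum(over i) p[i] * u[i, j]
def expected_utility(action):
    return sum(p[state] * u[(state, action)] for state in p)

best = max(actions, key=expected_utility)
print(best, expected_utility(best))                # take umbrella, about 2.9
```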
Here’s an important question: “To what extent is the above procedure inevitable if we are to make rational decisions?”
The standard Lesswrong ideology here is that the above procedure (supplemented by Bayes’ theorem for updating ‘probability weights’) is absolutely central to ‘rationality’ - that any rational decision-maker must be following it, whether explicitly or implicitly.
It’s important to understand that Lesswrong’s discussions of rationality take place in the context of ‘thinking about how to design an artificial intelligence’. One of the great virtues of the Bayesian approach is that it’s clear what it would mean to implement it, and we can actually put it into practice on a wide variety of problems.
Anyway, if you want to challenge Bayesianism then you need to show how it makes irrational choices. It’s not sufficient to present a philosophical view under which assigning probabilities to theories is itself irrational, because that’s just a means to an end. What matters is whether an agent makes clever or stupid decisions, not how it gets there.
And now something more specific:
The one I commented on, in which “hypotheses that assigned a higher likelihood to that evidence, gain probability” does not get them above infinitesimal or do anything very useful.
No-one but you ever assumed that the hypotheses would begin at infinitesimal probability. The idea that we need to “assign probabilities to infinite sets in the way [benelliot] mention[ed]” is so obvious and commonplace that you should assume it even if it’s not actually spelled out.
In your theory, do the probabilities of the infinitely many theories add up to 1?
Does increasing their probabilities ever change the ordering of theories which assigned the same probability to some evidence/event?
If all finite sets of evidence leave infinitely many theories unchanged in ordering, then would we basically be acting on the a priori conclusions built into our way of assigning the initial probabilities?
If we were, would that be rational, in your view?
And do you have anything to say about the regress problem?
It seems like people are presenting a moving target. First I was directed to one essay. In response to my criticism of a statement from that essay, you suggest that a different technique other than the one I quoted could work. Do you think I was right that the section of the essay I quoted doesn’t solve the problem?
The ‘moving target’ effect is caused by the fact that you are talking to several different people; the grandparent is my first comment in this discussion.
The concept mentioned in that essay is Bayes’ Theorem, which tells us how to update our probabilities on new evidence. It does not solve the problem of how to avoid infinitely many hypotheses, for the same reason that Newton’s laws do not explain the price of gold in London: it’s not supposed to. Bayes’ theorem tells us how to change our probabilities with new evidence, and in the process assumes that those probabilities are real numbers (as opposed to infinitesimals).
Solomonoff induction tells us how to assign the initial probabilities, which are then fed into Bayes’ theorem to determine the current probabilities after adjusting based on the evidence. Both are essential; criticising BT for not doing SI’s job is like saying a car’s wheels are useless because they can’t do the engine’s job of providing power.
I don’t see any infinite regress at all: Solomonoff Induction tells us the prior, and Bayesian updating turns the prior into a posterior. They depend on each other to work properly but I don’t think they depend on anything else (unless you wish to doubt the basics of probability theory).
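A toy illustration of that division of labour, with invented hypotheses and numbers (this is not actual Solomonoff induction, just a simplicity-style prior standing in for it):

```python
# The prior plays Solomonoff induction's role; Bayes' theorem turns it into
# a posterior once evidence E arrives. All numbers are made up.

priors = {"H1 (simplest)": 0.5, "H2": 0.25, "H3": 0.25}

# Likelihood each hypothesis assigns to the observed evidence E:
likelihood = {"H1 (simplest)": 0.1, "H2": 0.8, "H3": 0.4}

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
evidence_prob = sum(priors[h] * likelihood[h] for h in priors)
posterior = {h: priors[h] * likelihood[h] / evidence_prob for h in priors}

print(posterior)
# H2, which assigned the higher likelihood to E, gains probability
# at the expense of H1, exactly as the quoted essay describes.
```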
The regress was discussed in other comments here. I took you to be saying “everything together, works” and wanting to discuss the philosophy as a whole.
I thought that would be more productive than arguing with you about whether Bayes theorem really “assumes that those probabilities are real numbers” and various other details. That’s certainly not what other people here told me when I brought up infinitesimals. I also thought it would be more productive than going back to the text I quoted and explaining why that quote doesn’t make sense. Whether it is correct or not isn’t very important if a better idea, along the same lines, works.
The regress argument begins like this: What is the justification or probability for Solomonoff Induction and Bayesian updating? Or if they are not justified, and do not have a probability, then why should we accept them in the first place?
When you say they don’t depend on anything else, maybe you are answering the regress issue by saying they are unquestionable foundations. Is that it?
Well, to some extent every system must have unquestionable foundations; even maths must assume its axioms. The principle of induction (the more something has happened in the past, the more likely it is to happen in the future, all else being equal) cannot be justified without the justification being circular, but I doubt you could get through a single day without it. Ultimately every approach must fall back on an infinite regress, as you put it, but this doesn’t prevent the system from working.
However, both Bayes’ Theorem and Solomonoff Induction can be justified:
Bayes’ Theorem is an elementary deductive consequence of basic probability theory, particularly the fairly obvious fact (at least it seems that way to me) that P(A&B) = P(A)*P(B|A). If it doesn’t seem obvious to you, then I know of at least two approaches for proving it. One is the Cox theorems, which begin by saying we want to rank statements by their plausibility, and we want certain things to be true of this ranking (it must obey the laws of logic, it must treat hypotheses consistently etc), and from these derive probability theory.
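For what it’s worth, the derivation from that product rule takes only a line: since P(A&B) = P(A)*P(B|A) and also P(A&B) = P(B)*P(A|B), setting the two equal and dividing by P(B) gives Bayes’ theorem, P(A|B) = P(A)*P(B|A)/P(B).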
Another approach is the Dutch Book arguments, which show that if you are making bets based on your probability estimates of certain things being true, then unless your probability estimates obey Bayes Theorem you can be tricked into a set of bets which result in a guaranteed loss.
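A minimal numerical sketch of the simplest kind of Dutch book (my own toy example; it shows the coherence part of the argument rather than the updating part):

```python
# An agent whose credences are P(A) = 0.6 and P(not A) = 0.6 violates
# probability theory (they sum to 1.2). A bookie sells the agent a $1 bet
# on A and a $1 bet on not A, each priced at the agent's stated credence.

prices = {"A": 0.60, "not A": 0.60}

for a_is_true in (True, False):
    winnings = (1.0 if a_is_true else 0.0) + (0.0 if a_is_true else 1.0)
    net = winnings - sum(prices.values())
    print("A is", a_is_true, "-> net:", round(net, 2))   # -0.2 either way: guaranteed loss
```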
To justify Solomonoff Induction, we imagine a theoretical predictor which bases its prior on Solomonoff Induction and updates by Bayes’ theorem. Given any other predictor, we can compare our predictor to this opponent by comparing the probability estimates they assign to the actual outcome; Solomonoff induction will at worst lose by a constant factor based on the complexity of the opponent.
This is the best that can be demanded of any prior; it is impossible to give perfect predictions in every possible universe, since you can always be beaten by a predictor tailor-made for that universe (which will generally perform very badly in most others).
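Slightly more concretely (as I understand the standard result, glossing over technicalities): if M is the Solomonoff prior and q is any computable rival predictor, then M(x) >= 2^(-K(q)) * q(x) for every data sequence x, where K(q) is roughly the length of the shortest program implementing q. Taking logarithms, M’s cumulative log-loss never exceeds q’s by more than about K(q) bits, a constant that depends on the opponent’s complexity but not on the amount of data.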
(note: I am not an expert, it is possible that I have some details wrong, please correct me if I do)
“Well, to some extent every system must have unquestionable foundations”
No, Popper’s epistemology does not have unquestionable foundations.
You doubt I could get by without induction, but I can and do. Popper’s epistemology has no induction. It also has no regress.
Arguing that there is no choice but these imperfect concepts only works if there really is no choice. But there are alternatives.
I think that things like unquestionable foundations, or an infinite regress, are flaws. I think we should reject flawed things when we have better options. And I think Bayesian Epistemology has these flaws. Am I going wrong somewhere?
“However, both Bayes’ Theorem and Solomonoff Induction can be justified”
Justified by statements which are themselves justified (which leads to regress issues)? Or you mean justified given some unquestionable foundations? In your statements below, I don’t think you specify precisely what you deem to be able to justify things.
“Bayes’ Theorem is an elementary deductive consequence of basic probability theory”
Yes. It is not controversial itself. What I’m not convinced of is the claim that this basic bit of math solves any major epistemological problem.
Regarding Solomonoff induction, I think you are now attempting to justify it by argument. But you haven’t stated what are the rules for what counts as a good argument and why. Could you specify that? There’s not enough rigor here. And in my understanding Bayesian epistemology aims for rigor and that is one of the reasons they like math and try to use math in their epistemology. It seems to me you are departing from that worldview and its methods.
Another aspect of the situation is you have focussed on prediction. That is instrumentalist. Epistemologies should be able to deal with all categories of knowledge, not just predictive knowledge. For example they should be able to deal with creating non-empirical, non-predictive moral knowledge. Can Solomonoff induction do that? How?
Hang on, Popper’s philosophy doesn’t depend on any foundations? I’m going to call shenanigans on this. Earlier you gave an example of Popperian inference:
Consider a theory, T, that all swans are white. T is a universal theory.
No confirming evidence can prove T is true. You can see 5 white swans or 500 or 50 million. Still might be false.
But if you see one black swan it is false.
This is an asymmetry between confirmation and falsification when applied to universal theories. It does not hold for all theories.
Consider the negation ~T. At least one swan is not white. This theory cannot be refuted by any amount of observations. But it can be confirmed with only one observation. ~T is a non-universal theory and not the kind science is after.
Unquestioned assumptions include, but are not limited to the following:
The objects under discussion actually exist (Solomonoff Induction does not make this assumption)
“There is no evidence which could prove T” is stated without any proof; what if you got all the swans in one place, or what if you found a reason why the existence of a black swan was impossible?
Any observation of a black swan must be correct (Bayes Theorem is explicitly designed to avoid this assumption)
You can generalise from this one example to a point about all theories
“Science is only interested in universal theories”. Really? Are palaeontology and astronomy not sciences? They are both often concerned with specifics.
You must always begin with assumptions; if nothing else you must assume maths (which is pretty much the only thing that Bayes’ Theorem and Solomonoff Induction do assume).
I think that things like unquestionable foundations, or an infinite regress, are flaws. I think we should reject flawed things when we have better options. And I think Bayesian Epistemology has these flaws. Am I going wrong somewhere?
To be perfectly honest I care more about getting results in the real world than having some mythical perfect philosophy which can be justified to a rock.
What I’m not convinced of is the claim that this basic bit of math solves any major epistemological problem.
Stating that you believe Bayes’ theorem but doubt that it can actually solve epistemic problems is like saying you believe Pythagoras’ theorem but doubt it can actually tell you the side lengths of right-angled triangles; it demonstrates a failure to internalise.
Bayes’ theorem tells you how to adjust beliefs based on evidence, every time you adjust your beliefs you must use it, otherwise your map will not reflect the territory.
Regarding Solomonoff induction, I think you are now attempting to justify it by argument. But you haven’t stated what are the rules for what counts as a good argument and why. Could you specify that? There’s not enough rigor here. And in my understanding Bayesian epistemology aims for rigor and that is one of the reasons they like math and try to use math in their epistemology. It seems to me you are departing from that worldview and its methods.
Does Popper not argue for his own philosophy, or does he just state it and hope people will believe him?
You cannot set up rules for arguments which are not themselves backed up by argument. Any argument will be convincing to some possible minds and not to others, and I’m okay with that, because I only have one mind.
Epistemologies should be able to deal with all categories of knowledge, not just predictive knowledge. For example they should be able to deal with creating non-empirical, non-predictive moral knowledge. Can Solomonoff induction do that? How?
Allow me to direct you to my all time favourite philosopher
Popper’s philosophy itself is not a deductive argument which depends on the truth of its premises and which, given their truth, is logically indisputable.
We’re well aware of issues like the fallibility of evidence (you may think you see a black swan, but didn’t). Those do not contradict this logical point about a particular asymmetry.
“You must always begin with assumptions”
No you don’t have to. Popper’s approach begins with conjectures. None of them are assumed, they are simply conjectured.
Here’s an example. You claim this is an assumption:
“You can generalise from this one example to a point about all theories”
In a Popperian approach, that is not assumed. It is conjectured. It is then open to critical debate. Do you see something wrong with it? Do you have an argument against it? Conjectures can be refuted by criticism.
BTW Popper wasn’t “generalizing”. He was making a point about all theories (in particular categories) in the first place and then illustrating it second. “Generalizing” is a vague and problematic concept.
“Does Popper not argue for his own philosophy, or does he just state it and hope people will believe him?
You cannot set up rules for arguments which are not themselves backed up by argument. ”
He argues, but without setting up predefined, static rules for argument. The rules for argument are conjectured, criticized, modified. They are a work in progress.
Regarding the Hume quote, are you saying you’re a positivist or similar?
“Bayes’ theorem tells you how to adjust beliefs based on evidence, every time you adjust your beliefs you must use it, otherwise your map will not reflect the territory.”
Only probabilistic beliefs. I think it is only appropriate to use when you have actual numbers instead of simply having to assign them to everything involved by estimating.
“To be perfectly honest I care more about getting results in the real world than having some mythical perfect philosophy which can be justified to a rock.”
Mistakes have real world consequences. I think Popper’s epistemology works better in the real world. Everyone thinks their epistemology is more practical. How can we decide? By looking at whether they make sense, whether they are refuted by criticism, etc… If you have a practical criticism of Popperian epistemology you’re welcome to state it.
I think Popper’s epistemology works better in the real world. How can we decide? By looking at whether they make sense, whether they are refuted by criticism, etc...
How does this translate into illustrating whether either epistemology has “real world consequences”? Criticism and “sense making” are widespread, varied, and not always valuable.
I think what would be most helpful is if you set up a hypothetical example and then proceeded to show how Popperian epistemology would lead to a success while a Bayesian approach would lead to a “real world consequence.” I think your question, “How can we decide?” was perfect, but I think your answer was incorrect.
Example: we want to know if liberalism or socialism is correct.
Popperian approach: consider what problem the ideas in question are intended to solve and whether they solve it. They should explain how they solve the problem; if they don’t, reject them. Criticize them. If a flaw is discovered, reject them. Conjecture new theories also to solve the problem. Criticize those too. Theories similar to rejected theories may be conjectured; and it’s important to do that if you think you see a way to not have the same flaw as before. Some more specific statements follow:
Liberalism offers us explanations such as: voluntary trade is mutually beneficial to everyone involved, and harms no one, so it should not be restricted. And: freedom is compatible with a society that makes progress because as people have new ideas they can try them out without the law having to be changed first. And: tolerance of people with different ideas is important because everyone with an improvement on existing customs will at first have a different idea which is unpopular.
Socialism offers explanations like, “People should get what they need, and give what they are able to” and “Central planning is more efficient than the chaos of free trade.”
Socialism’s explanations have been refuted by criticisms like Mises’s 1920 paper, which explained that central planners have no rational way to plan (in short: because you need prices to do economic calculation). And “need” has been criticized, e.g. how do you determine what is a need? And the concept of what people are “able to give” is also problematic. Of course the full debate on this is very long.
Many criticisms of liberalism have been offered. Some were correct. Older theories of liberalism were rejected and new versions formulated. If we consider the best modern version, then there are currently no outstanding criticisms of it. It is not refuted, and it has no rivals with the same status. So we should (until this situation changes) accept and use liberalism.
New socialist ideas were also created many times in response to criticism. However, no one has been able to come up with coherent ideas which address all the criticisms and still reach the same conclusions (or anything even close).
Liberalism’s “justification” is merely this: it is the only theory we do not currently have a criticism of. A criticism is an explanation of what we think is a flaw or mistake. It’s a better idea to use a theory we don’t see anything wrong with than one we do. Or in other words: we should act on the best (fallible) knowledge we have so far. In this way, the Popperian approach doesn’t really justify anything in the normal sense, and does without foundations.
Bayesian approach: Assign them probabilities (how?), try to find relevant evidence to update the probabilities (this depends on more assumptions), ignore that whenever you increase the probability of liberalism (say) you should also be increasing the probability of infinitely many other theories which made the same empirical assertions. Halt when—I don’t know. Make sure the evidence you update with doesn’t have any bias by—I don’t know, it sure can’t be a random sample of all possible evidence.
No doubt my Bayesian approach was unfair. Please correct it and add more specific details (e.g. what prior probability does liberalism have, what is some evidence to let us update that, what is the new probability, etc...)
PS is it just me or is it difficult to navigate long discussions and to find new nested posts? And I wasn’t able to find a way to get email notification of replies.
I’m beginning to see where the problem in this debate is coming from.
Bayesian humans don’t always assign actual probabilities, I almost never do. What we do in practice is vaguely similar to your Popperian approach.
The main difference is that we do thought experiments about Ideal Bayesians, strange beings with the power of logical omniscience (which gets them round the problem of Solomonoff Induction being uncomputable), and we see which types of reasoning might be convincing to them, and use this as a standard for which types are legitimate.
Even this might in practice be questioned; if someone showed me a thought experiment in which an ideal Bayesian systematically arrived at worse beliefs than some competitor I might stop being a Bayesian. I can’t tell you what I would use as a standard in this case, since if I could predict that theory X would turn out to be better than Bayesianism I would already be an X theorist.
Popperian reasoning, on the other hand, appears to use human intuition as its standard. The conjectures he makes ultimately come from his own head, and inevitably they will be things that seem intuitively plausible to him. It is also his own head which does the job of evaluating which criticisms are plausible. He may bootstrap himself up into something that looks more rigorous, but ultimately if his intuitions are wrong he’s unlikely to recover from it. The intuitions may not be unquestioned but they might as well be for all the chance he has of getting away from their flaws.
Cognitive science tells us that our intuitions are often wrong. In extreme cases they contradict logic itself, in ways that we rarely notice. Thus they need to be improved upon, but to improve upon them we need a standard to judge them by, something where we can say “I know this heuristic is a cognitive bias because it tells us Y when the correct answer is in fact X”. A good example of this is conjunction bias: conjunctions are often more plausible than disjunctions to human intuition, but they are always less likely to be correct, and we know this through probability theory.
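(The probability fact behind that is just the product rule again: P(A&B) = P(A)*P(B|A) <= P(A), since P(B|A) <= 1, so a conjunction can never be more probable than either of its conjuncts, however plausible it feels.)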
So here’s how a human Bayesian might look; this approach only reflects the level of Bayesian strength I currently have, and can definitely be improved upon.
We wouldn’t think in terms of Liberalism and Socialism, both of them are package deals containing many different epistemic beliefs and prescriptive advice. Conjunction bias might fool you into thinking that one of them is probably right, but in fact both are astonishingly unlikely.
We hold off on proposing solutions (scientifically proven to lead to better solutions) and instead just discuss the problem. We clearly frame exactly what our values are in this situation, possibly in the form of a precisely delineated utility function and possibly not, so we know what we are trying to achieve.
We attempt to get our facts straight. Each fact is individually analysed, to see whether we have enough evidence to overcome its complexity. This process continues permanently, every statement is evaluated.
We then suggest policies which seem likely to satisfy our values, and calculate which one is likely to do so best.
I’m not sure there’s actually a difference between the two approaches; ultimately I only arrived at Bayesianism through my intuitions as well, so there is no difference at the foundations. Bayesianism is just Popperianism done better.
PS there is a little picture of an envelope next to your name and karma score in the right hand corner. It turns red when one of your comments has a reply. Click on it to see the most recent replies to your comments.
Popperian reasoning, on the other hand, appears to use human intuition as its standard.
No. It does not have a fixed standard. Fixed standards are part of the justificationist attitude which is a mistake which leads to problems such as regress. Justification isn’t possible and the idea of seeking it must be dropped.
Instead, the standard should use our current knowledge (the starting point isn’t very important) and then change as people find mistakes in it (no matter what standard we use for now, we should expect it to have many mistakes to improve later).
The conjectures he makes ultimately come from his own head, and inevitably they will be things that seem intuitively plausible to him.
Popperian epistemology has no standard for conjectures. The flexible, tentative standard is for criticism, not conjecture.
The “work”—the sorting of good ideas from bad—is all done by criticism and not by rules for how to create ideas in the first place.
You imply that people are parochial and biased and thus stuck. First, note the problems you bring up here are for all epistemologies to deal with. Having a standard you tell everyone to follow does not solve them. Second, people can explain their methods of criticism and theory evaluation to other people and get feedback. We aren’t alone in this. Third, some ways (e.g. having less bias) as a matter of fact work better than others, so people can get feedback from reality when they are doing it right, plus it makes their life better (incentive). More could be said. Tell me if you think it needs more (why?).
“I know this heuristic is a cognitive bias because it tells us Y when the correct answer is in fact X”
I think by “know” here you are referring to the justified, true belief theory of knowledge. And you are expecting that the authority or certainty of objective knowledge will defeat bias. This is a mistake. Like it or not, we cannot ever have knowledge of that type (e.g. b/c justification attempts lead to regress). What we can have is fallible, conjectural knowledge. This isn’t bad; it works fine; it doesn’t devolve into everyone believing their bias.
Liberalism is not a package by accident. It is a collection of ideas around one theme. They are all related and fit together. They are less good in isolation—e.g. if you take away one idea you’ll find that now one of the other ideas has an unsolved and unaddressed problem. It is sometimes interesting to consider the ideas individually but to a significant extent they all are correct or incorrect as a group.
The way I’m seeing it is that most of the time you (and everyone else) do something roughly similar to what Popper said to. This isn’t a surprise b/c most people do learn stuff and that is the only method possible of creating any knowledge. But when you start using Bayesian philosophy more directly, by e.g. explicitly assigning and updating probabilities to try to settle non-probabilistic issues (like moral issues), then you start making mistakes. You say you don’t do that very often. OK. But there are other, more subtle ones. One is what Popper called The Myth of the Framework, where you suggest that people with different frameworks (initial biases) will both be stuck on thinking that what seems right to them (now) is correct and won’t change. And you suggest the way past this is, basically, authoritative declarations where you put someone’s biases against Truth so he has no choice but to recant. This is a mistake!
You say you don’t do that very often. OK. But there are other, more subtle ones. One is what Popper called The Myth of the Framework, where you suggest that people with different frameworks (initial biases) will both be stuck on thinking that what seems right to them (now) is correct and won’t change.
To some extent our thought processes can certainly improve, however there is no guarantee of this, let me give an example to illustrate:
Alice is an inductive thinker; in general she believes that if something has happened often in the past it is more likely to happen in the future. She does not treat this as an absolute, it is only probabilistic, and it does not work in certain specific situations (such as pulling beads out of a jar with 5 red and 5 blue beads), but she used induction to discover which situations those were. She is not particularly worried that induction might be wrong; after all, it has almost always worked in the past.
Bob is an anti-inductive thinker; he believes that the more often something happens, the less likely it is to happen in the future. To him, the universe is like a giant bag of beads, and the more something happens the more depleted the universe’s supply of it becomes. He also concedes that anti-induction is merely probabilistic, and there are certain situations (the bag of beads example) where it has already worked a few times, so he doesn’t think it’s very likely to work now. He isn’t particularly worried that he might be wrong: anti-induction has almost never worked for him before, so he must be set up for the winning streak of a lifetime.
Ultimately, neither will ever be convinced of the other’s viewpoint. If Alice conjectures anti-induction then she will immediately have a knock-down criticism, and vice versa for Bob and Induction. One of them has an irreversibly flawed starting point.
Like it or not, you, me, Popper and every other human is an Alice. If you don’t believe me, just ask which of the following criticisms seems more logically appealing to you:
“Socialism has never worked in the past, every socialist state has turned into a nightmarish tyranny, so this country shouldn’t become socialist”
“Liberalism has usually worked in the past, most liberal democracies are wealthy and have the highest standards of living in human history, so this country shouldn’t become liberal”
Liberalism is not a package by accident. It is a collection of ideas around one theme. They are all related and fit together. They are less good in isolation—e.g. if you take away one idea you’ll find that now one of the other ideas has an unsolved and unaddressed problem. It is sometimes interesting to consider the ideas individually but to a significant extent they all are correct or incorrect as a group.
This might be correct, but there is a heavy burden of proof to show it. Liberalism and Socialism are two philosophies out of thousands (maybe millions) of possibilities. This means that you need huge amounts of evidence to distinguish the two of them from the crowd and comparatively little evidence to distinguish one from the other.
Popperian epistemology has no standard for conjectures. The flexible, tentative standard is for criticism, not conjecture.
That is a recipe for disaster. There are too many possible conjectures; we cannot consider them all, so we need some way to prioritise some over others. If you do not specify a way then people will just do so according to personal preference.
As I see it, Popperian reasoning is pretty much the way humans reason naturally, and you only have to look at any modern political debate to see why that’s a problem.
To some extent our thought processes can certainly improve, however there is no guarantee of this
Yes, there is no guarantee. One doesn’t need a guarantee for something to happen. And one can’t have guarantees about anything, ever. So the request for guarantees is itself a mistake.
Ultimately, neither will ever be convinced of the other’s viewpoint. If Alice conjectures anti-induction then she will immediately have a knock-down criticism, and vice versa for Bob and Induction. One of them has an irreversibly flawed starting point.
The sketches you give of Bob and Alice are not like real people. They are simplified and superficial, and people like that could not function in day to day life. The situation with normal people is different. No everyday people have irreversibly flawed starting points.
The argument for this is not short and simple, but I can give it. First I’d like to get clear what it means, and why we would be discussing it. Would you agree that if my statement here is correct then Popper is substantially right about epistemology? Would you concede? If not, what would you make of it?
Like it or not, you, me, Popper and every other human is an Alice.
That is a misconception. One of its prominent advocates was Hume. We do not dispute things like this out of ignorance, out of never hearing it before. One of the many problems with it is that people can’t be like Alice because there is no method of induction—it is a myth that one could possibly do induction, because induction doesn’t describe a procedure a person could do. Induction has no set of instructions to offer.
That may sound strange to you. You may think it offers a procedure like:
1) gather data
2) generalize/extrapolate (induce) a conclusion from the data
3) the conclusion is probably right, with some exceptions
The problem is step 2, which does not say how to extrapolate a conclusion from a set of data. There are infinitely many conclusions consistent with any finite data set. So the entire procedure rests on having a method of choosing between them. All proposals made for this either don’t work or are vague. The one I would guess you favor is Occam’s Razor—pick the simplest one. This is both vague (what are the precise guidelines for deciding what is simpler?) and wrong (under many interpretations, for example because it might reject all explanatory theories b/c omitting the explanation is simpler).
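A quick toy illustration of the “infinitely many conclusions” point (the data and the candidate rules are made up; infinitely many more rules would fit equally well):

```python
# The observations 2, 4, 6 are consistent with all of these rules, yet the
# rules disagree about what comes next, so the data alone cannot pick one.

data = [2, 4, 6]

rules = {
    "even numbers":            lambda n: data[n] % 2 == 0,
    "arithmetic, step 2":      lambda n: data[n] == 2 + 2 * n,
    "all values below 100":    lambda n: data[n] < 100,
    "2, 4, 6, then 999, ...":  lambda n: data[n] == [2, 4, 6, 999][n],
}

for name, rule in rules.items():
    consistent = all(rule(n) for n in range(len(data)))
    print(name, "-> consistent with the data:", consistent)   # True for all
```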
Another issue is how one thinks about things he has no past experience about. Induction does not answer that. Yet people do it.
which of the following criticisms seems more logically appealing to you
I think they are both terrible arguments and they aren’t how I think about the issue.
This might be correct, but there is a heavy burden of proof to show it.
The “burden of proof” concept is a justificationist mistake. Ideas cannot be proven (which violates fallibility) and they can’t be positively shown to be true. You are judging Popperian ideas by standards which Popper rejected which is a mistake.
That is a recipe for disaster.
But it works in practice. The reason it doesn’t turn into a disaster is people want to find the truth. They aren’t stopped from making a mess of things by authoritative rules but by their own choices because they have some understanding of what will and won’t work.
The authority based approach is a mistake in many ways. For example, authorities can themselves be mistaken and could impose disasters on people. And people don’t always listen to authority. We don’t need to try to force people to follow some authoritative theory to make them think properly, they need to understand the issues and do it voluntarily.
Personal preferences aren’t evil, and imposing what you deem the best preference as a replacement is an anti-liberal mistake.
As I see it, Popperian reasoning is pretty much the way humans reason naturally
No. Since Aristotle, justificationism has dominated philosophy and governs the unconscious assumptions people make in debates. They do not think like Popperians or understand Popper’s philosophy (except to the extent that some of their mental processes are capable of creating knowledge, and those have to be in line with the truth of the matter about what does create knowledge).
The argument for this is not short and simple, but I can give it. First I’d like to get clear what it means, and why we would be discussing it. Would you agree that if my statement here is correct then Popper is substantially right about epistemology? Would you concede? If not, what would you make of it?
Since I’m not familiar with the whole of Popper’s position I’m not going to accept it blindly. I’m also not even certain that he’s incompatible with Bayesianism.
Anyway, the fact that no human has a starting point as badly flawed as anti-induction doesn’t make Bayesianism invalid. It may well be that we are just very badly flawed, and can only get out of those flaws by taking the mathematically best approach to truth. This is Bayesianism, which has been proven in more than one way.
The problem is step 2, which does not say how to extrapolate a conclusion from a set of data. There are infinitely many conclusions consistent with any finite data set.
This is exactly why we need induction. It is usually possible to stick any future onto any past and get a consistent history; induction tells us that if we want a probable history we need to make the future and the past resemble each other.
The reason it doesn’t turn into a disaster is people want to find the truth.
People certainly say that. Most of them even believe it on a conscious level, but in your average discussion there is a huge amount of other stuff going on, from signalling tribal loyalty to rationalising away unpleasant conclusions. You will not wander down the correct path by chance, you must use a map and navigate.
The authority based approach is a mistake in many ways. For example, authorities can themselves be mistaken and could impose disasters on people. And people don’t always listen to authority. We don’t need to try to force people to follow some authoritative theory to make them think properly, they need to understand the issues and do it voluntarily.
Personal preferences aren’t evil, and imposing what you deem the best preference as a replacement is an anti-liberal mistake.
I have no further interest in talking with you if you resort to straw men like this. I am not proposing we set up a dictatorship and kill all non-Bayesians, nor am I advocating censorship of views opposed the correct Bayesian conclusion.
All I am saying is your mind was not designed to do philosophical reasoning. It was designed to chase antelope across the savannah, lob a spear in them, drag them back home to the tribe, and come up with an eloquent explanation for why you deserve a bigger share of the meat (this last bit got the lion’s share of the processing power).
Your brain is not well suited to abstract reasoning, it is a fortunate coincidence that you are capable of it at all. Hopefully, you are lucky enough to have a starting point which is not irreversibly flawed, and you may be able to self improve, but this should be in the direction of realising that you run on corrupt hardware, distrusting your own thoughts, and forcing them to follow rigorous rules. Which rules? The ones that have been mathematically proven to be the best seem like a good starting point.
(The above is not intended as a personal attack, it is equally true of everyone)
Anyway, the fact that no human has a starting point as badly flawed as anti-induction doesn’t make Bayesianism invalid.
I did not say it makes Bayesianism invalid. I said it doesn’t make Popperism invalid or require epistemological pessimism. You were making myth of the framework arguments against Popper’s view. My comments on those were not intended to refute Bayesianism itself.
This is exactly why we need induction. It is usually possible to stick any future onto any past and get a consistent history; induction tells us that if we want a probable history we need to make the future and the past resemble each other.
That is a mistake and Popper’s approach is superior.
Part 1: It is a mistake because the future does not resemble the past except in some vacuous senses. Why? Because stuff changes. For example an object in motion moves to a different place in the future. And human societies invent new technologies.
It is always the case that some things resemble the past and some don’t. And the guideline that “the future resembles the past” gives no guidance whatsoever in figuring out which are which.
Popper’s approach is to improve our knowledge piecemeal by criticizing mistakes. The primary criticisms of this approach are that it is incapable of offering guarantees, authority, justification, a way to force people to go against their biases, etc. These criticisms are mistaken: no viable theory offers what they want. Setting aside those objections—that Popper doesn’t meet standards too high for anything to meet—it works and is how we make progress.
Regarding people wanting to find the truth, indeed they don’t always. Sometimes they don’t learn. Telling them they should be Bayesians won’t change that either. What can change it is sorting out the mess of their psychology enough to figure out some advice they can use. BTW the basic problem you refer to is static memes, the theory of which David Deutsch explains in his new (Popperian) book The Beginning of Infinity.
I have no further interest in talking with you if you resort to straw men like this.
Please calm down. I am trying my best to explain clearly. If I think that some of your ideas have nasty consequences that doesn’t mean I’m trying to insult you. It could be the case that some of your ideas actually do have nasty consequences of which you are unaware, and that by pointing out some of the ways your ideas relate to some ideas you consciously deem bad, you may learn better.
All justificationist epistemologies have connections to authority, and authority has nasty connections to politics. You hold a justificationist epistemology. When it comes down to it, justification generally consists of authority. And no amount of carefully deciding what is the right thing to set up as that authority changes that.
This connects to one of Popper’s political insights, which is that most political theories focus on the problem “Who should rule?” (or: what policies should rule?). This question is a mistake which begs for an authoritarian answer. The right question is a fallibilist one: how can we set up political institutions that help us find and fix errors?
Getting back to epistemology, when you ask questions like, “What is the correct criterion for induction to use in step 2 to differentiate between the infinity of theories?” that is a bad question which begs for an authoritarian answer.
All I am saying is your mind was not designed to do philosophical reasoning
My mind is a universal knowledge creator. What design could be better? I agree with you that it wasn’t designed for this in the sense that evolution doesn’t have intentions, but I don’t regard that as relevant.
Evolutionary psychology contains mistakes. I think discussion of universality is a way to skip past most of them (when universality is accepted, they become pretty irrelevant).
Your brain is not well suited to abstract reasoning, it is a fortunate coincidence that you are capable of it at all.
I’d urge you to read The Beginning of Infinity by David Deutsch which refutes this. I can give the arguments but I think reading it would be more efficient and we have enough topics going already.
forcing them to follow rigorous rules.
See! I told you the authoritarian attitude was there!
And there is no mathematical proof of Bayesian epistemology. Bayes’ theorem itself is a bit of math/logic which everyone accepts (including Popper of course). But Bayesian epistemology is an application of it to certain philosophical questions, which leaves the domain of math/logic, and there is no proof that application is correct.
Part 1: It is a mistake because the future does not resemble the past except in some vacuous senses. Why? Because stuff changes. For example an object in motion moves to a different place in the future. And human societies invent new technologies.
The object in motion moves according to the same laws in both the future and the past, in this sense the future resembles the past. You are right that the future does not resemble the past in all ways, but the ways in which it does themselves remain constant over time. Induction doesn’t apply in all cases but we can use induction to determine which cases it applies in and which it doesn’t. If this looks circular that’s because it is, but it works.
Popper’s approach is to improve our knowledge piecemeal by criticizing mistakes. The primary criticisms of this approach are that it is incapable of offering guarantees, authority, justification, a way to force people to go against their biases, etc. These criticisms are mistaken: no viable theory offers what they want. Setting aside those objections—that Popper doesn’t meet standards too high for anything to meet—it works and is how we make progress.
As far as Bayesianism is concerned this is a straw man. Most Bayesians don’t offer any guarantees in the sense of absolute certainty at all.
All justificationist epistemologies have connections to authority, and authority has nasty connections to politics. You hold a justificationist epistemology. When it comes down to it, justification generally consists of authority. And no amount of carefully deciding what is the right thing to set up as that authority changes that.
No Bayesian has ever proposed setting up some kind of Bayesian dictatorship. As far as I can tell the only governmental proposal based on Bayesianism thus far is Hanson’s futarchy, which could hardly be further from Authoritarianism.
forcing them to follow rigorous rules.
See! I told you the authoritarian attitude was there!
You misunderstand me. What I meant was that as a Bayesian I force my own thoughts to follow certain rules. I don’t force other people to do so. You are arguing from a superficial resemblance. Maths follows rigorous, unbreakable rules, does this mean that all mathematicians are evil fascists?
And there is no mathematical proof of Bayesian epistemology. Bayes’ theorem itself is a bit of math/logic which everyone accepts (including Popper of course). But Bayesian epistemology is an application of it to certain philosophical questions, which leaves the domain of math/logic, and there is no proof that application is correct.
Incorrect. E.T. Jaynes book Probability Theory: The Logic of Science gives a proof in the first two chapters.
My mind is a universal knowledge creator. What design could be better? I agree with you that it wasn’t designed for this in the sense that evolution doesn’t have intentions, but I don’t regard that as relevant.
Evolutionary psychology contains mistakes. I think discussion of universality is a way to skip past most of them (when universality is accepted, they become pretty irrelevant).
You obviously haven’t read much of the heuristics and biases program. I can’t describe it all very quickly here but I’ll just give you a taster.
Subjects asked to rank statements about a woman called Jill in order of probability of being true ranked “Jill is a feminist and a bank teller” as more probable than “Jill is a bank teller” despite this being logically impossible.
U.N. diplomats, when asked to guess the probabilities of various international events occurring in the next year, gave a higher probability to “USSR invades Poland causing complete cessation of diplomatic activities between USA and USSR” than they did to “Complete cessation of diplomatic activities between USA and USSR”.
Subjects who are given a handful of evidence and arguments for both sides of some issue, and asked to weigh them up, will inevitably conclude that the weight of the evidence given is in favour of their side. Different subjects will interpret the same evidence to mean precisely opposite things.
Employers can have their decision about whether to hire someone changed by whether they held a warm coffee or a cold coke in the elevator prior to the meeting.
People can have their opinion on an issue like nuclear power changed by a single image of a smiley or frowny face, flashed too briefly for conscious attention.
People’s estimates of the number of countries in Africa can be changed simply by telling them a random number beforehand, even if it is explicitly stated that this number has nothing to do with the question.
Students asked to estimate a day by which they are 99% confident their project will be finished, go past this day more than half the time.
People are more likely to move to a town if the town’s name and their name begin with the same letter.
There’s a lot more, most of which can’t easily be explained in bullet form. Suffice to say these are not irrelevant to thinking, they are disastrous. It takes constant effort to keep them back, because they are so insidious you will not notice when they are influencing you.
And there is no mathematical proof of Bayesian epistemology. Bayes’ theorem itself is a bit of math/logic which everyone accepts (including Popper of course). But Bayesian epistemology is an application of it to certain philosophical questions, which leaves the domain of math/logic, and there is no proof that application is correct.
Incorrect. E.T. Jaynes book Probability Theory: The Logic of Science gives a proof in the first two chapters.
You obviously haven’t read much of the heuristics and biases program.
Would you agree that this is a bit condescending and you’re basically assuming in advance that you know more than me?
I actually have read about it and disagree with it on purpose, not out of ignorance.
Does that interest you?
And on the other hand, do you know anything about universality? You made no comment about that. Given that I said the universality issue trumps the details you discuss in your bullet points, and you didn’t dispute that, I’m not quite sure why you are providing these details, other than perhaps a simple assumption that I had no idea what I was talking about and that my position can be ignored without reply because, once my deep ignorance is addressed, I’ll forget all about this Popperian nonsense.
Incorrect. E.T. Jaynes book Probability Theory: The Logic of Science gives a proof in the first two chapters.
Ordered but there’s an error in the library system and I’m not sure if it will actually come or not. I don’t suppose the proof is online anywhere (I can access major article databases), or that you could give it or an outline? BTW I wonder why the proof takes 2 chapters. Proofs are normally fairly short things. And, well, even if it was 100 pages of straight math I don’t see why you’d break it into separate chapters.
You misunderstand me. What I meant was that as a Bayesian I force my own thoughts to follow certain rules. I don’t force other people to do so.
No I understood that. And that is authoritarian in regard to your own thoughts. It’s still a bad attitude even if you don’t do it to other people. When you force your thoughts to follow certain rules all the epistemological problems with authority and force will plague you (do you know what those are?).
Regarding Popper, you say you don’t agree with the common criticisms of him. OK. Great. So, what are your criticisms? You didn’t say.
If this looks circular that’s because it is, but it works.
If there was an epistemology that didn’t endorse circular arguments, would you prefer it over yours which does?
Would you agree that this is a bit condescending and you’re basically assuming in advance that you know more than me?
I actually have read about it and disagree with it on purpose, not out of ignorance.
I apologise for this, but I really don’t see how anyone could go through those studies without losing all faith in human intuition.
I don’t suppose the proof is online anywhere (I can access major article databases), or that you could give it or an outline?
The text can be found online. My browser (Chrome) wouldn’t open the files but you may have more luck.
BTW I wonder why the proof takes 2 chapters. Proofs are normally fairly short things. And, well, even if it was 100 pages of straight math I don’t see why you’d break it into separate chapters.
Part of the reason for length is that probability theory has a number of axioms and he has to prove them all. The reason for the two chapter split is that the first chapter is about explaining what he wants to do, why he wants to do it, and laying out his desiderata. It also contains a few digressions in case the reader isn’t familiar with one or more of the prerequisites for understanding it (propositional logic for example). All of the actual maths is in the second chapter.
No I understood that. And that is authoritarian in regard to your own thoughts.
I agree to the explicit meaning of this statement but you are sneaking in connotations. Let us look more closely at what ‘authoritarian’ means.
You probably mean it in the sense of centralised as opposed to decentralized control, and in that sense I will bite the bullet and say that thinking should be authoritarian.
However, the word has a number of negative connotations: corruption, lack of respect for human rights and massive bureaucracy that stifles innovation, to name a few. None of those apply to my thinking process, so even though the term may be technically correct it is somewhat intellectually dishonest to use it; something more value-neutral like ‘centralized control’ might be better.
Regarding Popper, you say you don’t agree with the common criticisms of him. OK. Great. So, what are your criticisms? You didn’t say.
I will confess that I am not familiar with the whole of Popper’s viewpoint. I have never read anything written by him although after this conversation I am planning to.
Therefore I do not know whether or not I broadly agree or disagree with him. I did not come here to attack him; originally I was just responding to a criticism of yours that Bayesianism fails in a certain situation.
To some extent I think the approach with conjectures and criticisms may be correct, at least as a description of how thinking must get off the ground. Can you be a Popperian and conjecture Bayesianism?
The point that I do disagree with is the proposed asymmetry between confirmation and falsification. In my view neither the black swan nor the white swan proves anything with certainty, but both do provide some evidence. It happens in this case that one piece of evidence is very strong while the other is very weak, in fact they are pretty much at opposite extremes of the full spectrum of evidence encountered in the real world. This does not mean there is a difference of type.
If there was an epistemology that didn’t endorse circular arguments, would you prefer it over yours which does?
All else being equal, yes. Other factors, such as real-world results, might take precedence. I also doubt that any philosophy could manage without either circularity or assumptions, explicit or otherwise. As I see it, when you start thinking you need something to begin your inference: logic derives truths from other truths; it cannot manufacture them out of a vacuum. So any philosophy has two choices:
Either, pick a few axioms, call them self evident and derive everything from them. This seems to work fairly well in pure maths, but not anywhere else. I suspect the difference lies in whether the axioms really are self evident or not.
Or, start out with some procedures for thinking. All claims are judged by these, including proposals to change the procedures for thinking. Thus the procedures may self-modify and will hopefully improve. This seems better to me, as long as the starting point passes a certain threshold of accuracy any errors are likely to get removed (the phrase used here is the Lens that Sees its Flaws). It is ultimately circular, since whatever the current procedures are they are justified only by themselves, but I can live with that.
Ideal Bayesians are of the former type, but they can afford to be as they are mathematically perfect beings who never make mistakes. Human Bayesians take the latter approach, which means in principle they might stop being Bayesians if they could see that for some reason it was wrong.
So I guess my answer is that if a position didn’t endorse circular arguments, I would be very worried that it is going down the unquestionable axioms route, even if it does not do so explicitly, so I would probably not prefer it.
Notice how it is only through the benefits of the second approach that I can even consider such a scenario.
I agree to the explicit meaning of this statement but you are sneaking in connotations. Let us look more closely at what ‘authoritarian’ means.
I’m not trying to argue by connotation. It’s hard to avoid connotations and I think the words I’m using are accurate.
You probably mean it in the sense of centralised as opposed to decentralized control, and in that sense I will bite the bullet and say that thinking should be authoritarian.
That’s not what I had in mind, but I do think that centralized control is a mistake.
I take fallibilism seriously: any idea may be wrong, and many are. Mistakes are common.
Consequently, it’s a bad idea to set something up to be in charge of your whole mind. It will have mistakes. And corrections to those mistakes, which aren’t in charge, will sometimes get disregarded.
However, the word has a number of negative connotations. Corruption, lack of respect for human rights and massive bureaucracy that stifles innovation, to name a few. None of those apply to my thinking process, so even though the term may be technically correct, it is somewhat intellectually dishonest to use it; something more value-neutral like ‘centralized control’ might be better.
Those 3 things are not what I had in mind. But I think the term is accurate. You yourself used the word “force”. Force is authoritarian. The reason for that is that the forcer is always claiming some kind of authority—I’m right, you’re wrong, and never mind further discussion, just obey.
You may find this statement strange. How can this concept apply to ideas within one mind? Doesn’t it only apply to disagreements between separate people?
But ideas are roughly autonomous portions of a mind (see: http://fallibleideas.com/ideas). They can conflict, they can force each other in the sense of one taking priority over another without the conflict being settled rationally.
Force is a fundamentally epistemological concept. Its political meanings are derivative. It is about non-truth-seeking ways of approaching disputes. It’s about cases where agreement is not reached but one idea wins out anyway (by force).
Settling conflicts between the ideas in your mind by force is authoritarian. It is saying some ideas have authority/preference/priority/whatever, so they get their way. I reject this approach. If you don’t find a rational way to resolve a conflict between ideas, you should say you don’t know the answer, never pick a side b/c the ideas you deem the central controllers are on that side, and they have the authority to force other ideas to conform to them.
This is a big topic, and not so easy to explain. But it is important.
Force, in the sense of solving difficulties without argument, is not what I meant when I said I force my thoughts to follow certain rules. I don’t even see how that could work; my individual ideas do not argue with each other, and if they did I would speak to a psychiatrist.
I’m afraid you are going to have to explain in more detail.
They argue notionally. They are roughly autonomous, they have different substance/assertions/content, sometimes their content contradicts, and when you have two or more conflicting ideas you have to deal with that. You (sometimes) approach the conflict by what we might call an internal argument/debate. You think of arguments for all the sides (the substance/content of the conflicting ideas), you try to think of a way to resolve the debate by figuring out the best answer, you criticize what you think may be mistakes in any of the ideas, you reject ideas you decide are mistaken, you assign probabilities to stuff and do math, perhaps, etc...
When things go well, you reach a conclusion you deem to be an improvement. It resolves the issue. Each of the ideas which is improved on notionally acknowledges this new idea is better, rather than still conflicting. For example, if one idea was to get pizza, and one was to get sushi, and both had the supporting idea that you can’t get both because it would cost too much, or take too long, or make you fat, then you could resolve the issue by figuring out how to do it quickly, cheaply and without getting fat (smaller portions). If you came up with a new idea that does all that, none of the previously conflicting ideas would have any criticism of it, no objection to it. The conflict is resolved.
Sometimes we don’t come up with a solution that resolves all the issues cleanly. This can be due to not trying, or because it’s hard, or whatever.
Then what?
Big topic, but what not to do is use force: arbitrarily decide which side wins (often based on some kind of authority or justification), and declare it the winner even though the substance of the other side is not addressed. Don’t force some of your ideas, which have substantive unaddressed points, to defer to the ideas you put in charge (granted authority).
Big topic, but what not to do is use force: arbitrarily decide which side wins (often based on some kind of authority or justification), and declare it the winner even though the substance of the other side is not addressed. Don’t force some of your ideas, which have substantive unaddressed points, to defer to the ideas you put in charge (granted authority).
I certainly don’t advocate deciding arbitrarily. That would fall into the fallacy of just making sh*t up, which is the exact opposite of everything Bayes stands for. However, I don’t have to be arbitrary: most of the ideas that run up against Bayes don’t have the same level of support. In general, I’ve found that a heuristic of “pick the idea that has a mathematical proof backing it up” seems to work fairly well.
There are also sometimes other clues, rationalisations tend to have a slightly different ‘feel’ to them if you introspect closely (in my experience at any rate), and when the ideas going up against Bayes seem to include a disproportionately high number of rationalisations, I start to notice a pattern.
I also disagree about ideas being autonomous. Ideas are entangled with each other in complex webs of mutual support and anti-support.
Did you read my link? Where did the argument about approximately autonomous ideas go wrong?
I did. To see what is wrong with it let me give an analogy. Cars have both engines and tyres. It is possible to replace the tyres without replacing the engine. Thus you will find many cars with very different tyres but identical engines, and many different engines but identical tyres. This does not mean that tyres are autonomous and would work fine without engines.
Well this changes the topic. But OK. How do you decide what has support? What is support and how does it differ from consistency?
Well, mathematical proofs are support, and they are not at all the same as consistency. In general however, if some random idea pops into my head, and I spot that in fact it only occurred to me as a result of conjunction bias, I am not going to say “well, it would be unfair of me to reject this just because it contradicts probability theory, so I must reject both it and probability theory until I can find a superior compromise position”. Frankly, that would be stupid.
@autonomous—you know we said “approximately autonomous” right? And that, for various purposes, tires are approximately autonomous, which means things like they can be replaced individually without touching the engine or knowing what type of engine it is. And a tire could be taken off one car and put on another.
No one was saying it’d function in isolation. Just like a person being autonomous doesn’t mean they would do well in isolation (e.g. in deep space). Just because people do need to be in appropriate environments to function doesn’t make “people are approximately autonomous” meaningless or false.
Well, mathematical proofs are support, and they are not at all the same as consistency.
First, you have not answered my question. What is support? The general purpose definition. I want you to specify how it is determined if X supports Y, and also what that means (why should we care? what good is “support”?).
Second, let’s be more precise. If a person writes what he thinks to be a proof, what is supported? What he thinks is the conclusion of what he thinks is a proof, and nothing else? An infinite set of things which have wildly different properties? Something else?
No one was saying it’d function in isolation. Just like a person being autonomous doesn’t mean they would do well in isolation (e.g. in deep space). Just because people do need to be in appropriate environments to function doesn’t make “people are approximately autonomous” meaningless or false.
You argue from ideas being approximately autonomous to the conclusion that words like ‘authoritarian’ apply to them, and that they approximately debate one another, but this is not true in the car analogy. Is it ‘authoritarian’ that the brakes, accelerator and steering wheel have total control of the car, while the tyres and engine get no say, or is it just efficient?
I didn’t give a loose argument by analogy. You’re attacking a simplified straw man. I explained stuff at some length and you haven’t engaged here with all of what I said. e.g. your comments on “authoritarian” here do not mention or discuss anything I said about that. You also don’t mention force.
I don’t know the etiquette or format of this website well or how it works. When I have comments on the book, would it make sense to start a new thread or post somewhere/somehow?
Can you be a Popperian and conjecture Bayesianism?
You can conjecture Bayes’ theorem. You can also conjecture all the rest, however some things (such as induction, justificationism, foundationalism) contradict Popper’s epistemology. So at least one of them has a mistake to fix. Fixing that may or may not lead to drastic changes, abandonment of the main ideas, etc
The point that I do disagree with is the proposed asymmetry between confirmation and falsification.
That is a purely logical point Popper used to criticize some mistaken ideas. Are you disputing the logic? If you’re merely disputing the premises, it doesn’t really matter because its purpose is to criticize people who use those premises on their own terms.
In my view neither the black swan nor the white swan proves anything with certainty,
Agreed.
but both do provide some evidence. It happens in this case that one piece of evidence is very strong while the other is very weak, in fact they are pretty much at opposite extremes of the full spectrum of evidence encountered in the real world. This does not mean there is a difference of type.
I think you are claiming that seeing a white swan is positive support for the assertion that all swans are white. (If not, please clarify). If so, this gets into important issues. Popper disputed the idea of positive support. The criticism of the concept begins by considering: what is support? And in particular, what is the difference between “X supports Y” and “X is consistent with Y”?
I also doubt that any philosophy could manage without either circularity or assumptions, explicit or otherwise. As I see it, when you start thinking you need something to begin your inference: logic derives truths from other truths; it cannot manufacture them out of a vacuum.
Questioning this was one of Popper’s insights. The reason most people doubt it is possible is because, since Aristotle, pretty much all epistemology has taken this for granted. These ideas seeped into our culture and became common sense.
What’s weird about the situation is that most people are so attached to them that they are willing to accept circular arguments, arbitrary foundations, or other things like that. Those are OK! But that Popper might have a point is hard to swallow. I find circular arguments rather more doubtful than doing without what Popperians refer to broadly as “justification”. I think it’s amazing that people run into circularity or other similar problems and still don’t want to rethink all their premises. (No offense intended. Everyone has biases, and if we try to overcome them we can become less wrong about some matters, and stating guesses at what might be biases can help with that.)
All the circularity and foundations stem from seeking to justify ideas. To show they are correct. Popper’s epistemology is different: ideas never have any positive support, confirmation, verification, justification, high probability, etc… So how do we act? How do we decide which idea is better than the others? We can differentiate ideas by criticism. When we see a mistake in an idea, we criticize it (criticism = explaining a mistake/flaw). That refutes the idea. We should act on or use non-refuted ideas in preference over refuted ideas.
That’s the very short outline, but does that make any sense?
You can conjecture Bayes’ theorem. You can also conjecture all the rest, however some things (such as induction, justificationism, foundationalism) contradict Popper’s epistemology. So at least one of them has a mistake to fix. Fixing that may or may not lead to drastic changes, abandonment of the main ideas, etc
Fully agreed. In principle, if Popper’s epistemology is of the second, self-modifying type, there would be nothing wrong with drastic changes. One could argue that something like that is exactly how I arrived at my current beliefs, I wasn’t born a Bayesian.
I can also see some ways to make induction and foundationalism easier to swallow.
I don’t know the etiquette or format of this website well or how it works. When I have comments on the book, would it make sense to start a new thread or post somewhere/somehow?
A discussion post sounds about right for this, if enough people like it you might consider moving it to the main site.
I think you are claiming that seeing a white swan is positive support for the assertion that all swans are white. (If not, please clarify).
This is precisely what I am saying.
If so, this gets into important issues. Popper disputed the idea of positive support. The criticism of the concept begins by considering: what is support? And in particular, what is the difference between “X supports Y” and “X is consistent with Y”?
The beauty of Bayes is how it answers these questions. To distinguish between the two statements we express them each in terms of probabilities.
“X is consistent with Y” is not really a Bayesian way of putting things, I can see two ways of interpreting it. One is as P(X&Y) > 0, meaning it is at least theoretically possible that both X and Y are true. The other is that P(X|Y) is reasonably large, i.e. that X is plausible if we assume Y.
“X supports Y” means P(Y|X) > P(Y); X supports Y if and only if Y becomes more plausible when we learn of X. Bayes tells us that this is equivalent to P(X|Y) > P(X), i.e. if Y would suggest that X is more likely than we might otherwise think, then X is support for Y.
Suppose we make X the statement “the first swan I see today is white” and Y the statement “all swans are white”. P(X|Y) is very close to 1, while P(X|~Y) is lower; since P(X) is a weighted average of P(X|Y) and P(X|~Y), it follows that P(X|Y) > P(X), so seeing a white swan offers support for the view that all swans are white. Very, very weak support, but support nonetheless.
(The above is not meant to be condescending, I apologise if you know all of it already).
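To make the swan calculation concrete, here is a minimal sketch in Python; the specific numbers are invented purely for illustration.

```python
# Rough numbers for the swan example (all invented, just to show the
# direction of the inequality, not to be realistic).
p_y = 0.01              # prior P(Y): "all swans are white"
p_x_given_y = 0.99      # P(X|Y): first swan seen today is white, given Y
p_x_given_not_y = 0.90  # P(X|~Y): still high, but lower than P(X|Y)

# P(X) is a weighted average of the two conditionals.
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

# Bayes' theorem: P(Y|X) = P(X|Y) * P(Y) / P(X)
p_y_given_x = p_x_given_y * p_y / p_x

print(p_x_given_y > p_x)      # True: X supports Y by the stated criterion
print(p_y_given_x > p_y)      # True: Y gains probability, but only slightly
print(round(p_y, 4), round(p_y_given_x, 4))  # 0.01 -> ~0.011
```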
To show they are correct. Popper’s epistemology is different: ideas never have any positive support, confirmation, verification, justification, high probability, etc...
This is a very tough bullet to bite.
How do we decide which idea is better than the others? We can differentiate ideas by criticism. When we see a mistake in an idea, we criticize it (criticism = explaining a mistake/flaw). That refutes the idea. We should act on or use non-refuted ideas in preference over refuted ideas.
One thing I don’t like about this is the whole ‘one strike and you’re out’ feel of it. It’s very boolean; the real world isn’t usually so crisp. Even a correct theory will sometimes have some evidence pointing against it, and in policy debates almost every suggestion will have some kind of downside.
There is also the worry that there could be more than one non-refuted idea, which makes it a bit difficult to make decisions. Bayesianism, on the other hand, when combined with expected utility theory, is perfect for making decisions.
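For concreteness, here is a minimal sketch of what “Bayes plus expected utility” decision making looks like; the hypotheses, probabilities, and utilities are all invented for illustration.

```python
# Minimal sketch of Bayesian decision making via expected utility.
# The hypotheses, probabilities, and utilities are all made up.
posterior = {"hypothesis_A": 0.7, "hypothesis_B": 0.3}

# utility[action][hypothesis]: payoff of the action if that hypothesis is true
utility = {
    "action_1": {"hypothesis_A": 10, "hypothesis_B": -5},
    "action_2": {"hypothesis_A": 2, "hypothesis_B": 4},
}

def expected_utility(action):
    return sum(p * utility[action][h] for h, p in posterior.items())

best = max(utility, key=expected_utility)
print(best, expected_utility(best))  # action_1  5.5  (0.7*10 + 0.3*-5)
```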
1) gather data 2) generalize/extrapolate (induce) a conclusion from the data 3) the conclusion is probably right, with some exceptions
The problem is step 2, which does not say how to extrapolate a conclusion from a set of data.
Step 1 is problematic also, as I explained in some of my comments to Tim Tyler. What should I gather data about? What kind of data? What measurements are important? How accurate? And so on.
Yes I agree. Another issue I mentioned in one of my comments here is that your data isn’t a random sample of all possible data, so what do you do about bias? (I mean bias in the data, not bias in the person.)
Step 3 is also problematic (as it explicitly acknowledges).
PS is it just me or is it difficult to navigate long discussions and to find new nested posts? And I wasn’t able to find a way to get email notification of replies.
I don’t think I have the grasp on these subjects to hang in this, but this is great. -- I hope someone else comments in a more detailed manner.
In Popperian analysis, who ends the discussion of “what’s better?” You seem to have alluded to it being “whatever has no criticisms.” Is that accurate?
try to find relevant evidence to update the probabilities (this depends on more assumptions)
Why would Bayesian epistemology not be able to use the same evidence that Popperians used (e.g. the 1920 paper) and thus not require “assumptions” for new evidence? My rookie statement would be that the Bayesian has access to all the same kinds of evidence and tools that the Popperian approach does, as well as a reliable method for estimating probability outcomes.
Could you also clarify the difference between “conjecture” and “assumption”? Is it just that you’re saying that a conjecture is just a starting point for departure, whereas an assumption is assumed to be true?
An assumption seems both 1) justified if it has supporting evidence to make it highly likely as true to the best of our knowledge and 2) able to be just as “revisable” given counter-evidence as a “conjecture.”
Are you thinking that a Bayesian “assumption” is set in stone or that it could not be updated/modified if new evidence came along?
Lastly, what are “conjectures” based on? Are they random? If not, it would seem that they must be supported by at least some kind of assumptions to even have a reason for being conjectured in the first place. I think of them as “best guesses” and don’t see that as wildly different from the assumptions needed to get off the ground in any other analysis method.
In Popperian analysis, who ends the discussion of “what’s better?” You seem to have alluded to it being “whatever has no criticisms.” Is that accurate?
Yes, “no criticisms” is accurate. There are issues of what to do when you have a number of theories remaining which isn’t exactly one which I didn’t go into.
It’s not a matter of “who”—learning is a cooperative thing and people can use their own individual judgment. In a free society it’s OK if they don’t agree (for now—there’s always hope for later) about almost all topics.
I don’t regard the 1920 paper as evidence. It contains explanations and arguments. By “evidence” I normally mean “empirical evidence”—i.e. observation data. Is that not what you guys mean? There is some relevant evidence for liberalism vs socialism (e.g. the USSR’s empirical failure) but I don’t regard this evidence as crucial, and I don’t think that if you were to rely only on it that would work well (e.g. people could say the USSR did it wrong and if they did something a bit different, which has never been tried, then it would work. And the evidence could not refute that.)
BTW in the Popperian approach, the role of evidence is purely in criticism (and inspiration for ideas, which has no formal rules or anything). This is in contrast to inductive approaches (in general) which attempt to positively support/confirm/whatever theories with the weight of evidence.
If the Bayesian approach uses arguments as a type of evidence, and updates probabilities accordingly, how is that done? How is it decided which arguments win, and how much they win by? One aspect of the criticism approach is theories do not have probabilities but only two statuses: they are refuted or non-refuted. There’s never an issue of judging how strong an argument is (how do you do that?).
If you try to follow along with the Popperian approach too closely (to claim to have all the same tools) one objection will be that I don’t see Bayesian literature acknowledging Popper’s tools as valuable, talking about how to use them, etc… I will suspect that you aren’t in line with the Bayesian tradition. You might be improving it, but good luck convincing e.g. Yudkowsky of that.
The difference between a conjecture and an assumption is just as you say: conjectures aren’t assumed true but are open to criticism and debate.
I think the word “assumption” means not revisable (normally assumptions are made in a particular context, e.g. you assume X for the purposes of a particular debate which means you don’t question it. But you could have a different debate later and question it.). But I didn’t think Bayesianism made any assumptions except for its foundational ones. I don’t mind if you want to use the word a different way.
Regarding justification by supporting evidence, that is a very problematic concept which Popper criticized. The starting place of the criticism is to ask what “support” means. And in particular, what is the difference between support and mere consistency (non-contradiction)?
Conjectures are not based on anything and not supported. They are whatever you care to imagine. It’s good to have reasons for conjectures but there are no rules about what the reasons should be, and conjectures are never rejected because of the reason they were conjectured (nor because of the source of the conjecture), only because of criticisms of their substance. If someone makes too many poor conjectures and annoys people, it’s possible to criticize his methodology in order to help him. Popperian epistemology does not have any built-in guidelines for conjecturing on which it depends; they can be changed and violated as people see fit. I would rather call them “guesses” than “best guesses” because it’s often a good idea for one person to make several conjectures, including ones he suspects are mistaken, in order to learn more about them. It should not be each person puts forward his best theory and they face off, but everyone puts forward all the theories he thinks may be interesting and then everyone cooperates in criticizing all of them.
Edit: BTW I use the words “theory” and “idea” interchangeably. I do not mean by “theory” ideas with a certain amount of status/justification. I think “idea” is the better word but I frequently forget to use it (because Popper and Deutsch say “theory” all the time and I got used to it).
Starting with all possible hypotheses (sequences) as represented by computer programs (that generate those sequences), weighted by their simplicity (2^-n, where n is the program length);
So we have infinitely many theories, infinitely many of which are dead wrong, and only one of which is true, and we just use the shortest one and hope? And that’s supposed to be a good idea?
You usually weight them by their simplicity, if you want a probabilistic forecast. This is Occam’s razor. Picking the shortest one is not an unreasonable way to get a specific prediction.
Ockham’s razor principle is well supported theoretically and experimentally, and there is no other similarly general and powerful principle which could replace or augment it. Ockham’s razor principle has been proven to be invaluable for understanding our world. Indeed, it not only seems a necessary but also sufficient founding principle of science. Until other necessary or sufficient principles are found, it is prudent to accept Ockham’s razor as the foundation of inductive reasoning. So far, all attempts to discredit the universal role of Ockham’s razor have failed.
Infinitely many hypotheses increase in probability. What good is that? You have infinite possibilities before you and haven’t made progress towards picking between them.
When you say “this infinite set over here, its probability increases” you aren’t reaching an answer. You aren’t even getting any further than pure deduction would have gotten you.
Look, there’s two infinite sets: those contradicted by the evidence, and those not (deal with theories with “maybes” in them however you like, it does not matter to my point). The first set we don’t care about—we all agree to reject it. The second set is all that’s left to consider. if you increase the probability of every theory in it that doesn’t help you choose between them. it’s not useful. when you “confirm” or increase the probability of every theory logically consistent with the data, you aren’t reaching an answer, you aren’t making progress.
The progress is in the theories that are ruled out. When playing cards, you could consider all possible histories of the motions of the cards that are compatible with the evidence. Would you have any problem with making bets based on these probabilities? Solomonoff induction is very similar. While there are an infinite number of possibilities, both cases involve proving general properties of the distribution rather than considering each possibility individually.
In the future please capitalize your sentences; it improves readability (especially in large paragraphs).
If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.
The Solomonoff prior is really just a form of the principle of insufficient reason, which states that if there is no reason to think that one thing is more probable than another, they should be assigned the same probability. Since there are an infinite number of theories, we need to take some kind of limit. If we encode them as self-delimiting computer programs, we can write them as strings of digits (usually binary). Start with some maximum length and increase it toward infinity. Some programs will proceed normally, looping infinitely or encountering a stop instruction, making many programs equivalent because changing bits that are never used by the hypothesis does not change the theory. Other programs will leave the bounds of the maximum length, but this will be fixed as that length is taken to infinity.
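To make the ‘eliminate and rescale’ picture and the simplicity-weighted prior concrete, here is a rough sketch; the hypotheses, their ‘program lengths’, and which one the evidence rules out are all invented for illustration.

```python
# Rough sketch: a simplicity-weighted prior plus 'eliminate and rescale' updating.
# The hypotheses and their 'program lengths' are invented for illustration only.
hypotheses = {
    "H1": {"length": 3, "predicts_evidence": True},
    "H2": {"length": 5, "predicts_evidence": True},
    "H3": {"length": 4, "predicts_evidence": False},  # contradicted by the evidence
}

# Prior proportional to 2^-length (normalized over this finite toy set).
raw = {h: 2 ** -v["length"] for h, v in hypotheses.items()}
total = sum(raw.values())
prior = {h: w / total for h, w in raw.items()}

# Updating on the evidence: zero out contradicted hypotheses, rescale the rest.
unnormalized = {h: (prior[h] if hypotheses[h]["predicts_evidence"] else 0.0)
                for h in hypotheses}
z = sum(unnormalized.values())
posterior = {h: w / z for h, w in unnormalized.items()}

print(prior)      # H1 gets the most weight because it is shortest
print(posterior)  # H3 is eliminated; H1 and H2 are rescaled to sum to 1
```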
This obviously isn’t a complete justification, but it is better than Popperian induction. Both rule out falsified theories and both penalize theories for unfalsifiability and complexity. Only Solomonoff induction allows us to quantify the size of these penalties in terms of probability. Popper would agree that a simpler theory, being compared to a more complex one, is more likely but not guaranteed to be true, but he could not give the numbers.
If you are still worried about the foundational issues of the Solomonoff prior, I’ll answer your questions, but it would be better if you asked me again in however long progress takes (that was supposed to sound humorous, as if I were describing a specific, known amount of time, but I really doubt that that is noticeable in text). http://lesswrong.com/r/discussion/lw/534/where_does_uncertainty_come_from/ writes up some of the questions I’m thinking about now. It’s not by me, but Paul seems to wonder about the same issues. This should all be significantly more solid once some of these questions are answered.
“If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.”
That’s it? That is trivial, and doesn’t solve the major problems in epistemology. It’s correct enough (I’m not convinced theories have probabilities, but I think that’s a side issue) but it doesn’t get you very far. Any old non-Bayesian epistemology could tell you this.
Epistemology has harder problems than figuring out that you should reject things contradicted by evidence. For example, what do you do about the remaining possibilities?
I think with Solomonoff what you are doing is ordering all theories (by length) and saying the ones earlier in the ordering are better. This ordering has nothing empirical about it. Your approach here is not based on evidences or probabilities, just an ordering. Correct me if I got that wrong. That raises the question: why is the Solomonoff ordering correct? Why not some other ordering? Here’s one objection: “God did everything” is a short theory which is compatible with all evidence. You can make separate versions of it for all possible sets of predictions if you want. Doesn’t that mean we’re either stuck with some kind of “God did everything” or the final truth is even shorter?
You mention “Popperian induction”. Please don’t speak for Popper. The idea that Popper advocated induction is a myth. A rather crass one; he refuted induction and published a lot of material against it. Instead, ask me about his positions, OK? Popper would not agree that the simpler theory is “more likely”. There’s many issues here. One is that Popper said we should prefer low probability theories because they say more about the world.
You seem to present “Popperian induction” as an incomplete justification. Maybe you are unaware that Popper’s epistemology rejects the concept of justification itself. It is thus a mistake to criticize it on justificationist grounds. It isn’t any type of justification and doesn’t want to be.
In order to quote people, you can use a single greater than sign ‘>’ at the beginning of a line.
Epistemology has harder problems than figuring out that you should reject things contradicted by evidence. For example, what do you do about the remaining possibilities?
Note I said that and a prior. The important concept here is that we must always assign probabilities to all theories, because otherwise we would have no way to act. From Wikipedia: ‘Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures’, where a statistical procedure may be taken as a guide for optimal action.
Sorry about saying ‘Popperian induction’. I only have a basic knowledge of Popper. Would Popper say that predicting the results of actions is (one of) the goals of science? This is, of course, slightly more general than induction.
Wikipedia quotes Popper as saying simpler theories are to be preferred ‘because their empirical content is greater; and because they are better testable’. Does this mean that he would bet something important on this? If there were two possible explanations for a plague, and if the simpler one were true then, with medicine, we could save 100 lives, but if the more complex one were true we could save 200 lives, how would you decide which cure the factories should manufacture (and it takes a long time to prepare the factories or something, so you can only make one type of cure)?
I think with Solomonoff what you are doing is ordering all theories (by length) and saying the ones earlier in the ordering are better.
It is exactly not about this. The reason to prefer simpler theories is that more possible universes correspond to them. For a simple universe, axioms 1 through 10 have to come out the right way, but the rest can be anything, as they are meaningless since the universe is already fully specified. For a more complex theory, axioms 11-15 must also turn out a certain way, so fewer possible universes are compatible with this theory. I would also add the principle of sufficient reason, which I think is likely, as further justification for Occam’s razor, but that is irrelevant here.
Popper said we should prefer low probability theories because they say more about the world.
This seems wrong. Should I play the lottery because the low-probability theory that I will win is preferred to the high-probability theory that I will lose?
The important concept here is that we must always assign probabilities to all theories, because otherwise we would have no way to act.
Popperian epistemology doesn’t assign probabilities like that, and has a way to act. So would you agree that, if you fail to refute Popperian epistemology, then one of your major claims is wrong? Or do you have a backup argument: you don’t have to, but you should anyway because..?
Prediction is a goal of science, but it is not the primary one. The primary goal is explanation/understanding.
Secondary sources about Popper, like wikipedia, are not trustworthy. Popper would not bet anything important on that simpler theories thing. That fragment is misleading because Popper means “preferred” in a methodological sense, not considered to have a higher probability of being true, or considered more justified. It’s not a preference about which theory is actually, in fact, better.
The way to make decisions is by making conjectures about what to do, and criticizing those conjectures. We learn by critical, imaginative argument (including within one mind). Explanations should be given for why each possibility is a good idea; the hypothetical you give doesn’t have enough details to actually reach an answer.
About Solomonoff, if I understand you correctly now you are starting with theories which don’t say much (that isn’t what I expected simpler or shorter to mean). So at any point Solomonoff induction will basically be saying the minimal theory to account for the data and specify nothing else at all. Is that right? If that is the case, then it doesn’t deal with choosing between the various possibilities which are all compatible with the data (except in so far as it tells you to choose the least ambitious) and can make no predictions: it simply leaves everything we don’t already know unspecified. Have I misunderstood again?
I thought the theories were supposed to specify everything (not, as you say, “the rest can be anything”) so that predictions could be made.
I’m not totally sure what your concept of a universe or axiom is here. Also I note that the real world is pretty complicated.
Should I play the lottery because the low-probability theory that I will win is preferred to the high-probability theory that I will lose?
No, he means they are more important and more interesting. His point is basically that a theory which says nothing has a 100% prior probability. Quantum Mechanics has a very low prior probability. The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?) They have what Popper called high “content” because they exclude many possibilities. That is a good trait. But it’s certainly not a guarantee that arbitrary theories excluding stuff will be correct.
Popperian epistemology doesn’t assign probabilities like that, and has a way to act.
My first wikipedia quote (Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.) was somewhat technical, but it basically meant that any consistent set of actions is either describable in terms of probabilities or nonconsequentialist. How would you choose the best action in a Popperian framework? Would you be forced to consider aspects of a choice other than its consequences? Otherwise, your choices must be describable in terms of a prior probability and Bayesian updating (and, while we already agree that the latter is obvious, here we are using it to update a set of probabilities and, on pain of inconsistency, our new probabilities must have that relationship to our old ones).
Explanations should be given for why each possibility is a good idea; the hypothetical you give doesn’t have enough details to actually reach an answer.
Definitely use all the evidence when making decisions. I didn’t mean for my example to be complete. I was wondering how a question like that could be addressed in general. What pieces of information would be important and how would they be taken into account? You can assume that the less relevant variables, like which disease is more painful, are equal in both cases.
Prediction is a goal of science, but it is not the primary one. The primary goal is explanation/understanding.
I may have been unclear here. I meant prediction in a very broad sense, including, eg., predicting which experiments will be best at refining our knowledge and predicting which technologies will best improve the world. Was it clear that I meant that? If you seek understanding beyond this, you are allowed but, at least for the present era, I only care about an epistemology if it can help me make the world a better place.
About Solomonoff, if I understand you correctly now you are starting with theories which don’t say much (that isn’t what I expected simpler or shorter to mean). So at any point Solomonoff induction will basically be saying the minimal theory to account for the data and specify nothing else at all. Is that right? If that is the case, then it doesn’t deal with choosing between the various possibilities which are all compatible with the data (except in so far as it tells you to choose the least ambitious) and can make no predictions: it simply leaves everything we don’t already know unspecified.
No, not at all. The more likely theories are those that include small amounts of theory, not small amounts of prediction. Eliezer discusses this in the sequences here, here, and here. Those don’t really cover Solomonoff induction directly, but they will probably give you a better idea of what I’m trying to say than I did. I think Solomonoff induction is better covered in another post, but I can’t find it right now.
I thought the theories were supposed to specify everything (not, as you say, “the rest can be anything”) so that predictions could be made.
Sorry, I was abusing one word ‘theories’ to mean both ‘individual descriptions of the universe’ and ‘sets of descriptions that make identical predictions in some realm (possibly in all realms)’. It is a very natural place to slip between definitions, because, for example, when discussing biology, we often don’t care about the distinction between ‘Classical physics is true and birds are descended from dinosaurs.’ and ‘Quantum physics is true and birds are descended from dinosaurs.’ Once enough information is specified to make predictions, a theory (in the second sense) is on equal ground with another theory that contains the same amount of information and that makes different predictions only in realms where it has not been tested, as well as with a set of theories for which the set can be specified with the same amount of information but for which specifying one theory out of the set would take more information.
That fragment is misleading because Popper means “preferred” in a methodological sense, not considered to have a higher probability of being true, or considered more justified. It’s not a preference about which theory is actually, in fact, better.
I’m not sure how one would act based on this. Should one conduct new experiments differently given this knowledge of which theories are preferred? Should one write papers about how awesome the theory is?
No, he means they are more important and more interesting. His point is basically that a theory which says nothing has a 100% prior probability. Quantum Mechanics has a very low prior probability. The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?) They have what Popper called high “content” because they exclude many possibilities. That is a good trait. But it’s certainly not a guarantee that arbitrary theories excluding stuff will be correct.
All of this is present in Bayesian epistemology.
Consider Bayes theorem, with theories A and B and evidence E:
P(A|E) = P(E|A) P(A) / P(E)
Let’s look at how the probability of a theory increases upon learning E, using a ratio: P(A|E) / P(B|E) = [P(E|A) / P(E|B)] * [P(A) / P(B)].
The greater P(E|A) is compared to P(E|B), the more A benefits compared to B. This means that the more narrowly theory A predicts E, the actual observation, to the exclusion of other possible observations, the more probability we assign to it. This is a quantitative form of Popper’s preference for more specific and more easily falsifiable theories, as proven by Bayes’ theorem.
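For concreteness, here is a quick numerical check of that ratio; the numbers are invented purely for illustration.

```python
# Quick numerical check of the ratio form (numbers invented for illustration).
p_a, p_b = 0.5, 0.5   # equal priors for theories A and B
p_e_given_a = 0.9     # A predicted the observed evidence E narrowly
p_e_given_b = 0.1     # B spread its prediction over many other outcomes

prior_odds = p_a / p_b
likelihood_ratio = p_e_given_a / p_e_given_b
posterior_odds = likelihood_ratio * prior_odds

print(posterior_odds)  # 9.0: A is now favoured 9 to 1 over B
```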
The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?)
That’s basically what Solomonoff means by prior probability.
My first wikipedia quote (Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.) was somewhat technical, but it basically meant that any consistent set of actions is either describable in terms of probabilities or nonconsequentialist. How would you choose the best action in a Popperian framework? Would you be forced to consider aspects of a choice other than its consequences?
Yes Popper is non-consequentialist.
Consequentialism is a bad theory. It says ideas should be evaluated by their consequences (only). This does not address the question of how to determine what are good or bad consequences.
If you try to evaluate methods of determining what are good or bad consequences, by their consequences, you’ll end up with serious regress problems. If you don’t, you’ll have to introduce something other than consequences.
You may want to be a little more careful with how you formulate this. Saying that a good idea is one that has good consequences, and a bad idea is one that has bad consequences, doesn’t invite regress… it may be that you have a different mechanism for evaluating whether a consequence is good/bad than you do for evaluating whether an idea is good/bad.
For example, I might assert that a consequence is good if it makes me happy, and bad if it makes me unhappy. (I don’t in fact assert this.) I would then conclude that an idea is good if its consequences make me happy, and bad if its consequences make me unhappy. No regress involved.
(And note that this is different from saying that an idea is good if the idea makes me happy. If it turns out that the idea “I could drink drain cleaner” makes me happy, but that actually drinking drain cleaner makes me unhappy, then it’s a bad idea by the first theory but a good idea by the second theory.)
A certain amount of precision is helpful when thinking about these sorts of things.
You may want to be a little more careful with how you formulate this. Saying that a good idea is one that has good consequences, and a bad idea is one that has bad consequences, doesn’t invite regress… it may be that you have a different mechanism for evaluating whether a consequence is good/bad than you do for evaluating whether an idea is good/bad.
...
A certain amount of precision is helpful when thinking about these sorts of things.
If you reread the sentence in which I discuss a regress, you will notice it begins with “if” and says that a certain method would result in a regress, the point being you have to do something else. So it was your mistake.
That is not what I meant by consequentialism, and I agree that that theory entails an infinite regress. The theory I was referring to, which is the first google result for consequentialism, states that actions should be judged by their consequences.
That theory is bad too. For one thing, you might do something really dumb—say, shoot at a cop—and the consequence might be something good, e.g. you might accidentally hit the robber behind him who was about to kill him. You might end up declared a hero.
For another thing, “judge by consequences” does not answer the question of what are good or bad consequences. It tells us almost nothing. The only content is don’t judge by anything else. Why not? Beats me.
If you mean judge by rationally expected consequences, or something like that, you could drop the first objection but I still don’t see the use of it. If you merely want to exclude mysticism I think we can do that with a lighter restriction.
Sorry, I didn’t explain this very well. I don’t use consequentialism to judge people, I use it to judge possible courses of action. I (try to) make choices with the best consequences, this fully determines actions, so judgments of, for example, who is a bad person, do not add anything.
You are right that this is very broad. My point is that all consequentialist decision rules are either Bayesian decision rules or limits of Bayesian decision rules, according to a theorem.
I didn’t discuss who is a bad person. An action might be bad but have a good result (this time) by chance. And you haven’t said a word about what kinds of consequences of actions are good or bad … I mean desirable or undesirable. And you haven’t said why everything but consequences is inadmissible.
In your example of someone shooting a police officer, I would say that it is good that the police officer’s life was saved, but it is bad that there is a person who would shoot people so irresponsibly and I would not declare that person a hero as that will neither help save more police officers nor reduce the number of people shooting recklessly; in fact, it would probably increase the number of reckless people.
I don’t want to get into the specifics of morality, because it is complex. The only reason that I specified consequentialist decision making is that it is a condition of the theorem that proves Bayesian decision making to be optimal. Entirely nonconsequentialist systems don’t need to learn about the universe to make decisions and partially consequentialist systems are more complicated. For the latter, Bayesianism is often necessary if there are times when nonconsequentialist factors have little import to a decision.
it is bad that there is a person who would shoot people so irresponsibly
You are here judging a non-action by a non-consequence.
Yes, this is a non-action; I often say ‘it is bad that X’ as shorthand for ‘ceteris paribus, I would act so as to make X not be the case’. However, it is a consequence of what happened before (though you may have just meant it is not a consequence of my action). Judgements are often attached to consequences without specifying which action they are consequences of, just for convenience.
I think you mean systems which ignore all consequences. Popper’s system does not do that.
OK. I don’t recall hearing any Bayesian praising low probability theories, but no doubt you’ve heard more of them than me.
The greater P(E|A) is compared to P(E|B), the more A benefits compared to B.
Yes but that only helps you deal with wishy washy theories. There’s plenty of theories which predict stuff with 100% probability. Science has to deal with those. This doesn’t help deal with them.
Examples include Newton’s Laws and Quantum Theory. They don’t say they happen sometimes but always, and that’s important. Good scientific theories are always like that. Even when they have a restricted, non-universal domain, it’s 100% within the domain.
Physics is currently thought to be deterministic. And even if physics was random, we would say that e.g. motion happens randomly 100% of the time, or whatever the law is. We would expect a law of motion with a call to a random function to still always be what happens.
PS Since you seem to have an interest in math, I’d be curious about your thoughts on this:
The article you sent me is mathematically sound, but Popper draws the wrong conclusion from it. He has already accepted that P(H|E) can be greater than P(H). That’s all that’s necessary for induction: updating a probability distribution. The stuff he says at the end about H ← E being countersupported by E does not prevent decision making based on the new distribution.
Setting aside Popper’s point for a minute, p(h|e) > p(h) is not sufficient for induction.
The reason it is not sufficient is that infinitely many h gain probability for any e. The problem of dealing with those remains unaddressed. And it would be incorrect and biased to selectively pick some pet theory from that infinite set and talk about how it’s supported.
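Here is a toy illustration of that point; the hypothesis family is invented, and code can of course only show a finite sample of what is really an infinite set.

```python
# Toy illustration: after ruling out hypotheses contradicted by e, every member
# of an (in reality infinite) family that predicted e with probability 1 gains
# by exactly the same factor, so e does nothing to rank them against each other.

# A family of hypotheses that all predict the observed evidence (the first 5
# swans seen were white), but disagree about everything not yet observed.
consistent = [f"swans are white up to #{n}, black afterwards" for n in range(6, 26)]
consistent.append("all swans are white")
refuted = ["all swans are black"]  # contradicted by the evidence

hypotheses = consistent + refuted
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}
likelihood = {h: (1.0 if h in consistent else 0.0) for h in hypotheses}

z = sum(likelihood[h] * prior[h] for h in hypotheses)
posterior = {h: likelihood[h] * prior[h] / z for h in hypotheses}

gains = {h: posterior[h] / prior[h] for h in consistent}
print(set(round(g, 6) for g in gains.values()))  # a single value: same gain for all
```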
OK. I don’t recall hearing any Bayesian praising low probability theories, but no doubt you’ve heard more of them than me.
It seems obvious that low probability theories are good. Since probabilities must add up to 100%, there can be only a few high-probability theories and, when one is true, there is not much work to be done in finding it, since it is already so likely. Telling someone to look among low-probability theories is like telling them to look among nonapples when looking for possible products to sell, and it provides no way of distinguishing good low-prior theories, like quantum mechanics, from bad ones, like astrology.
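A small arithmetic illustration of why there can only be a few high-probability theories (the thresholds are arbitrary examples):

```python
# Small arithmetic check: since probabilities sum to 1, at most floor(1/t)
# theories can each have probability of at least t (thresholds chosen arbitrarily).
from fractions import Fraction
import math

for t in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 10), Fraction(1, 100)):
    print(t, math.floor(1 / t))
# 1/2 -> 2, 1/4 -> 4, 1/10 -> 10, 1/100 -> 100
```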
Unfortunately, I cannot read that article, as it is behind a paywall. If you have access to it, perhaps you could email it to me at endoself (at) yahoo (dot) com .
ETA:
Yes but that only helps you deal with wishy washy theories. There’s plenty of theories which predict stuff with 100% probability. Science has to deal with those. This doesn’t help deal with them.
I was only talking about Popper’s idea of theories with high content. That particular analysis was not meant to address theories that predicted certain outcomes with probability 1.
I’m not sure how one would act based on this. Should one conduct new experiments differently given this knowledge of which theories are preferred? Should one write papers about how awesome the theory is?
It’s a loose guideline for people about where it may be fruitful to look. It can also be used in critical arguments if/when people think of arguments that use it.
One of the differences between Popper and Bayesian Epistemology is that Popper thinks being overly formal is a fault not a merit. Much of Popper’s philosophy does not consist of formal, rigorous guidelines to be followed exactly. Popper isn’t big on rules of procedure. A lot is explanation. Some is knowledge to use on your own. Some is advice.
The more likely theories are those that include small amounts of theory, not small amounts of prediction.
So, “God does everything”, plus a definition of “everything” which makes predictions about all events, would rate very highly with you? It’s very low on theory and very high on prediction.
Define theories of that type for all possible sets of predictions. Then at any given time you will have infinitely many of them that predict all your data with 100% probability.
So, “God does everything”, plus a definition of “everything” which makes predictions about all events, would rate very highly with you? It’s very low on theory and very high on prediction.
No, it has tons of theory. God is a very complex concept. Note that ‘God did everything’ is more complex and therefore less likely than ‘everything happened’. Did you read http://lesswrong.com/lw/jp/occams_razor/ ?
How do you figure God is complex? God as I mean it simply can do anything, no reason given. That is its only attribute: that it arbitrarily does anything the theory it’s attached to cares to predict. We can even stop calling it “God”. We could even not mention it at all so there is no theory and merely give a list of predictions. Would that be good, in your view?
If ‘God’ is meaningless and can merely be attached to any theory, then the theory is the same with and without God. There is nothing to refute, since there is no difference. If you defined ‘God’ to mean a being who created all species or who commanded a system of morality, I would have both reason to care about and means to refute God. If you defined ‘God’ to mean ‘quantum physics’, there would be applications and means of proving that ‘God’ is a good approximation, but this definition is nonsensical, since it is not what is usually meant by ‘God’. If the theory of ‘God’ has no content, there is nothing to discuss, but this is again a very unusual definition.
If there is no simpler description, then a list of predictions is better but, if an explanation simpler than merely a list of predictions is at all possible, then that would be more likely.
How do you decide if an explanation is simpler than a list of predictions? Are you thinking in terms of data compression?
Do you understand that the content of an explanation is not equivalent to the predictions it makes? It offers a different kind of thing than just predictions.
How do you decide if an explanation is simpler than a list of predictions? Are you thinking in terms of data compression?
Essentially. It is simpler if it has a higher Solomonoff prior.
Do you understand that the content of an explanation is not equivalent to the predictions it makes? It offers a different kind of thing than just predictions.
Yes, there is more than just predictions. However, predictions are the only things that tell us how to update our probability distributions.
Quoting from The Fabric of Reality, chapter 1, by David Deutsch.
Yet some philosophers — and even some scientists — disparage the role of explanation in science. To them, the basic purpose of a scientific theory is not to explain anything, but to predict the outcomes of experiments: its entire content lies in its predictive formulae. They consider that any consistent explanation that a theory may give for its predictions is as good as any other — or as good as no explanation at all — so long as the predictions are true. This view is called instrumentalism (because it says that a theory is no more than an ‘instrument’ for making predictions). To instrumentalists, the idea that science can enable us to understand the underlying reality that accounts for our observations is a fallacy and a conceit. They do not see how anything a scientific theory may say beyond predicting the outcomes of experiments can be more than empty words.
[cut a quote of Steven Weinberg clearly advocating instrumentalism. the particular explanation he says doesn’t matter is that space time is curved. space time curvature is an example of a non-predictive explanation.]
imagine that an extraterrestrial scientist has visited the Earth and given us an ultra-high-technology ‘oracle’ which can predict the outcome of any possible experiment, but provides no explanations. According to instrumentalists, once we had that oracle we should have no further use for scientific theories, except as a means of entertaining ourselves. But is that true? How would the oracle be used in practice? In some sense it would contain the knowledge necessary to build, say, an interstellar spaceship. But how exactly would that help us to build one, or to build another oracle of the same kind — or even a better mousetrap? The oracle only predicts the outcomes of experiments. Therefore, in order to use it at all we must first know what experiments to ask it about. If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction — even perfect, universal prediction — is simply no substitute for explanation.
Similarly, in scientific research the oracle would not provide us with any new theory. Not until we already had a theory, and had thought of an experiment that would test it, could we possibly ask the oracle what would happen if the theory were subjected to that test. Thus, the oracle would not be replacing theories at all: it would be replacing experiments. It would spare us the expense of running laboratories and particle accelerators.
[cut elaboration]
The oracle would be very useful in many situations, but its usefulness would always depend on people’s ability to solve scientific problems in just the way they have to now, namely by devising explanatory theories. It would not even replace all experimentation, because its ability to predict the outcome of a particular experiment would in practice depend on how easy it was to describe the experiment accurately enough for the oracle to give a useful answer, compared with doing the experiment in reality. After all, the oracle would have to have some sort of ‘user interface’. Perhaps a description of the experiment would have to be entered into it, in some standard language. In that language, some experiments would be harder to specify than others. In practice, for many experiments the specification would be too complex to be entered. Thus the oracle would have the same general advantages and disadvantages as any other source of experimental data, and it would be useful only in cases where consulting it happened to be more convenient than using other sources. To put that another way: there already is one such oracle out there, namely the physical world. It tells us the result of any possible experiment if we ask it in the right language (i.e. if we do the experiment), though in some cases it is impractical for us to ‘enter a description of the experiment’ in the required form (i.e. to build and operate the apparatus). But it provides no explanations.
In a few applications, for instance weather forecasting, we may be almost as satisfied with a purely predictive oracle as with an explanatory theory. But even then, that would be strictly so only if the oracle’s weather forecast were complete and perfect. In practice, weather forecasts are incomplete and imperfect, and to make up for that they include explanations of how the forecasters arrived at their predictions. The explanations allow us to judge the reliability of a forecast and to deduce further predictions relevant to our own location and needs. For instance, it makes a difference to me whether today’s forecast that it will be windy tomorrow is based on an expectation of a nearby high-pressure area, or of a more distant hurricane. I would take more precautions in the latter case.
[“wind due to hurricane” and “wind due to high-pressure area” are different explanations for a particular prediction.]
So knowledge is more than just predictive because it also lets us design things?
Here’s a solution to the problem with the oracle—design a computer that inputs every possible design to the oracle and picks the best. You may object that this would be extremely time-consuming and therefore impractical. However, you don’t need to build the computer; just ask the oracle what would happen if you did.
What can we learn from this? This kind of knowledge can be seen as predictive, but only incidentally, because the computer happens to be implemented in the physical world. If it were implemented mathematically, as an abstract algorithm, we would recognize this as deductive, mathematical knowledge. But math is all about tautologies; nothing new is learned. Okay, I apologize for that. I think I’ve been changing my definition of knowledge repeatedly to include or exclude such things. I don’t really care as much about consistent definitions as I should. Hopefully it is clear from context. I’ll go back to your original question.
Would a list of predictions with no theory/explanation be good or bad, in your view?
The difference between the two cases is not the crucial difference here. Having a theory as opposed to a list of predictions for every possible experiment does not necessarily make the theorems easier to prove. When it does, which is almost always, this is simply because that theory is more concise, so it is easier to deduce things from. This seems more like a matter of computing power than one of epistemology.
According to some predetermined criteria. “How well does this spaceship fly?” “How often does it crash?” Making a computer evaluate machines is not hard in principle, and is beside the point.
And wouldn’t the oracle predict that the computer program would never halt, since it would attempt to enter infinitely many questions into the oracle?
I was assuming a finite maximum size with only finitely many distinguishable configurations in that size, but, again, this is irrelevant; whatever trick you use to make this work, you will not change the conclusions.
According to some predetermined criteria. “How well does this spaceship fly?” “How often does it crash?” Making a computer evaluate machines is not hard in principle, and is beside the point.
I think figuring out what criteria you want is an example of a non-predictive issue. That makes it not beside the point. And if the computer picks the best according to criteria we give it, they will contain mistakes. We won’t actually get the best answer. We’ll have to learn stuff and improve our knowledge all in order to set up your predictive thing. So there is this whole realm of non-predictive learning.
I was assuming a finite maximum size with only finitely many distinguishable configurations in that size,
So you make assumptions like a spaceship is a thing made out of atoms. If your understanding of physics (and therefore your assumptions) is incorrect then your use of the oracle won’t work out very well. So your ability to get useful predictions out of the oracle depends on your understanding, not just on predicting anything.
I think figuring out what criteria you want is an example of a non-predictive issue.
So I just give it my brain and tell it to do what it wants. Of course, there are missing steps, but they should be purely deductive. I believe that is what Eliezer is working on now :)
So you make assumptions like a spaceship is a thing made out of atoms. If your understanding of physics (and therefore your assumptions) is incorrect then your use of the oracle won’t work out very well.
Good point. I guess you can’t bootstrap an oracle like this; some things possible mathematically, like calculating a function over an infinity of points, just can’t be done physically. My point still stands, but this illustration definitely dies.
So I just give it my brain and tell it to do what it wants. Of course, there are missing steps, but they should be purely deductive. I believe that is what Eliezer is working on now :)
That’s it? That’s just not very impressive by my standards. Popper’s epistemology is far more advanced, already. Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
By ‘what Eliezer is working on now’ I meant AI, which would probably be necessary to extract my desires from my brain in practice. In principle, we could just use Bayes’ theorem a lot, assuming we had precise definitions of these concepts.
Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
Popperian epistemology is incompatible with Bayesian epistemology, which I accept from its own justification, not from a lack of any other theory. I disliked what I had heard about Popper before I started reading LessWrong, but I forget my exact argument, so I do not know if it was valid. From what I do remember, I suspect it was not.
So, you reject Popper’s ideas without having any criticism of them that you can remember?
That’s it?
You don’t care that Popper’s ideas have criticisms of Bayesian epistemology which you haven’t answered. You feel you don’t need to answer criticisms because Bayesian epistemology is self-justifying and thus all criticisms of it must be wrong. Is that it?
So, you reject Popper’s ideas without having any criticism of them that you can remember?
No, I brought up my past experience with Popper because you asked if my opinions on him came from Eliezer.
You feel you don’t need to answer criticisms because Bayesian epistemology is self-justifying and thus all criticisms of it must be wrong. Is that it?
No, I think Bayesian epistemology has been mathematically proven. I don’t spend a lot of time investigating alternatives for the same reason I don’t spend time investigating alternatives to calculus.
If you have a valid criticism, “this is wrong” or “you haven’t actually proved this” as opposed to “this has a limited domain of applicability” (actually, that could be valid if Popperian epistemology can answer a question that Bayesianism can’t), I would love to know. You did bring up some things of this type, like that paper by Popper, but none of them have logically stood up, unless I am missing something.
If Bayesian epistemology is mathematically proven, why have I been told in my discussions here various things such as: that there is a regress problem which isn’t fully solved (Yudkowsky says so), that circular arguments for induction are correct, that foundationalism is correct; and why have I been linked to articles to make Bayesian points and told they have good arguments with only a little hand waving, and so on? And I’ve been told further research is being done.
It seems to me that saying it’s proven, the end, is incompatible with it having any flaws or unsolved problems or need for further research. So, which is it?
If you have a valid criticism, “this is wrong” or “you haven’t actually proved this” as opposed to “this has a limited domain of applicability” (actually, that could be valid if Popperian epistemology can answer a question that Bayesianism can’t), I would love to know.
All of the above. It is wrong b/c, e.g., it is instrumentalist (has not understood the value of explanatory knowledge) and inductivist (induction is refuted). It is incomplete b/c, e.g., it cannot deal with non-observational knowledge such as moral knowledge. You haven’t proved much to me; however, I’ve been directed to two books, so judgment there is pending.
I don’t know how you concluded that none of my arguments stood up logically. Did you really think you’d logically refuted every point? I don’t agree: I think most of your arguments were not pure logic, and I thought that various issues were pending further discussion of sub-points. As I recall, some points I raised have not been answered. I’m having several conversations in parallel so I don’t recall which points in particular you didn’t address among the replies to you personally, but for example I quoted an argument by David Deutsch about an oracle. The replies I got about how to try to cheat the oracle did not address the substantive point of the thought experiment, and did not address the issue (discussed in the quote) that oracles have user interfaces and entering questions isn’t just free and trivial, and did not address the issue that physical reality is a predictive oracle meeting all the specified characteristics of the alien oracle (we already have an oracle and none of the replies I got about using the oracle would actually work with the oracle we have). As I saw it, my (quoted) points on that issue stood. The replies were some combination of incomplete and missing the point. They were also clever, which is a bit of fun. I thought of what I think is a better way to try to cheat the rules, which is to ask the oracle to predict the contents of philosophy books that would be written if philosophy was studied for trillions of years by the best people. However, again, the assumption that any question which is easily described in English can be easily entered into the oracle and get a prediction was not part of the thought experiment. And the reason I hadn’t explained all this yet is that there were various other points pending anyway, so shrug, it’s hard to decide where to start when you have many different things to say.
If Bayesian epistemology is mathematically proven, why have I been told in my discussions here various things such as: that there is a regress problem which isn’t fully solved (Yudkowsky says so), that circular arguments for induction are correct, that foundationalism is correct; and why have I been linked to articles to make Bayesian points and told they have good arguments with only a little hand waving, and so on? And I’ve been told further research is being done.
It is proven that the correct epistemology, meaning one that is necessary to achieve general goals, is isomorphic to Bayesianism with some prior (for beings that know all math). What that prior is requires more work. While the constraint of knowing all math is extremely unrealistic, do you agree that the theory of what knowledge would be had in such situations is a useful guide to action until we have a more general theory? Popperian epistemology cannot tell me how much money to bet at what odds for or against P = NP any more than Bayesian epistemology can, but at least Bayesian epistemology set this as a goal.
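For example, here is the kind of answer Bayesian epistemology aims to be able to give, using the standard fair-odds and Kelly formulas (the 0.8 is purely illustrative, not an actual estimate for P = NP):

```python
def fair_odds_against(p):
    """Fair net odds against a claim you assign probability p."""
    return (1 - p) / p

def kelly_fraction(p, b):
    """Kelly-optimal fraction of bankroll to stake at net odds b, given belief p."""
    return max(0.0, (b * p - (1 - p)) / b)

p = 0.8                        # illustrative degree of belief in the claim
print(fair_odds_against(p))    # 0.25: you break even at net odds of 1:4 on the claim
print(kelly_fraction(p, 1.0))  # 0.6: stake 60% of bankroll if offered even odds
```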
it is instrumentalist (has not understood the value of explanatory knowledge)
oracles have user interfaces and entering questions isn’t just free and trivial, and did not address the issue that physical reality is a predictive oracle meeting all the specified characteristics of the alien oracle
This is all based on our limited mathematical ability. A theory does have an advantage over an oracle or the reality-oracle: we can read it. Would you agree that all the benefits of a theory come from this plus knowing all math? The difference is one of mathematical knowledge, not of physical knowledge. How does Popper help with this? Are there guidelines for what ‘equivalent’ formulations of a theory are mathematically better? If so, this is something that Bayesianism does not try to cover, so this may have value. However, this is unrelated to the question of the validity of “don’t assign probabilities to theories”.
inductivist (induction is refuted)
I thought I addressed this but, to recap:
p(h, eb) > p(h, b) [bottom left of page 1]
That (well and how much bigger) is all I need to make decisions.
All this means: that factor that contains all of h that does not follow deductively from e is strongly countersupported by e.
So what? I already have my new probabilities.
[T]he calculus of probability reveals that probabilistic support cannot be inductive support.
What is induction if not the calculation of new probabilities for hypotheses? Should I care about these ‘inductive truths’ that Popper disproves the existence of? I already have an algorithm to calculate the best action to take. It seems like Bayesianism isn’t inductivist by Popper’s definition.
moral knowledge
I’d like to be sure that we are using the same definitions of our terms, so please give an example.
You mean proven given some assumptions about what an epistemology should be, right?
Would you agree that all the benefits of a theory come from this [can read it] plus knowing all math?
No. We need explanations to understand the world. In real life, it is only when we have explanations that we can make good predictions at all. For example, suppose you have a predictive theory about dice and you want to make bets. I chose that example intentionally to engage with areas of your strength. OK, now you face the issue: does a particular real world situation have the correct attributes for my predictive theory to apply? You have to address that to know if your predictions will be correct or not. We always face this kind of problem to do much of anything. How do we figure out when our theories apply? We come up with explanations about what kinds of situations they apply to, and what situation we are in, and we then come up with explanations about why we think we are/aren’t in the right kind of situation, and we use critical argument to improve these explanations. Bayesian Epistemology does not address all this.
p(h, eb) > p(h, b) [bottom left of page 1]
I replied to that. Repeating: if you increase the probability of infinitely many theories, the problem of figuring out a good theory is not solved. So that is not all you need.
Further, I’m still waiting on an adequate answer about what support is (inductive or otherwise) and how it differs from consistency.
I gave examples of moral knowledge in another comment to you. Morality is knowledge about how to live, what is a good life. e.g. murder is immoral.
You mean proven given some assumptions about what an epistemology should be, right?
Yes, I stated my assumptions in the sentence, though I may have missed some.
We always face this kind of problem to do much of anything. How do we figure out when our theories apply?
This comes back to the distinction between one complete theory that fully specifies the universe and a set of theories that are considered to be one because we are only looking at a certain domain. In the former case, the domain of applicability is everywhere. In the latter, we have a probability distribution that tells us how likely it is to fail in every domain. So, this kind of thing is all there in the math.
I replied to that. Repeating: if you increase the probability of infinitely many theories, the problem of figuring out a good theory is not solved. So that is not all you need.
What do you mean by ‘a good theory’? Bayesians never select one theory as ‘good’ and follow that; we always consider the possibility of being wrong. When theories have higher probability than others, I guess you could call them good. I don’t see why this is hard; just calculate P(H | E) for all the theories and give more weight to the more likely ones when making decisions.
Further, I’m still waiting on an adequate answer about what support is (inductive or otherwise) and how it differs from consistency.
Evidence supports a hypothesis if P(H | E) > P(H). Two statements, A, B, are consistent if ¬(A&B → ⊥). I think I’m missing something.
Evidence supports a hypothesis if P(H | E) > P(H). Two statements, A, B, are consistent if ¬(A&B → ⊥). I think I’m missing something.
Let’s consider only theories which make all their predictions with 100% probability for now. And theories which cover everything.
Then:
If H and E are consistent, then it follows that P(H | E) > P(H).
For any given E, consider how much greater the probability of H is, for all consistent H. That amount is identical for all H considered.
We can put all the Hs in two categories: the consistent ones which gain equal probability, and the inconsistent ones for which P(H|E) = 0. (Assumption warning: we’re relying on correctly judging which H are consistent with which E.)
This means:
1) consistency and support coincide.
2) there are infinitely many equally supported theories. There are only and exactly two amounts of support that any theory has given all current evidence, one of which is 0.
3) The support concept plays no role in helping us distinguish between the theories with more than 0 support.
4) The support concept can be dropped entirely because it has no use at all. The consistency concept does everything
5) All mention of probability can be dropped too, since it wasn’t doing anything.
6) And we still have the main problem of epistemology left over, which is dealing with the theories that aren’t refuted by evidence
Similar arguments can be made without my initial assumptions/restrictions. For example introducing theories that make predictions with less than 100% probability will not help you because they are going to have lower probability than theories which make the same predictions with 100% probability.
For any given E, consider how much greater the probability of H is, for all consistent H. That amount is identical for all H considered.
Well the ratio is the same, but that’s probably what you meant.
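Concretely (the priors below are made up; only the ratios matter): every hypothesis that predicted E with probability 1 has its probability multiplied by the same factor 1/P(E), and anything inconsistent with E drops to 0.

```python
priors = {"H1": 0.2, "H2": 0.3, "H3": 0.1, "H4_refuted": 0.4}
likelihood = {"H1": 1.0, "H2": 1.0, "H3": 1.0, "H4_refuted": 0.0}  # P(E|H)

p_e = sum(priors[h] * likelihood[h] for h in priors)                # P(E) = 0.6
posterior = {h: priors[h] * likelihood[h] / p_e for h in priors}

for h in priors:
    print(h, round(posterior[h], 3), "factor:", round(posterior[h] / priors[h], 3))
# H1, H2, H3 all get the same factor (1/0.6); only their priors separate them.
```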
5) All mention of probability can be dropped too, since it wasn’t doing anything.
6) And we still have the main problem of epistemology left over, which is dealing with the theories that aren’t refuted by evidence
Have a prior. This reintroduces probabilities and deals with the remaining theories. You will converge on the right theory eventually no matter what your prior is. Of course, that does not mean that all priors are equally rational.
If they all have the same prior probability, then their probabilities are the same and stay that way. If you use a prior which arbitrarily (in my view) gives some things higher prior probabilities in a 100% non-evidence-based way, I object to that, and it’s a separate issue from support.
How does having a prior save the concept of support? Can you give an example? Maybe the one here, currently near the bottom:
If they all have the same prior probability, then their probabilities are the same and stay that way.
Well shouldn’t they? If you look at it from the perspective of making decisions rather than finding one right theory, it’s obvious that they are equiprobable and this should be recognized.
If you use a prior which arbitrarily (in my view) gives some things higher prior probabilities in a 100% non-evidence-based way, I object to that, and it’s a separate issue from support.
Solomonoff does not give “some things higher prior probabilities in a 100% non-evidence-based way”. All hypotheses have the same probability, many just make similar predictions.
It seems someone has downvoted you for not being familiar with Eliezer’s work on AI. Basically, this is overly anthropomorphic. It is one of our goals to ensure that an AI can progress from a ‘seed AI’ to a superintelligent AI without anything going wrong, but, in practice, we’ve observed that using metaphors like ‘parenting’ confuses people too much to make progress, so we avoid it.
I wasn’t using parenting as a metaphor. I meant it quite literally (only the educational part, not the diaper changing).
One of the fundamental attributes of an AI is that it’s a program which can learn new things.
Humans are also entities that learn new things.
But humans, left alone, don’t fare so well. Helping people learn is important, especially children. This avoids having everyone reinvent the wheel.
The parenting issue therefore must be addressed for AI. I am familiar with the main ideas of the kind of AI work you guys do, but I have not found the answer to this.
One possible way to address it is to say the AI will reinvent the wheel. It will have no help but just figure everything out from scratch.
Another approach would be to program some ideas into the AI (changeable, or not, or some of each), and then leave it alone with that starting point.
Another approach would be to talk with the AI, answer its questions, lecture it, etc… This is the approach humans use with their children.
Each of these approaches has various problems with it which are non-trivial to solve.
I wasn’t using parenting as a metaphor. I meant it quite literally (only the educational part, not the diaper changing).
When humans hear parenting, they think of the human parenting process. Describe the AI as ‘learning’ and the humans as ‘helping it learn’. This gets us closer to the idea of humans learning about the universe around them, rather than being raised as generic members of society.
Don’t worry about downvotes, they do not matter.
Well, the point of down votes is to discourage certain behaviour, and I agree that you should use terminology that we have found less likely to cause confusion.
This is definitely an important problem, but we’re not really at the stage where it is necessary yet. I don’t see how we could make much progress on how to get an AI to learn without knowing the algorithms that it will use to learn.
When humans hear parenting, they think of the human parenting process.
Not all humans. Not me. Is that not a bias?
Well, the point of down votes is to discourage certain behaviour
I don’t discourage without any argument being given, just on the basis of someone’s judgement without knowing the reason. I don’t think I should. I think that would be irrational. I’m surprised that this community wants to encourage people to conform to the collective opinion of others as expressed by votes.
I don’t see how we could make much progress on how to get an AI to learn without knowing the algorithms that it will use to learn.
OK, I think I see where you are coming from. However, there is only one known algorithm that learns (creates knowledge). It is, in short, evolution. We should expect an AI to use it; we shouldn’t expect a brand new solution to this hard problem (historically there have been very few candidate solutions proposed, most not at all promising).
The implementation details are not very important because the result will be universal, just like people are. This is similar to how the implementation details of universal computers are not important for many purposes.
Are you guys familiar with these concepts? There is important knowledge relevant to creating AIs which your statement seems to me to overlook.
I don’t discourage without any argument being given, just on the basis of someone’s judgement without knowing the reason. I don’t think I should. I think that would be irrational. I’m surprised that this community wants to encourage people to conform to the collective opinion of others as expressed by votes.
As a general rule, if I downvote, I either reply to the post, or it is something that should be obvious to someone who has read the main sequences.
OK, I think I see where you are coming from. However, there is only one known algorithm that learns (creates knowledge). It is, in short, evolution.
No, there is another: the brain. It is also much faster than evolution, an advantage I would want a FAI to have.
You’re conflating two things. Biological evolution is a very specific algorithm, with well-studied mathematical properties. ‘Evolution’ in general just means any change over time. You seem to be using it in an intermediate sense, as any change that proceeds through reproduction, variation, and selection, which is also a common meaning. This, however, is still very broad, so there’s very little that you can learn about an AI just from knowing “it will come up with many ideas, mostly based on previous ones, and reject most of them”. This seems less informative than “it will look at evidence and then rationally adjust its understanding”.
Why is it that you guys want to make AI but don’t study relevant topics like this?
Eliezer has studied cognitive science. Those of us not working directly with him have very little to do with AI design. Even Eliezer’s current work is slightly more background theory than AI itself.
I’m not conflating them. I did not mean “change over time”.
There are many things we can learn from evolutionary epistemology. It seeming broad to you does not prevent that. You would do better to ask what good it is instead of guessing it is no good.
For one thing it connects with meme theory.
A different example is that it explains misunderstandings when people communicate. Misunderstandings are extremely common because communication involves 1) guessing what the other person is trying to say 2) selecting between those guesses with criticism 3) making more guesses which are variants of previous guesses 4) more selection 5) etc
This explanation helps us see how easily communication can go wrong. It raises interesting issues like why so much communication doesn’t go wrong. It refutes various myths like that people absorb their teacher’s lectures a little like sponges.
It matters. And other explanations of miscommunication are worse.
Eliezer has studied cognitive science.
But that isn’t the topic I was speaking of. I meant evolutionary epistemology. Which btw I know that Eliezer has not studied much, because he isn’t familiar with one of its major figures (Popper).
Evolution is a largely philosophical theory (distinct from the scientific theory about the history of life on earth). It is a theory of epistemology. Some parts of epistemology technically depend on the laws of physics, but it is generally researched separately from physics. There has not been any science experiment to test it which I consider important, but I could conceive of some, because if you specified different and perverse laws of physics you could break evolution. In a different sense, evolution is tested constantly, in that the laws of physics and the evidence we see around us every day are not the perverse-but-conceivable physics that would break evolution.
The reason I accept evolution (again I refer to the epistemological theory about how knowledge is created) is that it is a good explanation, and it solves an important philosophical problem, and I don’t know anything wrong with it, and I also don’t know any rivals which solve the problem.
The problem has a long history. Where does “apparent design” come from? Paley gave an example of finding a watch in nature, which he said you know can’t have gotten there by chance. That’s correct—the watch has knowledge (aka apparent design, or purposeful complexity, or many other terms). The watch is adapted “to a purpose” as some people put it (I’m not really a fan of the purpose terminology. But it’s adapted! And I think it gets the point across ok.)
Paley then guessed as follows: there is no possible solution to the origins of knowledge other than “A designer (God) created it”. This is a very bad solution even pre-Darwin because it does not actually solve the problem. The designer itself has knowledge, adaptation to a purpose, whatever. So where did it come from? The origin is not answered.
Since then, the problem has been solved by the theory of evolution and nothing else. And it applies to more than just watches found in nature, and to plants and animals. It also applies to human knowledge. The answer “intelligence did it” is no better than “God did it”. How does intelligence do it? The only known answer is: by evolution.
The best thing to read on this topic is The Beginning of Infinity by David Deutsch which discusses Popperian epistemology, evolution, Paley’s problem and its solution, and also has two chapters about meme theory which give important applications.
Also here: http://fallibleideas.com/tradition (Deutsch discusses static and dynamic memes and societies. I discuss “traditions” rather than “memes”. It’s quite similar stuff.)
Evolution is a largely philosophical theory (distinct from the scientific theory about the history of life on earth). It is a theory of epistemology. Some parts of epistemology technically depend on the laws of physics, but it is generally researched separately from physics.
What? Epistemological evolution seems to be about how the mind works, independent of what philosophical status is accorded to the thoughts. Surely it could be tested just by checking if the mind actually develops ideas in accordance with the way it is predicted to.
If you want to check how minds work, you could do that. But that’s very hard. We’re not there yet. We don’t know how.
How minds work is a separate issue from evolutionary epistemology. Epistemology is about how knowledge is created (in abstract, not in human minds specifically). If it turns out there is another way, it wouldn’t upset the claim that evolution would create knowledge if done in minds.
There’s no reason to think there is another way. No argument that there is. No explanation of why to expect there to be. No promising research on the verge of working one out. Shrug.
Epistemology is about how knowledge is created (in abstract, not in human minds specifically).
I see. I thought that evolutionary epistemology was a theory of human minds, though I know that that technically isn’t epistemology. Does evolutionary epistemology describe knowledge about the world, mathematical knowledge, or both (I suspect you will say both)?
So, you’re saying that in order to create knowledge, there has to be copying, variation, and selection. I would agree with the first two, but not necessarily the third. Consider a formal axiomatic system. It produces an ever-growing list of theorems, but none are ever selected any more than others. Would you still consider this system to be learning?
With deduction, all the consequences are already contained in the premises and axioms. Abstractly, that’s not learning.
When human mathematicians do deduction, they do learn stuff, because they also think about stuff while doing it, they don’t just mechanically and thoughtlessly follow the rules of math.
So induction (or probabilistic updating, since you said that Popper proved it not to be the same as whatever philosopher call ‘induction’) isn’t learning either because the conclusions are contained in the priors and observations?
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
the idea of induction is that the conclusions are NOT logically contained in the observations (that’s why it is not deduction).
if you make up a prior from which everything deductively follows, and everything else is mere deduction from there, then all of your problems and mistakes are in the prior.
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
no. learning is creating new knowledge. that would simply be human programmers putting their own knowledge into a prior, and then the machine not creating any new knowledge that wasn’t in the prior.
The correct method of updating one’s probability distributions is contained in the observations. P(H|E) = P(H)P(E|H)/P(E) .
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
no. learning is creating new knowledge. that would simply be human programmers putting their own knowledge into a prior, and then the machine not creating any new knowledge that wasn’t in the prior.
So how could evolutionary epistemology be relevant to AI design?
AIs are programs that create knowledge. That means they need to do evolution. That means they need, roughly, a conjecture generator, a criticism generator, and a criticism evaluator. The conjecture generator might double as the criticism generator since a criticism is a type of conjecture, but it might not.
The conjectures need to be based on the previous conjectures (not necessarily all of them, but some). That makes it replication with variation. The criticism is the selection.
Any AI design that completely ignores this is, imo, hopeless. I think that’s why the AI field hasn’t really gotten anywhere. They don’t understand what they are trying to make, because they have the wrong philosophy (in particular the wrong explanations. i don’t mean math or logic).
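If it helps make the claim concrete, here is a rough sketch of the shape I have in mind. The generator and critics below are trivial placeholders, and all the unsolved, interesting work would have to live inside them:

```python
import random

def vary(conjecture):
    """Placeholder: produce a variant of an existing conjecture."""
    return conjecture + random.choice([" (revised)", " (narrowed)", " (generalized)"])

def criticize(conjecture, critics):
    """Return the first criticism that refutes the conjecture, or None."""
    for critic in critics:
        problem = critic(conjecture)
        if problem:
            return problem
    return None

def evolve_knowledge(seed_conjectures, critics, generations=100):
    pool = list(seed_conjectures)
    for _ in range(generations):
        candidate = vary(random.choice(pool))       # replication with variation
        if criticize(candidate, critics) is None:   # selection by criticism
            pool.append(candidate)
    return pool

# Trivial usage: a single critic that rejects conjectures that grow too long.
pool = evolve_knowledge(["all swans are white"],
                        [lambda c: "too long" if len(c) > 60 else None])
print(len(pool))
```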
AIs are programs that create knowledge. That means they need to do evolution. That means they need, roughly, a conjecture generator, a criticism generator, and a criticism evaluator. The conjecture generator might double as the criticism generator since a criticism is a type of conjecture, but it might not.
Note that there are AI approaches which do do something close to what you think an AI “needs”. For example, some of Simon Colton’s work can be thought of in a way roughly like what you want. But it is a mistake to think that such an entity needs to do that. (Some of the hardcore Bayesians make the same mistake in assuming that an AI must use a Bayesian framework. That something works well as a philosophical approach is not the same claim as that it should work well in a specific setting where we want an artificial entity to produce certain classes of systematic reliable results.)
Those aren’t AIs. They do not create new knowledge. They do not “learn” in my sense—of doing more than they were programmed to. All the knowledge is provided by the human programmer—they are designed by an intelligent person and to the extent they “act intelligent” it’s all due to the person providing the thinking for it.
Those aren’t AIs. They do not create new knowledge. They do not “learn” in my sense—of doing more than they were programmed to.
I’m not sure this is at all well-defined. I’m curious, what would make you change your mind? If for example, Colton’s systems constructed new definitions, proofs, conjectures, and counter-examples in math would that be enough to decide they were learning?
Could you explain how this is connected to the issue of making new knowledge?
Or: show me the code, and explain to me how it works, and how the code doesn’t contain all the knowledge the AI creates.
This seems a bit like showing a negative. I will suggest you look for a start at Simon Colton’s paper in the Journal of Integer Sequences which uses a program that operates in a way very close to the way you think an AI would need to operate in terms of making conjectures and trying to refute them. I don’t know if the source code is easily available. It used to be on Colton’s website but I don’t see it there anymore; if his work seems at all interesting to you you can presumably email him requesting a copy. I don’t know how to show that the AI “doesn’t contain all the knowledge the AI creates” aside from the fact that the system constructed concepts and conjectures in number theory which had not previously been constructed. Moreover, Colton’s own background in number theory is not very heavy, so it is difficult to claim that he’s importing his own knowledge into the code. If you define more precisely what you mean by the code containing the knowledge I might be able to answer that further. Without a more precise notion it isn’t clear to me how to respond.
Holding a conversation requires creating knowledge of what the other guy is saying.
In deduction, you agree that the conclusions are logically contained in the premises and axioms, right? They aren’t something new.
In a spam filter, a programmer figures out how he wants spam filtered (he has the idea), then he tells the computer to do it. The computer doesn’t figure out the idea or any new idea.
With biological evolution, for example, we see something different. You get stuff out, like cats, which weren’t specified in advance. And they aren’t a trivial extension; they contain important knowledge such as the knowledge of optics that makes their eyes work. This is why “Where can cats come from?” has been considered an important question (people want an explanation of the knowledge, which is sometimes called “apparent design”), while “Where can rocks come from?” is not in the same category of question (it does have some interest for other reasons).
With people, people create ideas that aren’t in their genes, and weren’t told to them by their parents or anyone else. That includes abstract ideas that aren’t the summation of observation. They sometimes create ideas no one ever thought of before. They create new ideas.
An AI (AGI you call it?) should be like a person: it should create new ideas which are not in its “genes” (programming). If someone actually writes an AI they will understand how it works and they can explain it, and we can use their explanation to judge whether they “cheated” or not (whether they, e.g., hard coded some ideas into the program and then said the AI invented them).
In deduction, you agree that the conclusions are logically contained in the premises and axioms, right? They aren’t something new.
Ok. So to make sure I understand this claim. You are asserting that mathematicians are not constructing anything “new” when they discover proofs or theorems in set axiomatic systems?
With biological evolution, for example, we see something different. You get stuff out, like cats, which weren’t specified in advance. And they aren’t a trivial extension;
Are genetic algorithm systems then creating something new by your definition?
In an AI (AGI you call it?)
Different concepts. An artificial intelligence is not (necessarily) a well-defined notion. An AGI is an artificial general intelligence, essentially something that passes the Turing test. Not the same concept.
If someone actually writes an AI they will understand how it works and they can explain it, and we can use their explanation to judge whether they “cheated” or not (whether they, e.g., hard coded some ideas into the program and then said the AI invented them).
I see no reason to assume that a person will necessarily understand how an AGI they constructed works. To use the most obvious hypothetical, someone might make a neural net modeled very closely after the human brain that functions as an AGI without any understanding of how it works.
Ok. So to make sure I understand this claim. You are asserting that mathematicians are not constructing anything “new” when they discover proofs or theorems in set axiomatic systems?
When you “discover” that 2+1 = 3, given premises and axioms, you aren’t discovering something new.
But working mathematicians do more than that. They create new knowledge. It includes:
1) they learn new ways to think about the premises and axioms
2) they do not publish deductively implied facts unselectively or randomly. they choose the ones that they consider important. by making these choices they are adding content not found in the premises and axioms
3) they make choices between different possible proofs of the same thing. again where they make choices they are adding stuff, based on their own non-deductive understanding
4) when mathematicians work on proofs, they also think about stuff as they go. just like when experimental scientists do fairly mundane tasks in a lab, at the same time they will think and make it interesting with their thoughts.
Are genetic algorithm systems then creating something new by your definition?
They could be. I don’t think any exist yet that do. For example I read a Dawkins paper about one. In the paper he basically explained how he tweaked the code in order to get the results he wanted. He didn’t, apparently, realize that it was him, not the program, creating the output.
By “AI” I mean AGI. An intelligence (like a person) which is artificial. Please read all my prior statements in light of that.
I see no reason to assume that a person will necessarily understand how an AGI they constructed works. To use the most obvious hypothetical, someone might make a neural net modeled very closely after the human brain that functions as an AGI without any understanding of how it works.
Well, OK, but they’d understand how it was created, and could explain that. They could explain what they know about why it works (it copies what humans do). And they could also make the code public and discuss what it doesn’t include (e.g. hard coded special cases. except for the 3 he included on purpose, and he explains why they are there). That’d be pretty convincing!
I don’t think this is true. While he probably wouldn’t announce it if he was working on AI, he has indicated that he’s working on two books (HPMoR and a rationality book), and has another book queued. He’s also indicated that he doesn’t think anyone should work on AI until the goal system stability problem is solved, which he’s talked about thinking about but hasn’t published anything on, which probably means he’s stuck.
I more meant “he’s probably thinking about this in the back of his mind fairly often”, as well as trying to be humourous.
He’s also indicated that he doesn’t think anyone should work on AI until the goal system stability problem is solved, which he’s talked about thinking about but hasn’t published anything on, which probably means he’s stuck.
Do you know what he would think of work that has a small chance of solving goal stability and a slightly larger chance of helping with AI in general? This seems like a net plus to me, but you seem to have heard what he thinks should be studied from a slightly clearer source than I did.
I meant prediction in a very broad sense, including, eg., predicting which experiments will be best at refining our knowledge and predicting which technologies will best improve the world. Was it clear that I meant that? If you seek understanding beyond this, you are allowed to, but, at least for the present era, I only care about an epistemology if it can help me make the world a better place.
I do not consider it possible to predict the growth of knowledge. That means you cannot predict, for example, the consequences of a scientific discovery that you have not yet discovered.
The reason is that if you could predict this you would in effect already have made the discovery.
Understanding is not primarily predictive and it is useful in a practical way. For example, you have to understand issues to address critical arguments offered by your peers. Merely predicting that they are wrong isn’t a good approach. It’s crucial to understand what their point is and to reason with them.
Understanding ideas helps us improve on them. It’s crucial to making judgments about what would be an improvement or not. It lets us judge changes better b/c e.g. we have some conception of why it works, which means we can evaluate what would break it or not.
I meant prediction in a very broad sense, including, eg., predicting which experiments will be best at refining our knowledge.
I do not consider it possible to predict the growth of knowledge. That means you cannot predict, for example, the consequences of a scientific discovery that you have not yet discovered.
That is not what I meant. If we could predict that the LHC will discover superparticles then yes, we would already know that. However, since we don’t know whether it will produce superparticles, we can predict that it will give us a lot of knowledge, since we will either learn that superparticles in the mass range detectable by the LHC exist or that they do not exist, so we can predict that we will learn a lot more about the universe by continuing to run the LHC than by filling in the tunnel where it is housed.
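To put a number on ‘learn a lot’ without predicting the outcome (the 50/50 prior below is made up): for a decisive experiment the expected information gain equals the entropy of the prior, which is computable in advance.

```python
import math

def entropy_bits(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

prior = {"superparticles in LHC range": 0.5, "no superparticles in LHC range": 0.5}

# A decisive experiment leaves zero entropy either way, so the expected gain
# is the full prior entropy: 1 bit for a 50/50 prior.
print(entropy_bits(prior), "bits expected from running the experiment")
```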
So if new knowledge doesn’t come from prediction, what creates it? Answering this is one of epistemology’s main tasks. If you are focussing on prediction then you aren’t addressing it. Am I missing something?
New knowledge comes from observation. If you are referring to knowledge of what a theory says rather than of which theory is true, then this is assumed to be known. The math of how to deal with a situation where a theory is known but its consequences cannot be fully understood due to mathematical limitations is still in its infancy, but this has never posed a problem in practice.
That is a substantive and strong empiricist claim which I think is false.
For example, we have knowledge of things we never observed. Like stars. Observation is always indirect and its correctness always depends on theories such as our theories about whether the chain of proxies we are observing with will in fact observe what we want to observe.
Do you understand what I’m talking about and have a reply, or do you need me to explain further?
then this is assumed to be known
Could you understand why I might object to making a bunch of assumptions in one’s epistemology?
Could you understand why I might object to making a bunch of assumptions in one’s epistemology?
It is assumed in practice, applied epistemology being a rather important thing to have. In ‘pure’ epistemology, it is just labelled incomplete; we definitely don’t have all the answers yet.
it is just labelled incomplete; we definitely don’t have all the answers yet.
It seems to me that you’re pretty much conceding that your epistemology doesn’t work. (All flaws can be taken as “incomplete” parts where, in the future, maybe a solution will be found.)
That would leave the following important disagreement: Popper’s epistemology is not incomplete in any significant way. There is room for improvement, sure, but not really any flaws worth complaining about. No big unsolved problems marring it. So, why not drop this epistemology that doesn’t have the answers yet for one that does?
It seems to me that you’re pretty much conceding that your epistemology doesn’t work.
Would you describe quantum mechanics’ incompatibility with general relativity as “the theory doesn’t work”? For a being with unlimited computing power in a universe that is known to be computable (except for the being itself obviously), we are almost entirely done. Furthermore, many of the missing pieces to get from that to something much more complete seem related.
Popper’s epistemology is not incomplete in any significant way.
No, it is just wrong. Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories. Any consistent consequentialist decision rule must be basically equivalent to that. The statement that there is no way to assign probabilities to theories therefore implies that there is no algorithm that a consequentialist can follow to reliably achieve their goals. Note that even if Popper’s values are not consequentialist, a consequentialist should still be able to act based on the knowledge obtained by a valid epistemology.
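To illustrate the computation (all numbers invented): with a probability for each theory and a utility for each (action, theory) pair, the rule is just to pick the action with the highest expected utility.

```python
p_theory = {"T1": 0.7, "T2": 0.3}                 # probability distribution over theories
utility = {                                        # utility of each (action, theory) pair
    ("act_a", "T1"): 10, ("act_a", "T2"): -5,
    ("act_b", "T1"): 2,  ("act_b", "T2"): 3,
}

def expected_utility(action):
    return sum(p_theory[t] * utility[(action, t)] for t in p_theory)

actions = {a for a, _ in utility}
best = max(actions, key=expected_utility)
print(best, expected_utility(best))  # act_a, 5.5
```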
I suspect you are judging Popperian epistemology by standards it states are mistaken. Would you agree that doing that would be a mistake?
Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories.
Note the givens. There’s more givens which you didn’t mention too, e.g. some assumptions about people’s utilities having certain mathematical properties (you need this for, e.g., comparing them).
I don’t believe these givens are all true. If you think otherwise could we start with you giving the details more? I don’t want to argue with parts you simply omitted b/c I’ll have to guess what you think too much.
As a separate issue, “given my preferences” is such a huge given. It means that your epistemology does not deal in moral knowledge. At all. It simply takes preferences as givens and doesn’t tell you which to have. So in practice in real life it cannot be used for a lot of important issues. That’s a big flaw. And it means a whole entire second epistemology is needed to deal in moral knowledge. And if we have one of those, and it works, why not use it for all knowledge?
The rest of the paragraph was what I meant by this. You agree that Popperian epistemology states that theories should not be assigned probabilities.
I suspect you are judging Popperian epistemology by standards it states are mistaken. Would you agree that doing that would be a mistake?
Depends. If its standards make it useless, then, while internally consistent, I can judge it to be pointless. I just want an epistemology that can help me actually make decisions based on what I learn about reality.
Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories.
Note the givens. There’s more givens which you didn’t mention too, e.g. some assumptions about people’s utilities having certain mathematical properties (you need this for, e.g., comparing them).
I don’t think I was clear. A utility here just means a number I use to say how good a possible future is, so I can decide whether I want to work toward that future. In this context, it is far more general than anything composed of a bunch of terms, each of which describes some properties of a person.
It simply takes preferences as givens and doesn’t tell you which to have.
I can learn more about my preferences from observation of my own brain using standard Bayesian epistemology.
I just want an epistemology that can help me actually make decisions based on what I learn about reality.
Popperian epistemology does this. What’s the problem? Do you think that assigning probabilities to theories is the only possible way to do this?
Overall you’ve said almost nothing that’s actually about Popperian epistemology. You just took one claim (which has nothing to do with what it’s about, it’s just a minor point about what it isn’t) and said it’s wrong (without detailed elaboration).
I don’t think I was clear. A utility here just means a number I use to say how good a possible future is, so I can decide whether I want to work toward that future.
I understood that. I think you are conflating “utility” the mathematical concept with “utility” the thing people in real life have. The second may not have the convenient properties the first has. You have not provided an argument that it does.
I can learn more about my preferences from observation of my own brain using standard Bayesian epistemology.
How do you learn what preferences are good to have, in that way?
Popperian epistemology does this. What’s the problem? Do you think that assigning probabilities to theories is the only possible way to do this?
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules. Even if the probabilities are not mentioned when constructing the rule, they can be inferred from its final form.
I understood that. I think you are conflating “utility” the mathematical concept with “utility” the thing people in real life have. The second may not have the convenient properties the first has. You have not provided an argument that it does.
I don’t know what you mean by ‘“utility” the thing people in real life have’.
How do you learn what preferences are good to have, in that way?
Can we please not get into this. If it helps, assume I am an expected paperclip maximizer. How would I decide then?
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules.
What was the argument for that?
And what is the argument that actions should be judged ONLY by consequences? What is the argument for excluding all other considerations?
I don’t know what you mean by ‘“utility” the thing people in real life have’.
People have preferences and values. e.g. they might want a cat or an iPhone and be glad to get it. The mathematical properties of these real life things are not trivial or obvious. For example, suppose getting the cat would add 2 happiness and the iPhone would add 20. Would getting both add 22 happiness? Answer: we cannot tell from the information available.
Can we please not get into this.
But the complete amorality of your epistemology—its total inability to create entire categories of knowledge—is a severe flaw in it. There’s plenty of other examples I could use to make the same point, however in general they are a bit less clear. One example is epistemology: epistemology is also not an empirical field. But I imagine you may argue about that a bunch, while with morality I think it’s clearer.
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules.
What was the argument for that?
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
And what is the argument that actions should be judged ONLY by consequences? What is the argument for excluding all other considerations?
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
People have preferences and values. e.g. they might want a cat or an iPhone and be glad to get it. The mathematical properties of these real life things are not trivial or obvious. For example, suppose getting the cat would add 2 happiness and the iPhone would add 20. Would getting both add 22 happiness? Answer: we cannot tell from the information available.
Agreed, and agreed that this is a common mistake. If you thought I was making this error, I was being far less clear than I thought.
But the complete amorality of your epistemology—its total inability to create entire categories of knowledge—is a severe flaw in it. There’s plenty of other examples I could use to make the same point, however in general they are a bit less clear. One example is epistemology: epistemology is also not an empirical field. But I imagine you may argue about that a bunch, while with morality I think it’s clearer.
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
Agreed, and agreed that this is a common mistake. If you thought I was making this error, I was being far less clear than I thought.
I thought you didn’t address the issue (and need to): you did not say what mathematical properties you think that real utilities have and how you deal with them.
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
Using what premises?
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
What about explanations about whether it was a reasonable decision for the person to make that action, given the knowledge he had before making it?
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
Ordered. But I think you should be more cautious asserting things that other people told you were true, which you have not checked up on.
Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
Every possible universe is associated with a utility.
Any two utilities can be compared.
These comparisons are transitive.
Weighted averages of utilities can be taken.
For any three possible universes, L, M, and N, with L < M, a weighted average of L and N is less than a weighted average of M and N, if N is accorded the same weight in both cases.
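To make properties 4 and 5 concrete, here is a minimal sketch with made-up numbers, assuming utilities are plain real numbers (roughly what these properties buy you):

    # Hypothetical utilities for three possible universes; the numbers are illustrative only.
    u_L, u_M, u_N = 1.0, 2.0, 5.0   # L < M, as in property 5

    def mix(a, b, weight_on_b):
        """Property 4: a weighted average of two utilities."""
        return (1 - weight_on_b) * a + weight_on_b * b

    # Property 5: with N given the same weight in both mixtures,
    # mixing in L gives a lower value than mixing in M.
    w = 0.7   # the weight accorded to N in both cases
    assert mix(u_L, u_N, w) < mix(u_M, u_N, w)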
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
Using what premises?
Basically just definitions. I’m currently trying to enumerate them, which is why I wanted to find the proof of the theorem we were discussing.
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
What about explanations about whether it was a reasonable decision for the person to make that action, given the knowledge he had before making it?
Care about in the sense of when I’m deciding whether to make it. I don’t really care about how reasonable other people’s decisions are unless it’s relevant to my interactions with them, where I will need that knowledge to make my own decisions.
Ordered. But I think you should be more cautious asserting things that other people told you were true, which you have not checked up on.
Wait, you bought the book just for that proof? I don’t even know if it’s the best proof of it (in terms of making assumptions that aren’t necessary to get the result). I’m confident in the proof because of all the other similar proofs I’ve read, though none seem as widely applicable as that one. I can almost sketch a proof in my mind. Some simple ones are explained well at http://en.wikipedia.org/wiki/Coherence_%28philosophical_gambling_strategy%29 .
For your first 5 points, how is that a reply about Popper? Maybe you meant to quote something else.
I don’t think that real people’s way of considering utility is based on entire universes at a time. So I don’t think your math here corresponds to how people think about it.
Wait, you bought the book just for that proof?
No, I used inter library loan.
I don’t really care about how reasonable other people’s decisions are
Then put yourself in as the person under consideration. Do you think it matters whether you make decisions using rational thought processes, or do only the (likely?) consequences matter?
Basically just definitions.
How do you judge whether you have the right ones? You said “entirely deductive” above, so are you saying you have a deductive way to judge this?
For your first 5 points, how is that a reply about Popper? Maybe you meant to quote something else.
Yes, I did. Oops.
I don’t think that real people’s way of considering utility is based on entire universes at a time. So I don’t think your math here corresponds to how people think about it.
But that is what a choice is between—the universe where you choose one way and the universe where you choose another. Often large parts of the universe are ignored, but only because the action’s consequences for those parts are not distinguishable from how those parts would be if a different action were taken. A utility function may be a sum (or more complicated combination) of parts referring to individual aspects of the universe, but, in this context, let’s not call those ‘utilities’; we’ll reserve that word for the final thing used to make decisions. Most of this is not consciously invoked when people make decisions, but a choice that does not stand when you consider its expected effects on the whole universe is a wrong choice.
I don’t really care about how reasonable other people’s decisions are
Then put yourself in as the person under consideration. Do you think it matters whether you make decisions using rational thought processes, or do only the (likely?) consequences matter?
If I could achieve better consequences using an ‘irrational’ process, I would, but this sounds nonsensical because I am used to defining ‘rational’ as that which reliably gets the best consequences.
How do you judge whether you have the right ones? You said “entirely deductive” above, so are you saying you have a deductive way to judge this?
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
I don’t think I understand. This would rely on your conception of the real life situation (if you want it to apply to real life), and of what makes sense, being correct. That goes way beyond deduction or definitions into substantive claims.
About decisions, if a method like “choose by whim” gets you a good result in a particular case, you’re happy with it? You don’t care that it doesn’t make any sense if it works out this time?
But that is what a choice is between—the universe where you choose one way and the universe where you choose another.
So what? I think you’re basically saying that your formulation is equivalent to what people (should) do. But that doesn’t address the issue of what people actually do—it doesn’t demonstrate the equivalence. As you guys like to point out, people often think in ways that don’t make sense, including violating basic logic.
But also, for example, I think a person might evaluate getting a cat, and getting an iphone, and then they might (incorrectly) evaluate both by adding the benefits instead of by considering the universe with both based on its own properties.
Another issue is that I don’t think any two utilities people have can be compared. They are sometimes judged with different, contradictory standards. This leads to two major issues when trying to compare them: 1) the person doesn’t know how; 2) it might not be possible to compare even in theory because one or both contain some mistakes. The mistakes might need to be fixed before comparing, but that would change them.
a choice that does not stand when you consider its expected effects on the whole universe is a wrong choice
I’m not saying people are doing it correctly. Whether they are right or wrong has no bearing on whether “utility” the mathematical object with the 5 properties you listed corresponds to “utility” the thing people do.
If you want to discuss what people should do, rather than what they do do, that is a moral issue. So it leads to questions like: how does bayesian epistemology create moral knowledge and how does it evaluate moral statements?
If you want to discuss what kind of advice is helpful to people (which people?), then I’m sure you can see how talking about entire universes could easily confuse people, and how telling them that some other procedure is a special case of it may not be very good advice, as it does not address the practical problems they are having.
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
I don’t think I understand. This would rely on your conception of the real life situation (if you want it to apply to real life), and of what makes sense, being correct. That goes way beyond deduction or definitions into substantive claims.
Do you think that the Dutch book arguments go “way beyond deduction or definitions”? Well, I guess that would depend on what you conclude from them. For now, let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1 and probabilities of mutually exclusive events should add”.
About decisions, if a method like “choose by whim” gets you a good result in a particular case, you’re happy with it? You don’t care that it doesn’t make any sense if it works out this time?
The confusion here is that we’re not judging an action. If I make a mistake and happen to benefit from it, there were good consequences, but there was no choice involved. I don’t care about this; it already happened. What I do care about, and what I can accomplish, is avoiding similar mistakes in the future.
If you want to discuss what people should do, rather than what they do do, that is a moral issue.
Yes, that is what I was discussing. I probably don’t want to actually get into my arguments here. Can you give an example of what you mean by “moral knowledge”?
Applying Dutch book arguments to real life situations always goes way beyond deduction and definitions, yes.
let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1 and probabilities of mutually exclusive events should add”.
A need? Are you talking about morality now?
Why are we saying this? You now speak of probabilities of events. Previously we were discussing epistemology which is about ideas. I object to assigning probabilities to the truth of ideas. Assigning them to events is OK when
1) the laws of physics are indeterministic (never, as far as we know)
2) we have incomplete information and want to make a prediction that would be deterministic except that we have to put several possibilities in some places, which leads to several possible answers, and probability is a reasonable way to organize thoughts about that.
So what?
Can you give an example of what you mean by “moral knowledge”?
Murder is immoral.
Being closed minded makes one’s life worse because it sabotages improvement.
Can you give an example of what you mean by “moral knowledge”?
Murder is immoral.
Are you saying Popper would evaluate “Murder is immoral.” in the same way as “Atoms are made up of electrons and a nucleus.”? How would you test this? What would you consider a proof of it?
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means. I am a moral realist to some, a relativist to others, and an error theorist to other others. I could prove the statement for many common non-confused definitions, though not for, say, people who hold that ‘morality’ is synonymous with ‘that which is commanded by God’ (a definition based on confusion, though at least everyone can agree on when it is or isn’t true), and not for error theorists, as both groups’ definitions make the sentence false.
Being closed minded makes one’s life worse because it sabotages improvement.
In theory I could prove this sentence, but in practice I could not do this clearly, especially over the internet. It would probably be much easier for you to read the sequences, which get to this toward the end, but, depending on your answers to some of my questions, there may be an easier way to explain this.
Are you saying Popper would evaluate “Murder is immoral.” in the same way as “Atoms are made up of electrons and a nucleus.”?
Yes. One epistemology. All types of knowledge. Unified!
How would you test this?
You would not.
What would you consider a proof of it?
We don’t accept proofs of anything, we are fallibilists. We consider mathematical proofs to be good arguments though. I don’t really want to argue about those (unless you’re terribly interested. btw this is covered in the math chapter of The Fabric of Reality by David Deutsch). But the point is we don’t accept anything as providing certainty or even probableness. In our terminology, nothing provides justification.
What we do instead is explain our ideas, criticize mistakes, and in this way improve our ideas. This, btw, creates knowledge in the same way as evolution (replication of ideas, with variation, and selection by criticism). That’s not a metaphor or analogy but literally true.
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means.
Wouldn’t it be nice if you had an epistemology that helped you deal with all kinds of knowledge, so you didn’t have to simply give up on applying reason to important issues like what is a good life, and what are good values?
This, btw, creates knowledge in the same way as evolution (replication of ideas, with variation, and selection by criticism). That’s not a metaphor or analogy but literally true.
Well, biological evolution is a much smaller part of conceptspace than “replication, variation, selection” and now I’m realizing that you probably haven’t read A Human’s Guide to Words which is extremely important and interesting and, while you’ll know much of it, has things that are unique and original and that you’ll learn a lot from. Please read it.
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means.
Wouldn’t it be nice if you had an epistemology that helped you deal with all kinds of knowledge, so you didn’t have to simply give up on applying reason to important issues like what is a good life, and what are good values?
I do apply reason to those things, I just don’t use the words ‘morality’ in my reasoning process because too many people get confused. It is only a word after all.
On a side note, I am starting to like what I hear of Popper. It seems to embody an understanding of the brain and a bunch of useful advice for it. I think I disagree with some things, but on grounds that seem like the sort of thing that is accepted as motivation for the theory to self-modify. Does that make sense? Anyways, it’s not Popper’s fault that there are a set of theorems that in principle remove the need for other types of thought and in practice cause big changes in the way we understand and evaluate the heuristics that are necessary because the brain is fallible and computationally limited.
Wei Dai likes thinking about how to deal with questions outside of Bayesianism’s current domain of applicability, so he might be interested in this.
let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1 and probabilities of mutually exclusive events should add”.
A need? Are you talking about morality now?
Interpret this as a need in order to achieve some specified goal, so as to keep this part of the debate out of morality. A paperclip maximizer, for example, would obviously need to not pay 200 paperclips for a lottery with a maximum payout of 100 paperclips in order to achieve its goals. Furthermore, this applies to any consequentialist set of preferences.
Why are we saying this? You now speak of probabilities of events.
So you assume morality (the “specified goal”). That makes your theory amoral.
Well there’s a bit more than this, but it’s not important right now. One can work toward any goal just by assuming it as a goal.
Why is there a need to assign probabilities to theories? Popperian epistemology functions without doing that.
Because of the Dutch book arguments. The probabilities can be inferred from the choices. I’m not sure if the agent’s probability distribution can be fully determined from a finite set of wagers, but it can definitely be inferred to an arbitrary degree of precision by adding enough wagers.
Can you give an example of how you use a Dutch book argument on a non-gambling topic? For example, if I’m considering issues like whether to go swimming today, and what nickname to call my friend, and I don’t assign probabilities like “80% sure that calling her Kate is the best option”, how do I get Dutch Booked?
First you hypothetically ask what would happen if you were asked to make bets on whether calling her Kate would result in world X (with utility U(X)). Do this for all choices and all possible worlds. This gives you probabilities and utilities. You then take a weighted average, as per the VNM theorem.
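Roughly, and with made-up numbers (the world “X”, the prices, and the utilities below are all hypothetical): the bets you would hypothetically accept pin down probabilities, and the weighted average then scores each choice.

    # If the most you would pay for a ticket worth 1 util when X obtains is 0.3 utils,
    # your implied probability for X is about 0.3.
    ticket_payout = 1.0
    max_price_accepted = 0.3
    p_X = max_price_accepted / ticket_payout   # 0.3

    # Per the VNM theorem, a choice is then scored by the probability-weighted
    # average of the utilities of the worlds it could lead to.
    probabilities = [p_X, 1 - p_X]   # world X, world not-X
    utilities = [10.0, 2.0]          # hypothetical U(X), U(not-X) for this choice
    expected_utility = sum(p * u for p, u in zip(probabilities, utilities))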
You don’t get to decide utilities so much as you have to figure out what they are. You already have a utility function, and you do your best to describe it . How do you weight the things you value relative to each other?
This takes observation, because what we think we value often turns out not to be a good description of our feelings and behavior.
By criticizing them. And conjecturing improvements which meet the challenges of the criticism. It is the same method as for improving all other knowledge.
In outline it is pretty simple. You may wonder things like what would be a good moral criticism. To that I would say: there’s many books full of examples, why dismiss all that? There is no one true way of arguing. Normal arguments are ok, I do not reject them all out of hand but try to meet their challenges. Even the ones with some kind of mistake (most of them), you can often find some substantive point which can be rescued. It’s important to engage with the best versions of theories you can think of.
BTW once upon a time I was vaguely socialist. Now I’m a (classical) liberal. People do change their fundamental moral values for the better in real life. I attended a speech by a former Muslim terrorist who is now a pro-Western Christian (Walid Shoebat).
I’ve changed my social values plenty of times, because I decided different policies better served my terminal values. If you wanted to convince me to support looser gun control, for instance, I would be amenable to that because my position on gun control is simply an avenue for satisfying my core values, which might better be satisfied in a different way.
If you tried to convince me to support increased human suffering as an end goal, I would not be amenable to that, unless it turns out I have some value I regard as even more important that would be served by it.
This is what Popper called the Myth of the Framework and refuted in his essay by that name. It’s just not true that everyone is totally set in their ways and extremely closed minded, as you suggest. People with different frameworks learn from each other.
One example is that children learn. They are not born sharing their parents’ framework.
You probably think that frameworks are genetic, so they are. Dealing with that would take a lengthy discussion. Are you interested in this stuff? Would you read a book about it? Do you want to take it seriously?
I’m somewhat skeptical b/c e.g. you gave no reply to some of what I said.
I think a lot of the reason people don’t learn other frameworks, in practice, is merely that they choose not to. They think it sounds stupid (before they understand what it’s actually saying) and decide not to try.
When did I suggest that everyone is set in their ways and extremely closed minded? As I already pointed out, I’ve changed my own social values plenty of times. Our social frameworks are extremely plastic, because there are many possible ways to serve our terminal values.
I have responded to moral arguments with regards to more things than I could reasonably list here (economics, legal codes, etc.) I have done so because I was convinced that alternatives to my preexisting social framework better served my values.
Valuing strict gun control, to pick an example, is not genetically coded for. A person might have various inborn tendencies which will affect how they’re likely to feel about gun control; they might have innate predispositions towards authoritarianism or libertarianism, for instance, that will affect how they form their opinion. A person who valued freedom highly enough might support little or no gun control even if they were convinced that it would result in a greater loss of life. You would have a hard time finding anyone who valued freedom so much that they would support looser gun control if they were convinced it would destroy 90% of the world population, which gives you a bit of information about how they weight their preferences.
If you wanted to convince me to support more human suffering instead of more human happiness, you would have to appeal to something else I value even more that would be served by this. If you could argue that my preference for happiness is arbitrary, that preference for suffering is more natural, even if you could demonstrate that the moral goodness of human suffering is intrinsically inscribed on the fabric of the universe, why should I care? To make me want to make humans unhappy, you’d have to convince me there’s something else I want enough to make humans unhappy for its sake.
I also don’t feel I’m being properly understood here; I’m sorry if I’m not following up on everything, but I’m trying to focus on the things that I think meaningfully further the conversation, and I think some of your arguments are based on misapprehensions about where I’m coming from. You’ve already made it clear that you feel the same, but you can take it as assured that I’m both trying to understand you and make myself understood.
When did I suggest that everyone is set in their ways and extremely closed minded?
You suggested it about a category of ideas which you called “core values”.
If you wanted to convince me to support more human suffering instead of more human happiness, you would have to appeal to something else I value even more
You are saying that you are not open to new values which contradict your core values. Ultimately you might replace all but the one that is the most core, but never that one.
You are saying that you are not open to new values which contradict your core values. Ultimately you might replace all but the one that is the most core, but never that one.
That’s more or less correct. To quote one of Eliezer’s works of ridiculous fanfiction, “A moral system has room for only one absolute commandment; if two unbreakable rules collide, one has to give way.”
If circumstances force my various priorities into conflict, some must give way to others, and if I value one thing more than anything else, I must be willing to sacrifice anything else for it. That doesn’t necessarily make it my only terminal value; I might have major parts of my social framework which ultimately reduce to service to another value, and they’d have to bend if they ever came into conflict with a more heavily weighted value.
Well in the first half, you get Dutch booked in the usual way. It’s not necessarily actually happening, but there still must be probabilities that you would use if it were. In the second half, if you don’t follow the procedure (or an equivalent one) you violate at least one VNM axiom.
If you violate axiom 1, there are situations in which you don’t have a preferred choice—not as in “both are equally good/bad” but as in your decision process does not give an answer or gives more than one answer. I don’t think I’d call this a decision process.
If you violate axiom 2, there are outcomes L, M and N such that you’d want to switch from L to M and then from M to N, but you would not want to switch from L to N.
Axiom 3 is unimportant and is just there to simplify the math.
For axiom 4, imagine a situation where a statement with unknown truth-value, X, determines whether you get to choose between two outcomes, L and M, with L < M, or have no choice in accepting a third outcome, N. If you violate the axiom, there is a situation like this where, if you were asked for your choice before you know X (it will be ignored if X is false), you would pick L, even though L < M.
Do any of these situations describe your preferences?
And I’m still curious how the utilities are decided. By whim?
If your decision process is not equivalent to one that uses the previously described procedure, there are situations where something like one of the following will happen.
I ask you if you want chocolate or vanilla ice cream and you don’t decide. Not just you don’t care which one you get or you would prefer not to have ice cream, but you don’t output anything and see nothing wrong with that.
You prefer chocolate to vanilla ice cream, so you would willingly pay 1c to have the vanilla ice cream that you have been promised upgraded to chocolate. You also happen to prefer strawberry to chocolate, so you are willing to pay 1c to exchange a promise of a chocolate ice cream for a promise of a strawberry ice cream. Furthermore, it turns out you prefer vanilla to strawberry, so whenever you are offered a strawberry ice cream, you gladly pay a single cent to change that to an offer of vanilla, ad infinitum. (A toy run of this cycle is sketched just after this list.)
N/A
You like chocolate ice cream more than vanilla ice cream. Nobody knows if you’ll get ice cream today, but you are asked for your choice just in case, so you pick vanilla.
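A toy run of the cycle in (2), with the same flavours and one-cent trades (the number of rounds is arbitrary):

    # Intransitive preferences: chocolate > vanilla, strawberry > chocolate, vanilla > strawberry.
    # Each "upgrade" to the preferred flavour costs one cent.
    upgrade = {"vanilla": "chocolate", "chocolate": "strawberry", "strawberry": "vanilla"}

    holding = "vanilla"
    cents_paid = 0
    for _ in range(9):            # nine willing trades
        holding = upgrade[holding]
        cents_paid += 1

    print(holding, cents_paid)    # vanilla 9: back to the original flavour, nine cents poorer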
Let’s consider (2). Suppose someone was in the process of getting Dutch Booked like this. It would not go on ad infinitum. They would quickly learn better. Right? So even if this happened, I think it would not be a big deal.
Let’s say they did learn better. How would they do this—changing their utility function? Someone with a utility function like this really does prefer B+1c to A, C+1c to B, and A+1c to C. Even if they did change their utility function, the new one would either have a new hole or it would obey the results of the VNM-theorem.
So Bayes teaches: do not disobey the laws of logic and math.
Still wondering where the assigning probabilities to truths of theories is.
OK. So what? There’s more to life than that. That’s so terribly narrow. I mean, that part of what you’re saying is right as far as it goes, but it doesn’t go all that far. And when you start trying to apply it to harder cases—what happens? Do you have some Bayesian argument about who to vote for for president? Which convinced millions of people? Or should have convinced them, and really answers the questions much better than other arguments?
Still wondering where the assigning probabilities to truths of theories is.
Well the Dutch books make it so you have to pick some probabilities. Actually getting the right prior is incomplete, though Solomonoff induction is most of the way there.
OK. So what? There’s more to life than that. That’s so terribly narrow. I mean, that part of what you’re saying is right as far as it goes, but it doesn’t go all that far.
Where else are you hoping to go?
And when you start trying to apply it to harder cases—what happens? Do you have some Bayesian argument about who to vote for for president? Which convinced millions of people? Or should have convinced them, and really answers the questions much better than other arguments?
In principle, yes. There’s actually a computer program called AIXItl that does it. In practice I use approximations to it. It probably could be done to a very high degree of certainty. There are a lot of issues and a lot of relevant data.
Well the Dutch books make it so you have to pick some probabilities.
Can you give an example? Use the ice cream flavors. What probabilities do you have to pick to buy ice cream without being dutch booked?
Where else are you hoping to go?
Explanatory knowledge. Understanding the world. Philosophical knowledge. Moral knowledge. Non-scientific, non-empirical knowledge. Beyond prediction and observation.
In principle, yes.
How do you know whether your approximations are OK to make or whether they ruin things? How do you work out what kinds of approximations are and aren’t safe to make?
The way I would do that is by understanding the explanation of why something is supposed to work. In that way, I can evaluate proposed changes to see whether they mess up the main point or not.
Endo, I think you are making things more confusing by combining issues of Bayesianism with issues of utility. It might help to keep them more separate or to be clear when one is talking about one, the other, or some hybrid.
I use the term Bayesianism to include utility because (a) they are connected and (b) a philosophy of probabilities as abstract mathematical constructs with no applications doesn’t seem complete; it needs an explanation of why those specific objects are studied. How do you think that any of this caused or could cause confusion?
Well, it empirically seems to be causing confusion. See curi’s remarks about the ice cream example. Also, one doesn’t need Bayesianism to include utility and that isn’t standard (although it is true that they do go very well together).
Let’s consider (2). Suppose someone was in the process of getting Dutch Booked like this. It would not go on ad infinitum. They would quickly learn better. Right? So even if this happened, I think it would not be a big deal.
So the argument is now not that suboptimal issues don’t exist but that they aren’t a big deal? Are you aware that the primary reason that this involves small amounts of ice cream is for convenience of the example? There’s no reason these couldn’t happen with far more serious issues (such as what medicine to use).
I know. I thought it was strange that you said “ad infinitum” when it would not go on forever. And that you presented this as dire but made your example non-dire.
But OK. You say we must consider probabilities, or this will happen. Well, suppose that if I do something it will happen. I could notice that, criticize it, and thus avoid it.
How can I notice? I imagine you will say that involves probabilities. But in your ice cream example I don’t see the probabilities. It’s just preferences for different ice creams, and an explanation of how you get a loop.
And what I definitely don’t see is probabilities that various theories are true (as opposed to probabilities about events which are ok).
But OK. You say we must consider probabilities, or this will happen. Well, suppose that if I do something it will happen. I could notice that, criticize it, and thus avoid it.
Yes, but the Bayesian avoids having this step. For any step you can construct a “criticism” that will duplicate what the Bayesian will do. This is connected to a number of issues, including the fact that what constitutes valid criticism in a Popperian framework is far from clear.
But in your ice cream example I don’t see the probabilities. It’s just preferences for different ice creams, and an explanation of how you get a loop.
Ice cream is an analogy, and maybe not a great one, since it is connected to preferences (which sometimes get confused with Bayesianism). It might make more sense to just go read Cox’s theorem and translate to yourself what the assumptions mean about an approach.
what constitutes valid criticism in a Popperian framework is far from clear.
Anything which is not itself criticized.
Ice cream is an analogy.
Could you pick any real world example you like, where the probabilities needed to avoid dutch book aren’t obvious, and point them out? To help concretize the idea for me.
Could you pick any real world example you like, where the probabilities needed to avoid dutch book aren’t obvious, and point them out
Well, I’m not sure, in that I’m not convinced that Dutch Booking really does occur much in real life other than in the obvious contexts. But there are a lot of contexts it does occur in. For example, a fair number of complicated stock maneuvers can be thought of essentially as attempts to dutch book other players in the stock market.
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
Consequentialism is not in the index.
Decision rule is, a little bit.
I don’t think this book contains a proof mentioning consequentialism. Do you disagree? Give a page or section?
It looks like what they are doing is defining a decision rule in a special way. So, by definition, it has to be a mathematical thing to do with probability. Then after that, I’m sure it’s rather easy to prove that you should use bayes’ theorem rather than some other math.
But none of that is about decision rules in the sense of methods human beings use for making decisions. It’s just if you define them in a particular way—so that Bayes’ is basically the only option—then you can prove it.
see e.g. page 19 where they give a definition. A Popperian approach to making decisions simply wouldn’t fit within the scope of their definition, so the conclusion of any proof like you claimed existed (which i haven’t found in this book) would not apply to Popperian ideas.
Maybe there is a lesson here about believing stuff is proven when you haven’t seen the proof, listening to hearsay about what books contain, and trying to apply proofs you aren’t familiar with (they often have limits on scope).
It says a decision rule (their term) is a function of the sample space, mapping something like complete sets of possible data to things people do. (I think it needs to be complete sets of all your data to be applied to real world human decision making. They don’t explain what they are talking about in the type of way I think is good and clear. I think that’s due to having in mind different problems they are trying to solve than I have. We have different goals without even very much overlap. They both involve “decisions” but we mean different things by the word.)
In real life, people use many different decision rules (my term, not theirs). And people deal with clashes between them.
You may claim that my multiple decision rules can be combined into one mathematical function. That is so. But the result isn’t a smooth function so when they start talking about estimation they have big problems! And this is the kind of thing I would expect to get acknowledgement and discussion if they were trying to talk about how humans make decisions, in practice, rather than just trying to define some terms (chosen to sound like they have something to do with what humans do) and then proceed with math.
E.g., they try to talk about estimating the amount of error. If you know error bars on your data, and you have a smooth function, you’re maybe kind of OK with imperfect data. But if your function has a great many jumps in it, what are you to do? What if, within the margin for error on something, there are several discontinuities? I think they are conceiving of the decision rule function as being smooth and not thinking about what happens when it’s very messy. Maybe they specified some assumptions so that it has to be smooth, which I missed, but anyway human beings have tons of contradictory and not-yet-integrated ideas in their heads—mistakes and separate topics they haven’t connected yet, and more—and so it’s not smooth.
On a similar note they talk about the median and mean which also don’t mean much when it’s not smooth. Who cares what the mean is over an infinitely large sample space where you get all sorts of unrepresentative results in large unrealistic portions of it? So again I think they are looking at the issues differently than me. They expect things like mathematically friendly distributions (for which means and medians are useful); I don’t.
Moving on to a different issue, they conceive of a decision rule which takes input and then gives output. I do not conceive of people starting with the input and then deciding the output. I think decision making is more complicated. While thinking about the input, people create more input—their thoughts. The input is constantly being changed during the decision process, it’s not a fixed quantity to have a function of. Also being changed during any significant decision is the decision rule itself—it too isn’t a static function even for purposes of doing one decision (at least in the normal sense. maybe they would want to call every step in the process a decision. so when you’re deciding a flavor of ice cream that might involve 50 decisions, with updates to the decisions rules and inputs in between them. if they want to do something like that they do not explain how it works.)
They conceive of the input to decisions as “data”. But I conceive of much thinking as not using much empirical data, if any. I would pick a term that emphasizes it. The input to all decision making is really ideas, some of which are about empirical data and some of which aren’t. Data is a special case, not the right term for the general case. From this I take that they are empiricists. You can find a refutation of empiricism in The Beginning of Infinity by David Deutsch but anyway it’s a difference between us.
A Popperian approach to decision making would focus more on philosophical problems, and their solutions. It would say things like: consider what problem you’re trying to solve, and consider what actions may solve it. And criticize your ideas to eliminate errors. And … well no short summary does it justice. I’ve tried a few times here. But Popperian ways of thinking are not intuitive to people with the justificationist biases dominant in our culture. Maybe if you like everything I said I’ll try to explain more, but in that case I don’t know why you wouldn’t read some books which are more polished than what I would type in. If you have a specific, narrow question I can see that answering that would make sense.
Thank you for that detailed reply. I just have a few comments:
“data” could be any observable property of the world
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
there’s no requirement that the decision function be smooth—it’s just useful to look at such functions first for pedagogical reasons. All of the math continues to work in the presence of discontinuities.
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
“data” could be any observable property of the world
Yes but using it to refer to a person’s ideas, without clarification, would be bizarre and many readers wouldn’t catch on.
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
Straight to the final, perfect truth? lol… That’s extremely unPopperian. We don’t expect progress to just end like that. We don’t expect you get so far and then there’s nothing further. We don’t think the scope for reason is so bounded, nor do we think fallibility is so easily defeated.
In practice searches for optimal things of this kind always involve many premises which have substantial philosophical meaning. (Which is often, IMO, wrong.)
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
Does it use an infinite set of all possible actions? I would have thought it wouldn’t rely on knowing what each action actually is, but would just broadly specify the set of all actions and move on.
@smooth: what good is a mean or median with no smoothness? And for margins of error, with a non-smooth function, what do you do?
With a smooth region of a function, taking the midpoint of the margin of error region is reasonable enough. But when there is a discontinuity, there’s no way to average it and get a good result. Mixing different ideas is a hard process if you want anything useful to result. If you just do it in a simple way like averaging you end up with a result that none of the ideas think will work and shouldn’t be surprised when it doesn’t. It’s kind of like how if you have half an army do one general’s plan, and half do another, the result is worse than doing either one.
Can you give an example using a moral argument, or anything that would help illustrate how you take things that don’t look like they are Bayes’ law cases and apply it anyway?
The linked page says imperfectly efficient minds give off heat and that this is probabilistic (which is weird b/c the laws of physics govern it and they are not probabilistic but deterministic). Even if I accept this, I don’t quite see the relevance. Are you reductionists? I don’t think that the underlying physical processes tell us everything interesting about the epistemology.
Confirmation cannot be any evidence for universal theories. None, probabilistic or otherwise. Popper explained this and did the math. If you disagree people provide the math that governs it and explain how it works.
I know math. The problem is that you haven’t provided anything that works, or any criticism of Popper. Basically all your contributions to the discussion are appeals to authority. You don’t argue, you just say “This source is right; read it and concede”. And most of your sources are wikipedia quality… If you won’t say anything I can’t find on google, why talk at all?
There are plenty of explanations of Solomonoff induction out there. You asked for how the math of confirmation works—and that’s the math of universal inductive inference. If you just want an instance of confirmation, see Bayes’s theorem.
It is not an “appeal to authority” to direct you to the maths that answers your query!
That brings us to the second part of the Yudkowsky quote that you criticised:
Above you agree that Popper did argue this. Also, it is a fact—Popper did argue for this difference:
Yet finding evidence against a theory is actually a probabilistic process—just like confirmation is. So, Yudkowsky is correct in saying that Popper was wrong about this. Popper really did believe and promote this kind of material.
Popper did not argue that that confirmation and falsification have fundamentally different rules. They both obey the rules of logic.
Confirmation cannot be any evidence for universal theories. None, probabilistic or otherwise. Popper explained this and did the math. If you disagree people provide the math that governs it and explain how it works.
As to the rest you’re asking how Popper deals with fallible evidence. If you would read his books you could find the answer. He does have an answer, not none, and it isn’t probabilistic.
Let me ask you: how do you deal with the regress I asked Manfred about?
Have you read http://lesswrong.com/lw/ih/absence_of_evidence_is_evidence_of_absence/ ?
It says “When we see evidence, hypotheses that assigned a higher likelihood to that evidence, gain probability at the expense of hypotheses that assigned a lower likelihood to the evidence.”
This does not work. There are infinitely many possible hypotheses which assign a 100% probability to any given piece of evidence. So we can’t get anywhere like this. The probability of each remains infinitesimal.
Actually, its possible to have infinitely many hypotheses, each assigned non-infinitesimal evidence. For example, I could assign probability 50% to the simplest hypothesis, 25% to the second simplest, 12.5% to the third simplest and so on down (I wouldn’t necessarily reccomend that exact assignment, its just the easiest example).
In general, all we need is a criterion of simplicity, such that there are only finitely many hypotheses simpler than any given hypothesis (Kolmogorov Complexity and Minimum Message Length both have this property) and an Occam’s razor style rule saying that simpler hypotheses get higher prior probabilities than more complex hypotheses. Solomonoff induction is a way of doing this.
It seems like people are presenting a moving target. First I was directed to one essay. In response to my criticism of a statement from that essay, you suggest that a different technique other than the one I quoted could work. Do you think I was right that the section of the essay I quoted doesn’t solve the problem?
I am aware that you can assign probabilities to infinite sets in the way you mention. This is beside the point. If you get the probabilities above infinitesimal by doing that it’s simply a different method than the one I was commenting on. The one I commented on, in which “hypotheses that assigned a higher likelihood to that evidence, gain probability” does not get them above infinitesimal or do anything very useful.
Some very general remarks:
You’re missing the point, which is that we need to act—we need to use the information we have as best we can in order to achieve ‘the greatest good’. (The question of what ‘the greatest good’ means is non-trivial but it’s orthogonal to present concerns.)
The agent chooses an action, and then depending on the state of world, the effects of the action are ‘good’ or ‘bad’. Here, the expression “the state of the world” incorporates both contingent facts about ‘how things are’ and the ‘natural laws’ describing how present causes have future effects.
Now, one very broad strategy for answering the question “what should we do?” is to try to break it down as follows:
We assign ‘weights’ p[i] to a wide variety of different ‘states of the world’, to represent the incomplete (but real) information we have thus far acquired.
For each such state, we calculate the effects that each of our actions a[j] would have, and assign ‘weights’ u[i,j] to the outcomes to represent how desirable we think they are.
We choose the action a[j] such that Sum(over i) p[i] * u[i,j] is maximized.
As a matter of terminology, we refer to the weights in step 1 as “probabilities” and those in step 2 as “utilities”.
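A minimal sketch of steps 1–3 (all the numbers below are made up for illustration):

    # Step 1: weights p[i] over hypothetical states of the world.
    p = [0.2, 0.5, 0.3]

    # Step 2: weights u[i][j] for how desirable action j's outcome is in state i.
    u = [[ 1.0, 0.0],
         [ 0.5, 2.0],
         [-1.0, 3.0]]

    # Step 3: choose the action j that maximises Sum(over i) p[i] * u[i][j].
    def score(j):
        return sum(p[i] * u[i][j] for i in range(len(p)))

    best_action = max(range(len(u[0])), key=score)   # action 1, for these numbers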
Here’s an important question: “To what extent is the above procedure inevitable if we are to make rational decisions?”
The standard Lesswrong ideology here is that the above procedure (supplemented by Bayes’ theorem for updating ‘probability weights’) is absolutely central to ‘rationality’ - that any rational decision-maker must be following it, whether explicitly or implicitly.
It’s important to understand that Lesswrong’s discussions of rationality take place in the context of ‘thinking about how to design an artificial intelligence’. One of the great virtues of the Bayesian approach is that it’s clear what it would mean to implement it, and we can actually put it into practice on a wide variety of problems.
Anyway, if you want to challenge Bayesianism then you need to show how it makes irrational choices. It’s not sufficient to present a philosophical view under which assigning probabilities to theories is itself irrational, because that’s just a means to an end. What matters is whether an agent makes clever or stupid decisions, not how it gets there.
And now something more specific:
No-one but you ever assumed that the hypotheses would begin at infinitesimal probability. The idea that we need to “assign probabilities to infinite sets in the way [benelliot] mention[ed]” is so obvious and commonplace that you should assume it even if it’s not actually spelled out.
In your theory, do the probabilities of the infinitely many theories add up to 1?
Does increasing their probabilities ever change the ordering of theories which assigned the same probability to some evidence/event?
If all finite sets of evidence leave infinitely many theories unchanged in ordering, then would we basically be acting on the a priori conclusions built into our way of assigning the initial probabilities?
If we were, would that be rational, in your view?
And do you have anything to say about the regress problem?
The ‘moving target’ effect is caused by the fact that you are talking to several different people, the grandparent is my first comment in this discussion.
The concept mentioned in that essay is Bayes’ Theorem, which tells us how to update our probabilities on new evidence. It does not solve the problem of how to avoid infinitely many hypotheses, for the same reason that Newton’s laws do not explain the price of gold in London: it’s not supposed to. Bayes’ theorem tells us how to change our probabilities with new evidence, and in the process assumes that those probabilities are real numbers (as opposed to infinitesimals).
Solomonoff induction tells us how to assign the initial probabilities, which are then fed into Bayes theorem to determine the current probabilities after adjusting based on the evidence. Both are essential, criticising BT for not doing SI’s job is like saying a car’s wheels are useless because they can’t do the engine’s job of providing power.
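A toy version of that division of labour, with made-up hypothesis names, complexities, and likelihoods (purely illustrative, not a real Solomonoff computation):

    # Three hypothetical hypotheses: complexity in bits, and P(evidence | hypothesis).
    hypotheses = {
        "H1": {"complexity": 2, "likelihood": 0.9},
        "H2": {"complexity": 5, "likelihood": 0.9},
        "H3": {"complexity": 3, "likelihood": 0.1},
    }

    # Solomonoff-style prior: weight 2**-complexity, normalised to sum to 1.
    prior_norm = sum(2 ** -h["complexity"] for h in hypotheses.values())
    for h in hypotheses.values():
        h["prior"] = (2 ** -h["complexity"]) / prior_norm

    # Bayes' theorem: posterior proportional to prior * likelihood.
    post_norm = sum(h["prior"] * h["likelihood"] for h in hypotheses.values())
    for name, h in hypotheses.items():
        h["posterior"] = h["prior"] * h["likelihood"] / post_norm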
Does any of this deal with the infinite regress problem?
I’m sorry, what is the infinite regress problem?
I don’t see any infinite regress at all, Solomonoff Induction tells us the prior, Bayesian Updating turns the prior into a posterior. They depend on each other to work properly but I don’t think they depend on anything else (unless you wish to doubt the basics of probability theory).
The regress was discussed in other comments here. I took you to be saying “everything together, works” and wanting to discuss the philosophy as a whole.
I thought that would be more productive than arguing with you about whether Bayes theorem really “assumes that those probabilities are real numbers” and various other details. That’s certainly not what other people here told me when I brought up infinitesimals. I also thought it would be more productive than going back to the text I quoted and explaining why that quote doesn’t make sense. Whether it is correct or not isn’t very important if a better idea, along the same lines, works.
The regress argument begins like this: What is the justification or probability for Solomonoff Induction and Bayesian updating? Or if they are not justified, and do not have a probability, then why should we accept them in the first place?
When you say they don’t depend on anything else, maybe you are answering the regress issue by saying they are unquestionable foundations. Is that it?
Well, to some extent every system must have unquestionable foundations; even maths must assume its axioms. The principle of induction (the more something has happened in the past, the more likely it is to happen in the future, all else being equal) cannot be justified without the justification being circular, but I doubt you could get through a single day without it. Ultimately every approach must fall back on an infinite regress, as you put it; this doesn’t prevent that system from working.
However, both Bayes’ Theorem and Solomonoff Induction can be justified:
Bayes’ Theorem is an elementary deductive consequence of basic probability theory, particularly the fairly obvious fact (at least it seems that way to me) that P(A&B) = P(A)*P(B|A). If it doesn’t seem obvious to you, then I know of at least two approaches for proving it. One is the Cox theorems, which begin by saying we want to rank statements by their plausibility, and we want certain things to be true of this ranking (it must obey the laws of logic, it must treat hypotheses consistently, etc.), and from these derive probability theory.
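Spelled out, the derivation really is that short: write the product rule both ways round and divide.

    P(A & B) = P(A) * P(B|A) = P(B) * P(A|B)
    =>  P(A|B) = P(A) * P(B|A) / P(B)        (for P(B) > 0)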
Another approach is the Dutch Book arguments, which show that if you are making bets based on your probability estimates of certain things being true, then unless your probability estimates obey Bayes Theorem you can be tricked into a set of bets which result in a guaranteed loss.
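A minimal sketch of the guaranteed-loss construction (the outcomes and prices are invented; any incoherent prices would do): if someone’s prices for bets on two mutually exclusive, exhaustive outcomes sum to more than 1, a bookie can sell them both tickets and profit no matter what.

    # Incoherent prices: the probabilities implied by these ticket prices sum to 1.3.
    price = {"rain": 0.7, "no_rain": 0.6}   # price of a ticket paying $1 if that outcome occurs

    total_paid = sum(price.values())        # 1.3, paid up front for one ticket on each outcome
    payout = 1.0                            # exactly one of the two outcomes occurs
    guaranteed_loss = total_paid - payout   # 0.3, whichever way the weather goes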
To justify Solomonoff Induction, we imagine a theoretical predictor which bases its prior on Solomonoff Induction and updates by Bayes Theorem. Given any other predictor, we can compare our predictor to this opponent by comparing the probability estimates they assign to the actual outcome, then Solomonoff induction will at worst lose by a constant factor based on the complexity of the opponent.
This is the best that can be demanded of any prior; it is impossible to give perfect predictions in every possible universe, since you can always be beaten by a predictor tailor-made for that universe (which will generally perform very badly in most others).
(note: I am not an expert, it is possible that I have some details wrong, please correct me if I do)
“Well, to some extent every system must have unquestionable foundations”
No, Popper’s epistemology does not have unquestionable foundations.
You doubt I could get by without induction, but I can and do. Popper’s epistemology has no induction. It also has no regress.
Arguing that there is no choice but these imperfect concepts only works if there really is no choice. But there are alternatives.
I think that things like unquestionable foundations, or an infinite regress, are flaws. I think we should reject flawed things when we have better options. And I think Bayesian Epistemology has these flaws. Am I going wrong somewhere?
“However, both Bayes’ Theorem and Solomonoff Induction can be justified”
Justified by statements which are themselves justified (which leads to regress issues)? Or you mean justified given some unquestionable foundations? In your statements below, I don’t think you specify precisely what you deem to be able to justify things.
“Bayes’ Theorem is an elementary deductive consequence of basic probability theory”
Yes. It is not controversial itself. What I’m not convinced of is the claim that this basic bit of math solves any major epistemological problem.
Regarding Solomonoff induction, I think you are now attempting to justify it by argument. But you haven’t stated what the rules are for what counts as a good argument, and why. Could you specify that? There’s not enough rigor here. And in my understanding Bayesian epistemology aims for rigor and that is one of the reasons they like math and try to use math in their epistemology. It seems to me you are departing from that worldview and its methods.
Another aspect of the situation is you have focussed on prediction. That is instrumentalist. Epistemologies should be able to deal with all categories of knowledge, not just predictive knowledge. For example they should be able to deal with creating non-empirical, non-predictive moral knowledge. Can Solomonoff induction do that? How?
Hang on, Popper’s philosophy doesn’t depend on any foundations? I’m going to call shenanigans on this. Earlier you gave an example of Popperian inference:
Unquestioned assumptions include, but are not limited to the following:
The objects under discussion actually exist (Solomonoff Induction does not make this assumption)
“There is no evidence which could prove T” is stated without any proof. What if you got all the swans in one place? What if you found a reason why the existence of a black swan was impossible?
Any observation of a black swan must be correct (Bayes Theorem is explicitly designed to avoid this assumption)
You can generalise from this one example to a point about all theories
“Science is only interested in universal theories”. Really? Are palaeontology and astronomy not sciences? They are both often concerned with specifics.
You must always begin with assumptions, if nothing else you must assume maths (which is pretty much the only thing that Bayes Theorem and Solomonoff Induction do assume).
To be perfectly honest I care more about getting results in the real world than having some mythical perfect philosophy which can be justified to a rock.
Stating that you believe Bayes’ theorem but doubt that it can actually solve epistemic problems is like saying you believe Pythagoras’ theorem but doubt it can actually tell you the side lengths of right-angled triangles; it demonstrates a failure to internalise.
Bayes’ theorem tells you how to adjust beliefs based on evidence; every time you adjust your beliefs you must use it, otherwise your map will not reflect the territory.
Does Popper not argue for his own philosophy, or does he just state it and hope people will believe him?
You cannot set up rules for arguments which are not themselves backed up by argument. Any argument will be convincing to some possible minds and not to others, and I’m okay with that, because I only have one mind.
Allow me to direct you to my all time favourite philosopher
That “Popperian inference” is simply logic.
Deductive arguments have premises, as you say.
Popper’s philosophy itself is not a deductive argument which depends on the truth of its premises and which, given their truth, is logically indisputable.
We’re well aware of issues like the fallibility of evidence (you may think you see a black swan, but didn’t). Those do not contradict this logical point about a particular asymmetry.
“You must always begin with assumptions”
No you don’t have to. Popper’s approach begins with conjectures. None of them are assumed, they are simply conjectured.
Here’s an example. You claim this is an assumption:
“You can generalise from this one example to a point about all theories”
In a Popperian approach, that is not assumed. It is conjectured. It is then open to critical debate. Do you see something wrong with it? Do you have an argument against it? Conjectures can be refuted by criticism.
BTW Popper wasn’t “generalizing”. He was making a point about all theories (in particular categories) in the first place and then illustrating it second. “Generalizing” is a vague and problematic concept.
“Does Popper not argue for his own philosophy, or does he just state it and hope people will believe him?
You cannot set up rules for arguments which are not themselves backed up by argument. ”
He argues, but without setting up predefined, static rules for argument. The rules for argument are conjectured, criticized, modified. They are a work in progress.
Regarding the Hume quote, are you saying you’re a positivist or similar?
“Bayes’ theorem tells you how to adjust beliefs based on evidence, every time you adjust your beliefs you must use it, otherwise your map will not reflect the territory.”
Only probabilistic beliefs. I think it is only appropriate to use when you have actual numbers instead of simply having to assign them to everything involved by estimating.
“To be perfectly honest I care more about getting results in the real world than having some mythical perfect philosophy which can be justified to a rock.”
Mistakes have real world consequences. I think Popper’s epistemology works better in the real world. Everyone thinks their epistemology is more practical. How can we decide? By looking at whether they make sense, whether they are refuted by criticism, etc… If you have a practical criticism of Popperian epistemology you’re welcome to state it.
I agree with that.
How does this translate into illustrating whether either epistemology has “real world consequences”? Criticism and “sense making” are widespread, varied, and not always valuable.
I think what would be most helpful is if you set up a hypothetical example and then proceeded to show how Popperian epistemology would lead to a success while a Bayesian approach would lead to a “real world consequence.” I think your question, “How can we decide?” was perfect, but I think your answer was incorrect.
Example: we want to know if liberalism or socialism is correct.
Popperian approach: consider what problem the ideas in question are intended to solve and whether they solve it. They should explain how they solve the problem; if they don’t, reject them. Criticize them. If a flaw is discovered, reject them. Conjecture new theories also to solve the problem. Criticize those too. Theories similar to rejected theories may be conjectured; and it’s important to do that if you think you see a way to not have the same flaw as before. Some more specific statements follow:
Liberalism offers us explanations such as: voluntary trade is mutually beneficial to everyone involved, and harms no one, so it should not be restricted. And: freedom is compatible with a society that makes progress because as people have new ideas they can try them out without the law having to be changed first. And: tolerance of people with different ideas is important because everyone with an improvement on existing customs will at first have a different idea which is unpopular.
Socialism offers explanations like, “People should get what they need, and give what they are able to” and “Central planning is more efficient than the chaos of free trade.”
Socialism’s explanations have been refuted by criticisms like Mises’s 1920 paper, which explained that central planners have no rational way to plan (in short: because you need prices to do economic calculation). And “need” has been criticized, e.g. how do you determine what is a need? And the concept of what people are “able to give” is also problematic. Of course the full debate on this is very long.
Many criticisms of liberalism have been offered. Some were correct. Older theories of liberalism were rejected and new versions formulated. If we consider the best modern version, then there are currently no outstanding criticisms of it. It is not refuted, and it has no rivals with the same status. So we should (until this situation changes) accept and use liberalism.
New socialist ideas were also created many times in response to criticism. However, no one has been able to come up with coherent ideas which address all the criticisms and still reach the same conclusions (or anything even close).
Liberalism’s “justification” is merely this: it is the only theory we do not currently have a criticism of. A criticism is an explanation of what we think is a flaw or mistake. It’s a better idea to use a theory we don’t see anything wrong with than one we do. Or in other words: we should act on our best (fallible) knowledge that we have so far. In this way, the Popperian approach doesn’t really justify anything in the normal sense, and does without foundations.
Bayesian approach: Assign them probabilities (how?), try to find relevant evidence to update the probabilities (this depends on more assumptions), ignore that whenever you increase the probability of liberalism (say) you should also be increasing the probability of infinitely many other theories which made the same empirical assertions. Halt when—I don’t know. Make sure the evidence you update with doesn’t have any bias by—I don’t know, it sure can’t be a random sample of all possible evidence.
No doubt my Bayesian approach was unfair. Please correct it and add more specific details (e.g. what prior probability does liberalism have, what is some evidence to let us update that, what is the new probability, etc...)
PS is it just me or is it difficult to navigate long discussions and to find new nested posts? And I wasn’t able to find a way to get email notification of replies.
I’m beginning to see where the problem in this debate is coming from.
Bayesian humans don’t always assign actual probabilities, I almost never do. What we do in practice is vaguely similar to your Popperian approach.
The main difference is that we do thought experiments about Ideal Bayesians, strange beings with the power of logical omniscience (which gets them round the problem of Solomonoff Induction being uncomputable), and we see which types of reasoning might be convincing to them, and use this as a standard for which types are legitimate.
Even this might in practice be questioned: if someone showed me a thought experiment in which an ideal Bayesian systematically arrived at worse beliefs than some competitor I might stop being a Bayesian. I can’t tell you what I would use as a standard in this case, since if I could predict that theory X would turn out to be better than Bayesianism I would already be an X theorist.
Popperian reasoning, on the other hand, appears to use human intuition as its standard. The conjectures he makes ultimately come from his own head, and inevitably they will be things that seem intuitively plausible to him. It is also his own head which does the job of evaluating which criticisms are plausible. He may bootstrap himself up into something that looks more rigorous, but ultimately if his intuitions are wrong he’s unlikely to recover from it. The intuitions may not be unquestioned but they might as well be for all the chance he has of getting away from their flaws.
Cognitive science tells us that our intuitions are often wrong. In extreme cases they contradict logic itself, in ways that we rarely notice. Thus they need to be improved upon, but to improve upon them we need a standard to judge them by, something where we can say “I know this heuristic is a cognitive bias because it tells us Y when the correct answer is in fact X”. A good example of this is conjunction bias, conjunctions are often more plausible than disjunctions to human intuition, but they are always less likely to be correct, and we know this through probability theory.
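The bit of probability theory doing the work there is just that, for any statements A and B, P(A&B) = P(A)*P(B|A) <= P(A) <= P(A or B): a conjunction can never be more probable than either of its conjuncts, let alone the corresponding disjunction.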
So here’s how a human Bayesian might look, this approach only reflects the level of Bayesian strength I currently have, and can definitely be improved upon.
We wouldn’t think in terms of Liberalism and Socialism, both of them are package deals containing many different epistemic beliefs and prescriptive advice. Conjunction bias might fool you into thinking that one of them is probably right, but in fact both are astonishingly unlikely.
We hold off on proposing solutions (scientifically proven to lead to better solutions) and instead just discuss the problem. We clearly frame exactly what our values are in this situation, possibly in the form of a precisely delineated utility function and possibly not, so we know what we are trying to achieve.
We attempt to get our facts straight. Each fact is individually analysed, to see whether we have enough evidence to overcome its complexity. This process continues permanently, every statement is evaluated.
We then suggest policies which seem likely to satisfy our values, and calculate which one is likely to do so best.
I’m not sure there’s actually a difference between the two approaches; ultimately I only arrived at Bayesianism through my intuitions as well, so there is no difference at the foundations. Bayesianism is just Popperianism done better.
PS there is a little picture of an envelope next to your name and karma score in the right hand corner. It turns red when one of your comments has a reply. Click on it to see the most recent replies to your comments.
No. It does not have a fixed standard. Fixed standards are part of the justificationist attitude which is a mistake which leads to problems such as regress. Justification isn’t possible and the idea of seeking it must be dropped.
Instead, the standard should use our current knowledge (the starting point isn’t very important) and then change as people find mistakes in it (no matter what standard we use for now, we should expect it to have many mistakes to improve later).
Popperian epistemology has no standard for conjectures. The flexible, tentative standard is for criticism, not conjecture.
The “work”—the sorting of good ideas from bad—is all done by criticism and not by rules for how to create ideas in the first place.
You imply that people are parochial and biased and thus stuck. First, note the problems you bring up here are for all epistemologies to deal with. Having a standard you tell everyone to follow does not solve them. Second, people can explain their methods of criticism and theory evaluation to other people and get feedback. We aren’t alone in this. Third, some ways (e.g. having less bias) as a matter of fact work better than others, so people can get feedback from reality when they are doing it right, plus it makes their life better (incentive). More could be said. Tell me if you think it needs more (why?).
“I know this heuristic is a cognitive bias because it tells us Y when the correct answer is in fact X”
I think by “know” here you are referring to the justified, true belief theory of knowledge. And you are expecting that the authority or certainty of objective knowledge will defeat bias. This is a mistake. Like it or not, we cannot ever have knowledge of that type (e.g. b/c justification attempts lead to regress). What we can have is fallible, conjectural knowledge. This isn’t bad; it works fine; it doesn’t devolve into everyone believing their bias.
Liberalism is not a package by accident. It is a collection of ideas around one theme. They are all related and fit together. They are less good in isolation—e.g. if you take away one idea you’ll find that now one of the other ideas has an unsolved and unaddressed problem. It is sometimes interesting to consider the ideas individually but to a significant extent they all are correct or incorrect as a group.
The way I’m seeing it is that most of the time you (and everyone else) do something roughly similar to what Popper said to. This isn’t a surprise b/c most people do learn stuff, and that is the only method possible of creating any knowledge. But when you start using Bayesian philosophy more directly, by e.g. explicitly assigning and updating probabilities to try to settle non-probabilistic issues (like moral issues), then you start making mistakes. You say you don’t do that very often. OK. But there are other, more subtle ones. One is what Popper called The Myth of the Framework, where you suggest that people with different frameworks (initial biases) will both be stuck on thinking that what seems right to them (now) is correct and won’t change. And you suggest the way past this is, basically, authoritative declarations where you put someone’s biases against Truth so he has no choice but to recant. This is a mistake!
PS wow that inbox page is helpful… :-)
To some extent our thought processes can certainly improve, however there is no guarantee of this, let me give an example to illustrate:
Alice is an inductive thinker; in general she believes that if something has happened often in the past it is more likely to happen in the future. She does not treat this as an absolute, it is only probabilistic, and it does not work in certain specific situations (such as pulling beads out of a jar with 5 red and 5 blue beads), but she used induction to discover which situations those were. She is not particularly worried that induction might be wrong; after all, it has almost always worked in the past.
Bob is an anti-inductive thinker; he believes that the more often something happens, the less likely it is to happen in the future. To him, the universe is like a giant bag of beads, and the more something happens the more depleted the universe’s supply of it becomes. He also concedes that anti-induction is merely probabilistic, and there are certain situations (the bag of beads example) where it has already worked a few times so he doesn’t think it’s very likely to work now. He isn’t particularly worried that he might be wrong; anti-induction has almost never worked for him before, so he must be set up for the winning streak of a lifetime.
Ultimately, neither will ever be convinced of the other’s viewpoint. If Alice conjectures anti-induction then she will immediately have a knock-down criticism, and vice versa for Bob and Induction. One of them has an irreversibly flawed starting point.
Like it or not, you, me, Popper and every other human is an Alice. If you don’t believe me, just ask which of the following criticisms seems more logically appealing to you:
“Socialism has never worked in the past, every socialist state has turned into a nightmarish tyranny, so this country shouldn’t become socialist”
“Liberalism has usually worked in the past, most liberal democracies are wealthy and have the highest standards of living in human history, so this country shouldn’t become liberal”
This might be correct, but there is a heavy burden of proof to show it. Liberalism and Socialism are two philosophies out of thousands (maybe millions) of possibilities. This means that you need huge amounts of evidence to distinguish the two of them from the crowd and comparatively little evidence to distinguish one from the other.
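A rough way to put numbers on that (the counts are illustrative, not a claim about these particular philosophies): in odds form, posterior odds = prior odds * likelihood ratio. If a million candidates start on a roughly equal footing, the prior odds on any particular one are about 1 to 999,999, so you need combined likelihood ratios on the order of a million (roughly 20 bits of evidence) just to bring it up to even odds against the rest, whereas separating two candidates that start at even odds can take only a modest ratio.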
That is a recipe for disaster. There are too many possible conjectures, we cannot consider them all, we need some way to prioritise some over others. If you do not specify a way then people will just do so according to personal preference.
As I see it, Popperian reasoning is pretty much the way humans reason naturally, and you only have to look at any modern political debate to see why that’s a problem.
Yes, there is no guarantee. One doesn’t need a guarantee for something to happen. And one can’t have guarantees about anything, ever. So the request for guarantees is itself a mistake.
The sketches you give of Bob and Alice are not like real people. They are simplified and superficial, and people like that could not function in day to day life. The situation with normal people is different. No everyday person has an irreversibly flawed starting point.
The argument for this is not short and simple, but I can give it. First I’d like to get clear what it means, and why we would be discussing it. Would you agree that if my statement here is correct then Popper is substantially right about epistemology? Would you concede? If not, what would you make of it?
That is a misconception. One of its prominent advocates was Hume. We do not dispute things like this out of ignorance, out of never hearing it before. One of the many problems with it is that people can’t be like Alice because there is no method of induction—it is a myth that one could possibly do induction because induction doesn’t describe a procedure a person could do. Induction offers no set of instructions to follow.
That may sound strange to you. You may think it offers a procedure like:
1) gather data
2) generalize/extrapolate (induce) a conclusion from the data
3) the conclusion is probably right, with some exceptions
The problem is step 2, which does not say how to extrapolate a conclusion from a set of data. There are infinitely many conclusions consistent with any finite data set. So the entire procedure rests on having a method of choosing between them. All proposals made for this either don’t work or are vague. The one I would guess you favor is Occam’s Razor—pick the simplest one. This is both vague (what are the precise guidelines for deciding what is simpler?) and wrong (under many interpretations; for example, it might reject all explanatory theories b/c omitting the explanation is simpler).
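To make the “infinitely many conclusions” point concrete, here is a toy example (the data points are made up): an infinite family of polynomials agrees exactly on the observed data and still disagrees about the next observation.

```python
# Toy illustration: infinitely many hypotheses fit the same finite data set
# yet make different predictions about unseen cases.

data = [(0, 0), (1, 1), (2, 2)]   # made-up observations

def hypothesis(x, k):
    # For every integer k this agrees with y = x on the observed x-values,
    # because the extra term vanishes at x = 0, 1, 2.
    return x + k * x * (x - 1) * (x - 2)

for k in range(5):
    assert all(hypothesis(x, k) == y for x, y in data)   # all fit the data

print([hypothesis(3, k) for k in range(5)])   # [3, 9, 15, 21, 27]: different futures
```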
Another issue is how one thinks about things he has no past experience about. Induction does not answer that. Yet people do it.
I think they are both terrible arguments and they aren’t how I think about the issue.
The “burden of proof” concept is a justificationist mistake. Ideas cannot be proven (which violates fallibility) and they can’t be positively shown to be true. You are judging Popperian ideas by standards which Popper rejected which is a mistake.
But it works in practice. The reason it doesn’t turn into a disaster is people want to find the truth. They aren’t stopped from making a mess of things by authoritative rules but by their own choices because they have some understanding of what will and won’t work.
The authority based approach is a mistake in many ways. For example, authorities can themselves be mistaken and could impose disasters on people. And people don’t always listen to authority. We don’t need to try to force people to follow some authoritative theory to make them think properly, they need to understand the issues and do it voluntarily.
Personal preferences aren’t evil, and imposing what you deem the best preference as a replacement is an anti-liberal mistake.
No. Since Aristotle, justificationism has dominated philosophy and governs the unconscious assumptions people make in debates. They do not think like Popperians or understand Popper’s philosophy (except to the extent that some of their mental processes are capable of creating knowledge, and those have to be in line with the truth of the matter about what does create knowledge).
Since I’m not familiar with the whole of Popper’s position I’m not going to accept it blindly. I’m also not even certain that he’s incompatible with Bayesianism.
Anyway, the fact that no human has a starting point as badly flawed as anti-induction doesn’t make Bayesianism invalid. It may well be that we are just very badly flawed, and can only get out of those flaws by taking the mathematically best approach to truth. This is Bayesianism, and it has been proven in more than one way.
This is exactly why we need induction. It is usually possible to stick any future onto any past and get a consistent history; induction tells us that if we want a probable history we need to make the future and the past resemble each other.
People certainly say that. Most of them even believe it on a conscious level, but in your average discussion there is a huge amount of other stuff going on, from signalling tribal loyalty to rationalising away unpleasant conclusions. You will not wander down the correct path by chance, you must use a map and navigate.
I have no further interest in talking with you if you resort to straw men like this. I am not proposing we set up a dictatorship and kill all non-Bayesians, nor am I advocating censorship of views opposed to the correct Bayesian conclusion.
All I am saying is your mind was not designed to do philosophical reasoning. It was designed to chase antelope across the savannah, lob a spear in them, drag them back home to the tribe, and come up with an eloquent explanation for why you deserve a bigger share of the meat (this last bit got the lion’s share of the processing power).
Your brain is not well suited to abstract reasoning, it is a fortunate coincidence that you are capable of it at all. Hopefully, you are lucky enough to have a starting point which is not irreversibly flawed, and you may be able to self improve, but this should be in the direction of realising that you run on corrupt hardware, distrusting your own thoughts, and forcing them to follow rigorous rules. Which rules? The ones that have been mathematically proven to be the best seem like a good starting point.
(The above is not intended as a personal attack, it is equally true of everyone)
I did not say it makes Bayesianism invalid. I said it doesn’t make Popperism invalid or require epistemological pessimism. You were making myth of the framework arguments against Popper’s view. My comments on those were not intended to refute Bayesianism itself.
That is a mistake and Popper’s approach is superior.
Part 1: It is a mistake because the future does not resemble the past except in some vacuous senses. Why? Because stuff changes. For example an object in motion moves to a different place in the future. And human societies invent new technologies.
It is always the case that some things resemble the past and some don’t. And the guideline that “the future resembles the past” gives no guidance whatsoever in figuring out which are which.
Popper’s approach is to improve our knowledge piecemeal by criticizing mistakes. The primary criticisms of this approach are that it is incapable of offering guarantees, authority, justification, a way to force people to go against their biases, etc. These criticisms are mistaken: no viable theory offers what they want. Setting aside those objections—that Popper doesn’t meet a standard too high for anything to meet—it works and is how we make progress.
Regarding people wanting to find the truth, indeed they don’t always. Sometimes they don’t learn. Telling them they should be Bayesians won’t change that either. What can change it is sorting out the mess of their psychology enough to figure out some advice they can use. BTW the basic problem you refer to is static memes, the theory of which David Deutsch explains in his new (Popperian) book The Beginning of Infinity.
Please calm down. I am trying my best to explain clearly. If I think that some of your ideas have nasty consequences that doesn’t mean I’m trying to insult you. It could be the case that some of your ideas actually do have nasty consequences of which you are unaware, and that by pointing out some of the ways your ideas relate to some ideas you consciously deem bad, you may learn better.
All justificationist epistemologies have connections to authority, and authority has nasty connections to politics. You hold a justificationist epistemology. When it comes down to it, justification generally consists of authority. And no amount of carefully deciding what is the right thing to set up as that authority changes that.
This connects to one of Popper’s political insights, which is that most political theories focus on the problem “Who should rule?” (or: what policies should rule?). This question is a mistake which begs for an authoritarian answer. The right question is a fallibilist one: how can we set up political institutions that help us find and fix errors?
Getting back to epistemology, when you ask questions like, “What is the correct criterion for induction to use in step 2 to differentiate between the infinity of theories?” that is a bad question which begs for an authoritarian answer.
My mind is a universal knowledge creator. What design could be better? I agree with you that it wasn’t designed for this in the sense that evolution doesn’t have intentions, but I don’t regard that as relevant.
Evolutionary psychology contains mistakes. I think discussion of universality is a way to skip past most of them (when universality is accepted, they become pretty irrelevant).
I’d urge you to read The Beginning of Infinity by David Deutsch which refutes this. I can give the arguments but I think reading it would be more efficient and we have enough topics going already.
See! I told you the authoritarian attitude was there!
And there is no mathematical proof of Bayesian epistemology. Bayes’ theorem itself is a bit of math/logic which everyone accepts (including Popper of course). But Bayesian epistemology is an application of it to certain philosophical questions, which leaves the domain of math/logic, and there is no proof that application is correct.
I know. My comments weren’t either.
The object in motion moves according to the same laws in both the future and the past, in this sense the future resembles the past. You are right that the future does not resemble the past in all ways, but the ways in which it does themselves remain constant over time. Induction doesn’t apply in all cases but we can use induction to determine which cases it applies in and which it doesn’t. If this looks circular that’s because it is, but it works.
As far as Bayesianism is concerned this is a straw man. Most Bayesians don’t offer any guarantees in the sense of absolute certainty at all.
No Bayesian has ever proposed setting up some kind of Bayesian dictatorship. As far as I can tell the only governmental proposal based on Bayesianism thus far is Hanson’s futarchy, which could hardly be further from Authoritarianism.
You misunderstand me. What I meant was that as a Bayesian I force my own thoughts to follow certain rules. I don’t force other people to do so. You are arguing from a superficial resemblance. Maths follows rigorous, unbreakable rules, does this mean that all mathematicians are evil fascists?
Incorrect. E.T. Jaynes book Probability Theory: The Logic of Science gives a proof in the first two chapters.
You obviously haven’t read much of the heuristics and biases program. I can’t describe it all very quickly here but I’ll just give you a taster.
Subjects asked to rank statements about a woman called Jill in order of probability of being true ranked “Jill is a feminist and a bank teller” as more probable than “Jill is a bank teller” despite this being logically impossible.
U.N. diplomats, when asked to guess the probabilities of various international events occurring in the next year, gave a higher probability to “USSR invades Poland causing complete cessation of diplomatic activities between USA and USSR” than they did to “Complete cessation of diplomatic activities between USA and USSR”.
Subjects who are given a handful of evidence and arguments for both sides of some issue, and asked to weigh them up, will inevitably conclude that the weight of the evidence given is in favour of their side. Different subjects will interpret the same evidence to mean precisely opposite things.
Employers can have their decision about whether to hire someone changed by whether they held a warm coffee or a cold coke in the elevator prior to the meeting.
People can have their opinion on an issue like nuclear power changed by a single image of a smiley or frowny face, flashed too briefly for conscious attention.
People’s estimates of the number of countries in Africa can be changed simply by telling them a random number beforehand, even if it is explicitly stated that this number has nothing to do with the question.
Students asked to estimate a day by which they are 99% confident their project will be finished, go past this day more than half the time.
People are more likely to move to a town if the town’s name and their name begin with the same letter.
There’s a lot more, most of which can’t easily be explained in bullet form. Suffice to say these are not irrelevant to thinking, they are disastrous. It takes constant effort to keep them back, because they are so insidious you will not notice when they are influencing you.
Replied here:
http://lesswrong.com/r/discussion/lw/54u/bayesian_epistemology_vs_popper/
Would you agree that this is a bit condescending and you’re basically assuming in advance that you know more than me?
I actually have read about it and disagree with it on purpose, not out of ignorance.
Does that interest you?
And on the other hand, do you know anything about universality? You made no comment about that. Given that I said the universality issue trumps the details you discuss in your bullet points, and you didn’t dispute that, I’m not quite sure why you are providing these details, other than perhaps a simple assumption that I had no idea what I was talking about and that my position can be ignored without reply because, once my deep ignorance is addressed, I’ll forget all about this Popperian nonsense.
Ordered but there’s an error in the library system and I’m not sure if it will actually come or not. I don’t suppose the proof is online anywhere (I can access major article databases), or that you could give it or an outline? BTW I wonder why the proof takes 2 chapters. Proofs are normally fairly short things. And, well, even if it was 100 pages of straight math I don’t see why you’d break it into separate chapters.
No I understood that. And that is authoritarian in regard to your own thoughts. It’s still a bad attitude even if you don’t do it to other people. When you force your thoughts to follow certain rules all the epistemological problems with authority and force will plague you (do you know what those are?).
Regarding Popper, you say you don’t agree with the common criticisms of him. OK. Great. So, what are your criticisms? You didn’t say.
If there was an epistemology that didn’t endorse circular arguments, would you prefer it over yours which does?
I apologise for this, but I really don’t see how anyone could go through those studies without losing all faith in human intuition.
The text can be found online. My browser (Chrome) wouldn’t open the files but you may have more luck.
Part of the reason for length is that probability theory has a number of axioms and he has to prove them all. The reason for the two chapter split is that the first chapter is about explaining what he wants to do, why he wants to do it, and laying out his desiderata. It also contains a few digressions in case the reader isn’t familiar with one or more of the prerequisites for understanding it (propositional logic for example). All of the actual maths is in the second chapter.
I agree with the explicit meaning of this statement, but you are sneaking in connotations. Let us look more closely at what ‘authoritarian’ means.
You probably mean it in the sense of centralised as opposed to decentralized control, and in that sense I will bite the bullet and say that thinking should be authoritarian.
However, the word has a number of negative connotations. Corruption, lack of respect for human rights and massive bureaucracy that stifles innovation to name a few. None of those apply to my thinking process, so even though the term may be technically correct it is somewhat intellectually dishonest to use it, something more value-neutral like ‘centralized control’ might be better.
I will confess that I am not familiar with the whole of Popper’s viewpoint. I have never read anything written by him although after this conversation I am planning to.
Therefore I do not know whether or not I broadly agree or disagree with him. I did not come here to attack him, originally I was just responding to a criticism of yours that Bayesianism fails in a certain situation
To some extent I think the approach with conjectures and criticisms may be correct, at least as a description of how thinking must get off the ground. Can you be a Popperian and conjecture Bayesianism?
The point that I do disagree with is the proposed asymmetry between confirmation and falsification. In my view neither the black swan nor the white swan proves anything with certainty, but both do provide some evidence. It happens in this case that one piece of evidence is very strong while the other is very weak; in fact they are pretty much at opposite extremes of the full spectrum of evidence encountered in the real world. This does not mean there is a difference of type.
All else being equal, yes. Other factors, such as real-world results, might take precedence. I also doubt that any philosophy could manage without either circularity or assumptions, explicit or otherwise. As I see it, when you start thinking you need something to begin your inference; logic derives truths from other truths, it cannot manufacture them out of a vacuum. So any philosophy has two choices:
Either, pick a few axioms, call them self evident and derive everything from them. This seems to work fairly well in pure maths, but not anywhere else. I suspect the difference lies in whether the axioms really are self evident or not.
Or, start out with some procedures for thinking. All claims are judged by these, including proposals to change the procedures for thinking. Thus the procedures may self-modify and will hopefully improve. This seems better to me, as long as the starting point passes a certain threshold of accuracy any errors are likely to get removed (the phrase used here is the Lens that Sees its Flaws). It is ultimately circular, since whatever the current procedures are they are justified only by themselves, but I can live with that.
Ideal Bayesians are of the former type, but they can afford to be as they are mathematically perfect beings who never make mistakes. Human Bayesians take the latter approach, which means in principle they might stop being Bayesians if they could see that for some reason it was wrong.
So I guess my answer is that if a position didn’t endorse circular arguments, I would be very worried that it is going down the unquestionable axioms route, even if it does not do so explicitly, so I would probably not prefer it.
Notice how it is only through the benefits of the second approach that I can even consider such a scenario.
I’m not trying to argue by connotation. It’s hard to avoid connotations and I think the words I’m using are accurate.
That’s not what I had in mind, but I do think that centralized control is a mistake.
I take fallibilism seriously: any idea may be wrong, and many are. Mistakes are common.
Consequently, it’s a bad idea to set something up to be in charge of your whole mind. It will have mistakes. And corrections to those mistakes which aren’t in charge will sometimes get disregarded.
Those 3 things are not what I had in mind. But I think the term is accurate. You yourself used the word “force”. Force is authoritarian. The reason for that is that the forcer is always claiming some kind of authority—I’m right, you’re wrong, and never mind further discussion, just obey.
You may find this statement strange. How can this concept apply to ideas within one mind? Doesn’t it only apply to disagreements between separate people?
But ideas are roughly autonomous portions of a mind (see: http://fallibleideas.com/ideas). They can conflict, they can force each other in the sense of one taking priority over another without the conflict being settled rationally.
Force is a fundamentally epistemological concept. Its political meanings are derivative. It is about non-truth-seeking ways of approaching disputes. It’s about not reaching agreement by one idea wins out anyway (by force).
Settling conflicts between the ideas in your mind by force is authoritarian. It is saying some ideas have authority/preference/priority/whatever, so they get their way. I reject this approach. If you don’t find a rational way to resolve a conflict between ideas, you should say you don’t know the answer, never pick a side b/c the ideas you deem the central controllers are on that side, and they have the authority to force other ideas to conform to them.
This is a big topic, and not so easy to explain. But it is important.
Force, in the sense of solving difficulties without argument, is not what I meant when I said I force my thoughts to follow certain rules. I don’t even see how that could work; my individual ideas do not argue with each other, and if they did I would speak to a psychiatrist.
I’m afraid you are going to have to explain in more detail.
They argue notionally. They are roughly autonomous, they have different substance/assertions/content, sometimes their content contradicts, and when you have two or more conflicting ideas you have to deal with that. You (sometimes) approach the conflict by what we might call an internal argument/debate. You think of arguments for all the sides (the substance/content of the conflicting ideas), you try to think of a way to resolve the debate by figuring out the best answer, you criticize what you think may be mistakes in any of the ideas, you reject ideas you decide are mistaken, you assign probabilities to stuff and do math, perhaps, etc...
When things go well, you reach a conclusion you deem to be an improvement. It resolves the issue. Each of the ideas which is improved on notionally acknowledges this new idea is better, rather than still conflicting. For example, if one idea was to get pizza, and one was to get sushi, and both had the supporting idea that you can’t get both because it would cost too much, or take too long, or make you fat, then you could resolve the issue by figuring out how to do it quickly, cheaply and without getting fat (smaller portions). If you came up with a new idea that does all that, none of the previously conflicting ideas would have any criticism of it, no objection to it. The conflict is resolved.
Sometimes we don’t come up with a solution that resolves all the issues cleanly. This can be due to not trying, or because it’s hard, or whatever.
Then what?
Big topic, but what not to do is use force: arbitrarily decide which side wins (often based on some kind of authority or justification), and declare it the winner even though the substance of the other side is not addressed. Don’t force some of your ideas, which have substantive unaddressed points, to defer to the ideas you put in charge (granted authority).
I certainly don’t advocate deciding arbitrarily. That would fall into the fallacy of just making sh*t up, which is the exact opposite of everything Bayes stands for. However, I don’t have to be arbitrary; most of the ideas that run up against Bayes don’t have the same level of support. In general, I’ve found that a heuristic of “pick the idea that has a mathematical proof backing it up” seems to work fairly well.
There are also sometimes other clues, rationalisations tend to have a slightly different ‘feel’ to them if you introspect closely (in my experience at any rate), and when the ideas going up against Bayes seem to include a disproportionately high number of rationalisations, I start to notice a pattern.
I also disagree about ideas being autonomous. Ideas are entangled with each other in complex webs of mutual support and anti-support.
Did you read my link? Where did the argument about approximately autonomous ideas go wrong?
Well this changes the topic. But OK. How do you decide what has support? What is support and how does it differ from consistency?
I did. To see what is wrong with it let me give an analogy. Cars have both engines and tyres. It is possible to replace the tyres without replacing the engine. Thus you will find many cars with very different tyres but identical engines, and many different engines but identical tyres. This does not mean that tyres are autonomous and would work fine without engines.
Well, mathematical proofs are support, and they are not at all the same as consistency. In general however, if some random idea pops into my head, and I spot that it in fact only occurred to me as a result of conjunction bias, I am not going to say “well, it would be unfair of me to reject this just because it contradicts probability theory, so I must reject both it and probability theory until I can find a superior compromise position”. Frankly, that would be stupid.
@autonomous—you know we said “approximately autonomous” right? And that, for various purposes, tires are approximately autonomous, which means things like they can be replaced individually without touching the engine or knowing what type of engine it is. And a tire could be taken off one car and put on another.
No one was saying it’d function in isolation. Just like a person being autonomous doesn’t mean they would do well in isolation (e.g. in deep space). Just because people do need to be in appropriate environments to function doesn’t make “people are approximately autonomous” meaningless or false.
First, you have not answered my question. What is support? The general purpose definition. I want you to specify how it is determined if X supports Y, and also what that means (why should we care? what good is “support”?).
Second, let’s be more precise. If a person writes what he thinks to be a proof, what is supported? What he thinks is the conclusion of what he thinks is a proof, and nothing else? An infinite set of things which have wildly different properties? Something else?
You argue from ideas being approximately autonomous to the claim that words like ‘authoritarian’ apply to them, and that they can approximately debate, but this is not true in the car analogy. Is it ‘authoritarian’ that the brakes, accelerator and steering wheel have total control of the car, while the tyres and engine get no say, or is it just efficient?
I didn’t give a loose argument by analogy. You’re attacking a simplified straw man. I explained stuff at some length and you haven’t engaged here with all of what I said. e.g. your comments on “authoritarian” here do not mention or discuss anything I said about that. You also don’t mention force.
I haven’t got any faith in human intuition. That’s not what I said.
OK fair enough.
Oh the book is here: http://bayes.wustl.edu/etj/prob/book.pdf
That was easy.
I don’t know the etiquette or format of this website well or how it works. When I have comments on the book, would it make sense to start a new thread or post somewhere/somehow?
You can conjecture Bayes’ theorem. You can also conjecture all the rest, however some things (such as induction, justificationism, foundationalism) contradict Popper’s epistemology. So at least one of them has a mistake to fix. Fixing that may or may not lead to drastic changes, abandonment of the main ideas, etc
That is a purely logical point Popper used to criticize some mistaken ideas. Are you disputing the logic? If you’re merely disputing the premises, it doesn’t really matter because its purpose is to criticize people who use those premises on their own terms.
Agreed.
I think you are claiming that seeing a white swan is positive support for the assertion that all swans are white. (If not, please clarify). If so, this gets into important issues. Popper disputed the idea of positive support. The criticism of the concept begins by considering: what is support? And in particular, what is the difference between “X supports Y” and “X is consistent with Y”?
Questioning this was one of Popper’s insights. The reason most people doubt it is possible is because, since Aristotle, pretty much all epistemology has taken this for granted. These ideas seeped into our culture and became common sense.
What’s weird about the situation is that most people are so attached to them that they are willing to accept circular arguments, arbitrary foundations, or other things like that. Those are OK! But that Popper might have a point is hard to swallow. I find circular arguments rather more doubtful than doing without what Popperians refer to broadly as “justification”. I think it’s amazing that people run into circularity or other similar problems and still don’t want to rethink all their premises. (No offense intended. Everyone has biases, and if we try to overcome them we can become less wrong about some matters, and stating guesses at what might be biases can help with that.)
All the circularity and foundations stem from seeking to justify ideas. To show they are correct. Popper’s epistemology is different: ideas never have any positive support, confirmation, verification, justification, high probability, etc… So how do we act? How do we decide which idea is better than the others? We can differentiate ideas by criticism. When we see a mistake in an idea, we criticize it (criticism = explaining a mistake/flaw). That refutes the idea. We should act on or use non-refuted ideas in preference over refuted ideas.
That’s the very short outline, but does that make any sense?
Fully agreed. In principle, if Popper’s epistemology is of the second, self-modifying type, there would be nothing wrong with drastic changes. One could argue that something like that is exactly how I arrived at my current beliefs, I wasn’t born a Bayesian.
I can also see some ways to make induction and foundationalism easer to swallow.
A discussion post sounds about right for this, if enough people like it you might consider moving it to the main site.
This is precisely what I am saying.
The beauty of Bayes is how it answers these questions. To distinguish between the two statements we express them each in terms of probabilities.
“X is consistent with Y” is not really a Bayesian way of putting things, I can see two ways of interpreting it. One is as P(X&Y) > 0, meaning it is at least theoretically possible that both X and Y are true. The other is that P(X|Y) is reasonably large, i.e. that X is plausible if we assume Y.
“X supports Y” means P(Y|X) > P(Y), X supports Y if and only if Y becomes more plausible when we learn of X. Bayes tells us that this is equivalent to P(X|Y) > P(X), i.e. if Y would suggest that X is more likely that we might think otherwise then X is support of Y.
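For completeness, the equivalence asserted here is a one-line consequence of Bayes’ theorem (assuming P(X) and P(Y) are both nonzero): P(Y|X) = P(X|Y)*P(Y)/P(X), so P(Y|X) > P(Y) exactly when P(X|Y) > P(X).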
Suppose we make X the statement “the first swan I see today is white” and Y the statement “all swans are white”. P(X|Y) is very close to 1, P(X|~Y) is less than 1 so P(X|Y) > P(X), so seeing a white swan offers support for the view that all swans are white. Very, very weak support, but support nonetheless.
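Putting entirely made-up numbers on it, just to show the size of the update rather than to claim these are the right values:

```python
# Illustrative numbers only: the prior and P(X | not Y) are invented for the example.

p_Y = 0.01              # prior for "all swans are white"
p_X_given_Y = 1.0       # if all swans are white, the first swan seen today is white
p_X_given_not_Y = 0.9   # even if some swans aren't white, most swans you meet are

p_X = p_X_given_Y * p_Y + p_X_given_not_Y * (1 - p_Y)
p_Y_given_X = p_X_given_Y * p_Y / p_X

print(p_Y, "->", round(p_Y_given_X, 4))   # 0.01 -> about 0.0111: support, but very weak
```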
(The above is not meant to be condescending, I apologise if you know all of it already).
This is a very tough bullet to bite.
One thing I don’t like about this is the whole ‘one strike and you’re out’ feel of it. It’s very boolean, the real world isn’t usually so crisp. Even a correct theory will sometimes have some evidence pointing against it, and in policy debates almost every suggestion will have some kind of downside.
There is also the worry that there could be more than one non-refuted idea, which makes it a bit difficult to make decisions. Bayesianism, on the other hand, when combined with expected utility theory, is perfect for making decisions.
When replying it said “comment too long” so I posted my reply here:
http://lesswrong.com/r/discussion/lw/552/reply_to_benelliott_about_popper_issues/
Step 1 is problematic also, as I explained in some of my comments to Tim Tyler. What should I gather data about? What kind of data? What measurements are important? How accurate? And so on.
Yes I agree. Another issue I mentioned in one of my comments here is that your data isn’t a random sample of all possible data, so what do you do about bias? (I mean bias in the data, not bias in the person.)
Step 3 is also problematic (as it explicitly acknowledges).
Finding it difficult also.
Have you found: http://lesswrong.com/message/inbox/
I don’t think I have the grasp on these subjects to hang in this, but this is great. -- I hope someone else comments in a more detailed manner.
In Popperian analysis, who ends the discussion of “what’s better?” You seem to have alluded to it being “whatever has no criticisms.” Is that accurate?
Why would Bayesian epistemology not be able to use the same evidence that Popperians used (e.g. the 1920 paper) and thus not require “assumptions” for new evidence? My rookie statement would be that the Bayesian has access to all the same kinds of evidence and tools that the Popperian approach does, as well as a reliable method for estimating probability outcomes.
Could you also clarify the difference between “conjecture” and “assumption.” Is it just that you’re saying that a conjecture is just a starting point for departure, whereas an assumption is assumed to be true?
An assumption seems both 1) justified if it has supporting evidence to make it highly likely as true to the best of our knowledge and 2) able to be just as “revisable” given counter-evidence as a “conjecture.”
Are you thinking that a Bayesian “assumption” is set in stone or that it could not be updated/modified if new evidence came along?
Lastly, what are “conjectures” based on? Are they random? If not, it would seem that they must be supported by at least some kind of assumptions to even have a reason for being conjectured in the first place. I think of them as “best guesses” and don’t see that as wildly different from the assumptions needed to get off the ground in any other analysis method.
Yes, “no criticisms” is accurate. There are issues of what to do when you have a number of theories remaining which isn’t exactly one which I didn’t go into.
It’s not a matter of “who”—learning is a cooperative thing and people can use their own individual judgment. In a free society it’s OK if they don’t agree (for now—there’s always hope for later) about almost all topics.
I don’t regard the 1920 paper as evidence. It contains explanations and arguments. By “evidence” I normally mean “empirical evidence”—i.e. observation data. Is that not what you guys mean? There is some relevant evidence for liberalism vs socialism (e.g. the USSR’s empirical failure) but I don’t regard this evidence as crucial, and I don’t think that if you were to rely only on it that would work well (e.g. people could say the USSR did it wrong and if they did something a bit different, which has never been tried, then it would work. And the evidence could not refute that.)
BTW in the Popperian approach, the role of evidence is purely in criticism (and inspiration for ideas, which has no formal rules or anything). This is in contrast to inductive approaches (in general) which attempt to positively support/confirm/whatever theories with the weight of evidence.
If the Bayesian approach uses arguments as a type of evidence, and updates probabilities accordingly, how is that done? How is it decided which arguments win, and how much they win by? One aspect of the criticism approach is theories do not have probabilities but only two statuses: they are refuted or non-refuted. There’s never an issue of judging how strong an argument is (how do you do that?).
If you try to follow along with the Popperian approach too closely (to claim to have all the same tools) one objection will be that I don’t see Bayesian literature acknowledging Popper’s tools as valuable, talking about how to use them, etc… I will suspect that you aren’t in line with the Bayesian tradition. You might be improving it, but good luck convincing e.g. Yudkowsky of that.
The difference between a conjecture and an assumption is just as you say: conjectures aren’t assumed true but are open to criticism and debate.
I think the word “assumption” means not revisable (normally assumptions are made in a particular context, e.g. you assume X for the purposes of a particular debate which means you don’t question it. But you could have a different debate later and question it.). But I didn’t think Bayesianism made any assumptions except for its foundational ones. I don’t mind if you want to use the word a different way.
Regarding justification by supporting evidence, that is a very problematic concept which Popper criticized. The starting place of the criticism is to ask what “support” means. And in particular, what is the difference between support and mere consistency (non-contradiction)?
Conjectures are not based on anything and not supported. They are whatever you care to imagine. It’s good to have reasons for conjectures but there are no rules about what the reasons should be, and conjectures are never rejected because of the reason they were conjectured (nor because of the source of the conjecture), only because of criticisms of their substance. If someone makes too many poor conjectures and annoys people, it’s possible to criticize his methodology in order to help him. Popperian epistemology does not have any built-in guidelines for conjecturing on which it depends; they can be changed and violated as people see fit. I would rather call them “guesses” than “best guesses” because it’s often a good idea for one person to make several conjectures, including ones he suspects are mistaken, in order to learn more about them. It should not be each person puts forward his best theory and they face off, but everyone puts forward all the theories he thinks may be interesting and then everyone cooperates in criticizing all of them.
Edit: BTW I use the words “theory” and “idea” interchangeably. I do not mean by “theory” ideas with a certain amount of status/justification. I think “idea” is the better word but I frequently forget to use it (because Popper and Deutsch say “theory” all the time and I got used to it).
So, you weight them by their simplicity, to formally implement Occam’s razor:
So we have infinitely many theories, infinitely many of which are dead wrong, and only one of which is true, and we just use the shortest one and hope? And that’s supposed to be a good idea?
You usually weight them by their simplicity, if you want a probabilistic forecast. This is Occam’s razor. Picking the shortest one is not an unreasonable way to get a specific prediction.
Here is Hutter on how good an idea it is:
What do you mean by ‘get anywhere’? I can update my probability estimates and use the new estimates to make decisions perfectly well.
What does this have to do with whether confirmation can be used as evidence?
Infinitely many hypotheses increase in probability. What good is that? You have infinite possibilities before you and haven’t made progress towards picking between them.
When you say “this infinite set over here, its probability increases” you aren’t reaching an answer. You aren’t even getting any further than pure deduction would have gotten you.
Look, there’s two infinite sets: those contradicted by the evidence, and those not (deal with theories with “maybes” in them however you like, it does not matter to my point). The first set we don’t care about—we all agree to reject it. The second set is all that’s left to consider. if you increase the probability of every theory in it that doesn’t help you choose between them. it’s not useful. when you “confirm” or increase the probability of every theory logically consistent with the data, you aren’t reaching an answer, you aren’t making progress.
The progress is in the theories that are ruled out. When playing cards, you could consider all possible histories of the motions of the cards that are compatible with the evidence. Would you have any problem with making bets based on these probabilities? Solomonoff induction is very similar. While there are an infinite number of possibilities, both cases involve proving general properties of the distribution rather than considering each possibility individually.
In the future please capitalize your sentences; it improves readability (especially in large paragraphs).
“The progress is in the theories that are ruled out.”
This is purely a matter of deduction, right? Bayes’ theorem doesn’t come into it.
One doesn’t have to be a Bayesian to rule out theories contradicted by the evidence.
Further, there are always infinitely many theories that aren’t ruled out. This is the hard part of epistemology. How do you deal with those?
If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.
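A minimal sketch of that ‘eliminate and rescale’ step in Python, with made-up hypotheses and prior numbers (not anyone’s actual model, just the shape of the intuition):

```python
# Bayesian updating when every theory predicts with certainty:
# zero out the theories contradicted by the evidence, then rescale the rest to sum to 1.
def update(prior, consistent_with_evidence):
    posterior = {h: (p if consistent_with_evidence(h) else 0.0)
                 for h, p in prior.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Made-up example: three theories about a coin-like process; we observe heads first.
prior = {"always heads": 0.5, "always tails": 0.25, "alternates, starting heads": 0.25}
posterior = update(prior, lambda h: h != "always tails")
print(posterior)  # roughly {'always heads': 0.667, 'always tails': 0.0, 'alternates, starting heads': 0.333}
```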
The Solomonoff prior is really just a form of the principle of insufficient reason, which states that if there is no reason to think that one thing is more probable than another, they should be assigned the same probability. Since there are an infinite number of theories, we need to take some kind of limit. If we encode them as self-delimiting computer programs, we can write them as strings of digits (usually binary). Start with some maximum length and increase it toward infinity. Some programs will proceed normally, looping forever or encountering a stop instruction; many programs end up equivalent, because changing bits that are never used by the hypothesis does not change the theory. Other programs will run past the bounds of the maximum length, but this will be fixed as that length is taken to infinity.
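And a toy version of the prior itself, just to show the shape of the idea. Real Solomonoff induction uses self-delimiting programs on a universal machine and is uncomputable, so this truncated enumeration is only an illustration:

```python
from itertools import product

# Toy "Solomonoff-style" prior: weight each program (here, a plain binary string)
# by 2^-length, so shorter hypotheses start out more probable.
def toy_prior(max_len):
    programs = [''.join(bits)
                for n in range(1, max_len + 1)
                for bits in product('01', repeat=n)]
    weights = {p: 2.0 ** (-len(p)) for p in programs}
    total = sum(weights.values())  # finite cutoff, so renormalize
    return {p: w / total for p, w in weights.items()}

prior = toy_prior(4)
print(prior['0'], prior['0110'])  # 0.125 vs 0.015625: the 1-bit program gets 8x the weight
```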
This obviously isn’t a complete justification, but it is better than Popperian induction. Both rule out falsified theories and both penalize theories for unfalsifiability and complexity. Only Solomonoff induction allows us to quantify the size of these penalties in terms of probability. Popper would agree that a simpler theory, being compared to a more complex one, is more likely but not guaranteed to be true, but he could not give the numbers.
If you are still worried about the foundational issues of the Solomonoff prior, I’ll answer your questions, but it would be better if you asked me again in however long progress takes (that was supposed to sound humourous, as if I were describing a specific, known amount of time, but I really doubt that that is noticeable in text). http://lesswrong.com/r/discussion/lw/534/where_does_uncertainty_come_from/ writes up some of the questions I’m thinking about now. It’s not by me, but Paul seems to wonder about the same issues. This should all be significantly more solid once some of these questions are answered.
“If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.”
That’s it? That is trivial, and doesn’t solve the major problems in epistemology. It’s correct enough (I’m not convinced theories have probabilities, but I think that’s a side issue) but it doesn’t get you very far. Any old non-Bayesian epistemology could tell you this.
Epistemology has harder problems than figuring out that you should reject things contradicted by evidence. For example, what do you do about the remaining possibilities?
I think with Solomonoff what you are doing is ordering all theories (by length) and saying the ones earlier in the ordering are better. This ordering has nothing empirical about it. Your approach here is not based on evidences or probabilities, just an ordering. Correct me if I got that wrong. That raises the question: why is the Solomonoff ordering correct? Why not some other ordering? Here’s one objection: “God did everything” is a short theory which is compatible with all evidence. You can make separate versions of it for all possible sets of predictions if you want. Doesn’t that mean we’re either stuck with some kind of “God did everything” or the final truth is even shorter?
You mention “Popperian induction”. Please don’t speak for Popper. The idea that Popper advocated induction is a myth. A rather crass one; he refuted induction and published a lot of material against it. Instead, ask me about his positions, OK? Popper would not agree that the simpler theory is “more likely”. There’s many issues here. One is that Popper said we should prefer low probability theories because they say more about the world.
You seem to present “Popperian induction” as an incomplete justification. Maybe you are unware that Popper’s epistemology rejects the concept of justification itself. It is thus a mistake to criticize it on justificationist grounds. It isn’t any type of justification and doesn’t want to be.
In order to quote people, you can use a single greater than sign ‘>’ at the beginning of a line.
Note I said ‘that and a prior’. The important concept here is that we must always assign probabilities to all theories, because otherwise we would have no way to act. From Wikipedia: ‘Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures’, where a statistical procedure may be taken as a guide for optimal action.
Sorry about saying ‘Popperian induction’. I only have a basic knowledge of Popper. Would Popper say that predicting the results of actions is (one of) the goals of science? This is, of course, slightly more general than induction.
Wikipedia quotes Popper as saying simpler theories are to be preferred ‘because their empirical content is greater; and because they are better testable’. Does this mean that he would bet something important on this? If there were two possible explanations for a plague, and if the simpler one were true then, with medicine, we could save 100 lives, but if the more complex one were true we could save 200 lives, how would you decide which cure the factories should manufacture (and it takes a long time to prepare the factories or something, so you can only make one type of cure)?
It is exactly not about this. The reason to prefer simpler theories is that more possible universes correspond to them. For a simple theory, axioms 1 through 10 have to come out the right way, but the rest can be anything, as they are meaningless since the universe is already fully specified. For a more complex theory, axioms 11-15 must also turn out a certain way, so fewer possible universes are compatible with this theory. I would also add the principle of sufficient reason, which I think is likely, as further justification for Occam’s razor, but that is irrelevant here.
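To make the counting concrete, a toy calculation (the 15 binary axioms are invented for the example):

```python
# Toy count of the "possible universes" picture: suppose a universe is specified
# by 15 binary axioms. A theory that fixes only axioms 1-10 is compatible with
# every setting of the other 5; one that also fixes 11-15 fits exactly one universe.
total_axioms = 15
print(2 ** (total_axioms - 10))  # 32 universes fit the simpler theory
print(2 ** (total_axioms - 15))  # 1 universe fits the more complex theory
```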
This seems wrong. Should I play the lottery because the low-probability theory that I will win is preferred to the high-probability theory that I will lose?
Popperian epistemology doesn’t assign probabilities like that, and has a way to act. So would you agree that, if you fail to refute Popperian epistemology, then one of your major claims is wrong? Or do you have a backup argument: you don’t have to, but you should anyway because..?
Prediction is a goal of science, but it is not the primary one. The primary goal is explanation/understanding.
Secondary sources about Popper, like wikipedia, are not trustworthy. Popper would not bet anything important on that simpler theories thing. That fragment is misleading because Popper means “preferred” in a methodological sense, not considered to have a higher probability of being true, or considered more justified. It’s not a preference about which theory is actually, in fact, better.
The way to make decisions is by making conjectures about what to do, and criticizing those conjectures. We learn by critical, imaginative argument (including within one mind). Explanations should be given for why each possibility is a good idea; the hypothetical you give doesn’t have enough details to actually reach an answer.
About Solomonoff, if I understand you correctly now you are starting with theories which don’t say much (that isn’t what I expected simpler or shorter to mean). So at any point Solomonoff induction will basically be saying the minimal theory to account for the data and specify nothing else at all. Is that right? If that is the case, then it doesn’t deal with choosing between the various possibilities which are all compatible with the data (except in so far as it tells you to choose the least ambitious) and can make no predictions: it simply leaves everything we don’t already know unspecified. Have I misunderstood again?
I thought the theories were supposed to specify everything (not, as you say, “the rest can be anything”) so that predictions could be made.
I’m not totally sure what your concept of a universe or axiom is here. Also I note that the real world is pretty complicated.
No, he means they are more important and more interesting. His point is basically that a theory which says nothing has a 100% prior probability. Quantum Mechanics has a very low prior probability. The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?). They have what Popper called high “content” because they exclude many possibilities. That is a good trait. But it’s certainly not a guarantee that arbitrary theories excluding stuff will be correct.
My first wikipedia quote (Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.) was somewhat technical, but it basically meant that any consistent set of actions is either describable in terms of probabilities or nonconsequentialist. How would you choose the best action in a Popperian framework? Would you be forced to consider aspects of a choice other than its consequences? Otherwise, your choices must be describable in terms of a prior probability and Bayesian updating (and, while we already agree that the latter is obvious, here we are using it to update a set of probabilities and, on pain of inconsistency, our new probabilities must have that relationship to our old ones).
Definitely use all the evidence when making decisions. I didn’t mean for my example to be complete. I was wondering how a question like that could be addressed in general. What pieces of information would be important and how would they be taken into account? You can assume that the less relevant variables, like which disease is more painful, are equal in both cases.
I may have been unclear here. I meant prediction in a very broad sense, including, eg., predicting which experiments will be best at refining our knowledge and predicting which technologies will best improve the world. Was it clear that I meant that? If you seek understanding beyond this, you are allowed to but, at least for the present era, I only care about an epistemology if it can help me make the world a better place.
No, not at all. The more likely theories are those that include small amounts of theory, not small amounts of prediction. Eliezer discusses this in the sequences here, here, and here. Those don’t really cover Solomonoff induction directly, but they will probably give you a better idea of what I’m trying to say than I did. I think Solomonoff induction is better covered in another post, but I can’t find it right now.
Sorry, I was abusing one word ‘theories’ to mean both ‘individual descriptions of the universe’ and ‘sets of descriptions that make identical predictions in some realm (possibly in all realms)‘. It is a very natural place to slip definitions, because, for example, when discussing biology, we often don’t care about the distinction between ‘Classical physics is true and birds are descended from dinosaurs.’ and ‘Quantum physics is true and birds are descended from dinosaurs.’ Once enough information is specified to make predictions, a theory (in the second sense) is on equal ground with another theory that contains the same amount of information and that makes different predictions only in realms where it has not been tested, as well as with a set of theories for which the set can be specified with the same amount of information but for which specifying one theory out of the set would take more information.
I’m not sure how one would act based on this. Should one conduct new experiments differently given this knowledge of which theories are preferred? Should one write papers about how awesome the theory is?
All of this is present in Bayesian epistemology.
Consider Bayes theorem, with theories A and B and evidence E:
P(A|E) = P(E|A) P(A) / P(E)
Let’s look at how the probability of a theory increases upon learning E, using a ratio.
P(A|E) / P(A) = P(E|A) / P(E)
P(B|E) / P(B) = P(E|B) / P(E)
Which one increases by a larger ratio?
[P(A|E) / P(A)] / [P(B|E) / P(B)] = [P(E|A) / P(E)] / [P(E|B) / P(E)] = P(E|A) / P(E|B)
The greater P(E|A) is compared to P(E|B), the more A benefits compared to B. This means that the more theory A narrowly predicts E, the actual observation, to the exclusion of other possible observations, the more probability we assign to it. This is a quantitative form of Popper’s preference for more specific and more easily falsifiable theories, as proven by Bayes theorem.
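A quick numerical check with invented numbers (the likelihoods and P(E) are arbitrary, just to see the ratio come out):

```python
# Made-up numbers: A predicts the actual observation E narrowly, B spreads thin.
P_E_given_A, P_E_given_B = 0.9, 0.3
P_E = 0.4

ratio_A = P_E_given_A / P_E  # P(A|E)/P(A) = 2.25: A's probability more than doubles
ratio_B = P_E_given_B / P_E  # P(B|E)/P(B) = 0.75: B actually loses probability
print(ratio_A / ratio_B)     # 3.0, which is exactly P(E|A)/P(E|B)
```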
That’s basically what Solomonoff means by prior probability.
Yes Popper is non-consequentialist.
Consequentialism is a bad theory. It says ideas should be evaluated by their consequences (only). This does not address the question of how to determine what are good or bad consequences.
If you try to evaluate methods of determining what are good or bad consequences, by their consequences, you’ll end up with serious regress problems. If you don’t, you’ll have to introduce something other than consequences.
You may want to be a little more careful with how you formulate this. Saying that a good idea is one that has good consequences, and a bad idea is one that has bad consequences, doesn’t invite regress… it may be that you have a different mechanism for evaluating whether a consequence is good/bad than you do for evaluating whether an idea is good/bad.
For example, I might assert that a consequence is good if it makes me happy, and bad if it makes me unhappy. (I don’t in fact assert this.) I would then conclude that an idea is good if its consequences make me happy, and bad if its consequences make me unhappy. No regress involved.
(And note that this is different from saying that an idea is good if the idea makes me happy. If it turns out that the idea “I could drink drain cleaner” makes me happy, but that actually drinking drain cleaner makes me unhappy, then it’s a bad idea by the first theory but a good idea by the second theory.)
A certain amount of precision is helpful when thinking about these sorts of things.
If you reread the sentence in which I discuss a regress, you will notice it begins with “if” and says that a certain method would result in a regress, the point being you have to do something else. So it was your mistake.
That is not what I meant by consequentialism, and I agree that that theory entails an infinite regress. The theory I was referring to, which is the first google result for consequentialism, states that actions should be judged by their consequences.
That theory is bad too. For one thing, you might do something really dumb—say, shoot at a cop—and the consequence might be something good, e.g. you might accidentally hit the robber behind him who was about to kill him. You might end up declared a hero.
For another thing, “judge by consequences” does not answer the question of what are good or bad consequences. It tells us almost nothing. The only content is don’t judge by anything else. Why not? Beats me.
If you mean judge by rationally expected consequences, or something like that, you could drop the first objection but I still don’t see the use of it. If you merely want to exclude mysticism I think we can do that with a lighter restriction.
Sorry, I didn’t explain this very well. I don’t use consequentialism to judge people; I use it to judge possible courses of action. I (try to) make choices with the best consequences, and this fully determines actions, so judgments of, for example, who is a bad person, do not add anything.
You are right that this is very broad. My point is that all consequentialist decision rules are either Bayesian decision rules or limits of Bayesian decision rules, according to a theorem.
I didn’t discuss who is a bad person. An action might be bad but have a good result (this time) by chance. And you haven’t said a word about what kinds of consequences of actions are good or bad … I mean desirable or undesirable. And you haven’t said why everything but consequences is inadmissible.
In your example of someone shooting a police officer, I would say that it is good that the police officer’s life was saved, but it is bad that there is a person who would shoot people so irresponsibly, and I would not declare that person a hero, as that would neither help save more police officers nor reduce the number of people shooting recklessly; in fact, it would probably increase the number of reckless people.
I don’t want to get into the specifics of morality, because it is complex. The only reason that I specified consequentialist decision making is that it is a condition of the theorem that proves Bayesian decision making to be optimal. Entirely nonconsequentialist systems don’t need to learn about the universe to make decisions and partially consequentialist systems are more complicated. For the latter, Bayesianism is often necessary if there are times when nonconsequentialist factors have little import to a decision.
You are here judging a non-action by a non-consequence.
I think you mean systems which ignore all consequences. Popper’s system does not do that.
Popper’s system incorporates observational evidence in the form of criticism: ideas can be criticized for contradicting it.
Yes, this is a non-action; I often say ‘it is bad that X’ as shorthand for ‘ceteris paribus, I would act so as to make X not be the case’. However, it is a consequence of what happened before (though you may have just meant it is not a consequence of my action). Judgements are often attached to consequences without specifying which action they are consequences of, just for convenience.
Yes, that was what I meant.
OK. I don’t recall hearing any Bayesian praising low probability theories, but no doubt you’ve heard more of them than me.
Yes but that only helps you deal with wishy washy theories. There’s plenty of theories which predict stuff with 100% probability. Science has to deal with those. This doesn’t help deal with them.
Examples include Newton’s Laws and Quantum Theory. They don’t say they happen sometimes but always, and that’s important. Good scientific theories are always like that. Even when they have a restricted, non-universal domain, it’s 100% within the domain.
Physics is currently thought to be deterministic. And even if physics was random, we would say that e.g. motion happens randomly 100% of the time, or whatever the law is. We would expect a law of motion with a call to a random function to still always be what happens.
PS Since you seem to have an interest in math, I’d be curious about your thoughts on this:
http://scholar.google.com/scholar?cluster=10839009135739435828&hl=en&as_sdt=0,5
There’s an improved version in Popper’s book The World of Parmenides but that may be harder for you to get.
The article you sent me is mathematically sound, but Popper draws the wrong conclusion from it. He has already accepted that P(H|E) can be greater than P(H). That’s all that’s necessary for induction: updating probability distributions. The stuff he says at the end about H ← E being countersupported by E does not prevent decision making based on the new distribution.
Setting aside Popper’s point for a minute, p(h|e) > p(h) is not sufficient for induction.
The reason it is not sufficient is that infinitely many h gain probability for any e. The problem of dealing with those remains unaddressed. And it would be incorrect and biased to selectively pick some pet theory from that infinite set and talk about how it’s supported.
Do you see what I’m getting at?
Yes, that is what the Solomonoff prior is for. It gives numbers to all the P(H_i).
And what is the argument for that prior? Why is it not arbitrary and often incorrect?
And whatever argument you give, I’ll also be curious: what method of arguing are you using? Deduction? Induction? Something else?
I tried to present it, but was obviously very unclear. If you read http://lesswrong.com/lw/jk/burdensome_details/ , http://lesswrong.com/lw/jn/how_much_evidence_does_it_take/ , and http://lesswrong.com/lw/jp/occams_razor/ , it’s basically a formalization of those ideas, with a tiny amount of handwaving.
Deduction.
Deduction requires premises to function. Where did you get the premises?
It seems obvious that low-probability theories are good. Since probabilities must add up to 100%, there can be only a few high-probability theories and, when one is true, there is not much work to be done in finding it, since it is already so likely. Telling someone to look among low-probability theories is like telling them to look among nonapples when looking for possible products to sell, and it provides no way of distinguishing good low-prior theories, like quantum mechanics, from bad ones, like astrology.
Unfortunately, I cannot read that article, as it is behind a paywall. If you have access to it, perhaps you could email it to me at endoself (at) yahoo (dot) com .
ETA:
I was only talking about Popper’s idea of theories with high content. That particular analysis was not meant to address theories that predicted certain outcomes with probability 1.
It’s a loose guideline for people about where it may be fruitful to look. It can also be used in critical arguments if/when people think of arguments that use it.
One of the differences between Popper and Bayesian Epistemology is that Popper thinks being overly formal is a fault not a merit. Much of Popper’s philosophy does not consist of formal, rigorous guidelines to be followed exactly. Popper isn’t big on rules of procedure. A lot is explanation. Some is knowledge to use on your own. Some is advice.
So, “God does everything”, plus a definition of “everything” which makes predictions about all events, would rate very highly with you? It’s very low on theory and very high on prediction.
Define theories of that type for all possible sets of predictions. Then at any given time you will have infinitely many of them that predict all your data with 100% probability.
Why is that wrong?
No, it has tons of theory. God is a very complex concept. Note that ‘God did everything’ is more complex and therefore less likely than ‘everything happened’. Did you read http://lesswrong.com/lw/jp/occams_razor/ ?
How do you figure God is complex? God as I mean it simply can do anything, no reason given. That is its only attribute: that it arbitrarily does anything the theory it’s attached to cares to predict. We can even stop calling it “God”. We could even not mention it at all so there is no theory and merely give a list of predictions. Would that be good, in your view?
If ‘God’ is meaningless and can merely be attached to any theory, then the theory is the same with and without God. There is nothing to refute, since there is no difference. If you defined ‘God’ to mean a being who created all species or who commanded a system of morality, I would have both reason to care about and means to refute God. If you defined ‘God’ to mean ‘quantum physics’, there would be applications and means of proving that ‘God’ is a good approximation, but this definition is nonsensical, since it is not what is usually meant by ‘God’. If the theory of ‘God’ has no content, there is nothing to discuss, but this is again a very unusual definition.
Would a list of predictions with no theory/explanation be good or bad, in your view?
If there is no simpler description, then a list of predictions is better but, if an explanation simpler than merely a list of predictions is at all possible, then that would be more likely.
How do you decide if an explanation is simpler than a list of predictions? Are you thinking in terms of data compression?
Do you understand that the content of an explanation is not equivalent to the predictions it makes? It offers a different kind of thing than just predictions.
Essentially. It is simpler if it has a higher Solomonoff prior.
Yes, there is more than just predictions. However, predictions are the only things that tell us how to update our probability distributions.
So, your epistemology is 100% instrumentalist and does not deal with non-predictive knowledge at all?
Can you give an example of non-predictive knowledge and what role it should play?
Quoting from The Fabric of Reality, chapter 1, by David Deutsch.
Yet some philosophers — and even some scientists — disparage the role of explanation in science. To them, the basic purpose of a scientific theory is not to explain anything, but to predict the outcomes of experiments: its entire content lies in its predictive formulae. They consider that any consistent explanation that a theory may give for its predictions is as good as any other — or as good as no explanation at all — so long as the predictions are true. This view is called instrumentalism (because it says that a theory is no more than an ‘instrument’ for making predictions). To instrumentalists, the idea that science can enable us to understand the underlying reality that accounts for our observations is a fallacy and a conceit. They do not see how anything a scientific theory may say beyond predicting the outcomes of experiments can be more than empty words.
[cut a quote of Steven Weinberg clearly advocating instrumentalism. the particular explanation he says doesn’t matter is that space time is curved. space time curvature is an example of a non-predictive explanation.]
imagine that an extraterrestrial scientist has visited the Earth and given us an ultra-high-technology ‘oracle’ which can predict the outcome of any possible experiment, but provides no explanations. According to instrumentalists, once we had that oracle we should have no further use for scientific theories, except as a means of entertaining ourselves. But is that true? How would the oracle be used in practice? In some sense it would contain the knowledge necessary to build, say, an interstellar spaceship. But how exactly would that help us to build one, or to build another oracle of the same kind — or even a better mousetrap? The oracle only predicts the outcomes of experiments. Therefore, in order to use it at all we must first know what experiments to ask it about. If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction — even perfect, universal prediction — is simply no substitute for explanation.
Similarly, in scientific research the oracle would not provide us with any new theory. Not until we already had a theory, and had thought of an experiment that would test it, could we possibly ask the oracle what would happen if the theory were subjected to that test. Thus, the oracle would not be replacing theories at all: it would be replacing experiments. It would spare us the expense of running laboratories and particle accelerators.
[cut elaboration]
The oracle would be very useful in many situations, but its usefulness would always depend on people’s ability to solve scientific problems in just the way they have to now, namely by devising explanatory theories. It would not even replace all experimentation, because its ability to predict the outcome of a particular experiment would in practice depend on how easy it was to describe the experiment accurately enough for the oracle to give a useful answer, compared with doing the experiment in reality. After all, the oracle would have to have some sort of ‘user interface’. Perhaps a description of the experiment would have to be entered into it, in some standard language. In that language, some experiments would be harder to specify than others. In practice, for many experiments the specification would be too complex to be entered. Thus the oracle would have the same general advantages and disadvantages as any other source of experimental data, and it would be useful only in cases where consulting it happened to be more convenient than using other sources. To put that another way: there already is one such oracle out there, namely the physical world. It tells us the result of any possible experiment if we ask it in the right language (i.e. if we do the experiment), though in some cases it is impractical for us to ‘enter a description of the experiment’ in the required form (i.e. to build and operate the apparatus). But it provides no explanations.
In a few applications, for instance weather forecasting, we may be almost as satisfied with a purely predictive oracle as with an explanatory theory. But even then, that would be strictly so only if the oracle’s weather forecast were complete and perfect. In practice, weather forecasts are incomplete and imperfect, and to make up for that they include explanations of how the forecasters arrived at their predictions. The explanations allow us to judge the reliability of a forecast and to deduce further predictions relevant to our own location and needs. For instance, it makes a difference to me whether today’s forecast that it will be windy tomorrow is based on an expectation of a nearby high-pressure area, or of a more distant hurricane. I would take more precautions in the latter case.
[“wind due to hurricane” and “wind due to high-pressure area” are different explanations for a particular prediction.]
So knowledge is more than just predictive because it also lets us design things?
Here’s a solution to the problem with the oracle—design a computer that inputs every possible design to the oracle and picks the best. You may object that this would be extremely time-consuming and therefore impractical. However, you don’t need to build the computer; just ask the oracle what would happen if you did.
What can we learn from this? This kind of knowledge can be seen as predictive, but only incidentally, because the computer happens to be implemented in the physical world. If it were implemented mathematically, as an abstract algorithm, we would recognize this as deductive, mathematical knowledge. But math is all about tautologies; nothing new is learned. Okay, I apologize for that. I think I’ve been changing my definition of knowledge repeatedly to include or exclude such things. I don’t really care as much about consistent definitions as I should. Hopefully it is clear from context. I’ll go back to your original question.
The difference between the two cases is not the same as the crucial difference here. Having a theory as opposed to a list of predictions for every possible experiment does not necessarily make the theorems easier to prove. When it does, which is almost always, this is simply because that theory is more concise, so it is easier to deduce things from. This seems more like a matter of computing power than one of epistemology.
How does it pick the best?
And wouldn’t the oracle predict that the computer program would never halt, since it would attempt to enter infinitely many questions into the oracle?
According to some predetermined criteria. “How well does this spaceship fly?” “How often does it crash?” Making a computer evaluate machines is not hard in principle, and is beside the point.
I was assuming a finite maximum size with only finitely many distinguishable configurations in that size, but, again, this is irrelevant; whatever trick you use to make this work, you will not change the conclusions.
I think figuring out what criteria you want is an example of a non-predictive issue. That makes it not beside the point. And if the computer picks the best according to criteria we give it, they will contain mistakes. We won’t actually get the best answer. We’ll have to learn stuff and improve our knowledge all in order to set up your predictive thing. So there is this whole realm of non-predictive learning.
So you make assumptions like a spaceship is a thing made out of atoms. If your understanding of physics (and therefore your assumptions) is incorrect then your use of the oracle won’t work out very well. So your ability to get useful predictions out of the oracle depends on your understanding, not just on predicting anything.
So I just give it my brain and tell it to do what it wants. Of course, there are missing steps, but they should be purely deductive. I believe that is what Eliezer is working on now :)
Good point. I guess you can’t bootstrap an oracle like this; some things possible mathematically, like calculating a function over an infinity of points, just can’t be done physically. My point still stands, but this illustration definitely dies.
That’s it? That’s just not very impressive by my standards. Popper’s epistemology is far more advanced, already. Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
By ‘what Eliezer is working on now’ I meant AI, which would probably be necessary to extract my desires from my brain in practice. In principle, we could just use Bayes’ theorem a lot, assuming we had precise definitions of these concepts.
Popperian epistemology is incompatible with Bayesian epistemology, which I accept from its own justification, not from a lack of any other theory. I disliked what I had heard about Popper before I started reading LessWrong, but I forget my exact argument, so I do not know if it was valid. From what I do remember, I suspect it was not.
So, you reject Popper’s ideas without having any criticism of them that you can remember?
That’s it?
You don’t care that Popper’s ideas have criticisms of Bayesian epistemology which you haven’t answered. You feel you don’t need to answer criticisms because Bayesian epistemology is self-justifying and thus all criticisms of it must be wrong. Is that it?
No, I brought up my past experience with Popper because you asked if my opinions on him came from Eliezer.
No, I think Bayesian epistemology has been mathematically proven. I don’t spend a lot of time investigating alternatives for the same reason I don’t spend time investigating alternatives to calculus.
If you have a valid criticism, “this is wrong” or “you haven’t actually proved this” as opposed to “this has a limited domain of applicability” (actually, that could be valid if Popperian epistemology can answer a question that Bayesianism can’t), I would love to know. You did bring up some things of this type, like that paper by Popper, but none of them have logically stood up, unless I am missing something.
If Bayesian epistemology is mathematically proven, why have I been told in my discussions here various things such as: that there is a regress problem which isn’t fully solved (Yudkowsky says so), that circular arguments for induction are correct, and that foundationalism is correct? Why have I been linked to articles making Bayesian points and told they have good arguments with only a little hand waving? And why have I been told that further research is being done?
It seems to me that saying it’s proven, the end, is incompatible with it having any flaws or unsolved problems or need for further research. So, which is it?
All of the above. It is wrong b/c, e.g., it is instrumentalist (has not understood the value of explanatory knowledge) and inductivist (induction is refuted). It is incomplete b/c, e.g. it cannot deal with non-observational knowledge such as moral knowledge. You haven’t proved much to me however I’ve been directed to two books, so judgment there is pending.
I don’t know how you concluded that none of my arguments stood up logically. Did you really think you’d logically refuted every point? I don’t agree; I think most of your arguments were not pure logic, and I thought that various issues were pending further discussion of sub-points. As I recall, some points I raised have not been answered. I’m having several conversations in parallel, so I don’t recall which unanswered points were in replies to you personally, but for example I quoted an argument by David Deutsch about an oracle.

The replies I got about how to try to cheat the oracle did not address the substantive point of the thought experiment. They did not address the issue (discussed in the quote) that oracles have user interfaces and that entering questions isn’t just free and trivial, and they did not address the issue that physical reality is a predictive oracle meeting all the specified characteristics of the alien oracle (we already have an oracle, and none of the replies I got about using the oracle would actually work with the oracle we have). As I saw it, my (quoted) points on that issue stood. The replies were some combination of incomplete and missing the point. They were also clever, which is a bit of fun.

I thought of what I think is a better way to try to cheat the rules, which is to ask the oracle to predict the contents of philosophy books that would be written if philosophy were studied for trillions of years by the best people. However, again, the assumption that any question which is easily described in English can be easily entered into the oracle to get a prediction was not part of the thought experiment. And the reason I hadn’t explained all this yet is that there were various other points pending anyway, so, shrug, it’s hard to decide where to start when you have many different things to say.
It is proven that the correct epistemology, meaning one that is necessary to achieve general goals, is isomorphic to Bayesianism with some prior (for beings that know all math). What that prior is requires more work. While the constraint of knowing all math is extremely unrealistic, do you agree that the theory of what knowledge would be had in such situations is a useful guide to action until we have a more general theory? Popperian epistemology cannot tell me how much money to bet at what odds for or against P = NP any more than Bayesian epistemology can, but at least Bayesian epistemology sets this as a goal.
This is all based on our limited mathematical ability. A theory does have an advantage over an oracle or the reality-oracle: we can read it. Would you agree that all the benefits of a theory come from this plus knowing all math? The difference is one of mathematical knowledge, not of physical knowledge. How does Popper help with this? Are there guidelines for which ‘equivalent’ formulations of a theory are mathematically better? If so, this is something that Bayesianism does not try to cover, so this may have value. However, this is unrelated to the question of the validity of “don’t assign probabilities to theories”.
I thought I addressed this but, to recap:
That (well and how much bigger) is all I need to make decisions.
So what? I already have my new probabilities.
What is induction if not the calculation of new probabilities for hypotheses? Should I care about these ‘inductive truths’ that Popper disproves the existence of? I already have an algorithm to calculate the best action to take. It seems like Bayesianism isn’t inductivist by Popper’s definition.
I’d like to be sure that we are using the same definitions of our terms, so please give an example.
You mean proven given some assumptions about what an epistemology should be, right?
No. We need explanations to understand the world. In real life, it is only when we have explanations that we can make good predictions at all. For example, suppose you have a predictive theory about dice and you want to make bets. I chose that example intentionally to engage with areas of your strength. OK, now you face the issue: does a particular real world situation have the correct attributes for my predictive theory to apply? You have to address that to know if your predictions will be correct or not. We always face this kind of problem to do much of anything. How do we figure out when our theories apply? We come up with explanations about what kinds of situations they apply to, and what situation we are in, and we then come up with explanations about why we think we are/aren’t in the right kind of situation, and we use critical argument to improve these explanations. Bayesian Epistemology does not address all this.
I replied to that. Repeating: if you increase the probability of infinitely many theories, the problem of figuring out a good theory is not solved. So that is not all you need.
Further, I’m still waiting on an adequate answer about what support is (inductive or otherwise) and how it differs from consistency.
I gave examples of moral knowledge in another comment to you. Morality is knowledge about how to live, what is a good life. e.g. murder is immoral.
Yes, I stated my assumptions in the sentence, though I may have missed some.
This comes back to the distinction between one complete theory that fully specifies the universe and a set of theories that are considered to be one because we are only looking at a certain domain. In the former case, the domain of applicability is everywhere. In the latter, we have a probability distribution that tells us how likely it is to fail in every domain. So, this kind of thing is all there in the math.
What do you mean by ‘a good theory’? Bayesians never select one theory as ‘good’ and follow it; we always consider the possibility of being wrong. When theories have higher probability than others, I guess you could call them good. I don’t see why this is hard; just calculate P(H | E) for all the theories and give more weight to the more likely ones when making decisions.
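A sketch of what ‘give more weight to the more likely ones when making decisions’ could look like, using the plague-cure sort of case from earlier; the theories, posterior numbers, and utilities are all invented for illustration:

```python
# Weight each action by the posterior over theories and pick the one with the
# highest expected utility; no single theory is ever "selected" as the winner.
posterior = {"simple explanation true": 0.7, "complex explanation true": 0.3}
utility = {
    "make cure A": {"simple explanation true": 100, "complex explanation true": 0},
    "make cure B": {"simple explanation true": 0,   "complex explanation true": 200},
}

def expected_utility(action):
    return sum(posterior[h] * utility[action][h] for h in posterior)

best = max(utility, key=expected_utility)
print(best, expected_utility(best))  # make cure A, 70.0 (vs 60.0 for cure B)
```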
Evidence supports a hypothesis if P(H | E) > P(H). Two statements, A, B, are consistent if ¬(A&B → ⊥). I think I’m missing something.
Let’s consider only theories which make all their predictions with 100% probability for now. And theories which cover everything.
Then:
If H and E are consistent, then it follows that P(H | E) > P(H).
For any given E, consider how much greater the probability of H is, for all consistent H. That amount is identical for all H considered.
We can put all the Hs in two categories: the consistent ones which gain equal probability, and the inconsistent ones for which P(H|E) = 0. (Assumption warning: we’re relying on getting it right which H are consistent with which E.)
This means:
1) consistency and support coincide.
2) there are infinitely many equally supported theories. There are only and exactly two amounts of support that any theory has given all current evidence, one of which is 0.
3) The support concept plays no role in helping us distinguish between the theories with more than 0 support.
4) The support concept can be dropped entirely because it has no use at all. The consistency concept does everything.
5) All mention of probability can be dropped too, since it wasn’t doing anything.
6) And we still have the main problem of epistemology left over, which is dealing with the theories that aren’t refuted by evidence
Similar arguments can be made without my initial assumptions/restrictions. For example introducing theories that make predictions with less than 100% probability will not help you because they are going to have lower probability than theories which make the same predictions with 100% probability.
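To make my point concrete, here is a toy calculation (the priors are made up; any theories that predict the evidence with certainty behave the same way):

```python
# Toy numbers: four theories that predict with certainty; H4 is contradicted by E.
priors = {"H1": 0.1, "H2": 0.2, "H3": 0.3, "H4": 0.4}
consistent = {"H1", "H2", "H3"}
P_E = sum(p for h, p in priors.items() if h in consistent)  # 0.6

for h, p in priors.items():
    posterior = p / P_E if h in consistent else 0.0
    # every consistent theory is multiplied by the same factor, 1/P(E) ~= 1.67; H4 drops to 0
    print(h, round(posterior / p, 3))
```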
Well the ratio is the same, but that’s probably what you meant.
Have a prior. This reintroduces probabilities and deals with the remaining theories. You will converge on the right theory eventually no matter what your prior is. Of course, that does not mean that all priors are equally rational.
If they all have the same prior probability, then their probabilities are the same and stay that way. If you use a prior which arbitrarily (in my view) gives some things higher prior probabilities in a 100% non-evidence-based way, I object to that, and it’s a separate issue from support.
How does having a prior save the concept of support? Can you give an example? Maybe the one here, currently near the bottom:
http://lesswrong.com/lw/54u/bayesian_epistemology_vs_popper/3urr?context=3
Well shouldn’t they? If you look at it from the perspective of making decisions rather than finding one right theory, it’s obvious that they are equiprobable and this should be recognized.
Solomonoff does not give “some things higher prior probabilities in a 100% non-evidence-based way”. All hypotheses have the same probability, many just make similar predictions.
Is anyone here working on the problem of parenting/educating AIs?
It seems someone has downvoted you for not being familiar with Eliezer’s work on AI. Basically, this is overly anthropomorphic. It is one of our goals to ensure that an AI can progress from a ‘seed AI’ to a superintelligent AI without anything going wrong, but, in practice, we’ve observed that using metaphors like ‘parenting’ confuses people too much to make progress, so we avoid it.
Don’t worry about downvotes, they do not matter.
I wasn’t using parenting as a metaphor. I meant it quite literally (only the educational part, not the diaper changing).
One of the fundamental attributes of an AI is that it’s a program which can learn new things.
Humans are also entities that learn new things.
But humans, left alone, don’t fare so well. Helping people learn is important, especially children. This avoids having everyone reinvent the wheel.
The parenting issue therefore must be addressed for AI. I am familiar with the main ideas of the kind of AI work you guys do, but I have not found the answer to this.
One possible way to address it is to say the AI will reinvent the wheel. It will have no help but just figure everything out from scratch.
Another approach would be to program some ideas into the AI (changeable, or not, or some of each), and then leave it alone with that starting point.
Another approach would be to talk with the AI, answer its questions, lecture it, etc… This is the approach humans use with their children.
Each of these approaches has various problems with it which are non-trivial to solve.
Make sense so far?
When humans hear parenting, they think of the human parenting process. Describe the AI as ‘learning’ and the humans as ‘helping it learn’. This gets us closer to the idea of humans learning about the universe around them, rather than being raised as generic members of society.
Well, the point of downvotes is to discourage certain behaviour, and I agree that you should use terminology that we have found less likely to cause confusion.
AIs don’t necessarily have so much of a problem with this. They learn very differently than humans: http://lesswrong.com/lw/jo/einsteins_arrogance/ , http://lesswrong.com/lw/qj/einsteins_speed/ , http://lesswrong.com/lw/qk/that_alien_message/
This is definitely an important problem, but we’re not really at the stage where it is necessary yet. I don’t see how we could make much progress on how to get an AI to learn without knowing the algorithms that it will use to learn.
Not all humans. Not me. Is that not a bias?
I won’t be discouraged without any argument being given, just on the basis of someone’s judgement when I don’t know the reason. I don’t think I should be. I think that would be irrational. I’m surprised that this community wants to encourage people to conform to the collective opinion of others as expressed by votes.
OK, I think I see where you are coming from. However, there is only one known algorithm that learns (creates knowledge). It is, in short, evolution. We should expect an AI to use it, we shouldn’t expect a brand new solution to this hard problem (historically there have been very few candidate solutions proposed, most not at all promising).
The implementation details are not very important because the result will be universal, just like people are. This is similar to how the implementation details of universal computers are not important for many purposes.
Are you guys familiar with these concepts? There is important knowledge relevant to creating AIs which your statement seems to me to overlook.
Yes, that would be a bias. Note that this kind of bias is not always explicitly noticed.
As a general rule, if I downvote, I either reply to the post, or it is something that should be obvious to someone who has read the main sequences.
No, there is another: the brain. It is also much faster than evolution, an advantage I would want a FAI to have.
You are unfamiliar with the basic concepts of evolutionary epistemology. The brain internally does evolution of ideas.
Why is it that you guys want to make AI but don’t study relevant topics like this?
You’re conflating two things. Biological evolution is a very specific algorithm, with well-studied mathematical properties. ‘Evolution’ in general just means any change over time. You seem to be using it in an intermediate sense, as any change that proceeds through reproduction, variation, and selection, which is also a common meaning. This, however, is still very broad, so there’s very little that you can learn about an AI just from knowing “it will come up with many ideas, mostly based on previous ones, and reject most of them”. This seems less informative than “it will look at evidence and then rationally adjust its understanding”.
There’s an article related to this: http://lesswrong.com/lw/l6/no_evolutions_for_corporations_or_nanodevices/
Eliezer has studied cognitive science. Those of us not working directly with him have very little to do with AI design. Even Eliezer’s current work is slightly more background theory than AI itself.
I’m not conflating them. I did not mean “change over time”.
There are many things we can learn from evolutionary epistemology. It seeming broad to you does not prevent that. You would do better to ask what good it is instead of guess it is no good.
For one thing it connects with meme theory.
A different example is that it explains misunderstandings when people communicate. Misunderstandings are extremely common because communication involves 1) guessing what the other person is trying to say 2) selecting between those guesses with criticism 3) making more guesses which are variants of previous guesses 4) more selection 5) etc
This explanation helps us see how easily communication can go wrong. It raises interesting issues like why so much communication doesn’t go wrong. It refutes various myths like that people absorb their teacher’s lectures a little like sponges.
It matters. And other explanations of miscommunication are worse.
But that isn’t the topic I was speaking of. I meant evolutionary epistemology. Which btw I know that Eliezer has not studied much because he isn’t familiar with one of its major figures (Popper).
I don’t know enough about evolutionary epistemology to evaluate the usefulness and applicability of its ideas.
How was evolutionary epistemology tested? Are there experiments or just introspection?
Evolution is a largely philosophical theory (distinct from the scientific theory about the history of life on earth). It is a theory of epistemology. Some parts of epistemology technically depend on the laws of physics, but it is generally researched separately from physics. There has not been any science experiment to test it which I consider important, but I could conceive of some, because if you specified different and perverse laws of physics you could break evolution. In a different sense, evolution is tested constantly, in that the laws of physics and the evidence we see around us every day are not that perverse-but-conceivable physics that would break evolution.
The reason I accept evolution (again I refer to the epistemological theory about how knowledge is created) is that it is a good explanation, and it solves an important philosophical problem, and I don’t know anything wrong with it, and I also don’t know any rivals which solve the problem.
The problem has a long history. Where does “apparent design” come from? Paley gave an example of finding a watch in nature, which he said you know can’t have gotten there by chance. That’s correct—the watch has knowledge (aka apparent design, or purposeful complexity, or many other terms). The watch is adapted “to a purpose” as some people put it (I’m not really a fan of the purpose terminology. But it’s adapted! And I think it gets the point across ok.)
Paley then guessed as follows: there is no possible solution to the origins of knowledge other than “A designer (God) created it”. This is a very bad solution even pre-Darwin because it does not actually solve the problem. The designer itself has knowledge, adaptation to a purpose, whatever. So where did it come from? The origin is not answered.
Since then, the problem has been solved by the theory of evolution and nothing else. And it applies to more than just watches found in nature, and to plants and animals. It also applies to human knowledge. The answer “intelligence did it” is no better than “God did it”. How does intelligence do it? The only known answer is: by evolution.
The best thing to read on this topic is The Beginning of Infinity by David Deutsch which discusses Popperian epistemology, evolution, Paley’s problem and its solution, and also has two chapters about meme theory which give important applications.
You can also find some, e.g. here: http://fallibleideas.com/evolution-and-knowledge
Also here: http://fallibleideas.com/tradition (Deutsch discusses static and dynamic memes and societies. I discuss “traditions” rather than “memes”. It’s quite similar stuff.)
What? Epistemological evolution seems to be about how the mind works, independent of what philosophical status is accorded to the thoughts. Surely it could be tested just by checking if the mind actually develops ideas in accordance with the way it is predicted to.
If you want to check how minds work, you could do that. But that’s very hard. We’re not there yet. We don’t know how.
How minds work is a separate issue from evolutionary epistemology. Epistemology is about how knowledge is created (in the abstract, not in human minds specifically). If it turns out there is another way, it wouldn’t upset the claim that evolution would create knowledge if done in minds.
There’s no reason to think there is another way. No argument that there is. No explanation of why to expect there to be. No promising research on the verge of working one out. Shrug.
I see. I thought that evolutionary epistemology was a theory of human minds, though I know that that technically isn’t epistemology. Does evolutionary epistemology describe knowledge about the world, mathematical knowledge, or both (I suspect you will say both)?
It describes the creation of any type of knowledge. It doesn’t tell you the specifics of any field itself, but doing it helps you learn them.
So, you’re saying that in order to create knowledge, there has to be copying, variation, and selection. I would agree with the first two, but not necessarily the third. Consider a formal axiomatic system. It produces an ever-growing list of theorems, but none are ever selected any more than others. Would you still consider this system to be learning?
With deduction, all the consequences are already contained in the premises and axioms. Abstractly, that’s not learning.
When human mathematicians do deduction, they do learn stuff, because they also think about stuff while doing it, they don’t just mechanically and thoughtlessly follow the rules of math.
So induction (or probabilistic updating, since you said that Popper proved it not to be the same as whatever philosophers call ‘induction’) isn’t learning either, because the conclusions are contained in the priors and observations?
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
the idea of induction is that the conclusions are NOT logically contained in the observations (that’s why it is not deduction).
if you make up a prior from which everything deductively follows, and everything else is mere deduction from there, then all of your problems and mistakes are in the prior.
no. learning is creating new knowledge. that would simply be human programmers putting their own knowledge into a prior, and then the machine not creating any new knowledge that wasn’t in the prior.
The correct method of updating one’s probability distributions is contained in the observations. P(H|E) = P(H)P(E|H)/P(E) .
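For concreteness, here is a minimal sketch of that update rule in Python; the hypotheses, priors and likelihoods are invented for illustration, not taken from the discussion:

```python
# Minimal sketch of Bayesian updating. All numbers are invented for illustration.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}        # P(H)
likelihoods = {"H1": 0.9, "H2": 0.5, "H3": 0.1}   # P(E|H)

# P(E) = sum over H of P(E|H) * P(H)
p_e = sum(likelihoods[h] * priors[h] for h in priors)

# P(H|E) = P(H) * P(E|H) / P(E)
posteriors = {h: priors[h] * likelihoods[h] / p_e for h in priors}
print(posteriors)  # hypotheses that assigned E a higher likelihood gain probability
```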
So how could evolutionary epistemology be relevant to AI design?
AIs are programs that create knowledge. That means they need to do evolution. That means they need, roughly, a conjecture generator, a criticism generator, and a criticism evaluator. The conjecture generator might double as the criticism generator since a criticism is a type of conjecture, but it might not.
The conjectures need to be based on the previous conjectures (not necessarily all of them, but some). That makes it replication with variation. The criticism is the selection.
Any AI design that completely ignores this is, imo, hopeless. I think that’s why the AI field hasn’t really gotten anywhere. They don’t understand what they are trying to make, because they have the wrong philosophy (in particular the wrong explanations. i don’t mean math or logic).
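A very rough sketch of that conjecture/criticism loop in Python; the variation and criticism steps here are placeholders (string tweaks and a coin flip), not a claim about how the hard parts would actually be implemented:

```python
import random

# Toy conjecture/criticism loop: replication with variation, selection by criticism.
# The variation and criticism functions stand in for the hard, unsolved parts.

def vary(conjecture):
    # replication with variation: new conjectures are based on previous ones
    return conjecture + random.choice(["-variant-a", "-variant-b"])

def refuted_by_criticism(conjecture):
    # selection: a criticism either refutes the conjecture or leaves it standing
    return random.random() < 0.5  # placeholder criterion

surviving = ["initial-guess"]
for _ in range(10):
    candidate = vary(random.choice(surviving))
    if not refuted_by_criticism(candidate):
        surviving.append(candidate)
print(surviving)
```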
Could you explain where AIXI does any of that?
Or could you explain where a Bayesian spam filter does that?
Note that there are AI approaches which do do something close to what you think an AI “needs”. For example, some of Simon Colton’s work can be thought of in a way roughly like what you want. But it is a mistake to think that such an entity needs to do that. (Some of the hardcore Bayesians make the same mistake in assuming that an AI must use a Bayesian framework. That something works well as a philosophical approach is not the same claim as that it should work well in a specific setting where we want an artificial entity to produce certain classes of systematic reliable results.)
Those aren’t AIs. They do not create new knowledge. They do not “learn” in my sense—of doing more than they were programmed to. All the knowledge is provided by the human programmer—they are designed by an intelligent person and to the extent they “act intelligent” it’s all due to the person providing the thinking for it.
I’m not sure this is at all well-defined. I’m curious, what would make you change your mind? If for example, Colton’s systems constructed new definitions, proofs, conjectures, and counter-examples in math would that be enough to decide they were learning?
How about it starts by passing the turing test?
Or: show me the code, and explain to me how it works, and how the code doesn’t contain all the knowledge the AI creates.
Could you explain how this is connected to the issue of making new knowledge?
This seems a bit like showing a negative. I will suggest you look for a start at Simon Colton’s paper in the Journal of Integer Sequences which uses a program that operates in a way very close to the way you think an AI would need to operate in terms of making conjectures and trying to refute them. I don’t know if the source code is easily available. It used to be on Colton’s website but I don’t see it there anymore; if his work seems at all interesting to you you can presumably email him requesting a copy. I don’t know how to show that the AI “doesn’t contain all the knowledge the AI creates” aside from the fact that the system constructed concepts and conjectures in number theory which had not previously been constructed. Moreover, Colton’s own background in number theory is not very heavy, so it is difficult to claim that he’s importing his own knowledge into the code. If you define more precisely what you mean by the code containing the knowledge I might be able to answer that further. Without a more precise notion it isn’t clear to me how to respond.
Holding a conversation requires creating knowledge of what the other guy is saying.
In deduction, you agree that the conclusions are logically contained in the premises and axioms, right? They aren’t something new.
In a spam filter, a programmer figures out how he wants spam filtered (he has the idea), then he tells the computer to do it. The computer doesn’t figure out the idea or any new idea.
With biological evolution, for example, we see something different. You get stuff out, like cats, which weren’t specified in advance. And they aren’t a trivial extension; they contain important knowledge, such as the knowledge of optics that makes their eyes work. This is why “Where can cats come from?” has been considered an important question (people want an explanation of the knowledge, which is sometimes called “apparent design”), while “Where can rocks come from?” is not in the same category of question (it does have some interest for other reasons).
With people, people create ideas that aren’t in their genes, and weren’t told to them by their parents or anyone else. That includes abstract ideas that aren’t the summation of observations. They sometimes create ideas no one ever thought of before. They create new ideas.
An AI (AGI, you call it?) should be like a person: it should create new ideas which are not in its “genes” (programming). If someone actually writes an AI they will understand how it works and can explain it, and we can use their explanation to judge whether they “cheated” or not (whether they, e.g., hard coded some ideas into the program and then said the AI invented them).
Ok. So to make sure I understand this claim. You are asserting that mathematicians are not constructing anything “new” when they discover proofs or theorems in such axiomatic systems?
Are genetic algorithm systems then creating something new by your definition?
Different concepts. An artificial intelligence is not (necessarily) a well-defined notion. An AGI is an artificial general intelligence, essentially something that passes the Turing test. Not the same concept.
I see no reason to assume that a person will necessarily understand how an AGI they constructed works. To use the most obvious hypothetical, someone might make a neural net modeled very closely after the human brain that functions as an AGI without any understanding of how it works.
When you “discover” that 2+1 = 3, given premises and axioms, you aren’t discovering something new.
But working mathematicians do more than that. They create new knowledge. It includes:
1) they learn new ways to think about the premises and axioms
2) they do not publish deductively implied facts unselectively or randomly. they choose the ones that they consider important. by making these choices they are adding content not found in the premises and axioms
3) they make choices between different possible proofs of the same thing. again where they make choices they are adding stuff, based on their own non-deductive understanding
4) when mathematicians work on proofs, they also think about stuff as they go. just like when experimental scientists do fairly mundane tasks in a lab, at the same time they will think and make it interesting with their thoughts.
They could be. I don’t think any exist yet that do. For example I read a Dawkins paper about one. In the paper he basically explained how he tweaked the code in order to get the results he wanted. He didn’t, apparently, realize that it was him, not the program, creating the output.
By “AI” I mean AGI. An intelligence (like a person) which is artificial. Please read all my prior statements in light of that.
Well, OK, but they’d understand how it was created, and could explain that. They could explain what they know about why it works (it copies what humans do). And they could also make the code public and discuss what it doesn’t include (e.g. hard coded special cases. except for the 3 he included on purpose, and he explains why they are there). That’d be pretty convincing!
I don’t think this is true. While he probably wouldn’t announce it if he was working on AI, he has indicated that he’s working on two books (HPMoR and a rationality book), and has another book queued. He’s also indicated that he doesn’t think anyone should work on AI until the goal system stability problem is solved, which he’s talked about thinking about but hasn’t published anything on, which probably means he’s stuck.
I more meant “he’s probably thinking about this in the back of his mind fairly often”, as well as trying to be humourous.
Do you know what he would think of work that has a small chance of solving goal stability and a slightly larger chance of helping with AI in general? This seems like a net plus to me, but you seem to have heard what he thinks should be studied from a slightly clearer source than I did.
I do not consider it possible to predict the growth of knowledge. That means you cannot predict, for example, the consequences of a scientific discovery that you have not yet discovered.
The reason is that if you could predict this you would in effect already have made the discovery.
Understanding is not primarily predictive and it is useful in a practical way. For example, you have to understand issues to address critical arguments offered by your peers. Merely predicting that they are wrong isn’t a good approach. It’s crucial to understand what their point is and to reason with them.
Understanding ideas helps us improve on them. It’s crucial to making judgments about what would be an improvement or not. It lets us judge changes better b/c e.g. we have some conception of why it works, which means we can evaluate what would break it or not.
That is not what I meant. If we could predict that the LHC will discover superparticles then yes, we would already know that. However, since we don’t know whether it will produce superparticles, we can predict that it will give us a lot of knowledge, since we will either learn that superparticles in the mass range detectable by the LHC exist or that they do not exist, so we can predict that we will learn a lot more about the universe by continuing to run the LHC than by filling in the tunnel where it is housed.
Eliezer proves that you cannot predict which direction science will go from a Bayesian perspective in http://lesswrong.com/lw/ii/conservation_of_expected_evidence/ .
So if new knowledge doesn’t come from prediction, what creates it? Answering this is one of epistemology’s main tasks. If you are focussing on prediction then you aren’t addressing it. Am I missing something?
New knowledge comes from observation. If you are referring to knowledge of what a theory says rather than of which theory is true, then this is assumed to be known. The math of how to deal with a situation where a theory is known but its consequences cannot be fully understood due to mathematical limitations is still in its infancy, but this has never posed a problem in practice.
That is a substantive and strong empiricist claim which I think is false.
For example, we have knowledge of things we never observed. Like stars. Observation is always indirect and its correctness always depends on theories such as our theories about whether the chain of proxies we are observing with will in fact observe what we want to observe.
Do you understand what I’m talking about and have a reply, or do you need me to explain further?
Could you understand why I might object to making a bunch of assumptions in one’s epistemology?
The new knowledge that is obtained from an observation is not just the content of the observation, it is also the new probabilities resulting from the observation. This is discussed at http://lesswrong.com/lw/pb/belief_in_the_implied_invisible/ .
It is assumed in practice, applied epistemology being a rather important thing to have. In ‘pure’ epistemology, it is just labelled incomplete; we definitely don’t have all the answers yet.
It seems to me that you’re pretty much conceding that your epistemology doesn’t work. (All flaws can be taken as “incomplete” parts where, in the future, maybe a solution will be found.)
That would leave the following important disagreement: Popper’s epistemology is not incomplete in any significant way. There is room for improvement, sure, but not really any flaws worth complaining about. No big unsolved problems marring it. So, why not drop this epistemology that doesn’t have the answers yet for one that does?
Would you describe quantum mechanics’ incompatibility with general relativity as “the theory doesn’t work”? For a being with unlimited computing power in a universe that is known to be computable (except for the being itself obviously), we are almost entirely done. Furthermore, many of the missing pieces to get from that to something much more complete seem related.
No, it is just wrong. Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories. Any consistent consequentialist decision rule must be basically equivalent to that. The statement that there is no way to assign probabilities to theories therefore implies that there is no algorithm that a consequentialist can follow to reliably achieve their goals. Note that even if Popper’s values are not consequentialist, a consequentialist should still be able to act based on the knowledge obtained by a valid epistemology.
Can you be more specific?
I suspect you are judging Popperian epistemology by standards it states are mistaken. Would you agree that doing that would be a mistake?
Note the givens. There are more givens which you didn’t mention too, e.g. some assumptions about people’s utilities having certain mathematical properties (you need this for, e.g., comparing them).
I don’t believe these givens are all true. If you think otherwise, could we start with you giving more details? I don’t want to argue with parts you simply omitted, because then I’ll have to guess too much about what you think.
As a separate issue, “given my preferences” is such a huge given. It means that your epistemology does not deal in moral knowledge. At all. It simply takes preferences as givens and doesn’t tell you which to have. So in practice in real life it cannot be used for a lot of important issues. That’s a big flaw. And it means a whole entire second epistemology is needed to deal in moral knowledge. And if we have one of those, and it works, why not use it for all knowledge?
The rest of the paragraph was what I meant by this. You agree that Popperian epistemology states that theories should not be assigned probabilities.
Depends. If its standards make it useless, then, while internally consistent, I can judge it to be pointless. I just want an epistemology that can help me actually make decisions based on what I learn about reality.
I don’t think I was clear. A utility here just means a number I use to say how good a possible future is, so I can decide whether I want to work toward that future. In this context, it is far more general than anything composed of a bunch of terms, each of which describes some property of a person.
I can learn more about my preferences from observation of my own brain using standard Bayesian epistemology.
Popperian epistemology does this. What’s the problem? Do you think that assigning probabilities to theories is the only possible way to do this?
Overall you’ve said almost nothing that’s actually about Popperian epistemology. You just took one claim (which has nothing to do with what it’s about, it’s just a minor point about what it isn’t) and said it’s wrong (without detailed elaboration).
I understood that. I think you are conflating “utility” the mathematical concept with “utility” the thing people in real life have. The second may not have the convenient properties the first has. You have not provided an argument that it does.
How do you learn what preferences are good to have, in that way?
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules. Even if the probabilities are not mentioned when constructing the rule, they can be inferred from its final form.
I don’t know what you mean by ‘“utility” the thing people in real life have’.
Can we please not get into this. If it helps, assume I am an expected paperclip maximizer. How would I decide then?
What was the argument for that?
And what is the argument that actions should be judged ONLY by consequences? What is the argument for excluding all other considerations?
People have preferences and values. e.g. they might want a cat or an iPhone and be glad to get it. The mathematical properties of these real life things are not trivial or obvious. For example, suppose getting the cat would add 2 happiness and the iPhone would add 20. Would getting both add 22 happiness? Answer: we cannot tell from the information available.
But the complete amorality of your epistemology—its total inability to create entire categories of knowledge—is a severe flaw in it. There are plenty of other examples I could use to make the same point, however in general they are a bit less clear. One example is epistemology: epistemology is also not an empirical field. But I imagine you may argue about that a bunch, while with morality I think it’s clearer.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
Agreed, and agreed that this is a common mistake. If you thought I was making this error, I was being far less clear than I thought.
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
The original is Abraham Wald’s An Essentially Complete Class of Admissible Decision Functions.
Thank you!
I thought you didn’t address the issue (and need to): you did not say what mathematical properties you think that real utilities have and how you deal with them.
Using what premises?
What about explanations about whether it was a reasonable decision for the person to make that action, given the knowledge he had before making it?
Ordered. But I think you should be more cautious asserting things that other people told you were true, which you have not checked up on.
Every possible universe is associated with a utility.
Any two utilities can be compared.
These comparisons are transitive.
Weighted averages of utilities can be taken.
For any three possible universes, L, M, and N, with L < M, a weighted average of L and N is less than a weighted average of M and N, if N is accorded the same weight in both cases.
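Written out (with U for the utility of a universe and p for the shared weight, both symbols introduced here just for illustration), that last property is:

```latex
U(L) < U(M) \implies p\,U(L) + (1-p)\,U(N) < p\,U(M) + (1-p)\,U(N), \quad 0 < p \le 1
```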
Basically just definitions. I’m currently trying to enumerate them, which is why I wanted to find the proof of the theorem we were discussing.
Care about in the sense of when I’m deciding whether to make it. I don’t really care about how reasonable other people’s decisions are unless it’s relevant to my interactions with them, where I will need that knowledge to make my own decisions.
Wait, you bought the book just for that proof? I don’t even know if it’s the best proof of it (in terms of making assumptions that aren’t necessary to get the result). I’m confident in the proof because of all the other similar proofs I’ve read, though none seem as widely applicable as that one. I can almost sketch a proof in my mind. Some simple ones are explained well at http://en.wikipedia.org/wiki/Coherence_%28philosophical_gambling_strategy%29 .
For your first 5 points, how is that a reply about Popper? Maybe you meant to quote something else.
I don’t think that real people’s way of considering utility is based on entire universes at a time. So I don’t think your math here corresponds to how people think about it.
No, I used inter library loan.
Then put yourself in as the person under consideration. Do you think it matters whether you make decisions using rational thought processes, or do only the (likely?) consequences matter?
How do you judge whether you have the right ones? You said “entirely deductive” above, so are you saying you have a deductive way to judge this?
Yes, I did. Oops.
But that is what a choice is between—the universe where you choose one way and the universe where you choose another. Often large parts of the universe are ignored, but only because the action’s consequences for those parts are not distinguishable from how those parts would be if a different action was taken. A utility function may be a sum (or more complicated combination) of parts referring to individual aspects of the universe, but, in this context, let’s not call those ‘utilities’; we’ll reserve that word for the final thing used to make decisions. Most of this is not consciously invoked when people make decisions, but a choice that does not stand when you consider its expected effects on the whole universe is a wrong choice.
If I could achieve better consequences using an ‘irrational’ process, I would, but this sounds nonsensical because I am used to defining ‘rational’ as that which reliably gets the best consequences.
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
I don’t think I understand. This would rely on your conception of the real life situation (if you want it to apply to real life), of what makes sense, being correct. That goes way beyond deduction or definitions into substantive claims.
About decisions, if a method like “choose by whim” gets you a good result in a particular case, you’re happy with it? You don’t care that it doesn’t make any sense if it works out this time?
So what? I think you’re basically saying that your formulation is equivalent to what people (should) do. But that doesn’t address the issue of what people actually do—it doesn’t demonstrate the equivalence. As you guys like to point out, people often think in ways that don’t make sense, including violating basic logic.
But also, for example, I think a person might evaluate getting a cat, and getting an iphone, and then they might (incorrectly) evaluate both by adding the benefits instead of by considering the universe with both based on its own properties.
Another issue is that I don’t think any two utilities people have can be compared. They are sometimes judged with different, contradictory standards. This leads to two major issues when trying to compare them: 1) the person doesn’t know how; 2) it might not be possible to compare even in theory, because one or both contain some mistakes. The mistakes might need to be fixed before comparing, but that would change it.
I’m not saying people are doing it correctly. Whether they are right or wrong has no bearing on whether “utility” the mathematical object with the 5 properties you listed corresponds to “utility” the thing people do.
If you want to discuss what people should do, rather than what they do do, that is a moral issue. So it leads to questions like: how does bayesian epistemology create moral knowledge and how does it evaluate moral statements?
If you want to discuss what kind of advice is helpful to people (which people?), then I’m sure you can see how talking about entire universes could easily confuse people, and how some other procedure being a special case of it may not be very good advice which does not address the practical problems they are having.
Do you think that the Dutch book arguments go “way beyond deduction or definitions”? Well, I guess that would depend on what you conclude from them. For now, let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1, and probabilities of mutually exclusive events should add”.
The confusion here is that we’re not judging an action. If I make a mistake and happen to benefit from it, there were good consequences, but there was no choice involved. I don’t care about this; it already happened. What I do care about, and what I can accomplish, is avoiding similar mistakes in the future.
Yes, that is what I was discussing. I probably don’t want to actually get into my arguments here. Can you give an example of what you mean by “moral knowledge”?
Applying Dutch book arguments to real life situations always goes way beyond deduction and definitions, yes.
A need? Are you talking about morality now?
Why are we saying this? You now speak of probabilities of events. Previously we were discussing epistemology which is about ideas. I object to assigning probabilities to the truth of ideas. Assigning them to events is OK when
1) the laws of physics are indeterministic (never, as far as we know)
2) we have incomplete information and want to make a prediction that would be deterministic except that we have to put several possibilities in some places, which leads to several possible answers. and probability is a reasonable way to organize thoughts about that.
So what?
Murder is immoral.
Being closed minded makes ones life worse because it sabotages improvement.
Are you saying Popper would evaluate “Murder is immoral.” in the same way as “Atoms are made up of electrons and a nucleus.”? How would you test this? What would you consider a proof of it?
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means. I am a moral realist to some, a relativist to others, and an error theorist to other others. I could prove the statement for many common non-confused definitions, though not for, for example, people who say ‘morality’ is synonymous with ‘that which is commanded by God’ (which is based on confusion, but at least everyone can agree on when it is or isn’t true), and not for error theorists, as both groups’ definitions make the sentence false.
In theory I could prove this sentence, but in practice I could not do this clearly, especially over the internet. It would probably be much easier for you to read the sequences, which get to this toward the end, but, depending on your answers to some of my questions, there may be an easier way to explain this.
Yes. One epistemology. All types of knowledge. Unified!
You would not.
We don’t accept proofs of anything, we are fallibilists. We consider mathematical proofs to be good arguments though. I don’t really want to argue about those (unless you’re terribly interested. btw this is covered in the math chapter of The Fabric of Reality by David Deutsch). But the point is we don’t accept anything as providing certainty or even probableness. In our terminology, nothing provides justification.
What we do instead is explain our ideas, and criticize mistakes, and in this way improve our ideas. This, btw, creates knowledge in the same way as evolution (replication of ideas, with variation, and selection by criticism). That’s not a metaphor or analogy but literally true.
Wouldn’t it be nice if you had an epistemology that helped you deal with all kinds of knowledge, so you didn’t have to simply give up on applying reason to important issues like what is a good life, and what are good values?
Fine, what would you consider an argument for it?
Eliezer and I probably agree with you.
Well, biological evolution is a much smaller part of conceptspace than “replication, variation, selection” and now I’m realizing that you probably haven’t read A Human’s Guide to Words which is extremely important and interesting and, while you’ll know much of it, has things that are unique and original and that you’ll learn a lot from. Please read it.
I do apply reason to those things, I just don’t use the words ‘morality’ in my reasoning process because too many people get confused. It is only a word after all.
On a side note, I am starting to like what I hear of Popper. It seems to embody an understanding of the brain and a bunch of useful advice for it. I think I disagree with some things, but on grounds that seem like the sort of thing that is accepted as motivation for the theory to self-modify. Does that make sense? Anyways, it’s not Popper’s fault that there are a set of theorems that in principle remove the need for other types of thought and in practice cause big changes in the way we understand and evaluate the heuristics that are necessary because the brain is fallible and computationally limited.
Wei Dai likes thinking about how to deal with questions outside of Bayesianism’s current domain of applicability, so he might be interested in this.
Interpret this as a need in order to achieve some specified goal, so as to keep this part of the debate out of morality. A paperclip maximizer, for example, would obviously need to not pay 200 paperclips for a lottery with a maximum payout of 100 paperclips in order to achieve its goals. Furthermore, this applies to any consequentialist set of preferences.
Not sure why I wrote that. Substitute ‘theories’.
So you assume morality (the “specified goal”). That makes your theory amoral.
Why is there a need to assign probabilities to theories? Popperian epistemology functions without doing that.
Well there’s a bit more than this, but it’s not important right now. One can work toward any goal just by assuming it as a goal.
Because of the Dutch book arguments. The probabilities can be inferred from the choices. I’m not sure if the agent’s probability distribution can be fully determined from a finite set of wagers, but it can definitely be inferred to an arbitrary degree of precision by adding enough wagers.
Can you give an example of how you use a Dutch book argument on a non-gambling topic? For example, if I’m considering issues like whether to go swimming today, and what nickname to call my friend, and I don’t assign probabilities like “80% sure that calling her Kate is the best option”, how do I get Dutch Booked?
First you hypothetically ask what would happen if you were asked to make bets on whether calling her Kate would result in world X (with utility U(X)). Do this for all choices and all possible worlds. This gives you probabilities and utilities. You then take a weighted average, as per the VNM theorem.
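As a minimal sketch of that weighted average in Python; the second nickname, the probabilities and the utilities below are all invented for illustration:

```python
# Toy expected-utility comparison between two choices.
# The probabilities and utilities are invented, not a claim about real preferences.
choices = {
    "call her Kate":  [(0.8, 10), (0.2, -5)],  # (P(world | choice), U(world))
    "call her Katie": [(0.6, 10), (0.4, -5)],
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

best = max(choices, key=lambda c: expected_utility(choices[c]))
print(best, expected_utility(choices[best]))  # picks the higher weighted average
```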
How do I get Dutch Booked for not doing that?
And I’m still curious how the utilities are decided. By whim?
You don’t get to decide utilities so much as you have to figure out what they are. You already have a utility function, and you do your best to describe it. How do you weight the things you value relative to each other?
This takes observation, because what we think we value often turns out not to be a good description of our feelings and behavior.
From our genes? And the goal is just to figure out what it is, but not change it for the better?
Can you explain how you would change your fundamental moral values for the better?
By criticizing them. And conjecturing improvements which meet the challenges of the criticism. It is the same method as for improving all other knowledge.
In outline it is pretty simple. You may wonder things like what would be a good moral criticism. To that I would say: there’s many books full of examples, why dismiss all that? There is no one true way of arguing. Normal arguments are ok, I do not reject them all out of hand but try to meet their challenges. Even the ones with some kind of mistake (most of them), you can often find some substantive point which can be rescued. It’s important to engage with the best versions of theories you can think of.
BTW once upon a time I was vaguely socialist. Now I’m a (classical) liberal. People do change their fundamental moral values for the better in real life. I attended a speech by a former Muslim terrorist who is now a pro-Western Christian (Walid Shoebat).
I’ve changed my social values plenty of times, because I decided different policies better served my terminal values. If you wanted to convince me to support looser gun control, for instance, I would be amenable to that because my position on gun control is simply an avenue for satisfying my core values, which might better be satisfied in a different way.
If you tried to convince me to support increased human suffering as an end goal, I would not be amenable to that, unless it turns out I have some value I regard as even more important that would be served by it.
This is what Popper called the Myth of the Framework and refuted in his essay by that name. It’s just not true that everyone is totally set in their ways and extremely closed minded, as you suggest. People with different frameworks learn from each other.
One example is that children learn. They are not born sharing their parents’ framework.
You probably think that frameworks are genetic, so they are. Dealing with that would take a lengthy discussion. Are you interested in this stuff? Would you read a book about it? Do you want to take it seriously?
I’m somewhat skeptical b/c e.g. you gave no reply to some of what I said.
I think a lot of the reason people don’t learn other frameworks, in practice, is merely that they choose not to. They think it sounds stupid (before they understand what it’s actually saying) and decide not to try.
When did I suggest that everyone is set in their ways and extremely closed minded? As I already pointed out, I’ve changed my own social values plenty of times. Our social frameworks are extremely plastic, because there are many possible ways to serve our terminal values.
I have responded to moral arguments with regards to more things than I could reasonably list here (economics, legal codes, etc.) I have done so because I was convinced that alternatives to my preexisting social framework better served my values.
Valuing strict gun control, to pick an example, is not genetically coded for. A person might have various inborn tendencies which will affect how they’re likely to feel about gun control; they might have innate predispositions towards authoritarianism or libertarianism, for instance, that will affect how they form their opinion. A person who valued freedom highly enough might support little or no gun control even if they were convinced that it would result in a greater loss of life. You would have a hard time finding anyone who valued freedom so much that they would support looser gun control if they were convinced it would destroy 90% of the world population, which gives you a bit of information about how they weight their preferences.
If you wanted to convince me to support more human suffering instead of more human happiness, you would have to appeal to something else I value even more that would be served by this. If you could argue that my preference for happiness is arbitrary, that preference for suffering is more natural, even if you could demonstrate that the moral goodness of human suffering is intrinsically inscribed on the fabric of the universe, why should I care? To make me want to make humans unhappy, you’d have to convince me there’s something else I want enough to make humans unhappy for its sake.
I also don’t feel I’m being properly understood here; I’m sorry if I’m not following up on everything, but I’m trying to focus on the things that I think meaningfully further the conversation, and I think some of your arguments are based on misapprehensions about where I’m coming from. You’ve already made it clear that you feel the same, but you can take it as assured that I’m both trying to understand you and make myself understood.
You suggested it about a category of ideas which you called “core values”.
You are saying that you are not open to new values which contradict your core values. Ultimately you might replace all but the one that is the most core, but never that one.
That’s more or less correct. To quote one of Eliezer’s works of ridiculous fanfiction, “A moral system has room for only one absolute commandment; if two unbreakable rules collide, one has to give way.”
If circumstances force my various priorities into conflict, some must give way to others, and if I value one thing more than anything else, I must be willing to sacrifice anything else for it. That doesn’t necessarily make it my only terminal value; I might have major parts of my social framework which ultimately reduce to service to another value, and they’d have to bend if they ever came into conflict with a more heavily weighted value.
Well in the first half, you get Dutch booked in the usual way. It’s not necessarily actually happening, but there still must be probabilities that you would use if it were. In the second half, if you don’t follow the procedure (or an equivalent one) you violate at least one VNM axiom.
If you violate axiom 1, there are situations in which you don’t have a preferred choice—not as in “both are equally good/bad” but as in your decision process does not give an answer or gives more than one answer. I don’t think I’d call this a decision process.
If you violate axiom 2, there are outcomes L, M and N such that you’d want to switch from L to M and then from M to N, but you would not want to switch from L to N.
Axiom 3 is unimportant and is just there to simplify the math.
For axiom 4, imagine a situation where a statement with unknown truth-value, X, determines whether you get to choose between two outcomes, L and M, with L < M, or have no choice in accepting a third outcome, N. If you violate the axiom, there is a situation like this where, if you were asked for your choice before you know X (it will be ignored if X is false), you would pick L, even though L < M.
Do any of these situations describe your preferences?
I’ll let Desrtopa handle this.
Can you give a concrete example? What happens to me? Is it that I get an outcome which is less ideal than was available?
If your decision process is not equivalent to one that uses the previously described procedure, there are situations where something like one of the following will happen.
I ask you if you want chocolate or vanilla ice cream and you don’t decide. Not just you don’t care which one you get or you would prefer not to have ice cream, but you don’t output anything and see nothing wrong with that.
You prefer chocolate to vanilla ice cream, so you would willingly pay 1c to have the vanilla ice cream that you have been promised upgraded to chocolate. You also happen to prefer strawberry to chocolate, so you are willing to pay 1c to exchange a promise of a chocolate ice cream for a promise of a strawberry ice cream. Furthermore, it turns out you prefer vanilla to strawberry, so whenever you are offered a strawberry ice cream, you gladly pay a single cent to change that to an offer of vanilla, ad infinitum. (A toy simulation of this cycle follows the list below.)
N/A
You like chocolate ice cream more than vanilla ice cream. Nobody knows if you’ll get ice cream today, but you are asked for your choice just in case, so you pick vanilla.
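Here is a toy simulation of the cycle in (2); the flavors and the 1c price come from the example above, everything else is invented:

```python
# Money pump for the cyclic preference in (2): chocolate over vanilla,
# strawberry over chocolate, vanilla over strawberry, each swap costing 1c.
upgrade = {"vanilla": "chocolate", "chocolate": "strawberry", "strawberry": "vanilla"}

holding, cents_paid = "vanilla", 0
for _ in range(30):               # the cycle never terminates on its own
    holding = upgrade[holding]    # pay 1c for the currently preferred flavor
    cents_paid += 1
print(holding, cents_paid)        # back at vanilla, 30 cents poorer
```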
Let’s consider (2). Suppose someone was in the process of getting Dutch Booked like this. It would not go on ad infinitum. They would quickly learn better. Right? So even if this happened, I think it would not be a big deal.
Let’s say they did learn better. How would they do this—changing their utility function? Someone with a utility function like this really does prefer B at a cost of 1c to A, C at a cost of 1c to B, and A at a cost of 1c to C. Even if they did change their utility function, the new one would either have a new hole or it would obey the results of the VNM theorem.
So Bayes teaches: do not disobey the laws of logic and math.
Still wondering where the assigning probabilities to truths of theories is.
OK. So what? There’s more to life than that. That’s so terribly narrow. I mean, that part of what you’re saying is right as far as it goes, but it doesn’t go all that far. And when you start trying to apply it to harder cases—what happens? Do you have some Bayesian argument about who to vote for for president? Which convinced millions of people? Or should have convinced them, and really answers the questions much better than other arguments?
Well the Dutch books make it so you have to pick some probabilities. Actually getting the right prior is incomplete, though Solomonoff induction is most of the way there.
Where else are you hoping to go?
In principle, yes. There’s actually a computer program called AIXItl that does it. In practice I use approximations to it. It probably could be done to a very high degree of certainty. There are a lot of issues and a lot of relevant data.
Can you give an example? Use the ice cream flavors. What probabilities do you have to pick to buy ice cream without being dutch booked?
Explanatory knowledge. Understanding the world. Philosophical knowledge. Moral knowledge. Non-scientific, non-empirical knowledge. Beyond prediction and observation.
How do you know if your approximations are OK to make or ruin things? How do you work out what kinds of approximations are and aren’t safe to make?
The way I would do that is by understanding the explanation of why something is supposed to work. In that way, I can evaluate proposed changes to see whether they mess up the main point or not.
Endo, I think you are making things more confusing by combining issues of Bayesianism with issues of utility. It might help to keep them more separate or to be clear when one is talking about one, the other, or some hybrid.
I use the term Bayesianism to include utility because (a) they are connected and (b) a philosophy of probabilities as abstract mathematical constructs with no applications doesn’t seem complete; it needs an explanation of why those specific objects are studied. How do you think that any of this caused or could cause confusion?
Well, it empirically seems to be causing confusion. See curi’s remarks about the ice cream example. Also, one doesn’t need Bayesianism to include utility and that isn’t standard (although it is true that they do go very well together).
Yes I see what you mean.
I think it goes a bit beyond this. Utility considerations motivate the choice of definitions. I acknowledge that they are distinct things, though.
The consequences could easily be thousands of lives or more in case of sufficiently important decisions.
So the argument is now not that suboptimal issues don’t exist but that they aren’t a big deal? Are you aware that the primary reason that this involves small amounts of ice cream is for convenience of the example? There’s no reason these couldn’t happen with far more serious issues (such as what medicine to use).
I know. I thought it was strange that you said “ad infinitum” when it would not go on forever. And that you presented this as dire but made your example non-dire.
But OK. You say we must consider probabilities, or this will happen. Well, suppose that if I do something it will happen. I could notice that, criticize it, and thus avoid it.
How can I notice? I imagine you will say that involves probabilities. But in your ice cream example I don’t see the probabilities. It’s just preferences for different ice creams, and an explanation of how you get a loop.
And what I definitely don’t see is probabilities that various theories are true (as opposed to probabilities about events which are ok).
I didn’t say that (I’m not endoself).
Yes, but the Bayesian avoids having this step. For any step you can construct a “criticism” that will duplicate what the Bayesian will do. This is connected to a number of issues, including the fact that what constitutes valid criticism in a Popperian framework is far from clear.
Ice cream is an analogy, and maybe not a great one, since it is connected to preferences (which sometimes get confused with Bayesianism). It might make more sense to just go read Cox’s theorem and translate to yourself what the assumptions mean about an approach.
OK, my bad. So many people. I lose track.
Anything which is not itself criticized.
Could you pick any real world example you like, where the probabilities needed to avoid dutch book aren’t obvious, and point them out? To help concretize the idea for me.
Well, I’m not sure, in that I’m not convinced that Dutch Booking really does occur much in real life other than in the obvious contexts. But there are a lot of contexts it does occur in. For example, a fair number of complicated stock maneuvers can be thought of essentially as attempts to dutch book other players in the stock market.
Koth already had an amusing response to that.
Someone here told me it does. Maybe you can go argue with him for me ;-)
I agree.
Consequentialism is not in the index.
Decision rule is, a little bit.
I don’t think this book contains a proof mentioning consequentialism. Do you disagree? Give a page or section?
It looks like what they are doing is defining a decision rule in a special way. So, by definition, it has to be a mathematical thing to do with probability. Then after that, I’m sure it’s rather easy to prove that you should use bayes’ theorem rather than some other math.
But none of that is about decision rules in the sense of methods human beings use for making decisions. It’s just if you define them in a particular way—so that Bayes’ is basically the only option—then you can prove it.
see e.g. page 19 where they give a definition. A Popperian approach to making decisions simply wouldn’t fit within the scope of their definition, so the conclusion of any proof like you claimed existed (which I haven’t found in this book) would not apply to Popperian ideas.
Maybe there is a lesson here about believing stuff is proven when you haven’t seen the proof, listening to hearsay about what books contain, and trying to apply proofs you aren’t familiar with (they often have limits on scope).
In what way would the Popperian approach fail to fit the decision rule approach on page 19 of Bickel and Doksum?
It says a decision rule (their term) is a function of the sample space, mapping something like complete sets of possible data to things people do. (I think it needs to be complete sets of all your data to be applied to real world human decision making. They don’t explain what they are talking about in the type of way I think is good and clear. I think that’s due to having in mind different problems they are trying to solve than I have. We have different goals without even very much overlap. They both involve “decisions” but we mean different things by the word.)
In real life, people use many different decision rules (my term, not theirs). And people deal with clashes between them.
You may claim that my multiple decision rules can be combined into one mathematical function. That is so. But the result isn’t a smooth function so when they start talking about estimation they have big problems! And this is the kind of thing I would expect to get acknowledgement and discussion if they were trying to talk about how humans make decisions, in practice, rather than just trying to define some terms (chosen to sound like they have something to do with what humans do) and then proceed with math.
e.g. they try to talk about estimating amount of error. if you know error bars on your data, and you have a smooth function, you’re maybe kind of OK with imperfect data. but if your function has a great many jumps in it, what are you to do? what if, within the margin for error on something, there are several discontinuities? I think they are conceiving of the decision rule function as being smooth and not thinking about what happens when it’s very messy. Maybe they specified some assumptions so that it has to be smooth, which I missed, but anyway human beings have tons of contradictory and not-yet-integrated ideas in their head—mistakes and separate topics they haven’t connected yet, and more—and so it’s not smooth.
On a similar note they talk about the median and mean which also don’t mean much when it’s not smooth. Who cares what the mean is over an infinitely large sample space where you get all sorts of unrepresentative results in large unrealistic portions of it? So again I think they are looking at the issues differently than me. They expect things like mathematically friendly distributions (for which means and medians are useful); I don’t.
Moving on to a different issue, they conceive of a decision rule which takes input and then gives output. I do not conceive of people starting with the input and then deciding the output. I think decision making is more complicated. While thinking about the input, people create more input—their thoughts. The input is constantly being changed during the decision process, it’s not a fixed quantity to have a function of. Also being changed during any significant decision is the decision rule itself—it too isn’t a static function even for purposes of doing one decision (at least in the normal sense. maybe they would want to call every step in the process a decision. so when you’re deciding a flavor of ice cream that might involve 50 decisions, with updates to the decision rules and inputs in between them. if they want to do something like that they do not explain how it works.)
They conceive of the input to decisions as “data”. But I conceive of much thinking as not using much empirical data, if any. I would pick a term that emphasizes it. The input to all decision making is really ideas, some of which are about empirical data and some of which aren’t. Data is a special case, not the right term for the general case. From this I take it that they are empiricists. You can find a refutation of empiricism in The Beginning of Infinity by David Deutsch but anyway it’s a difference between us.
A Popperian approach to decision making would focus more on philosophical problems, and their solutions. It would say things like: consider what problem you’re trying to solve, and consider what actions may solve it. And criticize your ideas to eliminate errors. And … well no short summary does it justice. I’ve tried a few times here. But Popperian ways of thinking are not intuitive to people with the justificationist biases dominant in our culture. Maybe if you like everything I said I’ll try to explain more, but in that case I don’t know why you wouldn’t read some books which are more polished than what I would type in. If you have a specific, narrow question I can see that answering that would make sense.
Thank you for that detailed reply. I just have a few comments:
“data” could be any observable property of the world
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
there’s no requirement that the decision function be smooth—it’s just useful to look at such functions first for pedagogical reasons. All of the math continues to work in the presence of discontinuities.
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
Yes but using it to refer to a person’s ideas, without clarification, would be bizarre and many readers wouldn’t catch on.
Straight to the final, perfect truth? lol… That’s extremely unPopperian. We don’t expect progress to just end like that. We don’t expect you get so far and then there’s nothing further. We don’t think the scope for reason is so bounded, nor do we think fallibility is so easily defeated.
In practice, searches for optimal things of this kind always involve many premises which have substantial philosophical meaning. (Which is often, IMO, wrong.)
Does it use an infinite set of all possible actions? I would have thought it wouldn’t rely on knowing what each action actually is, but would just broadly specify the set of all actions and move on.
@smooth: what good is a mean or median with no smoothness? And for margins of error, with a non-smooth function, what do you do?
With a smooth region of a function, taking the midpoint of the margin of error region is reasonable enough. But when there is a discontinuity, there’s no way to average it and get a good result. Mixing different ideas is a hard process if you want anything useful to result. If you just do it in a simple way like averaging you end up with a result that none of the ideas think will work and shouldn’t be surprised when it doesn’t. It’s kind of like how if you have half an army do one general’s plan, and half do another, the result is worse than doing either one.
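A toy illustration of that point about discontinuities; the decision rule and all the numbers are invented:

```python
# A decision rule with a jump: below a threshold prescribe dose 10, otherwise dose 100.
# Averaging across the jump gives an answer that neither branch recommends.
def decide(x):
    return 10 if x < 0.5 else 100

low, high = 0.4, 0.6                         # margin of error straddles the discontinuity
averaged = (decide(low) + decide(high)) / 2
print(averaged)                              # 55.0 -- not a recommendation of either branch
```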
Do you think of arguments and explanations as types of evidence? If so, how does that work? If not then I wasn’t talking about evidence.
In Bayesian epistemology, most arguments and explanations are just applications of Bayes’ law as discussed at http://lesswrong.com/lw/o7/searching_for_bayesstructure/ . Of course, ‘taking evidence into account’ is the same as using it in Bayes’ law.
Can you give an example using a moral argument, or anything that would help illustrate how you take things that don’t look like they are Bayes’ law cases and apply it anyway?
The linked page says imperfectly efficient minds give off heat and that this is probabilistic (which is weird b/c the laws of physics govern it and they are not probabilistic but deterministic). Even if I accept this, I don’t quite see the relevance. Are you reductionists? I don’t think that the underlying physical processes tell us everything interesting about the epistemology.
It’s called Solomonoff induction—and we’ve known about it for almost 50 years.
Provide the details which address the problem, not a wikipedia link.
It is not my job to teach you maths. Here, use Google.
I know math. The problem is that you haven’t provided anything that works, or any criticism of Popper. Basically all your contributions to the discussion are appeals to authority. You don’t argue, you just say “This source is right; read it and concede”. And most of your sources are wikipedia quality… If you won’t say anything I can’t find on google, why talk at all?
Because one doesn’t generally know where to look?
There are plenty of explanations of Solomonoff induction out there. You asked for how the math of confirmation works—and that’s the math of universal inductive inference. If you just want an instance of confirmation, see Bayes’s theorem.
It is not an “appeal to authority” to direct you to the maths that answers your query!