Bayesianism versus Critical Rationalism
I have just rediscovered an article by Max Albert on my hard drive that I never got around to reading and that might interest others on Less Wrong. You can find the article here. It argues against Bayesianism and for Critical Rationalism (of Karl Popper fame).
Abstract:
Economists claim that principles of rationality are normative principles. Nevertheless,
they go on to explain why it is in a person’s own interest to be rational. If this were true,
being rational itself would be a means to an end, and rationality could be interpreted in
a non-normative or naturalistic way. The alternative is not attractive: if the only argument
in favor of principles of rationality were their intrinsic appeal, a commitment to
rationality would be irrational, making the notion of rationality self-defeating. A comprehensive
conception of rationality should recommend itself: it should be rational to be
rational. Moreover, since rational action requires rational beliefs concerning means-ends
relations, a naturalistic conception of rationality has to cover rational belief formation including
the belief that it is rational to be rational. The paper considers four conceptions
of rationality and asks whether they can deliver the goods: Bayesianism, perfect rationality
(just in case that it differs from Bayesianism), ecological rationality (as a version of
bounded rationality), and critical rationality, the conception of rationality characterizing
critical rationalism.
Any thoughts?
A good nutshell description of the type of Bayesianism that many LWers think correct is objective Bayesianism with critical rationalism-like underpinnings. Where recursive justification hits bottom is particularly relevant. On my cursory skim, Albert only seems to be addressing “subjective” Bayesianism which allows for any choice of prior.
For people like me who have no clue, if you scroll down a bit here there is a comparison (so you get a vague idea):
Subjective Bayesians emphasize the relative lack of rational constraints on prior probabilities.
Objective Bayesians (e.g., Jaynes and Rosenkrantz) emphasize the extent to which prior probabilities are rationally constrained.
More here:
And of course Critical rationalism:
Critical rationalism explicitly proposes a third decision rule for rational belief formation: it is rational to believe a hypothesis if it has so far withstood serious criticism better than its competitors.
I’ll add that a decent summary of the position espoused in Where recursive justification hits bottom (linked in the grandparent) is that critical rationalism (or something like it) entails objective Bayesianism. It entails both the use of Bayes’ rule to update on information and a set of correct priors.
Thanks for helping me realize that Critical Rationalism and Bayesianism can be complements rather than substitutes.
FYI that is a misleading statement of Critical Rationalism.
For one thing, Popper was not a “belief philosopher” so he wouldn’t have stated it quite like that.
There are a lot of misleading statements about CR floating around. Most come from its opponents trying to make sense of it on their own terms. In trying to formulate it in a way that makes sense given their anti-CR premises, they change it. It’s best to read primary sources for this.
It seems to think the problem of the priors does in Bayesianism :-(
Popper seems outdated. Rejecting induction completely is not very realistic.
Critical Rationalism has advanced somewhat since Popper.
I think Hume would agree.
Not very significantly. The revolution happened mostly without them.
After reconsidering your statement, I have come to agree.
Who do you think advanced CR? I think only David Deutsch has improved on Popper.
I had in mind Miller and Deutsch.
Which Miller publication or argument?
Critical Rationalism: A Restatement and Defense and Out of Error: Further Essays On Critical Rationalism.
Thanks I ordered them. I’d only read individual articles of his.
Hume would agree because he never accepted the full force of his own argument. He couldn’t imagine how people can create knowledge without induction even though he could see that induction is impossible. It took Popper to explain how knowledge can be created without induction.
That is weak. You don’t even want to say that Popper is wrong, only that he seems outdated. And you suffer from a failure of imagination when you say that rejecting induction completely is not very realistic. So you’re just going to reject it somewhat, whatever that may mean. Popper’s position was that induction is impossible, it never happens, it is not what we do when we create knowledge. You want to somehow hang on to induction, you think Bayesianism allows you to do this, and you think that hanging onto this mistaken idea, which was invented long ago by Aristotle (who tried to make out it was Socrates—see Popper’s The World of Parmenides), makes you modern and up-to-date.
I am not rejecting induction. Science consists primarily of induction.
Without induction, we don’t even know whether the sun will rise tomorrow.
Popper seems out of date—since now we know what the rules governing induction actually are.
According to Popper, the sun will rise tomorrow was refuted in the form it was originally meant:
Ditto “bread nourishes” and one other famous example I forgot.
That would be “All men are mortal”.
Solomonoff Induction is just another failed attempt at solving the misbegotten problem of induction.
The goal of Solomonoff Induction is prediction; you want to obtain a compact algorithmic description of past data so that future data can be predicted. Such a description you call a theory although you don’t care what the theory is or how it stands up as an explanation. In philosophy this is called instrumentalism and it is a wrong-headed approach to science, as Karl Popper made clear.
Solomonoff Induction will keep going wrong in random and perverse ways because it pays no heed to theories as explanations. You just think that shorter length theories which are consistent with the data are somehow more likely, regardless of content and how silly that content may be. Consequently, things may seem to be going alright for a while, although you won’t know if that was just luck, and then suddenly it has joined the Ministry of Silly Walks. You won’t get the steady improvement that you do when you treat theories as explanations and focus on correcting errors in those explanations (aka Critical Rationalism).
When a theory starts being inconsistent with the data you just throw it out. But what if you were wrong about the theory being inconsistent? Data needs interpretation and we can be wrong about our interpretation. As Popper explained, all data is theory-laden and there is no way around this.
A related point is that science does not start with a set of observations or data. Science starts with problems. We collect data to try to refute theories that have already been advanced to solve a problem. Without a theory, one has no idea which data is relevant or what to observe. So you don’t collect data and then fit theories to that data with the goal of making correct predictions.
The focus on data to the exclusion of anything else is empiricist. Most theories get rejected not because they were falsified by contradictory data but because they were refuted by criticism. Solomonoff induction doesn’t care about criticism, let alone that it is pivotal in knowledge creation. Also some theories can’t be refuted by empirical means, so what does Solomonoff Induction do about those?
Solomonoff Induction may be nice mathematically but it is bad philosophy and it is old-hat.
I hope you stick around—LW needs people who’ve read Popper. (However, I take that back if it turns out that you’ve only read Deutsch’s simplified, evangelical caricature of him.)
This is at best unclear. If a person, or the entire scientific community, were given the task of competing with Solomonoff induction to predict an incoming data stream, then either (i) Solomonoff induction would eventually arrive at “the correct theory” (at least in the sense that it no longer ‘keeps going wrong’) or (ii) the human scientists would ‘keep going wrong in random ways’ (i.e. ways that seem random to the scientists) or both.
Talking about the need for ‘good explanations’ over and above mere predictive success is all very well, but it’s not that helpful unless you can say something about what makes an explanation good. (It would be even more helpful if you could say something about how an AI could recognize a good explanation.)
I think the point you should be concentrating on is that human scientists do not face the same problem-situation as Solomonoff induction. We can choose what questions to work on, design experiments, and we’re also blessed/cursed with the lack of any stable boundary between ‘data’ and ‘theory’.
Turning into a simplified caricature of Popper appears to be a memetic hazard of reading Popper. Not as bad as reading Rand, though.
Would you care to explain in what way Deutsch is a simplified caricature of Popper or are you just going to be content to make assertions like the other commenter? I doubt very much that you have any idea what Popperians such as David Deutsch are like.
Despite the context, I didn’t have Deutsch specifically in mind. Just a general observation over the years.
The issue under discussion here is wider than Popper and appears elsewhere, in the disagreement between small-world and large-world Bayesians, the former being the side that I guess Popper would have been on. (It’s a very long time since I read Popper, and I do not recall if he ever says anything about Bayesian inference.) Must Bayesian inference of the sort that takes a parameterised model and a prior distribution on the parameters, and fits it to a data set (a process whose validity everyone except a few hard-core frequentists agrees on) be subordinate to some other process when the data do not fit your model at all, for any parameters, and you need to find a different model? The small-worlders say yes, and the large-worlders say no. The small-worlders scoff at the large-worlders, the large-worlders exhibit the Solomonoff universal prior, and if the small-worlders are still paying attention, they usually just scoff some more. Sometimes they will point out that Solomonoff induction is uncomputable, and that computable approximations are exponentially infeasible, relying as they do on merely enumerating all possible hypotheses in order of size. I haven’t seen a large-worlder response to that anywhere, not even here on LessWrong, where large-worldism is the default view.
But then the small worlders, asked just what it is that they put above Bayes, reply only with magic names, such as “human judgement”, “criticism”, or “argument”. All they are doing is describing what the process feels like, not what it is. In chapter 7 of The Fabric of Reality, Deutsch circles around and around the point, but the point never appears.
ETA: And they generally deny that the process can be elucidated any further. Popper himself says (as quoted elsewhere in these threads) that there is no method.
So for me, that is where things stand, and I am not convinced by either side.
Some further context for the above. I’ve been drafting a posting on this for a while, which this comment and that one are based on.
Deutsch does say what makes an explanation good. He has a TED talk about it, and a new book, The Beginning of Infinity (came out 2 days ago in the UK) which has this as a major theme. Good explanations are hard to change while they still solve the same problem. The book has examples and elaboration.
I have read both Popper and Deutsch. Could you explain your comment about Deutsch?
You say human scientists do not face the same problem situation as Solomonoff Induction. But both are trying to create knowledge right? In Solomonoff Induction it is assumed that all knowledge comes to us via sensory organs as data streams and the task of the knowledge creator is to compress that data with the aim of making good predictions. This, it is held, is in some sense what scientists and all people do when they create knowledge and it is what the ideal knowledge creator should do. Critical rationalism rejects the idea that all knowledge comes to us via the senses—that is empiricism—and it rejects the idea that theories are just instruments for making predictions—that is instrumentalism.
You seem to think that predictive success can come without underlying explanations, as though explanations are optional. They are not. We can’t just neglect explanations and think we can get on with the process of building an AI. That we cannot formalize our current knowledge about explanations in a nice piece of mathematics should not be a deterrent to trying to learn more.
I wonder what they think of the discussion of the Oracle in The Fabric of Reality, ch1.
That’s just not what you do. “Throwing out theories” is more strongly suggested by falsificationism. A Bayesian approach recognises that observations are uncertain and fallible. Observations inconsistent with a theory are strong negative evidence, but they don’t really “falsify” a theory.
Here is Yudkowsky on the topic:
Popper “missed” confirmation by rejecting induction. He didn’t get it, and now we know better.
Observations come second. Priors come first.
Critics might have a role to play for a resource-limited agent—for instance if they pointed out explanations that were short and were not yet receiving the proper consideration—or if they supplied more data.
It says to prefer the shorter one.
Yudkowsky hasn’t read Popper or wasn’t paying attention. Popper didn’t advocate that position; it’s a myth which Popper repeatedly denied. See e.g. “The Popper Legend” section in Popper’s replies to his critics in the Schilpp book.
Here is Popper’s denial:
That’s not right. Martin Gardner explains why:
You haven’t understood which part is the myth I was talking about or read the source I gave.
You’ve now given a short statement of the conclusion of an argument Popper made in LScD (but not the argument itself, and also missing too much detail to even understand his point). It is a purely logical argument and unexceptionable. The Gardner passage doesn’t address it at all, nor make any argument, but merely asserts.
Please do your homework instead of just googling out of context snippets. You don’t know what the Popper legend is, nor what Popper’s argument for the quoted conclusion you pasted is.
Yudkowsky:
Popper:
That seems pretty cut and dried—so long as you understand the relationship between confirmation and induction. Popper asserts what Yudkowsky claims he says.
Also, Popper’s position is wrong. Few philosophers of science ever bought it in the first place—and now things have moved on, so this is merely of historical interest.
You have to actually read Popper’s books to understand what he means. You are taking short summaries of conclusions without understanding Popper’s arguments behind them.
For example, when Popper says “theory” there he does not mean any theory. He means a universal theory. This is the kind of thing one finds out by reading him.
Popper gave an argument in LScD along these lines:
Consider a theory, T, that all swans are white. T is a universal theory.
No confirming evidence can prove T is true. You can see 5 white swans or 500 or 50 million. Still might be false.
But if you see one black swan it is false.
This is an asymmetry between confirmation and falsification when applied to universal theories. It does not hold for all theories.
Consider the negation ~T. At least one swan is not white. This theory cannot be refuted by any amount of observations. But it can be confirmed with only one observation. ~T is a non-universal theory and not the kind science is after.
Is he wrong? This is pure logic. Popper in LScD was interested in scientific laws (that is, universal theories), and in that context he was unobjectionably correct about confirmation and falsification.
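The asymmetry in the swan argument can be sketched in a few lines of Python. This is only an illustration: the function names and the sample lists are hypothetical, and a finite list of sightings stands in for "observations so far".

```python
def universal_survives(sightings):
    """T: all swans are white. True only means 'not yet refuted'."""
    return all(color == "white" for color in sightings)


def existential_confirmed(sightings):
    """~T: at least one swan is not white. One sighting settles it."""
    return any(color != "white" for color in sightings)


# Hypothetical observation records.
white_only = ["white"] * 500
with_black = white_only + ["black"]

# 500 white swans leave T standing but unproven, and leave ~T unconfirmed
# without refuting it.
print(universal_survives(white_only), existential_confirmed(white_only))

# A single black swan refutes T and confirms ~T.
print(universal_survives(with_black), existential_confirmed(with_black))
```

Note that the sketch takes each sighting at face value; it says nothing about how a sighting itself might be mistaken.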
What you are doing is taking short quotes and imagining the context isn’t relevant, and that they only have one possible meaning. That is an unscholarly beginner’s mistake.
I’ve simplified various things here (for example Popper’s approach is not falsificationism; saying it is is a myth; linking to a page titled Falsifiability and calling it a refutation of Popper demonstrates your ignorance; and I ignored the Duhem-Quine problem which Popper did address from the start). And Popper had more and better arguments later. But you get the idea?
What if someone painted the swan black, or you went temporarily insane, or it was actually a black goose?
Sure, it was probably a black swan. But you can only ever be 99.99(lots of nines)9% positive it was a black swan. In this way, falsification is itself probabilistic. This becomes more important when we move on to more metaphorical swans that are all slightly different shades of grey.
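One way to read "falsification is itself probabilistic" is as an ordinary Bayes' rule update in which the observation report can be mistaken. The sketch below is a minimal illustration under that reading; every number in it is a made-up assumption, not an estimate from the thread.

```python
def posterior_all_white(prior_t, p_report_given_t, p_report_given_not_t):
    """P(T | someone reports a black swan), computed with Bayes' rule."""
    numerator = p_report_given_t * prior_t
    evidence = numerator + p_report_given_not_t * (1.0 - prior_t)
    return numerator / evidence


# Hypothetical inputs.
prior_t = 0.9               # prior belief that all swans are white
p_mistaken_report = 1e-4    # paint, a black goose, temporary insanity, ...
p_report_if_false = 0.5     # chance of such a report if T really is false

post = posterior_all_white(prior_t, p_mistaken_report, p_report_if_false)
print(f"P(T | black swan report) = {post:.6f}")
```

With these inputs the posterior for T drops from 0.9 to roughly 0.002: very low, but not zero.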
That is the Duhem-Quine problem. Popper addresses it in his book as I said.
Part of the answer is: Popper is a fallibilist. Of course our knowledge isn’t certain.
However, uncertain does not mean probabilistic. Probabilistic is not the only option.
I wonder: how do you figure out the probability that you went temporarily insane?
And whatever you say, how do you figure out the probability that you were right about that? etc
I think assigning probabilities leads to a regress.
Wikipedia thinks the Duhem-Quine problem is something quite different. Ditto the Encyclopedia of Philosophy, which has a rather better explanation than wikipedia—it’s the idea that you can always evade falsification of one idea by altering background assumptions or special cases. Definitely different.
Uncertain does mean probabilistic, given the axioms that something cannot be more and less likely than something else at the same time, estimates of likeliness should not have large jumps on infinitesimal evidence, and that estimates of likeliness should not ignore information or make it up.
You might figure out the probability that you went temporarily insane by looking at how often other people with your mental health history go temporarily insane—I bet it’s pretty rare (though not never).
It does lead to a regress (at least until we get to the bottom, since we only have finite pieces of evidence to doubt), but since the series grows smaller rapidly and the sign of the effects alternate between positive and negative (i.e. when you doubt it’s minus, when you doubt the doubt it’s plus, when you doubt the doubt of the doubt it’s minus again) the series will always converge. Hardly a reason to give up.
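The convergence claim here is the alternating series test. A minimal sketch follows, under the strong assumption (which is exactly what is in dispute in this thread) that each level of doubt is a fixed fraction of the one before it; `first_doubt` and `ratio` are hypothetical values.

```python
def partial_sums(first_doubt, ratio, n_levels):
    """Partial sums of d - d*r + d*r**2 - ... for n_levels terms."""
    total = 0.0
    sums = []
    term = first_doubt
    for level in range(n_levels):
        # Even levels subtract confidence (a doubt); odd levels add some
        # back (a doubt about the doubt). Only the net size is tracked.
        total += term if level % 2 == 0 else -term
        sums.append(total)
        term *= ratio
    return sums


first_doubt, ratio = 0.01, 0.1
sums = partial_sums(first_doubt, ratio, n_levels=8)
limit = first_doubt / (1 + ratio)  # closed form of the geometric series

for level, s in enumerate(sums):
    print(level, f"{s:.12f}")
print("limit", f"{limit:.12f}")
```

If the doubts do not shrink from level to level, this argument does not apply as written.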
I don’t want to argue terminology but the Duhem-Quine problem is about the problem of the fallibility of evidence, which is the problem you raised. Which is what Wikipedia says. You were talking about altering background assumptions our evidence relies on with e.g. the temporary insanity.
You raise the issue of giving up. But that’s not the alternative. It’s not this or nothing. Popper has an epistemology without a regress.
You say the signs keep flipping, and the doubts keep getting smaller. But do they really get smaller? If I were to play along and try to estimate the chance of temporary insanity I’d put it very low, say 0.00001%. And if I were to estimate the chance that I had that figure substantially incorrect, I’d put it very high, say 50%. And if I were to try to estimate the chance I was right about the 50% -- well beats me so I’ll have to say 5% or something. And the chance I was right about that 5%? Ugh. It’s low. I think I’m wrong by now.
Note: I don’t think those issues are matters of probability. Please don’t attribute that to me later. I was just trying to play along with your theory to discuss it.
I don’t think the sign flipping argument works either. The original probability depends on its probabilistic justification being correct, which depends on its probabilistic justification being correct, and so on. Break the chain and the whole thing falls apart. I’m guessing you have some extra premise about margins of error or something, so you expect if one thing is wrong the stuff it depends on only changes slightly. But what is the probability you are right about that? You face another regress on that issue. And whatever other arguments and premises and anything else you bring up, you’re simply going to face still more regresses as I question them.
EDIT PS: Thanks for the encyclopedia of philosophy link. It says Duhem-Quine was invented in the 1950s. Popper addressed the issue before that. I thought he might have invented it before them but wasn’t sure about the dates.
Eh, okay, we can leave aside semantics.
You seem to be generating more numbers without thinking what they mean for the original probability. “There are lots of numbers” is not a counterargument. What you might be looking towards is the idea that there is no stable probability, that as you consider more things it blows up, or dies, or oscillates. But because alternating decreasing series converge, the probability is in fact stable. So what do all these other numbers mean? They’re numbers related to how precise (not how accurate) your estimate of the probability is. They don’t change your estimate, but they change the precision of your estimate. If someone rolls a loaded die in front of me, I estimate a chance of 1⁄6 that he gets a 4, because I don’t know how the die is loaded. But the precision of my probability estimate is less than if it was an unloaded die.
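The loaded-die point can be checked with exact arithmetic. In this sketch the die favors one unknown face with probability `p_favored` and splits the rest evenly; that loading model, and the uniform ignorance over which face is favored, are assumptions made for illustration.

```python
from fractions import Fraction


def chance_of_face(target, favored, p_favored):
    """Chance of rolling `target` if the die is loaded toward `favored`."""
    if target == favored:
        return p_favored
    return (1 - p_favored) / 5


def expected_chance(target, p_favored):
    """Average over all six equally plausible choices of favored face."""
    return sum(
        Fraction(1, 6) * chance_of_face(target, favored, p_favored)
        for favored in range(1, 7)
    )


estimate = expected_chance(target=4, p_favored=Fraction(1, 2))
print(estimate)  # 1/6
```

The average is 1/6 whatever value `p_favored` takes. A heavier loading changes how far the true chance may lie from that estimate, not the estimate itself.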
The doubts of doubts are in fact smaller than the doubts, for the simple reason that all probabilities are less than one. Doubts are on the order of probability, while doubts of doubts are the probability of the doubt of doubt times the negative effect of the original doubt.
And again, “there are lots of numbers” is not necessarily a bad thing. The world has lots of numbers. A bad thing would be something like a contradiction, or an indeterminate point where reality should be, or an infinity where a finite number should be.
Duhem was an old French guy who was around before Popper. Quine was a more recent American guy who was around after Popper.
The regress doesn’t offer any probability at all because it never ends and you cannot analyze the whole thing which would require infinitely many steps. You imply it has a simple pattern but I don’t think it does as my example showed (where I tried to estimate probabilities successively) which you did not reply to.
If you could analyze the whole infinite regress, the probability it would offer that your first probability estimate was correct is infinitesimal because if you consider the odds of infinitely many probabilities—all below 1 -- you get an infinitesimal result. (If you have 99%, and then 99% of that, and so on, it keeps going down forever).
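The "99%, and then 99% of that" case is easy to compute. The sketch below does so and, for contrast, also multiplies factors that approach 1 quickly; the second sequence is a hypothetical added only to show that the outcome depends on how the factors behave.

```python
def running_product(factors):
    """Multiply a sequence of probabilities together."""
    product = 1.0
    for factor in factors:
        product *= factor
    return product


# Every level is 99% confident in the one below it.
constant_case = running_product([0.99] * 2000)

# Hypothetical contrast: level k is 1 - 2**-k confident.
shrinking_doubt_case = running_product([1 - 2.0 ** -k for k in range(1, 61)])

print(f"{constant_case:.3e}")         # heads toward zero
print(f"{shrinking_doubt_case:.4f}")  # settles near 0.2888
```

With a constant 99% the product does keep going down toward zero. When the doubts shrink fast enough, the product instead stays bounded away from zero.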
As I commented on, it does not alternate. Let me try again:
Theory T1 is the temporary insanity theory. You assign it 1%.
Theory T2 is the theory that your first probability assignment was correct. You assign that 90%.
Now, suppose we find out T2 is false. What happens? Does the probability of T1 go up or down?
We don’t know (given only the statements made so far; if you introduce new statements you could come up with an answer but they would themselves be subject to further questioning). So it isn’t true that the signs keep alternating and balance out. The answer is unknown, not stable. They don’t have signs at all.
The issue has nothing to do with “a lot of numbers”. Infinities are different than lots of numbers. They have special properties.
The probability estimate does not go up or down just because it’s imprecise. What gets plugged into the familiar formulas as a probability can be calculated as the expected probability over a distribution. So your T2 is in fact always false—in a continuous probability distribution there’s no one “right answer,” instead there’s an average answer. If we go with your Ts a bit more literally, the new probability is correctly given by Bayes’ rule, which is known and stable.
You seem to think the lack of a right answer is a problem, that if the odds that your first answer is correct are infinitesimal, that’s bad for this method of finding out knowledge (though I still feel like this is an odd thing to focus on, since we’re stuck with reality and can’t choose to pop over to another reality just because it has nice properties). But the trick is that although our estimates are wrong, they’re wrong in an unknown direction—they’re wrong as a consequence of having incomplete information, but they are the best fit to the information that we do have.
The point is that you can’t and don’t know the probability of anything.
Whenever you make a guess at a probability (e.g. 0.1% chance of temporary insanity yesterday), you have to wonder (in your epistemology): what is the probability that this guess (this probability estimate) is correct?
And whatever you guess about that, you have to wonder it again. And then again.
None of this has to do with the imprecision of knowledge. How do you know you are in the right ballpark? How do you know that you are within plus or minus 50%? You do not know. You can say you are, and you can give that a probability. But that itself could be false, its probability could be questioned.
So there is a regress.
At any step in the regress, if you are mistaken, the whole thing crumbles, all the way to the very first probability you assigned. They don’t crumble a little with the addition of minor error, they simply could be anything whatsoever and you don’t know.
Why? Well, suppose you think the probability you were correct about T4 is 99%. And let’s suppose even that you’re right about that. It doesn’t matter. But it’s a 1% case. It could be T40 just as well, or T4000, and the probability could be 99.999%. It doesn’t matter.
So, T4 says that the probability you are correct about T3′s probability estimate is 99%. Which is true, but you’re unlucky this time. T4 is false.
Where does that leave T3? It leaves it unknown. Your probability estimate for it is not changed but gone entirely. You haven’t got one and you don’t know what it is. And with no status for T3, no reliability, no justification, no nothing, then T2 is gone too. And so goes T1. And T0. The end.
T3 said you’re 99% confident that T2′s probability estimate is correct. T2 said you’re 99% confident that T1′s probability estimate is correct. You see how they all fall apart? Each one depends on the next.
There are various ways out. You can make a probability estimate, and when asked the probability you are right about that, answer 100% or refuse to answer. You can suggest we stop asking. You can accept some things which haven’t got a probability (but this contradicts your general method). Whatever. But there is no rational solution which makes everything work. I think you may have misunderstood the regress as having something to do with the probability of the initial theory at each step (so that it goes up and down, in smaller and smaller amounts, and with opposite signs at each step). But that’s not it. It’s more meta than that. Every step is to ask the probability of the previous probability estimate (not theory directly about the real world).
You say we’re stuck with reality. This is true. But it does not mean that your picture of reality is correct. Popper has an epistemology which is not broken, which has no regress. This one is broken. There’s no need to despair, only to change your mind. To let your theories die in your place, as Popper put it.
You’re repeating the same things again. Which means you probably didn’t understand what I said about probability estimates always being wrong. Meanwhile probability distributions can be exactly right, in the sense that they perfectly fit your current knowledge. You should go read a few books or take a class on probability. As a second book I would recommend E.T. Jaynes’ Probability Theory.
As for probability being a part of reality, remember what I said about uncertainty being probabilistic, if you use the axioms that something cannot be more and less likely than something else at the same time, estimates of likeliness should not have large jumps on infinitesimal evidence, and that estimates of likeliness should not ignore information or make it up (okay, fine, I just copied and pasted that from before)? Which of those axioms does Karl Popper reject?
You didn’t understand my point or address it. At all. You just gave up and stopped trying to engage with me. I was still trying. Communication isn’t trivially easy.
Your list of axioms doesn’t have anything to do with the regress argument I’ve been making, and they aren’t even close to sufficient to support your worldview (they don’t even say that we ever can or should make a probability estimate about anything).
Because your point is in terms of the truth of specific probabilities, which are already always wrong, your point is ill-formed. T1=0, T2=1, the end. To do better you need to understand probability distributions.
If your first probability estimate is wrong, without any error bar—but simply wrong in an unknown way—then you’re screwed, right?
Edit: And what are you talking about with T2=1? It does not have a probability of 1. That sounds like your “signs flip” thing which I addressed already. I still think you are imagining a different regress than the one I was talking about.
Think of it this way—if it’s wrong in an utterly unknown way, then the wrongness has perfect symmetry; there’s nothing to distinguish being wrong one way from being wrong in another. By the axiom that you shouldn’t make up information, when the information is symmetric, that part of the distribution (“part” as in you convolve the different parts together to get the total distribution) should be symmetric too. And since the final probability estimate is just the average over your distribution, the symmetry makes the problem easy—or if the problem is poorly defined or poorly understood, it at the very least gives you error bars—it makes the answer somewhere between your current estimate and the maximum entropy estimate.
If you’re wrong in an unknown way, then it could just as well be 1% or 99%.
You might try to claim this averages to 50%. But theories don’t have uniform probability. There are more possible mistakes than truths. Almost all theories are mistaken. So when the probability is unknown, we have every reason to think it’s a mistake (if we’re just going to guess; we could of course use Popper’s epistemology instead which handles all this stuff), and there’s no justification for the theory. Right?
Your comments about error bars are subject to regresses (what is the probability you are right about that method? about the maximum entropy estimate? etc)
You don’t seem to be thinking with the concept of a probability distribution, or an average of one. You say “If you’re wrong in an unknown way, then it could just as well be 1% or 99%” as if it spells doom for any attempt to quantify probabilities. When really all it is is a symmetry property for a probability distribution.
I guess I shouldn’t be expected to give you a class in probability over the internet when you are already convinced it’s all wrong. But again, I think you should read a textbook on this stuff, or take a class.
Are you aware that Yudkowsky doesn’t dispute the regress? He has an article on it.
http://lesswrong.com/lw/s0/where_recursive_justification_hits_bottom/
If that’s what you’re using “the regress” to mean, sure, sign me up. But this has even less bearing than usual on whether uncertainty can be represented by probability, unless you are making the (unlikely and terrible) argument that nothing can be represented by anything.
You don’t get an infinite regress if you use a universal prior.
Actually you still do… You simply have to ask: what is the probability that the universal prior idea is correct? And whatever you say, ask the probability that is correct. And so on.
The regress works no matter what you say, even if you say something about universal priors.
“Correct” in what sense? In the actual agent using it, it’s a probability distribution over statements, not a statement itself! Do you mean “What is the probability that the universal prior has [certain property that we consider a reason to use it]?”
A probability distribution over statements is itself a statement (one states that the probability distribution is X). Maybe you use the word “statement” in a fancy way but I include anything.
And the property I was talking about is truth. But another one could be used.
No, because the universe has a state/law, not a probability distribution over states. A theory/universe/statement is either true or false, a probability distribution over theories is not, though it can be scored for accuracy in various ways. A probability distribution over theories is not a statement about the actual state of the universe.
Similarly, the universal prior is in no way “true”; it’s a distribution, not a statement at all. You shouldn’t even expect it to be “true” since it’s meant to be updated. What is important about it is that it has various nice properties such as eventually learning any computable distribution.
The first prior is where the regress bottoms out. Bayesian reasoning has to stop somewhere—and it stops at the first prior.
This area is known as “The problem of the priors”. For most agents it is no big deal—since they are rapidly swamped by evidence that overwhelms their priors, so there is little sensitivity to their exact values.
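The "swamping" claim can be illustrated with a toy coin-flip model. This is only a sketch under assumptions not in the comment: two agents hold different Beta priors over a coin's bias (the specific priors `(1, 1)` and `(20, 2)` are made up), and both see the same flips.

```python
def posterior_mean(prior_heads, prior_tails, heads, tails):
    """Posterior mean of a coin's bias under a Beta(prior_heads, prior_tails) prior."""
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)


def disagreement(heads, tails, prior_a=(1, 1), prior_b=(20, 2)):
    """Absolute gap between two agents' estimates after seeing the same data."""
    return abs(posterior_mean(*prior_a, heads, tails) - posterior_mean(*prior_b, heads, tails))


# The gap shrinks as shared evidence accumulates (70% heads throughout).
for n in (0, 10, 100, 1000, 10000):
    heads = 7 * n // 10
    print(n, round(disagreement(heads, n - heads), 4))
```

The agents start far apart and end up nearly agreeing. Note that this only shows the effect for priors that give every bias some weight; it does not address priors that rule the truth out entirely.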
My bayesian reasoning finishes with a posterior. It starts at the first prior. I’m backwards like that.
So, you simply refuse to question the prior. Is this a matter of faith, or what? Why stop there?
More often a matter of birth. Agents usually start somewhere.
A few details about the process.
That brings us to the second part of the Yudkowsky quote that you criticised:
Above you agree that Popper did argue this. Also, it is a fact—Popper did argue for this difference:
Yet finding evidence against a theory is actually a probabilistic process—just like confirmation is. So, Yudkowsky is correct in saying that Popper was wrong about this. Popper really did believe and promote this kind of material.
Popper did not argue that confirmation and falsification have fundamentally different rules. They both obey the rules of logic.
Confirmation cannot be any evidence for universal theories. None, probabilistic or otherwise. Popper explained this and did the math. If you disagree, please provide the math that governs it and explain how it works.
As to the rest you’re asking how Popper deals with fallible evidence. If you would read his books you could find the answer. He does have an answer, not none, and it isn’t probabilistic.
Let me ask you: how do you deal with the regress I asked Manfred about?
Have you read http://lesswrong.com/lw/ih/absence_of_evidence_is_evidence_of_absence/ ?
It says “When we see evidence, hypotheses that assigned a higher likelihood to that evidence, gain probability at the expense of hypotheses that assigned a lower likelihood to the evidence.”
This does not work. There are infinitely many possible hypotheses which assign a 100% probability to any given piece of evidence. So we can’t get anywhere like this. The probability of each remains infinitesimal.
Actually, it’s possible to have infinitely many hypotheses, each assigned non-infinitesimal probability. For example, I could assign probability 50% to the simplest hypothesis, 25% to the second simplest, 12.5% to the third simplest and so on down (I wouldn’t necessarily recommend that exact assignment; it’s just the easiest example).
In general, all we need is a criterion of simplicity, such that there are only finitely many hypotheses simpler than any given hypothesis (Kolmogorov Complexity and Minimum Message Length both have this property) and an Occam’s razor style rule saying that simpler hypotheses get higher prior probabilities than more complex hypotheses. Solomonoff induction is a way of doing this.
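The halving assignment described above is easy to check numerically. A minimal sketch, where the ranking of hypotheses by simplicity is simply assumed to exist and the function names are illustrative:

```python
from fractions import Fraction


def prior(rank):
    """Prior for the hypothesis with the given simplicity rank (1 = simplest)."""
    return Fraction(1, 2 ** rank)


def total_prior(n):
    """Total prior mass on the n simplest hypotheses."""
    return sum(prior(k) for k in range(1, n + 1))


for n in (1, 2, 3, 10, 50):
    print(n, prior(n), float(total_prior(n)))
```

Every hypothesis gets an ordinary positive real number, and the running total approaches 1 without ever exceeding it.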
It seems like people are presenting a moving target. First I was directed to one essay. In response to my criticism of a statement from that essay, you suggest that a different technique other than the one I quoted could work. Do you think I was right that the section of the essay I quoted doesn’t solve the problem?
I am aware that you can assign probabilities to infinite sets in the way you mention. This is beside the point. If you get the probabilities above infinitesimal by doing that it’s simply a different method than the one I was commenting on. The one I commented on, in which “hypotheses that assigned a higher likelihood to that evidence, gain probability” does not get them above infinitesimal or do anything very useful.
Some very general remarks:
You’re missing the point, which is that we need to act—we need to use the information we have as best we can in order to achieve ‘the greatest good’. (The question of what ‘the greatest good’ means is non-trivial but it’s orthogonal to present concerns.)
The agent chooses an action, and then depending on the state of world, the effects of the action are ‘good’ or ‘bad’. Here, the expression “the state of the world” incorporates both contingent facts about ‘how things are’ and the ‘natural laws’ describing how present causes have future effects.
Now, one very broad strategy for answering the question “what should we do?” is to try to break it down as follows:
1. We assign ‘weights’ p[i] to a wide variety of different ‘states of the world’, to represent the incomplete (but real) information we have thus far acquired.
2. For each such state, we calculate the effects that each of our actions a[j] would have, and assign ‘weights’ u[i,j] to the outcomes to represent how desirable we think they are.
3. We choose the action a[j] such that Sum(over i) p[i] * u[i,j] is maximized.
As a matter of terminology, we refer to the weights in step 1 as “probabilities” and those in step 2 as “utilities”.
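The three steps above can be sketched directly in code. The toy states, actions and numbers below are hypothetical and only there to make the sum concrete.

```python
def expected_utilities(p, u):
    """Return sum over i of p[i] * u[i][j] for each action j."""
    n_actions = len(u[0])
    return [sum(p[i] * u[i][j] for i in range(len(p))) for j in range(n_actions)]


def best_action(p, u):
    """Index j of the action with the highest expected utility."""
    scores = expected_utilities(p, u)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy problem: states are (rain, dry), actions are (umbrella, no umbrella).
p = [0.3, 0.7]
u = [
    [1.0, -5.0],   # rain: umbrella is fine, no umbrella is bad
    [-1.0, 2.0],   # dry: umbrella is a nuisance, no umbrella is pleasant
]
print(expected_utilities(p, u))
print(best_action(p, u))
```

Changing the weights in `p` changes which action wins, which is the sense in which the probabilities carry the agent's information about the world.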
Here’s an important question: “To what extent is the above procedure inevitable if we are to make rational decisions?”
The standard Lesswrong ideology here is that the above procedure (supplemented by Bayes’ theorem for updating ‘probability weights’) is absolutely central to ‘rationality’ - that any rational decision-maker must be following it, whether explicitly or implicitly.
It’s important to understand that Lesswrong’s discussions of rationality take place in the context of ‘thinking about how to design an artificial intelligence’. One of the great virtues of the Bayesian approach is that it’s clear what it would mean to implement it, and we can actually put it into practice on a wide variety of problems.
Anyway, if you want to challenge Bayesianism then you need to show how it makes irrational choices. It’s not sufficient to present a philosophical view under which assigning probabilities to theories is itself irrational, because that’s just a means to an end. What matters is whether an agent makes clever or stupid decisions, not how it gets there.
And now something more specific:
No-one but you ever assumed that the hypotheses would begin at infinitesimal probability. The idea that we need to “assign probabilities to infinite sets in the way [benelliot] mention[ed]” is so obvious and commonplace that you should assume it even if it’s not actually spelled out.
In your theory, do the probabilities of the infinitely many theories add up to 1?
Does increasing their probabilities ever change the ordering of theories which assigned the same probability to some evidence/event?
If all finite sets of evidence leave infinitely many theories unchanged in ordering, then would we basically be acting on the a priori conclusions built into our way of assigning the initial probabilities?
If we were, would that be rational, in your view?
And do you have anything to say about the regress problem?
The ‘moving target’ effect is caused by the fact that you are talking to several different people; the grandparent is my first comment in this discussion.
The concept mentioned in that essay is Bayes’ Theorem, which tells us how to update our probabilities on new evidence. It does not solve the problem of how to avoid infinitely many hypotheses, for the same reason that Newton’s laws do not explain the price of gold in London: it’s not supposed to. Bayes’ theorem tells us how to change our probabilities with new evidence, and in the process assumes that those probabilities are real numbers (as opposed to infinitesimals).
Solomonoff induction tells us how to assign the initial probabilities, which are then fed into Bayes’ theorem to determine the current probabilities after adjusting based on the evidence. Both are essential; criticising BT for not doing SI’s job is like saying a car’s wheels are useless because they can’t do the engine’s job of providing power.
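The division of labour described here (one rule supplies the prior, another updates it) can be shown with a toy stand-in. This is not Solomonoff induction, which is uncomputable; the sketch just assumes three hypotheses with made-up description lengths and made-up likelihoods.

```python
from fractions import Fraction


def complexity_prior(lengths):
    """Toy prior: weight 2**-length per hypothesis, normalised to sum to 1."""
    weights = [Fraction(1, 2 ** n) for n in lengths]
    total = sum(weights)
    return [w / total for w in weights]


def bayes_update(prior, likelihoods):
    """Posterior is proportional to prior times likelihood of the evidence."""
    unnormalised = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalised)
    return [x / total for x in unnormalised]


prior = complexity_prior([1, 2, 3])
likelihoods = [Fraction(1, 10), Fraction(4, 5), Fraction(4, 5)]
posterior = bayes_update(prior, likelihoods)
print(prior)
print(posterior)
```

The two hypotheses that predicted the evidence equally well keep the 2:1 ratio the prior gave them; the update alone never separates them, which is why the choice of prior matters.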
Does any of this deal with the infinite regress problem?
I’m sorry, what is the infinite regress problem?
I don’t see any infinite regress at all, Solomonoff Induction tells us the prior, Bayesian Updating turns the prior into a posterior. They depend on each other to work properly but I don’t think they depend on anything else (unless you wish to doubt the basics of probability theory).
The regress was discussed in other comments here. I took you to be saying “everything together, works” and wanting to discuss the philosophy as a whole.
I thought that would be more productive than arguing with you about whether Bayes theorem really “assumes that those probabilities are real numbers” and various other details. That’s certainly not what other people here told me when I brought up infinitesimals. I also thought it would be more productive than going back to the text I quoted and explaining why that quote doesn’t make sense. Whether it is correct or not isn’t very important if a better idea, along the same lines, works.
The regress argument begins like this: What is the justification or probability for Solomonoff Induction and Bayesian updating? Or if they are not justified, and do not have a probability, then why should we accept them in the first place?
When you say they don’t depend on anything else, maybe you are answering the regress issue by saying they are unquestionable foundations. Is that it?
Well, to some extent every system must have unquestionable foundations, even maths must assume the axioms. The principle of induction (the more something has happened in the past, the more likely it is to happen in the future, all else being equal), cannot be justified without the justification being circular, but I doubt you could get through a single day without it. Ultimately every approach must fall back on an infinite regress as you put it, this doesn’t prevent that system from working.
However, both Bayes’ Theorem and Solomonoff Induction can be justified:
Bayes’ Theorem is an elementary deductive consequence of basic probability theory, in particular of the fairly obvious (at least it seems that way to me) fact that P(A&B) = P(A)*P(B|A). If it doesn’t seem obvious to you, then I know of at least two approaches for proving it. One is the Cox theorems, which begin by saying we want to rank statements by their plausibility, and we want certain things to be true of this ranking (it must obey the laws of logic, it must treat hypotheses consistently, etc.), and from these derive probability theory.
Another approach is the Dutch Book arguments, which show that if you are making bets based on your probability estimates of certain things being true, then unless your probability estimates obey Bayes’ Theorem you can be tricked into a set of bets which result in a guaranteed loss.
To justify Solomonoff Induction, we imagine a theoretical predictor which bases its prior on Solomonoff Induction and updates by Bayes’ Theorem. Given any other predictor, we can compare our predictor to this opponent by comparing the probability estimates they assign to the actual outcome. Solomonoff induction will then at worst lose by a constant factor based on the complexity of the opponent.
This is the best that can be demanded of any prior; it is impossible to give perfect predictions in every possible universe, since you can always be beaten by a predictor tailor-made for that universe (which will generally perform very badly in most others).
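A minimal Dutch Book can be sketched as follows. It illustrates the simplest kind of incoherence (credences in A and not-A that sum to more than 1) rather than a violation of Bayes' Theorem specifically, and it assumes the agent will buy any bet priced at its credence times the stake. The function name is invented for the example.

```python
from fractions import Fraction


def net_payoffs(credence_a, credence_not_a, stake=Fraction(1)):
    """Agent's net result after buying one bet on A and one bet on not-A.

    Each bet costs credence * stake and pays out stake if it wins.
    Returns a dict mapping the truth value of A to the agent's net gain.
    """
    bets = [(True, credence_a), (False, credence_not_a)]  # (outcome that wins, price per unit stake)
    results = {}
    for outcome in (True, False):
        paid = sum(price * stake for _, price in bets)
        won = sum(stake for wins_on, _ in bets if wins_on == outcome)
        results[outcome] = won - paid
    return results


print(net_payoffs(Fraction(3, 5), Fraction(3, 5)))  # incoherent: credences sum to 6/5
print(net_payoffs(Fraction(3, 5), Fraction(2, 5)))  # coherent: credences sum to 1
```

With the incoherent credences the agent loses the same amount whichever way A turns out; with coherent ones this particular pair of bets breaks even.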
(note: I am not an expert, it is possible that I have some details wrong, please correct me if I do)
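The "lose by at most a constant factor" claim has a simple finite analogue. The sketch below is a toy, not Solomonoff induction itself: it mixes three made-up coin predictors with made-up prior weights and checks that the mixture assigns the observed sequence at least (weight × rival's probability) for each rival.

```python
from fractions import Fraction


def sequence_probability(p_heads, bits):
    """Probability a Bernoulli(p_heads) predictor assigns to the whole sequence."""
    prob = Fraction(1)
    for bit in bits:
        prob *= p_heads if bit == 1 else 1 - p_heads
    return prob


def mixture_probability(weights, predictors, bits):
    """Probability assigned by the prior-weighted mixture of all predictors."""
    return sum(w * sequence_probability(p, bits) for w, p in zip(weights, predictors))


predictors = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # prior weights, sum to 1
bits = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

mix = mixture_probability(weights, predictors, bits)
for w, p in zip(weights, predictors):
    rival = sequence_probability(p, bits)
    # The mixture is never worse than the rival by more than a factor of 1/w.
    print(float(p), mix >= w * rival)
```

The bound holds for any sequence because the mixture is a sum of non-negative terms, one of which is the rival's own contribution. In the real construction the weight shrinks with the rival's complexity, which is where the complexity-dependent constant comes from.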
“Well, to some extent every system must have unquestionable foundations”
No, Popper’s epistemology does not have unquestionable foundations.
You doubt I could get by without induction, but I can and do. Popper’s epistemology has no induction. It also has no regress.
Arguing that there is no choice but these imperfect concepts only works if there really is no choice. But there are alternatives.
I think that things like unquestionable foundations, or an infinite regress, are flaws. I think we should reject flawed things when we have better options. And I think Bayesian Epistemology has these flaws. Am I going wrong somewhere?
“However, both Bayes’ Theorem and Solomonoff Induction can be justified”
Justified by statements which are themselves justified (which leads to regress issues)? Or you mean justified given some unquestionable foundations? In your statements below, I don’t think you specify precisely what you deem to be able to justify things.
“Bayes’ Theorem is an elementary deductive consequence of basic probability theory”
Yes. It is not controversial itself. What I’m not convinced of is the claim that this basic bit of math solves any major epistemological problem.
Regarding Solomonoff induction, I think you are now attempting to justify it by argument. But you haven’t stated what are the rules for what counts as a good argument and why. Could you specify that? There’s not enough rigor here. And in my understanding Bayesian epistemology aims for rigor and that is one of the reasons they like math and try to use math in their epistemology. It seems to me you are departing from that worldview and its methods.
Another aspect of the situation is you have focussed on prediction. That is instrumentalist. Epistemologies should be able to deal with all categories of knowledge, not just predictive knowledge. For example they should be able to deal with creating non-emprical, non-predictive moral knowledge. Can Solomonoff induction do that? How?
Hang on, Popper’s philosophy doesn’t depend on any foundations? I’m going to call shenanigans on this. Earlier you gave an example of Popperian inference:
Unquestioned assumptions include, but are not limited to the following:
The objects under discussion actually exist (Solomonoff Induction does not make this assumption)
“There is no evidence which could prove T” is stated without any proof. What if you got all the swans in one place? What if you found a reason why the existence of a black swan was impossible?
Any observation of a black swan must be correct (Bayes Theorem is explicitly designed to avoid this assumption)
You can generalise from this one example to a point about all theories
“Science is only interested in universal theories”. Really? Are palaeontology and astronomy not sciences? They are both often concerned with specifics.
You must always begin with assumptions, if nothing else you must assume maths (which is pretty much the only thing that Bayes Theorem and Solomonoff Induction do assume).
To be perfectly honest I care more about getting results in the real world than having some mythical perfect philosophy which can be justified to a rock.
Stating that you believe Bayes’ theorem but doubt that it can actually solve epistemic problems is like saying you believe Pythagoras’ theorem but doubt it can actually tell you the side lengths of right angled triangles; it demonstrates a failure to internalise.
Bayes’ theorem tells you how to adjust beliefs based on evidence, every time you adjust your beliefs you must use it, otherwise your map will not reflect the territory.
Does Popper not argue for his own philosophy, or does he just state it and hope people will believe him?
You cannot set up rules for arguments which are not themselves backed up by argument. Any argument will be convincing to some possible minds and not to others, and I’m okay with that, because I only have one mind.
Allow me to direct you to my all time favourite philosopher
That “Popperian inference” is simply logic.
Deductive arguments have premises, as you say.
Popper’s philosophy itself is not a deductive argument which depends on the truth of its premises and which, given their truth, is logically indisputable.
We’re well aware of issues like the fallibility of evidence (you may think you see a black swan, but didn’t). Those do not contradict this logical point about a particular asymmetry.
“You must always begin with assumptions”
No you don’t have to. Popper’s approach begins with conjectures. None of them are assumed, they are simply conjectured.
Here’s an example. You claim this is an assumption:
“You can generalise from this one example to a point about all theories”
In a Popperian approach, that is not assumed. It is conjectured. It is then open to critical debate. Do you see something wrong with it? Do you have an argument against it? Conjectures can be refuted by criticism.
BTW Popper wasn’t “generalizing”. He was making a point about all theories (in particular categories) in the first place and then illustrating it second. “Generalizing” is a vague and problematic concept.
“Does Popper not argue for his own philosophy, or does he just state it and hope people will believe him?
You cannot set up rules for arguments which are not themselves backed up by argument.”
He argues, but without setting up predefined, static rules for argument. The rules for argument are conjectured, criticized, modified. They are a work in progress.
Regarding the Hume quote, are you saying you’re a positivist or similar?
“Bayes’ theorem tells you how to adjust beliefs based on evidence, every time you adjust your beliefs you must use it, otherwise your map will not reflect the territory.”
Only probabilistic beliefs. I think it is only appropriate to use when you have actual numbers instead of simply having to assign them to everything involved by estimating.
“To be perfectly honest I care more about getting results in the real world than having some mythical perfect philosophy which can be justified to a rock.”
Mistakes have real world consequences. I think Popper’s epistemology works better in the real world. Everyone thinks their epistemology is more practical. How can we decide? By looking at whether they make sense, whether they are refuted by criticism, etc… If you have a practical criticism of Popperian epistemology you’re welcome to state it.
I agree with that.
How does this translate into illustrating whether either epistemology has “real world consequences”? Criticism and “sense making” are widespread, varied, and not always valuable.
I think what would be most helpful is if you set up a hypothetical example and then proceeded to show how Popperian epistemology would lead to a success while a Bayesian approach would lead to a “real world consequence.” I think your question, “How can we decide?” was perfect, but I think your answer was incorrect.
Example: we want to know if liberalism or socialism is correct.
Popperian approach: consider what problem the ideas in question are intended to solve and whether they solve it. They should explain how they solve the problem; if they don’t, reject them. Criticize them. If a flaw is discovered, reject them. Conjecture new theories also to solve the problem. Criticize those too. Theories similar to rejected theories may be conjectured; and it’s important to do that if you think you see a way to not have the same flaw as before. Some more specific statements follow:
Liberalism offers us explanations such as: voluntary trade is mutually beneficial to everyone involved, and harms no one, so it should not be restricted. And: freedom is compatible with a society that makes progress because as people have new ideas they can try them out without the law having to be changed first. And: tolerance of people with different ideas is important because everyone with an improvement on existing customs will at first have a different idea which is unpopular.
Socialism offers explanations like, “People should get what they need, and give what they are able to” and “Central planning is more efficient than the chaos of free trade.”
Socialism’s explanations have been refuted by criticisms like Mises’s 1920 paper which explained that central planners have no rational way to plan (in short: because you need prices to do economic calculation). And “need” has been criticized, e.g. how do you determine what is a need? And the concept of what people are “able to give” is also problematic. Of course the full debate on this is very long.
Many criticisms of liberalism have been offered. Some were correct. Older theories of liberalism were rejected and new versions formulated. If we consider the best modern version, then there are currently no outstanding criticisms of it. It is not refuted, and it has no rivals with the same status. So we should (until this situation changes) accept and use liberalism.
New socialist ideas were also created many times in response to criticism. However, no one has been able to come up with coherent ideas which address all the criticisms and still reach the same conclusions (or anything even close).
Liberalism’s “justification” is merely this: it is the only theory we do not currently have a criticism of. A criticism is an explanation of what we think is a flaw or mistake. It’s a better idea to use a theory we don’t see anything wrong with than one we do. Or in other words: we should act on our best (fallible) knowledge that we have so far. In this way, the Popperian approach doesn’t really justify anything in the normal sense, and does without foundations.
Bayesian approach: Assign them probabilities (how?), try to find relevant evidence to update the probabilities (this depends on more assumptions), ignore that whenever you increase the probability of liberalism (say) you should also be increasing the probability of infinitely many other theories which made the same empirical assertions. Halt when—I don’t know. Make sure the evidence you update with doesn’t have any bias by—I don’t know, it sure can’t be a random sample of all possible evidence.
No doubt my Bayesian approach was unfair. Please correct it and add more specific details (e.g. what prior probability does liberalism have, what is some evidence to let us update that, what is the new probability, etc...)
PS is it just me or is it difficult to navigate long discussions and to find new nested posts? And I wasn’t able to find a way to get email notification of replies.
I’m beginning to see where the problem in this debate is coming from.
Bayesian humans don’t always assign actual probabilities, I almost never do. What we do in practice is vaguely similar to your Popperian approach.
The main difference is that we do thought experiments about Ideal Bayesians, strange beings with the power of logical omniscience (which gets them round the problem of Solomonoff Induction being uncomputable), and we see which types of reasoning might be convincing to them, and use this as a standard for which types are legitimate.
Even this might in practice be questioned: if someone showed me a thought experiment in which an ideal Bayesian systematically arrived at worse beliefs than some competitor, I might stop being a Bayesian. I can’t tell you what I would use as a standard in this case, since if I could predict that theory X would turn out to be better than Bayesianism I would already be an X theorist.
Popperian reasoning, on the other hand, appears to use human intuition as its standard. The conjectures he makes ultimately come from his own head, and inevitably they will be things that seem intuitively plausible to him. It is also his own head which does the job of evaluating which criticisms are plausible. He may bootstrap himself up into something that looks more rigorous, but ultimately if his intuitions are wrong he’s unlikely to recover from it. The intuitions may not be unquestioned but they might as well be for all the chance he has of getting away from their flaws.
Cognitive science tells us that our intuitions are often wrong. In extreme cases they contradict logic itself, in ways that we rarely notice. Thus they need to be improved upon, but to improve upon them we need a standard to judge them by, something where we can say “I know this heuristic is a cognitive bias because it tells us Y when the correct answer is in fact X”. A good example of this is conjunction bias, conjunctions are often more plausible than disjunctions to human intuition, but they are always less likely to be correct, and we know this through probability theory.
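The probability-theory fact being appealed to, that a conjunction can never be more probable than either of its parts, can be checked on any joint distribution. The table below uses hypothetical numbers chosen only for illustration.

```python
def marginals_and_conjunction(joint):
    """Return (P(A), P(B), P(A and B)) from a joint table keyed by (a, b)."""
    p_a = sum(p for (a, _), p in joint.items() if a)
    p_b = sum(p for (_, b), p in joint.items() if b)
    return p_a, p_b, joint[(True, True)]


# Hypothetical numbers: A = "is a bank teller", B = "is active in a political movement".
joint = {
    (True, True): 0.04,
    (True, False): 0.01,
    (False, True): 0.76,
    (False, False): 0.19,
}
p_a, p_b, p_ab = marginals_and_conjunction(joint)
print(p_a, p_b, p_ab)
print(p_ab <= p_a and p_ab <= p_b)
```

Because P(A) is the conjunction's probability plus a non-negative remainder, the inequality holds no matter what numbers go in the table, however plausible the combined story feels.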
So here’s how a human Bayesian might look, this approach only reflects the level of Bayesian strength I currently have, and can definitely be improved upon.
We wouldn’t think in terms of Liberalism and Socialism, both of them are package deals containing many different epistemic beliefs and prescriptive advice. Conjunction bias might fool you into thinking that one of them is probably right, but in fact both are astonishingly unlikely.
We hold off on proposing solutions (scientifically proven to lead to better solutions) and instead just discuss the problem. We clearly frame exactly what our values are in this situation, possibly in the form of a precisely delineated utility function and possibly not, so we know what we are trying to achieve.
We attempt to get our facts straight. Each fact is individually analysed, to see whether we have enough evidence to overcome its complexity. This process continues permanently, every statement is evaluated.
We then suggest policies which seem likely to satisfy our values, and calculate which one is likely to do so best.
I’m not sure there’s actually a difference between the two approaches; ultimately I only arrived at Bayesianism through my intuitions as well, so there is no difference at the foundations. Bayesianism is just Popperianism done better.
PS there is a little picture of an envelope next to your name and karma score in the right hand corner. It turns red when one of your comments has a reply. Click on it to see the most recent replies to your comments.
No. It does not have a fixed standard. Fixed standards are part of the justificationist attitude which is a mistake which leads to problems such as regress. Justification isn’t possible and the idea of seeking it must be dropped.
Instead, the standard should use our current knowledge (the starting point isn’t very important) and then change as people find mistakes in it (no matter what standard we use for now, we should expect it to have many mistakes to improve later).
Popperian epistemology has no standard for conjectures. The flexible, tentative standard is for criticism, not conjecture.
The “work”—the sorting of good ideas from bad—is all done by criticism and not by rules for how to create ideas in the first place.
You imply that people are parochial and biased and thus stuck. First, note the problems you bring up here are for all epistemologies to deal with. Having a standard you tell everyone to follow does not solve them. Second, people can explain their methods of criticism and theory evaluation to other people and get feedback. We aren’t alone in this. Third, some ways (e.g. having less bias) as a matter of fact work better than others, so people can get feedback from reality when they are doing it right, plus it makes their life better (incentive). More could be said. Tell me if you think it needs more (why?).
“I know this heuristic is a cognitive bias because it tells us Y when the correct answer is in fact X”
I think by “know” here you are referring to the justified, true belief theory of knowledge. And you are expecting that the authority or certainty of objective knowledge will defeat bias. This is a mistake. Like it or not, we cannot ever have knowledge of that type (e.g. b/c justification attempts lead to regress). What we can have is fallible, conjectural knowledge. This isn’t bad; it works fine; it doesn’t devolve into everyone believing their bias.
Liberalism is not a package by accident. It is a collection of ideas around one theme. They are all related and fit together. They are less good in isolation—e.g. if you take away one idea you’ll find that now one of the other ideas has an unsolved and unaddressed problem. It is sometimes interesting to consider the ideas individually but to a significant extent they all are correct or incorrect as a group.
The way I’m seeing it is that most of the time you (and everyone else) do something roughly similar to what Popper said to. This isn’t a surprise b/c most people do learn stuff and that is the only method possible of creating any knowledge. But when you start using Bayesian philosophy more directly, by e.g. explicitly assigning and updating probabilities to try to settle non-probabilistic issues (like moral issues), then you start making mistakes. You say you don’t do that very often. OK. But there are other, more subtle ones. One is what Popper called The Myth of the Framework where you suggest that people with different frameworks (initial biases) will both be stuck on thinking that what seems right to them (now) is correct and won’t change. And you suggest the way past this is, basically, authoritative declarations where you put someone’s biases against Truth so he has no choice but to recant. This is a mistake!
PS wow that inbox page is helpful… :-)
To some extent our thought processes can certainly improve, however there is no guarantee of this, let me give an example to illustrate:
Alice is an inductive thinker; in general she believes that if something has happened often in the past it is more likely to happen in the future. She does not treat this as an absolute: it is only probabilistic, and it does not work in certain specific situations (such as pulling beads out of a jar with 5 red and 5 blue beads), but she used induction to discover which situations those were. She is not particularly worried that induction might be wrong; after all, it has almost always worked in the past.
Bob is an anti-inductive thinker; he believes that the more often something happens, the less likely it is to happen in the future. To him, the universe is like a giant bag of beads, and the more something happens the more depleted the universe’s supply of it becomes. He also concedes that anti-induction is merely probabilistic, and there are certain situations (the bag of beads example) where it has already worked a few times, so he doesn’t think it’s very likely to work now. He isn’t particularly worried that he might be wrong; anti-induction has almost never worked for him before, so he must be set up for the winning streak of a lifetime.
Ultimately, neither will ever be convinced of the other’s viewpoint. If Alice conjectures anti-induction then she will immediately have a knock-down criticism, and vice versa for Bob and Induction. One of them has an irreversibly flawed starting point.
Like it or not, you, me, Popper and every other human is an Alice. If you don’t believe me, just ask which of the following criticisms seems more logically appealing to you:
“Socialism has never worked in the past, every socialist state has turned into a nightmarish tyranny, so this country shouldn’t become socialist”
“Liberalism has usually worked in the past, most liberal democracies are wealthy and have the highest standards of living in human history, so this country shouldn’t become liberal”
This might be correct, but there is a heavy burden of proof to show it. Liberalism and Socialism are two philosophies out of thousands (maybe millions) of possibilities. This means that you need huge amounts of evidence to distinguish the two of them from the crowd and comparatively little evidence to distinguish one from the other.
That is a recipe for disaster. There are too many possible conjectures, we cannot consider them all, we need some way to prioritise some over others. If you do not specify a way then people will just do so according to personal preference.
As I see it, Popperian reasoning is pretty much the way humans reason naturally, and you only have to look at any modern political debate to see why that’s a problem.
Yes, there is no guarantee. One doesn’t need a guarantee for something to happen. And one can’t have guarantees about anything, ever. So the request for guarantees is itself a mistake.
The sketches you give of Bob and Alice are not like real people. They are simplified and superficial, and people like that could not function in day to day life. The situation with normal people is different. No everyday people have irreversibly flawed starting points.
The argument for this is not short and simple, but I can give it. First I’d like to get clear what it means, and why we would be discussing it. Would you agree that if my statement here is correct then Popper is substantially right about epistemology? Would you concede? If not, what would you make of it?
That is a misconception. One of its prominent advocates was Hume. We do not dispute things like this out of ignorance, out of never hearing it before. One of the many problems with it is that people can’t be like Alice because there is no method of induction—it is a myth that one could possibly do induction because induction doesn’t describe a procedure a person could do. Induction has no set of instructions to follow to offer.
That may sound strange to you. You may think it offers a procedure like:
1) gather data 2) generalize/extrapolate (induce) a conclusion from the data 3) the conclusion is probably right, with some exceptions
The problem is step 2, which does not say how to extrapolate a conclusion from a set of data. There are infinitely many conclusions consistent with any finite data set. So the entire procedure rests on having a method of choosing between them. All proposals made for this either don’t work or are vague. The one I would guess you favor is Occam’s Razor: pick the simplest one. This is both vague (what are the precise guidelines for deciding what is simpler?) and wrong (under many interpretations, for example because it might reject all explanatory theories b/c omitting the explanation is simpler).
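To make the "infinitely many conclusions" point concrete, here is a minimal sketch in Python. The data points and the family of rival hypotheses are invented for illustration and come from nobody in this thread; the only point is that every member of the family agrees with all the observations yet predicts something different for the next one.

```python
# Hypothetical observations (x, y), invented for illustration.
data = [(0, 0), (1, 1), (2, 2), (3, 3)]


def linear(x):
    """Hypothesis A: y = x."""
    return x


def make_rival(c):
    """Hypothesis B_c: y = x + c * x(x-1)(x-2)(x-3).

    The extra term is zero at every observed x, so each choice of c
    yields a different theory that still matches all of the data.
    """
    def rival(x):
        return x + c * x * (x - 1) * (x - 2) * (x - 3)
    return rival


def fits(hypothesis, observations):
    """True if the hypothesis reproduces every observation exactly."""
    return all(hypothesis(x) == y for x, y in observations)


rivals = [make_rival(c) for c in (1, -2, 10)]

# Every hypothesis is consistent with the data gathered so far...
all_fit = fits(linear, data) and all(fits(r, data) for r in rivals)

# ...but they disagree about the first unobserved point, x = 4.
predictions_at_4 = [linear(4)] + [r(4) for r in rivals]

print(all_fit, predictions_at_4)
```

Since `c` can be any number, the data alone never narrows the field to one theory; whatever picks `linear` over the rivals has to come from somewhere other than step 1.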
Another issue is how one thinks about things he has no past experience about. Induction does not answer that. Yet people do it.
I think they are both terrible arguments and they aren’t how I think about the issue.
The “burden of proof” concept is a justificationist mistake. Ideas cannot be proven (which violates fallibility) and they can’t be positively shown to be true. You are judging Popperian ideas by standards which Popper rejected which is a mistake.
But it works in practice. The reason it doesn’t turn into a disaster is people want to find the truth. They aren’t stopped from making a mess of things by authoritative rules but by their own choices because they have some understanding of what will and won’t work.
The authority based approach is a mistake in many ways. For example, authorities can themselves be mistaken and could impose disasters on people. And people don’t always listen to authority. We don’t need to try to force people to follow some authoritative theory to make them think properly, they need to understand the issues and do it voluntarily.
Personal preferences aren’t evil, and imposing what you deem the best preference as a replacement is an anti-liberal mistake.
No. Since Aristotle, justificationism has dominated philosophy and governs the unconscious assumptions people make in debates. They do not think like Popperians or understand Popper’s philosophy (except to the extent that some of their mental processes are capable of creating knowledge, and those have to be in line with the truth of the matter about what does create knowledge).
Since I’m not familiar with the whole of Popper’s position I’m not going to accept it blindly. I’m also not even certain that he’s incompatible with Bayesianism.
Anyway, the fact that no human has a starting point as badly flawed as anti-induction doesn’t make Bayesianism invalid. It may well be that we are just very badly flawed, and can only get out of those flaws by taking the mathematically best approach to truth. This is Bayesianism; it has been proven in more than one way.
This is exactly why we need induction. It is usually possible to stick any future onto any past and get a consistent history; induction tells us that if we want a probable history we need to make the future and the past resemble each other.
People certainly say that. Most of them even believe it on a conscious level, but in your average discussion there is a huge amount of other stuff going on, from signalling tribal loyalty to rationalising away unpleasant conclusions. You will not wander down the correct path by chance; you must use a map and navigate.
I have no further interest in talking with you if you resort to straw men like this. I am not proposing we set up a dictatorship and kill all non-Bayesians, nor am I advocating censorship of views opposed to the correct Bayesian conclusion.
All I am saying is your mind was not designed to do philosophical reasoning. It was designed to chase antelope across the savannah, lob a spear in them, drag them back home to the tribe, and come up with an eloquent explanation for why you deserve a bigger share of the meat (this last bit got the lion’s share of the processing power).
Your brain is not well suited to abstract reasoning, it is a fortunate coincidence that you are capable of it at all. Hopefully, you are lucky enough to have a starting point which is not irreversibly flawed, and you may be able to self improve, but this should be in the direction of realising that you run on corrupt hardware, distrusting your own thoughts, and forcing them to follow rigorous rules. Which rules? The ones that have been mathematically proven to be the best seem like a good starting point.
(The above is not intended as a personal attack, it is equally true of everyone)
I did not say it makes Bayesianism invalid. I said it doesn’t make Popperism invalid or require epistemological pessimism. You were making myth of the framework arguments against Popper’s view. My comments on those were not intended to refute Bayesianism itself.
That is a mistake and Popper’s approach is superior.
Part 1: It is a mistake because the future does not resemble the past except in some vacuous senses. Why? Because stuff changes. For example an object in motion moves to a different place in the future. And human societies invent new technologies.
It is always the case that some things resemble the past and some don’t. And the guideline that “the future resembles the past” gives no guidance whatsoever in figuring out which are which.
Popper’s approach is to improve our knowledge piecemeal by criticizing mistakes. The primary criticisms of this approach are that it is incapable of offering guarantees, authority, justification, a way to force people to go against their biases, etc. These criticisms are mistaken: no viable theory offers what they want. Setting aside those objections (that Popper doesn’t meet standards too high for anything to meet), it works and is how we make progress.
Regarding people wanting to find the truth, indeed they don’t always. Sometimes they don’t learn. Telling them they should be Bayesians won’t change that either. What can change it is sorting out the mess of their psychology enough to figure out some advice they can use. BTW the basic problem you refer to is static memes, the theory of which David Deutsch explains in his new (Popperian) book The Beginning of Infinity.
Please calm down. I am trying my best to explain clearly. If I think that some of your ideas have nasty consequences that doesn’t mean I’m trying to insult you. It could be the case that some of your ideas actually do have nasty consequences of which you are unaware, and that by pointing out some of the ways your ideas relate to some ideas you consciously deem bad, you may learn better.
All justificationist epistemologies have connections to authority, and authority has nasty connections to politics. You hold a justificationist epistemology. When it comes down to it, justification generally consists of authority. And no amount of carefully deciding what is the right thing to set up as that authority changes that.
This connects to one of Popper’s political insights, which is that most political theories focus on the problem “Who should rule?” (or: what policies should rule?). This question is a mistake which begs for an authoritarian answer. The right question is a fallibilist one: how can we set up political institutions that help us find and fix errors?
Getting back to epistemology, when you ask questions like, “What is the correct criterion for induction to use in step 2 to differentiate between the infinity of theories?” that is a bad question which begs for an authoritarian answer.
My mind is a universal knowledge creator. What design could be better? I agree with you that it wasn’t designed for this in the sense that evolution doesn’t have intentions, but I don’t regard that as relevant.
Evolutionary psychology contains mistakes. I think discussion of universality is a way to skip past most of them (when universality is accepted, they become pretty irrelevant).
I’d urge you to read The Beginning of Infinity by David Deutsch which refutes this. I can give the arguments but I think reading it would be more efficient and we have enough topics going already.
See! I told you the authoritarian attitude was there!
And there is no mathematical proof of Bayesian epistemology. Bayes’ theorem itself is a bit of math/logic which everyone accepts (including Popper of course). But Bayesian epistemology is an application of it to certain philosophical questions, which leaves the domain of math/logic, and there is no proof that application is correct.
I know. My comments weren’t either.
The object in motion moves according to the same laws in both the future and the past, in this sense the future resembles the past. You are right that the future does not resemble the past in all ways, but the ways in which it does themselves remain constant over time. Induction doesn’t apply in all cases but we can use induction to determine which cases it applies in and which it doesn’t. If this looks circular that’s because it is, but it works.
As far as Bayesianism is concerned this is a straw man. Most Bayesians don’t offer any guarantees in the sense of absolute certainty at all.
No Bayesian has ever proposed setting up some kind of Bayesian dictatorship. As far as I can tell the only governmental proposal based on Bayesianism thus far is Hanson’s futarchy, which could hardly be further from Authoritarianism.
You misunderstand me. What I meant was that as a Bayesian I force my own thoughts to follow certain rules. I don’t force other people to do so. You are arguing from a superficial resemblance. Maths follows rigorous, unbreakable rules, does this mean that all mathematicians are evil fascists?
Incorrect. E.T. Jaynes’ book Probability Theory: The Logic of Science gives a proof in the first two chapters.
You obviously haven’t read much of the heuristics and biases program. I can’t describe it all very quickly here but I’ll just give you a taster.
Subjects asked to rank statements about a woman called Jill in order of probability of being true ranked “Jill is a feminist and a bank teller” as more probable than “Jill is a bank teller” despite this being logically impossible.
U.N. diplomats, when asked to guess the probabilities of various international events occurring in the next year, gave a higher probability to “USSR invades Poland causing complete cessation of diplomatic activities between USA and USSR” than they did to “Complete cessation of diplomatic activities between USA and USSR”.
Subjects who are given a handful of evidence and arguments for both sides of some issue, and asked to weigh them up, will inevitably conclude that the weight of the evidence given is in favour of their side. Different subjects will interpret the same evidence to mean precisely opposite things.
Employers can have their decision about whether to hire someone changed by whether they held a warm coffee or a cold coke in the elevator prior to the meeting.
People can have their opinion on an issue like nuclear power changed by a single image of a smiley or frowny face, flashed too briefly for conscious attention.
People’s estimates of the number of countries in Africa can be changed simply by telling them a random number beforehand, even if it is explicitly stated that this number has nothing to do with the question.
Students asked to estimate a day by which they are 99% confident their project will be finished, go past this day more than half the time.
People are more likely to move to a town if the town’s name and their name begin with the same letter.
There’s a lot more, most of which can’t easily be explained in bullet form. Suffice to say these are not irrelevant to thinking, they are disastrous. It takes constant effort to keep them back, because they are so insidious you will not notice when they are influencing you.
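For the first two items in the list above, the rule being violated is that a conjunction can never be more probable than either of its parts: P(A and B) ≤ P(A). A minimal sketch in Python, using an entirely made-up joint distribution (the numbers are assumptions chosen only so they sum to 1, not figures from any study):

```python
from fractions import Fraction

# Hypothetical joint distribution over (is_bank_teller, is_feminist).
# These numbers are invented for illustration only.
joint = {
    (True, True): Fraction(5, 100),
    (True, False): Fraction(2, 100),
    (False, True): Fraction(60, 100),
    (False, False): Fraction(33, 100),
}
total = sum(joint.values())

# P(bank teller and feminist) is a single cell of the table.
p_both = joint[(True, True)]

# P(bank teller) adds the "not feminist" cell on top of that one,
# so it can never be smaller than p_both.
p_teller = sum(p for (teller, _), p in joint.items() if teller)

print(total, p_both, p_teller)
```

However the four cells are filled in, `p_teller` is `p_both` plus a non-negative number, which is why ranking the conjunction higher cannot be right.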
Replied here:
http://lesswrong.com/r/discussion/lw/54u/bayesian_epistemology_vs_popper/
Would you agree that this is a bit condescending and you’re basically assuming in advance that you know more than me?
I actually have read about it and disagree with it on purpose, not out of ignorance.
Does that interest you?
And on the other hand, do you know anything about universality? You made no comment about that. Given that I said the universality issue trumps the details you discuss in your bullet points, and you didn’t dispute that, I’m not quite sure why you are providing these details, other than perhaps a simple assumption that I had no idea what I was talking about and that my position can be ignored without reply because, once my deep ignorance is addressed, I’ll forget all about this Popperian nonsense.
Ordered but there’s an error in the library system and I’m not sure if it will actually come or not. I don’t suppose the proof is online anywhere (I can access major article databases), or that you could give it or an outline? BTW I wonder why the proof takes 2 chapters. Proofs are normally fairly short things. And, well, even if it was 100 pages of straight math I don’t see why you’d break it into separate chapters.
No I understood that. And that is authoritarian in regard to your own thoughts. It’s still a bad attitude even if you don’t do it to other people. When you force your thoughts to follow certain rules all the epistemological problems with authority and force will plague you (do you know what those are?).
Regarding Popper, you say you don’t agree with the common criticisms of him. OK. Great. So, what are your criticisms? You didn’t say.
If there was an epistemology that didn’t endorse circular arguments, would you prefer it over yours which does?
I apologise for this, but I really don’t see how anyone could go through those studies without losing all faith in human intuition.
The text can be found online. My browser (Chrome) wouldn’t open the files but you may have more luck.
Part of the reason for length is that probability theory has a number of axioms and he has to prove them all. The reason for the two chapter split is that the first chapter is about explaining what he wants to do, why he wants to do it, and laying out his desiderata. It also contains a few digressions in case the reader isn’t familiar with one or more of the prerequisites for understanding it (propositional logic for example). All of the actual maths is in the second chapter.
I agree to the explicit meaning of this statement but you are sneaking in connotations. Let us look more closely at what ‘authoritarian’ means.
You probably mean it in the sense of centralised as opposed to decentralized control, and in that sense I will bite the bullet and say that thinking should be authoritarian.
However, the word has a number of negative connotations. Corruption, lack of respect for human rights and massive bureaucracy that stifles innovation to name a few. None of those apply to my thinking process, so even though the term may be technically correct it is somewhat intellectually dishonest to use it, something more value-neutral like ‘centralized control’ might be better.
I will confess that I am not familiar with the whole of Popper’s viewpoint. I have never read anything written by him although after this conversation I am planning to.
Therefore I do not know whether I broadly agree or disagree with him. I did not come here to attack him; originally I was just responding to a criticism of yours that Bayesianism fails in a certain situation.
To some extent I think the approach with conjectures and criticisms may be correct, at least as a description of how thinking must get off the ground. Can you be a Popperian and conjecture Bayesianism?
The point that I do disagree with is the proposed asymmetry between confirmation and falsification. In my view neither the black swan nor the white swan proves anything with certainty, but both do provide some evidence. It happens in this case that one piece of evidence is very strong while the other is very weak; in fact they are pretty much at opposite extremes of the full spectrum of evidence encountered in the real world. This does not mean there is a difference of type.
All else being equal, yes. Other factors, such as real-world results, might take precedence. I also doubt that any philosophy could manage without either circularity or assumptions, explicit or otherwise. As I see it, when you start thinking you need something to begin your inference; logic derives truths from other truths, it cannot manufacture them out of a vacuum. So any philosophy has two choices:
Either, pick a few axioms, call them self evident and derive everything from them. This seems to work fairly well in pure maths, but not anywhere else. I suspect the difference lies in whether the axioms really are self evident or not.
Or, start out with some procedures for thinking. All claims are judged by these, including proposals to change the procedures for thinking. Thus the procedures may self-modify and will hopefully improve. This seems better to me, as long as the starting point passes a certain threshold of accuracy any errors are likely to get removed (the phrase used here is the Lens that Sees its Flaws). It is ultimately circular, since whatever the current procedures are they are justified only by themselves, but I can live with that.
Ideal Bayesians are of the former type, but they can afford to be as they are mathematically perfect beings who never make mistakes. Human Bayesians take the latter approach, which means in principle they might stop being Bayesians if they could see that for some reason it was wrong.
So I guess my answer is that if a position didn’t endorse circular arguments, I would be very worried that it is going down the unquestionable axioms route, even if it does not do so explicitly, so I would probably not prefer it.
Notice how it is only through the benefits of the second approach that I can even consider such a scenario.
I’m not trying to argue by connotation. It’s hard to avoid connotations and I think the words I’m using are accurate.
That’s not what I had in mind, but I do think that centralized control is a mistake.
I take fallibilism seriously: any idea may be wrong, and many are. Mistakes are common.
Consequently, it’s a bad idea to set something up to be in charge of your whole mind. It will have mistakes. And corrections to those mistakes which aren’t in charge will sometimes get disregarded.
Those 3 things are not what I had in mind. But I think the term is accurate. You yourself used the word “force”. Force is authoritarian. The reason for that is that the forcer is always claiming some kind of authority—I’m right, you’re wrong, and never mind further discussion, just obey.
You may find this statement strange. How can this concept apply to ideas within one mind? Doesn’t it only apply to disagreements between separate people?
But ideas are roughly autonomous portions of a mind (see: http://fallibleideas.com/ideas). They can conflict, they can force each other in the sense of one taking priority over another without the conflict being settled rationally.
Force is a fundamentally epistemological concept. Its political meanings are derivative. It is about non-truth-seeking ways of approaching disputes. It’s about not reaching agreement but one idea winning out anyway (by force).
Settling conflicts between the ideas in your mind by force is authoritarian. It is saying some ideas have authority/preference/priority/whatever, so they get their way. I reject this approach. If you don’t find a rational way to resolve a conflict between ideas, you should say you don’t know the answer, never pick a side b/c the ideas you deem the central controllers are on that side, and they have the authority to force other ideas to conform to them.
This is a big topic, and not so easy to explain. But it is important.
Force, in the sense of solving difficulties without argument, is not what I meant when I said I force my thoughts to follow certain rules. I don’t even see how that could work, my individual ideas do not argue with each-other, if they did I would speak to a psychiatrist.
I’m afraid you are going to have to explain in more detail.
They argue notionally. They are roughly autonomous, they have different substance/assertions/content, sometimes their content contradicts, and when you have two or more conflicting ideas you have to deal with that. You (sometimes) approach the conflict by what we might call an internal argument/debate. You think of arguments for all the sides (the substance/content of the conflict ideas), you try to think of a way to resolve the debate by figuring out the best answer, you criticize what you think may be mistakes in any of the ideas, you reject ideas you decide are mistaken, you assign probabilities to stuff and do math, perhaps, etc...
When things go well, you reach a conclusion you deem to be an improvement. It resolves the issue. Each of the ideas which is improved on notionally acknowledges this new idea is better, rather than still conflicting. For example, if one idea was to get pizza, and one was to get sushi, and both had the supporting idea that you can’t get both because it would cost too much, or take too long, or make you fat, then you could resolve the issue by figuring out how to do it quickly, cheaply and without getting fat (smaller portions). If you came up with a new idea that does all that, none of the previously conflicting ideas would have any criticism of it, no objection to it. The conflict is resolved.
Sometimes we don’t come up with a solution that resolves all the issues cleanly. This can be due to not trying, or because it’s hard, or whatever.
Then what?
Big topic, but what not to do is use force: arbitrarily decide which side wins (often based on some kind of authority or justification), and declare it the winner even though the substance of the other side is not addressed. Don’t force some of your ideas, which have substantive unaddressed points, to defer to the ideas you put in charge (granted authority).
I certainly don’t advocate deciding arbitrarily. That would fall into the fallacy of just making sh*t up, which is the exact opposite of everything Bayes stands for. However, I don’t have to be arbitrary; most of the ideas that run up against Bayes don’t have the same level of support. In general, I’ve found that a heuristic of “pick the idea that has a mathematical proof backing it up” seems to work fairly well.
There are also sometimes other clues, rationalisations tend to have a slightly different ‘feel’ to them if you introspect closely (in my experience at any rate), and when the ideas going up against Bayes seem to include a disproportionately high number of rationalisations, I start to notice a pattern.
I also disagree about ideas being autonomous. Ideas are entangled with each other in complex webs of mutual support and anti-support.
Did you read my link? Where did the argument about approximately autonomous ideas go wrong?
Well this changes the topic. But OK. How do you decide what has support? What is support and how does it differ from consistency?
I did. To see what is wrong with it let me give an analogy. Cars have both engines and tyres. It is possible to replace the tyres without replacing the engine. Thus you will find many cars with very different tyres but identical engines, and many different engines but identical tyres. This does not mean that tyres are autonomous and would work fine without engines.
Well, mathematical proofs are support, and they are not at all the same as consistency. In general however, if some random idea pops into my head, and I spot that in fact it only occurred to me as a result of conjunction bias, I am not going to say “well, it would be unfair of me to reject this just because it contradicts probability theory, so I must reject both it and probability theory until I can find a superior compromise position”. Frankly, that would be stupid.
@autonomous—you know we said “approximately autonomous” right? And that, for various purposes, tires are approximately autonomous, which means things like they can be replaced individually without touching the engine or knowing what type of engine it is. And a tire could be taken off one car and put on another.
No one was saying it’d function in isolation. Just like a person being autonomous doesn’t mean they would do well in isolation (e.g. in deep space). Just because people do need to be in appropriate environments to function doesn’t make “people are approximately autonomous” meaningless or false.
First, you have not answered my question. What is support? The general purpose definition. I want you to specify how it is determined if X supports Y, and also what that means (why should we care? what good is “support”?).
Second, let’s be more precise. If a person writes what he thinks to be a proof, what is supported? What he thinks is the conclusion of what he thinks is a proof, and nothing else? An infinite set of things which have wildly different properties? Something else?
You argue from ideas being approximately autonomous to the fact that words like ‘authoritarian’ apply to them, and that they approximately debate, but this is not true in the car analogy. Is it ‘authoritarian’ that the brakes, accelerator and steering wheel have total control of the car, while the tyres and engine get no say, or is it just efficient?
I didn’t give a loose argument by analogy. You’re attacking a simplified straw man. I explained stuff at some length and you haven’t engaged here with all of what I said. e.g. your comments on “authoritarian” here do not mention or discuss anything I said about that. You also don’t mention force.
I haven’t got any faith in human intuition. That’s not what I said.
OK fair enough.
Oh the book is here: http://bayes.wustl.edu/etj/prob/book.pdf
That was easy.
I don’t know the etiquette or format of this website well or how it works. When I have comments on the book, would it make sense to start a new thread or post somewhere/somehow?
You can conjecture Bayes’ theorem. You can also conjecture all the rest, however some things (such as induction, justificationism, foundationalism) contradict Popper’s epistemology. So at least one of them has a mistake to fix. Fixing that may or may not lead to drastic changes, abandonment of the main ideas, etc
That is a purely logical point Popper used to criticize some mistaken ideas. Are you disputing the logic? If you’re merely disputing the premises, it doesn’t really matter because its purpose is to criticize people who use those premises on their own terms.
Agreed.
I think you are claiming that seeing a white swan is positive support for the assertion that all swans are white. (If not, please clarify). If so, this gets into important issues. Popper disputed the idea of positive support. The criticism of the concept begins by considering: what is support? And in particular, what is the difference between “X supports Y” and “X is consistent with Y”?
Questioning this was one of Popper’s insights. The reason most people doubt it is possible is because, since Aristotle, pretty much all epistemology has taken this for granted. These ideas seeped into our culture and became common sense.
What’s weird about the situation is that most people are so attached to them that they are willing to accept circular arguments, arbitrary foundations, or other things like that. Those are OK! But that Popper might have a point is hard to swallow. I find circular arguments rather more doubtful than doing without what Popperians refer to broadly as “justification”. I think it’s amazing that people run into circularity or other similar problems and still don’t want to rethink all their premises. (No offense intended. Everyone has biases, and if we try to overcome them we can become less wrong about some matters, and stating guesses at what might be biases can help with that.)
All the circularity and foundations stem from seeking to justify ideas. To show they are correct. Popper’s epistemology is different: ideas never have any positive support, confirmation, verification, justification, high probability, etc… So how do we act? How do we decide which idea is better than the others? We can differentiate ideas by criticism. When we see a mistake in an idea, we criticize it (criticism = explaining a mistake/flaw). That refutes the idea. We should act on or use non-refuted ideas in preference over refuted ideas.
That’s the very short outline, but does that make any sense?
Fully agreed. In principle, if Popper’s epistemology is of the second, self-modifying type, there would be nothing wrong with drastic changes. One could argue that something like that is exactly how I arrived at my current beliefs, I wasn’t born a Bayesian.
I can also see some ways to make induction and foundationalism easier to swallow.
A discussion post sounds about right for this, if enough people like it you might consider moving it to the main site.
This is precisely what I am saying.
The beauty of Bayes is how it answers these questions. To distinguish between the two statements we express them each in terms of probabilities.
“X is consistent with Y” is not really a Bayesian way of putting things, I can see two ways of interpreting it. One is as P(X&Y) > 0, meaning it is at least theoretically possible that both X and Y are true. The other is that P(X|Y) is reasonably large, i.e. that X is plausible if we assume Y.
“X supports Y” means P(Y|X) > P(Y): X supports Y if and only if Y becomes more plausible when we learn of X. Bayes tells us that this is equivalent to P(X|Y) > P(X), i.e. if Y would suggest that X is more likely than we might otherwise think, then X is support for Y.
Suppose we make X the statement “the first swan I see today is white” and Y the statement “all swans are white”. P(X|Y) is very close to 1, P(X|~Y) is less than 1 so P(X|Y) > P(X), so seeing a white swan offers support for the view that all swans are white. Very, very weak support, but support nonetheless.
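The swan calculation can be run with concrete numbers. This is only a sketch: the prior P(Y) = 1/10 and P(X|~Y) = 99/100 are assumptions picked for illustration (nothing above fixes them), and P(X|Y) is rounded up to exactly 1.

```python
from fractions import Fraction

# X: "the first swan I see today is white"
# Y: "all swans are white"
prior_y = Fraction(1, 10)            # P(Y), assumed for illustration
p_x_given_y = Fraction(1)            # P(X|Y), "very close to 1", taken as 1
p_x_given_not_y = Fraction(99, 100)  # P(X|~Y), assumed: slightly less than 1

# Law of total probability: P(X) = P(X|Y)P(Y) + P(X|~Y)P(~Y)
p_x = p_x_given_y * prior_y + p_x_given_not_y * (1 - prior_y)

# Bayes' theorem: P(Y|X) = P(X|Y)P(Y) / P(X)
posterior_y = p_x_given_y * prior_y / p_x

print(p_x, posterior_y, float(posterior_y))
```

With these inputs P(X) is 991/1000 and the posterior is 100/991, a little above the prior of 1/10. Both forms of the support condition hold together, P(X|Y) > P(X) and P(Y|X) > P(Y), and the gap between posterior and prior is tiny, which is the "very, very weak support" described above.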
(The above is not meant to be condescending, I apologise if you know all of it already).
This is a very tough bullet to bite.
One thing I don’t like about this is the whole ‘one strike and you’re out’ feel of it. It’s very boolean, the real world isn’t usually so crisp. Even a correct theory will sometimes have some evidence pointing against it, and in policy debates almost every suggestion will have some kind of downside.
There is also the worry that there could be more than one non-refuted idea, which makes it a bit difficult to make decisions. Bayesianism, on the other hand, when combined with expected utility theory, is perfect for making decisions.
When replying it said “comment too long” so I posted my reply here:
http://lesswrong.com/r/discussion/lw/552/reply_to_benelliott_about_popper_issues/
Step 1 is problematic also, as I explained in some of my comments to Tim Tyler. What should I gather data about? What kind of data? What measurements are important? How accurate? And so on.
Yes I agree. Another issue I mentioned in one of my comments here is that your data isn’t a random sample of all possible data, so what do you do about bias? (I mean bias in the data, not bias in the person.)
Step 3 is also problematic (as it explicitly acknowledges).
Finding it difficult also.
Have you found: http://lesswrong.com/message/inbox/
I don’t think I have the grasp on these subjects to hang in this, but this is great. -- I hope someone else comments in a more detailed manner.
In Popperian analysis, who ends the discussion of “what’s better?” You seem to have alluded to it being “whatever has no criticisms.” Is that accurate?
Why would Bayesian epistemology not be able to use the same evidence that Popperians used (e.g. the 1920 paper) and thus not require “assumptions” for new evidence? My rookie statement would be that the Bayesian has access to all the same kinds of evidence and tools that the Popperian approach does, as well as a reliable method for estimating probability outcomes.
Could you also clarify the difference between “conjecture” and “assumption.” Is it just that you’re saying that a conjecture is just a starting point for departure, whereas an assumption is assumed to be true?
An assumption seems both 1) justified if it has supporting evidence to make it highly likely as true to the best of our knowledge and 2) able to be just as “revisable” given counter-evidence as a “conjecture.”
Are you thinking that a Bayesian “assumption” is set in stone or that it could not be updated/modified if new evidence came along?
Lastly, what are “conjectures” based on? Are they random? If not, it would seem that they must be supported by at least some kind of assumptions to even have a reason for being conjectured in the first place. I think of them as “best guesses” and don’t see that as wildly different from the assumptions needed to get off the ground in any other analysis method.
Yes, “no criticisms” is accurate. There are issues, which I didn’t go into, of what to do when the number of theories remaining isn’t exactly one.
It’s not a matter of “who”—learning is a cooperative thing and people can use their own individual judgment. In a free society it’s OK if they don’t agree (for now—there’s always hope for later) about almost all topics.
I don’t regard the 1920 paper as evidence. It contains explanations and arguments. By “evidence” I normally mean “empirical evidence”—i.e. observation data. Is that not what you guys mean? There is some relevant evidence for liberalism vs socialism (e.g. the USSR’s empirical failure) but I don’t regard this evidence as crucial, and I don’t think that if you were to rely only on it that would work well (e.g. people could say the USSR did it wrong and if they did something a bit different, which has never been tried, then it would work. And the evidence could not refute that.)
BTW in the Popperian approach, the role of evidence is purely in criticism (and inspiration for ideas, which has no formal rules or anything). This is in contrast to inductive approaches (in general) which attempt to positively support/confirm/whatever theories with the weight of evidence.
If the Bayesian approach uses arguments as a type of evidence, and updates probabilities accordingly, how is that done? How is it decided which arguments win, and how much they win by? One aspect of the criticism approach is theories do not have probabilities but only two statuses: they are refuted or non-refuted. There’s never an issue of judging how strong an argument is (how do you do that?).
If you try to follow along with the Popperian approach too closely (to claim to have all the same tools) one objection will be that I don’t see Bayesian literature acknowledging Popper’s tools as valuable, talking about how to use them, etc… I will suspect that you aren’t in line with the Bayesian tradition. You might be improving it, but good luck convincing e.g. Yudkowsky of that.
The difference between a conjecture and an assumption is just as you say: conjectures aren’t assumed true but are open to criticism and debate.
I think the word “assumption” means not revisable (normally assumptions are made in a particular context, e.g. you assume X for the purposes of a particular debate which means you don’t question it. But you could have a different debate later and question it.). But I didn’t think Bayesianism made any assumptions except for its foundational ones. I don’t mind if you want to use the word a different way.
Regarding justification by supporting evidence, that is a very problematic concept which Popper criticized. The starting place of the criticism is to ask what “support” means. And in particular, what is the difference between support and mere consistency (non-contradiction)?
Conjectures are not based on anything and not supported. They are whatever you care to imagine. It’s good to have reasons for conjectures but there are no rules about what the reasons should be, and conjectures are never rejected because of the reason they were conjectured (nor because of the source of the conjecture), only because of criticisms of their substance. If someone makes too many poor conjectures and annoys people, it’s possible to criticize his methodology in order to help him. Popperian epistemology does not have any built-in guidelines for conjecturing on which it depends; they can be changed and violated as people see fit. I would rather call them “guesses” than “best guesses” because it’s often a good idea for one person to make several conjectures, including ones he suspects are mistaken, in order to learn more about them. It should not be each person puts forward his best theory and they face off, but everyone puts forward all the theories he thinks may be interesting and then everyone cooperates in criticizing all of them.
Edit: BTW I use the words “theory” and “idea” interchangeably. I do not mean by “theory” ideas with a certain amount of status/justification. I think “idea” is the better word but I frequently forget to use it (because Popper and Deutsch say “theory” all the time and I got used to it).
So, you weight them by their simplicity, to formally implement Occam’s razor:
So we have infinitely many theories, infinitely many of which are dead wrong, and only one of which is true, and we just use the shortest one and hope? And that’s supposed to be a good idea?
You usually weight them by their simplicity, if you want a probabilistic forecast. This is Occam’s razor. Picking the shortest one is not an unreasonable way to get a specific prediction.
Here is Hutter on how good an idea it is:
What do you mean by ‘get anywhere’? I can update my probability estimates and use the new estimates to make decisions perfectly well.
What does this have to do with whether confirmation can be used as evidence?
Infinitely many hypotheses increase in probability. What good is that? You have infinite possibilities before you and haven’t made progress towards picking between them.
When you say “this infinite set over here, its probability increases” you aren’t reaching an answer. You aren’t even getting any further than pure deduction would have gotten you.
Look, there’s two infinite sets: those contradicted by the evidence, and those not (deal with theories with “maybes” in them however you like, it does not matter to my point). The first set we don’t care about—we all agree to reject it. The second set is all that’s left to consider. if you increase the probability of every theory in it that doesn’t help you choose between them. it’s not useful. when you “confirm” or increase the probability of every theory logically consistent with the data, you aren’t reaching an answer, you aren’t making progress.
The progress is in the theories that are ruled out. When playing cards, you could consider all possible histories of the motions of the cards that are compatible with the evidence. Would you have any problem with making bets based on these probabilities? Solomonoff induction is very similar. While there are an infinite number of possibilities, both cases involve proving general properties of the distribution rather than considering each possibility individually.
In the future please capitalize your sentences; it improves readability (especially in large paragraphs).
“The progress is in the theories that are ruled out.”
This is purely a matter of deduction, right? Bayes’ theorem doesn’t come into it.
One doesn’t have to be a Bayesian to rule out theories contradicted by the evidence.
Further, there are always infinitely many theories that aren’t ruled out. This is the hard part of epistemology. How do you deal with those?
If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.
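The "eliminate and rescale" rule can be sketched in a few lines of Python. The theory names, the prior values, and the function name `eliminate_and_rescale` are all hypothetical, chosen only for illustration.

```python
def eliminate_and_rescale(prior: dict[str, float], consistent: set[str]) -> dict[str, float]:
    """Drop theories contradicted by the evidence and renormalize the rest."""
    survivors = {name: p for name, p in prior.items() if name in consistent}
    total = sum(survivors.values())
    if total == 0:
        raise ValueError("every theory was ruled out by the evidence")
    # Rescale so the surviving probabilities sum to 1.
    return {name: p / total for name, p in survivors.items()}


prior = {"A": 0.5, "B": 0.3, "C": 0.2}                      # assumed prior
updated = eliminate_and_rescale(prior, consistent={"A", "C"})
print(updated)
```

Note that the relative weights of the surviving theories are unchanged; all of the discrimination between them comes from the prior.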
The Solomonoff prior is really just a form of the principle of insufficient reason, which states that if there is no reason to think that one thing is more probable than another, they should be assigned the same probability. Since there are an infinite number of theories, we need to take some kind of limit. If we encode them as self-delimiting computer programs, we can write them as strings of digits (usually binary). Start with some maximum length and increase it toward infinity. Some programs will proceed normally, looping infinitely or encountering a stop instruction, making many programs equivalent because changing bits that are never used by the hypothesis does not change the theory. Other programs will leave the bounds of the maximum length, but this will be fixed as that length is taken to infinity.
This obviously isn’t a complete justification, but it is better than Popperian induction. Both rule out falsified theories and both penalize theories for unfalsifiability and complexity. Only Solomonoff induction allows us to quantify the size of these penalties in terms of probability. Popper would agree that a simpler theory, being compared to a more complex one, is more likely but not guaranteed to be true, but he could not give the numbers.
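As a toy illustration of how a length-based prior puts numbers on the complexity penalty, the sketch below weights each hypothesis by 2 to the power of minus its program length in bits and then normalizes. This is only an assumption-laden stand-in: the real Solomonoff prior ranges over all programs for a universal machine and is not computable, and the hypothesis names and lengths here are made up.

```python
def length_prior(lengths: dict[str, int]) -> dict[str, float]:
    """Assign each hypothesis a weight of 2**(-length), normalized to sum to 1."""
    weights = {name: 2.0 ** -bits for name, bits in lengths.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical program lengths in bits.
simplicity_prior = length_prior({"short": 3, "medium": 5, "long": 8})
print(simplicity_prior)
```

Each extra bit of description halves a hypothesis's weight, so a hypothesis two bits longer starts out four times less probable.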
If you are still worried about the foundational issues of the Solomonoff prior, I’ll answer your questions, but it would be better if you asked me again in however long progress takes (that was supposed to sound humorous, as if I were describing a specific, known amount of time, but I really doubt that that is noticeable in text). http://lesswrong.com/r/discussion/lw/534/where_does_uncertainty_come_from/ writes up some of the questions I’m thinking about now. It’s not by me, but Paul seems to wonder about the same issues. This should all be significantly more solid once some of these questions are answered.
“If we ignore theories with ‘maybes’, which don’t really matter because one theory that predicts two possibilities can easily be split into two theories, weighted by the probabilities assigned by the first theory, then Bayes’ theorem simplifies to ‘eliminate the theories contradicted by the evidence and rescale the others so the probabilities sum to 1’, which is a wonderful way to think about it intuitively. That and a prior is really all there is.”
That’s it? That is trivial, and doesn’t solve the major problems in epistemology. It’s correct enough (I’m not convinced theories have probabilities, but I think that’s a side issue) but it doesn’t get you very far. Any old non-Bayesian epistemology could tell you this.
Epistemology has harder problems than figuring out that you should reject things contradicted by evidence. For example, what do you do about the remaining possibilities?
I think with Solomonoff what you are doing is ordering all theories (by length) and saying the ones earlier in the ordering are better. This ordering has nothing empirical about it. Your approach here is not based on evidence or probabilities, just an ordering. Correct me if I got that wrong. That raises the question: why is the Solomonoff ordering correct? Why not some other ordering? Here’s one objection: “God did everything” is a short theory which is compatible with all evidence. You can make separate versions of it for all possible sets of predictions if you want. Doesn’t that mean we’re either stuck with some kind of “God did everything” or the final truth is even shorter?
You mention “Popperian induction”. Please don’t speak for Popper. The idea that Popper advocated induction is a myth. A rather crass one; he refuted induction and published a lot of material against it. Instead, ask me about his positions, OK? Popper would not agree that the simpler theory is “more likely”. There’s many issues here. One is that Popper said we should prefer low probability theories because they say more about the world.
You seem to present “Popperian induction” as an incomplete justification. Maybe you are unaware that Popper’s epistemology rejects the concept of justification itself. It is thus a mistake to criticize it on justificationist grounds. It isn’t any type of justification and doesn’t want to be.
In order to quote people, you can use a single greater than sign ‘>’ at the beginning of a line.
Note I said that and a prior. The important concept here is that we must always assign probabilities to all theories, because otherwise we would have no way to act. From Wikipedia: ‘Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures’, where a statistical procedure may be taken as a guide for optimal action.
Sorry about saying ‘Popperian induction’. I only have a basic knowledge of Popper. Would Popper say that predicting the results of actions is (one of) the goals of science? This is, of course, slightly more general than induction.
Wikipedia quotes Popper as saying simpler theories are to be preferred ‘because their empirical content is greater; and because they are better testable’. Does this mean that he would bet something important on this? If there were two possible explanations for a plague and if the simpler one were true then, with medicine, we could save 100 lives but if the more complex one were true we could save 200 lives, how would you decide which cure the factories should manufacture (assume it takes a long time to prepare the factories, so you can only make one type of cure)?
It is exactly not about this. The reason to prefer simpler theories is that more possible universes correspond to them. For a simple universe, axioms 1 through 10 have to come out the right way, but the rest can be anything, as they are meaningless since the universe is already fully specified. For a more complex theory, axioms 11-15 must also turn out a certain way, so fewer possible universes are compatible with this theory. I would also add the principle of sufficient reason, which I think is likely, as further justification for Occam’s razor, but that is irrelevant here.
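For comparison, a Bayesian expected-utility treatment of this hypothetical might look like the sketch below. The probability of 0.6 for the simpler explanation is an assumed figure, not something stated in the thread, and the function name `expected_lives_saved` is hypothetical; the payoffs of 100 and 200 lives come from the example.

```python
def expected_lives_saved(p_simple: float) -> dict[str, float]:
    """Expected lives saved by each cure, given P(simpler explanation is true)."""
    return {
        "cure_for_simple": p_simple * 100,            # works only if the simpler theory is true
        "cure_for_complex": (1.0 - p_simple) * 200,   # works only if the complex theory is true
    }


expected = expected_lives_saved(0.6)   # assumed probability
best = max(expected, key=expected.get)
print(best)
```

On these payoffs the break-even point is a probability of 2/3 for the simpler theory: below that, manufacturing the cure for the more complex explanation has the higher expected value.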
This seems wrong. Should I play the lottery because the low-probability theory that I will win is preferred to the high-probability theory that I will lose?
Popperian epistemology doesn’t assign probabilities like that, and has a way to act. So would you agree that, if you fail to refute Popperian epistemology, then one of your major claims is wrong? Or do you have a backup argument: you don’t have to, but you should anyway because..?
Prediction is a goal of science, but it is not the primary one. The primary goal is explanation/understanding.
Secondary sources about Popper, like Wikipedia, are not trustworthy. Popper would not bet anything important on that simpler theories thing. That fragment is misleading because Popper means “preferred” in a methodological sense, not considered to have a higher probability of being true, or considered more justified. It’s not a preference about which theory is actually, in fact, better.
The way to make decisions is by making conjectures about what to do, and criticizing those conjectures. We learn by critical, imaginative argument (including within one mind). Explanations should be given for why each possibility is a good idea; the hypothetical you give doesn’t have enough details to actually reach an answer.
About Solomonoff, if I understand you correctly now you are starting with theories which don’t say much (that isn’t what I expected simpler or shorter to mean). So at any point Solomonoff induction will basically be saying the minimal theory to account for the data and specify nothing else at all. Is that right? If that is the case, then it doesn’t deal with choosing between the various possibilities which are all compatible with the data (except in so far as it tells you to choose the least ambitious) and can make no predictions: it simply leaves everything we don’t already know unspecified. Have I misunderstood again?
I thought the theories were supposed to specify everything (not, as you say, “the rest can be anything”) so that predictions could be made.
I’m not totally sure what your concept of a universe or axiom is here. Also I note that the real world is pretty complicated.
No, he means they are more important and more interesting. His point is basically that a theory which says nothing has a 100% prior probability. Quantum Mechanics has a very low prior probability. The theories worth investigating, and which turn out important in science, all had low prior probabilities (prior probability meaning something like: of all logically possible worlds, for what proportion is it true?) They have what Popper called high “content” because they exclude many possibilities. That is a good trait. But it’s certainly not a guarantee that arbitrary theories excluding stuff will be correct.
My first Wikipedia quote (Every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.) was somewhat technical, but it basically meant that any consistent set of actions is either describable in terms of probabilities or nonconsequentialist. How would you choose the best action in a Popperian framework? Would you be forced to consider aspects of a choice other than its consequences? Otherwise, your choices must be describable in terms of a prior probability and Bayesian updating (and, while we already agree that the latter is obvious, here we are using it to update a set of probabilities and, on pain of inconsistency, our new probabilities must have that relationship to our old ones).
Definitely use all the evidence when making decisions. I didn’t mean for my example to be complete. I was wondering how a question like that could be addressed in general. What pieces of information would be important and how would they be taken into account? You can assume that the less relevant variables, like which disease is more painful, are equal in both cases.
I may have been unclear here. I meant prediction in a very broad sense, including, e.g., predicting which experiments will be best at refining our knowledge and predicting which technologies will best improve the world. Was it clear that I meant that? If you seek understanding beyond this, you are allowed but, at least for the present era, I only care about an epistemology if it can help me make the world a better place.
No, not at all. The more likely theories are those that include small amounts of theory, not small amounts of prediction. Eliezer discusses this in the sequences here, here, and here. Those don’t really cover Solomonoff induction directly, but they will probably give you a better idea of what I’m trying to say than I did. I think Solomonoff induction is better covered in another post, but I can’t find it right now.
Sorry, I was abusing one word ‘theories’ to mean both ‘individual descriptions of the universe’ and ‘sets of descriptions that make identical predictions in some realm (possibly in all realms)’. It is a very natural place to slip definitions, because, for example, when discussing biology, we often don’t care about the distinction between ‘Classical physics is true and birds are descended from dinosaurs.’ and ‘Quantum physics is true and birds are descended from dinosaurs.’ Once enough information is specified to make predictions, a theory (in the second sense) is on equal ground with another theory that contains the same amount of information and that makes different predictions only in realms where it has not been tested, as well as with a set of theories for which the set can be specified with the same amount of information but for which specifying one theory out of the set would take more information.
I’m not sure how one would act based on this. Should one conduct new experiments differently given this knowledge of which theories are preferred? Should one write papers about how awesome the theory is?
All of this is present in Bayesian epistemology.
Consider Bayes’ theorem, with theories A and B and evidence E:
P(A|E) = P(E|A) P(A) / P(E)
Let’s look at how the probability of a theory increases upon learning E, using a ratio.
P(A|E) / P(A) = P(E|A) / P(E)
P(B|E) / P(B) = P(E|B) / P(E)
Which one increases by a larger ratio?
[P(A|E) / P(A)] / [P(B|E) / P(B)] = [P(E|A) / P(E)] / [P(E|B) / P(E)] = P(E|A) / P(E|B)
The greater P(E|A) is compared to P(E|B), the more A benefits compared to B. This means that the more narrowly theory A predicts E, the actual observation, to the exclusion of other possible observations, the more probability we assign to it. This is a quantitative form of Popper’s preference for more specific and more easily falsifiable theories, as proven by Bayes’ theorem.
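The ratio identity above can be checked numerically. In the sketch below, the three hypotheses and all of the priors and likelihoods are made-up values chosen only so that the arithmetic is easy to follow.

```python
# Assumed priors P(H) over three exhaustive hypotheses, and likelihoods P(E|H).
priors = {"A": 0.25, "B": 0.25, "C": 0.5}
likelihoods = {"A": 0.8, "B": 0.2, "C": 0.4}

# P(E) by the law of total probability.
p_e = sum(priors[h] * likelihoods[h] for h in priors)

# Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E).
posteriors = {h: likelihoods[h] * priors[h] / p_e for h in priors}

# How much each theory's probability grew: P(H|E) / P(H).
gain = {h: posteriors[h] / priors[h] for h in priors}

# The relative gain of A over B should equal P(E|A) / P(E|B).
relative_gain = gain["A"] / gain["B"]
print(relative_gain, likelihoods["A"] / likelihoods["B"])
```

P(E) cancels out of the comparison, so the relative gain depends only on how strongly each theory predicted the observation.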
That’s basically what Solomonoff means by prior probability.
Yes Popper is non-consequentialist.
Consequentialism is a bad theory. It says ideas should be evaluated by their consequences (only). This does not address the question of how to determine what are good or bad consequences.
If you try to evaluate methods of determining what are good or bad consequences, by their consequences, you’ll end up with serious regress problems. If you don’t, you’ll have to introduce something other than consequences.
You may want to be a little more careful with how you formulate this. Saying that a good idea is one that has good consequences, and a bad idea is one that has bad consequences, doesn’t invite regress… it may be that you have a different mechanism for evaluating whether a consequence is good/bad than you do for evaluating whether an idea is good/bad.
For example, I might assert that a consequence is good if it makes me happy, and bad if it makes me unhappy. (I don’t in fact assert this.) I would then conclude that an idea is good if its consequences make me happy, and bad if its consequences make me unhappy. No regress involved.
(And note that this is different from saying that an idea is good if the idea makes me happy. If it turns out that the idea “I could drink drain cleaner” makes me happy, but that actually drinking drain cleaner makes me unhappy, then it’s a bad idea by the first theory but a good idea by the second theory.)
A certain amount of precision is helpful when thinking about these sorts of things.
If you reread the sentence in which I discuss a regress, you will notice it begins with “if” and says that a certain method would result in a regress, the point being you have to do something else. So it was your mistake.
That is not what I meant by consequentialism, and I agree that that theory entails an infinite regress. The theory I was referring to, which is the first google result for consequentialism, states that actions should be judged by their consequences.
That theory is bad too. For one thing, you might do something really dumb (say, shoot at a cop) and the consequence might be something good, e.g. you might accidentally hit the robber behind him who was about to kill him. You might end up declared a hero.
For another thing, “judge by consequences” does not answer the question of what are good or bad consequences. It tells us almost nothing. The only content is don’t judge by anything else. Why not? Beats me.
If you mean judge by rationally expected consequences, or something like that, you could drop the first objection but I still don’t see the use of it. If you merely want to exclude mysticism I think we can do that with a lighter restriction.
Sorry, I didn’t explain this very well. I don’t use consequentialism to judge people, I use it to judge possible courses of action. I (try to) make choices with the best consequences, this fully determines actions, so judgments of, for example, who is a bad person, do not add anything.
You are right that this is very broad. My point is that all consequentialist decision rules are either Bayesian decision rules or limits of Bayesian decision rules, according to a theorem.
I didn’t discuss who is a bad person. An action might be bad but have a good result (this time) by chance. And you haven’t said a word about what kinds of consequences of actions are good or bad … I mean desirable or undesirable. And you haven’t said why everything but consequences is inadmissible.
In your example of someone shooting a police officer, I would say that it is good that the police officer’s life was saved, but it is bad that there is a person who would shoot people so irresponsibly and I would not declare that person a hero as that will neither help save more police officers or reduce the number of people shooting recklessly; in fact, it would probably increase the number of reckless people.
I don’t want to get into the specifics of morality, because it is complex. The only reason that I specified consequentialist decision making is that it is a condition of the theorem that proves Bayesian decision making to be optimal. Entirely nonconsequentialist systems don’t need to learn about the universe to make decisions and partially consequentialist systems are more complicated. For the latter, Bayesianism is often necessary if there are times when nonconsequentialist factors have little import to a decision.
You are here judging a non-action by a non-consequence.
I think you mean systems which ignore all consequences. Popper’s system does not do that.
Popper’s system incorporates observational evidence in the form of criticism: ideas can be criticized for contradicting it.
Yes, this is a non-action; I often say “it is bad that X” as shorthand for “ceteris paribus, I would act so as to make X not be the case”. However, it is a consequence of what happened before (though you may have just meant it is not a consequence of my action). Judgements are often attached to consequences without specifying which action they are consequences of, just for convenience.
Yes, that was what I meant.
OK. I don’t recall hearing any Bayesian praising low probability theories, but no doubt you’ve heard more of them than me.
Yes but that only helps you deal with wishy washy theories. There’s plenty of theories which predict stuff with 100% probability. Science has to deal with those. This doesn’t help deal with them.
Examples include Newton’s Laws and Quantum Theory. They don’t say they happen sometimes but always, and that’s important. Good scientific theories are always like that. Even when they have a restricted, non-universal domain, it’s 100% within the domain.
Physics is currently thought to be deterministic. And even if physics was random, we would say that e.g. motion happens randomly 100% of the time, or whatever the law is. We would expect a law of motion with a call to a random function to still always be what happens.
PS Since you seem to have an interest in math, I’d be curious about your thoughts on this:
http://scholar.google.com/scholar?cluster=10839009135739435828&hl=en&as_sdt=0,5
There’s an improved version in Popper’s book The World of Parmenides but that may be harder for you to get.
The article you sent me is mathematically sound, but Popper draws the wrong conclusion from it. He has already accepted that P(H|E) can be greater than P(H). That’s all that’s necessary for induction: updating probability distribution. The stuff he says at the end about H ← E being countersupported by E does not prevent decision making based on the new distribution.
Setting aside Popper’s point for a minute, p(h|e) > p(h) is not sufficient for induction.
The reason it is not sufficient is that infinitely many h gain probability for any e. The problem of dealing with those remains unaddressed. And it would be incorrect and biased to selectively pick some pet theory from that infinite set and talk about how it’s supported.
Do you see what I’m getting at?
Yes, that is what the Solomonoff prior is for. It gives numbers to all the P(H_i).
And what is the argument for that prior? Why is it not arbitrary and often incorrect?
And whatever argument you give, I’ll also be curious: what method of arguing are you using? Deduction? Induction? Something else?
I tried to present it, but was obviously very unclear. If you read http://lesswrong.com/lw/jk/burdensome_details/ , http://lesswrong.com/lw/jn/how_much_evidence_does_it_take/ , and http://lesswrong.com/lw/jp/occams_razor/ , it’s basically a formalization of those ideas, with a tiny amount of handwaving.
Deduction.
Deduction requires premises to function. Where did you get the premises?
It seems obvious that low probability theories are good. Since probabilities must add up to 100%, there can be only a few high-probability theories and, when one is true, there is not much work to be done in finding it, since it is already so likely. Telling someone to look among low-probability theories is like telling them to look among nonapples when looking for possible products to sell, and it provides no way of distinguishing good low-prior theories, like quantum mechanics, from bad ones, like astrology.
Unfortunately, I cannot read that article, as it is behind a paywall. If you have access to it, perhaps you could email it to me at endoself (at) yahoo (dot) com .
ETA:
I was only talking about Popper’s idea of theories with high content. That particular analysis was not meant to address theories that predicted certain outcomes with probability 1.
It’s a loose guideline for people about where it may be fruitful to look. It can also be used in critical arguments if/when people think of arguments that use it.
One of the differences between Popper and Bayesian Epistemology is that Popper thinks being overly formal is a fault not a merit. Much of Popper’s philosophy does not consist of formal, rigorous guidelines to be followed exactly. Popper isn’t big on rules of procedure. A lot is explanation. Some is knowledge to use on your own. Some is advice.
So, “God does everything”, plus a definition of “everything” which makes predictions about all events, would rate very highly with you? It’s very low on theory and very high on prediction.
Define theories of that type for all possible sets of predictions. Then at any given time you will have infinitely many of them that predict all your data with 100% probability.
Why is that wrong?
No, it has tons of theory. God is a very complex concept. Note that ‘God did everything’ is more complex and therefore less likely than ‘everything happened’. Did you read http://lesswrong.com/lw/jp/occams_razor/ ?
How do you figure God is complex? God as I mean it simply can do anything, no reason given. That is its only attribute: that it arbitrarily does anything the theory its attached to cares to predict. We can even stop calling it “God”. We could even not mention it at all so there is no theory and merely give a list of predictions. Would that be good, in your view?
If ‘God’ is meaningless and can merely be attached to any theory, then the theory is the same with and without God. There is nothing to refute, since there is no difference. If you defined ‘God’ to mean a being who created all species or who commanded a system of morality, I would have both reason to care about and means to refute God. If you defined ‘God’ to mean ‘quantum physics’, there would be applications and means of proving that ‘God’ is a good approximation, but this definition is nonsensical, since it is not what is usually meant by ‘God’. If the theory of ‘God’ has no content, there is nothing to discuss, but this is again a very unusual definition.
Would a list of predictions with no theory/explanation be good or bad, in your view?
If there is no simpler description, then a list of predictions is better but, if an explanation simpler than merely a list of predictions is at all possible, then that would be more likely.
How do you decide if an explanation is simpler than a list of predictions? Are you thinking in terms of data compression?
Do you understand that the content of an explanation is not equivalent to the predictions it makes? It offers a different kind of thing than just predictions.
Essentially. It is simpler if it has a higher Solomonoff prior.
Yes, there is more than just predictions. However, predictions are the only things that tell us how to update our probability distributions.
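As a rough illustration of comparing an explanation with a bare list of predictions by description length, here is a minimal Python sketch. It is not the Solomonoff prior itself, which sums over all programs for a universal machine and is uncomputable. `toy_log2_prior` is a hypothetical stand-in that simply charges 8 bits per character of a description, and the two descriptions of the same predictions are made up for the example.

```python
def toy_log2_prior(description: str) -> int:
    """Crude stand-in for a length-based prior: log2(weight) = -(length in bits)."""
    return -8 * len(description.encode("utf-8"))

# The same 1000 predictions, described two ways (both are illustrative assumptions).
predictions = [n * n for n in range(1000)]
as_list = repr(predictions)                # the predictions written out one by one
as_rule = "[n * n for n in range(1000)]"   # a short rule that generates them

log_prior_list = toy_log2_prior(as_list)
log_prior_rule = toy_log2_prior(as_rule)

# The shorter description gets the higher (less negative) log-prior.
rule_is_simpler = log_prior_rule > log_prior_list
```

Under this toy measure the rule wins only because it is shorter. When no shorter description exists, the list itself is the best available, which matches the position stated above.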
So, your epistemology is 100% instrumentalist and does not deal with non-predictive knowledge at all?
Can you give an example of non-predictive knowledge and what role it should play?
Quoting from The Fabric of Reality, chapter 1, by David Deutsch.
Yet some philosophers — and even some scientists — disparage the role of explanation in science. To them, the basic purpose of a scientific theory is not to explain anything, but to predict the outcomes of experiments: its entire content lies in its predictive formulae. They consider that any consistent explanation that a theory may give for its predictions is as good as any other — or as good as no explanation at all — so long as the predictions are true. This view is called instrumentalism (because it says that a theory is no more than an ‘instrument’ for making predictions). To instrumentalists, the idea that science can enable us to understand the underlying reality that accounts for our observations is a fallacy and a conceit. They do not see how anything a scientific theory may say beyond predicting the outcomes of experiments can be more than empty words.
[cut a quote of Steven Weinberg clearly advocating instrumentalism. the particular explanation he says doesn’t matter is that space time is curved. space time curvature is an example of a non-predictive explanation.]
imagine that an extraterrestrial scientist has visited the Earth and given us an ultra-high-technology ‘oracle’ which can predict the outcome of any possible experiment, but provides no explanations. According to instrumentalists, once we had that oracle we should have no further use for scientific theories, except as a means of entertaining ourselves. But is that true? How would the oracle be used in practice? In some sense it would contain the knowledge necessary to build, say, an interstellar spaceship. But how exactly would that help us to build one, or to build another oracle of the same kind — or even a better mousetrap? The oracle only predicts the outcomes of experiments. Therefore, in order to use it at all we must first know what experiments to ask it about. If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction — even perfect, universal prediction — is simply no substitute for explanation.
Similarly, in scientific research the oracle would not provide us with any new theory. Not until we already had a theory, and had thought of an experiment that would test it, could we possibly ask the oracle what would happen if the theory were subjected to that test. Thus, the oracle would not be replacing theories at all: it would be replacing experiments. It would spare us the expense of running laboratories and particle accelerators.
[cut elaboration]
The oracle would be very useful in many situations, but its usefulness would always depend on people’s ability to solve scientific problems in just the way they have to now, namely by devising explanatory theories. It would not even replace all experimentation, because its ability to predict the outcome of a particular experiment would in practice depend on how easy it was to describe the experiment accurately enough for the oracle to give a useful answer, compared with doing the experiment in reality. After all, the oracle would have to have some sort of ‘user interface’. Perhaps a description of the experiment would have to be entered into it, in some standard language. In that language, some experiments would be harder to specify than others. In practice, for many experiments the specification would be too complex to be entered. Thus the oracle would have the same general advantages and disadvantages as any other source of experimental data, and it would be useful only in cases where consulting it happened to be more convenient than using other sources. To put that another way: there already is one such oracle out there, namely the physical world. It tells us the result of any possible experiment if we ask it in the right language (i.e. if we do the experiment), though in some cases it is impractical for us to ‘enter a description of the experiment’ in the required form (i.e. to build and operate the apparatus). But it provides no explanations.
In a few applications, for instance weather forecasting, we may be almost as satisfied with a purely predictive oracle as with an explanatory theory. But even then, that would be strictly so only if the oracle’s weather forecast were complete and perfect. In practice, weather forecasts are incomplete and imperfect, and to make up for that they include explanations of how the forecasters arrived at their predictions. The explanations allow us to judge the reliability of a forecast and to deduce further predictions relevant to our own location and needs. For instance, it makes a difference to me whether today’s forecast that it will be windy tomorrow is based on an expectation of a nearby high-pressure area, or of a more distant hurricane. I would take more precautions in the latter case.
[“wind due to hurricane” and “wind due to high-pressure area” are different explanations for a particular prediction.]
So knowledge is more than just predictive because it also lets us design things?
Here’s a solution to the problem with the oracle—design a computer that inputs every possible design to the oracle and picks the best. You may object that this would be extremely time-consuming and therefore impractical. However, you don’t need to build the computer; just ask the oracle what would happen if you did.
What can we learn from this? This kind of knowledge can be seen as predictive, but only incidentally, because the computer happens to be implemented in the physical world. If it were implemented mathematically, as an abstract algorithm, we would recognize this as deductive, mathematical knowledge. But math is all about tautologies; nothing new is learned. Okay, I apologize for that. I think I’ve been changing my definition of knowledge repeatedly to include or exclude such things. I don’t really care as much about consistent definitions as I should. Hopefully it is clear from context. I’ll go back to your original question.
The difference between the two cases is not the same as the crucial difference here. Having a theory as opposed to a list of predictions for every possible experiment does not necessarily make the theorems easier to prove. When it does, which is almost always, this is simply because that theory is more concise, so it is easier to deduce things from. This seems more like a matter of computing power than one of epistemology.
How does it pick the best?
And wouldn’t the oracle predict that the computer program would never halt, since it would attempt to enter infinitely many questions into the oracle?
According to some predetermined criteria. “How well does this spaceship fly?” “How often does it crash?” Making a computer evaluate machines is not hard in principle, and is beside the point.
I was assuming a finite maximum size with only finitely many distinguishable configurations in that size, but, again, this is irrelevant; whatever trick you use to make this work, you will not change the conclusions.
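The brute-force proposal in this exchange (feed every design up to some finite size to the oracle and keep the best one by predetermined criteria) can be sketched in Python as follows. Everything here is an assumption made for illustration: `toy_oracle` is a made-up formula standing in for the oracle, and the part names and the `score` criterion are hypothetical.

```python
from itertools import product

def best_design(oracle, parts, max_size, score):
    """Ask the oracle about every design up to max_size; keep the best-scoring one."""
    best, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        for design in product(parts, repeat=size):
            s = score(oracle(design))      # the criterion is supplied by us
            if s > best_score:
                best, best_score = design, s
    return best, best_score

def toy_oracle(design):
    """Hypothetical stand-in: 'predicts' a flight range from an invented formula."""
    thrust = 10 * design.count("engine")
    fuel = 8 * design.count("tank")
    return {"range_km": min(thrust, fuel) - len(design)}

design, value = best_design(
    toy_oracle,
    parts=("engine", "tank", "fin"),
    max_size=4,
    score=lambda outcome: outcome["range_km"],
)
```

Note that the search only returns "the best" relative to the `score` function and the set of parts it is handed; choosing those is not something the loop does for us.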
I think figuring out what criteria you want is an example of a non-predictive issue. That makes it not beside the point. And if the computer picks the best according to criteria we give it, they will contain mistakes. We won’t actually get the best answer. We’ll have to learn stuff and improve our knowledge all in order to set up your predictive thing. So there is this whole realm of non-predictive learning.
So you make assumptions like a spaceship is a thing made out of atoms. If your understanding of physics (and therefore your assumptions) is incorrect then your use of the oracle won’t work out very well. So your ability to get useful predictions out of the oracle depends on your understanding, not just on predicting anything.
So I just give it my brain and tell it to do what it wants. Of course, there are missing steps, but they should be purely deductive. I believe that is what Eliezer is working on now :)
Good point. I guess you can’t bootstrap an oracle like this; some things possible mathematically, like calculating a function over an infinity of points, just can’t be done physically. My point still stands, but this illustration definitely dies.
That’s it? That’s just not very impressive by my standards. Popper’s epistemology is far more advanced, already. Why do you guys reject and largely ignore it? Is it merely because Eliezer published a few sentences of nasty anti-Popper myths in an old essay?
By ‘what Eliezer is working on now’ I meant AI, which would probably be necessary to extract my desires from my brain in practice. In principle, we could just use Bayes’ theorem a lot, assuming we had precise definitions of these concepts.
Popperian epistemology is incompatible with Bayesian epistemology, which I accept from its own justification, not from a lack of any other theory. I disliked what I had heard about Popper before I started reading LessWrong, but I forget my exact argument, so I do not know if it was valid. From what I do remember, I suspect it was not.
So, you reject Popper’s ideas without having any criticism of them that you can remember?
That’s it?
You don’t care that Popper’s ideas have criticisms of Bayesian epistemology which you haven’t answered. You feel you don’t need to answer criticisms because Bayesian epistemology is self-justifying and thus all criticisms of it must be wrong. Is that it?
No, I brought up my past experience with Popper because you asked if my opinions on him came from Eliezer.
No, I think Bayesian epistemology has been mathematically proven. I don’t spend a lot of time investigating alternatives for the same reason I don’t spend time investigating alternatives to calculus.
If you have a valid criticism, “this is wrong” or “you haven’t actually proved this” as opposed to “this has a limited domain of applicability” (actually, that could be valid if Popperian epistemology can answer a question that Bayesianism can’t), I would love to know. You did bring up some things of this type, like that paper by Popper, but none of them have logically stood up, unless I am missing something.
If Bayesian epistemology is mathematically proven, why have I been told in my discussions here various things such as: there is a regress problem which isn’t fully solved (Yudkowsky says so), that circular arguments for induction are correct, that foundationalism is correct, been linked to articles to make Bayesian points and told they have good arguments with only a little hand waving, and so on? And I’ve been told further research is being done.
It seems to me that saying it’s proven, the end, is incompatible with it having any flaws or unsolved problems or need for further research. So, which is it?
All of the above. It is wrong b/c, e.g., it is instrumentalist (has not understood the value of explanatory knowledge) and inductivist (induction is refuted). It is incomplete b/c, e.g., it cannot deal with non-observational knowledge such as moral knowledge. You haven’t proved much to me; however, I’ve been directed to two books, so judgment there is pending.
I don’t know how you concluded that none of my arguments stood up logically. Did you really think you’d logically refuted every point? I don’t agree, I think most of your arguments were not pure logic, and I thought that various issues were pending further discussion of sub-points. As I recall, some points I raised have not been answered. I’m having several conversations in parallel so I don’t recall which in particular you didn’t address which were replies to you personally, but for example I quoted an argument by David Deutsch about an oracle. The replies I got about how to try to cheat the oracle did not address the substantive point of the thought experiment, and did not address the issue (discussed in the quote) that oracles have user interfaces and entering questions isn’t just free and trivial, and did not address the issue that physical reality is a predictive oracle meeting all the specified characteristics of the alien oracle (we already have an oracle and none of the replies I got about using the oracle would actually work with the oracle we have). As I saw it, my (quoted) points on that issue stood. The replies were some combination of incomplete and missing the point. They were also clever which is a bit of fun. I thought of what I think is a better way to try to cheat the rules, which is to ask the oracle to predict the contents of philosophy books that would be written if philosophy was studied for trillions of years by the best people. However, again, the assumption that any question which is easily described in English can be easily entered into the oracle and get a prediction was not part of the thought experiment. And the reason I hadn’t explained all this yet is that there were various other points pending anyway, so shrug, it’s hard to decide where to start when you have many different things to say.
It is proven that the correct epistemology, meaning one that is necessary to achieve general goals, is isomorphic to Bayesianism with some prior (for beings that know all math). What that prior is requires more work. While the constraint of knowing all math is extremely unrealistic, do you agree that the theory of what knowledge would be had in such situations is a useful guide to action until we have a more general theory? Popperian epistemology cannot tell me how much money to bet at what odds for or against P = NP any more than Bayesian epistemology can, but at least Bayesian epistemology set this as a goal.
This is all based on our limited mathematical ability. A theory does have an advantage over an oracle or the reality-oracle: we can read it. Would you agree that all the benefits of a theory come from this plus knowing all math? The difference is one of mathematical knowledge, not of physical knowledge. How does Popper help with this? Are there guidelines for what ‘equivalent’ formulations of a theory are mathematically better? If so, this is something that Bayesianism does not try to cover, so this may have value. However, this is unrelated to the question of the validity of “don’t assign probabilities to theories”.
I thought I addressed this but, to recap:
That (well and how much bigger) is all I need to make decisions.
So what? I already have my new probabilities.
What is induction if not the calculation of new probabilities for hypotheses? Should I care about these ‘inductive truths’ that Popper disproves the existence of? I already have an algorithm to calculate the best action to take. It seems like Bayesianism isn’t inductivist by Popper’s definition.
I’d like to be sure that we are using the same definitions of our terms, so please give an example.
You mean proven given some assumptions about what an epistemology should be, right?
No. We need explanations to understand the world. In real life, it is only when we have explanations that we can make good predictions at all. For example, suppose you have a predictive theory about dice and you want to make bets. I chose that example intentionally to engage with areas of your strength. OK, now you face the issue: does a particular real world situation have the correct attributes for my predictive theory to apply? You have to address that to know if your predictions will be correct or not. We always face this kind of problem to do much of anything. How do we figure out when our theories apply? We come up with explanations about what kinds of situations they apply to, and what situation we are in, and we then come up with explanations about why we think we are/aren’t in the right kind of situation, and we use critical argument to improve these explanations. Bayesian Epistemology does not address all this.
I replied to that. Repeating: if you increase the probability of infinitely many theories, the problem of figuring out a good theory is not solved. So that is not all you need.
Further, I’m still waiting on an adequate answer about what support is (inductive or otherwise) and how it differs from consistency.
I gave examples of moral knowledge in another comment to you. Morality is knowledge about how to live, what is a good life. e.g. murder is immoral.
Yes, I stated my assumptions in the sentence, though I may have missed some.
This comes back to the distinction between one complete theory that fully specifies the universe and a set of theories that are considered to be one because we are only looking at a certain domain. In the former case, the domain of applicability is everywhere. In the latter, we have a probability distribution that tells us how likely it is to fail in every domain. So, this kind of thing is all there in the math.
What do you mean by ‘a good theory’? Bayesians never select one theory as ‘good’ and follow that; we always consider the possibility of being wrong. When theories have higher probability than others, I guess you could call them good. I don’t see why this is hard; just calculate P(H | E) for all the theories and give more weight to the more likely ones when making decisions.
Evidence supports a hypothesis if P(H | E) > P(H). Two statements, A, B, are consistent if ¬(A&B → ⊥). I think I’m missing something.
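The definition of support given here, P(H | E) > P(H), can be checked numerically with Bayes’ theorem. The following Python sketch assumes a single hypothesis H and its negation; the function names and the example numbers are invented for illustration.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) by Bayes' theorem, for a hypothesis H and its negation."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

def supports(prior_h, p_e_given_h, p_e_given_not_h):
    """Evidence supports H exactly when it raises H's probability."""
    return posterior(prior_h, p_e_given_h, p_e_given_not_h) > prior_h

# E is more likely if H is true, so E supports H.
raised = supports(prior_h=0.3, p_e_given_h=0.9, p_e_given_not_h=0.2)

# E is less likely if H is true, so E does not support H,
# even though H and E are still logically consistent.
lowered = supports(prior_h=0.3, p_e_given_h=0.2, p_e_given_not_h=0.9)
```

The second case is the one where support and consistency come apart: the evidence is compatible with H but lowers its probability.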
Let’s consider only theories which make all their predictions with 100% probability for now. And theories which cover everything.
Then:
If H and E are consistent, then it follows that P(H | E) > P(H).
For any given E, consider how much greater the probability of H is, for all consistent H. That amount is identical for all H considered.
We can put all the Hs in two categories: the consistent ones which gain equal probability, and the inconsistent ones for which P(H|E) = 0. (Assumption warning: we’re relying on getting it right which H are consistent with which E.)
This means:
1) consistency and support coincide.
2) there are infinitely many equally supported theories. There are only and exactly two amounts of support that any theory has given all current evidence, one of which is 0.
3) The support concept plays no role in helping us distinguish between the theories with more than 0 support.
4) The support concept can be dropped entirely because it has no use at all. The consistency concept does everything.
5) All mention of probability can be dropped too, since it wasn’t doing anything.
6) And we still have the main problem of epistemology left over, which is dealing with the theories that aren’t refuted by evidence.
Similar arguments can be made without my initial assumptions/restrictions. For example introducing theories that make predictions with less than 100% probability will not help you because they are going to have lower probability than theories which make the same predictions with 100% probability.
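For the restricted case above (theories that make every prediction with probability 1), the update can be written out directly. This is a short derivation under the added assumptions that 0 < P(E) < 1 and P(H) > 0:

```latex
\[
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
\]
\[
\text{$H$ consistent with $E$:}\quad P(E \mid H) = 1
\;\Longrightarrow\;
P(H \mid E) = \frac{P(H)}{P(E)},
\qquad
\frac{P(H \mid E)}{P(H)} = \frac{1}{P(E)}
\]
\[
\text{$H$ inconsistent with $E$:}\quad P(E \mid H) = 0
\;\Longrightarrow\;
P(H \mid E) = 0
\]
```

So every consistent theory is multiplied by the same factor 1/P(E). What is identical across consistent theories is this ratio; the absolute gain, P(H)(1/P(E) − 1), still scales with each theory’s prior.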
Well the ratio is the same, but that’s probably what you meant.
Have a prior. This reintroduces probabilities and deals with the remaining theories. You will converge on the right theory eventually no matter what your prior is. Of course, that does not mean that all priors are equally rational.
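The convergence claim can be illustrated with a small Python sketch. It makes several assumptions that are not in the discussion above: the rival theories are three statistical hypotheses about a coin’s bias (not theories predicting with 100% probability), the data are made up, and every hypothesis starts with a nonzero prior.

```python
def update(prior, biases, flips):
    """Posterior over coin-bias hypotheses after observing flips (1 = heads)."""
    weights = list(prior)
    for flip in flips:
        # Multiply each hypothesis by the likelihood it assigned to this flip.
        weights = [w * (b if flip else 1 - b) for w, b in zip(weights, biases)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalise to sum to 1
    return weights

biases = [0.2, 0.5, 0.8]        # three rival hypotheses about P(heads)
flips = [1, 1, 1, 1, 0] * 20    # made-up data: 80 heads in 100 flips

# One prior strongly favours the worst-fitting hypothesis; the other is uniform.
skeptical = update([0.90, 0.05, 0.05], biases, flips)
uniform = update([1 / 3, 1 / 3, 1 / 3], biases, flips)
```

Both posteriors end up concentrated on the 0.8 hypothesis, which fits the made-up data best. A hypothesis given a prior of exactly zero could never recover, so "no matter what your prior is" needs that qualification.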
If they all have the same prior probability, then their probabilities are the same and stay that way. If you use a prior which arbitrarily (in my view) gives some things higher prior probabilities in a 100% non-evidence-based way, I object to that, and it’s a separate issue from support.
How does having a prior save the concept of support? Can you give an example? Maybe the one here, currently near the bottom:
http://lesswrong.com/lw/54u/bayesian_epistemology_vs_popper/3urr?context=3
Well shouldn’t they? If you look at it from the perspective of making decisions rather than finding one right theory, it’s obvious that they are equiprobable and this should be recognized.
Solomonoff does not give “some things higher prior probabilities in a 100% non-evidence-based way”. All hypotheses have the same probability, many just make similar predictions.
Is anyone here working on the problem of parenting/educating AIs?
It seems someone has downvoted you for not being familiar with Eliezer’s work on AI. Basically, this is overly anthropomorphic. It is one of our goals to ensure that an AI can progress from a ‘seed AI’ to a superintelligent AI without anything going wrong, but, in practice, we’ve observed that using metaphors like ‘parenting’ confuses people too much to make progress, so we avoid it.
Don’t worry about downvotes, they do not matter.
I wasn’t using parenting as a metaphor. I meant it quite literally (only the educational part, not the diaper changing).
One of the fundamental attributes of an AI is that it’s a program which can learn new things.
Humans are also entities that learn new things.
But humans, left alone, don’t fare so well. Helping people learn is important, especially children. This avoids having everyone reinvent the wheel.
The parenting issue therefore must be addressed for AI. I am familiar with the main ideas of the kind of AI work you guys do, but I have not found the answer to this.
One possible way to address it is to say the AI will reinvent the wheel. It will have no help but just figure everything out from scratch.
Another approach would be to program some ideas into the AI (changeable, or not, or some of each), and then leave it alone with that starting point.
Another approach would be to talk with the AI, answer its questions, lecture it, etc… This is the approach humans use with their children.
Each of these approaches has various problems with it which are non-trivial to solve.
Make sense so far?
When humans hear parenting, they think of the human parenting process. Describe the AI as ‘learning’ and the humans as ‘helping it learn’. This gets us closer to the idea of humans learning about the universe around them, rather than being raised as generic members of society.
Well, the point of down votes is to discourage certain behaviour, and I agree that you should use terminology that we have found less likely to cause confusion.
AIs don’t necessarily have so much of a problem with this. They learn very differently than humans: http://lesswrong.com/lw/jo/einsteins_arrogance/ , http://lesswrong.com/lw/qj/einsteins_speed/ , http://lesswrong.com/lw/qk/that_alien_message/
This is definitely an important problem, but we’re not really at the stage where it is necessary yet. I don’t see how we could make much progress on how to get an AI to learn without knowing the algorithms that it will use to learn.
Not all humans. Not me. Is that not a bias?
I don’t let myself be discouraged when no argument is given, just on the basis of someone’s judgement, without knowing the reason. I don’t think I should. I think that would be irrational. I’m surprised that this community wants to encourage people to conform to the collective opinion of others as expressed by votes.
OK, I think I see where you are coming from. However, there is only one known algorithm that learns (creates knowledge). It is, in short, evolution. We should expect an AI to use it, we shouldn’t expect a brand new solution to this hard problem (historically there have been very few candidate solutions proposed, most not at all promising).
The implementation details are not very important because the result will be universal, just like people are. This is similar to how the implementation details of universal computers are not important for many purposes.
Are you guys familiar with these concepts? There is important knowledge relevant to creating AIs which your statement seems to me to overlook.
Yes, that would be a bias. Note that this kind of bias is not always explicitly noticed.
As a general rule, if I downvote, I either reply to the post, or it is something that should be obvious to someone who has read the main sequences.
No, there is another: the brain. It is also much faster than evolution, an advantage I would want a FAI to have.
You are unfamiliar with the basic concepts of evolutionary epistemology. The brain internally does evolution of ideas.
Why is it that you guys want to make AI but don’t study relevant topics like this?
You’re conflating two things. Biological evolution is a very specific algorithm, with well-studied mathematical properties. ‘Evolution’ in general just means any change over time. You seem to be using it in an intermediate sense, as any change that proceeds through reproduction, variation, and selection, which is also a common meaning. This, however, is still very broad, so there’s very little that you can learn about an AI just from knowing “it will come up with many ideas, mostly based on previous ones, and reject most of them”. This seems less informative than “it will look at evidence and then rationally adjust its understanding”.
There’s an article related to this: http://lesswrong.com/lw/l6/no_evolutions_for_corporations_or_nanodevices/
Eliezer has studied cognitive science. Those of us not working directly with him have very little to do with AI design. Even Eliezer’s current work is slightly more background theory than AI itself.
I’m not conflating them. I did not mean “change over time”.
There are many things we can learn from evolutionary epistemology. It seeming broad to you does not prevent that. You would do better to ask what good it is instead of guess it is no good.
For one thing it connects with meme theory.
A different example is that it explains misunderstandings when people communicate. Misunderstandings are extremely common because communication involves 1) guessing what the other person is trying to say 2) selecting between those guesses with criticism 3) making more guesses which are variants of previous guesses 4) more selection 5) etc
This explanation helps us see how easily communication can go wrong. It raises interesting issues like why so much communication doesn’t go wrong. It refutes various myths like that people absorb their teacher’s lectures a little like sponges.
It matters. And other explanations of miscommunication are worse.
But that isn’t the topic I was speaking of. I meant evolutionary epistemology. Which btw I know that Eliezer has not studied much because he isn’t familiar with one of its major figures (Popper).
I don’t know enough about evolutionary epistemology to evaluate the usefulness and applicability of its ideas.
How was evolutionary epistemology tested? Are there experiments or just introspection?
Evolution is a largely philosophical theory (distinct from the scientific theory about the history of life on Earth). It is a theory of epistemology. Some parts of epistemology technically depend on the laws of physics, but it is generally researched separately from physics. There has not been any science experiment to test it which I consider important, but I could conceive of some because if you specified different and perverse laws of physics you could break evolution. In a different sense, evolution is tested constantly in that the laws of physics and evidence we see around us, every day, are not the perverse but conceivable kind that would break evolution.
The reason I accept evolution (again I refer to the epistemological theory about how knowledge is created) is that it is a good explanation, and it solves an important philosophical problem, and I don’t know anything wrong with it, and I also don’t know any rivals which solve the problem.
The problem has a long history. Where does “apparent design” come from? Paley gave an example of finding a watch in nature, which he said you know can’t have gotten there by chance. That’s correct: the watch has knowledge (aka apparent design, or purposeful complexity, or many other terms). The watch is adapted “to a purpose” as some people put it (I’m not really a fan of the purpose terminology. But it’s adapted! And I think it gets the point across ok.)
Paley then guessed as follows: there is no possible solution to the origins of knowledge other than “A designer (God) created it”. This is a very bad solution even pre-Darwin because it does not actually solve the problem. The designer itself has knowledge, adaptation to a purpose, whatever. So where did it come from? The origin is not answered.
Since then, the problem has been solved by the theory of evolution and nothing else. And it applies to more than just watches found in nature, and to plants and animals. It also applies to human knowledge. The answer “intelligence did it” is no better than “God did it”. How does intelligence do it? The only known answer is: by evolution.
The best thing to read on this topic is The Beginning of Infinity by David Deutsch which discusses Popperian epistemology, evolution, Paley’s problem and its solution, and also has two chapters about meme theory which give important applications.
You can also find some, e.g. here: http://fallibleideas.com/evolution-and-knowledge
Also here: http://fallibleideas.com/tradition (Deutsch discusses static and dynamic memes and societies. I discuss “traditions” rather than “memes”. It’s quite similar stuff.)
What? Epistemological evolution seems to be about how the mind works, independent of what philosophical status is accorded to the thoughts. Surely it could be tested just by checking if the mind actually develops ideas in accordance with the way it is predicted to.
If you want to check how minds work, you could do that. But that’s very hard. We’re not there yet. We don’t know how.
How minds work is a separate issue from evolutionary epistemology. Epistemology is about how knowledge is created (in abstract, not in human minds specifically). If it turns out there is another way, it wouldn’t upset the claim that evolution would create knowledge if done in minds.
There’s no reason to think there is another way. No argument that there is. No explanation of why to expect there to be. No promising research on the verge of working one out. Shrug.
I see. I thought that evolutionary epistemology was a theory of human minds, though I know that that technically isn’t epistemology. Does evolutionary epistemology describe knowledge about the world, mathematical knowledge, or both (I suspect you will say both)?
It describes the creation of any type of knowledge. It doesn’t tell you the specifics of any field itself, but doing it helps you learn them.
So, you’re saying that in order to create knowledge, there has to be copying, variation, and selection. I would agree with the first two, but not necessarily the third. Consider a formal axiomatic system. It produces an ever-growing list of theorems, but none are ever selected any more than others. Would you still consider this system to be learning?
With deduction, all the consequences are already contained in the premises and axioms. Abstractly, that’s not learning.
When human mathematicians do deduction, they do learn stuff, because they also think about stuff while doing it, they don’t just mechanically and thoughtlessly follow the rules of math.
So induction (or probabilistic updating, since you said that Popper proved it not to be the same as whatever philosophers call ‘induction’) isn’t learning either because the conclusions are contained in the priors and observations?
If the axiomatic system was physically implemented in a(n ever-growing) computer, would you consider this learning?
the idea of induction is that the conclusions are NOT logically contained in the observations (that’s why it is not deduction).
if you make up a prior from which everything deductively follows, and everything else is mere deduction from there, then all of your problems and mistakes are in the prior.
no. learning is creating new knowledge. that would simply be human programmers putting their own knowledge into a prior, and then the machine not creating any new knowledge that wasn’t in the prior.
The correct method of updating one’s probability distributions is contained in the observations. P(H|E) = P(H)P(E|H)/P(E).
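As a small illustration of the update rule just quoted, here is a minimal sketch in Python. The function name and the numbers are hypothetical, and P(E) is expanded by the law of total probability over H and not-H, which is an assumption of this sketch rather than something stated in the comment.

```python
def bayes_update(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) = P(H) * P(E|H) / P(E).

    P(E) is computed as P(H)P(E|H) + P(not H)P(E|not H).
    """
    p_e = prior_h * p_e_given_h + (1.0 - prior_h) * p_e_given_not_h
    if p_e == 0.0:
        raise ValueError("the evidence has zero probability under this prior")
    return prior_h * p_e_given_h / p_e


# Hypothetical numbers: an even prior, and evidence four times
# as likely if H is true as if H is false.
posterior = bayes_update(prior_h=0.5, p_e_given_h=0.8, p_e_given_not_h=0.2)
print(posterior)
```

Note that everything the function returns is fixed by the prior and the two likelihoods, which is the point the question about "conclusions contained in the priors and observations" is pressing on.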
So how could evolutionary epistemology be relevant to AI design?
AIs are programs that create knowledge. That means they need to do evolution. That means they need, roughly, a conjecture generator, a criticism generator, and a criticism evaluator. The conjecture generator might double as the criticism generator since a criticism is a type of conjecture, but it might not.
The conjectures need to be based on the previous conjectures (not necessarily all of them, but some). That makes it replication with variation. The criticism is the selection.
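The three-part structure described here (conjecture generator, criticism, selection) can be sketched as a toy loop in Python. Everything in it is hypothetical: the conjectures are integers, the critic is a hand-written error function, and the variation is a fixed set of small steps. Because the critic is supplied by the programmer, this toy would not meet the standard for "creating new knowledge" argued for in this thread; it only shows the shape of replication, variation, and selection.

```python
def generate_conjectures(parents, steps=(-3, -2, -1, 1, 2, 3)):
    """Replication with variation: keep each parent and add perturbed copies."""
    children = [p + s for p in parents for s in steps]
    return parents + children


def criticize(conjecture, target=144):
    """Hypothetical critic: how badly does x fail to satisfy x * x == target?"""
    return abs(conjecture * conjecture - target)


def select(conjectures, keep=3):
    """Selection: keep the conjectures that survive criticism best.

    Ties are broken in favour of the larger integer so the run is deterministic.
    """
    return sorted(set(conjectures), key=lambda x: (criticize(x), -x))[:keep]


def run(generations=10):
    survivors = [0]  # initial conjecture
    for _ in range(generations):
        survivors = select(generate_conjectures(survivors))
    return survivors


best = run()[0]
print(best, criticize(best))
```

New conjectures are built from earlier survivors, and criticism decides which ones are carried forward.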
Any AI design that completely ignores this is, imo, hopeless. I think that’s why the AI field hasn’t really gotten anywhere. They don’t understand what they are trying to make, because they have the wrong philosophy (in particular the wrong explanations. i don’t mean math or logic).
Could you explain where AIXI does any of that?
Or could you explain where a Bayesian spam filter does that?
Note that there are AI approaches which do do something close to what you think an AI “needs”. For example, some of Simon Colton’s work can be thought of in a way roughly like what you want. But it is a mistake to think that such an entity needs to do that. (Some of the hardcore Bayesians make the same mistake in assuming that an AI must use a Bayesian framework. That something works well as a philosophical approach is not the same claim as that it should work well in a specific setting where we want an artificial entity to produce certain classes of systematic reliable results.)
Those aren’t AIs. They do not create new knowledge. They do not “learn” in my sense—of doing more than they were programmed to. All the knowledge is provided by the human programmer—they are designed by an intelligent person and to the extent they “act intelligent” it’s all due to the person providing the thinking for it.
I’m not sure this is at all well-defined. I’m curious, what would make you change your mind? If for example, Colton’s systems constructed new definitions, proofs, conjectures, and counter-examples in math would that be enough to decide they were learning?
How about it starts by passing the turing test?
Or: show me the code, and explain to me how it works, and how the code doesn’t contain all the knowledge the AI creates.
Could you explain how this is connected to the issue of making new knowledge?
This seems a bit like showing a negative. I will suggest you look for a start at Simon Colton’s paper in the Journal of Integer Sequences which uses a program that operates in a way very close to the way you think an AI would need to operate in terms of making conjectures and trying to refute them. I don’t know if the source code is easily available. It used to be on Colton’s website but I don’t see it there anymore; if his work seems at all interesting to you you can presumably email him requesting a copy. I don’t know how to show that the AI “doesn’t contain all the knowledge the AI creates” aside from the fact that the system constructed concepts and conjectures in number theory which had not previously been constructed. Moreover, Colton’s own background in number theory is not very heavy, so it is difficult to claim that he’s importing his own knowledge into the code. If you define more precisely what you mean by the code containing the knowledge I might be able to answer that further. Without a more precise notion it isn’t clear to me how to respond.
Holding a conversation requires creating knowledge of what the other guy is saying.
In deduction, you agree that the conclusions are logically contained in the premises and axioms, right? They aren’t something new.
In a spam filter, a programmer figures out how he wants spam filtered (he has the idea), then he tells the computer to do it. The computer doesn’t figure out the idea or any new idea.
With biological evolution, for example, we see something different. You get stuff out, like cats, which weren’t specified in advance. And they aren’t a trivial extension; they contain important knowledge such as the knowledge of optics that makes their eyes work. This is why “Where can cats come from?” has been considered an important question (people want an explanation of the knowledge, which is sometimes called “apparent design”), while “Where can rocks come from?” is not in the same category of question (it does have some interest for other reasons).
With people, people create ideas that aren’t in their genes, and weren’t told to them by their parents or anyone else. That includes abstract ideas that aren’t the summation of observation. They sometimes create ideas no one ever thought of before. They create new ideas.
An AI (AGI you call it?) should be like a person: it should create new ideas which are not in its “genes” (programming). If someone actually writes an AI they will understand how it works and they can explain it, and we can use their explanation to judge whether they “cheated” or not (whether they, e.g., hard coded some ideas into the program and then said the AI invented them).
Ok. So to make sure I understand this claim. You are asserting that mathematicians are not constructing anything “new” when they discover proofs or theorems in set axiomatic systems?
Are genetic algorithm systems then creating something new by your definition?
Different concepts. An artificial intelligence is not (necessarily) a well-defined notion. An AGI is an artificial general intelligence, essentially something that passes the Turing test. Not the same concept.
I see no reason to assume that a person will necessarily understand how an AGI they constructed works. To use the most obvious hypothetical, someone might make a neural net modeled very closely after the human brain that functions as an AGI without any understanding of how it works.
When you “discover” that 2+1 = 3, given premises and axioms, you aren’t discovering something new.
But working mathematicians do more than that. They create new knowledge. It includes:
1) they learn new ways to think about the premises and axioms
2) they do not publish deductively implied facts unselectively or randomly. they choose the ones that they consider important. by making these choices they are adding content not found in the premises and axioms
3) they make choices between different possible proofs of the same thing. again where they make choices they are adding stuff, based on their own non-deductive understanding
4) when mathematicians work on proofs, they also think about stuff as they go. just like when experimental scientists do fairly mundane tasks in a lab, at the same time they will think and make it interesting with their thoughts.
They could be. I don’t think any exist yet that do. For example I read a Dawkins paper about one. In the paper he basically explained how he tweaked the code in order to get the results he wanted. He didn’t, apparently, realize that it was him, not the program, creating the output.
By “AI” I mean AGI. An intelligence (like a person) which is artificial. Please read all my prior statements in light of that.
Well, OK, but they’d understand how it was created, and could explain that. They could explain what they know about why it works (it copies what humans do). And they could also make the code public and discuss what it doesn’t include (e.g. hard coded special cases. except for the 3 he included on purpose, and he explains why they are there). That’d be pretty convincing!
I don’t think this is true. While he probably wouldn’t announce it if he was working on AI, he has indicated that he’s working on two books (HPMoR and a rationality book), and has another book queued. He’s also indicated that he doesn’t think anyone should work on AI until the goal system stability problem is solved, which he’s talked about thinking about but hasn’t published anything on, which probably means he’s stuck.
I more meant “he’s probably thinking about this in the back of his mind fairly often”, as well as trying to be humorous.
Do you know what he would think of work that has a small chance of solving goal stability and a slightly larger chance of helping with AI in general? This seems like a net plus to me, but you seem to have heard what he thinks should be studied from a slightly clearer source than I did.
I do not consider it possible to predict the growth of knowledge. That means you cannot predict, for example, the consequences of a scientific discovery that you have not yet discovered.
The reason is that if you could predict this you would in effect already have made the discovery.
Understanding is not primarily predictive and it is useful in a practical way. For example, you have to understand issues to address critical arguments offered by your peers. Merely predicting that they are wrong isn’t a good approach. It’s crucial to understand what their point is and to reason with them.
Understanding ideas helps us improve on them. It’s crucial to making judgments about what would be an improvement or not. It lets us judge changes better b/c e.g. we have some conception of why it works, which means we can evaluate what would break it or not.
That is not what I meant. If we could predict that the LHC will discover superparticles then yes, we would already know that. However, since we don’t know whether it will produce superparticles, we can predict that it will give us a lot of knowledge, since we will either learn that superparticles in the mass range detectable by the LHC exist or that they do not exist, so we can predict that we will learn a lot more about the universe by continuing to run the LHC than by filling in the tunnel where it is housed.
Eliezer proves that you cannot predict which direction science will go from a Bayesian perspective in http://lesswrong.com/lw/ii/conservation_of_expected_evidence/ .
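The result referred to here (conservation of expected evidence) says that, before looking, the probability-weighted average of the possible posteriors equals the prior. A minimal numerical sketch, with hypothetical names and numbers, assuming a single hypothesis H and a yes/no observation E:

```python
def possible_posteriors(prior_h, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|not E)) for a binary observation E.

    Assumes 0 < P(E) < 1 so that both posteriors are defined.
    """
    p_e = prior_h * p_e_given_h + (1.0 - prior_h) * p_e_given_not_h
    post_if_e = prior_h * p_e_given_h / p_e
    post_if_not_e = prior_h * (1.0 - p_e_given_h) / (1.0 - p_e)
    return p_e, post_if_e, post_if_not_e


# Hypothetical numbers for illustration only.
prior = 0.3
p_e, post_if_e, post_if_not_e = possible_posteriors(prior, 0.9, 0.2)

# Average the two possible posteriors, weighted by how likely each outcome is.
expected_posterior = p_e * post_if_e + (1.0 - p_e) * post_if_not_e
print(post_if_e, post_if_not_e, expected_posterior)
```

The observation can move the probability of H up or down, but the expected movement is zero, so one can predict that an experiment will be informative without predicting which way it will come out.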
So if new knowledge doesn’t come from prediction, what creates it? Answering this is one of epistemology’s main tasks. If you are focussing on prediction then you aren’t addressing it. Am I missing something?
New knowledge comes from observation. If you are referring to knowledge of what a theory says rather than of which theory is true, then this is assumed to be known. The math of how to deal with a situation where a theory is known but its consequences cannot be fully understood due to mathematical limitations is still in its infancy, but this has never posed a problem in practice.
That is a substantive and strong empiricist claim which I think is false.
For example, we have knowledge of things we never observed. Like stars. Observation is always indirect and its correctness always depends on theories such as our theories about whether the chain of proxies we are observing with will in fact observe what we want to observe.
Do you understand what I’m talking about and have a reply, or do you need me to explain further?
Could you understand why I might object to making a bunch of assumptions in one’s epistemology?
The new knowledge that is obtained from an observation is not just the content of the observation, it is also the new probabilities resulting from the observation. This is discussed at http://lesswrong.com/lw/pb/belief_in_the_implied_invisible/ .
It is assumed in practice, applied epistemology being a rather important thing to have. In ‘pure’ epistemology, it is just labelled incomplete; we definitely don’t have all the answers yet.
It seems to me that you’re pretty much conceding that your epistemology doesn’t work. (All flaws can be taken as “incomplete” parts where, in the future, maybe a solution will be found.)
That would leave the following important disagreement: Popper’s epistemology is not incomplete in any significant way. There is room for improvement, sure, but not really any flaws worth complaining about. No big unsolved problems marring it. So, why not drop this epistemology that doesn’t have the answers yet for one that does?
Would you describe quantum mechanics’ incompatibility with general relativity as “the theory doesn’t work”? For a being with unlimited computing power in a universe that is known to be computable (except for the being itself obviously), we are almost entirely done. Furthermore, many of the missing pieces to get from that to something much more complete seem related.
No, it is just wrong. Expected utility allows me to compute the right course of action given my preferences and a probability distribution over all theories. Any consistent consequentialist decision rule must be basically equivalent to that. The statement that there is no way to assign probabilities to theories therefore implies that there is no algorithm that a consequentialist can follow to reliably achieve their goals. Note that even if Popper’s values are not consequentialist, a consequentialist should still be able to act based on the knowledge obtained by a valid epistemology.
Can you be more specific?
I suspect you are judging Popperian epistemology by standards it states are mistaken. Would you agree that doing that would be a mistake?
Note the givens. There’s more givens which you didn’t mention too, e.g. some assumptions about people’s utilities having certain mathematical properties (you need this for, e.g., comparing them).
I don’t believe these givens are all true. If you think otherwise could we start with you giving the details more? I don’t want to argue with parts you simply omitted b/c I’ll have to guess what you think too much.
As a separate issue, “given my preferences” is such a huge given. It means that your epistemology does not deal in moral knowledge. At all. It simply takes preferences as givens and doesn’t tell you which to have. So in practice in real life it cannot be used for a lot of important issues. That’s a big flaw. And it means a whole entire second epistemology is needed to deal in moral knowledge. And if we have one of those, and it works, why not use it for all knowledge?
The rest of the paragraph was what I meant by this. You agree that Popperian epistemology states that theories should not be assigned probabilities.
Depends. If its standards make it useless, then, while internally consistent, I can judge it to be pointless. I just want an epistemology that can help me actually make decisions based on what I learn about reality.
I don’t think I was clear. A utility here just means a number I use to say how good a possible future is, so I can decide whether I want to work toward that future. In this context, it is far more general than anything composed of a bunch of terms, each of which describes some properties of a person.
I can learn more about my preferences from observation of my own brain using standard Bayesian epistemology.
Popperian epistemology does this. What’s the problem? Do you think that assigning probabilities to theories is the only possible way to do this?
Overall you’ve said almost nothing that’s actually about Popperian epistemology. You just took one claim (which has nothing to do with what it’s about, it’s just a minor point about what it isn’t) and said it’s wrong (without detailed elaboration).
I understood that. I think you are conflating “utility” the mathematical concept with “utility” the thing people in real life have. The second may not have the convenient properties the first has. You have not provided an argument that it does.
How do you learn what preferences are good to have, in that way?
It is a theorem that every consistent consequentialist decision rule is either a Bayesian decision rule or a limit of Bayesian decision rules. Even if the probabilities are not mentioned when constructing the rule, they can be inferred from its final form.
I don’t know what you mean by ′ “utility” the thing people in real life have’.
Can we please not get into this. If it helps, assume I am an expected paperclip maximizer. How would I decide then?
What was the argument for that?
And what is the argument that actions should be judged ONLY by consequences? What is the argument for excluding all other considerations?
People have preferences and values. e.g. they might want a cat or an iPhone and be glad to get it. The mathematical properties of these real life things are not trivial or obvious. For example, suppose getting the cat would add 2 happiness and the iPhone would add 20. Would getting both add 22 happiness? Answer: we cannot tell from the information available.
But the complete amorality of your epistemology (its total inability to create entire categories of knowledge) is a severe flaw in it. There are plenty of other examples I could use to make the same point; however, in general they are a bit less clear. One example is epistemology: epistemology is also not an empirical field. But I imagine you may argue about that a bunch, while with morality I think it’s clearer.
I’ve actually been meaning to find a paper that proves that myself. There’s apparently a proof in Mathematical Statistics, Volume 1: Basic and Selected Topics by Peter Bickel and Kjell Doksum.
None. I’ve just never found any property of an action that I care about other than the consequences. I’d gladly change my mind on this if one were pointed out to me.
Agreed, and agreed that this is a common mistake. If you thought I was making this error, I was being far less clear than I thought.
Well all my opinions about the foundations of morality and epistemology are entirely deductive.
The original is Abraham Wald’s An Essentially Complete Class of Admissible Decision Functions.
Thank you!
I thought you didn’t address the issue (and need to): you did not say what mathematical properties you think that real utilities have and how you deal with them.
Using what premises?
What about explanations about whether it was a reasonable decision for the person to make that action, given the knowledge he had before making it?
Ordered. But I think you should be more cautious asserting things that other people told you were true, which you have not checked up on.
Every possible universe is associated with a utility.
Any two utilities can be compared.
These comparisons are transitive.
Weighted averages of utilities can be taken.
For any three possible universes, L, M, and N, with L < M, a weighted average of L and N is less than a weighted average of M and N, if N is accorded the same weight in both cases.
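The five properties listed above can be made concrete if one assumes, purely for illustration, that utilities are ordinary real numbers; comparability and transitivity then come for free from the ordering of the reals. The sketch below (hypothetical function names and numbers) checks the last property for particular values. One caveat the list leaves implicit: the inequality is strict only if N receives a weight below 1, since at weight 1 both averages equal N.

```python
def mix(u_a, u_b, weight_b):
    """Weighted average of two utilities, with weight_b placed on u_b."""
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight_b) * u_a + weight_b * u_b


def independence_holds(u_l, u_m, u_n, weight_n):
    """Check the fifth property for one choice of L < M, N and weight on N."""
    if not u_l < u_m:
        raise ValueError("this check assumes L < M")
    return mix(u_l, u_n, weight_n) < mix(u_m, u_n, weight_n)


# Hypothetical utilities for three possible universes.
print(independence_holds(1.0, 2.0, 5.0, weight_n=0.25))  # N weighted below 1
print(independence_holds(1.0, 2.0, 5.0, weight_n=1.0))   # edge case: all weight on N
```

Whether the preferences real people have actually behave like such numbers is the question disputed in the surrounding comments; the code only shows what the properties say once that is assumed.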
Basically just definitions. I’m currently trying to enumerate them, which is why I wanted to find the proof of the theorem we were discussing.
Care about in the sense of when I’m deciding whether to make it. I don’t really care about how reasonable other people’s decisions are unless it’s relevant to my interactions with them, where I will need that knowledge to make my own decisions.
Wait, you bought the book just for that proof? I don’t even know if it’s the best proof of it (in terms of making assumptions that aren’t necessary to get the result). I’m confident in the proof because of all the other similar proofs I’ve read, though none seem as widely applicable as that one. I can almost sketch a proof in my mind. Some simple ones are explained well at http://en.wikipedia.org/wiki/Coherence_%28philosophical_gambling_strategy%29 .
For your first 5 points, how is that a reply about Popper? Maybe you meant to quote something else.
I don’t think that real people’s way of considering utility is based on entire universes at a time. So I don’t think your math here corresponds to how people think about it.
No, I used inter library loan.
Then put yourself in as the person under consideration. Do you think it matters whether you make decisions using rational thought processes, or do only the (likely?) consequences matter?
How do you judge whether you have the right ones? You said “entirely deductive” above, so are you saying you have a deductive way to judge this?
Yes, I did. Oops.
But that is what a choice is between: the universe where you choose one way and the universe where you choose another. Often large parts of the universe are ignored, but only because the action’s consequences for those parts are not distinguishable from how those parts would be if a different action was taken. A utility function may be a sum (or more complicated combination) of parts referring to individual aspects of the universe, but, in this context, let’s not call those ‘utilities’; we’ll reserve that word for the final thing used to make decisions. Most of this is not consciously invoked when people make decisions, but a choice that does not stand when you consider its expected effects on the whole universe is a wrong choice.
If I could achieve better consequences using an ‘irrational’ process, I would, but this sounds nonsensical because I am used to defining ‘rational’ as that which reliably gets the best consequences.
Definitions as in “let’s set up this situation and see which choices make sense”. It’s pretty much all like the Dutch book arguments.
I don’t think I understand. This would rely on your conception of the real life situation (if you want it to apply to real life), of what makes sense, being correct. That goes way beyond deductive or definitions into substantive claims.
About decisions, if a method like “choose by whim” gets you a good result in a particular case, you’re happy with it? You don’t care that it doesn’t make any sense if it works out this time?
So what? I think you’re basically saying that your formulation is equivalent to what people (should) do. But that doesn’t address the issue of what people actually do—it doesn’t demonstrate the equivalence. As you guys like to point out, people often think in ways that don’t make sense, including violating basic logic.
But also, for example, I think a person might evaluate getting a cat, and getting an iphone, and then they might (incorrectly) evaluate both by adding the benefits instead of by considering the universe with both based on its own properties.
Another issue is that I don’t think any two utilities people have can be compared. They are sometimes judged with different, contradictory standards. This leads to two major issues when trying to compare them 1) the person doesn’t know how 2) it might not be possible to compare even in theory because one or both contain some mistakes. the mistakes might need to be fixed before comparing, but that would change it.
I’m not saying people are doing it correctly. Whether they are right or wrong has no bearing on whether “utility” the mathematical object with the 5 properties you listed corresponds to “utility” the thing people do.
If you want to discuss what people should do, rather than what they do do, that is a moral issue. So it leads to questions like: how does bayesian epistemology create moral knowledge and how does it evaluate moral statements?
If you want to discuss what kind of advice is helpful to people (which people?), then I’m sure you can see how talking about entire universes could easily confuse people, and how pointing out that some other procedure is a special case of it may not be very good advice, since it does not address the practical problems they are having.
Do you think that the Dutch book arguments go “way beyond deductive or definitions”? Well, I guess that would depend on what you conclude from them. For now, let’s say “there is a need to assign probabilities to events, no probability can be less than 0 or more than 1 and probabilities of mutually exclusive events should add”.
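For readers unfamiliar with the Dutch book argument being referred to, here is a minimal sketch of its simplest case, with hypothetical names and prices. It assumes an agent who will buy a bet paying `stake` on A, and another paying `stake` on not-A, each at a price equal to its stated probability times the stake. If the two stated probabilities sum to more than 1, the agent loses money whichever way A turns out.

```python
def agent_net_payoffs(price_a, price_not_a, stake=1.0):
    """Net result for an agent who buys both bets at its own stated prices.

    Exactly one of the two bets pays out `stake`, so the net payoff is the
    same in both outcomes: stake minus the total price paid.
    """
    cost = (price_a + price_not_a) * stake
    return {"A": stake - cost, "not A": stake - cost}


# Hypothetical incoherent agent: P(A) = 0.6 and P(not A) = 0.6.
incoherent = agent_net_payoffs(0.6, 0.6)

# Hypothetical coherent agent: P(A) = 0.6 and P(not A) = 0.4.
coherent = agent_net_payoffs(0.6, 0.4)

print(incoherent, coherent)
```

This sketch covers only prices that sum to more than 1; when they sum to less than 1 the argument has the bookie buy the bets from the agent instead, which is not modelled here. It also takes for granted that the agent is willing to bet at its stated prices, which is one of the assumptions questioned in this thread.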
The confusion here is that we’re not judging an action. If I make a mistake and happen to benefit from it, there were good consequences, but there was no choice involved. I don’t care about this; it already happened. What I do care about, and what I can accomplish, is avoiding similar mistakes in the future.
Yes, that is what I was discussing. I probably don’t want to actually get into my arguments here. Can you give an example of what you mean by “moral knowledge”?
Applying Dutch book arguments to real life situations always goes way beyond deduction and definitions, yes.
A need? Are you talking about morality now?
Why are we saying this? You now speak of probabilities of events. Previously we were discussing epistemology which is about ideas. I object to assigning probabilities to the truth of ideas. Assigning them to events is OK when
1) the laws of physics are indeterministic (never, as far as we know)
2) we have incomplete information and want to make a prediction that would be deterministic except that we have to put several possibilities in some places, which leads to several possible answers. and probability is a reasonable way to organize thoughts about that.
So what?
Murder is immoral.
Being closed minded makes one’s life worse because it sabotages improvement.
Are you saying Popper would evaluate “Murder is immoral.” in the same way as “Atoms are made up of electrons and a nucleus.”? How would you test this? What would you consider a proof of it?
I prefer to leave such statements undefined, since people disagree too much on what ‘morality’ means. I am a moral realist to some, a relativist to others, and an error theorist to other others. I could prove the statement for many common non-confused definitions, though not for, for example, people who say ‘morality’ is synonymous with ‘that which is commanded by God’, which is based on confusion but at least everyone can agree on when it is or isn’t true, and not for error theorists, as both groups’ definitions make the sentence false.
In theory I could prove this sentence, but in practice I could not do this clearly, especially over the internet. It would probably be much easier for you to read the sequences, which get to this toward the end, but, depending on your answers to some of my questions, there may be an easier way to explain this.
Yes. One epistemology. All types of knowledge. Unified!
You would not.
We don’t accept proofs of anything, we are fallibilists. We consider mathematical proofs to be good arguments though. I don’t really want to argue about those (unless you’re terribly interested. btw this is covered in the math chapter of The Fabric of Reality by David Deutsch). But the point is we don’t accept anything as providing certainty or even probableness. In our terminology, nothing provides justification.
What we do instead is explain our ideas and criticize mistakes, and in this way improve our ideas. This, btw, creates knowledge in the same way as evolution (replication of ideas, with variation, and selection by criticism). That’s not a metaphor or analogy but literally true.
Wouldn’t it be nice if you had an epistemology that helped you deal with all kinds of knowledge, so you didn’t have to simply give up on applying reason to important issues like what is a good life, and what are good values?
Fine, what would you consider an argument for it?
Eliezer and I probably agree with you.
Well, biological evolution is a much smaller part of conceptspace than “replication, variation, selection” and now I’m realizing that you probably haven’t read A Human’s Guide to Words which is extremely important and interesting and, while you’ll know much of it, has things that are unique and original and that you’ll learn a lot from. Please read it.
I do apply reason to those things, I just don’t use the words ‘morality’ in my reasoning process because too many people get confused. It is only a word after all.
On a side note, I am starting to like what I hear of Popper. It seems to embody an understanding of the brain and a bunch of useful advice for it. I think I disagree with some things, but on grounds that seem like the sort of thing that is accepted as motivation for the theory to self-modify. Does that make sense? Anyways, it’s not Popper’s fault that there is a set of theorems that in principle remove the need for other types of thought and in practice cause big changes in the way we understand and evaluate the heuristics that are necessary because the brain is fallible and computationally limited.
Wei Dai likes thinking about how to deal with questions outside of Bayesianism’s current domain of applicability, so he might be interested in this.
Interpret this as a need in order to achieve some specified goal, so as to keep this part of the debate out of morality. A paperclip maximizer, for example, would obviously need to not pay 200 paperclips for a lottery with a maximum payout of 100 paperclips in order to achieve its goals. Furthermore, this applies to any consequentialist set of preferences.
Not sure why I wrote that. Substitute ‘theories’.
So you assume morality (the “specified goal”). That makes your theory amoral.
Why is there a need to assign probabilities to theories? Popperian epistemology functions without doing that.
Well there’s a bit more than this, but it’s not important right now. One can work toward any goal just by assuming it as a goal.
Because of the Dutch book arguments. The probabilities can be inferred from the choices. I’m not sure if the agent’s probability distribution can be fully determined from a finite set of wagers, but it can definitely be inferred to an arbitrary degree of precision by adding enough wagers.
Can you give an example of how you use a Dutch book argument on a non-gambling topic? For example, if I’m considering issues like whether to go swimming today, and what nickname to call my friend, and I don’t assign probabilities like “80% sure that calling her Kate is the best option”, how do I get Dutch Booked?
First you hypothetically ask what would happen if you were asked to make bets on whether calling her Kate would result in world X (with utility U(X)). Do this for all choices and all possible worlds. This gives you probabilities and utilities. You then take a weighted average, as per the VNM theorem.
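The weighted average described above can be sketched as follows. All worlds, probabilities and utilities are invented numbers, and `expected_utility` is a hypothetical helper name, so treat this as an illustration of the bookkeeping and not as a claim about anyone's actual preferences.

```python
# Hypothetical numbers: each action maps possible worlds to (probability, utility).
# The probabilities stand for the betting odds the agent would accept.
actions = {
    "call her Kate": {"she likes it": (0.8, 10.0), "she dislikes it": (0.2, -5.0)},
    "call her Katie": {"she likes it": (0.5, 12.0), "she dislikes it": (0.5, -5.0)},
}

def expected_utility(outcomes):
    """Probability-weighted average of utilities over possible worlds."""
    total_p = sum(p for p, _ in outcomes.values())
    assert abs(total_p - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * u for p, u in outcomes.values())

# Pick the action with the highest expected utility.
best_action = max(actions, key=lambda a: expected_utility(actions[a]))
```

With these made-up numbers the first option has the higher expected utility, so it is the one chosen.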
How do I get Dutch Booked for not doing that?
And I’m still curious how the utilities are decided. By whim?
You don’t get to decide utilities so much as you have to figure out what they are. You already have a utility function, and you do your best to describe it. How do you weight the things you value relative to each other?
This takes observation, because what we think we value often turns out not to be a good description of our feelings and behavior.
From our genes? And the goal is just to figure out what it is, but not change it for the better?
Can you explain how you would change your fundamental moral values for the better?
By criticizing them. And conjecturing improvements which meet the challenges of the criticism. It is the same method as for improving all other knowledge.
In outline it is pretty simple. You may wonder things like what would be a good moral criticism. To that I would say: there’s many books full of examples, why dismiss all that? There is no one true way of arguing. Normal arguments are ok, I do not reject them all out of hand but try to meet their challenges. Even the ones with some kind of mistake (most of them), you can often find some substantive point which can be rescued. It’s important to engage with the best versions of theories you can think of.
BTW once upon a time I was vaguely socialist. Now I’m a (classical) liberal. People do change their fundamental moral values for the better in real life. I attended a speech by a former Muslim terrorist who is now a pro-Western Christian (walid shoebat).
I’ve changed my social values plenty of times, because I decided different policies better served my terminal values. If you wanted to convince me to support looser gun control, for instance, I would be amenable to that because my position on gun control is simply an avenue for satisfying my core values, which might better be satisfied in a different way.
If you tried to convince me to support increased human suffering as an end goal, I would not be amenable to that, unless it turns out I have some value I regard as even more important that would be served by it.
This is what Popper called the Myth of the Framework and refuted in his essay by that name. It’s just not true that everyone is totally set in their ways and extremely closed minded, as you suggest. People with different frameworks learn from each other.
One example is that children learn. They are not born sharing their parents’ framework.
You probably think that frameworks are genetic, so they are. Dealing with that would take a lengthy discussion. Are you interested in this stuff? Would you read a book about it? Do you want to take it seriously?
I’m somewhat skeptical b/c e.g. you gave no reply to some of what I said.
I think a lot of the reason people don’t learn other frameworks, in practice, is merely that they choose not to. They think it sounds stupid (before they understand what it’s actually saying) and decide not to try.
When did I suggest that everyone is set in their ways and extremely closed minded? As I already pointed out, I’ve changed my own social values plenty of times. Our social frameworks are extremely plastic, because there are many possible ways to serve our terminal values.
I have responded to moral arguments with regards to more things than I could reasonably list here (economics, legal codes, etc.) I have done so because I was convinced that alternatives to my preexisting social framework better served my values.
Valuing strict gun control, to pick an example, is not genetically coded for. A person might have various inborn tendencies which will affect how they’re likely to feel about gun control; they might have innate predispositions towards authoritarianism or libertarianism, for instance, that will affect how they form their opinion. A person who valued freedom highly enough might support little or no gun control even if they were convinced that it would result in a greater loss of life. You would have a hard time finding anyone who valued freedom so much that they would support looser gun control if they were convinced it would destroy 90% of the world population, which gives you a bit of information about how they weight their preferences.
If you wanted to convince me to support more human suffering instead of more human happiness, you would have to appeal to something else I value even more that would be served by this. If you could argue that my preference for happiness is arbitrary, that preference for suffering is more natural, even if you could demonstrate that the moral goodness of human suffering is intrinsically inscribed on the fabric of the universe, why should I care? To make me want to make humans unhappy, you’d have to convince me there’s something else I want enough to make humans unhappy for its sake.
I also don’t feel I’m being properly understood here; I’m sorry if I’m not following up on everything, but I’m trying to focus on the things that I think meaningfully further the conversation, and I think some of your arguments are based on misapprehensions about where I’m coming from. You’ve already made it clear that you feel the same, but you can take it as assured that I’m both trying to understand you and make myself understood.
You suggested it about a category of ideas which you called “core values”.
You are saying that you are not open to new values which contradict your core values. Ultimately you might replace all but the one that is the most core, but never that one.
That’s more or less correct. To quote one of Eliezer’s works of ridiculous fanfiction, “A moral system has room for only one absolute commandment; if two unbreakable rules collide, one has to give way.”
If circumstances force my various priorities into conflict, some must give way to others, and if I value one thing more than anything else, I must be willing to sacrifice anything else for it. That doesn’t necessarily make it my only terminal value; I might have major parts of my social framework which ultimately reduce to service to another value, and they’d have to bend if they ever came into conflict with a more heavily weighted value.
Well in the first half, you get Dutch booked in the usual way. It’s not necessarily actually happening, but there still must be probabilities that you would use if it were. In the second half, if you don’t follow the procedure (or an equivalent one) you violate at least one VNM axiom.
If you violate axiom 1, there are situations in which you don’t have a preferred choice, not as in “both are equally good/bad” but as in your decision process does not give an answer or gives more than one answer. I don’t think I’d call this a decision process.
If you violate axiom 2, there are outcomes L, M and N such that you’d want to switch from L to M and then from M to N, but you would not want to switch from L to N.
Axiom 3 is unimportant and is just there to simplify the math.
For axiom 4, imagine a situation where a statement with unknown truth-value, X, determines whether you get to choose between two outcomes, L and M, with L < M, or have no choice in accepting a third outcome, N. If you violate the axiom, there is a situation like this where, if you were asked for your choice before you know X (it will be ignored if X is false), you would pick L, even though L < M.
Do any of these situations describe your preferences?
I’ll let Desrtopa handle this.
Can you give a concrete example. What happens to me? Is it that I get an outcome which is less ideal than was available?
If your decision process is not equivalent to one that uses the previously described procedure, there are situations where something like one of the following will happen.
1. I ask you if you want chocolate or vanilla ice cream and you don’t decide. Not just you don’t care which one you get or you would prefer not to have ice cream, but you don’t output anything and see nothing wrong with that.
2. You prefer chocolate to vanilla ice cream, so you would willingly pay 1c to have the vanilla ice cream that you have been promised upgraded to chocolate. You also happen to prefer strawberry to chocolate, so you are willing to pay 1c to exchange a promise of a chocolate ice cream for a promise of a strawberry ice cream. Furthermore, it turns out you prefer vanilla to strawberry, so whenever you are offered a strawberry ice cream, you gladly pay a single cent to change that to an offer of vanilla, ad infinitum.
3. N/A
4. You like chocolate ice cream more than vanilla ice cream. Nobody knows if you’ll get ice cream today, but you are asked for your choice just in case, so you pick vanilla.
Let’s consider (2). Suppose someone was in the process of getting Dutch Booked like this. It would not go on ad infinitum. They would quickly learn better. Right? So even if this happened, I think it would not be a big deal.
Let’s say they did learn better. How would they do this—changing their utility function? Someone with a utility function like this really does prefer B+1c to A, C+1c to B, and A+1c to C. Even if they did change their utility function, the new one would either have a new hole or it would obey the results of the VNM-theorem.
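The cyclic preferences from the ice cream example can be simulated directly. This is only a sketch: the set `prefers`, the function `run_money_pump` and the 1c fee are illustrative names and numbers, and the agent is assumed never to revise its preferences.

```python
# Cyclic (intransitive) preferences from the ice cream example:
# chocolate > vanilla, strawberry > chocolate, vanilla > strawberry.
prefers = {
    ("chocolate", "vanilla"),
    ("strawberry", "chocolate"),
    ("vanilla", "strawberry"),
}

def run_money_pump(start, offers, fee_cents=1):
    """Offer swaps in turn; the agent pays the fee whenever it prefers the offer."""
    holding, paid = start, 0
    for offered in offers:
        if (offered, holding) in prefers:
            holding, paid = offered, paid + fee_cents
    return holding, paid

# 100 trips around the cycle: the agent ends where it started, 300 cents poorer.
holding, paid = run_money_pump("vanilla", ["chocolate", "strawberry", "vanilla"] * 100)
```

An agent that notices the loop and stops trading has, in effect, changed which swaps it accepts, which is the point under dispute in the surrounding comments.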
So Bayes teaches: do not disobey the laws of logic and math.
Still wondering where the assigning probabilities to truths of theories is.
OK. So what? There’s more to life than that. That’s so terribly narrow. I mean, that part of what you’re saying is right as far as it goes, but it doesn’t go all that far. And when you start trying to apply it to harder cases—what happens? Do you have some Bayesian argument about who to vote for for president? Which convinced millions of people? Or should have convinced them, and really answers the questions much better than other arguments?
Well the Dutch books make it so you have to pick some probabilities. Actually getting the right prior is incomplete, though Solomonoff induction is most of the way there.
Where else are you hoping to go?
In principle, yes. There’s actually a computer program called AIXItl that does it. In practice I use approximations to it. It probably could be done to a much higher degree of certainty. There are a lot of issues and a lot of relevant data.
Can you give an example? Use the ice cream flavors. What probabilities do you have to pick to buy ice cream without being dutch booked?
Explanatory knowledge. Understanding the world. Philosophical knowledge. Moral knowledge. Non-scientific, non-empirical knowledge. Beyond prediction and observation.
How do you know if your approximations are OK to make or ruin things? How do you work out what kinds of approximations are and aren’t safe to make?
The way I would do that is by understanding the explanation of why something is supposed to work. In that way, I can evaluate proposed changes to see whether they mess up the main point or not.
Endo, I think you are making things more confusing by combining issues of Bayesianism with issues of utility. It might help to keep them more separate or to be clear when one is talking about one, the other, or some hybrid.
I use the term Bayesianism to include utility because (a) they are connected and (b) a philosophy of probabilities as abstract mathematical constructs with no applications doesn’t seem complete; it needs an explanation of why those specific objects are studied. How do you think that any of this caused or could cause confusion?
Well, it empirically seems to be causing confusion. See curi’s remarks about the ice cream example. Also, one doesn’t need Bayesianism to include utility and that isn’t standard (although it is true that they do go very well together).
Yes I see what you mean.
I think it goes a bit beyond this. Utility considerations motivate the choice of definitions. I acknowledge that they are distinct things, though.
The consequences could easily be thousands of lives or more in case of sufficiently important decisions.
So the argument is now not that suboptimal issues don’t exist but that they aren’t a big deal? Are you aware that the primary reason that this involves small amounts of ice cream is for convenience of the example? There’s no reason these couldn’t happen with far more serious issues (such as what medicine to use).
I know. I thought it was strange that you said “ad infinitum” when it would not go on forever. And that you presented this as dire but made your example non-dire.
But OK. You say we must consider probabilities, or this will happen. Well, suppose that if I do something it will happen. I could notice that, criticize it, and thus avoid it.
How can I notice? I imagine you will say that involves probabilities. But in your ice cream example I don’t see the probabilities. It’s just preferences for different ice creams, and an explanation of how you get a loop.
And what I definitely don’t see is probabilities that various theories are true (as opposed to probabilities about events which are ok).
I didn’t say that (I’m not endoself).
Yes, but the Bayesian avoids having this step. For any step you can construct a “criticism” that will duplicate what the Bayesian will do. This is connected to a number of issues, including the fact that what constitutes valid criticism in a Popperian framework is far from clear.
Ice cream is an analogy, and it might not be a great one, since it is connected to preferences (which sometimes get confused with Bayesianism). It might make more sense to just go read Cox’s theorem and translate to yourself what the assumptions mean about an approach.
OK, my bad. So many people. I lose track.
Anything which is not itself criticized.
Could you pick any real world example you like, where the probabilities needed to avoid dutch book aren’t obvious, and point them out? To help concretize the idea for me.
Well, I’m not sure, in that I’m not convinced that Dutch Booking really does occur much in real life other than in the obvious contexts. But there are a lot of contexts it does occur in. For example, a fair number of complicated stock maneuvers can be thought of essentially as attempts to dutch book other players in the stock market.
Koth already had an amusing response to that.
Someone here told me it does. Maybe you can go argue with him for me ;-)
I agree.
Consequentialism is not in the index.
Decision rule is, a little bit.
I don’t think this book contains a proof mentioning consequentialism. Do you disagree? Give a page or section?
It looks like what they are doing is defining a decision rule in a special way. So, by definition, it has to be a mathematical thing to do with probability. Then after that, I’m sure it’s rather easy to prove that you should use bayes’ theorem rather than some other math.
But none of that is about decision rules in the sense of methods human beings use for making decisions. It’s just that if you define them in a particular way (so that Bayes’ is basically the only option) then you can prove it.
See e.g. page 19, where they give a definition. A Popperian approach to making decisions simply wouldn’t fit within the scope of their definition, so the conclusion of any proof like the one you claimed existed (which I haven’t found in this book) would not apply to Popperian ideas.
Maybe there is a lesson here about believing stuff is proven when you haven’t seen the proof, listening to hearsay about what books contain, and trying to apply proofs you aren’t familiar with (they often have limits on scope).
In what way would the Popperian approach fail to fit the decision rule approach on page 19 of Bickel and Doksum?
It says a decision rule (their term) is a function of the sample space, mapping something like complete sets of possible data to things people do. (I think it needs to be complete sets of all your data to be applied to real world human decision making. They don’t explain what they are talking about in the type of way I think is good and clear. I think that’s due to having in mind different problems they are trying to solve than I have. We have different goals without even very much overlap. They both involve “decisions” but we mean different things by the word.)
In real life, people use many different decision rules (my term, not theirs). And people deal with clashes between them.
You may claim that my multiple decision rules can be combined into one mathematical function. That is so. But the result isn’t a smooth function so when they start talking about estimation they have big problems! And this is the kind of thing I would expect to get acknowledgement and discussion if they were trying to talk about how humans make decisions, in practice, rather than just trying to define some terms (chosen to sound like they have something to do with what humans do) and then proceed with math.
E.g. they try to talk about estimating the amount of error. If you know error bars on your data, and you have a smooth function, you’re maybe kind of OK with imperfect data. But if your function has a great many jumps in it, what are you to do? What if, within the margin for error on something, there are several discontinuities? I think they are conceiving of the decision rule function as being smooth and not thinking about what happens when it’s very messy. Maybe they specified some assumptions so that it has to be, which I missed, but anyway human beings have tons of contradictory and not-yet-integrated ideas in their head (mistakes and separate topics they haven’t connected yet, and more) and so it’s not smooth.
On a similar note they talk about the median and mean which also don’t mean much when it’s not smooth. Who cares what the mean is over an infinitely large sample space where you get all sorts of unrepresentative results in large unrealistic portions of it? So again I think they are looking at the issues differently than me. They expect things like mathematically friendly distributions (for which means and medians are useful); I don’t.
Moving on to a different issue, they conceive of a decision rule which takes input and then gives output. I do not conceive of people starting with the input and then deciding the output. I think decision making is more complicated. While thinking about the input, people create more input: their thoughts. The input is constantly being changed during the decision process; it’s not a fixed quantity to have a function of. Also being changed during any significant decision is the decision rule itself; it too isn’t a static function even for purposes of doing one decision (at least in the normal sense. Maybe they would want to call every step in the process a decision, so when you’re deciding a flavor of ice cream that might involve 50 decisions, with updates to the decision rules and inputs in between them. If they want to do something like that they do not explain how it works.)
They conceive of the input to decisions as “data”. But I conceive of much thinking as not using much empirical data, if any. I would pick a term that emphasizes it. The input to all decision making is really ideas, some of which are about empirical data and some of which aren’t. Data is a special case, not the right term for the general case. From this I take that they are empiricists. You can find a refutation of empiricism in The Beginning of Infinity by David Deutsch but anyway it’s a difference between us.
A Popperian approach to decision making would focus more on philosophical problems, and their solutions. It would say things like: consider what problem you’re trying to solve, and consider what actions may solve it. And criticize your ideas to eliminate errors. And … well no short summary does it justice. I’ve tried a few times here. But Popperian ways of thinking are not intuitive to people with the justificationist biases dominant in our culture. Maybe if you like everything I said I’ll try to explain more, but in that case I don’t know why you wouldn’t read some books which are more polished than what I would type in. If you have a specific, narrow question I can see that answering that would make sense.
Thank you for that detailed reply. I just have a few comments:
“data” could be any observable property of the world
in statistical decision theory, the details of the decision process that implements the mapping aren’t the focus because we’re going to try to go straight to the optimal mapping in a mathematical fashion
there’s no requirement that the decision function be smooth—it’s just useful to look at such functions first for pedagogical reasons. All of the math continues to work in the presence of discontinuities.
a weak point of statistical decision theory is that it treats the set of actions as a given; human strategic brilliance often finds expression through the realization that a particular action is possible
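As a small illustration of the point about discontinuities, here is a sketch of a decision rule that is a step function of the data, together with its expected loss. The setup (a coin whose heads probability is either 0.3 or 0.7, ten flips, 0-1 loss, the threshold at 5) is an assumption chosen for simplicity and is not taken from Bickel and Doksum; the names `decision_rule` and `risk` are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def decision_rule(heads):
    # A step function of the data: the output jumps at 5 heads.
    return "biased_high" if heads >= 5 else "biased_low"

def risk(true_state, p_heads, n=10):
    """Expected 0-1 loss of decision_rule when the heads probability is p_heads."""
    return sum(
        binom_pmf(k, n, p_heads) * (decision_rule(k) != true_state)
        for k in range(n + 1)
    )

risk_low = risk("biased_low", 0.3)    # chance of wrongly saying "biased_high"
risk_high = risk("biased_high", 0.7)  # chance of wrongly saying "biased_low"
```

The expected loss is a plain sum over the sample space, so the jump in the rule causes no difficulty in this discrete case. Whether that carries over to the messier human case raised in the thread is a separate question.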
Yes but using it to refer to a person’s ideas, without clarification, would be bizarre and many readers wouldn’t catch on.
Straight to the final, perfect truth? lol… That’s extremely unPopperian. We don’t expect progress to just end like that. We don’t expect you get so far and then there’s nothing further. We don’t think the scope for reason is so bounded, nor do we think fallibility is so easily defeated.
In practice searches for optimal things of this kind always involve many premises which have substantial philosophical meaning. (Which is often, IMO, wrong.)
Does it use an infinite set of all possible actions? I would have thought it wouldn’t rely on knowing what each action actually is, but would just broadly specify the set of all actions and move on.
@smooth: what good is a mean or median with no smoothness? And for margins of error, with a non-smooth function, what do you do?
With a smooth region of a function, taking the midpoint of the margin of error region is reasonable enough. But when there is a discontinuity, there’s no way to average it and get a good result. Mixing different ideas is a hard process if you want anything useful to result. If you just do it in a simple way like averaging you end up with a result that none of the ideas think will work and shouldn’t be surprised when it doesn’t. It’s kind of like how if you have half an army do one general’s plan, and half do another, the result is worse than doing either one.
Do you think of arguments and explanations as types of evidence? If so, how does that work? If not then I wasn’t talking about evidence.
In Bayesian epistemology, most arguments and explanations are just applications of Bayes’ law as discussed at http://lesswrong.com/lw/o7/searching_for_bayesstructure/ . Of course, ‘taking evidence into account’ is the same as using it in Bayes’ law.
Can you give an example using a moral argument, or anything that would help illustrate how you take things that don’t look like they are Bayes’ law cases and apply it anyway?
The linked page says imperfectly efficient minds give off heat and that this is probabilistic (which is weird b/c the laws of physics govern it and they are not probabilistic but deterministic). Even if I accept this, I don’t quite see the relevance. Are you reductionists? I don’t think that the underlying physical processes tell us everything interesting about the epistemology.
It’s called Solomonoff induction—and we’ve known about it for almost 50 years.
Provide the details which address the problem, not a wikipedia link.
It is not my job to teach you maths. Here, use Google.
I know math. The problem is that you haven’t provided anything that works, or any criticism of Popper. Basically all your contributions to the discussion are appeals to authority. You don’t argue, you just say “This source is right; read it and concede”. And most of your sources are wikipedia quality… If you won’t say anything I can’t find on google, why talk at all?
Because one doesn’t generally know where to look?
There are plenty of explanations of Solomonoff induction out there. You asked for how the math of confirmation works—and that’s the math of universal inductive inference. If you just want an instance of confirmation, see Bayes’s theorem.
It is not an “appeal to authority” to direct you to the maths that answers your query!
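For readers who want the single instance of confirmation mentioned above, here is a minimal sketch of one Bayes's theorem update. The prior and likelihoods are invented numbers and `posterior` is a hypothetical helper name.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """P(H | data) from Bayes's theorem for a hypothesis H and its negation."""
    evidence = prior * p_data_given_h + (1 - prior) * p_data_given_not_h
    return prior * p_data_given_h / evidence

# Hypothetical numbers: the data is three times as likely if H is true.
p = posterior(prior=0.5, p_data_given_h=0.9, p_data_given_not_h=0.3)
```

Here the probability of H rises from 0.5 to 0.75, which is the sense in which data more likely under H than under its negation "confirms" H.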
If Solomonoff Induction does not discard theories inconsistent with the data, then this is wrong:
http://wiki.lesswrong.com/wiki/Solomonoff_induction
Whether it does or does not isn’t important to the main argument here.
If consistent data makes a theory more probable, I might have expected a theory that has survived (non-empirical) criticism to become more probable. Because you are an empiricist, you relegate criticism to a minor role when in fact criticism is a major driving force in science. Most theories don’t get tested empirically, they are refuted by criticism alone. Critical rationalism knows this.
Is that it? And how is the algorithm supposed to work anyway? If the theory is non-empirical, it can’t be a compression of an empirical dataset.
Checking with the definition that apparently boils down to whether I think there is much innate knowledge. Humans have some innate knowledge, so I figure: probably not an empiricist.
I have no particular beef with criticism. Solomonoff induction is not given as a model of how humans actually do science. It is given as a formalisation of the maths of induction.
Theories are constructed from datasets. Solomonoff induction is an abstract model of sequence prediction. Given a serial stream of sense data, it maintains models of it, and uses those models to predict future observations. The models embody theories about what is being observed, and smaller models are preferred.
Solomonoff Induction is empiricist because it assumes all knowledge comes from the data. Theories arising from Solomonoff Induction are, at most, only as reliable as the data and it can’t come up with theories that make more precise predictions than the data or that contain more knowledge than the data. This is complicated by the fact that in real life applications it will have to deal with noise in the data and this is going to get deeply subjective very quickly.
Another problem is: how is the dataset itself constructed? You don’t just go out and collect data; you need to know what you are looking for. Among the infinite number of things you can observe, you need to know what is important and to know this you need a theory. Where does this theory come from? It arises as a conjectural explanation to a problem-situation and specific predictions arising from the explanation guide your observations. So Solomonoff Induction has things backward.
Solomonoff Induction is just about prediction. It models a forecasting agent that observes a stream, and emits probabilities of the next symbol. It doesn’t do anything else. Complaining that it can’t create its own experiments seems rather futile. Of course it can’t: it is a forecaster. Real agents do more than just forecast, of course, but that isn’t a criticism of forecasting, or the idea of a forecaster.
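The forecaster described above can be caricatured in a few lines. This is a toy and not Solomonoff induction itself: the real thing ranges over all programs and is uncomputable, whereas this sketch uses four hand-picked deterministic hypotheses with assumed description lengths. The names `HYPOTHESES` and `predict_next_one` are hypothetical.

```python
# Toy hypotheses: (name, assumed description length in bits, rule for the bit at index i).
HYPOTHESES = [
    ("all zeros", 2, lambda i: 0),
    ("all ones", 2, lambda i: 1),
    ("alternate 0,1", 3, lambda i: i % 2),
    ("one in every third place", 5, lambda i: 1 if i % 3 == 2 else 0),
]

def predict_next_one(observed):
    """Probability that the next bit is 1.

    Each hypothesis gets prior weight 2**-length (shorter is preferred), and
    every hypothesis that contradicts the observed bits is dropped.
    """
    surviving = [
        (2.0 ** -length, rule)
        for _, length, rule in HYPOTHESES
        if all(rule(i) == bit for i, bit in enumerate(observed))
    ]
    total = sum(weight for weight, _ in surviving)
    if total == 0:
        raise ValueError("no hypothesis in this toy class fits the data")
    n = len(observed)
    return sum(weight for weight, rule in surviving if rule(n) == 1) / total
```

After observing `[0, 0]`, only "all zeros" and "one in every third place" survive, and the shorter one dominates the forecast. Note that the toy only forecasts; it does not choose what to observe, which matches the description of a pure forecaster given above.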