If Bayes wants to be an epistemology then it must do more than predict. Same for Newton.
If you want to have math which doesn’t dethrone Popper, but is orthogonal, you’re welcome to do that and I’d stop complaining (much). However Yudkowsky says Bayesian Epistemology dethrones and replaces Popper. He regards it as a rival theory to Popper’s. Do you think Yudkowsky was wrong about that?
Yudkowsky says Bayesian Epistemology dethrones and replaces Popper. He regards it as a rival theory to Popper’s. Do you think Yudkowsky was wrong about that?
It replaces Popperian epistemology where their domains overlap—namely: building models from observations and using them to predict the future. It won’t alone tell you what experiments to perform in order to gather more data—there are other puzzle pieces for dealing with that.
There’s no overlap there b/c Popperian epistemology doesn’t provide the specific details of how to do that. Popperian epistemology is fully compatible with, and can use, Bayes’ theorem and any other pure math or logic insights.
Popperian epistemology contradicts your “other puzzle pieces”. And without them, Bayes’ theorem alone isn’t epistemology.
building models from observations and using them to predict the future
I thought you were referring to things you can do with Bayes’ theorem and some input. If you meant something more, provide the details of what you are proposing.
The most common point of Popper’s philosophy that I hear (including from my Popperian philosophy teacher) is the whole “black swan white swan” thing, which Bayes does directly contradict, and dethrone (though personally I’m not a big fan of that terminology).
The stuff you talked about with conjectures and criticisms does not directly contradict Bayes and if the serious problems with ‘one strike and you’re out’ criticisms are fixed then I may be persuaded to accept both it and Bayes.
Bayes is not meant to be an epistemology all on its own. It only starts becoming one when you put it together with Solomonoff Induction, Expected Utility Theory, Cognitive Science and probably a few other pieces of the puzzle that haven’t been found yet. I presume the reason it is referred to as Bayesian rather than Solomonoffian or anything else is that Bayes is both the most frequently used and the oldest part.
The black swan thing is not that important to Popper’s ideas, it is merely a criticism of some of Popper’s opponents.
How does Bayes dethrone it? By asserting that white swans support “all swans are white”? I’ve addressed that at length (still going through overnight replies, if someone answered my points i’ll try to find it).
Well I don’t have a problem with Bayes’ theorem itself, of course (pretty much no one does, right? i hope not lol). It’s these surrounding ideas that make an epistemology that I think are mistaken, and all of which Popper’s epistemology contradicts. (I mean the take on cognitive science popular here, not the idea of doing cognitive science).
(still going through overnight replies, if someone answered my points i’ll try to find it)
I think I answered your points a few days ago with my first comment of this discussion.
In short, yes, there are infinitely many hypotheses whose probabilities are raised by the white swan, and yes those include both “all swans are white” and “all swans are black and I am hallucinating” but the former has a higher prior, at least for me, so it remains more probable by several orders of magnitude. For evidence to support X it doesn’t have to only support X. All that is required is that X does better at predicting than the weighted average of all alternatives.
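To make that arithmetic concrete, here is a minimal Python sketch of the update (the hypotheses and priors are invented purely for illustration): any hypothesis that predicted the white swan better than the weighted average of the alternatives gets its probability raised, and two hypotheses with the same likelihood keep the same posterior ratio as their prior ratio.

```python
# Illustrative only: made-up priors for three swan hypotheses, updated on the
# observation "I saw a white swan".
priors = {
    "all swans are white": 0.90,
    "all swans are black and I am hallucinating": 0.0001,
    "half of all swans are white": 0.0999,
}
likelihoods = {   # P(observe a white swan | hypothesis)
    "all swans are white": 1.0,
    "all swans are black and I am hallucinating": 1.0,
    "half of all swans are white": 0.5,
}

p_evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / p_evidence for h in priors}

for h in priors:
    change = "raised" if posteriors[h] > priors[h] else "lowered"
    print(f"{h}: {priors[h]:.4f} -> {posteriors[h]:.4f} ({change})")
```

The first two hypotheses are both raised (they predicted the evidence with certainty), but the hallucination hypothesis stays orders of magnitude less probable because it started that way.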
I have had people tell me that “all swans are black, but tomorrow you will hallucinate 10 white swans” is supported less by seeing 10 white swans tomorrow than “all swans are white” is, even though they made identical predictions (and asserted them with 100% probability, and would both have been definitely refuted by anything else).
Just to be clear I am happy to say those people were completely wrong. It would be nice if nobody ever invented a poor argument to defend a good conclusion but sadly we do not live in that world.
I think I answered your points a few days ago with my first comment of this discussion.
But then I answered your answer, right? If I missed one that isn’t pretty new, let me know.
but the former has a higher prior
so support is vacuous and priors do all the real work. right?
and priors have their own problems (why that prior?).
Just to be clear I am happy to say those people were completely wrong. It would be nice if nobody ever invented a poor argument to defend a good conclusion but sadly we do not live in that world.
OK. I think your conception of support is unsubstantive but not technically wrong.
so support is vacuous and priors do all the real work. right?
No. Bayesian updating is doing the job of distinguishing “all swans are white” from “all swans are black” and “all swans are green” and “swans come in an equal mixture of different colours”. It is only for a minority of hypotheses, specifically crafted to give the same predictions as “all swans are white”, that posterior probabilities remain equal to priors.
What is it with you! I admit that priors are useful in one situation and you conclude that everything else is useless!
Also, the problem of priors is overstated. Given any prior at all, the probability of eventually converging to the correct hypothesis, or at any rate a hypothesis which gives exactly the same predictions as the correct one, is 1.
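As a toy illustration of that convergence claim (not a proof), here is a Python sketch with all numbers invented for the example: two agents start from very different priors over a handful of candidate coin biases, update on the same stream of flips, and end up concentrating on the same hypothesis.

```python
# Two agents, one uniform prior and one heavily biased prior, updating on
# flips from a coin whose true bias is 0.7. Both converge on 0.7.
import random

random.seed(0)
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]          # hypotheses: P(heads)
prior_a = [0.2, 0.2, 0.2, 0.2, 0.2]             # uniform prior
prior_b = [0.96, 0.01, 0.01, 0.01, 0.01]        # almost all weight on 0.1

def update(posterior, heads):
    # Bayes' theorem for one flip, then renormalise.
    likelihood = [(p if heads else 1 - p) for p in candidates]
    unnorm = [w * l for w, l in zip(posterior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

true_bias = 0.7
for _ in range(1000):
    heads = random.random() < true_bias
    prior_a = update(prior_a, heads)
    prior_b = update(prior_b, heads)

for name, post in [("uniform prior", prior_a), ("biased prior", prior_b)]:
    best = candidates[post.index(max(post))]
    print(f"{name}: most probable bias = {best}, P = {max(post):.3f}")
```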
Bayes cannot distinguish between two theories that assign exactly the same probabilities to everything, but I don’t see how you could distinguish them, without just making sh*t up, and it doesn’t matter much anyway since all my decisions will be correct whichever is true.
Bayesian updating is doing the job of distinguishing “all swans are white” from “all swans are black”
But that is pretty simple logic. Bayes isn’t needed.
@priors—are you saying you use self-modifying priors?
Bayes cannot distinguish between two theories that assign exactly the same probabilities to everything
That makes it highly incomplete, in my view. e.g. it makes it unable to address philosophy at all.
but I don’t see how you could distinguish them
By considering their explanations. The predictions of a theory are not its entire content.
without just making sh*t up
that’s one of the major problems popper addressing (reconciling fallibilism and non-justification with objective knowledge and truth)
and it doesn’t matter much anyway since all my decisions will be correct whichever is true.
It does matter, given that you aren’t perfect. How badly things start breaking when mistakes are made depends on issues other than what theories predict—it depends on their explanations, internal structure, etc...
It does matter, given that you aren’t perfect. How badly things start breaking when mistakes are made depends on issues other than what theories predict—it depends on their explanations, internal structure, etc...
No, I’m pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true.
By considering their explanations. The predictions of a theory are not its entire content.
One could say that this is how to work out priors. You are aware that the priors aren’t necessarily set in stone at the beginning of time? Jaynes pointed out that a prior should always include all the information you have that is not explicitly part of the data (and even the distinction between prior and data is just a convention), and may well be based on insights or evidence encountered at any time, even after the data was collected.
Solomonoff Induction is precisely designed to consider explanations. The difference is it does so in a rigorous mathematical fashion rather than with a wishy-washy word salad.
That makes it highly incomplete, in my view. e.g. it makes it unable to address philosophy at all.
It was designed to address science, which is a more important job anyway.
However, in my experience, the majority of philosophical questions are empirically addressable, at least in principle, and the majority of the rest are wrong questions.
No, I’m pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true.
No!
OK, would you agree that this is an important point of disagreement, and an interesting discussion topic, to focus on? Do you want to know why not?
Do you have any particular argument that I can’t be right about this? Or are you just making a wild guess? Are you open to being wrong about this? Would you be impressed if there was a theory which explained this issue? Intrigued to learn more about the philosophy from which I learned this concept?
Let’s look at a Bayesian decision process. First you consider all possible actions that you could take; this is unaffected by the difference between A and B. For each of them you use your probabilities to get a distribution across all possible outcomes; these will be identical.
You assign a numerical utility to each outcome based on how much you would value that outcome. If you want, I can give a method for generating these numerical utilities. These will be a mixture of terminal and instrumental values. Terminal values are independent of beliefs, so these are identical. Instrumental values depend on beliefs, but only via predictions about what an outcome will lead to in the long run, so these are identical.
For each action you take an average of the values of all outcomes weighted by probability, and pick the action with the highest result. This will be the same with theory A or theory B. So I do the same thing either way, and the same thing happens to me either way. Why do I care which is true?
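Here is a minimal Python sketch of that procedure, with invented utilities and probabilities, just to show that identical predictive distributions force identical choices:

```python
# Two theories, A and B, assign the *same* probabilities to every outcome of
# every action, so the expected utilities, and hence the chosen action, are
# identical whichever theory is "really" true. All numbers are made up.
outcomes = ["win big", "break even", "lose"]
utilities = {"win big": 10.0, "break even": 0.0, "lose": -5.0}

# P(outcome | action) under each theory; identical by construction.
theory_a = {
    "act cautiously": [0.2, 0.7, 0.1],
    "act boldly":     [0.5, 0.1, 0.4],
}
theory_b = {action: probs[:] for action, probs in theory_a.items()}

def best_action(theory):
    def expected_utility(action):
        return sum(p * utilities[o] for p, o in zip(theory[action], outcomes))
    return max(theory, key=expected_utility)

print(best_action(theory_a))  # "act boldly" (expected utility 3.0 vs 1.5)
print(best_action(theory_b))  # the same action, necessarily
```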
So, if you were wrong, you’d be really impressed, and want to rethink your worldview?
Or did you mean you’re not interested?
None of the rest of what you say is relevant at all. Remember that you said, “No, I’m pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true.” It wasn’t specified that they were Bayesian decision theories.
And the context was how well or badly it goes for you when we introduce mistakes into the picture (e.g. the issue is you, being fallible, make some mistakes. How resilient is your life to them?).
Do you now understand the issue I’m talking about?
Notice how the description I gave made absolutely no reference to whether or not the theories are correct. The argument applies equally well regardless of any correspondence to reality or lack thereof. Nothing changes when we introduce mistakes to the picture because they are already in the picture.
The only kind of mistakes that can hurt me are the ones that affect my decisions, and the only ones that can do that are the ones that affect my predictions. The point remains, if the predictions are the same, my actions are the same, the outcome is the same.
You’re still mistaken and have overlooked several things.
And you have ignored my questions.
In Popperian epistemology, we do not say things like
I cannot see how I could be wrong.
No, I’m pretty sure that [my position is true]
They are anti-fallibilist, closed minded, and silly. We don’t think our lack of imagination of how we could possibly be wrong is an argument that we are right.
I want you to pin yourself down a little bit. What will you concede if you find out you are wrong about this? Will you concede a lot or almost nothing? Will you regard it as important and be glad, or will you be annoyed and bored? Will you learn much? Will your faith in Bayes be shaken? What do you think is at stake here?
And are you even interested? You have expressed no interest in why I think you’re mistaken, you just keep saying how I can’t possibly have a point (even though you don’t yet know what it is).
I want you to pin yourself down a little bit. What will you concede if you find out you are wrong about this? Will you concede a lot or almost nothing? Will you regard it as important and be glad, or will you be annoyed and bored? Will you learn much? Will your faith in Bayes be shaken? What do you think is at stake here?
It annoys me a lot when people do this, because I can be wrong in many different ways. If I give a maths proof, then say I cannot see how it could be wrong, someone else might come up and ask me if I will give up my trust (trust, not faith, is what I have in Bayes by the way) in maths. When they reveal why I am wrong, and it turns out I just made a mathematical error, I have learnt that I need to be more careful, not that maths is wrong.
I am confident enough in that statement that I would be interested to find out why you think it is wrong.
If the way in which you prove me wrong turns out to be interesting and important, rather than a technical detail or a single place where I said something I didn’t mean, then it will likely cause a significant change in my world view. I will not just immediately switch to Popper, there are more than two alternatives after all, and I may well not give up on Bayes. This isn’t a central tenet of Bayesian decision theory (although it is a central tenet of instrumental rationality), so it won’t refute the whole theory.
My most likely response, if you really can show that more than prediction is required, is to acknowledge that at least one component of the complete Bayesian epistemology is still missing. It would not surprise me, although it would surprise me to find that this specific thing was what was missing.
No, I’m pretty sure that [my position is true]
I’m not asserting that I could not possibly be wrong, that P(I am wrong) = 0. All I am saying is that I feel pretty sure about this, which I do.
Since you refuse to state your point I’m going to guess what it is.
My guess is that you are referring to the point you made earlier about how the difference between “the seasons are caused by the earth tilting on its axis” and “the seasons are caused by the Goddess Demeter being upset about Persephone being in Hades” is that the former has a good explanation and the latter has a bad explanation. Is your point that if I don’t care about explanations I have no means of distinguishing between them?
I do not find this convincing; I do not currently have time to explain why, but I can do so later if you want.
if you were going to buy a black box which does multiplication, do you think all black boxes you could buy—which you thoroughly test and find give perfect outputs for all inputs—are equally good?
disregard time taken to get an answer. and they only multiply numbers up to an absolute value of a trillion, say.
if you were going to buy a black box which does multiplication, do you think all black boxes you could buy—which you thoroughly test and find give perfect outputs for all inputs—are equally good?
disregard time taken to get an answer. and they only multiply numbers up to an absolute value of a trillion, say.
If I disregard time taken then yes, they are all equally good (assuming we don’t add in other confounding factors like if one works by torturing puppies and the other doesn’t).
But one might work, internally, by torturing puppies.
One might work, internally, in a way that will break sooner.
One might work, internally, in a way that is harder to repair if it does break.
One might work, internally, in a way that is harder or easier to convert to perform some other function.
So the internal structure of knowledge, which makes identical predictions, does matter.
All good programmers know this in the form of: some coding styles, which achieve the same output for the users, have different maintenance costs.
This is an important fact about epistemology, that the internal structure of knowledge matters, not just its results.
edit: Relating this to earlier conversation, one might work, internally, in a way so that if an error does happen (maybe they have error rates of 1 time in 500 trillion, or maybe something partially breaks after you buy it and use it a while), then the result you get is likely to be off by a small amount. Another might work internally in a way that if something goes wrong you may get random output.
I just said I was assuming away confounding factors like that for the sake of argument.
One might work, internally, in a way that will break sooner.
One might work, internally, in a way that is harder to repair if it does break.
Ideas do not ‘break’. They either correspond to reality or they do not; this is a timeless fact about them. They do not suddenly switch from corresponding to reality to not doing so.
If by break you mean ‘fail in some particular scenario that has not yet been considered’ then the only way one can fail and the other not is if they generate different predictions in that scenario.
One might work, internally, in a way that is harder or easier to convert to perform some other function.
The only other function would be to predict in a different domain, which would mean that if they still make the same predictions they are equally good.
I just said I was assuming away confounding factors like that for the sake of argument.
You meant you were intentionally ignoring all factors that could possibly make them different? That the puppy example had to do with immorality was merely a coincidence, and not representative of the class of factors you wanted to ignore? It completely defeats the point of the question to say “I don’t care which I buy, ignoring all factors that might make me care, except the factors that, by premise, are the same for both”.
Ideas do not ‘break’.
Not everything I said directly maps back to the original context. Some does. Some doesn’t.
Ideas stored in brains can randomly break. Brains are physical devices with error rates.
Regardless, ideas need to be changed. How easily they change, and into what, depends on their structure.
The only other function would be to predict in a different domain, which would mean that if they still make the same predictions they are equally good.
Even if I granted that (I don’t), one black box could have a list of predictions. Another one could predict using an algorithm which generates the same results as the list.
When you go to change them to another domain, you’ll find it matters which you have. For example, you might prefer the list, since a list based approach will still work in other domains, while the algorithm might not work at all. Or you might prefer the algorithm since it helps you find a similar algorithm for the new domain, and hate the list approach b/c the new domain is trillions of times bigger and the list gets unwieldy.
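A toy Python version of those two boxes, with the domain shrunk to single digits so the table stays small (purely illustrative): both give identical outputs on every input they cover, yet their internal structure differs in ways that matter if you later want to extend or repair them.

```python
# Box 1: a stored list of answers, one entry per question.
table = {(a, b): a * b for a in range(10) for b in range(10)}

def multiply_by_table(a, b):
    return table[(a, b)]          # fails outright outside the tabulated domain

# Box 2: an algorithm (repeated addition) that happens to cover the same domain.
def multiply_by_algorithm(a, b):
    total = 0
    for _ in range(b):
        total += a
    return total                  # generalises to any non-negative b for free

# Identical "predictions" over the whole tested domain.
assert all(multiply_by_table(a, b) == multiply_by_algorithm(a, b)
           for a in range(10) for b in range(10))
```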
Ideas stored in brains can randomly break. Brains are physical devices with error rates.
Brains are fairly resilient to the failure of a single component.
Regardless, ideas need to be changed. How easily they change, and into what, depends on their structure.
Part of the problem here is that I’m not clear what you mean by ideas. Are you using Hume’s definition, in which an idea is like a mental photocopy of a sensory impression? Or do you mean something more like a theory?
If you mean the former then ideas are just the building blocks of predictions and plans. Their content matters, because their content affects the predictions I make.
If you mean something like a theory, then let me explain what I think of theories as. A theory is like a compressed description of a long list of probabilities. For example, Newton’s laws could be ‘unfolded’ into a very long, possibly infinite, list of predictions for what objects will do in a wide range of situations, but it is quicker to give them in their compressed form, as general principles of motion.
A theory can be seen as a mathematical function, from a subset of the set of all possible strings of future events to the real numbers between 0 and 1. Ultimately, a function’s identity rests in its output and nothing else. x+x and 2x may look different, but to say they are different functions is absurd. If you give me a functional equation to solve, and I show that 2x is the answer, you cannot criticise me for failing to distinguish between the two possibilities of 2x and x+x, because they are not actually two different possibilities.
The only way you could compare 2x and x+x is to note that the former is slightly quicker to write, so when we have two theories giving identical predictions we pick the one that is shorter, or from which it is easier to generate the predictions (analogous to picking the faster box, or the box which is more convenient to carry around with you).
A theory is like a compressed description of a long list of probabilities
What do you call explanatory theories?
Even with your definition of theories, my example about changing a theory to apply to a new domain is correct, is it not? e.g. which compression algorithm is used matters, even though it doesn’t affect the predictions being made.
Or similarly: compress data, flip one bit, uncompress. The (average) result depends on the compression algorithm used. And since our brains are fallible then this can happen to people. How often? Well… most people experience lots of errors while remembering stuff.
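Here is a small Python demonstration of that bit-flip point, using zlib as a stand-in compression scheme (any similar scheme would do):

```python
# The same message, stored two ways with identical "content", degrades very
# differently under a single bit error: the raw text loses one character,
# while the compressed copy typically becomes undecodable.
import zlib

message = b"all swans are white. " * 20

def flip_bit(data, byte_index, bit=0):
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)

# One bit flipped in the raw text: a single character changes.
print(flip_bit(message, 40)[:60])

# One bit flipped in the compressed form: usually the whole thing is lost.
compressed = zlib.compress(message)
try:
    print(zlib.decompress(flip_bit(compressed, len(compressed) // 2))[:60])
except zlib.error as e:
    print("decompression failed:", e)
```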
This ‘explanation’ does not explain anything. It gives a one-sentence definition, ‘an idea is the smallest unit of coherent thought’, and the rest is irrelevant to the definition. That one sentence just pushes the question back to “what counts as coherent thought”.
The definition this looks most like is Hume’s, is it that?
Even with your definition of theories, my example about changing a theory to apply to a new domain is correct, is it not? e.g. which compression algorithm is used matters, even though it doesn’t affect the predictions being made.
No, you do not change a theory to apply to a new domain. You invent a new theory.
Or similarly: compress data, flip one bit, uncompress. The (average) result depends on the compression algorithm used. And since our brains are fallible then this can happen to people. How often? Well… most people experience lots of errors while remembering stuff.
Fine, you also try to pick the explanation which you find easiest to remember, which is pretty much equivalent to the shortest. Or you write stuff down.
Some new theories are modifications of old theories, right? e.g. in physics, QM wasn’t invented from scratch.
So the “structure” (my word) of the old theories matters.
Fine, you also try to pick the explanation which you find easiest to remember, which is pretty much equivalent to the shortest. Or you write stuff down.
I think you’re missing the point, which is that some compression algorithms (among other things) are more error resistant. This is an example of how the internal structure of knowledge making identical predictions can differ and how the differences can have real world consequences.
Which is just the thing we were debating, and which you previously couldn’t conceive of how you could be mistaken about.
Some new theories are modifications of old theories, right? e.g. in physics, QM wasn’t invented from scratch.
Some theories are modifications of theories from other domains. It is quite rare for such a theory to work. As for QM, what happened there is quite a common story in science.
A theory is designed which makes correct predictions within the range of things we can test, but as our range of testable predictions expands we find it doesn’t predict as well in other domains. It gets abandoned in favour of something better (strictly speaking it never gets fully abandoned, its probability just shrinks to well below the threshold for human consideration). That better theory inevitably ‘steals’ some of the old theory’s predictions, since the old theory was making correct predictions in some domains the new theory must steal those otherwise it will quickly end up falsified itself.
This doesn’t mean theories should be designed to do anything other than predict well; the only thing a theory can hope to offer any successor is predictions, so the focus should remain on those.
I think you’re missing the point, which is that some compression algorithms (among other things) are more error resistant. This is an example of how the internal structure of knowledge making identical predictions can differ and how the differences can have real world consequences.
Which is just the thing we were debating, and which you previously couldn’t conceive of how you could be mistaken about.
As someone with quite a bit of training in maths and no training in computing I have a bit of a cognitive blind-spot for efficiency concerns with algorithms. However, if we look at the real point of this digression, which was whether it is a fault of Bayes that it does not do more than predict, then I think you don’t have a point.
When do these efficiency concerns matter? Are you saying there should be some sort of trade-off between predictive accuracy and efficiency? That if we have a theory that generates slightly worse predictions but does so much more efficiently then we should adopt it?
I can’t agree to that. It may sometimes be useful to use the second theory as an approximation to the first, but we should always keep one eye on the truth; you never know when you will need to be as accurate as possible (Newtonian Mechanics and Relativity provide a good example of this dynamic).
If you are saying efficiency concerns only matter with theories that have exactly the same predictive accuracy, then all your talk of structure and content can be reduced to a bunch of empirical questions: “which of these is shorter?”, “which of these is quickest to use?”, “which of these is least likely to create disastrous predictions if we make an error?”. Every one of these has its own Bayesian answer, so the process solves itself.
As for QM, what happened there is quite a common story in science.
Right. It’s common that old theories get changed. That there is piecemeal improvement of existing knowledge, rather than outright replacement.
How well ideas are able to be improved in this way—how suited for it they are—is an attribute of ideas other than what predictions they make. It depends on what I would call their internal structure.
This doesn’t mean theories should be designed to do anything other than predict well
So that (above) is why it does mean it. Note that good structure doesn’t come at the cost of prediction. It’s not a trade-off; they aren’t incompatible. You can have both.
As someone with quite a bit of training in maths and no training in computing I have a bit of a cognitive blind-spot for efficiency concerns with algorithms.
Efficiency of algorithms (space and speed both) is a non-predictive issue, and it’s important, but it’s not what I was talking about at all. It’s ground that others have covered plenty. That’s why I specifically disqualified that concern in my hypothetical.
When do these efficiency concerns matter? Are you saying there should be some sort of trade-off between predictive accuracy and efficiency?
No trade off, and I wasn’t talking about efficiency. My concern is how good ideas are at facilitating their own change, or not.
There are some good examples of this from programming. For example, you can design a program which is more modular: it has separated, isolated parts. Then later the parts can be individually reused. This is good! There are no tradeoffs involved here with what output you get from the program—this issue is orthogonal to output.
If the parts of the program are all messily tied together, then you remove one and everything breaks. (That means: if you introduce some random error, you end up with unpredictable, bad results.) If they are decoupled, and each system has fault tolerances and error checking on everything it’s told by the other systems, then errors, or wanting to replace some part of it, can have much more limited effects on the rest of the system.
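A tiny Python sketch of that contrast; both versions give identical output for every input, but only the second separates the parts so they can be changed independently:

```python
def report_tangled(raw):
    # parsing, arithmetic and formatting all knotted together
    return "total = " + str(sum(int(x) for x in raw.split(",")))

def parse(raw):
    return [int(x) for x in raw.split(",")]

def total(numbers):
    return sum(numbers)

def render(value):
    return "total = " + str(value)

def report_modular(raw):
    # same output, but each part can be fixed, tested or reused on its own
    return render(total(parse(raw)))

assert report_tangled("1,2,3") == report_modular("1,2,3")  # identical output
```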
That if we have a theory that generates slightly worse predictions but does so much more efficiently then we should adopt it?
Right. It’s common that old theories get changed. That there is piecemeal improvement of existing knowledge, rather than outright replacement.
It is my claim that this is not a process of change, it is a total replacement. It will happen that if the old theory had any merit at all then some of its predictions will be identical to those made by the new one.
There are some good examples of this from programming. For example, you can design a program which is more modular: it has separated, isolated parts. Then later the parts can be individually reused. This is good! There are no tradeoffs involved here with what output you get from the program—this issue is orthogonal to output.
If my claim is correct then all theories are perfectly modular. A replacement theory may have an entirely different explanation, but it can freely take any subset of the old theory’s predictions.
Efficiency of algorithms (space and speed both) is a non-predictive issue, and it’s important, but it’s not what I was talking about at all. It’s ground that others have covered plenty. That’s why I specifically disqualified that concern in my hypothetical.
I was using efficiency as a quick synonym for all such concerns, to save time and space.
No, nothing like that at all.
Fine, if we have a new theory that generates very slightly worse predictions, but is less modular, would you advocate replacement?
If my claim is correct then all theories are perfectly modular.
Do you think all computer programs are perfectly modular? What is it that programmers are learning when they read books on modular design and change their coding style?
I was using efficiency as a quick synonym for all such concerns, to save time and space.
Well OK. I didn’t know that. I don’t think you should though b/c efficiency is already the name of a small number of well known concerns, and the ones I’m talking about are separate.
Fine, if we have a new theory that generates very slightly worse predictions, but is less modular, would you advocate replacement?
I think this question is misleading. The answer is something like: keep both, and use them for different purposes. If you want to make predictions, use the theory currently best at that. If you want to do research, the more modular one might be a more promising starting point and might surpass the other in the future.
It is my claim that this is not a process of change, it is a total replacement. It will happen that if the old theory had any merit at all then some of its predictions will be identical to those made by the new one.
In physics, QM retains many ideas from classical physics. Put it this way: a person trained in classical physics doesn’t have to start over to learn QM. There are lots of ways they are related and his pre-existing knowledge remains useful. This is why even today classical physics is still taught (usually first) at universities. It hasn’t been totally replaced. It’s not just that some of its predictions are retained (they are only retained as quick approximations in limited sets of circumstances, btw; by and large, technically speaking, they are false, and taught anyway), it’s that ways of thinking about physics, and approaching physics problems, are retained. People need knowledge other than predictions, such as how to think like a physicist. While classical physics predictions were largely not retained, some other aspects largely were retained.
BTW speaking of physics, are you aware of the debate about the many worlds interpretation, the Bohmian interpretation, the shut up and calculate interpretation, the Copenhagen interpretation, and so on? None of these debates are about prediction. They are all about the explanatory interpretation (or lack thereof, for the shut up and calculate school), and the debates are between people who agree on the math and predictions.
Do you think all computer programs are perfectly modular? What is it that programmers are learning when they read books on modular design and change their coding style?
They are learning to write computer programs, which are not necessarily perfectly modular. Computer programs and scientific theories are different things and work in different ways.
I think this question is misleading. The answer is something like: keep both, and use them for different purposes. If you want to make predictions, use the theory currently best at that. If you want to do research, the more modular one might be a more promising starting point and might surpass the other in the future.
Are you imagining a scenario something like Ptolemy’s astronomy versus Copernican astronomy during the days when the latter still assumed the planets moved in perfect circles?
In that case I can sort of see your point, the latter was a more promising direction for future research while the former generated better predictions. The former deserved a massive complexity penalty of course, but it may still have come out on top in the Bayesian calculation.
Sadly, there will occasionally be times when you do everything right but still get the wrong answer. Still, there is a Bayesian way to deal with this sort of thing without giving up the focus on predictions. Yudkowsky’s Technical Explanation of Technical Explanation goes into it.
In physics, QM retains many ideas from classical physics. Put it this way: a person trained in classical physics doesn’t have to start over to learn QM. There are lots of ways they are related and his pre-existing knowledge remains useful. This is why even today classical physics is still taught (usually first) at universities. It hasn’t been totally replaced. It’s not just that some of its predictions are retained (they are only retained as quick approximations in limited sets of circumstances, btw; by and large, technically speaking, they are false, and taught anyway), it’s that ways of thinking about physics, and approaching physics problems, are retained. People need knowledge other than predictions, such as how to think like a physicist. While classical physics predictions were largely not retained, some other aspects largely were retained.
That may be the case, it doesn’t mean that when designing a theory I should waste any concern on “should I try to include general principles which may be useful to future theories”. Thinking like that is likely to generate a lot of rubbish. Thinking “what is happening here, and what simple mathematical explanation can I find for it?” is the only way to go.
BTW speaking of physics, are you aware of the debate about the many worlds interpretation, the Bohmian interpretation, the shut up and calculate interpretation, the Copenhagen interpretation, and so on? None of these debates are about prediction. They are all about the explanatory interpretation (or lack thereof, for the shut up and calculate school), and the debates are between people who agree on the math and predictions.
Yes, I am. There are some empirical prediction differences; for instance, we might or might not be able to create quantum superpositions at increasingly large scales. In general, I say just pick the mathematically simplest and leave it at that.
You don’t need to resort to QM for an example of that dilemma, “do objects vanish when we stop looking at them?” is another such debate, but ultimately it doesn’t make much difference either way so I say just assume they don’t since it’s mathematically simpler.
Edit:
Thinking about this some more, I can see the virtue of deliberately spending some time thinking about explanations, and keeping a record of such explanations even if they are attached to theories that make sub-par predictions.
This should all be kept separate from the actual business of science. Prediction should also remain the target; the job of explanations is merely a means towards that end.
Double Edit:
I just realised that I wrote do where I meant don’t for objects vanishing. That completely changes the meaning of the paragraph. Crap.
You don’t need to resort to QM for an example of that dilemma, “do objects vanish when we stop looking at them?” is another such debate, but ultimately it doesn’t make much difference either way so I say just assume they do since it’s mathematically simpler.
Assuming that objects do vanish when you stop looking at them is much simpler?
This should all be kept separate from the actual business of science
Part of the business of science is to create successively better explanations of the world we live in. What its nature is, and so on.
Or maybe you will call that philosophy. If you do, it will be the case that many scientists are “philosophers” too. In the past, I would have said most of them. But instrumentalism started getting rather popular last century.
I wonder where you draw the line with your instrumentalism. For example, do you think the positivists were mistaken? If so, what are your arguments against them?
I do think the positivists went too far. They failed to realise that we can make predictions about things which we can never test. We can never evaluate these predictions, and we can never update our models on the basis of them, but we can still make them in the same way as we make any other predictions.
For example, consider the claim “a pink rhinoceros rides a unicycle around the Andromeda galaxy, he travels much faster than light and so completes a whole circuit of the galaxy every 42 hours. He is, of course, far too small for our telescopes to see.”
The positivist says “meaningless!”
I say “meaningful, very high probability of being false”
Another thing they shouldn’t have dismissed is counterfactuals. As Pearl showed, questions about counterfactuals can be reduced to Bayesian questions of fact.
Part of the business of science is to create successively better explanations of the world we live in. What its nature is, and so on.
I sympathise with this. To some extent I may have been exaggerating my own position in my last few posts; it happens to me occasionally. I do think that predictions are the only way of entangling your beliefs with reality, of creating a state of the world where what you believe is causally affected by what is true. Without that you have no way to attain a map that reflects the territory; any epistemology which claims you do is guilty of making stuff up.
1) It can be phrased as a prediction. “I predict if someone had no way to evaluate their predictions based on evidence they would have no way of attaining a map that reflects the territory. They would have no way of attaining a belief-set that works better in this world than in the average of all possible worlds”.
2) It is a mathematical statement, or at any rate the logical implication of a mathematical statement, and thus is probably true in all possible worlds so I am not trying to entangle it with the territory.
I understand, but disagree. The point I have been trying to make is that it does.
My original claim was that an agent’s outcome was determined solely by that agent’s predictions and the external world in which that agent lived. If you define a theory so that its predictive content is a strict subset of all the predictions which can be derived from it then yes, its predictive content is not all that matters; the other predictions matter as well.
It nonetheless remains the case that what happens to an agent is determined by that agent’s predictions. You need to understand that theories are not fundamentally Bayesian concepts, so it is much better to argue Bayes at either the statement-level or the agent-level than the theory-level.
In addition, I think our debate is starting to annoy everyone else here. There have been times when the entire recent comments bar is filled with comments from one of us, which is considered bad form.
“do objects vanish when we stop looking at them?” is another such debate, but ultimately it doesn’t make much difference either way so I say just assume they do since its mathematically simpler.
This kind of attitude is what we see as the antithesis of philosophy, and of understanding the world.
When people try to invent new theories in physics, they need to understand things. For example, will they want to use math that models objects that frequently blink in and out of existence, or objects that don’t do that? Both ways might make the same predictions for all measurements humans can do, but they lead to different research directions. The MWI vs Bohm/etc stuff is similar: it leads to different research directions for what one thinks would be an important advance in physics, a promising place to look for a successor theory, and so on. As an example, Deutsch got interested in fungibility because of his way of understanding physics. That may or may not be the right direction to go—depending on if his way of thinking about physics is right—but the point is it concretely matters even though it’s not an issue of prediction.
They are learning to write computer programs, which are not necessarily perfectly modular. Computer programs and scientific theories are different things and work in different ways.
Computer programs are a kind of knowledge, like any other, which has an organizational structure, like any other knowledge.
That may be the case, it doesn’t mean that when designing a theory I should waste any concern on “should I try to include general principles which may be useful to future theories”. Thinking like that is likely to generate a lot of rubbish.
That is not how you achieve good design. That’s the wrong way to go about it. Good design is achieved by looking for simplicity, elegance, clarity, modularity, good explanations, and so on. When you have those, then you do get a theory which can be more help to future theories. If you just try to think, “What will the future want?” then you won’t know the answer so you won’t get anywhere.
EDIT: I thought you meant having them vanish is simpler b/c it means that less stuff exists less of the time. That is a rough description of what the Copenhagen Interpretation people think. One issue this raises is that people can and do disagree about what is simpler. I don’t think it’s good to set up “simpler” as a definitive criterion. It’s better to have a discussion. You can use “it’s simpler” as an argument. And that might end the discussion. But it might not if someone else has another criterion they think is relevant and an explanation of why it matters (you shouldn’t rule out that ever happening). And also someone might criticize your interpretation of what makes things simpler in general, or which is simpler in this case.
The MWI vs Bohm/etc stuff is similar: it leads to different research directions for what one thinks would be an important advance in physics, a promising place to look for a successor theory, and so on. As an example, Deutsch got interested in fungibility because of his way of understanding physics. That may or may not be the right direction to go—depending on if his way of thinking about physics is right—but the point is it concretely matters even though it’s not an issue of prediction.
Actually, it is an issue of prediction. We are trying to predict which future research will lead to promising results.
Computer programs are a kind of knowledge, like any other, which has an organizational structure, like any other knowledge.
I would say computer programs are more analogous to architecture than to scientific theories.
Actually, it is an issue of prediction. We are trying to predict which future research will lead to promising results.
If you have a theory, X, which predicts Y, then there are important aspects of X other than Y. There is non-predictive content of X. When you say that those other factors have to do with prediction in some way, that wouldn’t mean that only the predictive content of X matters since the way they have to do with prediction isn’t a prediction X made.
I would say computer programs are more analogous to architecture than to scientific theories.
I’m not speaking in loose analogies and opinions. All knowledge is the same thing (“knowledge”) because it has shared attributes. The concept of an “idea” covers ideas in various fields, under the same word, because they share attributes.
If you have a theory, X, which predicts Y, then there are important aspects of X other than Y. There is non-predictive content of X. When you say that those other factors have to do with prediction in some way, that wouldn’t mean that only the predictive content of X matters since the way they have to do with prediction isn’t a prediction X made.
I would say, evaluate X solely on the predictive merits of Y. If we are interested in future research directions then make separate predictions about those.
All knowledge is the same thing (“knowledge”) because it has shared attributes.
A computer program doesn’t really count as knowledge. It’s information, in the scientific and mathematical sense, and you write it down, but the similarity ends there. It is a tool that is built to do a job, and in that respect is more like a building. It doesn’t really count as knowledge at all, not to a Bayesian at any rate.
What? Broad reach is a virtue. A theory which applies to many questions—which has some kind of general principle to it—is valuable. Like QM which applies to the entire universe—it is a universal theory, not a narrow theory.
A computer program doesn’t really count as knowledge.
It has apparent design. It has adaptation to a purpose. It’s problem-solving information. (The knowledge is put there by the programmer, but it’s still there.)
I would say, evaluate X solely on the predictive merits of Y. If we are interested in future research directions then make separate predictions about those.
One of the ways this came up is we were considering theories with identical Y, and whether they have any differences that matter. I said they do. Make sense now?
It has apparent design. It has adaptation to a purpose. It’s problem-solving information. (The knowledge is put there by the programmer, but it’s still there.)
In this sense, a building is also knowledge. Programming is making, not discovering.
One of the ways this came up is we were considering theories with identical Y, and whether they have any differences that matter. I said they do. Make sense now?
Suppose two theories A and B make identical predictions for the results of all lab experiments carried out thus far but disagree about directions for future research. I would say they make different predictions about which research directions will lead to success, and are therefore not entirely identical.
Just got an idea for a good example from another thread.
Consider chess. If a human and a chess program come up with the same move, then the differences between them, and their ways of thinking about the move, don’t really matter, do you think?
And suppose we want to learn from them. So we give them both white. We play the same moves against each of them. We end up with identical games, suppose. So, in the particular positions from that game they make identical predictions about what move has the best chance to win.
Now, we also in each case gather some information about why they made each move to learn from.
The computer program provides move list trees it looked at, at every move, with evaluations of the positions they reach.
The human provides explanations. He says things like, “I was worried my queen side wasn’t safe, so I decided I better win on the king side quickly” or “I saw that this was eventually heading towards a closed game with my pawns fixed on dark squares, so that’s why I traded my bishop for a knight there”.
When you want to learn chess, these different kinds of information are both useful, but in different ways. They are different. The differences matter. For a specific person, with specific strengths and weaknesses, one or the other may be far far more useful.
The point is that identical predictive content (in this case, in the sense of predicting what move has the best chance to win the game in each position) does not mean that what’s going on behind the scenes is even similar.
It would be that one thing is intelligent, and one isn’t. That’s how big the differences can be.
So, are you finally ready to concede that your claim is false? The one you were so sure of that identical predictive content of theories means nothing else matters, and that their internal structure can’t be important?
No, because they do different things. If they take different actions this implies they must have different predictions (admittedly it’s a bit anthropomorphic to talk about a chess program having predictions at all).
Incidentally, they are using different predictions to make their moves. For example the human may predict P(my left side is too weak) = 0.9 and use this prediction to derive P(I should move my queen to the left side) = 0.8, while the chess program doesn’t really predict at all but if it did you would see something more like individual predictions for the chance of winning given each possible move, and a derived prediction like P(I should move my queen to the left side) = 0.8.
With such different processes, it’s really an astonishing coincidence that they make the same moves at all.
(I apologise in advance for my lack of knowledge of how chess players actually think, I haven’t played it since I discovered go, I hope my point is still apparent.)
Your point is apparent—you try to reinterpret all human thinking in terms of probability—but it just isn’t true. There’s lots of books on how to think about chess. They do not advise what you suggest. Many people follow the advice they do give, which is different and unlike what computers do.
People learn explanations like “control the center because it gives your pieces more mobility” and “usually develop knights before bishops because it’s easier to figure out the correct square for them”.
Chess programs do things like count up how many squares each piece on the board can move to. When humans play they don’t count that. They will instead do stuff like think about what squares they consider important and worry about those.
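For instance, that “count up squares” style of evaluation is only a few lines of code. A rough sketch, assuming the third-party python-chess library; “mobility” here just counts legal moves, which is close to what is described:

```python
# A crude mobility evaluation: count the legal moves available to each side.
# A human choosing the same move would not be doing anything like this; they
# would be reasoning about which squares matter and why.
import chess

def mobility(board):
    own = board.legal_moves.count()
    board.push(chess.Move.null())   # pass the turn to count the opponent's moves
    opp = board.legal_moves.count()
    board.pop()
    return own - opp

board = chess.Board()               # standard starting position
print(mobility(board))              # 0: the position is symmetric
```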
I only write “idea” instead. If you taboo that too, I start writing “conjecture” or “guess”, which is misleading in some contexts. Taboo that too and I might have to say “thought” or “belief” or “misconception”, which are even more misleading in many contexts.
In this sense, a building is also knowledge.
Yes buildings rely on, and physically embody, engineering knowledge.
Suppose two theories A and B make identical predictions for the results of all lab experiments carried out thus far but disagree about directions for future research. I would say they make different predictions about which research directions will lead to success, and are therefore not entirely identical.
But they don’t make those predictions. They don’t say this stuff, they embody it in their structure. It’s possible for a theory to be more suited to something, but no one knows that, and it wasn’t made that way on purpose.
I only write “idea” instead. If you taboo that too, I start writing “conjecture” or “guess”, which is misleading in some contexts. Taboo that too and I might have to say “thought” or “belief” or “misconception”, which are even more misleading in many contexts.
You didn’t read the article, and so you are missing the point. In spectacular fashion I might add.
Yes buildings rely on, and physically embody, engineering knowledge.
So, buildings should be made out of bricks, therefore scientific theories should be made out of bricks?
But they don’t make those predictions. They don’t say this stuff, they embody it in their structure. It’s possible for a theory to be more suited to something, but no one knows that, and it wasn’t made that way on purpose.
I contend that a theory can make more predictions than are explicitly written down. Most theories make infinitely many predictions. A logically omniscient Ideal Bayesian would immediately be able to see all those predictions just from looking at the theory, a Human Bayesian may not, but they still exist.
So, buildings should be made out of bricks, therefore scientific theories should be made out of bricks?
2) I meant something else which you didn’t understand
?
Can you specify the infinitely many predictions of the theory “Mary had a little lamb” without missing any structural issues I deem important? Saying the theory “Mary had a little lamb” is not just a prediction but infinitely many predictions is non-standard terminology, right? Did you invent this terminology during this argument, or did you always use it? Are there articles on it?
Bayesians don’t treat the concept of a theory as being fundamental to epistemology (which is why I wanted to taboo it), so I tried to figure out the closest Bayesian analogue to what you were saying and used that.
As for 1) and 2), I was merely pointing out that “programs are a type of knowledge, programs should be modular, therefore knowledge should be modular” and “buildings are a type of knowledge, buildings should be made of bricks, therefore knowledge should be made of bricks” are of the same form and equally valid. Since the latter is clearly wrong, I was making the point that the former is also wrong.
To be honest I have never seen a better demonstration of the importance of narrowness than your last few comments, they are exactly the kind of rubbish you end up talking when you make a concept too broad.
What should I do, do you think? I take it you know what my goals are in order to judge this issue. Neat. What are they? Also what’s my reputation like?
If Bayes wants to be an epistemology then it must do more than predict. Same for Newton.
If you want to have math which doesn’t dethrone Popper, but is orthogonal, you’re welcome to do that and I’d stop complaining (much). However Yudkowsky says Bayesian Epistemology dethrones and replaces Popper. He regards it as a rival theory to Popper’s. Do you think Yudkowsky was wrong about that?
It replaces Popperian epistemology where their domains overlap—namely: building models from observations and using them to predict the future. It won’t alone tell you what experiments to perform in order to gather more data—there are other puzzle pieces for dealing with that.
There’s no overlap there b/c Popperian epistemology doesn’t provide the specific details of how to do that. Popperian epistemology is fully compatible with, and can use, Bayes’ theorem and any other pure math or logic insights.
Popperian epistemology contradicts your “other puzzle pieces”. And without them, Bayes’ theorem alone isn’t epistemology.
Except for the advice on induction? Or has induction merely been rechristened as corroboration? Popper enthusiasts usually seem to deny doing that.
Induction doesn’t work.
I thought you were referring to things you can do with Bayes’ theorem and some input. If you meant something more, provide the details of what you are proposing.
Building models from observations and using them to predict the future is what Solomonoff induction does. It is Occam’s razor plus Bayes’s theorem.
The most common point of Popper’s philosophy that I hear (including from my Popperian philosophy teacher) is the whole “black swan white swan” thing, which Bayes does directly contradict, and dethrone (though personally I’m not a big fan of that terminology).
The stuff you talked about with conjectures and criticisms does not directly contradict Bayes and if the serious problems with ‘one strike and you’re out’ criticisms are fixed it I may be persuaded to accept both it and Bayes.
Bayes is not meant to be an epistemology all on its own. It only starts becoming one when you put it together with Solomonoff Induction, Expected Utility Theory, Cognitive Science and probably a few other pieces of the puzzle that haven’t been found yet. I presume the reason it is referred to as Bayesian rather than Solomonoffian or anything else is that Bayes is the both most frequently used and the oldest part.
The black swan thing is not that important to Popper’s ideas, it is merely a criticism of some of Popper’s opponents.
How does Bayes dethrone it? By asserting that white swans support “all swans are white”? I’ve addressed that at length (still going through overnight replies, if someone answered my points i’ll try to find it).
Well I don’t have a problem with Bayes’ theorem itself, of course (pretty much no one does, right? i hope not lol). It’s these surrounding ideas that make an epistemology that I think are mistaken, and all of which Popper’s epistemology contradicts. (I mean the take on cognitive science popular here, not the idea of doing cognitive science).
I think I answered your points a few days ago with my first comment of this discussion.
In short, yes, there are infinitely many hypotheses whose probabilities are raised by the white swan, and yes those include both “all swans are white” and “all swans are black and I am hallucinating” but the former has a higher prior, at least for me, so it remains more probable by several orders of magnitude. For evidence to support X it doesn’t have to only support X. All that is required is that X does better at predicting than the weighted average of all alternatives.
Just to be clear I am happy to say those people were completely wrong. It would be nice if nobody ever invented a poor argument to defend a good conclusion but sadly we do not live in that world.
But then I answered your answer, right? If I missed one that isn’t pretty new, let me know.
so support is vacuous and priors do all the real work. right?
and priors have their own problems (why that prior?).
OK. I think your conception of support is unsubstantive but not technically wrong.
No. Bayesian updating is doing the job of distinguishing “all swans are white” from “all swans are black” and “all swans are green” and “swans come in an equal mixture of different colours”. It is only for the minority of hypotheses which are specifically crafted to give the same predictions as “all swans are white” that the posteriors stay in the same proportions as the priors.
What is it with you! I admit that priors are useful in one situation and you conclude that everything else is useless!
Also, the problem of priors is overstated. Given any prior at all, the probability of eventually converging to the correct hypothesis, or at any rate a hypothesis which gives exactly the same predictions as the correct one, is 1.
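Since that convergence claim is easy to check numerically, here is a minimal sketch in Python; the four hypotheses, their priors, and their likelihoods are all invented for illustration and are not anyone’s actual numbers.

```python
# A minimal sketch of Bayesian updating over a few toy swan-colour hypotheses.
hypotheses = {
    # name: (prior, P(observe a white swan | hypothesis))
    "all swans are white": (0.45, 1.0),
    "all swans are black": (0.30, 0.0),
    "equal mixture of colours": (0.20, 0.5),
    "crafted rival that predicts exactly like 'all white'": (0.05, 1.0),
}

posteriors = {h: prior for h, (prior, _) in hypotheses.items()}
for _ in range(10):  # observe ten white swans in a row
    # Bayes' theorem: multiply by the likelihood, then renormalise.
    unnormalised = {h: posteriors[h] * hypotheses[h][1] for h in hypotheses}
    total = sum(unnormalised.values())
    posteriors = {h: p / total for h, p in unnormalised.items()}

for h, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(f"{p:.6f}  {h}")
# Hypotheses that predicted wrongly collapse towards zero; the two that predict
# identically keep the 9:1 ratio their priors gave them.
```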
Bayes cannot distinguish between two theories that assign exactly the same probabilities to everything, but I don’t see how you could distinguish them, without just making sh*t up, and it doesn’t matter much anyway since all my decisions will be correct whichever is true.
But that is pretty simple logic. Bayes’ not needed.
@priors—are you saying you use self-modifying priors?
That makes it highly incomplete, in my view. e.g. it makes it unable to address philosophy at all.
By considering their explanations. The predictions of a theory are not its entire content.
that’s one of the major problems Popper addresses (reconciling fallibilism and non-justification with objective knowledge and truth)
It does matter, given that you aren’t perfect. How badly things start breaking when mistakes are made depends on issues other than what theories predict—it depends on their explanations, internal structure, etc...
No, I’m pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true.
One could say that this is how to work out priors. You are aware that the priors aren’t necessarily set in stone at the beginning of time? Jaynes pointed out that a prior should always include all the information you have that is not explicitly part of the data (and even the distinction between prior and data is just a convention), and may well be based on insights or evidence encountered at any time, even after the data was collected.
Solomonoff Induction is precisely designed to consider explanations. The difference is it does so in a rigorous mathematical fashion rather than with a wishy-washy word salad.
It was designed to address science, which is a more important job anyway.
However, in my experience, the majority of philosophical questions are empirically addressable, at least in principle, and the majority of the rest are wrong questions.
No!
OK would you agree that this is an important point of disagreement, and an interesting discussion topic, to focus on? Do you want to know why not?
Do you have any particular argument that I can’t be right about this? Or are you just making a wild guess? Are you open to being wrong about this? Would you be impressed if there was a theory which explained this issue? Intrigued to learn more about the philosophy from which I learned this concept?
I cannot see how I could be wrong.
Let’s look at a Bayesian decision process. First you consider all possible actions that you could take; this is unaffected by the difference between A and B. For each of them you use your probabilities to get a distribution across all possible outcomes; these will be identical.
You assign a numerical utility to each outcome based on how much you would value that outcome. If you want, I can give a method for generating these numerical utilities. These will be a mixture of terminal and instrumental values. Terminal values are independent of beliefs, so these are identical. Instrumental values depend on beliefs, but only via predictions about what an outcome will lead to in the long run, so these are identical.
For each action you take an average of the values of all outcomes weighted by probability, and pick the action with the highest result. This will be the same with theory A or theory B. So I do the same thing either way, and the same thing happens to me either way. Why do I care which is true?
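A toy sketch of that decision procedure, assuming a made-up two-action, two-outcome setup with invented probabilities and utilities; the only point is that two theories assigning identical probabilities feed identical numbers into the expected-utility calculation, so the chosen action cannot differ.

```python
# Toy expected-utility decision: distribution over outcomes per action,
# expected utilities, pick the maximum. All numbers are made up.
actions = ["carry umbrella", "leave umbrella"]
utility = {"stay dry": 10.0, "get soaked": -5.0}  # terminal values, belief-independent

theory_a = {  # action -> probability distribution over outcomes
    "carry umbrella": {"stay dry": 0.99, "get soaked": 0.01},
    "leave umbrella": {"stay dry": 0.60, "get soaked": 0.40},
}
theory_b = {a: dict(d) for a, d in theory_a.items()}  # different story, same predictions

def best_action(theory):
    def expected_utility(action):
        return sum(p * utility[outcome] for outcome, p in theory[action].items())
    return max(actions, key=expected_utility)

assert best_action(theory_a) == best_action(theory_b)  # same predictions -> same decision
print(best_action(theory_a))  # "carry umbrella" under either theory
```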
So, if you were wrong, you’d be really impressed, and want to rethink your worldview?
Or did you mean you’re not interested?
None of the rest of what you say is relevant at all. Remember that you said, “No, I’m pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true.” It wasn’t specified that they were Bayesian decision theories.
And the context was how well or badly it goes for you when we introduce mistakes into the picture (e.g. the issue is you, being fallible, make some mistakes. How resilient is your life to them?).
Do you now understand the issue I’m talking about?
Notice how the description I gave made absolutely no reference to whether or not the theories are correct. The argument applies equally well regardless of any correspondence to reality or lack thereof. Nothing changes when we introduce mistakes to the picture because they are already in the picture.
The only kind of mistakes that can hurt me are the ones that affect my decisions, and the only ones that can do that are the ones that affect my predictions. The point remains, if the predictions are the same, my actions are the same, the outcome is the same.
You’re still mistaken and have overlooked several things.
And you have ignored my questions.
In Popperian epistemology, we do not say things like “I cannot see how I could be wrong.”
They are anti-fallibilist, closed minded, and silly. We don’t think our lack of imagination of how we could possibly be wrong is an argument that we are right.
I want you to pin yourself down a little bit. What will you concede if you find out you are wrong about this? Will you concede a lot or almost nothing? Will you regard it as important and be glad, or will you be annoyed and bored? Will you learn much? Will your faith in Bayes be shaken? What do you think is at stake here?
And are you even interested? You have expressed no interest in why I think you’re mistaken, you just keep saying how I can’t possibly have a point (even though you don’t yet know what it is).
It annoys me a lot when people do this, because I can be wrong in many different ways. If I give a maths proof, then say I cannot see how it could be wrong, someone else might come up and ask me if I will give up my trust (trust, not faith, is what I have in Bayes by the way) in maths. When they reveal why I am wrong, it turns out I just made a mathematical error; I have learnt that I need to be more careful, not that maths is wrong.
I am confident enough in that statement that I would be interested to find out why you think it is wrong.
If the way in which you prove me wrong turns out to be interesting and important, rather than a technical detail or a single place where I said something I didn’t mean, then it will likely cause a significant change in my worldview. I will not just immediately switch to Popper, there are more than two alternatives after all, and I may well not give up on Bayes. This isn’t a central tenet of Bayesian decision theory (although it is a central tenet of instrumental rationality), so it won’t refute the whole theory.
My most likely response, if you really can show that more than prediction is required, is to acknowledge that at least one component of the complete Bayesian epistemology is still missing. It would not surprise me, although it would surprise me to find that this specific thing was what was missing.
I’m not asserting that I could not possibly be wrong, that P(I am wrong) = 0. All I am saying is that I feel pretty sure about this, which I do.
Since you refuse to state your point I’m going to guess what it is.
My guess is that you are referring to the point you made earlier about how the difference between “the seasons are caused by the earth tilting on its axis” and “the seasons are caused by the Goddess Demeter being upset about Persephone being in Hades” is that the former has a good explanation and the latter has a bad explanation. Is your point that if I don’t care about explanations I have no means of distinguishing between them?
I do not find this convincing, I do not currently have time to explain why but I can do so later if you want.
i got bored of your evasions.
you’re not on the right track.
if you were going to buy a black box which does multiplication, do you think all black boxes you could buy—which you thoroughly test and find give perfect outputs for all inputs—are equally good?
disregard time taken to get an answer. and they only multiply numbers up to an absolute value of a trillion, say.
If I disregard time taken then yes, they are all equally good (assuming we don’t add in other confounding factors like if one works by torturing puppies and the other doesn’t).
But one might work, internally, by torturing puppies.
One might work, internally, in a way that will break sooner.
One might work, internally, in a way that is harder to repair if it does break.
One might work, internally, in a way that is harder or easier to convert to perform some other function.
So the internal structure of knowledge, which makes identical predictions, does matter.
All good programmers know this in the form of: some coding styles, which achieve the same output for the users, have different maintenance costs.
This is an important fact about epistemology, that the internal structure of knowledge matters, not just its results.
edit: Relating this to earlier conversation, one might work, internally, in a way so that if an error does happen (maybe they have error rates of 1 time in 500 trillion. or maybe something partially breaks after you buy it and use it a while), then the result you get is likely to be off by a small amount. Another might work internally in a way that if something goes wrong you may get random output.
and lol @ Marius
I just said I was assuming away confounding factors like that for the sake of argument.
Ideas do not ‘break’. They either correspond to reality or they do not; this is a timeless fact about them. They do not suddenly switch from corresponding to reality to not doing so.
If by break you mean ‘fail in some particular scenario that has not yet been considered’ then the only way one can fail and the other not is if they generate different predictions in that scenario.
The only other function would be to predict in a different domain, which would mean that if they still make the same predictions they are equally good.
You meant you were intentionally ignoring all factors that could possibly make them different? That the puppy example had to do with immorality was merely a coincidence, and not representative of the class of factors you wanted to ignore? That completely defeats the point of the question to say “I don’t care which I buy, ignoring all factors that might make me care, except the factors that, by premise, are the same for both”.
Not everything I said directly maps back to the original context. Some does. Some doesn’t.
Ideas stored in brains can randomly break. Brains are physical devices with error rates.
Regardless, ideas need to be changed. How easily they change, and into what, depends on their structure.
Even if I granted that (I don’t), one black box could have a list of predictions. Another one could predict using an algorithm which generates the same results as the list.
When you go to change them to another domain, you’ll find it matters which you have. For example, you might prefer the list, since a list based approach will still work in other domains, while the algorithm might not work at all. Or you might prefer the algorithm since it helps you find a similar algorithm for the new domain, and hate the list approach b/c the new domain is trillions of times bigger and the list gets unwieldy.
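A toy version of that list-versus-algorithm contrast, with hypothetical list_multiply and algorithm_multiply functions standing in for the two black boxes; the bound is shrunk from a trillion to something small enough to tabulate.

```python
# A lookup-table multiplier and an algorithmic multiplier: their outputs agree
# on every tested input, but their internal structure differs, which shows up
# the moment you leave the original domain. LIMIT stands in for the "trillion"
# bound, shrunk so the table fits in memory (the table getting unwieldy as the
# domain grows is itself part of the point).
LIMIT = 100

lookup_table = {(a, b): a * b for a in range(LIMIT) for b in range(LIMIT)}

def list_multiply(a, b):
    return lookup_table[(a, b)]   # no algorithm inside, just stored answers

def algorithm_multiply(a, b):
    return a * b                  # a general rule that extends to new inputs for free

# Identical behaviour across the tested range...
assert all(list_multiply(a, b) == algorithm_multiply(a, b)
           for a in range(LIMIT) for b in range(LIMIT))

# ...but different behaviour when asked to serve a new domain:
print(algorithm_multiply(LIMIT + 7, 3))   # works unchanged
try:
    list_multiply(LIMIT + 7, 3)
except KeyError:
    print("the lookup-table box has nothing to say outside its table")
```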
Brains are fairly resilient to the failure of a single component.
Part of the problem here is that I’m not clear what you mean by ideas. Are you using Hume’s definition, in which an idea is like a mental photocopy of a sensory impression? Or do you mean something more like a theory?
If you mean the former then ideas are just the building blocks of predictions and plans. Their content matters, because their content affects the predictions I make.
If you mean something like a theory, then let me explain how I think of theories. A theory is like a compressed description of a long list of probabilities. For example, Newton’s laws could be ‘unfolded’ into a very long, possibly infinite, list of predictions for what objects will do in a wide range of situations, but it is quicker to give them in their compressed form, as general principles of motion.
A theory can be seen as a mathematical function from a subset of the set of all possible strings of future events to the real numbers between 0 and 1. Ultimately, a function’s identity rests in its output and nothing else. x+x and 2x may look different, but to say they are different functions is absurd. If you give me a functional equation to solve, and I show that 2x is the answer, you cannot criticise me for failing to distinguish between the two possibilities of 2x and x+x, because they are not actually two different possibilities.
The only way you could compare 2x and x+x is to note that the former is slightly quicker to write, so when we have two theories giving identical predictions we pick the one that is shorter, or from which it is easier to generate the predictions (analogous to picking the faster box, or the box which is more convenient to carry around with you).
Yes. And they are not perfectly resilient. So, I’m right and we do have to consider the issue.
http://fallibleideas.com/ideas
What do you call explanatory theories?
Even with your definition of theories, my example about changing a theory to apply to a new domain is correct, is it not? e.g. which compression algorithm is used matters, even though it doesn’t affect the predictions being made.
Or similarly: compress data, flip one bit, uncompress. The (average) result depends on the compression algorithm used. And since our brains are fallible, this can happen to people. How often? Well… most people experience lots of errors while remembering stuff.
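A minimal sketch of that compress, flip one bit, uncompress experiment, using Python’s standard zlib module as one example compressor; the message and the choice of which bit to flip are arbitrary.

```python
import zlib

message = b"the quick brown fox jumps over the lazy dog " * 20

def flip_bit(data, byte_index, bit):
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)

# One flipped bit in the raw text garbles a single character:
print(flip_bit(message, 10, 3)[:44])

# One flipped bit in the compressed form typically costs you the whole message
# (zlib usually reports a corrupt stream or a failed checksum):
compressed = zlib.compress(message)
try:
    print(zlib.decompress(flip_bit(compressed, 10, 3))[:44])
except zlib.error as err:
    print("decompression failed:", err)
```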
This ‘explanation’ does not explain anything. It gives a one sentence definition ‘an idea is the smallest unit of coherent thought’ and the rest is irrelevant to the definition. That one sentence just pushes back the question into “what counts as coherent thought”.
The definition this looks most like is Hume’s, is it that?
No, you do not change a theory to apply to a new domain. You invent a new theory.
Fine, you also try to pick the explanation which you find easiest to remember, which is pretty much equivalent to the shortest. Or you write stuff down.
Some new theories are modifications of old theories, right? e.g. in physics, QM wasn’t invented from scratch.
So the “structure” (my word) of the old theories matters.
I think you’re missing the point, which is that some compression algorithms (among other things) are more error resistant. This is an example of how the internal structure of knowledge that makes identical predictions can differ, and how the differences can have real world consequences.
Which is just the thing we were debating, and which you previously couldn’t conceive of how you could be mistaken about.
Some theories are modifications of theories from other domains. It is quite rare for such a theory to work. As for QM, what happened there is quite a common story in science.
A theory is designed which makes correct predictions within the range of things we can test, but as our range of testable predictions expands we find it doesn’t predict as well in other domains. It gets abandoned in favour of something better (strictly speaking it never gets fully abandoned, its probability just shrinks to well below the threshold for human consideration). That better theory inevitably ‘steals’ some of the old theory’s predictions, since the old theory was making correct predictions in some domains the new theory must steal those otherwise it will quickly end up falsified itself.
This doesn’t mean theories should be designed to do anything other than predict well; the only thing a theory can hope to offer any successor is predictions, so the focus should remain on those.
As someone with quite a bit of training in maths and no training in computing I have a bit of a cognitive blind-spot for efficiency concerns with algorithms. However, if we look at the real point of this digression, which was whether it is a fault of Bayes that it does not do more than predict, then I think you don’t have a point.
When do these efficiency concerns matter? Are you saying there should be some sort of trade-off between predictive accuracy and efficiency? That if we have a theory that generates slightly worse predictions but does so much more efficiently then we should adopt it?
I can’t agree to that. It may sometimes be useful to use the second theory as a useful approximation to the first, but we should always keep one eye on the truth, you never know when you will need to be as accurate as possible (Newtonian Mechanics and Relativity provide a good example of this dynamic).
If you are saying efficiency concerns only matter with theories that have exactly the same predictive accuracy, then all your talk of structure and content can be reduced to a bunch of empirical questions “which of these is shorter” “which of these is most quick to use” “which of these is least likely to create disastrous predictions if we make an error”. Every one of these has its own Bayesian answer, so the process solves itself.
Right. It’s common that old theories get changed. That there is piecemeal improvement of existing knowledge, rather than outright replacement.
How well ideas are able to be improved in this way—how suited for it they are—is an attribute of ideas other than what predictions they make. It depends on what I would call their internal structure.
So that’s (above) why it does mean it. Note that good structure doesn’t come at the cost of prediction. It’s not a trade off; they aren’t incompatible. You can have both.
Efficiency of algorithms (space and speed both) is a non-predictive issue, and it’s important, but it’s not what I was talking about at all. It’s ground that others have covered plenty. That’s why I specifically disqualified that concern in my hypothetical.
No trade off, and I wasn’t talking about efficiency. My concern is how good ideas are at facilitating their own change, or not.
There’s some good examples of this from programming. For example, you can design a program which is more modular: it has separated, isolated parts. Then later the parts can be individually reused. This is good! There’s no tradeoffs involved here with what output you get from the program—this issue is orthogonal to output.
If the parts of the program are all messily tied together, then you remove one and everything breaks. (That means: if you introduce some random error, you end up with unpredictable, bad results.) If they are decoupled, and each system has fault tolerances and error checking on everything it’s told by the other systems, then errors, or wanting to replace some part of it, can have much more limited effects on the rest of the system.
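A toy sketch of that contrast, with hypothetical parse_reading, average, and report functions; the only point is that validation at a module boundary keeps an error’s effects local instead of letting it spread through everything downstream.

```python
# Modular toy example: separated parts, validation at the boundary between them,
# so a bad value is caught where it enters rather than silently corrupting the
# rest of the computation. Each part can also be reused or replaced on its own.
def parse_reading(raw):
    value = float(raw)
    if not 0.0 <= value <= 100.0:        # error checking on what this part is told
        raise ValueError(f"reading out of range: {value}")
    return value

def average(values):                     # knows nothing about parsing or input sources
    return sum(values) / len(values)

def report(raw_inputs):
    good = []
    for raw in raw_inputs:
        try:
            good.append(parse_reading(raw))
        except ValueError as err:        # a fault here has a limited, predictable effect
            print("skipping bad input:", err)
    return average(good)

print(report(["12.5", "99.0", "1e40", "50.0"]))  # the bad reading stays contained
```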
No, nothing like that at all.
It is my claim that this is not a process of change, it is a total replacement. It will happen that if the old theory had any merit at all then some of its predictions will be identical to those made by the new one.
If my claim is correct then all theories are perfectly modular. A replacement theory may have an entirely different explanation, but it can freely take any subset of the old theory’s predictions.
I was using efficiency as a quick synonym for all such concerns, to save time and space.
Fine, if we have a new theory that generates very slightly worse predictions, but is more modular, would you advocate replacement?
Do you think all computer programs are perfectly modular? What is it that programmers are learning when they read books on modular design and change their coding style?
Well OK. I didn’t know that. I don’t think you should though b/c efficiency is already the name of a small number of well known concerns, and the ones I’m talking about are separate.
I think this question is misleading. The answer is something like: keep both, and use them for different purposes. If you want to make predictions, use the theory currently best at that. If you want to do research, the more modular one might be a more promising starting point and might surpass the other in the future.
In physics, QM retains many ideas from classical physics. Put it this way: a person trained in classical physics doesn’t have to start over to learn QM. There are lots of ways they are related and his pre-existing knowledge remains useful. This is why even today classical physics is still taught (usually first) at universities. It hasn’t been totally replaced. It’s not just that some of its predictions are retained (they are only retained as quick approximations in limited sets of circumstances, btw. by and large, technically speaking, they are false. and taught anyway), it’s that ways of thinking about physics, and approaching physics problems, are retained. People need knowledge other than predictions, such as how to think like a physicist. While classical physics predictions were largely not retained, some other aspects largely were retained.
BTW speaking of physics, are you aware of the debate about the many worlds interpretation, the Bohmian interpretation, the shut up and calculate interpretation, the Copenhagen interpretation, and so on? None of these debates are about prediction. They are all about the explanatory interpretation (or lack thereof, for the shut up and calculate school), and the debates are between people who agree on the math and predictions.
They are learning to write computer programs, which are not necessarily perfectly modular. Computer programs and scientific theories are different things and work in different ways.
Are you imagining a scenario something like Ptolemy’s astronomy versus Copernican astronomy during the days when the latter still assumed the planets moved in perfect circles?
In that case I can sort of see your point, the latter was a more promising direction for future research while the former generated better predictions. The former deserved a massive complexity penalty of course, but it may still have come out on top in the Bayesian calculation.
Sadly, there will occasionally be times when you do everything right but still get the wrong answer. Still, there is a Bayesian way to deal with this sort of thing without giving up the focus on predictions. Yudkowsky’s Technical Explanation of Technical Explanation goes into it.
That may be the case, it doesn’t mean that when designing a theory I should waste any concern on “should I try to include general principles which may be useful to future theories”. Thinking like that is likely to generate a lot of rubbish. Thinking “what is happening here, and what simple mathematical explanation can I find for it?” is the only way to go.
Yes, I am. There are some empirical prediction differences, for instance, we might or might not be able to create quantum superpositions at increasingly large scales. In general, I say just pick the mathematically simplest and leave it at that.
You don’t need to resort to QM for an example of that dilemma; “do objects vanish when we stop looking at them?” is another such debate, but ultimately it doesn’t make much difference either way, so I say just assume they don’t since it’s mathematically simpler.
Edit:
Thinking about this some more, I can see the virtue of deliberately spending some time thinking about explanations, and keeping a record of such explanations even if they are attached to theories that make sub-par predictions.
This should all be kept separate from the actual business of science. Prediction should also remain the target; explanations are merely a means towards that end.
Double Edit:
I just realised that I wrote do where I meant don’t for objects vanishing. That completely changes the meaning of the paragraph. Crap.
Assuming that objects do vanish when you stop looking at them is much simpler?
Just noted that in an edit, it was a typo.
Part of the business of science is to create successively better explanations of the world we live in. What its nature is, and so on.
Or maybe you will call that philosophy. If you do, it will be the case that many scientists are “philosophers” too. In the past, I would have said most of them. But instrumentalism started getting rather popular last century.
I wonder where you draw the line with your instrumentalism. For example, do you think the positivists were mistaken? If so, what are your arguments against them?
I do think the positivists went too far. They failed to realise that we can make predictions about things which we can never test. We can never evaluate these predictions, and we can never update our models on the basis of them, but we can still make them in the same way as we make any other predictions.
For example, consider the claim “a pink rhinoceros rides a unicycle around the Andromeda galaxy, he travels much faster than light and so completes a whole circuit of the galaxy every 42 hours. He is, of course, far too small for our telescopes to see.”
The positivist says “meaningless!”
I say “meaningful, very high probability of being false”
Another thing they shouldn’t have dismissed is counterfactuals. As Pearl showed, questions about counterfactuals can be reduced to Bayesian questions of fact.
I sympathise with this. To some extent I may have been exaggerating my own position in my last few posts, it happens to me occasionally. I do think that predictions are the only way of entangling your beliefs with reality, of creating a state of the world where what you believe is causally affected by what is true. Without that you have no way to attain a map that reflects the territory, any epistemology which claims you do is guilty of making stuff up.
I do not agree with this assertion.
Some things I note about it:
1) it isn’t phrased as a prediction
2) it isn’t phrased as an argument based on empirical evidence
Would you like to try rewriting it more carefully?
1) It can be phrased as a prediction. “I predict if someone had no way to evaluate their predictions based on evidence they would have no way of attaining a map that reflects the territory. They would have no way of attaining a belief-set that works better in this world than in the average of all possible worlds”.
2) It is a mathematical statement, or at any rate the logical implication of a mathematical statement, and thus is probably true in all possible worlds so I am not trying to entangle it with the territory.
If Y can be phrased as a prediction, it does not follow that Y is the predictive content of X. Do you understand?
I understand, but disagree. The point I have been trying to make is that it does.
My original claim was that an agent’s outcome was determined solely by that agent’s predictions and the external world in which that agent lived. If you define a theory so that its predictive content is a strict subset of all the predictions which can be derived from it then yes, its predictive content is not all that matters, the other predictions matter as well.
It nonetheless remains the case that what happens to an agent is determined by that agent’s predictions. You need to understand that theories are not fundamentally Bayesian concepts, so it is much better to argue Bayes at either the statement-level or the agent-level than the theory-level.
In addition, I think our debate is starting to annoy everyone else here. There have been times when the entire recent comments bar is filled with comments from one of us, which is considered bad form.
Could we continue this somewhere else?
Yes. I PMed you somewhere yesterday. Did you get it?
This kind of attitude is what we see as the antithesis of philosophy, and of understanding the world.
When people try to invent new theories in physics, they need to understand things. For example, will they want to use math that models objects that frequently blink in and out of existence, or objects that don’t do that? Both ways might make the same predictions for all measurements humans can do, but they lead to different research directions. The MWI vs Bohm/etc stuff is similar: it leads to different research directions for what one thinks would be an important advance in physics, a promising place to look for a successor theory, and so on. As an example, Deutsch got interested in fungibility because of his way of understanding physics. That may or may not be the right direction to go—depending on if his way of thinking about physics is right—but the point is it concretely matters even though it’s not an issue of prediction.
Computer programs are a kind of knowledge, like any other, which has an organizational structure, like any other knowledge.
That is not how you achieve good design. That’s the wrong way to go about it. Good design is achieved by looking for simplicity, elegance, clarity, modularity, good explanations, and so on. When you have those, then you do get a theory which can be more help to future theories. If you just try to think, “What will the future want?” then you won’t know the answer so you won’t get anywhere.
EDIT: I thought you meant having them vanish is simpler b/c it means that less stuff exists less of the time. That is a rough description of what the Copenhagen Interpretation people think. One issue this raises is that people can and do disagree about what is simpler. I don’t think it’s good to set up “simpler” as a definitive criterion. It’s better to have a discussion. You can use “it’s simpler” as an argument. And that might end the discussion. But it might not if someone else has another criterion they think is relevant and an explanation of why it matters (you shouldn’t rule out that ever happening). And also someone might criticize your interpretation of what makes things simpler in general, or which is simpler in this case.
I’m pretty sure Benelleiot wrote “do” when he meant “don’t.”
Actually, it is an issue of prediction. We are trying to predict which future research will lead to promising results.
I would say computer programs are more analogous to architecture than to scientific theories.
If you have a theory, X, which predicts Y, then there are important aspects of X other than Y. There is non-predictive content of X. When you say that those other factors have to do with prediction in some way, that wouldn’t mean that only the predictive content of X matters since the way they have to do with prediction isn’t a prediction X made.
I’m not speaking in loose analogies and opinions. All knowledge is the same thing (“knowledge”) because it has shared attributes. The concept of an “idea” covers ideas in various fields, under the same word, because they share attributes.
I would say, evaluate X solely on the predictive merits of Y. If we are interested in future research directions then make separate predictions about those.
A computer program doesn’t really count as knowledge. It’s information, in the scientific and mathematical sense, and you write it down, but the similarity ends there. It is a tool that is built to do a job, and in that respect is more like a building. It doesn’t really count as knowledge at all, not to a Bayesian at any rate.
Remember, narrowness is a virtue.
What? Broad reach is a virtue. A theory which applies to many questions—which has some kind of general principle to it—is valuable. Like QM which applies to the entire universe—it is a universal theory, not a narrow theory.
It has apparent design. It has adaptation to a purpose. It’s problem-solving information. (The knowledge is put there by the programmer, but it’s still there.)
One of the ways this came up is we were considering theories with identical Y, and whether they have any differences that matter. I said they do. Make sense now?
What happens if we taboo the word ‘theory’?
In this sense, a building is also knowledge. Programming is making, not discovering.
Suppose two theories A and B make identical predictions for the results of all lab experiments carried out thus far but disagree about directions for future research. I would say they make different predictions about which research directions will lead to success, and are therefore not entirely identical.
Just got an idea for a good example from another thread.
Consider chess. If a human and a chess program come up with the same move, then the differences between them, and their ways of thinking about the move, don’t really matter, do you think?
And suppose we want to learn from them. So we give them both white. We play the same moves against each of them. We end up with identical games, suppose. So, in the particular positions from that game they make identical predictions about what move has the best chance to win.
Now, we also in each case gather some information about why they made each move to learn from.
The computer program provides move list trees it looked at, at every move, with evaluations of the positions they reach.
The human provides explanations. He says things like, “I was worried my queen side wasn’t safe, so i decided i better win on the king side quickly” or “i saw that this was eventually heading towards a closed game with my pawns fixed on dark squares, so that’s why i traded my bishop for a knight there”.
When you want to learn chess, these different kinds of information are both useful, but in different ways. They are different. The differences matter. For a specific person, with specific strengths and weaknesses, one or the other may be far far more useful.
So, the computer program and the human do different things, and thereby produce different results. Your point?
I was claiming that if they did the same thing they would get the same results.
The point is that identical predictive content (in this case, in the sense of predicting what move has the best chance to win the game in each position) does not mean that what’s going on behind the scenes is even similar.
It would be that one thing is intelligent, and one isn’t. That’s how big the differences can be.
So, are you finally ready to concede that your claim is false? The one you were so sure of that identical predictive content of theories means nothing else matters, and that their internal structure can’t be important?
No, because they do different things. If they take different actions this implies they must have different predictions (admittedly it’s a bit anthropomorphic to talk about a chess program having predictions at all).
Incidentally, they are using different predictions to make their moves. For example the human may predict P(my left side is too weak) = 0.9 and use this prediction to derive P(I should move my queen to the left side) = 0.8, while the chess program doesn’t really predict at all but if it did you would see something more like individual predictions for the chance of winning given each possible move, and a derived prediction like P(I should move my queen to the left side) = 0.8.
With such different processes, it’s really an astonishing coincidence that they make the same moves at all.
(I apologise in advance for my lack of knowledge of how chess players actually think, I haven’t played it since I discovered go, I hope my point is still apparent.)
That’s not how chess players think.
Your point is apparent—you try to reinterpret all human thinking in terms of probability—but it just isn’t true. There’s lots of books on how to think about chess. They do not advise what you suggest. Many people follow the advice they do give, which is different and unlike what computers do.
People learn explanations like “control the center because it gives your pieces more mobility” and “usually develop knights before bishops because it’s easier to figure out the correct square for them”.
Chess programs do things like count up how many squares each piece on the board can move to. When humans play they don’t count that. They will instead do stuff like think about what squares they consider important and worry about those.
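For concreteness, here is a minimal sketch of that square-counting style of evaluation, using the third-party python-chess library (an assumption on my part; it is not part of this discussion); the mobility-only score is an invented toy, not a description of how any real engine or player evaluates positions.

```python
# Crude mobility counting: how many legal moves each side has in a position.
# Requires the third-party python-chess package (pip install chess).
import chess

def mobility(board, colour):
    """Number of legal moves available to the given colour in this position."""
    original_turn = board.turn
    board.turn = colour                  # temporarily pretend it is this side's turn
    count = board.legal_moves.count()
    board.turn = original_turn
    return count

def crude_evaluation(board):
    # Positive favours White, negative favours Black; mobility difference only.
    return mobility(board, chess.WHITE) - mobility(board, chess.BLACK)

board = chess.Board()
board.push_san("e4")                     # 1. e4 opens lines for the bishop and queen
print(crude_evaluation(board))           # White now has more moves available than Black
```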
Notice how this sentence is actually a prediction in disguise
As is this
I only write “idea” instead. If you taboo that too, i start writing “conjecture” or “guess” which is misleading in some contexts. Taboo that too and i might have to say “thought” or “believe” or “misconception” which are even more misleading in many contexts.
Yes buildings rely on, and physically embody, engineering knowledge.
But they don’t make those predictions. they don’t say this stuff, they embody it in their structure. it’s possible for a theory to be more suited to something, but no one knows that, and it wasn’t made that way on purpose.
The point of tabooing words is to expand your definitions and remove misunderstandings, not to pick almost synonyms.
You didn’t read the article, and so you are missing the point. In spectacular fashion I might add.
So, buildings should be made out of bricks, therefore scientific theories should be made out of bricks?
I contend that a theory can make more predictions than are explicitly written down. Most theories make infinitely many predictions. A logically omniscient Ideal Bayesian would immediately be able to see all those predictions just from looking at the theory, a Human Bayesian may not, but they still exist.
What do you think is more likely:
1) I meant
2) I meant something else which you didn’t understand
?
Can you specify the infinitely many predictions of the theory “Mary had a little lamb” without missing any structural issues I deem important? Saying the theory “Mary had a little lamb” is not just a prediction but infinitely many predictions is non-standard terminology, right? Did you invent this terminology during this argument, or did you always use it? Are there articles on it?
Bayesians don’t treat the concept of a theory as being fundamental to epistemology (which is why I wanted to taboo it), so I tried to figure out the closest Bayesian analogue to what you were saying and used that.
As for 1) and 2), I was merely pointing out that “programs are a type of knowledge, programs should be modular, therefore knowledge should be modular” and “buildings are a type of knowledge, buildings should be made of bricks, therefore knowledge should be made of bricks” are of the same form and equally valid. Since the latter is clearly wrong, I was making the point that the former is also wrong.
To be honest I have never seen a better demonstration of the importance of narrowness than your last few comments, they are exactly the kind of rubbish you end up talking when you make a concept too broad.
I didn’t make that argument. Try to be more careful not to put words into my mouth.
When you have a reputation like curi’s this is exactly the sort of rhetorical question you should avoid asking.
What should I do, do you think? I take it you know what my goals are in order to judge this issue. Neat. What are they? Also what’s my reputation like?
FTFY