Well, that was a straightforward answer.
The metaphor’s going over my head. Don’t feel obligated to explain though, I’m only mildly curious. But know that it’s not obvious to everyone.
...my suggestion is that truth-seeking (science etc) has increased in usefulness over time, whereas charisma is probably roughly the same as it has been for a long time.
Yes, and I think it’s a good suggestion. I think I can phrase my real objection better now.
My objection is that I don’t think this article gives any evidence for that suggestion. The historical storytelling is a nice illustration, but I don’t think it’s evidence.
I don’t think it’s evidence because I don’t expect evolutionary reasoning at this shallow a depth to produce reliable results. Historical storytelling can justify all sorts of things, and if it justifies your suggestion, that doesn’t really mean anything to me.
A link to a more detailed evolutionary argument written by someone else, or even just a link to a Wikipedia article on the general concept, would have changed this. But what’s here is just evolutionary/historical storytelling like I’ve seen justifying all sorts of incorrect conclusions, and the only difference is that I happen to agree with the conclusion.
If you just want to illustrate something that you expect your readers to already believe, this is fine. If you want to convince anybody, you'll need a different article.
This is from a novel (Three Parts Dead by Max Gladstone). The situation is a man and a woman who have to work together but have trouble trusting each other because of propaganda from an old war:
[Abelard] hesitated, suddenly aware that he was alone with a woman he barely trusted, a woman who, had they met only a few decades before, would have tried to kill him and destroy the gods he served. Tara hated propaganda for this reason. Stories always outlasted their usefulness.
Colin Howson, talking about how Cox’s theorem bears the mark of Cox’s training as a physicist (source):
An alternative approach is to start immediately with a quantitative notion and think of general principles that any acceptable numerical measure of uncertainty should obey. R.T. Cox and I.J. Good, working independently in the mid nineteen-forties, showed how strikingly little in the way of constraints on a numerical measure yield the finitely additive probability functions as canonical representations. It is not just the generality of the assumptions that makes the Cox–Good result so significant: unlike some of those which have to be imposed on a qualitative probability ordering, the assumptions used by Cox and to a somewhat lesser extent Good seem to have the property of being uniformly self-evidently analytic principles of numerical epistemic probability whatever particular scale it might be measured in. Cox was a working physicist and his point of departure was a typical one: to look for invariant principles:
To consider first … what principles of probable inference will hold however probability is measured. Such principles, if there are any, will play in the theory of probable inference a part like that of Carnot’s principle in thermodynamics, which holds for all possible scales of temperature, or like the parts played in mechanics by the equations of Lagrange and Hamilton, which have the same form no matter what system of coordinates is used in the description of motion. [Cox 1961]
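(For concreteness, the invariance constraints Cox starts from are usually written as two functional equations. This is the standard textbook form, following Jaynes's presentation, not anything quoted from Howson:)

    % Cox's constraints on a numerical plausibility measure w:
    % the plausibility of a conjunction is some function of the
    % plausibilities of its parts,
    \[ w(A \wedge B \mid C) = F\big(w(A \mid C),\, w(B \mid A \wedge C)\big) \]
    % and the plausibility of a negation is some fixed function of the
    % plausibility of the statement itself.
    \[ w(\neg A \mid C) = S\big(w(A \mid C)\big) \]
    % Consistency requirements (associativity of conjunction, S an involution)
    % force, up to a rescaling p = g(w), the familiar product and sum rules:
    \[ p(A \wedge B \mid C) = p(A \mid C)\, p(B \mid A \wedge C), \qquad
       p(A \mid C) + p(\neg A \mid C) = 1 \]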
I like this post, I like the example, I like the point that science is newer than debate and so we’re probably more naturally inclined to debate. I don’t like the apparently baseless storytelling.
In the jungle of our evolutionary childhood, humanity formed groups to survive. In these groups there was a hierarchy of importance, status and power. Predators, starvation, rival groups and disease all took the weak on a regular basis, but the groups afforded a partial protection. However, a violent or unpleasant death still remained a constant threat. It was a particular threat to the lowest and weakest members of the group. Sometimes these individuals were weak because they were physically weak. However, over time groups that allowed and rewarded things other than physical strength became more successful. In these groups, discussion played a much greater role in power and status. The truly strong individuals, the winners in this new arena, were the ones who could direct conversation in their favour—conversations about who will do what, about who got what, and about who would be punished for what. Debates were fought with words, but they could end in death all the same.
I don’t know much about the environment of evolutionary adaptedness, but it sounds like you don’t either. Jungle? Didn’t we live on the savannah? And as for forming groups for survival, it seems just as plausible that we formed groups for availability of mates.
If you don’t know what the EEA was like, why use it as an example? All you really know is about the modern world. I think reasoning about the modern world makes your point quite well in fact. There are still plenty of people living and dying dependent on their persuasive ability. For example, Adolf Hitler lived while Ernst Rohm died. And we can guess that it’s been like this since the beginning of humanity and that this has bred us to have certain behaviors.
I think this reasoning is a lot more reliable, in fact, than imagining what the EEA was like without any education in the subject.
Maybe I’m being pedantic—the middle of the post is structured as a story, a chronology. It definitely reads nicely that way.
Hmm. Yeah, that’s tough. What do you use to calculate probabilities of the principles of logic you use to calculate probabilities?
Although, it seems to me that a bigger problem than the circularity is that I don’t know what kinds of things are evidence for principles of logic. At least for the probabilities of, say, mathematical statements, conditional on the principles of logic we use to reason about them, we have some idea. For example, many verified consequences of a generalization are evidence for the generalization, and a proof of an analogous theorem is evidence for a theorem. So I can see that the kinds of things that are evidence for mathematical statements are other mathematical statements.
I don’t have nearly as clear a picture of what kinds of things lead us to accept principles of logic, and what kind of statements they are. Whether they’re empirical observations, principles of logic themselves, or what.
Do you know of any cases where this simulation-seeded Gaussian Process was then used as a prior, and updated on empirical data?
Like...
uncertain parameters --(simulation)--> distribution over state
noisy observations --(standard Bayesian update)--> refined distribution over state
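If it helps pin down what I mean, here is a minimal sketch of those two steps in the linear-Gaussian special case, where the "standard Bayesian update" reduces to a Kalman-style formula. The simulate function and all the numbers are invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for an expensive simulator: uncertain parameters -> state vector.
    # (Invented for illustration; any deterministic simulation would do here.)
    def simulate(theta):
        x = np.linspace(0.0, 1.0, 5)
        return theta[0] * np.sin(3.0 * x) + theta[1] * x

    # Step 1: push a sample of uncertain parameters through the simulation,
    # then approximate the resulting distribution over the state as a Gaussian.
    thetas = rng.normal(size=(200, 2))
    ensemble = np.array([simulate(t) for t in thetas])
    prior_mean = ensemble.mean(axis=0)
    prior_cov = np.cov(ensemble, rowvar=False)

    # Step 2: standard (linear-Gaussian) Bayesian update on a noisy observation
    # of one component of the state.
    H = np.zeros((1, 5)); H[0, 2] = 1.0     # we observe state component 2...
    R = np.array([[0.05 ** 2]])             # ...with this much noise variance
    y = np.array([0.3])                     # the noisy measurement itself

    S = H @ prior_cov @ H.T + R             # innovation covariance
    K = prior_cov @ H.T @ np.linalg.inv(S)  # Kalman gain
    post_mean = prior_mean + K @ (y - H @ prior_mean)  # refined mean
    post_cov = prior_cov - K @ H @ prior_cov           # refined covariance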
Cari Kaufman’s research profile made me think that’s something she was interested in. But I haven’t found any publications by her or anyone else that actually do this.
I actually think that I misread her research description, latching on to the one familiar idea.
This reminds me of the story of Robert Edgar, who created the DNA and protein sequence alignment program MUSCLE.
He got a PhD in physics, but considers that a mistake. He did his bioinformatics work after selling a company and having free time. The bioinformatics work was notable enough that it’s how I know of him.
His blog post, from which I learned this story: https://thewinnower.com/discussions/an-unemployed-gentleman-scholar
Added, with whatever little bits of summary I could get by skimming.
It’s true that this is a case of logical uncertainty.
However, I must add that in most of my examples, I bring up the benefits of a probabilistic representation. Just because you have logical uncertainty doesn’t mean you need to represent it with probability theory.
In protein structure, we already have these Bayesian methods for inferring the fold, so the point of the probabilistic representation is to plug it into these methods as a prior. In philosophy, we want ideal rationality, which suggests probability. In automated theorem proving… okay, yeah, in automated theorem proving I can’t explain why you’d want to use probability theory in particular.
But yes. If you had a principled way to turn your background information and already-completed computations into a probability distribution over future computations, you could use that for AI search problems. And optimization problems. Wow, that’s a lot of problems. I’m not sure how it would stack up against other methods, but it’d be interesting if that became a paradigm for at least some problems.
In fact, now that you’ve inspired me to look for it, I find that it’s being done! Not with the approach of coming up with a distribution over all mathematical statements, which you see in Christiano’s report and which is the approach I had in mind when writing the post. Rather, it’s an approach like the one I think Cari Kaufman uses, where you guess based on nearby points. That’s accomplished by modeling a difficult-to-evaluate function as a stochastic process with some kind of local correlations, like a Gaussian process, so that you get probability distributions for the value of the function at each point. What I’m finding is that this is, in fact, an approach people use to optimize difficult-to-evaluate objective functions. See here for the details: Efficient Global Optimization of Expensive Black-Box Functions, by Jones, Schonlau and Welch.
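For what it’s worth, here is a bare-bones sketch of the idea, my own toy version rather than the algorithm from the paper: fit a Gaussian process to the evaluations so far, then evaluate next wherever expected improvement is highest. The objective, kernel, and all settings are invented:

    import numpy as np
    from scipy.stats import norm

    # Toy stand-in for an expensive black-box objective.
    def f(x):
        return np.sin(3.0 * x) + 0.5 * x

    # Squared-exponential kernel with a fixed, made-up length scale.
    def kernel(a, b, length=0.3):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

    def gp_posterior(X, y, Xs, jitter=1e-6):
        """GP posterior mean and std at test points Xs, given data (X, y)."""
        K = kernel(X, X) + jitter * np.eye(len(X))
        Ks = kernel(X, Xs)
        mu = Ks.T @ np.linalg.solve(K, y)
        var = np.diag(kernel(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks))
        return mu, np.sqrt(np.maximum(var, 1e-12))

    # EGO-style loop: model the objective, then evaluate where expected
    # improvement over the best value seen so far is largest.
    X = np.array([0.0, 1.0, 2.0])
    y = f(X)
    grid = np.linspace(0.0, 2.0, 201)
    for _ in range(5):
        mu, sigma = gp_posterior(X, y, grid)
        best = y.max()
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
        x_next = grid[np.argmax(ei)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))

    print(X[np.argmax(y)], y.max())  # best point found so far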
They wouldn’t classify their work that way, and in fact I thought that was the whole point of surveying these other fields. Like, for example, a question for philosophers in the 1600s is now a question for biologists, and that’s why we have to survey biologists to find out if it was resolved.
Yes. Because we’re trying to express uncertainty about the consequences of axioms, not about the axioms themselves.
common_law’s thinking does seem to be something people actually do. Like, we’re uncertain about the consequences of the laws of physics, while simultaneously being uncertain of the laws of physics, while simultaneously being uncertain whether we’re thinking about it in a logical way. But it’s not the kind of uncertainty that we’re trying to model in the applications I’m talking about. The missing piece in these applications is probabilities conditional on axioms.
Nice. Links added to post, and I’ll check them out later. The Duc and Williamson papers were from a post of yours, by the way. Some MIRI status report or something; I don’t remember.
Applications of logical uncertainty
Logical uncertainty reading list
I now think you’re right that logical uncertainty doesn’t violate any of Jaynes’s desiderata. Which means I should probably try to follow them more closely, if they don’t create problems like I thought they would.
An Aspiring Rationalist’s Ramble has a post asserting the same thing, that nothing in the desiderata implies logical omniscience.
Here, the author is keeping in mind Conservation of Expected Evidence. If you could anticipate in advance the direction of any update, you should just update now. You should not expect to be able to get the right answer right away and never need to seriously update it.
There has to be a better way to put this.
The problem is that sometimes you can anticipate the direction. For example, suppose someone’s flipping a coin, and you think it might have two heads. This is a simple example because a heads is always evidence in favor of the two-heads hypothesis, and a tails is always evidence in favor of the normal-coin hypothesis. We can see you become sure of the direction of evidence in this scenario: if the prior probability of two heads is 1⁄2, then after about ten heads you’re 99% sure the eleventh is also going to be heads.
However, I do think that this is just because of very artificial features of the example that would never hold when making first impressions of people. Specifically, what’s going on in the coin example is a hypothesis that we’re very sure of, that makes very specific predictions. I can’t prove it, but I think that’s what allows you to be very sure of the update direction.
This never happens in social situations where you’ve just recently met someone—you’re never sure of a hypothesis that makes very specific predictions, are you?
I don’t know. I do know that there’s some element of the situation besides conservation of expected evidence going into this. It takes more than just that to derive that updates will be gradual and in an unpredictable direction.
(EDIT: I didn’t emphasize this but updates aren’t necessarily gradual in the coin example—a tails leads to an extreme update. I think that might be related—an extreme update in an unexpected direction balancing a small one in a known direction?)
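Checking my own arithmetic with plain Python (the 1⁄2 prior is the one from the example above). It confirms the rough numbers, and the last few lines show the balancing act from the EDIT, that the expected posterior exactly equals the prior:

    # Two-headed coin vs. fair coin, prior 1/2 on each hypothesis.
    p_two = 0.5
    for _ in range(10):  # observe ten heads in a row
        p_two = p_two / (p_two + (1 - p_two) * 0.5)

    p_heads_next = p_two + (1 - p_two) * 0.5
    print(p_two)         # ~0.9990: posterior on the two-heads hypothesis
    print(p_heads_next)  # ~0.9995: sure the eleventh will also be heads

    # Conservation of expected evidence: the expected posterior is the prior.
    post_if_heads = p_two / p_heads_next  # small, predictable step up
    post_if_tails = 0.0                   # rare but extreme step down
    expected = p_heads_next * post_if_heads + (1 - p_heads_next) * post_if_tails
    print(expected)      # equals p_two: the predictable small update up is
                         # balanced by the rare, extreme update down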
You’ll note that I don’t try to modestly say anything like, “Well, I may not be as brilliant as Jaynes or Conway, but that doesn’t mean I can’t do important things in my chosen field.”
Because I do know… that’s not how it works.
Maybe not in your field, but that is how it usually works, isn’t it?
(the rest of this comment is basically an explanation of comparative advantage)
Anybody can take the load off of someone smarter, by doing the easiest tasks that have been taking their time.
The most obvious example: a brilliant scientist’s secretary. Another example: a brilliant statistician who employs a programmer to turn his statistical and computational ideas into efficient, easy-to-use software. The programmer doesn’t have to be the best in the world, and doesn’t have to be that great at statistics, but he allows the statistician to publish usable implementations of his statistical methods without having to code them himself.
Or, here’s another one: I’ve heard MIRI needs a science writer, or needs funding for one. You don’t have to be Eliezer Yudkowsky-level at thinking about FAI to save Yudkowsky the time it takes to write manuscripts that can be published in science journals, time he can then spend on research.
This is “important work.” It’s not the kind of important work Jaynes or Conway does, and it doesn’t put your name in the history books, and if that’s what was meant by the article I have no disagreement. But by any utilitarian standard of importance, it’s important.
(last time I heard the word “jungle” was a Peruvian guy saying his dad grew up in the jungle and telling me about Peruvian native marriage traditions)