The way I’ve heard the two generals problem, the lesson is that it’s unsolvable in theory, but approximately (to an arbitrary degree of approximation) solvable in practice (e.g. via message confirmations, probabilistic reasoning, and optimistic strategies), especially because channel reliability can be improved through protocols (at the cost of latency).
I also think that taking literally a statement like “LessWrong curated posts help to establish common knowledge” is outright wrong; instead, there’s an implied transformation to “LessWrong curated posts help to increase the commonness of knowledge”, where “commonness” is some property of knowledge, which common knowledge has to an infinite degree; p in p-common knowledge is an initially plausible measure of commonness, but I’m not sure we ever get up to infinity.
Intuitively, though, increasing commonness of knowledge is a comparative property: the probability that your interlocutor has read the post is higher if it’s curated than if it’s not, all else being equal, and, crucially, this property of curation is known (with some probability) to your interlocutor.
The way I’ve heard the two generals problem, the lesson is that it’s unsolvable in theory, but approximately (to an arbitrary degree of approximation) solvable in practice
My response would be that this unfairly (and even absurdly) maligns “theory”! The “theory” here seems like a blatant straw-man, since it pretends probabilistic reasoning and cost-benefit tradeoffs are impossible.
I also think that taking literally a statement like “LessWrong curated posts help to establish common knowledge” is outright wrong; instead, there’s an implied transformation
Let me sketch the situation you’re describing as I understand it.
Some people (as I understand it, core LessWrong staff, although I didn’t go find a reference) justify some things in terms of common knowledge.
You either think that they were not intending this literally, or at least, that no one else should take them literally, and instead should understand “common knowledge” to mean something informal (which you yourself admit you’re somewhat unclear on the precise meaning of).
My problem with this is that it creates a missing stair kind of issue. There’s the people “in the know” who understand how to walk carefully on the dark stairway, but there’s also a class of “newcomers” who are liable to fall. (Where “fall” here means, take all the talk of “common knowledge” literally.)
I would think this situation excusable if “common knowledge” were a very useful metaphor with no brief substitute that better conveys what’s really going on; but you yourself suggest “commonness of knowledge” as such a substitute.
p in p-common knowledge is an initially plausible measure of commonness, but I’m not sure we ever get up to infinity.
Just want to flag the philosopher’s defense here, which is that you can have a finite concept of p-common knowledge, which merely implies any finite level of iteration you need, without actually requiring all of those implications to be taken.
(Granted, I can see having some questions about this.)
Intuitively, though, increasing commonness of knowledge is a comparative property: the probability that your interlocutor has read the post is higher if it’s curated than if it’s not, all else being equal, and, crucially, this property of curation is known (with some probability) to your interlocutor.
Whereas the explicit theory of common knowledge contradicts this comparative idea, and instead asserts that common knowledge is an absolute yes-or-no thing, such that finite-level approximations fall far short of the real thing. This idea is illustrated with the electronic messaging example, which purports to show that any number of levels of finite iteration are as good as no communication at all.
My response would be that this unfairly (and even absurdly) maligns “theory”!
I agree. However, the way I’ve had the two generals problem framed to me, it’s not a solution unless it guarantees successful coordination. Like, if I claim to solve the halting problem because in practice I can tell if a program halts, most of the time at least, I’m misunderstanding the problem statement. I think that conflating “approximately solves the 2GP” with “solves the 2GP” is roughly as malign as my claim that the approximate solution is not the realm of theory.
Some people (as I understand it, core LessWrong staff, although I didn’t go find a reference) justify some things in terms of common knowledge.
You either think that they were not intending this literally, or at least, that no one else should take them literally, and instead should understand “common knowledge” to mean something informal (which you yourself admit you’re somewhat unclear on the precise meaning of).
I think that the statement, taken literally, is false, and egregiously so. I don’t know how the LW staff meant it, but I don’t think they should mean it literally. I think that when encountering a statement that is literally false, one useful mental move is to see if you can salvage it, and that one useful way to do so is to reinterpret an absolute as a gradient (and usually to reduce the technical precision). Now that you have written this post, the commonality of the knowledge that the statement should not be taken literally and formally is increased; whether the LW staff responds by changing the statement they use, or by adding a disclaimer somewhere, or by ignoring all of us and expecting people to figure it out on their own, I did not specify.
My problem with this is that it creates a missing stair kind of issue. There’s the people “in the know” who understand how to walk carefully on the dark stairway, but there’s also a class of “newcomers” who are liable to fall. (Where “fall” here means, take all the talk of “common knowledge” literally.)
Yes, and I think as aspiring rationalists we should try to eventually do better in our communications, so I think that mentions of common knowledge should be one of:
explicitly informal, intended to gesture to some real world phenomenon that has the same flavor
explicitly contrived, like the blue islanders puzzle
explicitly something else, like p-common knowledge; but beware, that’s probably not meant either
This idea is illustrated with the electronic messaging example, which purports to show that any number of levels of finite iteration are as good as no communication at all.
I think (I haven’t read the SEP link) that this is correct—in the presence of uncertainty, iteration does not achieve the thing we are referring to precisely as “common knowledge”—but we don’t care, for the reasons mentioned in your post.
I think your post and my reply together actually point to two interesting lines of research:
formalize measures of “commonness” of knowledge and see how they respond to realistic scenarios such as “signal boosting”
see if there is an interesting “approximate common knowledge”, vaguely analogous to The Complexity of Agreement
I agree. However, the way I’ve had the two generals problem framed to me, it’s not a solution unless it guarantees successful coordination. Like, if I claim to solve the halting problem because in practice I can tell if a program halts, most of the time at least, I’m misunderstanding the problem statement. I think that conflating “approximately solves the 2GP” with “solves the 2GP” is roughly as malign as my claim that the approximate solution is not the realm of theory.
I think this is very fair (and I will think about editing my post in response).
My response would be that this unfairly (and even absurdly) maligns “theory”! The “theory” here seems like a blatant straw-man, since it pretends probabilistic reasoning and cost-benefit tradeoffs are impossible.
This is because mathematics plays with evidence very differently than most other fields. If the probability is not equivalent to 1 (I.E it must happen), it doesn’t matter. A good example of this is the Collatz conjecture. It has a stupendous amount of evidence in the finite data points regime, but no mathematician worth their salt would declare Collatz solved, since it needs to have an actual proof. Similarly, P=NP is another problem with vast evidence, but no proof.
And a proof, if correct, is essentially equivalent to an infinite amount of observations, since you usually need to show that it holds up to infinite amounts of examples, so one Adversarial datapoint ruins the effort.
Mathematics is what happens when 0 or 1 are the only sets of probabilities allowed, that is a much stricter standard exists in math.
To nitpick, a mathematician won’t even accept that probability 1 is proof. Lots of probability 1 statements aren’t provable and could even be false from a logical perspective.
For example, a random walk in one dimension crosses any point infinitely many times, with probability one. But it could go in one direction forever.
So, proof could be said to be nontrivially stronger than even an infinite number of observations.
I feel like this is a difference between “almost surely” and “surely”, both of which are typically expressed as “probability 1”, but which are qualitatively different. I’m wondering whether infinitesimals would actually work to represent “almost surely” as 1−ϵ (as suggested in this post).
Also a nitpick and a bit of a tangent, but in some cases, a mathematician will accept any probability > 1 as proof; probabilistic proof is a common tool for non-constructive proof of existence, especially in combinatorics (although the way I’ve seen it, it’s usually more of a counting argument than something that relies essentially on probability).
A good example of this is the Collatz conjecture. It has a stupendous amount of evidence in the finite data points regime, but no mathematician worth their salt would declare Collatz solved, since it needs to have an actual proof.
It’s important to distinguish the probability you’d get from a naive induction argument and a more credible one that takes into account the reference class of similar mathematical statements that hold until large but finite limits but may or may not hold for all naturals.
Similarly, P=NP is another problem with vast evidence, but no proof.
Arguably even more than the Collatz conjecture, if you believe Scott Aaronson (link).
Mathematics is what happens when 0 or 1 are the only sets of probabilities allowed, that is a much stricter standard exists in math.
As I mentioned in my other reply, there are domains of math that do accept probabilities < 1 as proof (but it’s in a very special way). Also, proof usually comes with insight (this is one of the reasons that the proof of the 4 color theorem was controversial), which is almost more valuable than the certainty it brings (and of course as skeptics and bayesians, we must refrain from treating it as completely certain anyway; there could have been a logical error that everyone missed).
I think talking about proofs in terms of probabilities is a category error; logical proof is analogous to computable functions (equivalent if you are a constructivist), while probabilities are about bets.
I think talking about proofs in terms of probabilities is a category error; logical proof is analogous to computable functions (equivalent if you are a constructivist), while probabilities are about bets.
I agree that logical/mathematical proofs are more analogous to functions than probabilities, but they don’t have to be only computable functions, even if it’s all we will ever access, and that actually matters, especially in non-constructive math and proofs.
I also realized why probability 0 and 1 aren’t enough for proof, or equivalently why Abram Demski’s observation that a proof is stronger than an infinite number of observations is correct.
And it essentially boils down to the fact that in infinite sets, there are certain scenarios where the probability of an outcome is 0, but it’s still possible to get that outcome, or equivalently the probability is 1, but that doesn’t mean that it doesn’t have counterexamples. The best example is throwing a dart at a diagonal corner has probability 0, yet it still can happen. This doesn’t happen in finite sets, because a probability of 0 or 1 implicitly means that you have all your sample points, and for probability 0 means that it’s impossible to do, because you have no counterexamples, and for probability 1 you have a certainty proof, because you have no counterexamples. Mathematical conjectures and proofs usually demand something stronger than that of probability in infinite sets: A property of a set can’t hold at all, and there are no members of a set where that property holds, or a property of a set always holds, and the set of counterexamples is empty.
(Sometimes mathematics conjectures that something must exist, but this doesn’t change the answer to why probability!=proof in math, or why probability!=possibility in infinite sets.)
Unfortunately, infinite sets are the rule, not the exception in mathematics, and this is still true of even uncomputably large sets, like the arithmetical hierarchy of halting oracles. Finite sets are rare in mathematics, especially for proof and conjecture purposes.
The way I’ve heard the two generals problem, the lesson is that it’s unsolvable in theory, but approximately (to an arbitrary degree of approximation) solvable in practice (e.g. via message confirmations, probabilistic reasoning, and optimistic strategies), especially because channel reliability can be improved through protocols (at the cost of latency).
I also think that taking literally a statement like “LessWrong curated posts help to establish common knowledge” is outright wrong; instead, there’s an implied transformation to “LessWrong curated posts help to increase the commonness of knowledge”, where “commonness” is some property of knowledge, which common knowledge has to an infinite degree; p in p-common knowledge is an initially plausible measure of commonness, but I’m not sure we ever get up to infinity.
Intuitively, though, increasing commonness of knowledge is a comparative property: the probability that your interlocutor has read the post is higher if it’s curated than if it’s not, all else being equal, and, crucially, this property of curation is known (with some probability) to your interlocutor.
My response would be that this unfairly (and even absurdly) maligns “theory”! The “theory” here seems like a blatant straw-man, since it pretends probabilistic reasoning and cost-benefit tradeoffs are impossible.
Let me sketch the situation you’re describing as I understand it.
Some people (as I understand it, core LessWrong staff, although I didn’t go find a reference) justify some things in terms of common knowledge.
You either think that they were not intending this literally, or at least, that no one else should take them literally, and instead should understand “common knowledge” to mean something informal (which you yourself admit you’re somewhat unclear on the precise meaning of).
My problem with this is that it creates a missing stair kind of issue. There’s the people “in the know” who understand how to walk carefully on the dark stairway, but there’s also a class of “newcomers” who are liable to fall. (Where “fall” here means, take all the talk of “common knowledge” literally.)
I would think this situation excusable if “common knowledge” were a very useful metaphor with no brief substitute that better conveys what’s really going on; but you yourself suggest “commonness of knowledge” as such a substitute.
Just want to flag the philosopher’s defense here, which is that you can have a finite concept of p-common knowledge, which merely implies any finite level of iteration you need, without actually requiring all of those implications to be taken.
(Granted, I can see having some questions about this.)
Whereas the explicit theory of common knowledge contradicts this comparative idea, and instead asserts that common knowledge is an absolute yes-or-no thing, such that finite-level approximations fall far short of the real thing. This idea is illustrated with the electronic messaging example, which purports to show that any number of levels of finite iteration are as good as no communication at all.
I agree. However, the way I’ve had the two generals problem framed to me, it’s not a solution unless it guarantees successful coordination. Like, if I claim to solve the halting problem because in practice I can tell if a program halts, most of the time at least, I’m misunderstanding the problem statement. I think that conflating “approximately solves the 2GP” with “solves the 2GP” is roughly as malign as my claim that the approximate solution is not the realm of theory.
I think that the statement, taken literally, is false, and egregiously so. I don’t know how the LW staff meant it, but I don’t think they should mean it literally. I think that when encountering a statement that is literally false, one useful mental move is to see if you can salvage it, and that one useful way to do so is to reinterpret an absolute as a gradient (and usually to reduce the technical precision). Now that you have written this post, the commonality of the knowledge that the statement should not be taken literally and formally is increased; whether the LW staff responds by changing the statement they use, or by adding a disclaimer somewhere, or by ignoring all of us and expecting people to figure it out on their own, I did not specify.
Yes, and I think as aspiring rationalists we should try to eventually do better in our communications, so I think that mentions of common knowledge should be one of:
explicitly informal, intended to gesture to some real world phenomenon that has the same flavor
explicitly contrived, like the blue islanders puzzle
explicitly something else, like p-common knowledge; but beware, that’s probably not meant either
I think (I haven’t read the SEP link) that this is correct—in the presence of uncertainty, iteration does not achieve the thing we are referring to precisely as “common knowledge”—but we don’t care, for the reasons mentioned in your post.
I think your post and my reply together actually point to two interesting lines of research:
formalize measures of “commonness” of knowledge and see how they respond to realistic scenarios such as “signal boosting”
see if there is an interesting “approximate common knowledge”, vaguely analogous to The Complexity of Agreement
I think this is very fair (and I will think about editing my post in response).
This is because mathematics plays with evidence very differently than most other fields. If the probability is not equivalent to 1 (I.E it must happen), it doesn’t matter. A good example of this is the Collatz conjecture. It has a stupendous amount of evidence in the finite data points regime, but no mathematician worth their salt would declare Collatz solved, since it needs to have an actual proof. Similarly, P=NP is another problem with vast evidence, but no proof.
And a proof, if correct, is essentially equivalent to an infinite amount of observations, since you usually need to show that it holds up to infinite amounts of examples, so one Adversarial datapoint ruins the effort.
Mathematics is what happens when 0 or 1 are the only sets of probabilities allowed, that is a much stricter standard exists in math.
To nitpick, a mathematician won’t even accept that probability 1 is proof. Lots of probability 1 statements aren’t provable and could even be false from a logical perspective.
For example, a random walk in one dimension crosses any point infinitely many times, with probability one. But it could go in one direction forever.
So, proof could be said to be nontrivially stronger than even an infinite number of observations.
I feel like this is a difference between “almost surely” and “surely”, both of which are typically expressed as “probability 1”, but which are qualitatively different. I’m wondering whether infinitesimals would actually work to represent “almost surely” as 1−ϵ (as suggested in this post).
Also a nitpick and a bit of a tangent, but in some cases, a mathematician will accept any probability > 1 as proof; probabilistic proof is a common tool for non-constructive proof of existence, especially in combinatorics (although the way I’ve seen it, it’s usually more of a counting argument than something that relies essentially on probability).
It’s important to distinguish the probability you’d get from a naive induction argument and a more credible one that takes into account the reference class of similar mathematical statements that hold until large but finite limits but may or may not hold for all naturals.
Arguably even more than the Collatz conjecture, if you believe Scott Aaronson (link).
As I mentioned in my other reply, there are domains of math that do accept probabilities < 1 as proof (but it’s in a very special way). Also, proof usually comes with insight (this is one of the reasons that the proof of the 4 color theorem was controversial), which is almost more valuable than the certainty it brings (and of course as skeptics and bayesians, we must refrain from treating it as completely certain anyway; there could have been a logical error that everyone missed).
I think talking about proofs in terms of probabilities is a category error; logical proof is analogous to computable functions (equivalent if you are a constructivist), while probabilities are about bets.
I agree that logical/mathematical proofs are more analogous to functions than probabilities, but they don’t have to be only computable functions, even if it’s all we will ever access, and that actually matters, especially in non-constructive math and proofs.
I also realized why probability 0 and 1 aren’t enough for proof, or equivalently why Abram Demski’s observation that a proof is stronger than an infinite number of observations is correct.
And it essentially boils down to the fact that in infinite sets, there are certain scenarios where the probability of an outcome is 0, but it’s still possible to get that outcome, or equivalently the probability is 1, but that doesn’t mean that it doesn’t have counterexamples. The best example is throwing a dart at a diagonal corner has probability 0, yet it still can happen. This doesn’t happen in finite sets, because a probability of 0 or 1 implicitly means that you have all your sample points, and for probability 0 means that it’s impossible to do, because you have no counterexamples, and for probability 1 you have a certainty proof, because you have no counterexamples. Mathematical conjectures and proofs usually demand something stronger than that of probability in infinite sets: A property of a set can’t hold at all, and there are no members of a set where that property holds, or a property of a set always holds, and the set of counterexamples is empty.
(Sometimes mathematics conjectures that something must exist, but this doesn’t change the answer to why probability!=proof in math, or why probability!=possibility in infinite sets.)
Unfortunately, infinite sets are the rule, not the exception in mathematics, and this is still true of even uncomputably large sets, like the arithmetical hierarchy of halting oracles. Finite sets are rare in mathematics, especially for proof and conjecture purposes.
Here’s a link on where I got the insight from:
https://en.wikipedia.org/wiki/Almost_surely