I would like to defend fuzzy logic at greater length, but I might not find the time. So, here is my sketch.
Like Richard, I am not defending fuzzy logic as exactly correct, but I am defending it as a step in the right direction.
The Need for Truth
As Richard noted, meaning is context-dependent. When I say “is there water in the fridge?” I am not merely referring to H2O; I am referring to something like a container of relatively pure water in easily drinkable form.
However, I claim: if we think of statements as being meaningful, we think these context-dependent meanings can in principle be rewritten into a language which lacks the context-dependence.
In the language of information theory, the context-dependent language is what we send across the communication channel. The context-independent language is the internal sigma-algebra used by the agents attempting to communicate.
You seem to have a similar picture:
It is totally allowed for semantics of a proposition to be very dependent on context within that model—more precisely, there would be a context-free interpretation of the proposition in terms of latent variables, but the way those latents relate to the world would involve a lot of context (including things like “what the speaker intended”, which is itself latent).
I am not sure if Richard would agree with this in principle (EG he might think that even the internal language of agents needs to be highly context-independent, unlike sigma-algebras).
But in any case, if we take this assumption and run with it, it seems like we need a notion of accuracy for these context-independent beliefs. This is typical map-territory thinking; the propositions themselves are thought of as having a truth value, and the probabilities assigned to propositions are judged by some proper scoring rule.
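To make the scoring-rule part concrete, here is a minimal Python sketch of the log score (a strictly proper scoring rule); the beliefs and numbers are invented purely for illustration:

```python
import math

def log_score(reported_prob: float, truth_value: int) -> float:
    """Log scoring rule: the score is the log of the probability
    assigned to what actually happened. It is strictly proper:
    expected score is uniquely maximized by reporting one's true belief."""
    p = reported_prob if truth_value == 1 else 1.0 - reported_prob
    return math.log(p)

# An agent who believes P(proposition) = 0.8 does best, in expectation,
# by reporting exactly 0.8:
belief = 0.8
for report in (0.5, 0.8, 0.99):
    expected = belief * log_score(report, 1) + (1 - belief) * log_score(report, 0)
    print(f"report {report}: expected score {expected:.3f}")
```

Honest reporting (0.8) gets the highest expected score; this is the sense in which probabilities assigned to propositions are judged against their truth values.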
The Problem with Truth
This works fine so long as we talk about truth in a different language (as Tarski pointed out with his undefinability theorem and the Tarski hierarchy). However, if we believe that an agent can think in one unified language (modeled by the sigma-algebra in standard information theory / Bayesian theory) and at the same time think of its beliefs in map-territory terms (IE think of its own propositions as having truth-values), we run into a problem—namely, Tarski’s aforementioned undefinability theorem, as exemplified by the Liar Paradox.
The Liar Paradox constructs a self-referential sentence, “This sentence is false”, which cannot consistently be assigned either “true” or “false” as an evaluation. Allowing self-referential sentences may seem strange, but it is inevitable in the same way that Gödel’s results are—sufficiently strong languages are going to contain self-referential capabilities whether we like it or not.
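Just to make the inconsistency concrete, a brute-force check of both classical assignments (writing 0 for false and 1 for true):

```python
# "This sentence is false": if L is the liar sentence, consistency
# requires value(L) == value(not L) == 1 - value(L).
for v in (0, 1):
    print(f"value(L) = {v}: needs 1 - {v} = {1 - v}, consistent? {v == 1 - v}")
# Neither assignment works: 0 != 1 and 1 != 0.
```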
Łukasiewicz came up with one possible solution, now called Łukasiewicz logic. First, we introduce a third truth value for paradoxical sentences which would otherwise be problematic. Foreshadowing the conclusion, we can call this new value 1/2. The Liar Paradox sentence can then be evaluated as 1/2.
Unfortunately, although the new 1/2 truth value can resolve some paradoxes, it introduces new paradoxes. “This sentence is either false or 1/2” cannot be consistently assigned any of the three truth values.
Under some plausible assumptions, Łukasiewicz shows that we can resolve all such paradoxes by taking our truth values from the interval [0,1]. We have a whole spectrum of truth values between true and false. This is essentially fuzzy logic. It is also a model of linear logic.
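A small sketch of both steps, using the usual Łukasiewicz-style connectives (negation as 1 − x, disjunction as max). Note that the discontinuous “is exactly 1/2” predicate is what causes the trouble in the three-valued setting, which foreshadows the continuity discussion below:

```python
# In [0,1], the liar "this sentence is false" needs a fixed point of
# f(v) = 1 - v, and v = 1/2 works:
liar_value = 0.5
assert 1 - liar_value == liar_value

# The strengthened liar "this sentence is false or 1/2", read in the
# *three*-valued logic {0, 1/2, 1}: its required value is
# g(v) = max(1 - v, [v == 1/2]).
def g(v):
    is_half = 1.0 if v == 0.5 else 0.0
    return max(1 - v, is_half)

for v in (0.0, 0.5, 1.0):
    print(f"value = {v}: needs {g(v)}, consistent? {g(v) == v}")
# No assignment is consistent -- and the discontinuous is_half predicate
# is exactly the kind of thing the continuity requirement (below) rules out.
```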
So, Łukasiewicz logic (and hence a version of fuzzy logic and linear logic) is a particularly plausible solution to the problem of assigning truth-values to a language which can talk about the map-territory relation of its own sentences.
Relative Truth
One way to think about this is that fuzzy logic allows for a very limited form of context-dependent truth. The fuzzy truth values themselves are context-independent. However, in a given context where we are going to simplify such values to a binary, we can do so with a threshold.
A classic example is baldness. It isn’t clear exactly how much hair needs to be on someone’s head for them to be bald. However, I can make relative statements like “well if you think Jeff is bald, then you definitely have to call Sid bald.”
Fuzzy logic just supposes that all truth-evaluations have to fall on a spectrum like this (even if we don’t know exactly how). This models a very limited form of context-dependent truth, where different contexts put higher or lower demands on truth, but these demands can be modeled by a single threshold parameter, which monotonically admits less as true when we shift it up and more when we shift it down.
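Here is a tiny sketch of that one-parameter picture; the fuzzy values for Jeff and Sid are made up for illustration:

```python
# A context is modeled by a single threshold t; "bald(x)" is asserted
# in that context iff fuzzy_bald(x) >= t.
fuzzy_bald = {"Jeff": 0.6, "Sid": 0.9}  # illustrative values

def asserted_bald(person: str, threshold: float) -> bool:
    return fuzzy_bald[person] >= threshold

# Monotonicity: any context lenient enough to call Jeff bald must also
# call Sid bald, since 0.9 >= 0.6.
for t in (0.5, 0.7, 0.95):
    print(t, {p: asserted_bald(p, t) for p in fuzzy_bald})
```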
I’m not denying the existence of other forms of context-dependence, of course. The point is that it seems plausible that we can put up with just this one form of context-dependence in our “basic picture” and allow all other forms to be modeled more indirectly.
Vagueness
My view is close to the one in Saving Truth from Paradox by Hartry Field. Field proposes that truth is vague (so that the baldness example and the Liar Paradox are closely linked), and on this basis he defends a logic which is based on fuzzy logic, but not quite the same. His book does (imho) a good job of defending assumptions similar to those Łukasiewicz makes, so that something similar to fuzzy logic starts to look inevitable.
I generally agree that self-reference issues require “fuzzy truth values” in some sense, but for Richard’s purposes I expect that sort of thing to end up looking basically Bayesian (much like he lists logical induction as essentially Bayesian).
Yeah, I agree with that.

Unfortunately, although the new 1/2 truth value can resolve some paradoxes, it introduces new paradoxes. “This sentence is either false or 1/2” cannot be consistently assigned any of the three truth values.

Under some plausible assumptions, Łukasiewicz shows that we can resolve all such paradoxes by taking our truth values from the interval [0,1]...
Well, a straightforward continuation of the paradox would be “This sentence has truth value in [0,1)”; is it excluded by the “plausible assumptions”, or overlooked?
Excluded. Truth-functions are required to be continuous, so a predicate that’s true of things in the interval [0,1) must also be true at 1. (Łukasiewicz does not assume continuity, but rather proves it from other assumptions. In fact, Łukasiewicz is much more restrictive; however, we can safely add any continuous functions we like.)
One justification of this is that it’s simply the price you have to pay for consistency; you (provably) can’t have all the nice properties you might expect. Requiring continuity allows consistent fixed-points to exist.
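The reason continuity buys consistency is the intermediate value theorem: for continuous f : [0,1] → [0,1], the function f(x) − x is ≥ 0 at x = 0 and ≤ 0 at x = 1, so it crosses zero somewhere, and that crossing is a consistent truth value. A minimal sketch that finds such a fixed point by bisection (the example truth-functions are invented):

```python
def fixed_point(f, lo=0.0, hi=1.0, tol=1e-9):
    """Find v with f(v) == v for continuous f: [0,1] -> [0,1].
    h(x) = f(x) - x is >= 0 at 0 and <= 0 at 1, so by the
    intermediate value theorem it crosses zero; bisect on its sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(fixed_point(lambda v: 1 - v))                 # plain liar: 0.5
print(fixed_point(lambda v: min(1, 20 * (1 - v))))  # a "sharper" liar: ~0.952
```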
Of course, this might not be very satisfying, particularly as an argument in favor of Łukasiewicz over other alternatives. How can we justify the exclusion of [0,1) when we seem to be able to refer to it?
As I mentioned earlier, we can think of truth as a vague term, with the fuzzy values representing an ordering of truthiness. Therefore, there should be no way to refer to “absolute truth”.
We have to think of assigning precise numbers to the vague values as merely a way to model this phenomenon. (It’s up to you to decide whether this is just a bit of linguistic sleight-of-hand or whether it constitutes a viable position...)
When we try to refer to “absolute truth”, we can create a function which outputs 1 on input 1, but which declines sharply as we move away from 1.[1] This is how the model reflects the fact that we can’t refer to absolute truth. We can map 1 to 1 (make a truth-function which is absolutely true only of absolute truth); however, such a function must also be almost-absolutely-true in some small neighborhood around 1. This reflects the idea that we can’t completely distinguish absolute truth from its close neighborhood.
Similarly, when we negate this function, it “represents” [0,1) in the sense that it is only 0 (only ‘absolutely false’) for the value 1, and maps [0,1) to positive truth-values which can be mostly 1, but which must decline in the neighborhood of 1.
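To illustrate with one hypothetical family of continuous stand-ins for “[0,1)” (the formula is my invention, not Łukasiewicz’s or Field’s): each member gets a consistent, nearly-but-not-quite-1 truth value when made self-referential:

```python
# g_k(x) = min(1, k * (1 - x)) is 1 on [0, 1 - 1/k] and falls to 0 at 1,
# so it approximates "x is in [0,1)" ever more sharply as k grows.
# The near-strengthened-liar "this sentence is g_k-true" needs v = g_k(v);
# solving gives v = k / (k + 1), which approaches 1 but never reaches it.
for k in (1, 10, 100, 1000):
    v = k / (k + 1)
    assert abs(min(1, k * (1 - v)) - v) < 1e-9  # consistent fixed point
    print(f"k = {k:4d}: truth value {v:.4f}")
```

So the continuous stand-in for [0,1) can be made as sharp as we like, and the corresponding self-referential sentence just takes truth values closer and closer to (but below) 1, rather than generating a paradox.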
And yes, this setup can get us into some trouble when we try to use quantifiers. If “forall” is understood as taking the minimum (infimum), we can construct discontinuous functions as limits of continuous functions. Hartry Field proposes a fix, but it is rather complex.
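A standard example of the problem (my illustration, not Field’s): each xⁿ is continuous, but the infimum over all n is 0 on [0,1) and jumps to 1 at x = 1:

```python
# "forall n: phi_n" as an infimum can break continuity even though
# every phi_n is continuous.
def forall_value(x, N=10_000):
    return min(x**n for n in range(1, N + 1))  # numeric stand-in for inf

for x in (0.9, 0.99, 0.999, 1.0):
    print(x, forall_value(x))  # tiny, tiny, tiny, then suddenly 1.0
```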
[1] Note that some relevant authors in the literature use 0 for true and 1 for false, but I am using 1 for true and 0 for false, as this seems vastly more intuitive.
I’m confused about how continuity poses a problem for “This sentence has truth value in [0,1)” without also posing an equal problem for “this sentence is false”, which was used as the original motivating example.
I’d intuitively expect “this sentence is false” == “this sentence has truth value 0” == “this sentence does not have a truth value in (0,1]”.
“X is false” has to be modeled as something that has value 1 if and only if X has value 0, but which continuously decreases in value as the value of X increases. The simplest formula is value(X is false) = 1 − value(X). However, we can make “sharper” formulas which diminish in value more rapidly as the value of X increases. Hartry Field constructs a hierarchy of such predicates, which he calls “definitely false”, “definitely definitely false”, etc.
Proof systems for the logic should have the property that sentences are derivable only when they have value 1; so “X is false”, “X is definitely false”, etc. all share the property that they’re only derivable when X has value zero.
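Here is a sketch with one simple choice of sharpening operator, D(x) = max(0, 2x − 1); I should stress this is a stand-in for illustration, not Field’s actual construction:

```python
# D is continuous, fixes 0 and 1, and pushes middling values toward 0,
# so each application makes "false" a sharper predicate.
def D(x):
    return max(0.0, 2 * x - 1)

def iterate(f, x, n):
    for _ in range(n):
        x = f(x)
    return x

v = 0.9  # value of X
for n in range(4):
    print(f"{'definitely ' * n}false(X) = {iterate(D, 1 - v, n):.3f}")
# Each predicate in the hierarchy diminishes faster as value(X) rises,
# yet every one of them has value 1 exactly when value(X) is 0.
```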
Understood. Does that formulation include most useful sentences?
For instance, “there exists a sentence which is more true than this one” must be excluded as equivalent to “this statement’s truth value is strictly less than 1”, but the extent of such exclusion is not clear to me at first skim.
As Richard noted, meaning is context-dependent. When I say “is there water in the fridge?” I am not merely referring to H2O; I am referring to something like a container of relatively pure water in easily drinkable form.
Then why not consider a structure like the following? (A minimal code sketch follows the list.)
you are searching for “something like a container of relatively pure water in easily drinkable form”—or, rather, “[your subconscious-native code] for a water-like thing + for drinking”,
you emit a sequence of tokens (sounds/characters), “is there water in the fridge?”, approximating the previous idea (discarding your intent to drink it, since that can be inferred from context, and omitting that you could drink something close to water),
your conversation partner hears “is there water in the fridge?”, which is converted into the thought “you asked ‘is there water in the fridge?’”,
and interprets the words as “you need something like a container of relatively pure water in easily drinkable form”—or, rather, “[their subconscious-native code] for: another person, a water-like thing + for drinking”.
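Here is a minimal code sketch of that four-step structure; every name and type below is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    referent: str  # agent-internal concept, never transmitted directly
    purpose: str

def encode(intent: Intent) -> str:
    # Lossy compression into tokens: the purpose is dropped entirely,
    # to be recovered by the listener from context.
    return "is there water in the fridge?"

def interpret(utterance: str, context: dict) -> Intent:
    # The listener re-expands the tokens using their own internal model.
    return Intent(referent="container of drinkable water",
                  purpose=context.get("inferred_purpose", "unknown"))

speaker_intent = Intent(referent="water-like thing", purpose="drinking")
utterance = encode(speaker_intent)
reconstruction = interpret(utterance, {"inferred_purpose": "drinking"})
print(utterance, "->", reconstruction)
```

The point of the sketch: the two internal representations never travel over the channel; only the tokens do, and each side’s mapping between tokens and representations depends on context.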
That messes with “meanings of sentences”, but it is necessary in order to rationally process filtered evidence.
Each statement that the clever arguer makes is valid evidence—how could you not update your probabilities? Has it ceased to be true that, in such-and-such a proportion of Everett branches or Tegmark duplicates in which box B has a blue stamp, box B contains a diamond? According to Jaynes, a Bayesian must always condition on all known evidence, on pain of paradox. But then the clever arguer can make you believe anything they choose, if there is a sufficient variety of signs to selectively report.
It seems to me that there is a really interesting interplay of different forces here, which we don’t yet know how to model well.
Even if Alice tries meticulously to only say literally true things, and be precise about her meanings, Bob can and should infer more than what Alice has literally said, by working backwards to infer why she has said it rather than something else.
So, pragmatics is inevitable, and we’d be fools not to take advantage of it.
However, we also really like transparent contexts—that is, we like to be able to substitute phrases for equivalent phrases (equational reasoning, like algebra), and make inferences based on substitution-based reasoning (if all bachelors are single, and Jerry is a bachelor, then Jerry is single).
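A toy illustration of the contrast (mine, not from the discussion above): a pure function is a transparent context, while an indexical like “me” is an opaque one:

```python
# Transparent: substituting equals for equals is always safe in a pure
# function of its arguments.
bachelor = {"Jerry": True}

def single(name: str) -> bool:
    # "All bachelors are single", derived purely from the argument.
    return bachelor.get(name, False)

assert bachelor["Jerry"] and single("Jerry")  # Jerry is a bachelor => single

# Opaque: the same word denotes different things in different contexts,
# so substitution across contexts is unsound.
def denotation(phrase: str, context: dict) -> str:
    return context["speaker"] if phrase == "me" else phrase

print(denotation("me", {"speaker": "Alice"}))  # Alice
print(denotation("me", {"speaker": "Bob"}))    # Bob
```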
To put it simply, things are easier when words have context-independent meanings (or more realistically, meanings which are valid across a wide array of contexts, although nothing will be totally context-independent).
This puts contradictory pressure on language. Pragmatics puts pressure towards highly context-dependent meaning; reasoning puts pressure towards highly context-independent meaning.
If someone argues a point by conflation (uses a word in two different senses, but makes an inference as if the word had one sense) then we tend to fault using the same word in two different senses, rather than fault basic reasoning patterns like transitivity of implication (A implies B, and B implies C, so A implies C). Why is that? Is that the correct choice? If meanings are inevitably context-dependent anyway, why not give up on reasoning? ;p