[The following is rather long; I’d offer the usual Pascal quotation, but I’m not sure how much shorter it could actually be. I hope it isn’t too tedious to read. It is quite a bit shorter than “Unnatural Categories are Optimized for Deception”.]
I don’t really understand what, in what I wrote, you’re interpreting as condescension, but for what it’s worth none was intended.
No, I don’t think I ever read UCAOFD in any detail. The “did you read …?” seems, on the face of it, to be assuming a principle along the lines of “you should not say that someone is wrong about something unless you have read every word they have written about it”, which is not a principle I am willing to endorse; would you care either to argue for that principle or explain what weaker principle you are implicitly appealing to here?
Anyway, I’ve taken a look at UCAOFD now; yes, in it you say something similar to what I’m saying here, and we are in agreement about many things.
Let me summarize some things I think we agree on:

1. Category boundaries are not entirely arbitrary, in that some choices of boundary are just plain better than others for any reasonable purpose.

2. They are also not entirely forced; certain differences in purposes and priorities can rightly lead to different choices of boundaries.

3. When you use a common term to describe a variety of things, you are implicitly declaring that they resemble each other; how reasonable it is to use that common term for that particular set of things therefore depends on how closely they resemble each other.

4. One way to formalize this is to represent the things by points in some sort of concept-space, and words by regions in that space (or maybe by something fuzzier: e.g., maps from the space to [0,1] saying to what extent a given thing is appropriately described by the word).

5. We can then e.g. try to minimize some combination of distances between things within each region (i.e., try to make the things covered by a given term as like one another as possible), or pick a point for each word and try to minimize some combination of distances from things in the region to that point (i.e., try to make the things covered by a given term as like a particular Representative Thing as possible), or contemplate some pattern of communication about these things and try to minimize some combination of message length and interpretation errors committed by the receiver.

6. There is something distinctively honest about category boundaries chosen to maximize this sort of figure of merit.
(To the best of my knowledge, these are all things we agree on, and they summarize a substantial fraction of what you are saying in pieces like UCAOFD, though of course not necessarily all of it. If I am wrong about either of those things, then 1. please let me know and 2. some of what follows may be less useful than I hope for it to be.)
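(A toy sketch may make the two distance-based figures of merit in point 5 concrete. Everything below is invented for illustration—the points, the “tree”/“shrub” categories, and the function names—and it is only a sketch of the idea, not anyone’s actual proposal. It just shows that, for the same set of things, one choice of boundary can score strictly better than another:

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance between two points (Python 3.8+)

def within_cost(clusters):
    """Sum of pairwise distances inside each category: how alike are the
    things covered by each term?"""
    total = 0.0
    for points in clusters.values():
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                total += dist(points[i], points[j])
    return total

def representative_cost(clusters):
    """Sum of distances from each thing to its category's centroid: how close
    is each thing to a single Representative Thing for its term?"""
    total = 0.0
    for points in clusters.values():
        centroid = tuple(sum(coord) / len(points) for coord in zip(*points))
        total += sum(dist(p, centroid) for p in points)
    return total

# Two candidate boundary choices over the same four (made-up) things:
good = {"tree": [(0, 0), (1, 0)], "shrub": [(10, 0), (11, 0)]}
bad = {"tree": [(0, 0), (10, 0)], "shrub": [(1, 0), (11, 0)]}

print(within_cost(good), within_cost(bad))  # the "good" partition scores lower
```

Under either objective the `good` partition beats the `bad` one—which is point 1—while the fact that these are two different objectives, which need not in general favour the same partition, is point 2.)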
So far, I hope, so good. Now for some probably more contentious bits. It seems to me that when anyone on LW says anything at all like “category boundaries are a bit arbitrary”, you are liable to pounce on them and protest: “no no no, category boundaries are chosen to optimize prediction and/or communication, and you shouldn’t call them arbitrary; can’t you see that there are definite mathematical laws here?” I think this is frequently unfair, for reasons I’ll elaborate on in a moment. (My protests about choice of metric etc. were, I now agree, probably misdirected; I think there is something motte-and-bailey-ish going on, but I think I misidentified the bailey. I’ll say a few words about my error later.)
So, I think the pouncing is frequently unfair because (1) someone saying that category boundaries are a bit arbitrary is not necessarily (or even, I think, usually) meaning anything incompatible with choosing category boundaries to optimize prediction and/or communication, and (2) optimizing prediction and/or communication is not the only goal one can (honestly) have when choosing category boundaries. And (3) if you are going to claim that someone’s use of language fails to respect underlying mathematical laws, I think you owe them some sort of argument that some other reasonable alternative respects those laws better, and it seems to me that generally this argument is lacking: either it is not there at all, or it does not actually make the case well.
I’ll argue briefly for those claims in order 2, 1, 3. (Apologies if the order discrepancy makes this harder to follow than it should be.)
2. [Optimizing for prediction/communication is not the only honest goal.] In UCAOFD you concede that there may be goals other than optimizing prediction and/or communication but the only such goals you consider are ones you categorize as “deception”: picking boundaries that lead to suboptimal predictions in the hope that others will predict suboptimally in ways that benefit you, or (self-deceivingly) picking boundaries that make predictions you know deep down are wrong, but that make you feel happier. It seems to me you’re assuming here that this sort of prediction is the only function of language, and it just isn’t. Suppose someone’s legal name, given by their parents, is George, but they hate the way that sounds. “Call me Fred”, they say. Let’s say they’re in the UK where there is an official legal procedure for doing this, and that they haven’t done it yet. Then there is a sense in which the name Fred is “wrong”; if you call them Fred then other people will draw wrong inferences if for some reason they e.g. have to guess what is on their birth certificate, or what names their parents liked. You may none the less choose to call them Fred because they enjoy being called Fred more than they enjoy being called George. If they ask everyone else to do likewise, after a while it will stop being an “error”, but that’s not really the point; you aren’t calling them Fred to minimize any sort of prediction errors, because prediction isn’t what calling someone (in conversation with them) by a given name is for.
You could, I guess, argue that what you’re doing is helping them deceive themselves. (They would prefer a world where their name is Fred, so they pretend it is to optimize their internal utility-estimator rather than doing the more difficult thing of changing the world so that their name really is Fred. This is the “wireheading/self-deception” phenomenon you mention in UCAOFD.) But I don’t see much merit to this analysis of the situation. They’re presumably well aware that their legal name is still George; when you call them Fred they aren’t pretending otherwise; they just enjoy being addressed as Fred, and prediction is a small enough element of what you’re doing when you call them by name that any loss in prediction-accuracy when you do it is outweighed by their utility gain from being called something they think sounds nicer.
Of course this is a case where prediction is especially unimportant and their (un)enjoyment of being addressed a particular way is especially important. Other kinda-parallel cases will readily occur to the reader and I do not mean to claim that what I’ve said above obliges anyone to use the same policy in those cases. The only point is that prediction is not everything even when you exclude “deception”, unless you define deception so broadly as to include literally everything that is not prediction, which I think would itself be highly misleading.
Here’s another related situation, which might also be applicable to some of those kinda-parallel cases. Suppose our interlocutor actually thinks his name is Fred even though it’s really George. (Perhaps he intended to file the relevant legal documents but forgot to do so and then forgot having forgotten. Perhaps he’s suffering from some sort of delusion. Etc.) Then if you are addressing him by name and you want him actually to understand you, you’d better call him Fred.
Here’s another, further toward “deception” territory but, I claim, not in it. Suppose you’re talking to someone about the big rock in Australia named Uluru, but your interlocutor is an Englishman stuck in the past who insists that it is, and must always be, called Ayers Rock. And suppose you agree with the position I’ve taken in the previous sentence that Uluru is in some sense the right name for that rock. Unfortunately the Englishman has a short temper and a gun. You may choose to use the term “Ayers Rock” when talking to him, not because you want to deceive him into thinking that that’s the right name or that Australia is still an English colony or that the aboriginal Australians weren’t there first or anything like that (indeed, he already believes those things); not because you want to help him with his wireheading (he will carry on believing those things whatever term you use); but because you are concerned that if you insist on calling it Uluru he will shoot you. This isn’t what you’d call a respectable use of language, exactly, but it’s in no way deceptive and it is driven by something other than optimizing prediction or accurate/efficient communication.
And then there are deliberately non-literal uses of language—poetry, metaphor, etc. Around here we’re mostly not much concerned with these (or, at least, not in the discussions that are relevant right now), but it’s worth being aware that much actual everyday language use has elements of these—we choose our words for euphony, memorability, etc., as well as for accuracy. “I did say as a philosopher of science, after all!” Yes, but we both know that some of the applications of these principles that you’re concerned with, such as the use of pronouns by philosophers in casual conversation, are not in technical discussions where accuracy is the only goal. (In that particular case there probably isn’t much poetry or metaphor going on either; but the point is that we aren’t talking only about highly-literal technical discourse.)
So far I’ve argued that prediction and communication aren’t the only goals it’s reasonable (and honest) to optimize for. I should also say that in practice we are seldom optimizing-machines, nor should we be, because optimization is expensive. So category boundaries are commonly drawn according to (something like) where everyone else seems to be drawing them, or where our brains just happen to draw them according to whatever mostly opaque algorithms are going on inside. Both of these can be approximated as optimizations of a sort (using words the same way as everyone else is kinda-sorta a communication optimization; those mostly opaque algorithms inside our brain are probably doing something a bit like optimizing something), but I don’t know of any reason to think that they amount to the sort of optimization you’re calling for. And, I claim, this is perfectly fine: nothing requires us to optimize all our language, and we don’t optimize all our language, so if someone proposes a particular bit of language use then saying “but that isn’t optimizing for anything!” is something like an Isolated Demand For Rigour.
1. [“Boundaries are a bit arbitrary” needn’t be a repudiation of optimizing for prediction/communication.] Let’s suppose that either in general or in a particular case we agree that the only honest thing to aim for is optimal prediction or communication. Then, as I believe we are agreed, there is still quite a lot of possible variation in where those boundaries get drawn; we may have different purposes, different loss functions, etc. (We may be talking about botany or about jam.) Someone who describes this situation by saying “boundaries are a bit arbitrary” is not saying anything false, even if they haven’t said the most precise thing they could have said, and I think it is generally unhelpful to jump on them.
3. [Underlying mathematical laws.] Let’s take the discussion a couple of comments upthread as an example. Your criticism of eukaryote’s use of the word “arbitrary” appealed to the fact that there are “definite mathematical laws”, and your concrete example of a definite mathematical law was the fact that one can draw boundaries by picking lots of examples and using soft k-means, in which case “there would be nothing arbitrary about those numbers as the definite, precise result of what happens when you run this particular clustering algorithm against that particular data”. But the choice of which particular data is somewhat arbitrary; the choice of how to embed your various plants into Euclidean space in order to run the soft k-means algorithm is somewhat arbitrary; the choice of soft k-means rather than some other clustering algorithm is somewhat arbitrary. (It kinda corresponds to trying to minimize distances from the example data to particular representative points for “tree” and “shrub”; but I see no reason at all to think that we should want notions of “tree” and “shrub” defined by single representative points, especially given what eukaryote has shown us about the set of things that are called “trees”.) There’s tons of arbitrariness here, and it seems to me that mathematics here is functioning more to intimidate than to enlighten.
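(To make the arbitrariness concrete, here is a minimal soft k-means sketch. The implementation, the four “plants”, and the (height, woodiness) features are all invented for illustration, and this is my toy version of the algorithm, not a reconstruction of anyone’s actual procedure. Each run is indeed a “definite, precise result”; but rescaling one feature—an arbitrary choice about how to embed the plants in Euclidean space—changes which grouping comes out:

```python
import numpy as np

def soft_kmeans(X, k=2, beta=1.0, iters=50, seed=0):
    """Toy soft k-means: responsibilities proportional to exp(-beta * squared distance)."""
    rng = np.random.default_rng(seed)
    # initialize centroids at k randomly chosen data points
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # squared distance from each point to each centroid
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        # subtract each row's minimum before exponentiating, for numerical stability
        r = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
        r /= r.sum(axis=1, keepdims=True)
        # move each centroid to the responsibility-weighted mean of the data
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return r.argmax(axis=1)  # hard labels from the soft responsibilities

# Four made-up plants described by (height, woodiness) features:
plants = np.array([[0.5, 1.0], [0.6, 9.0], [9.0, 1.2], [9.5, 9.3]])

# Two different, equally defensible embeddings of the same plants:
labels_height = soft_kmeans(plants * np.array([10.0, 1.0]))  # height emphasized
labels_wood = soft_kmeans(plants * np.array([1.0, 10.0]))    # woodiness emphasized
# The first embedding groups the plants by height, the second by woodiness:
# same data, same algorithm, different "tree"/"shrub" boundaries.
```

Neither embedding is mathematically wrong; the “definite mathematical law” only takes over after the arbitrary choices of data, embedding, and algorithm have already been made.)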
To whatever extent I’ve argued successfully for claims 1, 2, and 3, I think I’ve justified my claim that your pouncing on people who talk about category boundaries as slightly arbitrary is unfair. But above I made a narrower accusation: that there’s something motte-and-bailey-ish about what you’re doing. I’m not sure it’s exactly motte-and-bailey, but the idea is that the motte is something like “ideally, category boundaries would be drawn so as to optimize some measure of accurate prediction and/or efficient communication” (arguably true for some definition of “ideally”) and the bailey is something like “anyone who talks about flexibility in drawing category boundaries, but doesn’t specifically insist that they be drawn so as to optimize such a measure, is in a state of sin”.
Upthread I implicitly accused you of thinking that there’s only ever a single optimal place for a given boundary. I no longer think you think that (even in the sense of having it as a bailey). It may already be obvious what (I now think) was the source of my error: it seemed like you were objecting any time anyone endorsed the principle that category boundaries can vary. I no longer think that’s quite what’s going on, but I do think you’re objecting to more than your more nuanced analyses of category boundaries (e.g., in UCAOFD) justify, even if what you say therein is 100% correct.
But (1) identifying baileys is an inexact art; perhaps I’m still some way off optimal, and (2) perhaps I’m entirely wrong in thinking that you’re motte-and-baileying; perhaps a deep enough understanding of what I’ve claimed to be the motte actually justifies your criticisms of e.g. eukaryote’s use of the phrase “a little arbitrary”. If you reckon so, I’d love to understand how.