REALITY: 80% of self-identified bisexuals are only interested in one gender.
OkCupid is a gay- and bi-friendly place and it’s not our intention here to call into question anyone’s sexual identity. But when we looked into messaging trends by sexuality, we were very surprised at what we found. People who describe themselves as bisexual overwhelmingly message either one sex or the other, not both as you might expect. Site-wide, here’s how it breaks out:
This suggests that bisexuality is often either a hedge for gay people or a label adopted by straights to appear more sexually adventurous to their (straight) matches.
“Odds can be expressed as a ratio of two numbers [or] as a number, by dividing the terms in the ratio [...] Odds range from 0 to infinity, while probabilities range from 0 to 1 [...]”
I did the survey. (Comments on specific aspects appear as replies.)
It’s time to decouple sexual orientation from gender identity! If my gender is neither male nor female, but I’m primarily attracted to one of those, then I’m neither homosexual nor heterosexual (nor bisexual nor asexual). But neither am I some nebulous other; if only I had a binary gender identity, then suddenly I would have a binary sexual orientation too! Of course, some people identify specifically as homosexual or heterosexual (and some people even have prima-facie contradictory identifications such as both male and lesbian), and you could ask about that if you like, but you should also ask the more fundamental question of which genders one is attracted to.
… and that doesn’t even get into the sexual-vs.-romantic issue. My girlfriend is cis and bisexual, but only andro-romantic (hetero). She identifies as bi, for purposes of broad categorization such as surveys like this, but has no interest in dating other women even though she is sexually attracted to them.
In other words, yes, the better way to ask such a question would be something along the lines of “which gender(s) are you romantically attracted to?” and “which gender(s) are you sexually attracted to?” as different questions.
This strikes me as suspiciously like “she’s straight but identifies as bi because it’s fashionable”.
Out of curiosity, if I’d avoided mentioning how she self-identifies and had instead told you that “she has had sex with other women before and has asked me if it’s OK if she sleeps with other women while we’re dating (or brings them home for a threesome)… but has never shown or claimed any interest in actually dating another woman” (all of which is, incidentally, true), what would your response have been? Framed that way, one could assume that she’s actually bi or even lesbian and that the only reason she’s dating me instead of one of those girls is that she wants to avoid the social or family stigma of homosexuality.
Or you could take me at my word. It’s not like you’re in any position to verify one way or the other, where she in particular is concerned, unless you’re one of the handful of people who actually know who I am speaking of and know her preferences at least as well as I do.
It also doesn’t matter for the point I raised (about how some people have different targets for sexual and romantic attraction) unless you intended to imply that not only is she personally actually neatly classifiable under the existing system but so is everybody else who would claim otherwise. That is a theory which only takes one counterexample to disprove (as I provided, although one could then debate the necessity of writing the survey to accommodate however many people have this “non-standard” categorization).
Do you have an actual response to my claim that the survey should account for the possibility that people may be romantically and sexually attracted to different genders?
What would you expect it to look like if in fact she found both men and women sexually attractive but only men romantically attractive, as she claims?
See also the OKCupid Trends post about The Big Lies People Tell In Online Dating.
That’s a valid point. On the other hand, as a dating site, OKC messaging is probably going to be skewed towards the gender that one is interested in pursuing a relationship with (though maybe that’s just the way I use it; as soon as I typed it I felt sure there were plenty of people just looking to hook up). When the topic is sexual orientation vs. romantic orientation, I’m not sure that OKC is the best source of data. I can’t deny the specific claim that a large proportion of ostensibly bi people appear to not be both bisexual and biromantic.
The questions Family Religion and Religious Background seem to parallel the questions Religious Views and Religious Denomination, but they are phrased differently. The first is my family when I was growing up, while the second is simply my family. So as it happened, I was not thinking of the same families when answering them! Perhaps I should have paid more attention to the name of the question Religious Background, which I really only noticed just now when I wanted to identify it for this comment. You did not in fact get information about my religious background in my answer to that question; you got information about the religious background of my spouse of less than 2 years (and my stepchild).
So I filled out the whole survey, and then I got to the part about the digit ratio, and I thought, OK, I’ll do this! But I can’t do it now (no photocopier at home, can’t trust a measurement to 3 digits if I’m not doing it the same way as others). And I can’t keep my answers up until I can do it (no battery in computer, must be turned off to transport, Lazarus plug-in has been problematic). So I put in a public and private key but no data. I will gladly supply the data to you tomorrow, using those keys to identify my survey.
Some countries hold elections but not major national ones; and sometimes a country has elections, but most people in them still can’t vote. (Examples are Saudi Arabia and Kuwait, respectively.)
I’d be much more comfortable answering the probability sections if I knew what epsilon is. I usually say 0% when the value is less than 0.5% and 100% when the value is greater than 99.5%, rounding to the nearest whole percentage, on the grounds that the whole point of using percentages is to avoid explicit fractions (common or decimal). But then you ruin this by explicitly mentioning 0.5% and 99.99% as possible answers. If you had put a hard limit on the number of digits allowed, then I could have used that. In the end, since I saw no consistent guidance, I fell back on my usual practice. The result is that I had a lot of 0s and 100s; hopefully that won’t mess up your algorithms.
ETA: It is probably relevant here that I am a naturally lazy person.
I think it might have been better to ask people to estimate the odds that a given statement is true. If the probability of a statement is close to zero or close to one, asking for odds gives better precision without having to worry about digits after the decimal point (however, if a probability is close to one half, it is probably better to ask for a probability). Although it is easy to convert odds to probabilities, how many people in this survey actually made the mental effort to calculate the odds first and only then express them as probabilities? I might be wrong, but I would guess only a minority. An idea for next year’s survey: it might be interesting to compare the answers of two groups, one asked to estimate probabilities and the other asked to estimate odds.
Are you using “odds” to refer to percentages and “probabilities” to refer to fractions? I don’t think there is actually any difference in meaning between the two terms.
Colloquial language doesn’t make this distinction, but by technical convention, they are different.
Specifically, ‘odds’ refers to expressions like ‘5 to 3 against’; numerically, that’s the fraction 5⁄3, or rather (because of the ‘against’) its reciprocal, 3⁄5. Thus odds run from 0 (impossible) to infinity (certain), with odds of 1 being perfectly balanced between Yes and No. In contrast, probabilities run only from 0 to 1. An event with odds of 5 to 3 against, or equivalently odds of 3⁄5, has a probability of 3/(3+5) = 3⁄8. So the numbers are different. The conversion formulas are O = P/(1 − P) and P = O/(1 + O).
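To make the conversion concrete, here is a minimal sketch in Python using the standard `fractions` module so the arithmetic stays exact. The function names are my own (hypothetical), and it assumes the convention above that ‘n to m against’ means odds of m⁄n in favour.

```python
from fractions import Fraction


def odds_from_against(n: int, m: int) -> Fraction:
    """'n to m against': n chances against for every m in favour,
    so the odds in favour are m/n."""
    return Fraction(m, n)


def prob_from_odds(o: Fraction) -> Fraction:
    """P = O / (1 + O)."""
    return o / (1 + o)


def odds_from_prob(p: Fraction) -> Fraction:
    """O = P / (1 - P); raises ZeroDivisionError at P = 1 (infinite odds)."""
    return p / (1 - p)


o = odds_from_against(5, 3)
p = prob_from_odds(o)
print(o, p)               # 3/5 3/8
print(odds_from_prob(p))  # 3/5
```

Certainty (P = 1) has no finite odds, so `odds_from_prob` fails there with a division by zero, which is just the “odds run up to infinity” point in code form.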
Then there are log-odds; this is log₂ O bits. (You can also use other bases than 2 and correspondingly other units than bits.) Now 0 indicates perfect balance between Yes and No; a positive number means more likely Yes than No, and a negative number means less likely Yes than No. Log-odds run from negative infinity (impossible) to infinity (certain).
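A similar sketch for log-odds in bits, again with made-up function names. `math.log2` is the standard-library base-2 logarithm; swapping it for `math.log` would give the answer in nats instead of bits.

```python
import math


def log_odds_bits(p: float) -> float:
    """Log-odds of p in bits, log2(p / (1 - p)).

    0 means even odds, positive favours Yes, negative favours No.
    Only finite for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("log-odds are infinite at p = 0 or p = 1")
    return math.log2(p / (1 - p))


def prob_from_log_odds_bits(b: float) -> float:
    """Invert: odds O = 2**b, then P = O / (1 + O)."""
    o = 2.0 ** b
    return o / (1 + o)


print(log_odds_bits(0.5))    # 0.0
print(log_odds_bits(3 / 8))  # about -0.737, i.e. leaning No
```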
Oh right, I forgot about that definition. The main probability conversions that I was aware of involved converting between fractions and percentages, sometimes expressed instead as probabilities between 0 and 1. Theoretically, it makes sense that odds can also be converted to or from probabilities, now that I think about it. Thanks for your explanation.
‘5 to 3 against’ is 3⁄8, not 3⁄5. Odds of ‘N to M’ or ‘N to M against’ are always between 0 and 1.
5 to 3 against is 3⁄5 (as odds), which is a probability of 3⁄8. You are muddling probability and odds ratios in an unacceptable way.
Wikipedia:
Yes, that’s exactly what I said. There is no way to express a fraction greater than 100% using odds notation; saying that odds are “1 million to 1” is 99.9999%, still under 1.
In the Wikipedia article, take a look at the table below the words “These are worked out for some simple odds”. The odds that TobyBartels is talking about, which one gets by dividing the numbers in an “n to m” expression, and which go from zero to infinity, are shown in the second and third columns of that table (o_f and o_a). Probabilities, which go from 0 to 1 or 0% to 100%, are shown in the fourth and fifth columns (p and q).
You said ‘Odds […] are always between 0 and 1’, while Wikipedia said ‘Odds range from 0 to infinity’, so you didn’t say the same thing.
Did you actually read the article you linked? It says the exact same thing as I did, phrased differently. Their “Odds range from 0 to infinity” means that any number from 0 to infinity can be used in the odds ratio, but still always represent a probability between 0 and 1. Which is precisely what I said.
No, that’s not what you said. I am now done with this conversation.
Um, representing a number between 0 and 1 is not the same as being a number between 0 and 1. The representation of p = 3⁄8 as odds = 3⁄5 (“5 to 3 against”) is useful in practice, for example because Bayes’ rule reduces to plain multiplication for odds ratios.
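A sketch of that update, assuming the usual odds form of Bayes’ rule (posterior odds = prior odds × likelihood ratio). The 4:1 likelihood ratio is an invented example, and the direct probability version is there only to check that both routes agree; the names are hypothetical.

```python
from fractions import Fraction


def update_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form:
    posterior odds = prior odds * P(evidence | Yes) / P(evidence | No)."""
    return prior_odds * likelihood_ratio


def update_prob(prior: Fraction, p_e_yes: Fraction, p_e_no: Fraction) -> Fraction:
    """The same update done directly on probabilities, for comparison."""
    num = prior * p_e_yes
    return num / (num + (1 - prior) * p_e_no)


prior = Fraction(3, 8)                # odds 3/5, i.e. '5 to 3 against'
lr = Fraction(4, 1)                   # hypothetical: evidence 4x likelier if Yes
post_odds = update_odds(prior / (1 - prior), lr)
print(post_odds)                      # 12/5
print(post_odds / (1 + post_odds))    # 12/17
```

The odds route is one multiplication; the probability route needs the normalising denominator, which is the practical advantage being pointed out.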
Yes, odds are good (and log-odds are even better), but people are bad at both dealing with very large absolute values and dealing with very fine precisions. I think that the survey is correct to put in a cut-off (whether an ϵ for probabilities, an N for log-odds, or one of each for odds); it should just tell us where. (Edit: put in stuff about log-odds properly.)
Epsilon is a minuscule amount. It’s vanishingly small, but it’s still there.
Yes, but which minuscule amount?
To be more specific: If ϵ ≥ 5 × 10⁻ⁿ (which it must be for some n, if it is a positive real number), then I only need to figure out my probability to n + 1 digits. Upon doing so, if it’s all 0s, then my probability is no more than ϵ, so I can enter 0. Otherwise, I should enter something larger. (And a similar thing holds on the other end.) Specifying ϵ serves the practical purpose of telling us how much work to put into estimating our probabilities. Since I had no guideline for that, I chose to default to ϵ = 1⁄2 (in percentage points), rather than try to additionally work out how small ϵ was supposed to be.
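Here is one way to turn that procedure into code. It is a sketch under my own assumptions: ϵ is given as a fraction of 1 (so ½ percentage point is 1⁄200), ‘digits’ means decimal places of the probability written as a fraction of 1, and the function names are hypothetical.

```python
from fractions import Fraction


def digits_needed(eps: Fraction) -> int:
    """Find the smallest n with 5 * 10**-n <= eps and return n + 1,
    the number of decimal places worth working out."""
    if eps <= 0:
        raise ValueError("epsilon must be positive")
    n = 0
    while Fraction(5, 10 ** n) > eps:
        n += 1
    return n + 1


def rounds_to_zero(p: Fraction, eps: Fraction) -> bool:
    """True if p, truncated to digits_needed(eps) decimal places, is all 0s.
    Then p < 10**-(n+1) < 5 * 10**-n <= eps, so answering 0 is safe."""
    return p * 10 ** digits_needed(eps) < 1


def survey_answer(p: Fraction, eps: Fraction) -> float:
    """Percentage to enter: 0 or 100 when within eps of an end, else p itself."""
    if rounds_to_zero(p, eps):
        return 0.0
    if rounds_to_zero(1 - p, eps):
        return 100.0
    return float(p * 100)


eps = Fraction(1, 200)                          # 1/2 percentage point
print(digits_needed(eps))                       # 4
print(survey_answer(Fraction(3, 100000), eps))  # 0.0
print(survey_answer(Fraction(4, 1000), eps))    # 0.4
```

The check is deliberately one-sided: 0.4% fails it even though it is below ϵ = 0.5%, which corresponds to the “enter something larger” branch.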
If, instead of bringing up ϵ, the survey had instructed us to use as many decimals as we need to avoid ever answering either 0 or 100, then I probably would have done more work. (There are reasons why this is bad, since the results will be increasingly unreliable, but still it could have said that.) But since I knew that at some point my work would be ignored, I didn’t do any.
(Edits: minor grammar and precise phrasing of inequalities.)
I took epsilon to be simply 0.5, on the basis of “the survey can take decimals but I’m going to use whole numbers as suggested, so 0 means I rounded down anything less than 0.5”. This is imprecise but gives me greater confidence in my answers, and (as you say) I have some tendency towards laziness.
Yes, that’s what I did too (0.5%).
I don’t think it will mess up the algorithms. My guess is that most people probably rounded most calibration answers to the tens place due to lack of enough confidence to be more precise, but since people are giving different values, the average across all respondents is unlikely to fall on an increment of ten, and should be a reasonably accurate measure of the respondents’ collective assigned probability for a question.
It could mess them up, because in theory a single wrong answer with 100% confidence renders the entire series infinitely poorly calibrated. The survey says that this won’t be done, that 100% will be treated as something slightly less than that. But how much less could depend on assumptions that the survey-makers made about how often people would answer this way, and maybe I did it too much.
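To see why a single 100% on a wrong item matters, here is a sketch assuming a logarithmic scoring rule (the survey doesn’t say which rule it uses, so that part is my assumption), with answers clamped to [ϵ, 1 − ϵ] as a stand-in for treating 100% as “something slightly less”. The function name and the sample answers are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def log_score(answers: List[Tuple[float, bool]], eps: Optional[float] = None) -> float:
    """Sum of log2 of the probability given to what actually happened.

    answers: (stated probability that the claim is true, whether it was true).
    With eps=None, a confident wrong answer (0 or 1) scores -inf.
    With eps set, probabilities are clamped to [eps, 1 - eps] first."""
    total = 0.0
    for p, truth in answers:
        if eps is not None:
            p = min(max(p, eps), 1 - eps)
        q = p if truth else 1 - p  # probability assigned to the actual outcome
        total += math.log2(q) if q > 0 else -math.inf
    return total


answers = [(1.0, True), (0.7, True), (1.0, False)]  # one wrong 100% answer
print(log_score(answers))             # -inf
print(log_score(answers, eps=0.005))  # finite, dominated by log2(0.005)
```

The size of the penalty for the wrong 100% answer is set entirely by the choice of ϵ, so a smaller ϵ punishes frequent 0s and 100s harder.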
I doubt it, since I’m pretty sure that they know enough about these pitfalls to avoid them. But I felt that I answered 0 and 100 quite a lot, so I thought that some warning was in order.
Even though percentages are typically used for cases where precision is less important, I’d say that in this context it would be better to err on the side of precision.
I don’t fit in well with any of the 5 answers to the Political question, and there was no Other, but skipping it also didn’t seem right. (Several questions have explicit cases when they are to be skipped, but this was not one of them.) I eventually picked 1 of the 2 that seemed less wrong than the other 3; I would have preferred to pick some sort of non-moderated mixture of those 2. (Actually, that is how I usually describe my politics when asked for a response in the form of a political party: somewhere between the ___ Party and the ___ Party, only more extreme.)
The Complex Affiliation was not a problem. (Actually, I was still torn between 2 answers, but this time I would have been happy with either of them!)
My public key is the same as my user name. Should it have been anonymous? (My private key was randomized and only identifies me if you know what format I use for general-purpose random strings.)
Assuming Yvain does the same thing as last year, both the public and private key will be released as part of the survey dataset if you checked the ‘release my survey data’ box.
Faith in Humanity moment: LW will not submit garbage poll responses using other LW-users as public keys.
If that’s true I wish I’d known it before choosing keys.
The private key too!? Fortunately I used a one-time key for that.
The public key is OK. I made sure that I was comfortable with people linking my answers to me before I used it. But then I thought that maybe I wasn’t supposed to.
I hope that you’ll publish the answers to the calibration questions, after the survey closes, of course.