If your concern is whether people who tried some medicine X will keep taking it forever even if they do not need it, then the reference group is “people who took medicine X at least once”, and you try to find out how many among them still need it, and how many still take it.
If your concern is rather whether people who do not need medicine X will buy it regardless when it is freely available at shops, the reference group is “everyone”, and you try to find out how many have bought it, and how many needed it. (Or if it is freely available in a city, then the reference group is “everyone in the city”.)
If there are different risk profiles, e.g. for men and women, or educated and uneducated people, you can narrow the reference group to that subset. But if you go too far, you will not have enough data (in the extreme case you end up alone in the group). There is a methodological risk of… choosing a subset when its outcomes are better, but keeping the large set when they are not. Or just having a very low N and getting lucky.
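To put rough numbers on the "low N and getting lucky" risk, here is a minimal sketch. Everything in it is an assumption made for illustration: the 20% true rate of bad outcomes, the subgroup sizes, and the helper names `prob_at_most` and `prob_some_subgroup_looks_clean`. It only computes binomial probabilities; it says nothing about any real drug or population.

```python
from math import comb

def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance of seeing at most k bad outcomes among n people."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def prob_some_subgroup_looks_clean(n_subgroups: int, n: int, p: float) -> float:
    """Chance that at least one of n_subgroups independent subgroups of size n shows zero bad outcomes."""
    p_clean = prob_at_most(0, n, p)
    return 1 - (1 - p_clean) ** n_subgroups

TRUE_RATE = 0.2  # assumed true rate of bad outcomes, purely illustrative

# How often does a single subgroup look perfectly fine by luck alone?
for n in (5, 50, 500):
    print(f"N={n}: P(zero bad outcomes) = {prob_at_most(0, n, TRUE_RATE):.6f}")

# How often does at least one of 10 tiny subgroups look perfectly fine?
print(f"10 subgroups of 5: {prob_some_subgroup_looks_clean(10, 5, TRUE_RATE):.4f}")
```

Under these assumptions, a subgroup of five people shows no bad outcomes about a third of the time, and if you are free to pick among ten such subgroups after seeing the results, you will almost always find one that looks clean. That is the "choosing a subset when its outcomes are better" problem in miniature.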
My concern is how I can construct a category without knowing which construction method to select. Say my category is “rationalists who use drugs”: does that mean anyone who tried something once upon a time? Does alcohol count? Painkillers and meds? If I’m too strict, everyone and her grandmother is testing addiction. Not useful. If I’m too relaxed, that’s a free pass for shamans, old rockers, Wall Street, and Ontario prime ministers who have been on drugs for decades. So, how do you define “rationalists who use drugs” so that it is a priori useful for any of: you, the median occasional LW reader, or the specific question of how we could have detected faster the present wave of complications from med abuse?