I wish there were some examples (other than the Soviet nails) … if I had some better idea of what G and G* might actually represent, I’d be able to more easily get my head around the rest of the post.
I’m surprised no one has yet brought up the LW karma system (G*) as a proxy for contributing to “refining the art of human rationality” (G).
LW karma is an interesting example because no one has direct access to the karma giving algorithm.
It’s a bit like telling the nail factory that you’re going to evaluate them on something, but not telling them whether it’s nail mass or number or something else until the end of the evaluation period.
If the ones being evaluated know nothing about how they’re going to be evaluated except that it’s going to be a proxy for goodness, then they can’t really cheat. However, they might guess that the criterion will be something very simple, and so hedge their bets by making one very massive nail plus many miniature ones.
This reminds me of the way I hear they do state censorship in China. The censoring agencies don’t actually give out any specific guidelines on what is allowed and what isn’t, instead just clamping down on cases they do consider to be over the line. As a consequence, everyone self-censors more than they might with specific guidelines: with the guidelines, you could always try to twist their letter to violate their spirit. Instead, people are constantly unsure of just exactly what will get you in trouble, so they err on the side of caution.
While I strongly oppose state censorship, I can’t help but admire the genius in the system.
Also, unlike Saudi Arabia, they don’t make many efforts to block pornography. As a result, the average Chinese teen is less likely to know how to access blocked sites than the average Saudi teen is (or so I read; I’m not aware of any study on that).
Or Section 28, which didn’t forbid the discussion of homosexuality in the classroom, only its promotion... but since “promotion” wasn’t defined, schools erred on the side of not mentioning it at all.
Depressing. This would mean that most informal norms of censorship are much more resilient and effective than most formal laws censoring material.
Arguably this makes them much harder to dislodge than even the intentionally vague Chinese law: you can’t challenge an informal norm by pointing to a censorship law on the books, because there isn’t one.
I never thought of the LW karma system as a proxy for that.
What is your interpretation of it? It seems a pretty plausible hypothesis to me that it’s a proxy for something, and has come to be relied upon as such. If we think Goodhart’s Law applies in the case of karma, the final prediction in the “speculative origin” section might be something to be concerned about.
I think of it as a proxy for “valued member of the community”—if someone has karma, then people like their posts and comments. I’m mostly here to have fun and pass the time, and I happen to find discussing rationality to be fun. I don’t really expect refining the art of human rationality to be well-correlated with a popularity contest.
And do you think Goodhart’s Law, as presented in the post, applies here? That is, we should expect that eventually people (through gaming the system) end up with high karma without that in fact reliably correlating with being valued members of the community?
As a data point, one thing I’ve noticed that seems to give a disproportionate amount of karma is arguing with someone who’s wrong and unwilling to listen. It’s easy to think they might come around eventually, and each point you make against them is worth a few points of karma from the amused onlookers or fellow arguers—which might tell you that you’re making a valuable contribution, and so encourage you to keep arguing with trolls. This is my impression, at least.
Edit: (The problem being—determining the point of diminishing returns.)
Except we’re like the self-employed in this regard. You can’t do anything with karma. It won’t impress your boss. It is just a way of quantifying how valued you are by the community. An employee doesn’t really care about G at all. She cares about G* because that’s what impresses the boss, which furthers her own goals. But if you are your own boss you do care about G; G* is just an easy way to measure it. For me at least, this is the case with karma. I can’t do anything with the number, but it suggests that people like me.
So perhaps revenue sharing is a way to help address the problem. Instead of trying to come up with ways to measure what you care about, make the people beneath you care about it too. Of course this is a lot easier with money than it is with values.
My boss cares about karma.
Only if people care about having high karma. It’s probably fairly easy to game karma by making multiple accounts and voting yourself up, but why bother?
What? You mean Karma doesn’t reliably correlate with objective worth of the individual? Damn.
In education, this is one of the criticisms of high-stakes testing: you’ll just get schools teaching to the test, in ways that aren’t correlated with real learning (the test is G*, real knowledge/learning is G). People say the same thing about the SAT and test prep—kids get into better colleges because they paid to learn tricks for answering multiple choice questions. The Wire does a great job of showing the police force’s efforts to “juke the stats” (e.g. counting robberies as larcenies) so that crime statistics (G*) look better even while crime (G) is getting worse. Athletes get criticized for playing for their stats (G*), or trying to pad their stats, instead of playing to win, when the stats are supposed to be a measure of how much a player has contributed to his team’s chances of winning (G). I’m not sure if it’s historically accurate, but I’ve heard that body count (G*) was used by the US as one of the main metrics of success (G) in the Vietnam war, and as a result we ended up with a lot of dead bodies but a war no closer to being won.
In general, any time you measure something you care about in order to incentivize people, or to hold people accountable, or to keep track of what’s going on, and the thing you measure isn’t exactly the same as the thing that you care about, there’s a risk of figuring out ways to improve the measurement that don’t translate into improvements on the thing that you care about.
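The dynamic can be illustrated with a toy simulation (my own sketch, not from the thread; the function names `g` and `g_star` are invented for the example): candidates split effort between genuine work (which G counts) and gaming the metric (which only G* counts), and an evaluator who selects on G* ends up with less genuine work than one who could somehow select on G directly.

```python
import random

random.seed(0)

# Each "candidate" splits activity between genuine work and gaming the metric.
# G (what we care about) counts only genuine work; G* (what we measure)
# can't tell the two apart.
def g(work, gaming):
    return work

def g_star(work, gaming):
    return work + gaming

candidates = [(random.random(), random.random()) for _ in range(1000)]

top_by_proxy = sorted(candidates, key=lambda c: g_star(*c), reverse=True)[:10]
top_by_goal = sorted(candidates, key=lambda c: g(*c), reverse=True)[:10]

avg_g_proxy = sum(g(*c) for c in top_by_proxy) / 10
avg_g_goal = sum(g(*c) for c in top_by_goal) / 10

# Selecting on the proxy admits candidates who scored partly by gaming,
# so the selected group's true value falls short of direct selection on G.
assert avg_g_proxy < avg_g_goal
```

The gap widens as gaming gets cheaper relative to genuine work, which is exactly the "risk of figuring out ways to improve the measurement" described above.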
The health and/or beauty of a woman (G) and her scale-reported weight (G*): the two might be somewhat correlated under some circumstances, but they are definitely not identical and can diverge rather sharply due to crazy diets.
Here’s a few.
Well there’s a few described here, for instance: http://lesswrong.com/lw/le/lost_purposes/
Products that are good for humanity, and products that are profitable
Call time (G*) or calls taken (G*) in a call center, where what they actually care about is customer satisfaction (G) (at least inasmuch as it serves profitability).
Thanks,