“X does damage to Y, therefore X is bad” is a fallacy, because sometimes Y should be damaged. (Or do you think that eliminating bad things isn’t good?)
“A is the product of a lot of effort, therefore A is good” is also a fallacy, because what matters is results. (Or do you think that the labor theory of value applies to ideas?)
It is good to discourage people from spending a lot of effort on making things that have little or no (or even negative) value. It is bad to encourage people to do such things.
Effort spent foolishly, or harmfully, should not be respected.
It is, of course, both imprudent and harmful to criticize ideas which you do not understand. Likewise, it is, of course, a waste of everyone’s time to reply to a post with arguments which have already been addressed in the post (but which you have neglected to read). These things are to be discouraged and penalized—obviously.
But there is no virtue in mere effort. If I post a long, in-depth analysis, which is lovingly illustrated, meticulously referenced, and wrong, and you respond with a one-line comment that points out the way in which my post was wrong, then I have done poorly (and my post ought to be downvoted), while you have done well (and your comment ought to be upvoted).
In addition to the rest of it, high-effort, low-value contributions waste readers’ time. Low-effort, high-value contributions save readers’ time.
My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
And this applies fully, not just to code, but to words.
It is good to discourage people from spending a lot of effort on making things that have little or no (or even negative) value.
Would you care to distinguish a means of discouraging people from spending effort on low-value things, from a means that simply discourages people from spending effort in general? It seems to me that here you are taking the concept of “making things that have little or no (or even negative) value” as a primitive action—something that can be “encouraged” or “discouraged”—whereas, on the other hand, it seems to me that the true primitive action here is spending effort in the first place, and that actions taken to disincentivize the former, will in fact turn out to disincentivize the latter.
If this is in fact the case, then the question is not so simple as whether we ought to discourage posters from spending effort on making incorrect posts (to which the answer would of course be “yes, we ought”), but rather, whether we ought to discourage posters from spending effort. To this, you say:
But there is no virtue in mere effort.
Perhaps there is no “virtue” in effort, but in that case we must ask why “virtue” is the thing we are measuring. If the goal is to maximize, not “virtue”, but high-quality posts, then I submit that (all else being equal) having more high-effort posts is more likely to accomplish this than having fewer high-effort posts. Unless your contention is that all else is not equal (perhaps high-effort posts are more likely to contain muddled thinking, and hence more likely to have incorrect conclusions? but it’s hard to see why this should be the case a priori), then it seems to me that encouraging posters to put large amounts of effort into their posts is simply a better course of action than discouraging them.
And what does it mean to “encourage” or “discourage” a poster? Based on the following part of your comment, it seems that you are taking “discourage” to mean something along the lines of “point out ways in which the post in question is mistaken”:
If I post a long, in-depth analysis, which is lovingly illustrated, meticulously referenced, and wrong, and you respond with a one-line comment that points out the way in which my post was wrong, then I have done poorly (and my post ought to be downvoted), while you have done well (and your comment ought to be upvoted).
But how often is it the case that a “long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced” is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner? I claim that this so rarely occurs that it should play a negligible role in our considerations—in other words, that the hypothetical situation you describe does not reflect reality.
What occurs more often, I think, is that a commenter finds themselves mistakenly under the impression that they have spotted an obvious error, and then proceeds to post (what they believe to be) an obvious refutation. I further claim that such cases are disproportionately responsible for the so-called “drive-by low-effort criticism” described in the OP. It may be that you disagree with this, but whether it is true or not is a matter of fact, not opinion. However, if one happens to believe it is true, then it should not be difficult to understand why one might prefer to see less of the described behavior.
perhaps high-effort posts are more likely to contain muddled thinking, and hence more likely to have incorrect conclusions? but it’s hard to see why this should be the case a priori
I don’t think high-effort posts are more likely to contain muddled thinking, but I do think readers are less likely to notice muddled thinking when it appears in high-effort posts, so suppressing criticism of high-effort posts is especially dangerous.
Would you care to distinguish a means of discouraging people from spending effort on low-value things, from a means that simply discourages people from spending effort in general?
Sure, that’s easy: apply the discouragement (downvotes, critical comments, etc.) only to low-value things, and not to high-value things.
Or are you suggesting that you (or, perhaps, Less Wrong participants in general?) can’t tell the difference between low-value things and high-value things?
Perhaps there is no “virtue” in effort, but in this case we must ask why “virtue” is the thing we are measuring.
“Virtue” here means “whatever we take to be good and desirable, and that which produces those things”. We are measuring it because it is, by definition, the thing we want to be measuring.
If the goal is to maximize, not “virtue”, but high-quality posts, then I submit that (all else being equal) having more high-effort posts is more likely to accomplish this than having fewer high-quality posts.
[emphasis mine]
Did you mean to write “high-effort”, in place of the bolded part? (If, however, you meant what you wrote, then I don’t understand what you’re trying to say, here; please explain.)
And what does it mean to “encourage” or “discourage” a poster?
I mean whatever the OP means when he talks about adverse effects, etc.
But how often is it the case that a “long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced” is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner?
What occurs more often, I think, is that a commenter finds themselves mistakenly under the impression that they have spotted an obvious error, and they act quickly to post what they believe to be an obvious refutation. I further claim that such cases are disproportionately responsible for the so-called “drive-by low-effort criticism” described in the OP. If this claim is true, then it should not be difficult to understand why some people might prefer to see less of this.
Yes, we should discourage low-quality criticism which is wrong, and encourage high-quality criticism which is right. (I already said this, in the grandparent.) Having accounted for this, it makes no sense at all to prefer longer critical comments to shorter ones. (Quite the opposite preference would be sensible, in fact.)
Yes, we should discourage low-quality criticism which is wrong, and encourage high-quality criticism which is right. (I already said this, in the grandparent.) Having accounted for this, it makes no sense at all to prefer longer critical comments to shorter ones. (Quite the opposite preference would be sensible, in fact.)
I think that compared to high-effort criticisms, low-effort criticisms are much more likely to be based on misunderstandings or otherwise low quality. I interpret Lionhearted as saying that criticism should, on the margin, be held to a higher bar than it is now.
What is the evaluation of “effort” even doing here? Why not just evaluate whether the criticism is high-quality, understands the post, is correct, etc?
Requiring “effort” (independent of quality) is a proof-of-work scheme meant to tax criticism.
Requiring “effort” (independent of quality) is a proof-of-work scheme meant to tax criticism.
Proof-of-work was originally invented to fight email spam. The analogous argument plausibly applies here: evaluating quality (e.g., letting a human read the email to decide what to do with it, trying to figure out whether a criticism actually makes sense) is costly, so it’s more efficient to first filter using a cheaper-to-evaluate signal/proxy like work/effort. (I don’t think this is the OP’s argument though, which is based more on low effort criticism feeling unpleasant or discouraging to some post authors. I’m kind of going off on a tangent based on your mention of “proof of work”.)
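The efficiency argument here can be made concrete with a toy cost model (every number below is invented for illustration; nothing in the thread specifies them): if checking a cheap proxy like effort costs far less than evaluating quality directly, then filtering on the proxy first reduces total evaluation cost even though the proxy is imperfect.

```python
# Toy cost model for proxy-based filtering (all numbers are hypothetical).
n = 1000                 # incoming items (emails, comments, ...)
low_value_frac = 0.9     # assumed fraction of low-value items
cheap_check = 0.01       # cost of evaluating the proxy (e.g. effort)
full_eval = 1.0          # cost of evaluating actual quality
proxy_recall = 0.95      # fraction of low-value items the proxy catches

# Evaluate everything directly:
cost_without_filter = n * full_eval                      # = 1000.0

# Filter on the cheap proxy first, then fully evaluate only the survivors:
survivors = n * (1 - low_value_frac) + n * low_value_frac * (1 - proxy_recall)
cost_with_filter = n * cheap_check + survivors * full_eval  # = 155.0
```

Under these assumptions the proxy cuts total cost by an order of magnitude, which is the whole appeal of proof-of-work-style filters; the trade-off, as discussed above, is that the proxy is not the thing you actually care about.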
Now, at first glance this may seem orthogonal to what you said—which was about how much effort went into the criticism, rather than how much went into the post—but note that evaluation of a criticism as “low-effort” is relative. If I write a very well-researched and lengthy post, and you write the median comment (in effort, length, etc.) in reply, that is “low-effort” relative to the post I wrote, yes?
This implies that criticism which is low-effort relative to the post it is responding to, should not only not be held to a higher bar, but in fact that it should be held to a lower bar!
That having been said, it is of course good to discourage bad criticism. But the point is that “how much effort went into this” is simply orthogonal to quality—and this is true for posts as well as comments.
So, we should discourage bad posts, and encourage good ones. We should discourage bad criticism, and encourage good criticism.
We should not, however, encourage high-effort posts merely for being high-effort, nor should we discourage low-effort criticism merely for being low-effort. What matters is results.
And note that the fact that low-effort criticisms are “more likely to be based on misunderstandings or otherwise low quality” is irrelevant. We can, and should, simply judge whether any given comment actually is a misunderstanding, etc., and respond appropriately.
But how often is it the case that a “long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced” is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner? I claim that this so rarely occurs that it should play a negligible role in our considerations—in other words, that the hypothetical situation you describe does not reflect reality.
Taken to an extreme, this road can lead to a place where a person thinks that only institutions can produce knowledge.
I tried writing out my somewhat ill-formed fears at length, but let’s try something short and interactive instead, to get some practice on the theory side of the discussion.
A tropical witch doctor answers methodological questions about what he has in fact been doing, and has invented inductive reasoning from scratch, being the only healer in the known world. A fake doctor, a fraud, assails his credibility by emphasizing how Western epistemology has peer review and replication, while the university is struggling with research branches that have not done a single replication study in ten years.
People give the fraudulent doctor an easy time and go hard on the witch doctor. Those who declare the witch doctor’s methods legitimate still face image problems, and ordinary folk don’t lend them their ear. The bar is high for the witch doctor and low for the fake doctor, and clearing the bar is not significant in some important ways.
It is a distinct skill for the nurse, when the fake doctor prescribes the wrong medicine, to have the audacity to ask whether his level of understanding confirms the appropriateness of the medication. If the nurse didn’t think of the doctor as a doctor, she would be more critical and would let less inappropriate medication pass. “Doctors are so hardworking, we should support them” is not a proper way for the nurse to adjust.
It is a distinct skill to try to evaluate whether the witch doctor has invented a new medicine-making process, without the chemistry knowledge to identify the active ingredient. “Religion is ineffective hopeful thinking” is not a good adjustment. The nurse’s skill and the skill of listening to the witch doctor are linked.
Meanwhile, “hey, we have all these cool standard research practices, hope you get excited about them” is an important sell towards the witch doctor. And “you did a really good job listening to the psychological trauma the patient was describing when checking in for a broken ankle” is an important acknowledgement towards the fake doctor.
If scientists believe that you need to be a scientist to have a chance of proving a scientist wrong, then science becomes immune to outside knowledge. It’s dangerous to posit that some party has a monopoly on epistemological competency. If you believe so little in one-liners that you stop hearing them, your long-liner sources had better have a very fair and representative distribution. If you do not believe a witch doctor could have done inductive experiments, you don’t believe in inductive reasoning; you believe in dominant cultures. If you do not believe that a doctor could be mistaken, you don’t believe in empiricism but in authoritative revelation.
Darn, still pretty long. I think this also has the additional property that a lot is left to the imagination, which saves on reading time but uses up imagination/decryption time.
Be in a human gatherer tribe. Go hunt rabbits for a week. A tribe member sits around the campfire for a week. You both have 0 rabbits at the end of the week. It would seem that “at least you tried” is meaningful. I get that someone who hunts for only one day and gets 1 rabbit is better than both. And it might be that you are getting “expected rabbits”, even if not actual rabbits.
No, you want to incentivize rabbits, not expected rabbits. Trying to incentivize expected rabbits is mixing levels. Incentivize rabbits, and people will attend to expectation of rabbits themselves.
Like, just because there is any chance of a variable becoming disconnected in a principal-agent problem doesn’t mean that it’s always a bad idea to incentivize intermediary metrics. I am not fully sure how to understand your point as anything besides “never incentivize any lead-metrics whatsoever, only ever incentivize successful output”, which seems like a recipe for sparse reward landscapes, and also not a common practice in almost any domain in which humans deal with principal-agent problems.
Your employer pays you if you show up for work, not only if you successfully get work done (at least on the day-to-day or month-to-month level). You pay your plumber if they show up, not only if they successfully fix your toilet.
Like, if you see a friend taking an action that you know and they know has a 50% chance of making $10 for you and your friend (let’s say for a communal club) and a 50% chance of losing $5, and then turns out they lose $5, then it seems better to still reward your friend for taking that action, instead of punishing them, given that you know the action had positive expected value.
(assuming you have mostly linear value of money at these stakes)
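For concreteness, the bet described above has positive expected value, and a reward scheme that simply pays out in proportion to the realized result still rewards the friend in expectation. A minimal sketch using the comment’s numbers (the reward share `k` is my own hypothetical addition):

```python
# The communal-club bet from the comment: 50% chance of +$10, 50% chance of -$5.
p_win, gain, loss = 0.5, 10.0, -5.0
expected_value = p_win * gain + (1 - p_win) * loss  # 0.5*10 + 0.5*(-5) = 2.5

# If the club's "reward" to the friend is simply proportional to the realized
# outcome, the friend's expected reward is k * EV, which is positive here:
k = 0.1                               # hypothetical share passed on as reward
expected_reward = k * expected_value  # = 0.25 > 0
```

So under risk-neutrality, rewarding results alone already favors taking the positive-EV action; the disagreement below is about what changes once risk aversion and asymmetric punishments enter the picture.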
If they think the odds are 90% $10 and 10% −$5, and you think the odds are 10% $10 and 90% −$5, should you reward them for trying to benefit, or punish them for having wrong beliefs that materially matter?
No, because humans are risk-averse, at least in money terms, but also in most other currencies. If you do this, you increase the total risk for your friend, for no particular gain.
Punishment is also usually net-negative, whereas rewards tend to be zero-sum, so by adding a bunch of worlds where you added punishments, you destroyed a bunch of value, with no gain (in the world where you both have certainty about the payoff matrix).
One model here is that humans have diminishing returns on money, so in order to reward someone 2x with dollars, you have to pay more than 2x the dollar amount, so your total cost is higher.
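That diminishing-returns claim can be checked with a toy concave utility function (logarithmic utility and the specific numbers are my assumptions, not anything stated in the thread): delivering twice the felt reward costs strictly more than twice the dollars.

```python
import math

# Hypothetical numbers: baseline wealth W, a reward of r dollars.
W, r = 100.0, 10.0
utility_gain = math.log(W + r) - math.log(W)   # felt value of the reward

# Dollars needed to deliver exactly twice that utility gain:
r_double = W * math.exp(2 * utility_gain) - W  # algebraically = 2*r + r**2/W
# r_double = 21.0, which exceeds 2*r = 20.0: a 2x-felt reward costs >2x dollars.
```

With any concave utility the same inequality holds, which is the sense in which high-variance reward landscapes carry an inherent cost for the rewarder.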
A scenario with zero-sum actions and net-negative actions can only go downhill. This would seem to imply that if you have an opportunity to give feedback or not give feedback you should opt to get a guaranteed zero rather than risk destroying value.
Rewards are usually a transfer of resources (e.g. me giving you money), which tend to preserve total wealth (or status, or whatever other resource you are thinking about).
Unilateral punishments are usually not transfers of resource, they are usually one party imposing a cost on another party (like hitting them with a stick and injuring them), in a way that does not preserve total wealth (or health, or whatever other resource applies to the situation).
You certainly shouldn’t hit your friend with a stick if he loses $5 of your club’s money. I think this is fairly obvious, and it seems quite improbable that you were assuming that I was suggesting any such thing. So, given that we can’t possibly be talking about injuring anyone, or doing any such thing, how can your point about net-negative punishment apply? The more sensible assumption is that the punishment is of the same kind as the reward.
I think social punishments usually have the same form. Where rewards tend to be more of a transfer of status, punishments tend to be more of a destruction of status (two people can destroy each other’s reputations with repeated social punishments).
There is also the bandwidth cost of punishment, as well as the simple fact that giving people praise usually comes with a positive emotional component for the receiver (in addition to the status and the reputation), whereas punishments usually come with an addition of stress and discomfort that reduces total output for a while.
In either case, I think the simpler case is made by simply looking at the assumption of diminishing returns in resources and realizing that the cost of giving someone a reward they care 2x about is usually larger than the cost of giving the reward twice, meaning that there is an inherent cost to high-variance reward landscapes.
Your employer pays you if you show up for work, not only if you successfully get work done (at least on the day-to-day or month-to-month level).
If you show up, but don’t get work done, you get fired. (How quickly that happens varies from workplace to workplace, of course—but in many places it happens very quickly indeed.)
Yeah, but the fact that it takes a while, and that we have monthly wages instead of all being contractors paid by the piece, is kind of my point. Most of the economy does not pay for completed output, but for intermediary metrics that allow a much higher level of stability.
But note that even if you don’t get fired immediately for failing to produce satisfactory work, you are likely to receive a dressing-down from your boss, poor evaluations, etc., or even something so simple as your team leader being visibly disappointed with you, even if they take no immediate action.
Now consider what that analogizes to, in the case at hand. Is a downvote, or a critical comment, more like being fired, or more like your boss telling you that your work isn’t up to par and that you should really try to do better?
My experience is definitely the opposite. Random Quora question also suggests that it’s common practice in plumbing to pay someone for the attempt, not for the solution. As someone who recently hired plumbers and electricians to fix a bunch of stuff in a new house we rented, this also matches with my experience. Not sure where your experience comes from.
In general, most contractors bill by the hour, not for completed output, and definitely not “output that the client thinks is worth it”, at least in my experience (there are obviously exceptions, though I found them relatively rare).
My experience comes from the same sort of thing: having, on many occasions, hired various people to do various sorts of work; and also from having worked for several years at a computer store that specialized in on-the-premises repair/service.
The Quora answer you linked doesn’t really support your point, as it’s quite clear about the prerequisite being an informed, explicit agreement between plumber and customer that the latter will pay the former regardless of outcome. (And even with that caveat, some of what the answer-giver says is suspect, and is not consistent with my experience.)
I do not know of any industry in which contractor agreements with variable payments that are dependent on the quality of the output are common practice. There is often an agreement on what it means to “complete the work” but in almost any case both your downside and your upside are limited by a guaranteed upfront payment, and a conditional final payment. But it’s almost never the case that you can get 2x the money depending on the quality of your output, which seems like a necessary requirement for some of the incentive schemes you outlined.
What does this have to do with anything? You originally said:
You pay your plumber if they show up, not only if they successfully fix your toilet.
I don’t see the connection between “should you pay your plumber even if they don’t actually fix your toilet” and “should you pay your plumber twice as much if they fix your toilet twice as well”; the latter seems like a nonsensical question, and unrelated to the former.
(someone else downvoted. I was sort of torn between downvoting because it seemed importantly wrong, and upvoting for stating the disagreement clearly, and ended up not voting either way. Someday we’ll probably have disagree reacts or something)
This isn’t intrinsically wrong, but basically wrong in many circumstances. I’m guessing this is a fairly important crux of disagreement.
The problem comes if either:
a) you actively punish attempts that had a high expectation of rabbits but didn’t produce rabbits. This just straightforwardly punishes high variance strategies. You need at least some people doing low variance strategies, otherwise a week where nobody brings home any rabbits, everyone dies. But if you punish high variance strategies whenever they are employed, you’re going to end up with a lot fewer rabbits.
[there might be a further disagreement about how human psychology works and what counts as punishing]
b) you systematically incentivize legibly producing countable rabbits, in a world where it turned out a lot of value wasn’t just about the rabbits, or that some rabbits were actually harder to notice. I think one of the major problems with goodhart in the 20th century comes from the expectation of legible results.
Figuring out how to navigate the tension between “much of value isn’t yet legible to us and overfocus on it is goodharting / destroying value” and “but, also, if you don’t focus on legible results you get nonsense” is, in my frame, basically the problem.
you actively punish attempts that had a high expectation of rabbits but didn’t produce rabbits. This just straightforwardly punishes high variance strategies. You need at least some people doing low variance strategies, otherwise a week where nobody brings home any rabbits, everyone dies. But if you punish high variance strategies whenever they are employed, you’re going to end up with a lot fewer rabbits.
You should neither reward nor punish strategies or attempts at all, but results. If I am executing a high-variance strategy, and you punish poor results, and reward good results, in accordance with how poor/good they are, then (if I am right about my strategy having a positive expectation) I will—in expectation—be rewarded. This will incentivize me to execute said strategy (assuming I am not risk-averse—but if I am, then I’m not going to be the one trying the high-variance strategy anyway).
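This can be illustrated with a quick simulation (the payoff numbers are invented for the example): a high-variance strategy whose results alone are rewarded still earns a positive average reward, so results-only incentives do not, by themselves, punish high variance for a risk-neutral agent.

```python
import random

random.seed(0)
# Hypothetical high-variance strategy: 10% chance of bagging 20 rabbits, else 0.
# Reward is paid strictly per rabbit actually produced -- never for trying.
trials = 100_000
total_rabbits = sum(
    20 if random.random() < 0.10 else 0
    for _ in range(trials)
)
avg_reward = total_rabbits / trials  # close to the expectation of 2 rabbits
```

The average reward converges to the strategy’s expected value, so anyone who is not risk-averse is incentivized to keep running the positive-expectation strategy even while being “punished” (unrewarded) in the many zero-rabbit weeks.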
you systematically incentivize legibly producing countable rabbits, in a world where it turned out a lot of value wasn’t just about the rabbits, or that some rabbits were actually harder to notice. I think one of the major problems with goodhart in the 20th century comes from the expectation of legible results.
I was talking about rabbits (or things very similar to rabbits). I made, and make, no guarantees that the analysis applies when analogized to anything very different. (It seems clear that the analysis does apply in some very different situations, and does not apply in others.) Reasoning by analogy is dangerous; if we propose to attempt it, we need to be very clear about what the assumptions of the model are, and how the situations we are analogizing differ, and what that does to our assumptions.
You should neither reward nor punish strategies or attempts at all, but results.
This statement is presented in a way that suggests the reader ought to find it obvious, but in fact I don’t see why it’s obvious at all. If we take the quoted statement at face value, it appears to be suggesting that we apply our rewards and punishments (whatever they may be) to something which is causally distant from the agent whose behavior we are trying to influence—namely, “results”—and, moreover, that this approach is superior to the approach of applying those same rewards/punishments to something which is causally immediate—namely, “strategies”.
I see no reason this should be the case, however! Indeed, it seems to me that the opposite is true: if the rewards and punishments for a given agent are applied based on a causal node which is separated from the agent by multiple causal links, then there is a greater number of ancestor nodes that said rewards/punishments must propagate through before reaching the agent itself. The consequences of this are twofold: firstly, the impact of the reward/punishment is diluted, since it must be divided among a greater number of potential ancestor nodes. And secondly, because the agent has no way to identify which of these ancestor nodes we “meant” to reward or punish, our rewards/punishments may end up impacting aspects of the agent’s behavior we did not intend to influence, sometimes in ways that go against what we would prefer. (Moreover, the probability of such a thing occurring increases drastically as the thing we reward/punish becomes further separated from the agent itself.)
The takeaway from this, of course, is that strategically rewarding and punishing things grows less effective as the proxy on which said rewards and punishments are based grows further from the thing we are trying to influence—a result which sometimes goes by a more well-known name. This then suggests that punishing results over strategies, far from being a superior approach, is actually inferior: it has lower chances of influencing behavior we would like to influence, and higher chances of influencing behavior we would not like to influence.
(There are, of course, benefits as well as costs to rewarding and punishing results (rather than strategies). The most obvious benefit is that it is far easier for the party doing the rewarding and punishing: very little cognitive effort is required to assess whether a given result is positive or negative, in stark contrast to the large amounts of effort necessary to decide whether a given strategy has positive or negative expectation. This is why, for example, large corporations—which are often bottlenecked on cognitive effort—generally reward and punish their employees on the basis of easily measurable metrics. But, of course, this is a far cry from claiming that such an approach is simply superior to the alternative. (It is also why large corporations so often fall prey to Goodhart’s Law.))
Strange. You bring up Goodhart’s Law, but the way you apply it seems exactly backwards to me. If you’re rewarding strategies instead of results, and someone comes up with a new strategy that has far better results than the strategy you’re rewarding, you fail to reward people for developing better strategies or getting better results. This seems like it’s exactly what Goodhart was trying to warn us about.
I agree this is a weird place to bring up Goodhart that requires extra justification. But I do think it makes sense here. (Though I also agree with Said elsethread that it matters a lot what we’re actually talking about – rabbits, widgets, companies, scientific papers and blogposts might all behave a bit differently)
The two main issues are:
it’s hard to directly incentivize results with fuzzy, unpredictable characteristics.
it’s hard to directly incentivize results over long timescales
In short: preoccupation with “rewarding results”, in situations where you can’t actually reward results, can result in goodharting for all the usual reasons.
Two examples here are:
Scientific papers. Probably the closest to a directly relevant example before we start talking about blogposts in particular. My impression [epistemic status: relatively weak, based on anecdotes, but it seems like everyone I’ve heard talk about this roughly agreed with these anecdotes] is that the publish-or-perish mindset for academic output has been a pretty direct example of “we tried directly incentivizing results, and instead of getting more science we got shittier science.”
Founding Companies. There are ecosystems for founding and investing in companies (startups and otherwise), which are ultimately about a particular result (making money). But, this requires very long time horizons, which many people a) literally can’t pull off because they don’t have the runway, b) are often risk averse, and might not be willing to do it if they had to just risk their own money.
The business venture world works because there has been a lot of infrastructural effort put into enabling particular strategies (in the case of startups, an established concept of seed funding, series A, etc.). In the case of more traditional businesses, sometimes more straightforward loans.
The relation to goodhart is a bit weirder here because, yeah, overfocus on “known strategies” is also one of the pathologies that results in goodharting (i.e. everyone thinks social media is Hype, so everyone founds social media companies, but maybe by this point social media is overdone and you actually need to be looking for weirder things that people haven’t already saturated the market with)
But, the goodhart is instead “if you don’t put effort into maintaining the early stages of the strategy, despite many instances of that strategy failing… you just end up with less money.”
My sense [again, epistemic status fairly weak based on things Paul Graham said, but that I haven’t heard explicitly argued against] is venture capitalists make the most money from the long tail of companies they invest in. Being willing to get “real results” requires being willing to tolerate lots of things that don’t pay off in results. And many of those companies in the long tail were startup ideas that didn’t sound like sure bets.
There is some sense in which “directly rewarding results” is of course the best way to avoid goodharting, but since we don’t actually have access to “direct results that actually represent the real thing” to reward, the impulse to directly reward results can often result in rewarding not-actually-results.
Sure, that all makes sense, but at least on LW it seems like we ought to insist on saying “rewarding results” when we mean rewarding results, and “deceiving ourselves into thinking we’re rewarding results” when we mean deceiving ourselves into thinking we’re rewarding results.
That makes sense, although I’m not actually sure either “rewarding results” or “deceiving ourselves into thinking we’re rewarding results” quite capture what’s going on here.
Like, I do think it’s possible to reward individual good things (whether blogposts or scientific papers) when you find them. The question is how this shapes the overall system. When you expect “good/real results” to be few and far between, the process of “only reward things that are obviously good and/or great” might technically be rewarding results, while still outputting fewer results on average than if you had rewarded people for following overall strategies like “pursue things you’re earnestly curious about”, and given people positive rewards for incremental steps along the way.
(Seems good to be precise about language here but I’m not in fact sure how to word this optimally. Meanwhile, earlier parts of the conversation were more explicitly about how ‘reward final results, and only final results’ just isn’t the strategy used in most of the business world)
Strong upvote for clear articulation of points I wanted to see made.
The most obvious benefit is that it is far easier for the party doing the rewarding and punishing: very little cognitive effort is required to assess whether a given result is positive or negative, in stark contrast to the large amounts of effort necessary to decide whether a given strategy has positive or negative expectation.
This part isn’t obviously/exactly correct to me. If we’re talking about posts and comments on LessWrong, it can be quite hard for me to assess whether a given post is correct or not (although even incorrect posts are often quite valuable parts of the discourse). It might also take a lot of information/effort to arrive at the belief that the strategy of “invest more effort, generate more ideas” ultimately leads to more good ideas, such that incentivizing generation itself is good. However, once I hold that belief, it’s relatively easy to apply it. I see someone investing effort in adding to communal knowledge in a way that is plausibly correct/helpful; I then encourage this pro-social contribution despite the fact that evaluating whether the post was actually correct or not* can be extremely difficult.
*”Correct or not” is a bit binary, but even assessing the overall “quality” or “value” of a post doesn’t make it much easier to assess. Far harder than counting rabbits. However, if a post doesn’t seem obviously wrong (or even if it’s clearly wrong, but because of an understandable mistake many people might make), I can often confidently say that it is contributing to communal knowledge (often via the discussion it sparks, or simply because someone could correct a reasonable misunderstanding), and that I overall want to encourage more of whatever generated it. I’m happy to get more posts like that, even if I push for refinements in the process, say.
(Reacts or separate upvote/downvotes vs agree/disagree buttons will hopefully make it easier in the future to encourage effort even while expressing that I think something is wrong. )
We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.
You seem to have interpreted my comments as saying that we’re trying to reward some particular behavior, but we should do this by rewarding the results of that behavior. As you point out, this is not a wise plan.
But it’s also not what I am saying, at all. I am saying that we are (or, again, should be) trying to reward the results. Not the behavior that led to those results, but the results themselves.
I don’t know why you’re assuming that we’re actually trying to encourage some specific behavior. It’s certainly not what I am assuming. Doing so would not be a very good idea at all.
I think with that approach there are a great many results you’d fail to achieve. People can get animals to do remarkable things with shaping and I would wager that you can’t do them at all otherwise.
We first give the bird food when it turns slightly in the direction of the spot from any part of the cage. This increases the frequency of such behavior. We then withhold reinforcement until a slight movement is made toward the spot. This again alters the general distribution of behavior without producing a new unit. We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot. … The original probability of the response in its final form is very low; in some cases it may even be zero. In this way we can build complicated operants which would never appear in the repertoire of the organism otherwise. By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. … The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay.
Humans are more sophisticated than birds, but producing highly complex and abstruse truths in a format understandable to others is also a lot more complicated than getting a bird to put its beak in a particular spot. I think all the same mechanics are at work. If you want to get someone (including yourself) to do something as complex and difficult as producing valuable, novel, correct, expositions of true things on LessWrong—you’re going to have to reward the predictable intermediary steps.
We don’t go to five year olds and say “the desired result is that you can write fluently, therefore no positive feedback on your marginal efforts until you can do so; in fact, I’m going to strike your knuckles every time you make a spelling error or do anything which isn’t what we hope to see from you when you’re 12; we will only reward the final desired result, and you can back-propagate from that to figure out what’s good.” That’s really only a recipe for children who are unwilling to put any effort into learning to write, not for children who progressively put in effort over years to learn what it even looks like to be a competent writer.
This is beyond my earlier point that verifying results in our cases is often much harder than verifying that good steps were being taken.
We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.
I’m afraid this sentence doesn’t parse for me. You seem to be speaking of “results” as something which to which the concept of rewards and punishments are applicable. However, I’m not aware of any context in which this is a meaningful (rather than nonsensical) thing to say. All theories of behavior I’ve encountered that make mention of the concept of rewards and punishments (e.g. operant conditioning) refer to them as a means of influencing behavior. If there’s something else you’re referring to when you say “reward or punish the results”, I would appreciate it if you clarified what exactly that thing is.
I don’t see what could be simpler. Alice does something. That action has some result. We reward Alice, or punish her, based on the results of her action. There is nothing unusual or obscure here; I mean just what I say.
(There are cases where we do not want to take this approach, but they tend to both be controversial and to be unusual in certain important respects.)
Edit: And if you’re trying to use operant conditioning, of all things, to decide what social norms to have on a forum devoted to the art of rationality, then you’ve already admitted defeat, and this entire project is pointless.
assuming I am not risk-averse—but if I am, then I’m not going to be the one trying the high-variance strategy anyway
But, of course, everyone is risk averse in almost every resource. Even the most ambitious startup founders are still risk averse in total payment, just less so than others. I care less about my 10th million dollar than any of my first 9 million dollars, which already creates risk aversion. The same is true for status or almost any other resource with which you might want to reward people.
“X does damage to Y, therefore X is bad” is a fallacy, because sometimes Y should be damaged. (Or do you think that eliminating bad things isn’t good?)
“A is the product of a lot of effort, therefore A is good” is also a fallacy, because what matters is results. (Or do you think that the labor theory of value applies to ideas?)
It is good to discourage people from spending a lot of effort on making things that have little or no (or even negative) value. It is bad to encourage people to do such things.
Effort spent foolishly, or harmfully, should not be respected.
It is, of course, both imprudent and harmful to criticize ideas which you do not understand. Likewise, it is, of course, a waste of everyone’s time to reply to a post with arguments which have already been addressed in the post (but which you have neglected to read). These things are to be discouraged and penalized—obviously.
But there is no virtue in mere effort. If I post a long, in-depth analysis, which is lovingly illustrated, meticulously referenced, and wrong, and you respond with a one-line comment that points out the way in which my post was wrong, then I have done poorly (and my post ought to be downvoted), while you have done well (and your comment ought to be upvoted).
In addition to the rest of it, high-effort, low-value contributions waste readers’ time. Low-effort, high-value contributions save readers’ time.
Edsger Dijkstra said:
My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
And this applies fully, not just to code, but to words.
Would you care to distinguish a means of discouraging people from spending effort on low-value things, from a means that simply discourages people from spending effort in general? It seems to me that here you are taking the concept of “making things that have little or no (or even negative) value” as a primitive action—something that can be “encouraged” or “discouraged”—whereas, on the other hand, it seems to me that the true primitive action here is spending effort in the first place, and that actions taken to disincentivize the former, will in fact turn out to disincentivize the latter.
If this is in fact the case, then the question is not so simple as whether we ought to discourage posters from spending effort on making incorrect posts (to which the answer would of course be “yes, we ought”), but rather, whether we ought to discourage posters from spending effort. To this, you say:
Perhaps there is no “virtue” in effort, but in that case we must ask why “virtue” is the thing we are measuring. If the goal is to maximize, not “virtue”, but high-quality posts, then I submit that (all else being equal) having more high-effort posts is more likely to accomplish this than having fewer high-effort posts. Unless your contention is that all else is not equal (perhaps high-effort posts are more likely to contain muddled thinking, and hence more likely to have incorrect conclusions? but it’s hard to see why this should be the case a priori), then it seems to me that encouraging posters to put large amounts of effort into their posts is simply a better course of action than discouraging them.
And what does it mean to “encourage” or “discourage” a poster? Based on the following part of your comment, it seems that you are taking “discourage” to mean something along the lines of “point out ways in which the post in question is mistaken”:
But how often is it the case that a “long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced” is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner? I claim that this so rarely occurs that it should play a negligible role in our considerations—in other words, that the hypothetical situation you describe does not reflect reality.
What occurs more often, I think, is that a commenter finds themselves mistakenly under the impression that they have spotted an obvious error, and then proceeds to post (what they believe to be) an obvious refutation. I further claim that such cases are disproportionately responsible for the so-called “drive-by low-effort criticism” described in the OP. It may be that you disagree with this, but whether it is true or not is a matter of factual accuracy, not opinion. However, if one happens to believe it is true, then it should not be difficult to understand why one might prefer to see less of the described behavior.
I don’t think high-effort posts are more likely to contain muddled thinking, but I do think readers are less likely to notice muddled thinking when it appears in high-effort posts, so suppressing criticism of high-effort posts is especially dangerous.
Sure, that’s easy: apply the discouragement (downvotes, critical comments, etc.) only to low-value things, and not to high-value things.
Or are you suggesting that you (or, perhaps, Less Wrong participants in general?) can’t tell the difference between low-value things and high-value things?
“Virtue” here means “whatever we take to be good and desirable, and that which produces those things”. We are measuring it because it is, by definition, the thing we want to be measuring.
[emphasis mine]
Did you mean to write “high-effort”, in place of the bolded part? (If, however, you meant what you wrote, then I don’t understand what you’re trying to say, here; please explain.)
I mean whatever the OP means when he talks about adverse effects, etc.
Not that often, sadly. (Here’s an example. Here’s another. Here’s one which is three short sentences. Here’s another one-liner. This one is two sentences. This one is also two sentences. Another one-liner.) It’s hard to do this sort of thing well; it’s easier to write a long, rambling comment. That is exactly why such density of refutation should be encouraged, not discouraged; because it is highly desirable, but difficult (and thus rare).
Yes, we should discourage low-quality criticism which is wrong, and encourage high-quality criticism which is right. (I already said this, in the grandparent.) Having accounted for this, it makes no sense at all to prefer longer critical comments to shorter ones. (Quite the opposite preference would be sensible, in fact.)
I think that compared to high-effort criticisms, low-effort criticisms are much more likely to be based on misunderstandings or otherwise low quality. I interpret Lionhearted as saying that criticism should, on the margin, be held to a higher bar than it is now.
What is the evaluation of “effort” even doing here? Why not just evaluate whether the criticism is high-quality, understands the post, is correct, etc?
Requiring “effort” (independent of quality) is a proof-of-work scheme meant to tax criticism.
Proof-of-work was originally invented to fight email spam. The analogous argument plausibly applies here: evaluating quality (e.g., letting a human read the email to decide what to do with it, trying to figure out whether a criticism actually makes sense) is costly, so it’s more efficient to first filter using a cheaper-to-evaluate signal/proxy like work/effort. (I don’t think this is the OP’s argument though, which is based more on low effort criticism feeling unpleasant or discouraging to some post authors. I’m kind of going off on a tangent based on your mention of “proof of work”.)
ETA: I recalled that I actually wrote a post about this: Think Before You Speak (And Signal It).
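Since proof-of-work came up: here is a toy, hashcash-style sketch of the idea (my own illustration, not anything proposed in the thread). The point is the asymmetry that makes it work as a spam filter: producing the stamp is costly, verifying it is cheap.

```python
import hashlib

def proof_of_work(message, difficulty=2):
    """Find a nonce such that sha256(message + nonce) starts with
    `difficulty` zero hex digits -- costly to produce, cheap to verify."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

def verify(message, nonce, difficulty=2):
    # Verification is a single hash, regardless of how long the search took.
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce, digest = proof_of_work("hello")
print(verify("hello", nonce))  # True
```

The analogy to “effort as a proxy” is that the receiver never has to evaluate the message itself to apply the filter; they only check the cheap signal.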
As clone of saturn points out elsethread, criticism of high-effort posts is especially valuable.
Now, at first glance this may seem orthogonal to what you said—which was about how much effort went into the criticism, rather than how much went into the post—but note that evaluation of a criticism as “low-effort” is relative. If I write a very well-researched and lengthy post, and you write the median comment (in effort, length, etc.) in reply, that is “low-effort” relative to the post I wrote, yes?
This implies that criticism which is low-effort relative to the post it is responding to, should not only not be held to a higher bar, but in fact that it should be held to a lower bar!
That having been said, it is of course good to discourage bad criticism. But the point is that “how much effort went into this” is simply orthogonal to quality—and this is true for posts as well as comments.
So, we should discourage bad posts, and encourage good ones. We should discourage bad criticism, and encourage good criticism.
We should not, however, encourage high-effort posts merely for being high-effort, nor should we discourage low-effort criticism merely for being low-effort. What matters is results.
And note that the fact that low-effort criticisms are “more likely to be based on misunderstandings or otherwise low quality” is irrelevant. We can, and should, simply judge whether any given comment actually is a misunderstanding, etc., and respond appropriately.
Taken to an extreme, this road can lead to a place where a person thinks that only institutions can produce knowledge.
I tried to write long prose to explain my somewhat malformed fears, but let’s try short and interactive, to get practice on the theory-of-discussions side.
A tropical witch doctor answers methodological questions about what he has in fact been doing, having invented inductive reasoning from scratch by virtue of being the only healer in the known world. A fake doctor, a fraud, assails others’ credibility by emphasizing how Western epistemology has peer review and replications, while the university is struggling with research branches that have not done a single replication study in 10 years.
People give the fraud doctor an easy time and go hard on the witch doctor. Those who declare the witch doctor’s methods to be legitimate still face image problems, and ordinary folk don’t lend them their ear. The bar is high for the witch doctor and low for the fake doctor, and clearing the bar is not significant in some important ways.
It is a distinct skill for the nurse, when the fake doctor prescribes the wrong medicine, to have the audacity to ask whether his level of understanding confirms the appropriateness of the medication. If the nurse didn’t think of the doctor as a doctor, they would be more critical and would let less inappropriate medication pass. “Doctors are so hardworking, we should support them” is not a proper way for the nurse to adjust.
It is a distinct skill to try to evaluate whether the witch doctor has invented a new medicine-making process without knowing, from chemistry, what the active ingredient is. “Religion is ineffective hopeful thinking” is not a good adjustment. The nurse’s skill and the witch-doctor listener’s skill are linked.
Meanwhile, “hey, we have all these cool standard research practices, hope you get excited about them” is an important sell toward the witch doctor. And “you did a really good job listening to the psychological trauma the patient was describing when checking in for a broken ankle” is an important acknowledgement toward the fake doctor.
If scientists believe that you need to be a scientist to have a chance of proving a scientist wrong, then science becomes immune to outside knowledge. It’s dangerous to posit that some party has a monopoly on epistemological competency. If you believe so little in one-liners that you stop hearing them, your long-liner sources had better have a very fair and representative distribution. If you do not believe a witch doctor could have done inductive experiments, you don’t believe in inductive reasoning; you believe in dominant cultures. If you do not believe that a doctor could be mistaken, you don’t believe in empiricism but in authoritative revelation.
Darn, still pretty long. And I think this has the additional property that a lot is left to the imagination, which saves on reading time but uses up imagination/decryption time.
Imagine you’re in a human gatherer tribe. You go hunt rabbits for a week; a tribe member sits around the campfire for a week. You both have 0 rabbits at the end of the week. It would seem that “at least you tried” is meaningful. I get that someone who hunts for only one day and gets 1 rabbit is better than both of you. And it might be that you are producing “expected rabbits” even if not actual rabbits.
You can’t eat “expected rabbits”.
But you do want to incentivize expected rabbits.
No, you want to incentivize rabbits, not expected rabbits. Trying to incentivize expected rabbits is mixing levels. Incentivize rabbits, and people will attend to expectation of rabbits themselves.
I… am confused?
Like, just because there is any chance of a variable becoming disconnected in a principal-agent problem doesn’t mean that it’s always a bad idea to incentivize intermediary metrics. I am not fully sure how to understand your point as anything besides “never incentivize any lead-metrics whatsoever, only ever incentivize successful output”, which seems like a recipe for sparse reward landscapes, and also not a common practice in almost any domain in which humans deal with principal-agent problems.
Your employer pays you if you show up for work, not only if you successfully get work done (at least on the day-to-day or month-to-month level). You pay your plumber if they show up, not only if they successfully fix your toilet.
Like, if you see a friend taking an action that you know and they know has a 50% chance of making $10 for you and your friend (let’s say for a communal club) and a 50% chance of losing $5, and then turns out they lose $5, then it seems better to still reward your friend for taking that action, instead of punishing them, given that you know the action had positive expected value.
(assuming you have mostly linear value of money at these stakes)
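For concreteness, the arithmetic of the example above can be sketched out (the 50/50, +$10/−$5 numbers are the hypothetical ones from the comment): a reward proportional to the realized result still rewards the positive-EV action in expectation, even though it punishes it in the losing branch.

```python
# Hypothetical numbers from the comment above: the friend's action
# has a 50% chance of +$10 and a 50% chance of -$5.
outcomes = [(0.5, 10.0), (0.5, -5.0)]

expected_value = sum(p * payoff for p, payoff in outcomes)
print(expected_value)  # 2.5

# If the reward is simply proportional to the realized result
# (reward = k * payoff), the expected reward is k * EV: the
# positive-EV action is rewarded in expectation even though it
# is punished in the losing branch.
k = 1.0
expected_reward = sum(p * k * payoff for p, payoff in outcomes)
print(expected_reward)  # 2.5
```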
If they think the odds are 90% $10 and 10% −$5, and you think the odds are 10% $10 and 90% −$5, should you reward them for trying to benefit the club, or punish them for having wrong beliefs that materially matter?
You should punish your friend for the loss, and reward them (twice as much) for a win. This creates the correct incentives.
No, because humans are risk-averse, at least in money terms, but also in most other currencies. If you do this, you increase the total risk for your friend, for no particular gain.
Punishment is also usually net-negative, whereas rewards tend to be zero-sum, so by adding a bunch of worlds where you added punishments, you destroyed a bunch of value, with no gain (in the world where you both have certainty about the payoff matrix).
One model here is that humans have diminishing returns on money, so in order to reward someone 2x with dollars, you have to pay more than 2x the dollar amount, so your total cost is higher.
A scenario with zero-sum actions and net-negative actions can only go downhill. This would seem to imply that if you have an opportunity to give feedback or not give feedback, you should opt to get a guaranteed zero rather than risk destroying value.
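The diminishing-returns model can be made concrete with a toy concave utility function (log utility and the specific dollar amounts are my own assumptions, purely for illustration): delivering twice the utility gain of a $50 reward costs more than $100.

```python
import math

def utility(wealth):
    # Toy concave utility: diminishing returns on money.
    return math.log(wealth)

base_wealth = 100.0
reward = 50.0

# Utility gain from a single $50 reward on top of $100 of wealth.
gain = utility(base_wealth + reward) - utility(base_wealth)

# Dollars needed to deliver 2x that utility gain:
# solve log((base + x) / base) = 2 * gain  =>  x = base * (exp(2 * gain) - 1)
x = base_wealth * (math.exp(2 * gain) - 1)
print(round(x, 6))  # 125.0 -- more than twice the $50 reward
```

So under this model, a reward scheme that swings between large wins and large losses is strictly more expensive to fund than one that pays out steadily, which is the stated cost of high-variance reward landscapes.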
Could you elaborate on this? I’m not at all sure what this is referring to.
Rewards are usually a transfer of resources (e.g. me giving you money), which tend to preserve total wealth (or status, or whatever other resource you are thinking about).
Unilateral punishments are usually not transfers of resource, they are usually one party imposing a cost on another party (like hitting them with a stick and injuring them), in a way that does not preserve total wealth (or health, or whatever other resource applies to the situation).
You certainly shouldn’t hit your friend with a stick if he loses $5 of your club’s money. I think this is fairly obvious, and it seems quite improbable that you were assuming that I was suggesting any such thing. So, given that we can’t possibly be talking about injuring anyone, or doing any such thing, how can your point about net-negative punishment apply? The more sensible assumption is that the punishment is of the same kind as the reward.
I think social punishments usually have the same form, where rewards tend to be more of a transfer of status, and punishments more of a destruction of status (two people can destroy each other’s reputations with repeated social punishments).
There is also the bandwidth cost of punishment, as well as the simple fact that giving people praise usually comes with a positive emotional component for the receiver (in addition to the status and the reputation), whereas punishments usually come with an addition of stress and discomfort that reduces total output for a while.
In either case, I think the simpler case is made by simply looking at the assumption of diminishing returns in resources and realizing that the cost of giving someone a reward they care 2x about is usually larger than the cost of giving the reward twice, meaning that there is an inherent cost to high-variance reward landscapes.
If you show up, but don’t get work done, you get fired. (How quickly that happens varies from workplace to workplace, of course—but in many places it happens very quickly indeed.)
Yeah, but the fact that it takes a while and we have monthly wages instead of just all being contractors that are paid by the piece is kind of my point. Most of the economy does not pay for completed output, but for intermediary metrics that allow a much higher-level of stability.
But note that even if you don’t get fired immediately for failing to produce satisfactory work, you are likely to receive a dressing-down from your boss, poor evaluations, etc., or even something so simple as your team leader being visibly disappointed with you, even if they take no immediate action.
Now consider what that analogizes to, in the case at hand. Is a downvote, or a critical comment, more like being fired, or more like your boss telling you that your work isn’t up to par and that you should really try to do better?
I think it’s sort of like your boss telling you your work isn’t good, when your boss also isn’t paying you and you’re there as a volunteer.
If your boss isn’t paying you, then what’s the point of the employment analogy? That’s not employment at all, is it?
… what? Of course you only pay your plumber if they successfully fix your toilet!
My experience is definitely the opposite. Random Quora question also suggests that it’s common practice in plumbing to pay someone for the attempt, not for the solution. As someone who recently hired plumbers and electricians to fix a bunch of stuff in a new house we rented, this also matches with my experience. Not sure where your experience comes from.
In general, most contractors bill by the hour, not for completed output, and definitely not “output that the client thinks is worth it”, at least in my experience (there are obviously exceptions, though I found them relatively rare).
My experience comes from the same sort of thing: having, on many occasions, hired various people to do various sorts of work; and also from having worked for several years working at a computer store that specialized in on-the-premises repair/service.
The Quora answer you linked doesn’t really support your point, as it’s quite clear about the prerequisite being an informed, explicit agreement between plumber and customer that the latter will pay the former regardless of outcome. (And even with that caveat, some of what the answer-giver says is suspect, and is not consistent with my experience.)
I do not know of any industry in which contractor agreements with variable payments that are dependent on the quality of the output are common practice. There is often an agreement on what it means to “complete the work” but in almost any case both your downside and your upside are limited by a guaranteed upfront payment, and a conditional final payment. But it’s almost never the case that you can get 2x the money depending on the quality of your output, which seems like a necessary requirement for some of the incentive schemes you outlined.
What does this have to do with anything? You originally said:
I don’t see the connection between “should you pay your plumber even if they don’t actually fix your toilet” and “should you pay your plumber twice as much if they fix your toilet twice as well”; the latter seems like a nonsensical question, and unrelated to the former.
(someone else downvoted. I was sort of torn between downvoting because it seemed importantly wrong, but upvoting for stating the disagreement clearly, ended up not voting either way. Someday we’ll probably have disagree reacts or something)
This isn’t intrinsically wrong, but it is basically wrong in many circumstances. I’m guessing this is a fairly important crux of disagreement.
The problem comes if either:
a) you actively punish attempts that had a high expectation of rabbits but didn’t produce rabbits. This just straightforwardly punishes high-variance strategies. You need at least some people pursuing low-variance strategies, because otherwise, in a week where nobody brings home any rabbits, everyone dies. But if you punish high-variance strategies whenever they are employed, you’re going to end up with a lot fewer rabbits.
[there might be a further disagreement about how human psychology works and what counts as punishing]
b) you systematically incentivize legibly producing countable rabbits, in a world where it turns out a lot of value wasn’t just about the rabbits, or where some rabbits were actually harder to notice. I think one of the major problems with goodhart in the 20th century comes from the expectation of legible results.
Figuring out how to navigate the tension between “much of value isn’t yet legible to us and overfocus on it is goodharting / destroying value” and “but, also, if you don’t focus on legible results you get nonsense” is, in my frame, basically the problem.
You should neither reward nor punish strategies or attempts at all, but results. If I am executing a high-variance strategy, and you punish poor results, and reward good results, in accordance with how poor/good they are, then (if I am right about my strategy having a positive expectation) I will—in expectation—be rewarded. This will incentivize me to execute said strategy (assuming I am not risk-averse—but if I am, then I’m not going to be the one trying the high-variance strategy anyway).
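The risk-aversion caveat in that parenthetical can be quantified with a toy example (the sqrt utility function and the numbers are my own assumptions, not from the thread): by Jensen’s inequality, a risk-averse agent values a result-proportional reward lottery at strictly less than its expected value, which is why result-only rewards under-incentivize high-variance strategies for such agents.

```python
import math

# Toy high-variance strategy: 50% chance it produces a result worth 10,
# 50% chance it produces nothing. Reward is proportional to the result.
lottery = [(0.5, 10.0), (0.5, 0.0)]

expected_reward = sum(p * r for p, r in lottery)  # 5.0

# A risk-neutral agent values the lottery at its expectation.
# A risk-averse agent (concave utility; sqrt here) values it lower.
def u(x):
    return math.sqrt(x)

expected_utility = sum(p * u(r) for p, r in lottery)
certainty_equivalent = expected_utility ** 2  # inverse of sqrt

print(expected_reward)                      # 5.0
print(round(certainty_equivalent, 6))       # 2.5 -- worth half as much to them
```

In other words, a risk-neutral agent would accept this reward scheme whenever the strategy has positive expectation, while the risk-averse agent needs the expected reward to clear a higher bar.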
I was talking about rabbits (or things very similar to rabbits). I made, and make, no guarantees that the analysis applies when analogized to anything very different. (It seems clear that the analysis does apply in some very different situations, and does not apply in others.) Reasoning by analogy is dangerous; if we propose to attempt it, we need to be very clear about what the assumptions of the model are, and how the situations we are analogizing differ, and what that does to our assumptions.
This statement is presented in a way that suggests the reader ought to find it obvious, but in fact I don’t see why it’s obvious at all. If we take the quoted statement at face value, it appears to be suggesting that we apply our rewards and punishments (whatever they may be) to something which is causally distant from the agent whose behavior we are trying to influence—namely, “results”—and, moreover, that this approach is superior to the approach of applying those same rewards/punishments to something which is causally immediate—namely, “strategies”.
I see no reason this should be the case, however! Indeed, it seems to me that the opposite is true: if the rewards and punishments for a given agent are applied based on a causal node which is separated from the agent by multiple causal links, then there is a greater number of ancestor nodes that said rewards/punishments must propagate through before reaching the agent itself. The consequences of this are twofold: firstly, the impact of the reward/punishment is diluted, since it must be divided among a greater number of potential ancestor nodes. And secondly, because the agent has no way to identify which of these ancestor nodes we “meant” to reward or punish, our rewards/punishments may end up impacting aspects of the agent’s behavior we did not intend to influence, sometimes in ways that go against what we would prefer. (Moreover, the probability of such a thing occurring increases drastically as the thing we reward/punish becomes further separated from the agent itself.)
The takeaway from this, of course, is that strategically rewarding and punishing things grows less effective as the proxy on which said rewards and punishments are based grows further from the thing we are trying to influence—a result which sometimes goes by a better-known name. This then suggests that rewarding/punishing results rather than strategies, far from being a superior approach, is actually inferior: it has lower chances of influencing behavior we would like to influence, and higher chances of influencing behavior we would not like to influence.
(There are, of course, benefits as well as costs to rewarding and punishing results (rather than strategies). The most obvious benefit is that it is far easier for the party doing the rewarding and punishing: very little cognitive effort is required to assess whether a given result is positive or negative, in stark contrast to the large amounts of effort necessary to decide whether a given strategy has positive or negative expectation. This is why, for example, large corporations—which are often bottlenecked on cognitive effort—generally reward and punish their employees on the basis of easily measurable metrics. But, of course, this is a far cry from claiming that such an approach is simply superior to the alternative. (It is also why large corporations so often fall prey to Goodhart’s Law.))
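The corporate-metrics failure mode described above can be sketched as a toy simulation (all functions and numbers here are hypothetical illustrations, not anything from the discussion): an agent rewarded on an easily measurable proxy pours all its effort into gaming the proxy, even when that produces zero actual value.

```python
# Toy illustration of Goodhart's Law: an agent splits a fixed effort budget
# between genuinely useful work and "gaming" a measurable proxy metric.
# All numbers are hypothetical; this is a sketch, not a model of any real firm.

def proxy_metric(useful: float, gaming: float) -> float:
    # The proxy counts both kinds of effort, but gaming scores better per unit.
    return useful + 3.0 * gaming

def true_value(useful: float, gaming: float) -> float:
    # Only useful work creates value; gaming is pure waste.
    return useful

def best_allocation(total_effort: float, objective):
    # The agent picks the split of effort that maximizes whatever it's paid on.
    return max(
        ((u, total_effort - u) for u in [i / 100 * total_effort for i in range(101)]),
        key=lambda split: objective(*split),
    )

# Rewarded on the proxy, the agent puts everything into gaming...
u, g = best_allocation(10.0, proxy_metric)
assert (u, g) == (0.0, 10.0)
# ...which yields zero true value.
assert true_value(u, g) == 0.0

# Rewarded on true value directly, the agent does useful work instead.
assert best_allocation(10.0, true_value) == (10.0, 0.0)
```

The point of the sketch is only the divergence: the proxy and the value agree when effort is honest, and come apart exactly when the agent optimizes hard against the proxy.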
Strange. You bring up Goodhart’s Law, but the way you apply it seems exactly backwards to me. If you’re rewarding strategies instead of results, and someone comes up with a new strategy that has far better results than the strategy you’re rewarding, you fail to reward people for developing better strategies or getting better results. This seems like it’s exactly what Goodhart was trying to warn us about.
I agree this is a weird place to bring up Goodhart that requires extra justification. But I do think it makes sense here. (Though I also agree with Said elsethread that it matters a lot what we’re actually talking about – rabbits, widgets, companies, scientific papers and blogposts might all behave a bit differently)
The two main issues are:
it’s hard to directly incentivize results with fuzzy, unpredictable characteristics.
it’s hard to directly incentivize results over long timescales
In short: preoccupation with “rewarding results”, in situations where you can’t actually reward results, can result in goodharting for all the usual reasons.
Two examples here are:
Scientific papers. Probably the closest to a directly relevant example before we start talking about blogposts in particular. My impression [epistemic status: relatively weak, based on anecdotes, but it seems like everyone I’ve heard talk about this roughly agreed with these anecdotes] is that the publish-or-perish mindset for academic output has been a pretty direct example of “we tried directly incentivizing results, and instead of getting more science we got shittier science.”
Founding Companies. There are ecosystems for founding and investing in companies (startups and otherwise), which are ultimately about a particular result (making money). But, this requires very long time horizons, which many people a) literally can’t pull off, because they don’t have the runway, and b) being often risk averse, might not be willing to attempt if they had to risk just their own money.
The business venture world works because a lot of infrastructural effort has been put into enabling particular strategies: in the case of startups, an established pipeline of seed funding, Series A, and so on; in the case of more conventional businesses, sometimes more straightforward loans.
The relation to goodhart is a bit weirder here, because yeah, overfocus on “known strategies” is also one of the pathologies that results in goodharting (i.e. everyone thinks social media is Hype, so everyone founds social media companies, but maybe by this point social media is overdone and you actually need to be looking for weirder things that people haven’t already saturated the market with).
But the goodhart here is instead: “if you don’t put effort into maintaining the early stages of the strategy, despite many instances of that strategy failing… you just end up with less money.”
My sense [again, epistemic status fairly weak based on things Paul Graham said, but that I haven’t heard explicitly argued against] is venture capitalists make the most money from the long tail of companies they invest in. Being willing to get “real results” requires being willing to tolerate lots of things that don’t pay off in results. And many of those companies in the long tail were startup ideas that didn’t sound like sure bets.
There is some sense in which “directly rewarding results” is of course the best way to avoid goodharting, but since we don’t actually have access to “direct results that actually represent the real thing” to reward, the impulse to directly reward results can often result in rewarding not-actually-results.
Sure, that all makes sense, but at least on LW it seems like we ought to insist on saying “rewarding results” when we mean rewarding results, and “deceiving ourselves into thinking we’re rewarding results” when we mean deceiving ourselves into thinking we’re rewarding results.
That makes sense, although I’m not actually sure either “rewarding results” or “deceiving ourselves into thinking we’re rewarding results” quite capture what’s going on here.
Like, I do think it’s possible to reward individual good things (whether blogposts or scientific papers) when you find them. The question is how this shapes the overall system. When you expect “good/real results” to be few and far between, the process of “only reward things that are obviously good and/or great” might technically be rewarding results, while still outputting fewer results on average than if you had rewarded people for following overall strategies like “pursue things you’re earnestly curious about”, and giving people positive rewards for incremental steps along the way.
(Seems good to be precise about language here but I’m not in fact sure how to word this optimally. Meanwhile, earlier parts of the conversation were more explicitly about how ‘reward final results, and only final results’ just isn’t the strategy used in most of the business world)
Strong upvote for clear articulation of points I wanted to see made.
This part isn’t obviously/exactly correct to me. If we’re talking about posts and comments on LessWrong, it can be quite hard for me to assess whether a given post is correct or not (although even incorrect posts are often quite valuable parts of the discourse). It might also take a lot of information/effort to arrive at the belief that the strategy of “invest more effort, generate more ideas” ultimately leads to more good ideas, such that incentivizing generation itself is good. However, once I hold that belief, it’s relatively easy to apply it. I see someone investing effort in adding to communal knowledge in a way that is plausibly correct/helpful; I then encourage this pro-social contribution, despite the fact that evaluating whether the post was actually correct or not* can be extremely difficult.
*”Correct or not” is a bit binary, but even the overall “quality” or “value” of a post isn’t much easier to assess. Far harder than a number of rabbits. However, if a post doesn’t seem obviously wrong (or even if it’s clearly wrong, but because of an understandable mistake many people might make), I can often confidently say that it is contributing to communal knowledge (often via the discussion it sparks, or simply because someone could correct a reasonable misunderstanding), and that I overall want to encourage more of whatever generated it. I’m happy to get more posts like that, even if I might push for refinements in the process, say.
(Reacts, or separate upvote/downvote vs agree/disagree buttons, will hopefully make it easier in the future to encourage effort even while expressing that I think something is wrong.)
You’re still missing my point.
We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.
You seem to have interpreted my comments as saying that we’re trying to reward some particular behavior, but we should do this by rewarding the results of that behavior. As you point out, this is not a wise plan.
But it’s also not what I am saying, at all. I am saying that we are (or, again, should be) trying to reward the results. Not the behavior that led to those results, but the results themselves.
I don’t know why you’re assuming that we’re actually trying to encourage some specific behavior. It’s certainly not what I am assuming. Doing so would not be a very good idea at all.
I think with that approach there are a great many results you’d fail to achieve. People can get animals to do remarkable things with shaping and I would wager that you can’t do them at all otherwise.
From the Wikipedia article on Shaping (psychology):
Humans are more sophisticated than birds, but producing highly complex and abstruse truths in a format understandable to others is also a lot more complicated than getting a bird to put its beak in a particular spot. I think all the same mechanics are at work. If you want to get someone (including yourself) to do something as complex and difficult as producing valuable, novel, correct, expositions of true things on LessWrong—you’re going to have to reward the predictable intermediary steps.
We don’t go to five year olds and say “the desired result is that you can write fluently, therefore no positive feedback on your marginal efforts until you can do so; in fact, I’m going to strike your knuckles every time you make a spelling error or produce anything which isn’t what we hope to see from you when you’re 12; we will only reward the final desired result, and you can back-propagate from that to figure out what’s good.” That’s really only a recipe for children who are unwilling to put any effort into learning to write, not ones who progressively put in effort over years to learn what it even looks like to be a competent writer.
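The shaping dynamic being argued for here can be illustrated with a toy simulation (a sketch under made-up numbers, not a claim about real learners): a learner practices only while motivated, motivation grows when practice is rewarded and erodes when it isn’t, so withholding all reward until final competence causes the learner to give up long before competence is reached.

```python
import random

# Toy model of shaping: a learner practices only while motivated, and
# motivation rises when progress is rewarded. All numbers are illustrative.

def train(reward_threshold: float, steps: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    skill, motivation = 0.0, 1.0
    for _ in range(steps):
        if motivation <= 0:
            break  # the learner gives up entirely
        if rng.random() < motivation:  # practice only happens when motivated
            skill += 0.01
        if skill >= reward_threshold:  # reward any progress past the threshold
            motivation = min(1.0, motivation + 0.1)
        else:
            motivation -= 0.05         # unrewarded effort erodes motivation
    return skill

# Reward only near-final competence: motivation collapses before skill builds.
stalled = train(reward_threshold=1.0)
# Reward the intermediate approximations too: the learner keeps practicing.
shaped = train(reward_threshold=0.0)
assert shaped > stalled
```

Under these assumed dynamics, the “reward only the final result” learner never gets within reach of the reward, so the reward never influences anything; the shaped learner accumulates skill throughout.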
This is beyond my earlier point that verifying results in our cases is often much harder than verifying that good steps were being taken.
See the “Edit:” part of this comment, which is my response to your comment also.
I’m afraid this sentence doesn’t parse for me. You seem to be speaking of “results” as something to which the concept of rewards and punishments is applicable. However, I’m not aware of any context in which this is a meaningful (rather than nonsensical) thing to say. All theories of behavior I’ve encountered that make mention of the concept of rewards and punishments (e.g. operant conditioning) refer to them as a means of influencing behavior. If there’s something else you’re referring to when you say “reward or punish the results”, I would appreciate it if you clarified what exactly that thing is.
I don’t see what could be simpler. Alice does something. That action has some result. We reward Alice, or punish her, based on the results of her action. There is nothing unusual or obscure here; I mean just what I say.
(There are cases where we do not want to take this approach, but they tend to both be controversial and to be unusual in certain important respects.)
Edit: And if you’re trying to use operant conditioning, of all things, to decide what social norms to have on a forum devoted to the art of rationality, then you’ve already admitted defeat, and this entire project is pointless.
But, of course, everyone is risk averse in almost every resource. Even the most ambitious startup founders are still risk averse in total payment, just less so than others. I care less about my 10th million dollar than any of my first 9 million dollars, which already creates risk aversion. The same is true for status or almost any other resource with which you might want to reward people.
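The “10th million vs. first 9 million” point is just concavity of the utility function. A quick check (using logarithmic utility purely as an illustrative assumption, not a claim about anyone’s real preferences) shows both halves of the argument: each marginal million is worth less than the last, and that alone makes a fair gamble worth less than its expected value.

```python
import math

# Concave (here: logarithmic) utility as a toy model of risk aversion.
# Log utility is an illustrative assumption only.
def utility(wealth_millions: float) -> float:
    return math.log(1.0 + wealth_millions)

# The marginal utility of each additional million shrinks...
first_million = utility(1) - utility(0)
tenth_million = utility(10) - utility(9)
assert first_million > tenth_million

# ...which makes a 50/50 gamble worth less than its expected value of 5:
expected_utility = 0.5 * utility(0) + 0.5 * utility(10)
assert expected_utility < utility(5)  # the certainty equivalent sits below the mean
```

The same arithmetic goes through for any strictly concave utility function, and for status or any other resource with diminishing marginal value.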