Curated.
(It seemed important that Habryka not be the one to curate this piece, since he had commissioned it. But I independently quite liked it)
Several things I liked about this post:
It told me some concrete things about remote teams. In particular:
the notion that you should either go “fully remote” or “not remote”
the notion that the benefits of co-locating drop off after a literal radius which extends 30m.
It gave me some sense of how good the evidence on remote teams is (i.e. not very), while providing a bunch of links to follow up on if I wanted to get an even better sense.
LessWrong currently doesn’t feel like it rewards serious scholarship as much as it should, so I’d like to generally reward it when it happens. I also think this post did a good job of combining short, easily readable takeaways with the more extensive background literature.
Object-level Musings on Peer Review
Note: the following are my personal best guesses about directions LW should go. Habryka disagrees significantly with at least some of the claims here, both on the object and meta levels.
This post also jumped out significantly as… aspiring to higher epistemic standards than the median curated post. This led me to think about it through the lens of peer review (which I have previously mused about).
I ultimately want LessWrong to encourage extremely high quality intellectual labor. I think the best way to go about this is through escalating positive rewards, rather than strong initial filters.
Right now our highest reward is getting into the curated section, which… just isn’t actually that high a bar. We only curate posts if we think they are making a good point. But if we set the curated bar at “extremely well written and extremely epistemically rigorous and extremely useful”, we would basically never be able to curate anything.
My current guess is that there should be a “higher than curated” level, and that the general expectation should be that posts should only be put in that section after getting reviewed, scrutinized, and most likely rewritten at least once. Still, there is something significant about writing a post that is at least worth considering for that level.
This post is one of a few in the past few months that I’d be interested in seeing improved to meet that level. (Another recent example is Kaj’s sequence on Multi-Agent-Models.)
I do think it’d involve some significant work to meet that bar. Things that I’m currently thinking of (not highly confident that any of this is the right thing, but showcasing what sort of improvements I’m imagining):
Someone doing some epistemic spot checks on the claims made here
Improving the presentation (right now it’s written in a kind of bare-bones notes format)
Dramatically improving the notes, to be more readable
Improving the diagram of Elizabeth’s model of productivity so it’s easier to parse.
Orienting a bit more around the “the state of management research is shitty” issue. I think (low confidence) that a good practice for LessWrong, when we review a field and find that the evidence base is very shaky, would be to reflect on what it would take to make the evidence less shaky. (This is beyond scope for what Habryka originally commissioned, but feels fairly important in the context I’m thinking through here.)
Is it worth putting in all that work for this particular post? Dunno, probably not. But it seems worth periodically reflecting on how high the bar would be set, when comparing what LessWrong could ultimately be vs. what it needs to be in practice.
What about getting money involved? Even relatively small amounts can still confer prestige better than an additional tag or homepage section. It seems like rigorous well-researched posts like this are valuable enough that crowdfunding or someone like OpenPhil or CFAR could sponsor a best-post prize to be awarded monthly. If that goes well you could add incentives for peer-review.
Money might do the opposite. “I did all this work and all I got was… several dollars and cents”.
A small amount of money would do the opposite of conferring prestige; it would make the activity less prestigious than it is now.
My impression is that money can only lower prestige if the amount is low relative to an anchor.
For example a $3000 prize would be high prestige if it’s interpreted as an award, but low prestige if it’s interpreted as a salary.
cf. https://en.wikipedia.org/wiki/Knuth_reward_check
What makes this situation unusual is that being acknowledged by famous computer scientist Donald Knuth to have contributed something useful to one of his works is inherently prestigious; the check is evidence of that reward, not itself the reward. (Note that many of the checks do not even get cashed! A trophy showing that you fixed a bug in Knuth’s code is vastly more valuable than enough money to buy a plain slice of pizza.)
In contrast, Less Wrong is not prestigious. No one will be impressed to hear that you wrote a Less Wrong post. How likely do you think it is that someone who is paid some money for a well-researched LW post will, instead of claiming said money, frame the check and display it proudly?
I think you’re viewing intrinsic versus extrinsic reward as dichotomous rather than continuous. Knuth awards are on one end of the spectrum, salaries at large organizations are at the other. Prestige isn’t binary, and there is a clear interaction between prestige and standards—raising standards can itself increase prestige, which will itself make the monetary rewards more prestigious.
I don’t see where Said’s comment implies a dichotomous view of prestige. He simply believes the gap between LessWrong and Donald Knuth is very large.
Sure, but we can close the global prestige gap to some extent, and in the meantime, we can leverage in-group social prestige, as the current format implicitly does.
Can you say more about this? That seems like a very valuable but completely different post, which I imagine would take an order of magnitude more effort than investigation into a single area.
Yeah, there’s definitely a version of this that is just a completely different post. I think Habryka had his own opinions here that might be worth sharing.
Some off the cuff thoughts:
Within scope for something “close to the original post”, I think it’d be useful to have:
clearer epistemic status tags for the different claims.
Which claims are based on out of date research? How old is the research?
Which are based on shoddy research?
What’s your credence for each claim?
More generally, how much stock should a startup founder place in this post? In your opinion, does the state of this research rise to the level of “you should most likely follow this post’s advice?” or is it more like “eh, read this post to get a sense of what considerations might be at play but mostly rely on your own thinking?”
Broader scope, maybe its own entire post (although I think there’s room for a “couple paragraphs version” and an “entire long-term research project” version)
Generally, what research do you wish had existed, that would have better informed you here?
Are there particular experiments or case studies that seemed (relatively) easy to replicate, that just needed to be run again in the modern era with 21st century communication tech?
I find it very hard, possibly impossible, to do the things you ask in this bullet point and synthesis in the same post. If I was going to do that it would be on a per-paper basis: for each paper list the claims and how well supported they are.
This seems interesting and fun to write to me. It might also be worth going over my favorite studies.
Hard because of limitations on written word / UX, or intellectual difficulties with processing that class of information in the same pass that you process the synthesis type of information?
(Re: UX – I think it’d work best if we had a functioning side-note system. In the meantime, something that I think would work is to give each claim a rough classification of “high, medium, or low credence”, including a link to a footnote that explains some of the details.)
Data points from papers can either contribute directly to predictions (e.g. we measured it and gains from colocation drop off at 30m), or to forming a model that makes predictions (e.g. the diagram). Credence levels for the first kind feel fine, but feel like a category error for model-born predictions. It’s not quite true that the model succeeds or fails as a unit, because some models are useful in some arenas and not in others, but the thing to evaluate is definitely the model, not the individual predictions.
I can see talking about what data would make me change my model and how that would change predictions, which may be isomorphic to what you’re suggesting.
The UI would also be a pain.