Evaluating the RCT is a chance to train the evaluation-muscle in a well-defined domain with feedback. I’ve generally found that the people who are best at evaluations in RCT’able domains are better at evaluating hard-to-evaluate claims as well.
Often the difficult-to-evaluate domains have ways of getting feedback, but if you’re not in the habit of looking for it, you’re less likely to find creative ways to get data.
I think a much more common failure mode within this community is that we get way overconfident beliefs about hard-to-evaluate domains, because there aren’t many feedback loops and we aren’t in the habit of looking for them.
Evaluating the RCT is a chance to train the evaluation-muscle in a well-defined domain with feedback.
Yep, and I don’t advise people to ignore all RCTs.
I thought about discussing your point when I wrote the OP (along with other advantages of having a community that contains some trivia-collecting), but decided against it, because I suspect EAs and rats tend to misunderstand the nature of this advantage. I suspect most “we need to spend more time on fast-empirical-feedback-loop stuff even if it looks very low-VOI” is rationalizing the mistake described in the OP, rather than actually being about developing this skill.
In particular, if you’re just trying to build skill (rather than replacing a hard question with a superficially related easy one), then I think it’s often actively bad to build this skill in a domain that’s related to the one you care about. EAs and rats IMO should spend more time collecting trivia about physics, botany, and the history of Poland (as opposed to EA topics), insofar as the goal is empiricism skill-building. You’re less liable to trick yourself, then, into thinking that the new data points directly bear on the hard question you care about (the one you’re not currently working on).
I think a much more common failure mode within this community is that we get way overconfident beliefs about hard-to-evaluate domains, because there aren’t many feedback loops and we aren’t in the habit of looking for them.
Maybe? I think the rationality community is pretty good at reasoning, and I’m not sure I could predict the direction of their error here. With EAs, I have an easier time regularly spotting clear errors, and they seem to cluster in a similar direction (the one described in https://equilibriabook.com/toc).
I agree that rationalists spend more time thinking about hard-to-evaluate domains, and that this makes some failures likelier (while making others less likely). But I also see rats doing lots of deep-dive reviews of random-seeming literatures (and disproportionately reading blogs like ACX that love doing those deep dives), exploring lots of weird and random empirical domains out of curiosity, etc.
It’s not clear to me what the optimal level of this is (for purposes of skill-building), or where the status quo falls relative to the optimum. (What percent of LW’s AI-alignment-related posts would you replace with physics lit reviews and exercises?)
I’ve generally found that the people who are best at evaluations in RCT’able domains are better at evaluating hard-to-evaluate claims as well.
Sounds confounded by general cognitive ability.
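To spell out the worry: if general cognitive ability drives both skills, the two can correlate across people even if neither skill does anything to improve the other. A minimal sketch of that confounding pattern (hypothetical numbers and made-up effect sizes; Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent "general cognitive ability" (hypothetical construct, standardized).
g = rng.normal(size=n)

# Assume each skill is driven partly by g and partly by independent noise;
# by construction, neither skill causally affects the other.
rct_eval_skill = 0.6 * g + rng.normal(scale=0.8, size=n)
hard_eval_skill = 0.6 * g + rng.normal(scale=0.8, size=n)

# Raw correlation looks substantial despite there being no direct link...
print(np.corrcoef(rct_eval_skill, hard_eval_skill)[0, 1])  # ~0.36

# ...and vanishes once g is partialled out (we can use the true loading
# of 0.6 here only because this is a simulation where we chose it).
resid_rct = rct_eval_skill - 0.6 * g
resid_hard = hard_eval_skill - 0.6 * g
print(np.corrcoef(resid_rct, resid_hard)[0, 1])  # ~0.0
```

In this toy setup the raw correlation comes entirely from the shared latent factor, which is the pattern the “confounded” objection is pointing at.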