I admit that the details of how science works these days are far from my area of expertise. I am neither a scientist nor a dedicated layman. I informally experiment with things all the time (as do most intellectual types, I imagine), but not in a rigorous way.
I agree that people tend to be bad at evaluating things, but it isn’t just biased thinking; there is true randomness in a number of the decisions that go into an evaluation. Both bias and randomness are noise in the signal. I don’t believe impact is a good metric for actual quality (being widely cited does not mean that each of those citations was actually valuable), though I don’t have something better to replace it with. As far as inter-rater reliability goes, 0.2 does seem quite low, but I’m sure it could be substantially improved (perhaps with teams of professional reviewers instead of simply other scientists in the field? That does, of course, have its own sources of bias, but you could use both.)
I don’t think you can eliminate the spamming problem by only allowing a single entry. A spammer can enter as many different lotteries as they like with minimal effort if lotteries become popular, and you wouldn’t want to rule out a person participating in multiple sequential lotteries of yours (unless you knew they were submitting garbage proposals, which the limited review would be much less likely to catch).
Sometimes random noise is good, such as in simulated annealing (is simulated annealing widely known?), but that algorithm tamps down the noise quite a bit before doing most of its search for solutions. This is precisely what the solution I suggested would do, if you still want randomness. Additionally, the search could be run a number of times by separate processes, to have a greater chance of finding the signal while still using the noise.
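(For the curious, here is a minimal sketch of simulated annealing in Python. The objective function, neighbour step, and cooling schedule are toy choices of mine for illustration, not anything from the discussion above.)

```python
import math
import random

def simulated_annealing(objective, neighbor, start, t_start=10.0, t_end=0.01, cooling=0.95):
    """Minimize `objective`, accepting some worse moves early (high temperature),
    then tamping the noise down as the temperature cools."""
    current = start
    current_cost = objective(current)
    t = t_start
    while t > t_end:
        candidate = neighbor(current)
        candidate_cost = objective(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta < 0 or random.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
        t *= cooling  # lower the temperature: less noise, more exploitation
    return current, current_cost

# Toy usage: find the minimum of a bumpy 1-D function.
f = lambda x: (x - 3) ** 2 + 2 * math.sin(5 * x)
step = lambda x: x + random.uniform(-0.5, 0.5)
best, cost = simulated_annealing(f, step, start=0.0)
```

Running it several times from different starting points, as separate processes would, just means calling it repeatedly and keeping the best result.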
Another thing that could be borrowed from computer science for science in general is the idea of a depth-first search. This is what I think you are really looking for here. In a depth-first search adapted to science, a scientist would follow their ideas to their conclusions, again and again, before looking around. This would likely ensure that they were well off the beaten path, but it could be done rigorously, doing good science along the way. It would likely result in fewer papers, but more impactful ones. This is what many scientists did in the old days, back when scientific progress was much more individual and progress was faster, but things like ‘publish or perish’ are not very compatible with it.
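(For anyone who hasn’t run into the term, here is a minimal depth-first search sketch in Python. The “idea tree” is a made-up example; the point is just the defining behaviour of following one branch all the way down before backing up to look at its siblings.)

```python
def depth_first_search(node, children, is_goal, path=()):
    """Follow one branch to its end before backtracking to siblings."""
    path = path + (node,)
    if is_goal(node):
        return path
    for child in children(node):
        result = depth_first_search(child, children, is_goal, path)
        if result is not None:
            return result
    return None  # this whole branch was a dead end

# Toy "idea tree": each idea leads to a couple of follow-up ideas.
tree = {
    "question": ["hypothesis A", "hypothesis B"],
    "hypothesis A": ["experiment A1"],
    "hypothesis B": ["experiment B1", "experiment B2"],
}
found = depth_first_search(
    "question",
    children=lambda n: tree.get(n, []),
    is_goal=lambda n: n == "experiment B2",
)
# found == ("question", "hypothesis B", "experiment B2")
```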
The strengths of going depth first are twofold. One, original work would happen by default. Two, there would be much less that each individual scientist would have to keep in mind to advance the state of the art, much like the advantage of ridiculously narrow sub-specialties, but without the narrow focus being so insular. There is simply too much to process to advance the state of the art, and that, I suspect, is largely why people don’t.
The disadvantages are simple as well. First, a large number of scientists would end up duplicating each other’s work unintentionally . . . but that isn’t a big problem (built-in replication, though very inefficiently so.) Second, and much more importantly, a very large fraction of scientists might never actually contribute something of even the slightest value. Some would say that is true now, though I think their contributions just tend to be small. (Depth-first search will miss a great many things that are just slightly off the path taken, though a slight variation, depth-limited search, would do better at that: you follow the path as before, but at a preset depth you stop and back up to check the other possibilities, starting as far down along the path as is new. This is usually implemented recursively. It can be improved further with iterative deepening search, which raises the depth limit whenever the search at the current limit fails.) Third, it would require a change in culture (back to valorizing the lone genius, or the small team.)
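(To make that parenthetical concrete, here is a rough Python sketch of depth-limited search and iterative deepening, with a made-up toy tree like the earlier one. The search descends until it hits a preset depth, backs up to try other possibilities, and the iterative-deepening wrapper raises the limit whenever a search at the current limit fails.)

```python
def depth_limited_search(node, children, is_goal, limit, path=()):
    """Like depth-first search, but stop descending once the preset depth is reached."""
    path = path + (node,)
    if is_goal(node):
        return path
    if limit == 0:
        return None  # hit the depth limit; back up and try other possibilities
    for child in children(node):
        result = depth_limited_search(child, children, is_goal, limit - 1, path)
        if result is not None:
            return result
    return None

def iterative_deepening_search(root, children, is_goal, max_depth=10):
    """Repeat depth-limited search with a growing limit until something is found."""
    for limit in range(max_depth + 1):
        result = depth_limited_search(root, children, is_goal, limit)
        if result is not None:
            return result
    return None

# Toy usage on a tiny tree of ideas.
tree = {"question": ["idea A", "idea B"], "idea B": ["follow-up B1"]}
found = iterative_deepening_search(
    "question",
    children=lambda n: tree.get(n, []),
    is_goal=lambda n: n == "follow-up B1",
)
# found == ("question", "idea B", "follow-up B1")
```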
For hiring, I don’t think that the prestige of the school/lab is a good signal; the quality of a candidate’s theories and experimental designs is a better one. I have heard it said that the best way to determine whether someone could do a job is to have them do it. You could have them explain one of their theories, and explain how they would test one of yours. If it sounds good, let them try (perhaps with a shorter-term contract that can become a long-term one). I’ve never hired someone, so I don’t have too much to say about whether my ideas there would be useful.
You are right about the use of impact as a metric: it is definitely not perfect, and I think both of those sources probably oversell how poor scientific evaluation is in general. Some of the problem is that people are not incentivized to really care that much, and they don’t specialize in grant/paper evaluation. The idea of having “professional reviewers” is interesting, but I’m not sure how practically achievable it is.
I hadn’t heard about the idea of depth-first search, but it is exactly what I am talking about, and you explained it very well. Thank you for sharing.