rogersbacon comments on Randomness in Science

rogersbacon 9 Dec 2021 14:31 UTC
1 point
Your point is well taken, and we should definitely keep in mind that randomness can also create perverse incentives and can easily be overdone. However, I would argue that there is virtually no randomness in science now and ample evidence that we are bad at evaluating grants, papers, and applicants, and are generally overly conservative when we do evaluate (see Conservatism in Science for a review). In rare cases, I might advocate for pure randomness but, like you suggest, I think some kind of mixed strategy is probably the way to go in most cases. For example, with grants we can imagine a strategy where a quick review rules out obvious nonsense and then maybe sorts the remaining grants into high-quality and low-quality tiers, with the number of slots allocated to each tier accordingly and winners drawn at random within each tier (you could also just limit people to one submission to get rid of the spamming problem).
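To make the mixed strategy concrete, here is a minimal Python sketch. It assumes the quick review has already happened and has attached a tier label to each surviving proposal. The function name `tiered_lottery`, the tier labels, and the slot counts are all hypothetical, chosen only for illustration; this is not a description of any real funder's process.

```python
import random


def tiered_lottery(proposals, slots_per_tier, seed=None):
    """Draw winners at random within each quality tier.

    proposals: list of (applicant, tier) pairs that passed the quick
        screen; tier is a label such as "high" or "low" (hypothetical).
    slots_per_tier: dict mapping tier label -> number of funded slots.
    """
    rng = random.Random(seed)

    # One submission per applicant: only the first entry seen counts.
    seen = set()
    by_tier = {}
    for applicant, tier in proposals:
        if applicant in seen:
            continue
        seen.add(applicant)
        by_tier.setdefault(tier, []).append(applicant)

    # Within a tier, every surviving proposal has the same chance.
    winners = {}
    for tier, slots in slots_per_tier.items():
        pool = by_tier.get(tier, [])
        winners[tier] = rng.sample(pool, min(slots, len(pool)))
    return winners


# Example: more slots for the high tier than for the low tier.
proposals = [("ana", "high"), ("bo", "high"), ("cy", "low"),
             ("di", "low"), ("ana", "low")]  # second "ana" entry is dropped
print(tiered_lottery(proposals, {"high": 2, "low": 1}, seed=0))
```

Giving the high tier more slots than the low tier keeps some weight on reviewer judgment, while the random draw inside each tier avoids pretending that reviewers can rank proposals finely.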
A few examples of us being bad at evaluating things:
“I just did a retrospective analysis of 2014 NeurIPS … There was no correlation between reviewer quality scores and paper’s eventual impact.”
https://twitter.com/lawrennd/status/1406380063596089346
“Analysing data from 4,000 social science grant proposals and 15,000 reviews, this paper illustrates how the peer-review scores assigned by different reviewers have only low levels of consistency (a correlation between reviewer scores of only 0.2).” From: Are peer-reviews of grant proposals reliable? An analysis of Economic and Social Research Council (ESRC) funding applications
For hiring decisions, it might be even worse—is this person truly a better scientist or did they just happen to land in a more productive research lab for their PhD? Will this person make a better graduate student or did they just go to a better undergraduate college? I would advocate for a threshold (we are fine with hiring any of these people) and then randomness in some hiring situations.
- gwern 9 Dec 2021 17:17 UTC
  11 points
  Parent
  One good question would be what kinds of randomness are useful. “Greatness cannot be planned”, but there’s still a lot of different plans going on. Obviously, there are countless ways to ‘add randomness to science’, differing in how much randomness (both in distribution and size of said distribution: do we want ‘randomness’ which looks more like normal noise or are heavy tails key?), what level the randomness is applied at (inside an experiment, the experiment, the scientist, theories of subject, the subject, individual labs or colleges, community, country...?), how many times it’s applied and so on. In evolutionary computation, for example, how and how much randomness you use is practically the entire area of research: how much do you mutate individuals, how many populations, how do you intermix the populations, how do you reintroduce old mutants, how hard do you select post-mutation, and if you don’t tune this right, it may not work at all, while a well-tuned solution will rapidly home in on a diversity of excellent results. We often observe that the solutions found by genetic algorithms, or NNs, or cats, are strange, perverse, unexpected, and trigger a reaction of ‘how did it come up with that?’; one reason is just that they are very thorough about exploring the possibility space, where a human would have long since gotten bored, said “this is stupid”, and moved on: it was stupid, but it wasn’t stupid enough, and if they had persisted long enough, it would’ve wrapped around from idiocy to genius. Our satisficing nature undermines our search for truly novel solutions; we aren’t inhumanly patient enough to find them.
  There are also many examples of people solving problems they didn’t know were supposed to be hard, like the famous Dantzig one, but it’s been noted that just knowing that a problem has been solved is sometimes enough to trigger a new solution (eg the critical mass of an atomic bomb: the Nazi scientists ‘knew’ it was big, but once they heard about Hiroshima, they were immediately able to fix their mistake; Chollet claims that in Kaggle competitions, merely seeing a competitor jump is enough to trigger a wave of improvements, even without knowing anything else). The weird part about this trick is, as Manuel Blum notes, “you can always give it to yourself”, as a cheap motivational hack well worth one’s while… so why don’t we?
  
  This all sounds like classic explore vs exploit territory: most scientists are doing mostly just epsilon-greedy-style exploration where one knob is, fairly arbitrarily, tweaked at random, whereas a lot of progress comes from bold giant leaps into the unknown by a marginal thinker or theory. ‘Deep exploration’ to borrow a DRL term: not jittering one action at a time inside episodes, but constructing an agent with a ‘hypothesis’ about the environment, and letting it explore deeply to the end of the game, possibly discovering something totally new. Tweaking a good strategy usually produces a worse strategy; and averaging two good strategies, a horrible strategy—like tossing a hot grilled steak and a scoop of ice cream into a blender, two delicious flavors that decidedly do not go great together. (We probably don’t want to randomize scientists’ brains so that some are convinced that the earth is flat: that’s too random. It has to be more targeted. Imagine if you could copy Einstein and brainwash each copy: one copy is utterly irrationally convinced that the ether exists, and the other is equally fanatically convinced that it doesn’t exist; send them off for a decade, then force them into an adversarial collaboration where they generate their best predictions and a decisive experiment, and the physics community evaluates the results. And you could do this for every research topic. Things might go a lot faster!)
  
  https://www.gwern.net/docs/reinforcement-learning/exploration/index https://www.gwern.net/notes/Small-groups https://www.gwern.net/Timing https://www.gwern.net/Backstop#internet-community-design https://www.gwern.net/reviews/Bakewell#social-contagion
  - rogersbacon 10 Dec 2021 13:56 UTC
    1 point
    Parent
    “We often observe that the solutions found by genetic algorithms, or NNs, or cats, are strange, perverse, unexpected, and trigger a reaction of ‘how did it come up with that?’; one reason is just that they are very thorough about exploring the possibility space”
    Do you have any specific examples in mind here that you are willing to share? None are coming to mind off the top of my head and I’d love to have some examples for future reference.
    - gwern 10 Dec 2021 16:15 UTC
      6 points
      Parent
      https://www.gwern.net/Tanks#alternative-examples wasn’t really intended to compile funny cat stories, but should help you out in terms of perverse creativity like the famous radio circuits.
      - rogersbacon 10 Dec 2021 17:50 UTC
        1 point
        Parent
        Thanks
  - rogersbacon 9 Dec 2021 21:08 UTC
    1 point
    Parent
    Ha, I like the Einstein example! I think about the “bold leaps” thing a lot: we may be in a kind of “epistemic hell” with respect to certain ideas/theories, i.e. all small steps in that direction will seem completely false/irrational (the valley between us and the next peak is deep and wide). Maybe not perfect, but I think the problem of inheritance as you describe it in the Bakewell article fits as an example here. Heredity was much more complex than we thought, and the problem was complicated by the fact that we had lots of wrong but vaguely reasonable ideas that came from essentially mythical figures like Aristotle. The idea that we should study a very simple system and collect huge amounts of data until a pattern emerges and then go from there, instead of armchair theorizing, was kind of a crazy idea, which is why a monk was the one to do it and no one realized how important it was until 40 years later.
    The question is how do we create individuals that are capable of making huge jumps in knowledge space and environments that encourage them to do so. Anything that sounds super reasonable is probably not radical enough (which is why this is so difficult). Like you say, it can’t be too crazy, but we need people who will go incredibly far in one direction while starting with a premise that is highly speculative but not outright wrong. One example might be panpsychism—we need an Einstein who takes panpsychism as brute fact and then attempts to reconstruct physics from there. My own wild offering is that ideas are alive, not in the trivial sense of a meme, but as complex spatiotemporal organisms, or maybe they are endosymbionts that are made of consciousness in the same way we are made of matter (see Ideas are Alive and You are Dead). Before the microscope we couldn’t really conceive how a life form could be that small, maybe there is something like that going on here as well and new tools/theories will lead to the discovery of an entirely new domain of life. Obviously this is crazy but maybe this is an example of the general flavor of crazy we need to explore.
    “…one reason is just that they are very thorough about exploring the possibility space, where a human would have long since gotten bored, said ‘this is stupid’, and moved on: it was stupid, but it wasn’t stupid enough, and if they had persisted long enough, it would’ve wrapped around from idiocy to genius. Our satisficing nature undermines our search for truly novel solutions; we aren’t inhumanly patient enough to find them.”
    One reason that people might persist in something way past boredom or reasonable justification is religious faith or some kind of irrational conviction arising from a spiritual experience. From a different angle, Tyler Cowen also offers some thoughts on why the important thinkers of the future will be religious:
    Third, religious thinkers arguably have more degrees of freedom. I don’t mean to hurt anybody’s feelings here, but…how shall I put it? The claims of the religions are not so closely tied to the experimental method and the randomized control trial. (Narrator: “Neither are the secular claims!”) It would be too harsh to say “they can just make stuff up,” but…arguably there are fewer constraints. That might lead to more gross errors and fabrications in the distribution as a whole, but also more creativity in the positive direction. And right now we seem pretty hungry for some breaks in the previous debates, even if not all of those breaks will be for the better.
    I don’t think Mendel was particularly inspired by his religious faith to study heredity (I might be wrong) but it certainly didn’t stop him and in the broad sense it enabled him to be an outsider who could dedicate extended study to something seemingly trivial. As you pointed towards, being an outsider is crucial if someone is to take these kinds of bold leaps. Among other things, being an insider makes it harder to get past what you described at the end of the Origins of Innovation article:
    Perhaps there is some sort of psychological barrier, where the mind flinches at any suggestion bubbling up from the subconscious that conflicts with age-old tradition or with higher-status figures. Should any new ideas still manage to come up, they are suppressed; “don’t rock the boat”, don’t stand out (“the innovator has for enemies all those who have done well under the old conditions”)
    This is the fundamental reasoning behind an article I wrote that was recently published in New Ideas in Psychology – “Amateur hour: Improving knowledge diversity in psychological and behavioral science by harnessing contributions from amateurs” (author access link). Amateurs can think and do research in ways that professionals can’t by virtue of not facing the incentives and constraints that come with having a career in academia. We identify six “blind spots” in academia that amateurs might focus on: long-term research, interdisciplinary research, speculative research, uncommon or taboo topics, basic observational research, and aimless projects. This led us to write:
    Taken together, our discussion of blind spots highlights one overarching direction in “research-space” that may be especially promising: long, aimless, speculative, and interdisciplinary research on uncommon or taboo subjects. Out of all amateur contributions to sciences so far, Darwin’s achievements may be the primary exemplar of this type of endeavor. As aforementioned, at the time of his departure on the HMS Beagle in 1831 he was an independent scientist—a 22-year-old Cambridge graduate with no advanced publications who had to pay his own way on the voyage (Bowlby, 1990; Keynes & Darwin, 2001). Darwin’s work on evolution certainly took a long time to develop (the Beagle’s voyage took 5 years and he did not publish On the Origin of Species until 23 years after he returned). It was aimless in the sense that he did not set out from the beginning to develop a theory of evolution. His work was highly interdisciplinary (Darwin drew on numerous fields within the biological sciences in addition to geology and economics), was the culmination of a huge amount of basic observational work, and was not necessarily an experimental contribution (though he did make those as well), but primarily theoretical (and sometimes more speculative) in nature. Darwin’s theories were taboo in the sense that they went against the prevailing theological ideas of the time and caused significant controversy (and still do). We speculate that there may one day be a “Charles Darwin of the Mind” who follows a similar path. Indeed, it seems that the state of theorizing in psychology today is at an early stage comparable to evolutionary theorizing at the time of Darwin (Muthukrishna & Henrich, 2019), and the time may be ripe for an equally transformative amateur contribution in psychology. We hope that this paper provides the smallest nudge in this direction.
    I actually just posted about the article here because we mention LessWrong as an example of a community where amateurs make novel research contributions in psychology – “LessWrong discussed in New Ideas in Psychology article”.
    So if I had to guess – the next Darwin/Einstein/Newton will be an amateur/outsider, will be religious or for some other reason hold some weird idea that they pursue to the extreme, and will have some kind of life circumstance that allows them to do this (maybe, like Darwin, they come from money).
    I also touch on this theme in my article “The Myth of the Myth of the Lone Genius”. Briefly, we have put too much cultural emphasis in science on incrementalism, on standing on the shoulders of giants. Sure, most discoveries come from armies of scientists making small contributions, but we need to also cultivate the belief that you can make a radical discovery by yourself if you try really really hard. I also quote you at the beginning of the article.
    “The Great Man theory of history may not be truly believable and Great Men not real but invented, but it may be true we need to believe the Great Man theory of history and would have to invent them if they were not real.”
    - deepthoughtlife 9 Dec 2021 23:59 UTC
      −1 points
      Parent
      I believe you do make one substantial error in this post. It isn’t that academics can’t do it, it’s that they won’t. You see, if you say can’t, you are inherently supposing the incentives can’t be changed, but the structure of these incentives is not fixed as they are now. They can change, and they will change, though likely not in a useful way anytime soon.
      - rogersbacon 10 Dec 2021 13:53 UTC
        1 point
        Parent
        I’m a little confused by what you are referring to here so if you are willing to spell it out I would appreciate it but no worries either way. Many very fascinating ideas in your other comment, I’ll try to respond in a day or two.
- deepthoughtlife 9 Dec 2021 23:54 UTC
  −1 points
  Parent
  I admit that the details of how science works these days are far from my area of expertise. I am neither in science, nor a dedicated layman. I informally experiment with things all the time (as do most intellectual types, I imagine), but not in a rigorous way.
  I agree that people tend to be bad at evaluating things, but it isn’t just biased thinking; there is true randomness in a number of the decisions that go into an evaluation. Both bias and randomness are noise in the signal. I don’t believe impact is a good metric for actual quality (widely cited does not mean that each of those references was actually valuable), though I don’t have something better to replace it with. As far as inter-rater reliability goes, 0.2 does seem quite low, but I’m sure it could be substantially improved (perhaps with teams of professional reviewers instead of simply other scientists in the field? That does, of course, have its own sources of bias, but you can use both).
  I don’t think you can eliminate the spamming problem by only allowing a single entry. If the lotteries become popular, people can enter as many different lotteries as they like with minimal effort, and you wouldn’t want to rule out a person participating in multiple sequential lotteries of yours (unless you knew they were making garbage proposals . . . which the limited review would be much less likely to catch).
  Sometimes random noise is good, such as in simulated annealing (is simulated annealing widely known?), but there they make sure to tamp down the noise quite a bit before doing most of the search for solutions. This is precisely what the solution I suggested would do if you still want randomness. Additionally, this search could be run through a number of times by separate processes, to have a greater chance of finding the signal while still using the noise.
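  For anyone who has not met simulated annealing, here is a minimal Python sketch of the idea, not a tuned implementation. The function names, the geometric cooling schedule, and the toy cost function are my own assumptions for illustration; the acceptance rule is the usual Metropolis one, where a worse candidate is accepted with probability exp(-delta / temperature).

```python
import math
import random


def simulated_annealing(cost, neighbor, start, steps=5000,
                        t_start=1.0, t_end=1e-3, seed=None):
    """Minimize `cost`, accepting worse moves less often as it cools."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for i in range(steps):
        # Geometric cooling (an assumed schedule): the "noise" is
        # large early on and is tamped down toward t_end over time.
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse candidate only
        # with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


def best_of_restarts(runs, **kwargs):
    """Run several independent searches and keep the best result."""
    results = [simulated_annealing(seed=s, **kwargs) for s in range(runs)]
    return min(results, key=lambda r: r[1])


# Toy problem: find x minimizing (x - 3)^2, starting far away.
cost = lambda x: (x - 3) ** 2
neighbor = lambda x, rng: x + rng.uniform(-1, 1)
print(best_of_restarts(3, cost=cost, neighbor=neighbor, start=-10.0))
```

  The `best_of_restarts` helper is the "run it a number of times by separate processes" idea in its simplest form: each run gets its own random seed, and only the best outcome is kept.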
  Another thing that could be borrowed from computer science for science in general is the idea of a depth first search. This is what I think you are really looking for here. In depth first search adapted to science, a scientist would follow their ideas to their conclusions, again and again, before looking around. This would likely ensure that they were well off the beaten path, but could be done rigorously, doing good science along the way. This would likely result in fewer papers, but more impactful ones. This is what many scientists did in the old days, back when scientific progress was much more individual, and progress was faster, but things like ‘publish or perish’ are not very compatible with it.
  The strengths of going depth first are twofold. One, original work would happen by default. Two, there would be much less that each individual scientist would have to keep in mind to advance the state of the art, much like the advantage of ridiculously narrow sub-specialties, but without the narrow focus being so insular. There is simply too much to process to advance the state of the art, and that, I suspect, is largely why people don’t.
  The disadvantages are simple as well. First, a large number of scientists would end up duplicating each other’s work unintentionally . . . but that isn’t a big problem (built-in replication, though very inefficiently so). Second, and much more importantly, a very large fraction of scientists might never actually contribute something of even the slightest value. Some would say that is true now, though I think their contributions just tend to be small. (Depth first search will miss a great many things that are just slightly off the path taken, though a slight variation, depth limited search, would work better at that. It is like a depth first search, but at a preset depth, you stop going deeper and switch to checking out the other possibilities, starting as far down along the path as is new. This is usually implemented recursively. This can be improved further with iterative deepening search, which raises the depth limit and searches again when the search at the current limit fails.) Third, it would require a change in culture (back to valorizing the lone genius, or the small team).
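  Since the three searches are easy to mix up, here is a short Python sketch of them. The function names and the toy tree are hypothetical, and the sketch assumes the space being searched is a tree (no cycles), which real idea-space surely is not.

```python
def depth_limited_search(node, is_goal, children, limit=None, path=None):
    """Depth first search that goes at most `limit` steps below `node`.

    With limit=None there is no cap, which is plain depth first search.
    Returns the path from the start node to a goal, or None.
    """
    path = (path or []) + [node]
    if is_goal(node):
        return path
    if limit == 0:
        return None  # cap reached: back up and try the other branches
    next_limit = None if limit is None else limit - 1
    for child in children(node):
        found = depth_limited_search(child, is_goal, children,
                                     next_limit, path)
        if found is not None:
            return found
    return None


def iterative_deepening_search(root, is_goal, children, max_depth):
    """Retry the depth limited search with a larger limit after each failure."""
    for limit in range(max_depth + 1):
        found = depth_limited_search(root, is_goal, children, limit)
        if found is not None:
            return found
    return None


# Toy tree: A has children B and C; B leads to D, C leads to E.
tree = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
children = lambda n: tree.get(n, [])

# Plain depth first search dives A -> B -> D before ever trying C.
print(depth_limited_search("A", lambda n: n == "E", children))
# Iterative deepening finds the shallow goal C without exploring D.
print(iterative_deepening_search("A", lambda n: n == "C", children, 3))
```

  The trade-off matches the one above: plain depth first search commits fully to one path, the depth limit forces a look at neighboring branches, and iterative deepening repeats some work in exchange for not missing shallow results.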
  For hiring, I don’t think that the prestige of the school/lab is a good signal, but rather the quality of their theories and experimental designs. I have heard ideas that the best way to determine if someone could do a job is to have them do it. You could have them explain one of their theories, and explain how they would test one of yours. If it sounds good, let them try (perhaps with a shorter term contract that can become a long term one). I’ve never hired someone, so I don’t have too much to say about whether my ideas there would be useful.
  - rogersbacon 10 Dec 2021 18:04 UTC
    1 point
    Parent
    You are right about the use of impact as a metric; it is definitely not perfect, and I think both of those sources probably oversell how poor scientific evaluation is in general. Some of the problem is that people are not incentivized to really care that much and they don’t specialize in grant/paper evaluation. The idea of having “professional reviewers” is interesting, but I am not sure how practically achievable it is.
    I hadn’t heard about the idea of depth first search but it is exactly what I am talking about and you explained it very well, thank you for sharing.