I agree, but I think it’s out of scope for what I’m doing here — the validity and novelty of an attempted contribution can at least in principle be analyzed fairly objectively, but the importance seems much fuzzier and more subjective.
The idea of seeking objectivity here is not helpful if you want to contribute to the scientific project. I think Larry McEnerney is good at explaining why that’s the case, but you can also find the same point made in plenty of philosophy and history of science.
If you want to contribute to the scientific project, thinking about how what you’re doing relates to that project is essential.
I’m not sure what you mean by “validity” and whether it’s a sensible thing to talk about. If you try to optimize for some notion of validity instead of optimizing for doing something that’s valuable to scientists, you’re doing something like trying to guess the teacher’s password. You are optimizing for form instead of optimizing for actually creating something valuable.
If you innovate in your method in a way that violates some conventional idea of “validity” but you are providing value, you are doing well. Feyerabend didn’t pick Against Method as a title by accident. When Feynman started using his diagrams, the first reaction of his fellow physicists was that they weren’t “real science”. He ended up getting his Nobel Prize for the work built on them.
As far as novelty goes, the query you are proposing isn’t really a good way to determine it. A better way to check novelty is not to ask “Is this novel?” but “Is there prior art here?” Today, a good way to check that is to run deep research reports. If your deep research request comes back with “I didn’t find anything”, that’s a better signal for novelty than a question about whether something is novel being answered with “yes”. LLMs don’t like to answer “I didn’t find anything” when you let them run a deep research request; they are much more willing to declare something novel when you simply ask them whether it’s novel.
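To make the framing difference concrete, here’s a minimal sketch using the openai Python SDK. The model name and prompt wording are illustrative assumptions, and a real check should go through a deep-research tool that actually searches the literature; a plain chat call like this only demonstrates the two framings:

```python
from openai import OpenAI  # assumes the official openai SDK is installed

client = OpenAI()
idea = "<a short, self-contained description of your idea>"

# Weak framing: a direct "is this novel?" question, which models are
# happy to answer with a flattering "yes".
weak = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user",
               "content": f"Is the following idea novel?\n\n{idea}"}],
)

# Stronger framing: ask for prior art, so an explicit "I didn't find
# anything" carries real signal.
strong = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": ("List any prior art for the following idea, "
                           "with citations. If you can't find any, say "
                           f"so explicitly.\n\n{idea}")}],
)

print(weak.choices[0].message.content)
print(strong.choices[0].message.content)
```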
It’s no good if you change your mind about the meaning of the experiment after you run it :)
Actually, a lot of scientific progress happens that way. You run experiments whose results surprise you. You think about how to explain the results you got, and that brings you a better understanding of the problem domain you are interacting with.
If you want to create something intellectually valuable, you need to go through the intellectual work of engaging with counterarguments to what you are doing. If an LLM provides a criticism of your work, that criticism might or might not be valid. If what you are doing is highly complex, the LLM might not understand what you are doing, and that doesn’t mean your idea is doomed. Maybe you can flesh the idea out more clearly. Even if you can’t, as long as the idea provides value it’s still a good idea.
There’s a saying that the key to a successful startup is to find an idea that looks stupid but isn’t. A startup succeeds when it pursues a path that other people decline to pursue but that turns out to be valuable.
In many cases it’s probably the same for scientific breakthroughs. The ideas behind them go unpursued because the experts in the field believe, on a surface evaluation, that they are not promising.
A lot of the posts you find on r/LLMPhysics, and a lot of rejected LW posts, sound smart on the surface to some lay people (the person interacting with the LLM) but don’t work. LLMs may give the kind of idea that sounds smart to lay people on the surface the benefit of the doubt, while giving the kind of idea that sounds stupid to everyone on a surface evaluation no benefit of the doubt at all.
You can get a PhD in theoretical physics without developing ideas that allow you to make falsifiable predictions.
Making falsifiable predictions is one way to create value for other scientists, but it’s not the only one. Larry brings up the example of “There are 20 people in this classroom” as a theory that can be novel (nobody in the literature has said anything about the number of people in this classroom) and that makes falsifiable predictions (everyone who counts will count 20 people) but that is completely worthless.
Your standard has both problems: people to whom the physics community gives PhDs don’t meet it, and plenty of work that does meet it is worthless.
I think the general principle should be: before you try to contact a researcher with your idea of a breakthrough, let an LLM simulate that researcher’s answer and iterate on the objections the LLM predicts the researcher would raise.
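A minimal sketch of that loop, again assuming the openai Python SDK; the persona prompt, model name, and stopping criterion are all placeholder assumptions:

```python
from openai import OpenAI  # assumes the official openai SDK is installed

client = OpenAI()
pitch = "<your one-page summary of the claimed breakthrough>"

# Role-play the researcher you intend to contact and ask for their
# strongest predicted objections rather than for encouragement.
reply = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": ("You are a skeptical senior researcher in this field. "
                     "Predict, as concretely as you can, how you would "
                     "respond to this unsolicited pitch. Lead with your "
                     "strongest objections.")},
        {"role": "user", "content": pitch},
    ],
)
print(reply.choices[0].message.content)

# Revise the pitch against the predicted objections, rerun, and only
# reach out once the simulated researcher runs out of substantive ones.
```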