It seems to me that if you expect the results of your experiment to be useful in, and to generalize to, other situations, then it has to be possible to replicate them. Or to put it another way, if the principle you discovered is useful for more than running the same program with a different seed, shouldn’t it be possible to test it by some means other than running the same program with a different seed?
Certainly. But even if the results are not useful and can’t be generalized to other situations, it’s probably still possible to replicate them in a way that’s slightly different from running the same program with a different seed. (E.g., you could run the same algorithm on a different environment that was constructed to be the kind of environment that algorithm could solve.) So replicability wouldn’t work as a test to distinguish useful results from non-useful ones.
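To make that concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the `hill_climb` algorithm, the three reward functions, the 100-position line); it is not taken from any particular experiment. A greedy local search "replicates" across seeds on its original environment and on a second environment built to be equally easy for it, yet that says little about how it does on an environment that wasn't built with it in mind.

```python
import random

SIZE = 100  # positions 0..99 on a line (hypothetical toy setting)


def hill_climb(reward, seed):
    """Greedy local search: the seed only picks the starting position."""
    x = random.Random(seed).randrange(SIZE)
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < SIZE]
        best = max(neighbors, key=reward)
        if reward(best) <= reward(x):
            return x  # no strictly better neighbor: stop here
        x = best


def success_rate(reward, seeds):
    """Fraction of seeds for which the search ends at a global maximum."""
    best_value = max(reward(x) for x in range(SIZE))
    wins = sum(reward(hill_climb(reward, s)) == best_value for s in seeds)
    return wins / len(seeds)


def original_env(x):
    # Single peak at 30: the setting the "result" was first observed in.
    return -(x - 30) ** 2


def constructed_env(x):
    # Different shape and peak, but deliberately built to have a single peak too.
    return -abs(x - 70)


def unrelated_env(x):
    # Many local peaks (every 10 steps); only the one at 55 is the global best.
    return -abs(x % 10 - 5) + (3 if 50 <= x < 60 else 0)


seeds = range(200)
rate_original = success_rate(original_env, seeds)
rate_constructed = success_rate(constructed_env, seeds)
rate_unrelated = success_rate(unrelated_env, seeds)

print("original env:   ", rate_original)
print("constructed env:", rate_constructed)
print("unrelated env:  ", rate_unrelated)
```

The first two rates come out at 1.0 regardless of seed, because both environments have a single peak, while the third is well below that, since most starting positions get stuck on a local peak. So "it replicated on a different environment" is satisfied here without the result being useful in general.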
Relevant recent Andrew Gelman blog post