[Question] How do top AI labs vet architecture/algorithm changes?

How do labs working at or near the frontier assess major architecture and/or algorithm changes before committing huge compute resources to try them out? For example, how do they assess stability and sample efficiency without having to do full-scale runs?