Twitter points me to an instance of this with T5, Figure 6/Table 9: at the lowest tested level of 64 repeats, there is slight downstream benchmark harm but still a lot less than I would’ve guessed.
Not sure how strongly to take this: those benchmarks are weak, not very comprehensive, and wouldn’t turn up harm to interesting capabilities like few-shots or emergent ones like inner-monologues; but on the other hand, T5 is also a pretty strong model-family, was SOTA in several ways at the time & the family regularly used in cutting-edge work still, and so it’s notable that it’s harmed so little.
Twitter points me to an instance of this with T5, Figure 6/Table 9: at the lowest tested level of 64 repeats, there is slight downstream benchmark harm but still a lot less than I would’ve guessed.
Not sure how strongly to take this: those benchmarks are weak, not very comprehensive, and wouldn’t turn up harm to interesting capabilities like few-shots or emergent ones like inner-monologues; but on the other hand, T5 is also a pretty strong model-family, was SOTA in several ways at the time & the family regularly used in cutting-edge work still, and so it’s notable that it’s harmed so little.