Without commenting on the proposal itself, I think the term “eval test set” is clearer for this purpose than “closed source eval”.
agreed