Wouldn’t that be “Optimizing for the output of a grader which evaluates plans”, where one of the items on the grading rubric is “This plan is in-distribution”?
Maybe if you have a good measure of being in-distribution, which itself is a nontrivial problem.
Wouldn’t that be “Optimizing for the output of a grader which evaluates plans”, where one of the items on the grading rubric is “This plan is in-distribution”?
Maybe if you have a good measure of being in-distribution, which itself is a nontrivial problem.