Thanks, I finally got it. What I just now fully understood is that the final inequality holds with high π_0^n-probability (i.e., as you say, π_0 is the data), while the learning bound or loss reduction is given for π.