A good exposition of the related theorems is in Chapter 6 of Understanding Machine Learning (https://www.amazon.com/Understanding-Machine-Learning-Theory-Algorithms/dp/1107057132).
There are several related theorems. Roughly:
1. The error on real data will be similar to the error on the training set plus epsilon, where epsilon is roughly proportional to sqrt(VC dimension / datapoints), so it shrinks with more data and grows with a richer hypothesis class (see the sketch below). This is the one I linked above.
2. The error on real data will be similar to the error of the best hypothesis in the hypothesis class, plus an epsilon with the same rough proportionality.
3. Special case of 2 – if the true hypothesis is in the hypothesis class, then the absolute error will be < epsilon (since the absolute error is just the difference from the true, best hypothesis).
3 is probably the one you're thinking of, but you don't need the true hypothesis to be in the class; 1 and 2 hold regardless.
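For concreteness, here's a rough sketch of the bound in 1 in its uniform-convergence form, following the shape of the statement in Chapter 6 of the book above (exact constants and log factors vary between versions, so treat this as the general shape rather than the canonical statement):

% With probability at least 1 - \delta over an i.i.d. sample S of size m,
% simultaneously for every hypothesis h in a class H with VC dimension d:
\[
  \bigl| L_{\mathcal{D}}(h) - L_S(h) \bigr|
  \;\le\;
  \sqrt{\frac{d\bigl(\ln(2m/d) + 1\bigr) + \ln(4/\delta)}{m}}
  \;=\; O\!\left(\sqrt{d/m}\right)
\]
% Applying this to both the learned (ERM) hypothesis and the best-in-class
% hypothesis gives 2, with 2*epsilon via the triangle inequality; if the true
% hypothesis is in H, the best-in-class error is 0 and you get 3.

So epsilon scales like sqrt(VC dimension / datapoints): doubling the data tightens the bound by roughly a factor of sqrt(2), while a richer hypothesis class loosens it.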