See the edit (especially for your first suggestion): “decide on the length of each episode, and how the outcome is calculated. The Oracle is run once an episode only (and other Oracles can’t generally be used on the same problem; if you want to run multiple Oracles, you have to justify why this would work), and has to get objective/loss/reward by the end of that episode, which therefore has to be estimated in some way at that point.”
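To make the constraint concrete, here is a minimal Python sketch of the episodic setup described in the edit. It is only an illustration under assumptions of mine: the names `Episode`, `Oracle`, `run_episode`, `length_steps`, and `estimate_outcome` are hypothetical and do not come from the post, and the scoring rule is a placeholder for whatever estimate you decide on.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Episode:
    # Hypothetical: how long the episode lasts. In this toy it is only
    # recorded; a real design would use it to decide when scoring happens.
    length_steps: int
    # Hypothetical: how the outcome is estimated once the episode ends.
    estimate_outcome: Callable[[str], float]


class Oracle:
    """Toy stand-in for an Oracle that can be run only once per episode."""

    def __init__(self, answer_fn: Callable[[str], str]) -> None:
        self._answer_fn = answer_fn
        self._used = False
        self.reward: Optional[float] = None

    def ask(self, question: str) -> str:
        # Enforce the "once per episode" rule.
        if self._used:
            raise RuntimeError("Oracle already run in this episode")
        self._used = True
        return self._answer_fn(question)


def run_episode(oracle: Oracle, question: str, episode: Episode) -> Tuple[str, float]:
    answer = oracle.ask(question)
    # The Oracle has to receive its reward by the end of the episode,
    # so the outcome is estimated at that point rather than later.
    reward = episode.estimate_outcome(answer)
    oracle.reward = reward
    return answer, reward
```

A submission would then amount to specifying the question, the episode length, and the `estimate_outcome` rule; asking the same Oracle a second time within the episode raises an error in this sketch.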