The third one was a typo which I just fixed. I have also changed it to use “base policy” everywhere to be consistent, although this may change depending on what terminology is most common in an ML context, which I’m not sure of.
The third one was a typo which I just fixed. I have also changed it to use “base policy” everywhere to be consistent, although this may change depending on what terminology is most common in an ML context, which I’m not sure of.