Therefore, if epsilon is small enough (comparable to the probability of hitting an escape message at random), learning becomes extremely slow, and the oracle might hit the escape action purely by chance.
The escape action being randomly called should not be a problem if it is a text string that is only read if r=1, and is ineffectual otherwise...
The string is read with probability 1-ϵ.
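To make the trade-off concrete, here is a rough sketch in Python. It assumes the usual counterfactual-oracle setup: with probability ϵ the output is erased (not read) and a reward is given, and with probability 1-ϵ the output is read and no reward is given. The names `simulate_episodes`, `expected_counts`, and `p_escape` (the chance that an untrained oracle emits the escape string at random) are hypothetical and only serve the illustration.

```python
import random


def simulate_episodes(n_episodes, epsilon, p_escape, seed=0):
    """Count episode outcomes under the assumed setup.

    Assumptions (illustrative, not taken from the thread):
    - with probability epsilon the output is erased and a reward is given,
      so only those episodes provide a learning signal;
    - with probability 1 - epsilon the output is read;
    - an untrained oracle emits the escape string with probability p_escape.
    """
    rng = random.Random(seed)
    counts = {"training": 0, "read": 0, "escape_read": 0}
    for _ in range(n_episodes):
        erased = rng.random() < epsilon
        escape = rng.random() < p_escape
        if erased:
            counts["training"] += 1  # reward given, output never read
        else:
            counts["read"] += 1  # output read, no reward
            if escape:
                counts["escape_read"] += 1  # escape string reaches a reader
    return counts


def expected_counts(n_episodes, epsilon, p_escape):
    """Expected number of training episodes and of escape strings read."""
    return {
        "training": n_episodes * epsilon,
        "escape_read": n_episodes * (1 - epsilon) * p_escape,
    }


if __name__ == "__main__":
    print(expected_counts(1000, 0.01, 0.01))
    print(simulate_episodes(1000, 0.01, 0.01))
```

Under these assumptions, when ϵ and `p_escape` are of the same order, the expected number of training episodes (n·ϵ) is about the same as the expected number of escape strings that get read (n·(1-ϵ)·p_escape), which is the concern raised above.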