Let’s view each accessible action space A(s) as the set of randomized policies over V(A(s)).
It seems worth clarifying that this representation is non-unique: multiple distributions over V(A) can correspond to the same point in A.
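A small sketch of the non-uniqueness, under assumptions that are not in the original: V(A) is taken to be the vertex set of a convex polytope A, A is the unit square, and a distribution over V(A) maps to a point of A by taking its mean. The names `VERTICES` and `mixture_point` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Assumed toy example: A is the unit square, V(A) its four corners.
VERTICES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def mixture_point(weights):
    """Return the point of A induced by a distribution over VERTICES (its mean)."""
    assert sum(weights) == 1 and all(w >= 0 for w in weights)
    x = sum(w * v[0] for w, v in zip(weights, VERTICES))
    y = sum(w * v[1] for w, v in zip(weights, VERTICES))
    return (x, y)


half, quarter = Fraction(1, 2), Fraction(1, 4)

uniform = [quarter] * 4        # equal mass on all four corners
diagonal = [half, 0, 0, half]  # mass only on (0, 0) and (1, 1)

p_uniform = mixture_point(uniform)
p_diagonal = mixture_point(diagonal)

# Two different distributions over V(A), one and the same point of A.
print(p_uniform == p_diagonal, uniform != diagonal)
```

In this toy case both distributions land on the center of the square. If A were a simplex, the weights would be barycentric coordinates and the representation would be unique, so the ambiguity only shows up when A has more vertices than its dimension plus one.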