$$r(q) := \max_{(p_1,\dots,p_k)\in\phi_t^{-1}(q)} \sum_{k=1}^{n} q_k \cdot r'_k(p_k).$$
What is $q_k$? Also, we should allow adding some valid reward function of $\tilde{t}$.
The $k$th element of $q$.
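To make the indexing concrete, here is a minimal sketch of the formula for r(q) quoted above. It assumes, purely for illustration, that the preimage of q under ϕ_t is available as a finite list of candidate tuples, that each component p_k is a plain number, and that each r′_k is an ordinary Python function. The names `r`, `preimage`, and `r_primes` are hypothetical and not from the post; in the actual setting the preimage is presumably a continuous set, so the enumeration here would be replaced by an optimizer.

```python
from typing import Callable, Sequence


def r(
    q: Sequence[float],
    preimage: Sequence[Sequence[float]],
    r_primes: Sequence[Callable[[float], float]],
) -> float:
    """Return the max over candidate tuples p of sum_k q[k] * r_primes[k](p[k]).

    `preimage` stands in for the set of tuples that map to q (assumed finite here).
    """
    return max(
        sum(q_k * r_k(p_k) for q_k, r_k, p_k in zip(q, r_primes, p))
        for p in preimage
    )


# Hypothetical toy data: two components, three candidate tuples.
q = [0.25, 0.75]
r_primes = [lambda p: p, lambda p: 1.0 - p]
preimage = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

value = r(q, preimage, r_primes)
```

Here `q[k]` plays the role of the kth element of q, weighting the kth component reward.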
$P(x_i)$ is a polytope with $P(x_i) \subseteq \Delta A$, corresponding to allowed action distributions at that state.
I think it’s mathematically cleaner to get rid of A and have those be abstract polytopes.
Sounds good