Another design: imitation learning. Generally, there seems to be a pattern of: policies which aren’t selected for on the basis of maximizing some kind of return.
Another design: imitation learning. Generally, there seems to be a pattern of: policies which aren’t selected for on the basis of maximizing some kind of return.