Where does the selection come from? Will the designers toss a really impressive AI for not getting reward on that one timestep? I think not.
I was talking about gradient descent here, not designers.