If the agent is sufficiently farsighted (i.e. the discount is near 1)
I’d change this to “optimizes average reward (i.e. the discount equals 1)”. Otherwise looks good!
Done :)