Sorry for the formatting here; I am still trying to figure out how to use markdown in comments.
I have difficulty understanding the inferences about the parameters $ \alpha,\beta,\gamma $ in the “Example: regret” part.
Take the fully rational planner $ p $ for example. Since the human will say $ h $ following $ s $, the difference between the reward functions for $ h $ and $ \sim h $ is non-negative, which implies that: $ (\beta R(h)+\gamma R(h|s)) - (\beta R(\sim h)+\gamma R(\sim h|s)) \geq 0 $
It is then concluded that $ \beta R(h-\sim h)+\gamma R(h-\sim h|s)\geq 0 $.
Similarly, since the human will say $ \sim h $ following $ i $, we have $ \beta R(h-\sim h)+\delta R(h-\sim h|i)\leq 0 $.
It seems that more information about the reward function is needed in order to arrive at the final model with $ (p,R(\alpha,\beta,\gamma,\delta)|\gamma\geq-\beta\geq\delta) $.
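To make the gap concrete, here is one condition under which the stated constraint would follow from the two inequalities. It is only a sketch: the normalization below (all three reward differences equal to the same positive constant $ c $) is an assumption, not something stated in the example.

Assume $ R(h-\sim h) = R(h-\sim h|s) = R(h-\sim h|i) = c > 0 $.
The first inequality becomes $ c(\beta+\gamma)\geq 0 $, so $ \gamma\geq-\beta $.
The second inequality becomes $ c(\beta+\delta)\leq 0 $, so $ -\beta\geq\delta $.
Together these give $ \gamma\geq-\beta\geq\delta $.

Without some assumption of this kind on the sizes and signs of the three differences, the two inequalities alone do not seem to pin down the ordering.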
It seems that on macOS Catalina 10.15 with Safari 13.0.2, Ctrl-Cmd-4 does not work either. I have tried every possible combination of Ctrl, Cmd, 4, and M in both Safari and Chrome, but none of them works. Do you have any ideas?