I was thinking about normalisation as linearly rescaling every reward to $[0,1]$ when I wrote the comment. Then one can always look at $[0,1]^2$, which might make it easier to think graphically about how different beliefs lead to different policies. Different scales can then be translated into a certain reweighting of the beliefs (at least from the perspective of the optimal policy), as maximizing $P(R_1)S_1R_1 + P(R_2)S_2R_2$ is the same as maximizing $\frac{P(R_1)S_1}{P(R_1)S_1+P(R_2)S_2}R_1 + \frac{P(R_2)S_2}{P(R_1)S_1+P(R_2)S_2}R_2$: dividing the objective by the positive constant $P(R_1)S_1+P(R_2)S_2$ does not change which policy is optimal, and the resulting coefficients sum to 1, so they can be read as a reweighted belief.
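A minimal numeric sketch of this equivalence (all reward values and scales here are hypothetical, just to make the argmax concrete):

```python
# Beliefs over the two reward functions and the scales from normalisation
P1, P2 = 0.7, 0.3
S1, S2 = 1.0, 5.0

# Normalised rewards (in [0,1]) of three candidate policies under R1 and R2
policies = {"a": (0.9, 0.1), "b": (0.5, 0.6), "c": (0.2, 0.8)}

# Objective 1: scaled mixture  P(R1)*S1*R1 + P(R2)*S2*R2
scaled = {p: P1 * S1 * r1 + P2 * S2 * r2
          for p, (r1, r2) in policies.items()}

# Objective 2: the same weights renormalised to sum to 1,
# i.e. a reweighted belief over R1 and R2
Z = P1 * S1 + P2 * S2
reweighted = {p: (P1 * S1 / Z) * r1 + (P2 * S2 / Z) * r2
              for p, (r1, r2) in policies.items()}

# Dividing by the positive constant Z preserves the argmax,
# so both objectives select the same policy.
assert max(scaled, key=scaled.get) == max(reweighted, key=reweighted.get)
```

With these particular numbers the large $S_2$ makes policy "c" optimal, whereas with $S_1 = S_2 = 1$ policy "a" would win, which is exactly the sense in which scales act like a reweighting of the beliefs.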