There’s no PPO/PPG curve there; I’d be curious to see that comparison (though I agree that QL/MuZero will probably be more sample efficient).
I was eyeballing Figure 2 in the PPG paper and comparing it to our results on the full distribution (Table A.3):
PPO: ~0.25
PPG: ~0.52
MuZero: 0.68
MuZero+Reconstruction: 0.93