An alternative measure might be looking at the curve of Elo vs amount of computation. Deeper games presumably have greater rewards for thinking, so the curve will level off more slowly. This is measurable for computer players—at least after controlling for architecture (neural network vs traditional).
“How many Elo levels until diminishing returns on effort” seems like a sensible idea, and might work on humans as well as computers. But I still think coin-chess and poker show that using win probabilities to infer rating differences (as Elo does) isn’t very meaningful; it’s better to look only at the binary fact of whether Alice wins against Bob more than half the time.
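To make the contrast concrete, here is a rough sketch in Python. It uses the standard logistic Elo relation (a 400-point gap corresponds to 10:1 odds) and ignores draws. The `dilute_with_coin` helper is purely an assumption for illustration: it models a hypothetical variant where some fraction `q` of games is decided by a fair coin, which may not match the exact coin-chess rules discussed here.

```python
import math


def elo_gap_from_win_prob(p: float) -> float:
    """Rating gap implied by win probability p under the standard
    logistic Elo model (draws ignored)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def dilute_with_coin(p: float, q: float) -> float:
    """ASSUMPTION (hypothetical model, not from the original comment):
    a fraction q of games is decided by a fair coin, and the rest are
    won with probability p."""
    return q * 0.5 + (1.0 - q) * p


def beats_more_than_half(p: float) -> bool:
    """The binary fact: does Alice win against Bob more than half the time?"""
    return p > 0.5


if __name__ == "__main__":
    p_skill = 0.9  # Alice's win probability in the undiluted game
    p_mixed = dilute_with_coin(p_skill, q=0.5)
    for label, p in (("undiluted", p_skill), ("coin-diluted", p_mixed)):
        print(label, round(elo_gap_from_win_prob(p), 1), beats_more_than_half(p))
```

Under these assumptions the implied Elo gap shrinks as more luck is mixed in, while the answer to "does Alice beat Bob more than half the time" stays the same.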