I’d be pretty surprised if those ended up being the same thing as each other.
Yes, there are several levels here, and it can get confusing which one you’re fiddling with or ‘exploring’ in. This can make the (pre-DL) MCTS literature hard to read, because it gets fairly baroque as different heuristics are used at each level. What is useful for choosing how to explore the tree, so as to eventually make the optimal choice, is not the same thing as plain bandit sampling. (It’s like the difference between best-arm identification and an ordinary bandit minimizing cumulative regret: you don’t want to ‘minimize long-run regret’ when you explore an MCTS tree, because there is no long run. You want to make the right choice after a small number of rollouts, because you need to take an action, and then you will start planning again.) So while you can explore an MCTS tree with something simple & convenient like Bayesian Thompson sampling (if you’re modeling wins as binomial with a Beta prior on the win rate, that’s conjugate and so very fast), it won’t work as well as something deliberately targeting exploration, or something budget-aware which tries to maximize the probability of finding the best action within n rollouts.
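To make the contrast concrete, here is a rough Python sketch under some loud assumptions: it only looks at the root decision as a flat bandit (no tree), `rollout(arm)` is a hypothetical callable returning win/loss, and all function names are made up for illustration. Sequential halving is just one budget-aware best-arm method I picked as a stand-in; it is not necessarily the one any particular MCTS paper uses.

```python
import math
import random


def thompson_best_arm(rollout, n_arms, budget, rng):
    """Beta-Binomial Thompson sampling, then recommend the best posterior mean.

    Each arm has a Beta(1 + wins, 1 + losses) posterior (uniform prior),
    so a posterior update is just incrementing a counter.
    """
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(budget):
        # Draw one sample per arm from its posterior and play the largest draw.
        draws = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        arm = max(range(n_arms), key=draws.__getitem__)
        if rollout(arm):
            wins[arm] += 1
        else:
            losses[arm] += 1
    # Final recommendation: highest posterior mean win rate.
    return max(range(n_arms),
               key=lambda a: (1 + wins[a]) / (2 + wins[a] + losses[a]))


def sequential_halving(rollout, n_arms, budget):
    """Fixed-budget best-arm identification (illustrative stand-in).

    The budget is split evenly over ceil(log2(n_arms)) rounds; each round
    samples every surviving arm equally and discards the worse half.
    """
    alive = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        pulls = max(1, budget // (len(alive) * rounds))
        means = {a: sum(rollout(a) for _ in range(pulls)) / pulls
                 for a in alive}
        alive.sort(key=means.get, reverse=True)
        alive = alive[:max(1, math.ceil(len(alive) / 2))]
    return alive[0]


if __name__ == "__main__":
    rng = random.Random(0)
    true_win_rates = [0.45, 0.50, 0.55]  # hypothetical root actions

    def noisy_rollout(arm):
        return rng.random() < true_win_rates[arm]

    print(thompson_best_arm(noisy_rollout, 3, 300, rng))
    print(sequential_halving(noisy_rollout, 3, 300))
```

The point of the comparison is where the rollouts go: Thompson sampling concentrates pulls on whatever currently looks best (good for cumulative regret), while sequential halving spends a fixed budget on narrowing down which arm to recommend at the end. Which one actually wins on a given problem would need to be measured, not assumed from this sketch.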