If the repetitions arise during sampling merely because they have high conditional probability given an initial “misstep”, they should be avoidable by an MCTS that seeks to maximize the unconditional probability of the whole output sequence (or rather, the probability conditional on the input but not on the model's own prior output). After entering the “trap” once or a few times, the search would simply avoid the unfortunate misstep in subsequent “playouts”. That is my understanding, at least.
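To make the idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the toy bigram table `MODEL`, the token "uh" standing in for the misstep, the fixed `LENGTH`, the exploration constant `c`, and the choice of the joint sequence probability as the playout reward. The numbers are picked so that the repetitive sequence has a lower joint probability than the best alternative; whether that holds for a real model is exactly the assumption made above.

```python
import math
import random

# Hypothetical toy bigram model: next-token probabilities given the last token.
# "uh" is the misstep: unlikely to be entered, but once entered it tends to repeat.
MODEL = {
    "<s>": {"the": 0.9, "uh": 0.1},
    "the": {"cat": 0.6, "dog": 0.35, "uh": 0.05},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.4, "ran": 0.6},
    "sat": {"the": 1.0},
    "ran": {"the": 1.0},
    "uh": {"uh": 0.9, "the": 0.1},
}
LENGTH = 6  # fixed output length, an assumption that keeps the sketch short


def joint_prob(seq):
    """Probability of the whole sequence, not conditioned on its own prefix."""
    p, prev = 1.0, "<s>"
    for tok in seq:
        p *= MODEL[prev][tok]
        prev = tok
    return p


def greedy_after(first):
    """Greedy continuation once a first token has been emitted."""
    seq = [first]
    while len(seq) < LENGTH:
        nxt = MODEL[seq[-1]]
        seq.append(max(nxt, key=nxt.get))
    return tuple(seq)


def rollout(prefix, rng):
    """Complete a prefix by sampling from the model (one playout)."""
    seq = list(prefix)
    while len(seq) < LENGTH:
        prev = seq[-1] if seq else "<s>"
        toks, probs = zip(*MODEL[prev].items())
        seq.append(rng.choices(toks, weights=probs)[0])
    return tuple(seq)


def mcts(n_playouts=2000, c=0.2, seed=0):
    """Small UCT search over prefixes; the reward is the joint probability."""
    rng = random.Random(seed)
    visits, value = {(): 0}, {(): 0.0}  # statistics per prefix (tree node)
    best_reward, best_seq = 0.0, None
    for _ in range(n_playouts):
        node, path = (), [()]
        while len(node) < LENGTH:
            parent = node
            prev = parent[-1] if parent else "<s>"
            children = [parent + (t,) for t in MODEL[prev]]
            unvisited = [ch for ch in children if ch not in visits]
            if unvisited:
                # Expansion: add one new child, then stop descending.
                node = rng.choice(unvisited)
                visits[node], value[node] = 0, 0.0
                path.append(node)
                break
            # Selection: UCB1 over the children of the current prefix.
            node = max(
                children,
                key=lambda ch: value[ch] / visits[ch]
                + c * math.sqrt(math.log(visits[parent]) / visits[ch]),
            )
            path.append(node)
        seq = rollout(node, rng)
        reward = joint_prob(seq)
        if reward > best_reward:
            best_reward, best_seq = reward, seq
        for n in path:  # Backpropagation along the visited prefixes.
            visits[n] += 1
            value[n] += reward
    return best_seq, visits


if __name__ == "__main__":
    print("greedy after misstep:", greedy_after("uh"))
    best, visits = mcts()
    print("search result:", best, joint_prob(best))
    print("root visits:", visits[("the",)], "vs", visits[("uh",)])
```

In this toy setup, each repetition is locally the most likely continuation once "uh" has been emitted, but playouts through "uh" return a low joint probability, so the search sends most of its later playouts elsewhere. If the table were changed so that the loop had the higher joint probability, the same search would prefer the loop.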