Run your decision procedure for a constant time. If it doesn’t halt, abort it and break the symmetry—e.g. by choosing the option that sorts first lexically.
The constant-time cutoff could work, but it should hardly be the only escape valve you have. You have a utility estimate for each action, and each estimate has some variance. You can run the procedure until the variance falls below a set level, or the variance decreased by less than some threshold in the last iteration, or you've run out of time.
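Here is a rough sketch of what those three stopping rules might look like together. Everything in it is my own assumption for illustration: the names `choose_action` and `sample_utility`, the default thresholds, and the choice to track the largest variance of the mean across actions (nothing above says how the per-action variances should be combined). Ties, including the case where time runs out before any samples arrive, fall back to the option that sorts first lexically.

```python
import math
import time
from statistics import variance


def choose_action(actions, sample_utility, var_target=1e-3,
                  min_improvement=1e-6, time_budget_s=1.0):
    """Return (chosen_action, stop_reason). All names here are hypothetical."""
    samples = {a: [] for a in actions}
    prev_worst = math.inf
    reason = "timeout"
    deadline = time.monotonic() + time_budget_s

    while time.monotonic() < deadline:
        # One more noisy utility sample per action.
        for a in actions:
            samples[a].append(sample_utility(a))
        n = len(samples[actions[0]])
        if n < 2:
            continue  # a sample variance needs at least two points

        # Variance of each mean estimate is (sample variance / n);
        # track the worst one (an assumption, see the note above).
        worst = max(variance(s) / n for s in samples.values())

        if worst < var_target:
            reason = "variance_below_target"
            break
        if prev_worst - worst < min_improvement:
            reason = "variance_stalled"
            break
        prev_worst = worst

    # Highest mean wins; equal means (or no samples at all) are broken
    # by picking the option that sorts first lexically.
    means = {a: sum(s) / len(s) for a, s in samples.items() if s}
    best = min(actions, key=lambda a: (-means.get(a, 0.0), a))
    return best, reason
```

With a noiseless sampler the variance check fires as soon as two samples exist; with a zero time budget the function never samples and simply returns the lexically first option.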