It depends on whether the agent has any way of predicting what the random action will be at a future point in time.
You don’t have to literally sample a random action; you can just calculate the expected thing that would happen under a random policy. For example, you would replace $Q^*(s,\phi)$ with $\frac{1}{|A|}\sum_{i=1}^{|A|} Q^*(s,a_i)$.
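To make the substitution concrete, here is a minimal sketch in Python, assuming a tabular setting where the Q-values for one state are just a list. The function names and the numbers in `q_s` are made up for illustration. It compares the closed-form average over actions with the literal approach of sampling a uniformly random action many times.

```python
import random


def expected_q_under_uniform_policy(q_values):
    """Return (1/|A|) * sum_i Q(s, a_i) for a single state s."""
    if not q_values:
        raise ValueError("need at least one action")
    return sum(q_values) / len(q_values)


def sampled_q_under_uniform_policy(q_values, rng):
    """Literal alternative: draw one action uniformly and return its Q-value."""
    return rng.choice(q_values)


# Hypothetical Q*(s, a_i) values for a state with four actions (assumed numbers).
q_s = [1.0, 3.0, 0.0, 4.0]

# Closed-form expectation: no sampling required.
expected = expected_q_under_uniform_policy(q_s)

# Monte Carlo estimate of the same quantity, for comparison only.
rng = random.Random(0)
samples = [sampled_q_under_uniform_policy(q_s, rng) for _ in range(10_000)]
monte_carlo = sum(samples) / len(samples)

print(expected, monte_carlo)
```

The sampled estimate only approaches the average as the number of draws grows, whereas the closed form gives it exactly in one pass over the actions, which is why the expectation can stand in for the random action.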