Just exposition-wise, I’d front-load pi^H and pi^* when you define pi^B, and also clarify then that pi^B considers human-exploration as part of it’s policy.
Just exposition-wise, I’d front-load pi^H and pi^* when you define pi^B, and also clarify then that pi^B considers human-exploration as part of it’s policy.