And if I pushed around symbols correctly, the geometric derivative can be pulled inside of a geometric expectation ($\nabla^*_\theta G_{x\sim P(x)}[f_\theta(x)] = G_{x\sim P(x)}[\nabla^*_\theta f_\theta(x)]$), similarly to how an additive derivative can be pulled inside an additive expectation ($\nabla_\theta E_{x\sim P(x)}[f_\theta(x)] = E_{x\sim P(x)}[\nabla_\theta f_\theta(x)]$). Also, just as additive expectation distributes over addition ($E[f(x)+g(x)] = E[f(x)] + E[g(x)]$), geometric expectation distributes over multiplication ($G[f(x)g(x)] = G[f(x)]\,G[g(x)]$).
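As a quick numerical sanity check, the multiplicativity of $G$ holds exactly on any fixed set of samples, since the empirical geometric expectation is just the geometric mean. A minimal numpy sketch (the sample distribution and the functions `f` and `g` below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=10_000)  # samples from some P(x)

f = x**2 + 1.0           # two arbitrary positive functions of the samples
g = np.exp(np.sin(x))

# Empirical geometric expectation: exp of the mean of logs (the geometric mean).
G = lambda v: np.exp(np.mean(np.log(v)))

# G[f*g] equals G[f]*G[g], because logs turn products into sums.
print(G(f * g), G(f) * G(g))  # agree up to floating-point error
```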
I think what is going on here is that both $\nabla^*$ and $G$ are of the form $(e^\wedge) \circ g \circ \ln$, with $g = \nabla$ and $g = E$ respectively. Let’s define the star operator as $g^* = (e^\wedge) \circ g \circ \ln$. Then $(f \circ g)^* = (e^\wedge) \circ (f \circ g) \circ \ln = (e^\wedge) \circ f \circ \ln \circ (e^\wedge) \circ g \circ \ln = f^* \circ g^*$, by associativity of function composition and the fact that $\ln \circ (e^\wedge)$ is the identity. Further, if $f$ and $g$ commute, then so do $f^*$ and $g^*$: $g^* \circ f^* = (g \circ f)^* = (f \circ g)^* = f^* \circ g^*$.
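To make the star operator concrete, here is a small sketch of it as a higher-order Python function (the name `star` is mine, not from anything above), checking the composition law $(f \circ g)^* = f^* \circ g^*$ on a few points:

```python
import numpy as np

# star(g) = exp ∘ g ∘ log, for any map g acting on (arrays of) positive numbers.
def star(g):
    return lambda x: np.exp(g(np.log(x)))

x = np.linspace(0.5, 3.0, 7)

f = lambda t: 2.0 * t + 1.0   # two arbitrary maps to compose
g = lambda t: t**3

lhs = star(lambda t: f(g(t)))(x)   # (f ∘ g)* applied to x
rhs = star(f)(star(g)(x))          # (f* ∘ g*) applied to x
print(np.allclose(lhs, rhs))       # True: (f ∘ g)* = f* ∘ g*
```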
So the commutativity of the geometric expectation and the geometric derivative falls directly out of their representations as $E^*$ and $\nabla^*$, respectively, by the commutativity of $E$ and $\nabla$, as long as they are taken over different variables.
We can also derive what happens when the gradient and the expectation involve the same variable, i.e. $(\nabla_\theta \circ E_{x\sim P_\theta(x)})^*$, where the sampling distribution itself depends on $\theta$. First, writing $(\ast k)$ for the map $x \mapsto kx$ and $(\wedge k)$ for $x \mapsto x^k$, notice that $(\ast k)^*(x) = e^{k \ln x} = e^{\ln x^k} = x^k$, so $(\ast k)^* = (\wedge k)$. Also, $(+k)^*(x) = e^{k + \ln x} = e^k e^{\ln x} = x e^k$, so $(+k)^* = (\ast\, e^k)$.
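Continuing the same sketch, the two scalar identities can be checked directly (again with an arbitrary constant `k` and grid `x`):

```python
import numpy as np

def star(g):
    return lambda x: np.exp(g(np.log(x)))

k = 1.7
x = np.linspace(0.5, 3.0, 7)

# (*k)* = (^k): the star of "multiply by k" is "raise to the power k".
print(np.allclose(star(lambda t: k * t)(x), x**k))            # True

# (+k)* = (* e^k): the star of "add k" is "multiply by e^k".
print(np.allclose(star(lambda t: t + k)(x), np.exp(k) * x))   # True
```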
Now let’s expand the composition of the gradient and expectation: $(\nabla_\theta \circ E_{x\sim P_\theta(x)})(f(x)) = \nabla_\theta \int P_\theta(x) f(x)\, dx = \int P_\theta(x)\, f(x)\, \nabla_\theta \ln P_\theta(x)\, dx = E_{x\sim P_\theta(x)}[\nabla_\theta (f(x) \ln P_\theta(x))]$, using the log-derivative trick (and the fact that $f$ itself does not depend on $\theta$). So $\nabla_\theta \circ E_{x\sim P_\theta(x)} = E_{x\sim P_\theta(x)} \circ \nabla_\theta \circ (\ast \ln P_\theta(x))$.
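As a sanity check on the log-derivative trick, here is a hedged Monte Carlo sketch with an assumed Gaussian family $P_\theta = \mathcal{N}(\theta, 1)$ and $f(x) = x^2$, for which the exact gradient is $\nabla_\theta E[x^2] = \nabla_\theta(\theta^2 + 1) = 2\theta$:

```python
import numpy as np

theta, n = 0.7, 2_000_000
rng = np.random.default_rng(0)
x = rng.normal(theta, 1.0, size=n)   # samples from P_theta = N(theta, 1)

f = x**2
score = x - theta                    # d/dtheta ln N(x; theta, 1) = x - theta
estimate = np.mean(f * score)        # E[ d/dtheta ( f(x) * ln P_theta(x) ) ]
print(estimate, 2 * theta)           # Monte Carlo estimate vs. exact gradient
```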
(Recall that the geometric derivative is itself a starred operator, $\nabla^* = (e^\wedge) \circ \nabla \circ \ln$, which is equivalent to $\nabla^* f(x) = \exp\!\left[\frac{d}{dx} \ln f(x)\right]$.)
Therefore, $\nabla^*_\theta \circ G_{x\sim P_\theta(x)} = (\nabla_\theta \circ E_{x\sim P_\theta(x)})^* = E^*_{x\sim P_\theta(x)} \circ \nabla^*_\theta \circ (\ast \ln P_\theta(x))^* = G_{x\sim P_\theta(x)} \circ \nabla^*_\theta \circ (\wedge \ln P_\theta(x))$.
Writing it out, we have $\nabla^*_\theta\, G_{x\sim P_\theta(x)}[f(x)] = G_{x\sim P_\theta(x)}\!\left[\nabla^*_\theta\!\left(f(x)^{\ln P_\theta(x)}\right)\right]$.
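This final identity can be checked numerically on a finite-support distribution, where the geometric expectation is an exact weighted sum and the $\theta$-derivatives can be taken by finite differences. A hedged sketch, with an assumed softmax family `P(t)`, weights `w`, and positive values `f` chosen purely for illustration:

```python
import numpy as np

# Check  ∇*_θ G_{x~P_θ}[f(x)]  =  G_{x~P_θ}[ ∇*_θ ( f(x)^{ln P_θ(x)} ) ]
# on a 3-point support with P_θ(x) ∝ exp(θ * w_x).
w = np.array([0.3, -1.0, 2.0])
f = np.array([1.5, 4.0, 0.7])        # positive values of f on the support
theta, eps = 0.4, 1e-6

def P(t):
    z = np.exp(t * w)
    return z / z.sum()

def d_dtheta(h):                     # central finite difference in theta
    return (h(theta + eps) - h(theta - eps)) / (2 * eps)

# Left side: exp(d/dθ ln G[f]), with ln G[f] = Σ_x P_θ(x) ln f(x).
lnG = lambda t: np.sum(P(t) * np.log(f))
lhs = np.exp(d_dtheta(lnG))

# Right side: G over exp(d/dθ [ln P_θ(x) · ln f(x)]), taken per outcome.
inner = d_dtheta(lambda t: np.log(P(t)) * np.log(f))   # vector over outcomes
rhs = np.exp(np.sum(P(theta) * inner))                 # G[·] = exp(E[ln ·])

print(lhs, rhs)   # the two sides agree up to finite-difference error
```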