Hey Nisan. Check the following passage from Domain Theory (Samson Abramsky and Achim Jung). This might be helpful for equipping Δ with an appropriate domain structure. (You mention [JP89] yourself.)
We should also mention the various attempts to define a probabilistic version of the powerdomain construction, see [SD80, Mai85, Gra88, JP89, Jon90].
[SD80] N. Saheb-Djahromi. CPO’s of measures for nondeterminism. Theoretical Computer Science, 12:19–37, 1980.
[Mai85] M. Main. Free constructions of powerdomains. In A. Melton, editor, Mathematical Foundations of Programming Semantics, volume 239 of Lecture Notes in Computer Science, pages 162–183. Springer Verlag, 1985.
[Gra88] S. Graham. Closure properties of a probabilistic powerdomain construction. In M. Main, A. Melton, M. Mislove, and D. Schmidt, editors, Mathematical Foundations of Programming Language Semantics, volume 298 of Lecture Notes in Computer Science, pages 213–233. Springer Verlag, 1988.
[JP89] C. Jones and G. Plotkin. A probabilistic powerdomain of evaluations. In Proceedings of the 4th Annual Symposium on Logic in Computer Science, pages 186–195. IEEE Computer Society Press, 1989.
[Jon90] C. Jones. Probabilistic Non-Determinism. PhD thesis, University of Edinburgh, Edinburgh, 1990. Also published as Technical Report No. CST63-90.
During my own incursion into agent foundations and game theory, I also bumped into this exact obstacle — namely, that there is no obvious way to equip Δ with a least-fixed-point constructor FixΔX:(X→ΔX)→ΔX. In contrast, we can equip P with a LFP constructor FixPX:(X→PX)→PX,g↦{x∈X:x∈g(x)}.
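For a finite X the LFP constructor on P is easy to compute directly; a minimal sketch (the carrier set X and the map g below are made up for illustration):

```python
# Fix^P_X(g) = {x in X : x in g(x)} for g : X -> P(X),
# computed directly over a finite carrier set X.

def fix_P(X, g):
    """Keep exactly the points that g maps to a set containing themselves."""
    return {x for x in X if x in g(x)}

X = {"a", "b", "c"}
g = {"a": {"a", "b"}, "b": {"c"}, "c": {"c"}}.get

print(fix_P(X, g))  # the points "a" and "c" lie in their own image
```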
One trick is to define FixΔX(g) to be the distribution π∈ΔX which maximises the entropy H(π) subject to the constraint gΔ(π)=π.
A maximum entropy distribution π∗ exists, because —
For g:X→ΔX, let gΔ:ΔX→ΔX be the lift via the Δ monad, and let G={π∈ΔX|gΔ(π)=π} be the set of fixed points of gΔ.
ΔX is compact, convex, and Hausdorff, and gΔ:ΔX→ΔX is continuous, so G={π∈ΔX:π=gΔ(π)} is nonempty (by the Brouwer/Schauder fixed-point theorem) and closed, hence compact.
H:ΔX→R is continuous, and G⊆ΔX is nonempty and compact, so H attains a maximum at some π∗∈G.
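Concretely, for finite X the lift gΔ is just pushforward along g viewed as a transition kernel, so the fixed points of gΔ are exactly the stationary distributions of the corresponding Markov chain. A minimal sketch (distributions as dicts; the two-state kernel is made up for illustration):

```python
def lift(g, pi):
    """Monadic lift g^Delta: (g^Delta pi)(y) = sum_x pi(x) * g(x)(y),
    i.e. one step of the Markov chain with kernel g."""
    out = {}
    for x, px in pi.items():
        for y, py in g(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out

# A doubly stochastic kernel: the uniform distribution is a fixed point.
g = {"H": {"H": 0.5, "T": 0.5}, "T": {"H": 0.5, "T": 0.5}}.get
uniform = {"H": 0.5, "T": 0.5}
assert lift(g, uniform) == uniform
```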
Moreover, π∗ must be unique, because —
G is a convex set: since gΔ is affine, if π1=gΔ(π1) and π2=gΔ(π2) then λ1π1+λ2π2=gΔ(λ1π1+λ2π2) for all λ1,λ2≥0 with λ1+λ2=1.
H:ΔX→R is strictly concave, i.e. H(λ1π1+λ2π2)≥λ1H(π1)+λ2H(π2) for all λ1,λ2≥0 with λ1+λ2=1, and moreover the inequality is strict if π1≠π2 and λ1,λ2>0.
Hence if π∗1,π∗2∈G both attain the maximum entropy and π∗1≠π∗2, then 0.5π∗1+0.5π∗2∈G (by convexity of G) and H(0.5π∗1+0.5π∗2)>0.5H(π∗1)+0.5H(π∗2), contradicting maximality. So π∗ is unique.
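A quick numeric illustration of the strict inequality (a minimal sketch; the two point masses are chosen for illustration):

```python
from math import log

def H(pi):
    """Shannon entropy, with the convention 0 * log 0 = 0."""
    return -sum(p * log(p) for p in pi.values() if p > 0)

p1 = {"a": 1.0, "b": 0.0}
p2 = {"a": 0.0, "b": 1.0}
mid = {"a": 0.5, "b": 0.5}

# H is strictly concave: the midpoint beats the average when p1 != p2.
assert H(mid) > 0.5 * H(p1) + 0.5 * H(p2)
```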
The justification here is the Principle of Maximum Entropy:
Given a set of constraints on a probability distribution, then the “best” distribution that fits the data will be the one of maximum entropy.
More generally, we should define FixΔX(g) to be the distribution π∈ΔX which minimises the relative entropy (KL divergence) DKL(π||π0) subject to the constraint gΔ(π)=π, where π0 is some uninformative prior such as the Solomonoff prior. The previous result is the special case where π0 is the uniform prior, since DKL(π||uniform)=log|X|−H(π). The proof generalises by noting that DKL(−||π0):ΔX→R is continuous and strictly convex. See the Principle of Minimum Discrimination Information.
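The reduction to the uniform case is a one-line identity, DKL(π||uniform) = log|X| − H(π), which a quick numeric check confirms (a minimal sketch; the distribution π is made up):

```python
from math import log

def H(pi):
    """Shannon entropy, with the convention 0 * log 0 = 0."""
    return -sum(p * log(p) for p in pi.values() if p > 0)

def kl(pi, pi0):
    """Relative entropy D_KL(pi || pi0)."""
    return sum(p * log(p / pi0[x]) for x, p in pi.items() if p > 0)

pi = {"a": 0.7, "b": 0.2, "c": 0.1}
uniform = {x: 1 / 3 for x in pi}

# D_KL(pi || uniform) = log|X| - H(pi), so minimising KL against the
# uniform prior is the same as maximising entropy.
assert abs(kl(pi, uniform) - (log(3) - H(pi))) < 1e-12
```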
Ideally, we’d like FixP and FixΔ to “coincide” modulo the support maps Supp:ΔX→PX, i.e. Supp(FixΔ(g))=FixP(Supp∘g) for all g:X→ΔX. Unfortunately, this isn’t the case — if g:H↦0.5⋅|H⟩+0.5⋅|T⟩, T↦|T⟩, then FixP(Supp∘g)={H,T} but Supp(FixΔ(g))={T}.
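The counterexample is easy to verify numerically by iterating the lift gΔ until it converges to its unique fixed point (a minimal sketch; `lift` pushes a distribution one step through the kernel):

```python
def lift(g, pi):
    """One step of pushforward along the kernel g."""
    out = {}
    for x, px in pi.items():
        for y, py in g(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out

g = {"H": {"H": 0.5, "T": 0.5}, "T": {"T": 1.0}}.get

# Fix^P(Supp . g): both H and T lie in the support of their own image.
supp_g = lambda x: {y for y, p in g(x).items() if p > 0}
fix_P = {x for x in {"H", "T"} if x in supp_g(x)}
assert fix_P == {"H", "T"}

# Fixed points of g^Delta: pi(H) = 0.5 * pi(H) forces pi(H) = 0, so the
# unique fixed point is |T>; iterating the lift from uniform converges to it.
pi = {"H": 0.5, "T": 0.5}
for _ in range(100):
    pi = lift(g, pi)
supp_fix = {x for x, p in pi.items() if p > 1e-9}
assert supp_fix == {"T"}
```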
Alternatively, we could consider the convex sets of distributions over X.
Let CX denote the set of convex sets of distributions over X, with the ordering ≤ on CX given by A≤B ⟺ A⊇B. We have a LFP operator FixCX:(X→CX)→CX via g↦⋃{S∈CX:gC(S)=S}, where gC:CX→CX, S↦{∑i αi⋅πi : ∑i αi|xi⟩∈S, πi∈g(xi)} is the lift of g:X→CX via the C monad.
Thanks! For convex sets of distributions: if you weaken the definition of fixed point to gC(S)≤S, then the set {S∈CX:gC(S)≤S} has a least element, which really is a least fixed point.