AlexMennen comments on Formal Open Problem in Decision Theory

AlexMennen 23 Apr 2017 5:00 UTC
LW: 1 AF: 1
AF
$g$ can be a fiber of $f$ , since for each $n$ , $x_{n}$ and $y_{n}$ could be distance greater than $n$ from the basepoint.

Example: let $X := {x_{n}, y_{n} | n \in N}$ , with $d (x_{n}, x_{m}) = d (y_{n}, y_{m}) = d (x_{n}, y_{m}) = 2 | n - m |$ for $n \neq m$ and $d (x_{n}, y_{n}) = 6^{- n}$ . Let $x_{0}$ be the basepoint (so that $(x_{0}, x_{0})$ is the point you were calling “ $0$ ”). Let $g (x_{n}) := 0$ , $g (y_{n}) := 1$ , $f (z, w) := g (z)$ , and $h_{n} (r) := 3^{n} r$ .

I also don’t see how to even construct the function $g$ , or, relatedly, what you mean by “geometrically nicely”, but I guess it doesn’t matter.

Also, I’m not convinced that metric spaces with uniform continuity on bounded subsets is a better framework than topological spaces with continuity.
- AlexMennen 29 Apr 2017 18:20 UTC
  0 points
  AF Parent
  This is intended as a reply to David Simmons’s comment, which for some reason I can’t reply to directly.
  
  In the new version of your proof, how do we know $Y_{k}$ isn’t too close to $X_{l}$ for some $l > k$ ? And how do we know that $g$ is uniformly continuous on bounded subsets?
  
  About continuity versus uniform continuity on bounded sets:
  
  It seems to me that your point 1 is just a pithier version of your point 4, and that these points support paying attention to uniform continuity, rather than uniform continuity restricted to bounded sets. This version of the problem seems like it would be a less messy version of the “fixed modulus of continuity” version of the problem you mentioned (which I did not understand your solution to, but will look again later).
  
  I’m not sure what you’re getting at about singularities in point 3. I wouldn’t have asked why you were considering uniform continuity on bounded sets instead of uniform continuity away from singularities (in fact, I don’t know what that means). I would ask, though, why uniform continuity on bounded sets instead of uniform continuity on compact sets? As you point out, the latter is the same as continuity.
  
  Your point 2 is completely wrong, and in fact this is the primary reason I was convinced that continuity is a better thing to pay attention to than uniform continuity on bounded sets. The type of object you are describing is an effective Polish space that remembers its metric. Typically, descriptive set theorists forget the metric, and the isomorphisms between Polish spaces are homeomorphisms (and the isomorphisms between effective Polish spaces are computable homeomorphisms). Changing the metric in a computably homeomorphic way does not change what can be done when points are represented as descending chains of basic open sets with singleton intersection. So the thing you described was really topological rather than metric in nature, even though it involves introducing a metric in the setup. I am not aware of any notions of computability in metric spaces in which the metric matters in the way you are suggesting. It is not true that uniform continuity gives you an algorithm for computing the function. As a counterexample, let $a$ be an uncomputable number, and let $f : R \to R$ be given by $f (x) = a$ for every $x$ . $f$ is clearly uniformly continuous, but not computable. It is also not true that uniform continuity is necessary in order for the function to be computable. For instance, $sin (1 / x)$ is computable on $(0, 1]$ . Of course, $(0, 1]$ is not complete, but for another example, consider an effective infinite-dimensional Hilbert space, and let ${e_{n} | n \in N}$ be an effective orthonormal basis. Let $f_{n} (x) := max (0, \frac{1}{2} - ∥ x - e_{n} ∥)$ . This sequence of functions is computable, they have disjoint support, and for any point, a sufficiently small neighborhood around it will be disjoint from the supports of all but at most one of these functions. Thus $f (x) := \sum_{n} n f_{n} (x)$ is computable, but of course it is not uniformly continuous on the unit ball, which is bounded. However, it is true that every computable function is continuous, and conversely, every continuous function is computable with respect to some oracle. Of course, we really want computability, not computability with respect to some oracle, but this still seems to show that continuity is at least a good metaphor for computability, whereas uniform continuity on bounded sets doesn’t seem so to me.
  
  Of course, all this about continuity as a metaphor for computability makes the most sense in the context of Polish spaces, and we can only talk about actual computability in the context of effective Polish spaces. Scott’s problem involves a space $X$ and the exponential $[0, 1]^{X}$ . If $X$ is a locally compact Polish space, then $[0, 1]^{X}$ is also Polish. I think that this might be necessary (that is, if $X$ and $[0, 1]^{X}$ are Polish, then $X$ is locally compact), although I’m not sure. If so, and if your proof is correct, it seems plausible that your proof could be adapted to show that there is no locally compact Polish space with the property that Scott was looking for, and that would show that there is no solution to the problem in which $X$ and $[0, 1]^{X}$ are both Polish spaces, and hence no computable solution, if computability is formalized as in effective descriptive set theory.
  - Dacyn 1 May 2017 11:10 UTC
    LW: 2 AF: 1
    AF Parent
    I don’t know why my comment doesn’t have a reply button. Maybe it is related to the fact that my comment shows up as “deleted” when I am not logged in.
    
    Sorry, I seem to be getting a little lazy with these proofs. Hopefully I haven’t missed anything this time.
    
    New proof: … We can extract a subsequence $(n_{k})$ such that if $X_{k} = x_{n_{k}}$ and $Y_{k} = y_{n_{k}}$ , then $d (X_{k + 1}, Y_{k + 1}) \leq (1 / 6) d (X_{k}, Y_{k})$ for all $k$ , and for all $k$ and $l > k$ , either (A) $d (X_{k}, Y_{l}) \geq (1 / 3) d (X_{k}, Y_{k})$ and $d (X_{k}, X_{l}) \geq (1 / 3) d (X_{k}, Y_{k})$ or (B) $d (Y_{k}, X_{l}) \geq (1 / 3) d (X_{k}, Y_{k})$ and $d (Y_{k}, Y_{l}) \geq (1 / 3) d (X_{k}, Y_{k})$ . By extracting a further subsequence we can assume that which of (A) or (B) holds depends only on $k$ and not on $l$ . By swapping $X_{k}$ and $Y_{k}$ if necessary we can assume that case (A) always holds.
    
    Lemma: For each $z$ there is at most one $k$ such that $d (z, X_{k}) \leq (1 / 6) d (X_{k}, Y_{k})$ .
    
    Proof: Suppose $d (z, X_{k}) \leq (1 / 6) d (X_{k}, Y_{k})$ and $d (z, X_{l}) \leq (1 / 6) d (X_{l}, Y_{l})$ , with $k < l$ . Then $d (X_{k}, X_{l}) < (1 / 3) d (X_{k}, Y_{k})$ , a contradiction.
    
    It follows that by extracting a further subsequence we can assume that $d (Y_{k}, X_{l}) \geq (1 / 6) d (X_{l}, Y_{l})$ for all $l > k$ .
    
    Now let $j : [0, \infty) \to [0, \infty)$ be an increasing uniformly continuous function such that $j (0) = 0$ and $j ((1 / 6) d (X_{k}, Y_{k})) > 2^{- n_{k}}$ for all $k$ . Finally, let $g (x) = {inf}_{k} j (d (x, Y_{k}))$ . Then for all $k$ we have $g (Y_{k}) = 0$ . On the other hand, for all $k < l$ we have $d (X_{k}, Y_{l}) \geq (1 / 3) d (X_{k}, Y_{k})$ , for $k = l$ we have $d (X_{k}, Y_{l}) = d (X_{k}, Y_{k})$ , and for $k > l$ we have $d (X_{k}, Y_{l}) \geq (1 / 6) d (X_{k}, Y_{k})$ . Thus $g (X_{k}) = {inf}_{l} j (d (X_{k}, Y_{l})) \geq j ((1 / 6) d (X_{k}, Y_{k})) > 2^{- n_{k}}$ . Clearly, $g$ cannot be a fiber of $f$ . Moreover, since the functions $j$ and $x \mapsto {inf}_{k} d (x, Y_{k})$ are both uniformly continuous, so is $g$ .
    
    Regarding your responses to my points:
    
    I guess I don’t disagree with what you write regarding my points 1 and 4.
    
    It seems to be harder than expected to explain my intuitions regarding singularities in point 3. Basically, I think the reasons that abstract continuity came to be considered standard are mostly the fact that in concrete applications you have to worry about singularities, and this makes uniform continuity a little more technically annoying. But in the kind of problem we are considering here, it seems that continuity is really more annoying to deal with than uniform continuity, with little added benefit. I guess it also depends on what kinds of functions you expect to actually come up, which is a heuristic judgement. Anyway it might not be productive to continue this line of reasoning further as maybe our disagreements just come down to intuitions.
    
    Regarding my point 2, I wasn’t very clear when I said that uniform continuity gives you an algorithm, what I meant was that if you have an algorithm for computing the images of points in the dense sequence and for computing the modulus of continuity function, then uniform continuity gives you an algorithm. The function $x \mapsto sin (1 / x)$ would be the kind of thing I would handle with uniform continuity away from singularities (to fix a definition for this, let us say that you are uniformly continuous away from singularities if you are uniformly continuous on sets of the form $B (0, n) ∖ B (S, 1 / n)$ , where $S$ is some set of singularities).
    
    In your definition of $f_{n}$ , I think you mean to write max instead of min. But I see your point, though the example seems a little pathological to me.
    
    Anyway, it seems that you agree that it makes sense to restrict to Polish spaces based on computability considerations, which is part of what I was trying to say in 2.
    
    If you have a locally compact Polish space, then you can find a metric with respect to which the space is proper (i.e. bounded subsets are compact): let $d^{'} (x, y) = d (x, y) + | f (x) - f (y) |$ , where $f (x) = 1 / sup {r : B (x, r)$ is compact $}$ . With respect to this metric, continuity is the same as uniform continuity on bounded sets, so my proof should work then.
    
    Proposition: Let $X$ be a Polish space that is not locally compact. Then $[0, 1]^{X}$ (with the compact-open topology) is not first countable.
    
    Proof: Suppose otherwise. Then the function $f_{0} \equiv 0$ has a countable neighborhood basis of sets of the form $F (K_{n}, U_{n}) = {f : f (K_{n}) \subseteq U_{n}}$ where $K_{n} \subseteq X$ is compact and $U_{n} \subseteq [0, 1]$ is open. Since $X$ is not locally compact, there exists a point $x$ such that $B (x, r)$ is not compact for any $x$ . For each $n$ , we can choose $x_{n} \in B (x, 1 / n) ∖ ⋃_{i \leq n} K_{i}$ . Let $K = {x_{n} : n} \cup {x}$ , and note that $K$ is compact. Then $F (K, [0, 1 / 2))$ is a neighborhood of $f_{0}$ . But then $F (K, [0, 1 / 2)) \supseteq ⋂_{i \leq n} F (K_{i}, U_{i})$ for some $n$ . This contradicts the fact that $x_{n} \in K ∖ ⋃_{i \leq n} K_{i}$ , since we can find a bump function which is $0$ on $⋃_{i \leq n} K_{i}$ but $1$ at $x$ .
    
    It does still seem to me that most of the useful intuition comes from point 4 of my previous comment, though.
    - AlexMennen 30 Apr 2017 19:22 UTC
      0 points
      AF Parent
      It appears that comments from new users are collapsed by default, and cannot be replied to without a “Like”. These seem like bad features.
      
      Your proof that there’s no uniformly continuous on bounded sets function $f : X \times X \to [0, 1]$ admitting all uniformly continuous on bounded sets functions $X \to [0, 1]$ as fibers looks correct now. It also looks like it can be easily adapted to show that there is no uniformly continuous $f : X \times X \to [0, 1]$ admitting all uniformly continuous functions $X \to [0, 1]$ as fibers. Come to think of it, your proof works for arbitrary metric spaces $X$ , not just complete separable metric spaces, though those are nicer.
      
      I see what you mean now about uniform continuity giving you an algorithm, but I still don’t think that’s specific to uniform continuity in an important way. After all, if you have an algorithm for computing images of points in the countable dense set, and a computable “local modulus of continuity” in the sense of a computable function $h : X \times [0, \infty) \to [0, \infty)$ with $h (x, 0) = 0$ and $d (x, y) < r ⟹ | f (y) - f (x) | < h (x, r)$ , then $f$ is computable, and this does not require $f$ to be uniformly continuous. Although I suppose you could object that this is a bit circular, in that I’m assuming the “local modulus of continuity” is computable only in the standard sense, which does not require uniform continuity.
      
      I’m not sure why you would allow singularities at some points (presumably a uniformly discrete set, or something like that) while still insisting on uniform continuity elsewhere. It still seems to me that the arguments for uniform continuity rather than continuity all point to wanting uniform continuity entirely, rather than some sense of local uniform continuity in most places.
      
      Thanks for pointing out the error in my definition of $f_{n}$ ; I’ve fixed it.
      
      In your argument that locally compact Polish spaces can be given metrics with respect to which they are proper, it isn’t true that $d^{'}$ is necessarily a proper metric. For instance, consider a countably infinite set with $d (x, y) = 1$ for $x \neq y$ . This is a locally compact Polish space, but $f (x) = 1$ for every $x$ , so $d^{'} = d$ , and the space is not proper.
      
      Your last proposition looks correct (though with a typo: last $\subseteq$ in the proof should be $\supseteq$ ). However, if $X$ is not locally compact, then the compact-open topology isn’t necessarily the right topology to consider on $[0, 1]^{X}$ . We want a topology making $[0, 1]^{X}$ into an exponential object, and it isn’t clear that such a topology even exists, or that it is the compact-open topology if it does exist (though it must be a refinement of the compact-open topology if it does exist). Maybe asking about non-locally compact Polish spaces $X$ with a Polish exponential space $[0, 1]^{X}$ is a kind of weird question, though, and if we’re even considering non-locally compact Polish spaces, we should turn to the version of the question where we just want a continuous function $X \times X \to [0, 1]$ admitting all continuous functions $X \to [0, 1]$ as fibers.
      - Dacyn 1 May 2017 11:15 UTC
        LW: 2 AF: 1
        AF Parent
        I will have to think more about the issue of continuity vs uniform continuity. I suppose my last remaining argument would be the fact that Bishop—Bridges’ classic book on constructive analysis uses uniform continuity on bounded sets rather than continuity, which suggests that it is probably better for constructive analysis at least. But maybe they did not analyze the issue carefully enough, or maybe the relevant issues here are for some reason different.
        
        To fix the argument that every locally compact Polish space admits a proper metric, let $f$ be as before and let $F (x, y) = \infty$ if $d (x, y) \geq f (x)$ and $F (x, y) = f (x) / [f (x) - d (x, y)]$ if $d (x, y) < f (x)$ . Next, let $g (y) = {min}_{n} [n + F (x_{n}, y)]$ , where $(x_{n})$ is a countable dense sequence. Then $g$ is continuous and everywhere finite. Moreover, if $S = g^{- 1} ([0, N])$ , then $S \subseteq ⋃_{n \leq N} B (x_{n}, (1 - 1 / N) f (x_{n}))$ and thus $S$ is compact. It follows that the metric $d^{'} (x, y) = d (x, y) + | g (y) - g (x) |$ is proper.
        
        Anyway I have fixed the typo in my previous post.
        
        AlexMennen 2 May 2017 4:02 UTC
        0 points
        AF Parent
        Hm, perhaps I should figure out what the significance of uniform continuity on bounded sets is in constructive analysis before dismissing it, even though I don’t see the appeal myself, since constructive analysis is not a field I know much about, but could potentially be relevant here.
        
        $f$ is the reciprocal of what it was before, but yes, this looks good. I am happy with this proof.
- Dacyn 23 Apr 2017 12:12 UTC
  0 points
  Parent
  Ah, you’re right. The proof can be fixed by changing the division between the two cases. So here is the new proof, with more details added regarding the construction of $g$ :
  1. $B (0, m)$ is uniformly discrete for all $m$ . Then every map from $X$ to $[0, 1]$ is uniformly continuous on bounded sets, so we get a contradiction from cardinality considerations.
  2. $B (0, m)$ is not uniformly discrete for some $m$ . Then for each $n \geq m$ , since f is uniformly continuous on $B (0, n)$ it has a modulus of continuity on this set, i.e. a continuous increasing function $h_{n} : (0, \infty) \to (0, \infty)$ such that $d (f (x), f (y)) < h_{n} (d (x, y))$ for all $x, y \in B (0, n) \subseteq X \times X$ . Since $B (0, m)$ is not uniformly discrete, there exist $x_{n}, y_{n} \in B (0, m)$ such that $h_{n} (d (x_{n}, y_{n})) < 2^{- n}$ and $d (x_{n}, y_{n}) < 2^{- n}$ . We can extract a subsequence $(n_{k})$ such that if $X_{k} = x_{n_{k}}$ and $Y_{k} = y_{n_{k}}$ , then $d (X_{k + 1}, Y_{k + 1}) \leq (1 / 3) d (X_{k}, Y_{k})$ for all $k$ , and for all $k$ and $ℓ > k$ , either (A) $d (X_{k}, Y_{ℓ}) \geq (1 / 3) d (X_{k}, Y_{k})$ or (B) $d (Y_{k}, X_{ℓ}) \geq (1 / 3) d (X_{k}, Y_{k})$ . By extracting a further subsequence we can assume that which of (A) or (B) holds depends only on $k$ and not on $ℓ$ . By swapping $X_{k}$ and $Y_{k}$ if necessary we can assume that case (A) always holds. Now let $j : [0, \infty) \to [0, \infty)$ be an increasing continuous function such that $j ((1 / 3) d (X_{k}, Y_{k})) > 2^{- n_{k}}$ for all $k$ . Finally, let $g (y) = {inf}_{k} j (d (X_{k}, y))$ . Then for all $k$ we have $g (X_{k}) = 0$ but $g (Y_{k}) > 2^{- n_{k}}$ . Clearly, $g$ cannot be a fiber of $f$ .
  Regarding the appropriateness of metric spaces / uniform continuity rather than topological spaces / abstract continuity, here are some of the reasons behind my intuition here (developed working in mathematical analysis, specifically Diophantine approximation, and also constructive mathematics):
  1. The obvious: metric spaces are explicitly meant to represent the intuitive notion of alikeness as a quantitative concept (i.e. distance), whereas topological spaces have no explicit notion of alikeness.
  2. In computability theory, one is interested in the question of how to computationally represent a point or an approximation to a point in a space. The standard way to do this is via restricting to the class of complete separable metric spaces, fixing a countable dense sequence $(x_{n})$ (assumed to be representative of the structure of the metric space), and defining a computational approximation to a point to be an expression of the form $B (x_{n}, 1 / m)$ . Since $n$ and $m$ are integers this expression can be coded as finite data. One then defines a computational encoding of a point to be an infinite bitstream consisting of computational approximations that converge to the point.
    
    In practical applications, in the end you will want everything to be computable. So it makes sense to work in a framework where there are natural notions of computability. I am not aware of any such notions for general topological spaces.
  3. Regarding continuity vs uniform continuity in metric spaces, both are saying that if two points are close in the domain, their images are also close. But the latter gives you a straightforward estimate as to how close, whereas the former says that the degree of closeness may depend on one of the points. Now, there are good reasons to consider such dependence, since even natural functions on the real numbers (such as $x^{2}$ or $1 / x$ ) have “singularities” where they are not uniformly continuous.
    
    So the question is whether to modify the notion of uniform continuity to directly account for singularities, or to use the standard definition of continuity instead. But if one works with the standard definition, then most of the time one is really looking for ways to sneak back to uniform continuity, e.g. by using the fact that a continuous function on a compact set is uniformly continuous.
    
    An intuitive way of thinking about the fact that a continuous function on a compact set is uniformly continuous is that the notion of compactness means that there are no singularities present “within the space”. For example, if we go back to the functions $x^{2}$ or $1 / x$ , then the singularity of the first occurs at infinity, while the singularity of the latter occurs at $0$ . If we take a compact subset of the domain of either function, then what it really means is that we are avoiding the singularity.
    
    By contrast, non-compactness should mean that there are singularities. In some cases like $(0, 1)$ it is easy to identify what the singularities are. But if we are dealing with spaces that are not locally compact like $N^{N}$ or an infinite-dimensional Hilbert space, then it is not as clear what the singularities are, there is just a general sense that they are dispersed “throughout the space” (because the space is not not locally compact).
    
    But you have to ask yourself, are these singularities real or just imagined? In many cases, imagined. For example, in the theory of Banach spaces continuous linear maps are always uniformly continuous.
    
    What about a map that is not uniformly continuous, like the inversion map $f (x) = x / ∥ x ∥^{2}$ in infinite-dimensional Hilbert space? In this case, there is still a singularity—at $0$ -- and the definition of continuity needs to reflect that. But it doesn’t help to imagine all sorts of other singularities dispersed throughout the space, because that prevents you from making useful statements like: if $x, y$ are at least $α$ away from $0$ and $d (x, y) \leq ε$ , then $d (f (x), f (y)) \leq C ε / α^{2}$ , where $C$ is an absolute constant.
    
    Now the example in the previous paragraph is an example of quantitative continuity, which is stronger than uniform continuity away from singularities. But the point is that it can be seen as an extension of uniform continuity away from singularities.
  4. Maybe my last reason will be the most relevant from a naturalized agent perspective. The notion of uniform continuity is important because it introduces the modulus of continuity, which can be viewed as a measure of how continuous a function is. The restriction that an agent must be uniformly continuous can be then thought of in a quantitative sense, with “better” agents less having to follow this restriction. So a more powerful agent may have a looser (larger) modulus of continuity, because it can react more precisely to different possible inputs.
    
    In this terminology, my proof can be thought of as giving an intuitive reason for why the agent cannot implement every possible policy: the agent has limited resources to distinguish different inputs, so it can only implement those policies that can be implemented with these limited resources.
    
    The obvious followup question would be whether if you restrict your attention to the policies that the agent isn’t prevented from implementing due to its limited resources, then can it implement every possible policy? Or in other words, if you fix a modulus of continuity from the outset, can you include all functions with that modulus of continuity as fibers?
    
    If you allow the every-policy function to have an arbitrary modulus of continuity unrelated to the modulus of continuity you are trying to imitate, then it is not hard to see that this is possible at least for some spaces. (By Arzela-Ascoli the space of functions with a fixed modulus of continuity is compact, so there exists a continuous surjection from $2^{N}$ to this space.) But this may require greatly increasing the resources that the agent must spend to differentiate inputs. On the other hand, requiring the exact same modulus of continuity seems like too rigid an assumption. So the right question is probably to ask how close can the modulus of continuity of the every-policy function be to the modulus it is trying to imitate.
    
    For this kind of question it is probably better to work with a concrete example rather than trying to prove something in generality, so I will work with the Cantor space $X = 2^{N}$ with the metric $d ((x_{n}), (y_{n})) = 2^{- min {n : x_{n} \neq y_{n}}}$ . Suppose we want to imitate all functions $g : X \to {0, 1}$ such that $d (x, y) < ε$ implies $g (x) = g (y)$ . (I know this is not quite the same as the original question, but I think it is close enough.) If $ε = 2^{- n}$ then there are $N = 2^{2^{n}}$ such functions. So if we have a single function $f : X \times X \to {0, 1}$ that has all of them as fibers, then by the pigeonhole principle there is some ball of the form $B (x, 2^{- N + 1})$ that contains two such fibers. But then if $x_{1}$ and $x_{2}$ are the two fibers, then there exists $y$ such that $f (x_{1}, y) \neq f (x_{2}, y)$ . It follows that if we want to choose $ε^{'}$ such that $d (x, y) < ε^{'}$ implies $f (x) = f (y)$ (i.e. the analogue of the assumption on $g$ but with $ε$ replaced by $ε^{'}$ ) then we need $ε^{'} \leq 2^{- N + 2}$ .
    
    In conclusion, the required accuracy $ε^{'}$ of $f$ is doubly exponential with respect to the required accuracy $ε$ of $g$ . Thus, it is not feasible to implement such a function.
  What links here?
  - AlexMennen's comment on Formal Open Problem in Decision Theory by Scott Garrabrant (29 Apr 2017 18:20 UTC; 0 points)