We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn’t get into priors).
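(Roughly, “coherent” here means the standard conditions on a probability assignment over sentences; I’m paraphrasing rather than quoting, and omitting the extra condition for quantifiers:

\[ \mathbb{P}(\varphi) = 1 \text{ if } \varphi \text{ is provable}, \qquad \mathbb{P}(\varphi) = 0 \text{ if } \neg\varphi \text{ is provable}, \]
\[ \mathbb{P}(\varphi) = \mathbb{P}(\varphi \wedge \psi) + \mathbb{P}(\varphi \wedge \neg\psi) \text{ for all sentences } \varphi, \psi, \]

so in particular provably equivalent sentences get the same probability. Picking out one coherent assignment among the many is then the prior question.)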
I read that paper before but it doesn’t say why its proposed way of handling logical uncertainty is the correct one, except that it “seem to have some good properties”. It seems like we’re still at a stage where we don’t understand logical uncertainty at a deep level and can’t offer solutions based on fundamental principles, but are just trying out various ideas to see what sticks.
I agree that AI progress is probably socially costly [...] Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.
I’m not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?
I think that the kind of probabilistic reflection we are working on is fairly natural though.
Maybe I’m just being dense but I’m still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn’t seem to make a strong attempt to explain it either.
I read that paper before but it doesn’t say why its proposed way of handling logical uncertainty is the correct one, except that it “seem to have some good properties”.
This is basically the same as the situation with respect to indexical probabilities. There are dominance arguments for betting odds etc. that don’t quite go through, but it seems like probabilities are still distinguished as a good best guess, and worth fleshing out. And if you accept probabilities, prior specification is the clear next question.
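(For the flavor of the dominance argument in the ordinary setting, here is the textbook Dutch book with made-up numbers: if you price a ticket “pays 1 if A” at 0.4 and a ticket “pays 1 if not-A” at 0.4, a bookie can buy both tickets from you for

\[ 0.4 + 0.4 = 0.8 < 1, \]

and exactly one of them pays out 1, so you are down 0.2 no matter how A turns out. The delicate part is making an argument of that shape go through when A is a logical claim you can’t yet evaluate, which is why I’d only call probabilities a distinguished best guess here rather than forced.)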
I’m not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?
I think it’s plausible there are net social costs, excluding reputational impacts, and would certainly prefer to think more about it first. But with reputational impacts it seems like the case is relatively clear (of course this is potentially self-serving reasoning), and there are similar gains in terms of making things seem more concrete etc.
Maybe I’m just being dense but I’m still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn’t seem to make a strong attempt to explain it either.
Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality), and the second claim was that the epsilons aren’t so bad (e.g. because exact comparisons between floats are kind of silly anyway). Probably those could be more explicit in the writeup, but it would be helpful to know which steps seem shakiest.
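To make both claims concrete (this is a sketch of the schema as I’d state it, not a quotation from the writeup): in the displays below, \(\mathbb{P}\) is the outer assignment and P is the inner symbol. The thing you’d ideally want is the closed-interval schema, which amounts to the inner symbol tracking the outer assignment exactly:

\[ a \le \mathbb{P}(\varphi) \le b \;\Longrightarrow\; \mathbb{P}\big( a \le P(\ulcorner \varphi \urcorner) \le b \big) = 1. \]

That version is inconsistent: take a sentence G with G ↔ P(⌜G⌝) < 1. If the outer probability of G is 1, the schema (with a = b = 1) forces probability 1 on ¬G; if it is some b < 1, the schema (with the interval [0, b]) forces probability 1 on G; either way coherence is violated. What you can actually have is the open-interval version,

\[ a < \mathbb{P}(\varphi) < b \;\Longrightarrow\; \mathbb{P}\big( a < P(\ulcorner \varphi \urcorner) < b \big) = 1, \]

which only pins the inner value down to within an arbitrarily small epsilon, hence the analogy with exact comparisons between floats.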
Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality)
Why do you say “exactly corresponded to reality”? You’d have an inner symbol which corresponded to the outer P, but P must be more like subjective credence than external reality, since in reality each logical statement is presumably either true or false, not a probabilistic mixture of both?
Intuitively, what I’d want is a “math intuition module” which, if it was looking at a mathematical expression denoting the beliefs that a copy of itself would have after running for a longer period of time or having more memory, would assign high probability that those beliefs would better correspond to reality than its own current beliefs. This would in turn license the AI using this MIM to build a more powerful version of itself, or just to believe that “think more” is generally a good idea aside from opportunity costs. I understand that you are not trying to directly build such an MIM, just to do a possibility proof. But your formalism looks very different from my intuitive requirement, and I don’t understand what your intuitive requirement might be.
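(In case it helps to have something concrete to react to, here is roughly how I might try to write that requirement down, with every detail made up for the sake of illustration: let \(\mathbb{P}_t\) be the MIM’s current belief assignment, let \(\mathbb{P}_{t+k}\) denote the assignment a copy of it would reach with more time or memory, and let \(d(\cdot)\) be some measure of how far an assignment is from the actual truth values. Then I’d want something like

\[ \mathbb{P}_t\big( d(\mathbb{P}_{t+k}) \le d(\mathbb{P}_t) \big) \ge 1 - \delta \]

for small \(\delta\), where the event inside is expressed via the mathematical description of the copy. I don’t claim this is the right formalization, only that it’s the shape of requirement I have in mind.)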
P is intended to be like objective reality, exactly analogously with the predicate “True.” So we can adjoin P as a symbol and the reflection principle as an axiom schema, and thereby obtain a more expressive language. Depending on architecture, this also may increase the agent’s ability to formulate or reason about hypotheses.
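(To spell out the analogy: for the classical truth predicate, the thing you’d like to adjoin is the T-schema, \( \mathrm{True}(\ulcorner \varphi \urcorner) \leftrightarrow \varphi \), for every sentence of the extended language, including sentences that themselves mention True; the liar sentence shows that is inconsistent, and Tarski’s theorem says such a predicate isn’t definable inside the language either. P plays the same role, with the interval reflection schema in place of the T-schema, and the epsilons are what make the corresponding diagonalization fail.)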
Statements without P’s in them are indeed either true or false with probability 1. I agree it is a bit odd for statements with P in them to have probabilities, but I don’t see a strong argument that it shouldn’t happen. In particular, it seems irrelevant to anything meaningful we would like to do with a truth predicate. In subsequent versions of this result, the probabilities have been removed and the core topological considerations exposed directly.
The relationship between a truth predicate and the kind of reasoning you discuss (a MIM that believes its own computations are trustworthy) is that truth is useful, or perhaps necessary, for defining the kind of correspondence that you want the MIM to accept, namely a general relationship between the algorithm it is running and what is “true”. So having a notion of “truth” seems like the first step.
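(Concretely, the statement you eventually want the MIM to endorse about itself has roughly the shape

\[ \forall \varphi \;\big( \mathrm{Accepts}_A(\ulcorner \varphi \urcorner) \rightarrow \mathrm{True}(\ulcorner \varphi \urcorner) \big), \]

where Accepts_A stands for some arithmetized description of what the algorithm it is running endorses; the name is just a placeholder. Each individual instance can be stated without a truth predicate, but the uniform, quantified version needs True, or the probabilistic analogue P, to be written down at all, which is why a workable notion of “truth” looks like the first step.)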