Therefore, for any agent A that prints only true statements, there is some agent A’ that prints the same statements plus one more hardcoded statement that A doesn’t print. (ETA: for example, it could be the statement “A doesn’t print this statement.”) So your game doesn’t have a best strategy, and the best we can hope for is a pretty good strategy. Do you believe you have a strategy that’s better than just printing out all the logical consequences of some formal system?
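A sketch of that diagonal step, assuming A is a program so that "A eventually prints x" is expressible by an arithmetic formula Prints_A(x) (my notation): by the diagonal lemma there is a sentence D with

\[ D \;\leftrightarrow\; \neg\,\mathrm{Prints}_A(\ulcorner D \urcorner). \]

If A printed D, then D would be false (by the equivalence above), contradicting the assumption that A prints only true statements; so A never prints D, hence D is true, and the agent A' that runs A and additionally prints D prints only true statements while printing strictly more than A.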
Fix A, say make it an emulation of your brain.
What should A do? A doesn’t get to step outside of itself and print a statement “A doesn’t print this statement” without the statement becoming false.
But A can do better than just printing out the consequences of a formal system. It can print out the theorems in a formal system, and then add on as many Con(...) statements as it can think of, as Nesov remarks. However, it can do even better still by outputting Con(A), even though it cannot then output all of the logical consequences of Con(A). It will then go on to output some other statements, and in general the behavior will be extremely complicated.
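Concretely, with T some fixed sound formal system (T and the notation here are mine, just to pin down the example), the strategy described above outputs something like

\[ \mathrm{Thm}(T) \;\cup\; \{\mathrm{Con}(T),\ \mathrm{Con}(T+\mathrm{Con}(T)),\ \dots\} \;\cup\; \{\mathrm{Con}(A)\}, \]

where A refers to the agent's own source code (obtained by quining). This output is not deductively closed: it contains Con(A) but not most of Con(A)'s logical consequences.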
A program is not a formal system, at least not trivially; you need to specify an interpretation that sees it, or some part of it, as one. If a program outputs a Con(A) statement, why not form a formal system B = A + Con(A) and have the agent use B? It's better than using just A and outputting a single additional Con(A) without its consequences. What's the point of not extending the formal system with the new things the agent could output, and instead generating them in some different, ad hoc way?
Sorry, in the post I introduced Con(A) [where A is a program which outputs statements] to mean:
“For no statement S does A output both S and not S.”
(This seems like the natural translation to the type “program.”) Hopefully now this is clear?
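In symbols (my notation, not from the post), for a program A that outputs statements:

\[ \mathrm{Con}(A) \;:\equiv\; \neg\exists S\,\big(\text{A outputs } S \ \wedge\ \text{A outputs } \neg S\big). \]

That is, it is a claim about the program's actual output behavior, not about derivability in some formal system.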
That part is now clear, though that means A is not a formal system (its output is not closed under logical inference rules), and instead of Con(A) you could output all kinds of other things. It’s not clear what motivates this construction as compared to countless others.
I am not really trying to endorse one construction or another.
The point is: if you are playing the game “say as many true things as you can without messing up” and you are good at it, you don’t end up saying a set of things which is closed under logical inference.
It is quite possible that this is consistent with your intuitions, but it wasn't consistent with mine until I made this observation (and I would guess that most mathematicians would expect that you should say all logical implications of anything you say).
Right. So your point is to antipredict the assumption that if an agent is generating statements, then it's a formal system. The argument is to point out that there seems to be no clear sense in which being a formal system is optimal, since there are lots of strategies that break this property, so a good agent probably doesn't just parrot the complete output of some formal system. I agree, though seeing the agent's output as logical statements at all still doesn't seem like a natural or useful interpretation. I guess my next question would be about the motivation for considering that setting.
I tend to think about and talk about agents’ “beliefs,” and apply various intuitions about my own beliefs to decision theoretic problems; this setting is designed to better inform some of those intuitions (in particular, it shifts the border on some conflicts between my naive intuitions and incompleteness results).
This didn’t clarify the situation for me.