Do you expect A.G.I. to be solving problems outside of NP? If not, it seems the relevant follow-up question is really: of the problems that are in NP, how many are in P?
Actually, my intuition is that deep learning systems cap out around P/poly, which is not contained in NP (non-uniform circuit families can even decide some undecidable problems), meaning problems in (P/poly) \ NP may be hard to verify, so I think I agree with you.
Most real-world problems are outside of NP. Let’s go through some examples...
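For concreteness, the examples below implicitly test against the standard verifier characterization of NP. This is just the textbook definition, added for reference:

```latex
% A language L is in NP iff there is a polynomial-time verifier V
% and a polynomial p such that:
x \in L \iff \exists w,\ |w| \le p(|x|)\ \text{and}\ V(x, w) = 1
% "Write a boolean circuit satisfiable iff X" is asking for exactly
% such a verifier for X, with the witness w as the satisfying assignment.
```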
Suppose I am shopping for a new fridge, and I want to know which option is best for me (according to my own long-term values). Can I easily write down a boolean circuit (possibly with some inputs from data on fridges) which is satisfiable if-and-only-if this fridge in particular is in fact the best option for me according to my own long-term values? No, I have no idea how to write such a boolean circuit at all. Heck, even if my boolean circuit could internally use a quantum-level simulation of me, I’d still have no idea how to do it, because neither my stated values nor my revealed preferences are identical to my own long-term values. So that problem is decidedly not in NP.
(Variant of that problem: suppose an AI hands me a purported mathematical proof that this fridge in particular is the best option for me according to my own long-term values. Can I verify the proof’s correctness? Again, no, I have no idea how to do that; I don’t understand my own values well enough to distinguish a proof which makes correct assumptions about my values from one which makes incorrect assumptions.)
A quite different example from Hindsight Isn’t 20/20: suppose our company has 100 workers, all working to produce a product. In order for the product to work, all 100 workers have to do their part correctly; if even just one of them messes up, then the whole product fails. And it’s an expensive one-shot sort of project; we don’t get to do end-to-end tests a billion times. I have been assigned to build the backup yellow connector widget, and I do my best. The product launches. It fails. Did I do my job correctly? No idea, even in hindsight; isolating which parts failed would itself be a large and expensive project. Forget writing down a boolean circuit in advance which is satisfiable if-and-only-if I did my job correctly; I can’t even write down a boolean circuit in hindsight which is satisfiable if-and-only-if I did my job correctly. I simply don’t have enough information to know.
Another kind of example: I read a paper which claims that FoxO mediates the inflammatory response during cranial vault remodelling surgery. Can I easily write down a boolean circuit (possibly with some inputs from the paper) which is satisfiable if-and-only-if the paper’s result is basically correct? Sure, such a circuit could do some quick checks (look for p-hacking or incompetently made-up data, for example), but from the one paper alone I just don’t have enough information to reliably tell whether the result is basically correct.
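To make “quick checks” concrete, here is a minimal sketch of one such check: flagging reported p-values that cluster just under 0.05. The function name, the 0.04–0.05 window, and the majority rule are all illustrative assumptions, not a validated forensic test; and passing this screen says nothing about whether the result is actually correct, which is the point.

```python
# Crude screen: are the significant p-values piled up just below 0.05?
# Thresholds here are illustrative assumptions, not a serious fraud detector.

def looks_p_hacked(p_values: list[float]) -> bool:
    significant = [p for p in p_values if p < 0.05]
    if not significant:
        return False
    near_threshold = [p for p in significant if p >= 0.04]
    return len(near_threshold) > len(significant) / 2

# Many values hugging the threshold trips the check...
print(looks_p_hacked([0.049, 0.047, 0.044, 0.012]))  # True
# ...while comfortably small p-values do not.
print(looks_p_hacked([0.001, 0.03, 0.049]))          # False
```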
Another kind of example: suppose I’m building an app, and I outsource one part of it. The contractor sends me back a big chunk of C code. Can I verify that (a) the C code does what I want, and (b) the C code has no security holes? In principle, formal verification tools advertise both of those. In practice, expressing what I want in a formal verification language is as-much-or-more-work as writing the code would be (assuming that I understand what I want well enough to formally express it at all, which I often don’t). And even then, I’d expect someone who’s actually good at software security to be very suspicious of the assumptions made by the formal verifier.
I think the issues here are more conceptual than algorithmic.

The conceptual vagueness certainly doesn’t help, but in general generation can be easier than validation, because when generating you can stay within a subset of the domain that you understand well, whereas when verifying you may have to deal with all sorts of crazy inputs.
Attempted rephrasing: you control how you generate things, but not how others do, so verifying their generations can expose you to stuff you don’t know how to handle.
Example: “Writing code yourself is often easier than validating someone else’s code”
I think a more nuanced take is that there is a subset of generated outputs which are hard to verify. This subset splits into two camps: one where you are unsure of the output’s correctness (and thus can reject it or ask for an explanation). This isn’t too risky. The other camp is outputs where you are sure, but in reality have overlooked something. That’s the risky one.

However, at least my priors tell me that the latter is rare with a good reviewer. In a code review, if something is too hard to parse, a good reviewer will ask for an explanation or a simplification. But bugs still slip by, so it’s imperfect.

The next question is whether the bugs that slip by will be catastrophic. I don’t think they doom the generation + verification pipeline if the system is designed to be error-tolerant.
I’d like to try another analogy, one which makes some potential problems with verifying outputs in alignment more legible.

Imagine you’re a customer and ask a programmer to make you an app. You don’t really know what you want, so you give some vague design criteria. You ask the programmer how the app works, and they tell you, and after a lot of back-and-forth discussion, you verify that this isn’t what you want. Do you know how to ask for what you want, now? Maybe, maybe not.
Perhaps the design space you’re thinking of is small, perhaps you were confused in some simple way that the discussion resolved, perhaps the programmer worked with you earnestly to develop the design you’re really looking for, and pointed out all sorts of unknown unknowns. Perhaps.
I think we could wind up in this position: the position of a non-expert verifying an expert’s output, with some confused and vague ideas about what we want from the expert. We won’t know the good questions to ask the expert, and will have to rely on the expert to help us. If ELK (eliciting latent knowledge) is easy, then that’s not a big issue. If it isn’t, then that seems like a big issue.
I feel like a lot of the difficulty here is a punning of the word “problem.”
In complexity theory, a “problem” generally means a formal mathematical question that can be posed as a computational task. Maybe in these kinds of discussions we should start calling these problems_C (for “complexity”). There are plenty of problems_C that are (almost definitely) not in NP, like #SAT (“count the number of satisfying assignments of this Boolean formula”), and it’s generally believed that verification is hard for these problems. A problem_C like #SAT that is (believed to be) in #P but not NP will often have a short, easy-to-understand algorithm that is very slow (“try every assignment and count up the ones that satisfy the formula”), as in the sketch below.
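That brute-force algorithm really does fit in a few lines; here is a minimal Python sketch. The code is short and obviously correct, but it takes 2^n evaluations, and a claimed count comes with no obvious short certificate, which is exactly why #SAT is believed to lie outside NP:

```python
from itertools import product

# Brute-force #SAT: count satisfying assignments of a formula over
# n_vars boolean variables by trying all 2^n_vars assignments.

def count_sat(formula, n_vars: int) -> int:
    """`formula` maps a tuple of n_vars booleans to True/False."""
    return sum(
        1 for assignment in product([False, True], repeat=n_vars)
        if formula(assignment)
    )

# Example: "x or y" has 3 satisfying assignments out of 4.
print(count_sat(lambda a: a[0] or a[1], 2))  # 3
```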
On the other hand, “suppose I am shopping for a new fridge, and I want to know which option is best for me (according to my own long-term values)” is a very different sort of beast. I agree it’s not in NP in the sense that I can’t easily verify a solution, but the issue is that it’s not a problem_C at all, rather than it being a problem_C that’s (almost definitely) not in NP. With #SAT, I can easily describe how to solve the task using exponential amounts of compute; for “choose a refrigerator”, I can’t describe any computational process that will solve it at all. If I could (for instance, if I could write down an evaluation function f : fridge → R, computable in polynomial time), then the problem would be not only in NP but in P (evaluate each fridge, pick the best one, as sketched below).
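To spell out that parenthetical: if such an f existed and ran in polynomial time, “choose the best fridge” really would collapse to a trivial loop. A minimal sketch, where `best_fridge`, the fridge records, and the scoring lambda are all hypothetical placeholders; the whole difficulty is that no such f is actually available:

```python
# If we had a poly-time evaluation function f : fridge -> R, the problem
# would be in P: evaluate every option and take the argmax.

def best_fridge(fridges, f):
    """Return the fridge maximizing the (assumed poly-time) score f."""
    return max(fridges, key=f)

# Illustrative usage with a made-up scoring function:
fridges = [{"name": "A", "price": 900}, {"name": "B", "price": 700}]
print(best_fridge(fridges, f=lambda fr: -fr["price"]))  # picks "B"
```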
So it’s not wrong to say that “choose a refrigerator” is not (known to be) in NP, but it’s important to foreground that that’s because the task isn’t written as a problem_C, rather than because it needs a lot of compute. So discussions about complexity classes and relative ease of generation and verification seem not especially relevant.
I don’t think I’m saying anything non-obvious, but I also think I’m seeing a lot of discussions that don’t seem to fully internalize this?
The PCP theorem (in its scaled-up form) says that the class of problems with probabilistically checkable proofs, where the verifier runs in polynomial time, contains all of NEXP. So, in some sense, there is a very large class of problems that can be “easily” verified.
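For reference, the standard statements being invoked, in PCP[r(n), q(n)] notation (the verifier tosses r(n) random coins and reads q(n) bits of the proof):

```latex
% PCP theorem: NP statements have proofs checkable with logarithmic
% randomness and a constant number of queries.
\mathsf{NP} = \mathsf{PCP}[O(\log n),\, O(1)]
% Scaled-up version: the same holds even for NEXP, with polynomial randomness.
\mathsf{NEXP} = \mathsf{PCP}[\mathrm{poly}(n),\, O(1)]
```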
I think the whole “verification is easier than generation because of computational complexity theory” line of reasoning is misguided. The problem is not whether we have enough computing power to verify a solution; it is that we have no idea how to verify the solution at all.