I’m pretty sure none of this actually affects what I said: the low-level behavior still needs to produce results which are predictable to humans in order for predictability to be useful, and that’s still hard.
The problem is that making an AI predictable to a human is hard. This is true regardless of whether or not it’s doing any outside-the-box thinking. Having a human double-check the instructions given to a fast low-level AI does not make the problem any easier; the low-level AI’s behavior still has to be understood by a human in order for that to be useful.
As you say toward the end, you’d need something like a human-readable communications protocol. That brings us right back to the original problem: it’s hard to translate between humans’ high-level abstractions and low-level structure. That’s why AI is unpredictable to humans in the first place.
If you know in general that a low-level AI will follow the rules it has been given, you don’t need to keep re-checking.
The rules it’s given are, presumably, at a low level themselves. (Even if that’s not the case, the rules it’s given are definitely not human-intelligible unless we’ve already solved the translation problem in full.)
The question is not whether the low-level AI will follow those rules, the question is what actually happens when something follows those rules. A Python interpreter will not ever deviate from the simple rules of Python, yet it still does surprising-to-a-human things all the time. The problem is accurately translating between human-intelligible structure and the rules given to the AI.
The problem is not that the AI might deviate from the given rules. The problem is that the rules don’t always mean what we want them to mean.
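As a toy illustration of that point, here is a sketch in which the interpreter follows Python’s rules exactly, and the surprise comes entirely from what those rules mean (the example itself is only illustrative):

```python
# The interpreter never deviates from Python's rules; the surprise is in what
# the rules mean, not in any failure to follow them.

def append_item(item, bucket=[]):    # the default list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_item(1))       # [1]
print(append_item(2))       # [1, 2] -- surprising to many humans, yet exactly by the rules

print(0.1 + 0.2 == 0.3)     # False -- binary floating point, again exactly by the rules
```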
The rules it’s given are, presumably, at a low level themselves.
The rules that the low-level AI runs on could be medium level. There is no point in giving it very low-level rules, since its job is to fill in the details. But the point is that I am stipulating that the rules should be high level enough to be human-readable.
The question is not whether the low-level AI will follow those rules, the question is what actually happens when something follows those rules. A Python interpreter will not ever deviate from the simple rules of Python, yet it still does surprising-to-a-human things all the time.
But the world hasn’t ended. A Python interpreter doesn’t do surprisingly intelligent things, because it is not intelligent.
The problem is not that the AI might deviate from the given rules. The problem is that the rules don’t always mean what we want them to mean.
In your framing of the problem, you create one superpowerful AI that has to be programmed perfectly, which is impossible. In my solution, you reduce the problem to more manageable chunks. My solution is already partially implemented.
But the point is that I am stipulating that the rules should be high level enough to be human-readable.
If the rules are high level enough to be human-readable, then translating them into something a computer can run while still maintaining the original intent is hard. That’s basically the whole alignment problem. If an AI is doing that translation, then writing/training that AI is as hard as the whole alignment problem.
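A minimal sketch of what that gap can look like; the rule, the proxy metric, and the “optimizer” here are all invented purely for illustration:

```python
# Hypothetical sketch: a human-readable rule and one plausible machine
# translation of it. The translation is consistent with the words of the rule,
# but a literal optimizer can satisfy it in a way the human never intended.

human_rule = "Keep the warehouse floor clear."

def proxy_score(state):
    # One "reasonable" formalization: fewer objects on the floor = better.
    return -len(state["objects_on_floor"])

def literal_optimizer(state):
    # Maximizes the proxy by discarding everything: the floor is now clear,
    # but the intent behind the rule (tidy, nothing lost) is violated.
    return {"objects_on_floor": [], "discarded": list(state["objects_on_floor"])}

before = {"objects_on_floor": ["pallet", "crate", "toolbox"], "discarded": []}
after = literal_optimizer(before)
print(proxy_score(after))   # 0 -- the best possible proxy score
print(after["discarded"])   # ['pallet', 'crate', 'toolbox'] -- not what we meant
```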
A Python interpreter doesn’t do surprisingly intelligent things, because it is not intelligent.
If a system is doing large, fast, irreversible things, then it does not matter whether those things are surprisingly intelligent. If they’re surprising, then that’s sufficient for it to be a problem.
In your framing of the problem, you create one superpowerful AI that has to be programmed perfectly, which is impossible.
I’m not sure what gave you that impression, but I definitely do not intend to assume any of that.
If the rules are high level enough to be human-readable, then translating them into something a computer can run while still maintaining the original intent is hard.
It’s not harder than AGI, because NL is a central part of AGI.
That’s basically the whole alignment problem.
No it isn’t. You can have systems that do what they are told without having any notion of values and preferences. The higher-level systems need goals because they are defining strategy, but only the higher-level ones.
If a system is doing large, fast, irreversible things, then it does not matter whether those things are surprisingly intelligent. If they’re surprising, then that’s sufficient for it to be a problem.
Yes, but that’s a problem we already have, with solutions we already have. For instance, high-frequency trading systems can be shut down [automatically] if the market moves too much.
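For concreteness, a minimal sketch of that kind of automatic shutdown rule; the threshold, price feed, and shutdown hook are all invented for illustration:

```python
# Minimal sketch of an automatic kill switch of the kind described above.
# The 5% threshold and the three callbacks are assumptions, not a real API.

MAX_MOVE = 0.05  # halt if price moves more than 5% from the session reference

def should_halt(reference_price, current_price, max_move=MAX_MOVE):
    return abs(current_price - reference_price) / reference_price > max_move

def trading_loop(get_price, submit_orders, shutdown):
    reference = get_price()
    while True:
        price = get_price()
        if should_halt(reference, price):
            shutdown()           # stop trading entirely; a human reviews before restart
            break
        submit_orders(price)     # otherwise, carry on with the fast low-level loop
```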
Yes, but that’s a problem we already have, with solutions we already have.
It is a problem we already have, but the solutions we already have are all based on the assumption that either (a) we know in advance what kind of problems can happen, or (b) the problem doesn’t kill us all in one shot. For instance, in your HFT system shutdown example, we already know that “market moves too much” is something which makes a lot of HFT systems not work very well. But how did we learn that? Either we had a prior idea of what problems could happen (implying some transparency of the system), or the problem happened at least once and we learned from that (implying it didn’t kill us the first time; see e.g. Knight Capital).
With AI, it’s the same old problem, but on hard mode (i.e. the system is very opaque) and high stakes (i.e. we don’t necessarily survive the first big mistake). That’s exactly the sort of scenario where our current solutions do not work.
It’s not harder than AGI, because NL is a central part of AGI.
NL? I’m not familiar with this acronym. Also I said it’s as hard as alignment, not as hard as AGI, in case that’s relevant.
No it isn’t. You can have systems that do what they are told without having any notion of values and preferences. The higher-level systems need goals because they are defining strategy, but only the higher-level ones.
I’m not even convinced that higher-level systems necessarily need goals. Pure goal-free tool AI is one possible path; the OP was written to be agnostic to such considerations.
Indeed, that’s a big part of why I say translation is the central piece of the alignment problem: it’s the piece that’s agnostic. It’s the piece that has to be there, in every scheme, under a wide range of assumptions about how the world works. Tool AI? Still needs to solve the translation problem in order to be safe and useful, even without any notion of values or preferences. Utility-maximizing AI? Needs to solve the translation problem in order to be safe and useful. Hierarchical scheme? Translation still needs to be handled somewhere in order to be safe and useful. Humans-consulting-humans or variations thereof? Full system needs to solve the translation problem in order to be safe and useful. Etc.
NL? I’m not familiar with this acronym. Also I said it’s as hard as alignment, not as hard as AGI, in case that’s relevant.
Presumably “natural language”, which often gets called NLP for “natural language processing” in AI.
I think the right response there is something like “suppose you have an AGI that can understand what a human means as well as another human does; now you still have all the difficulty of interpretation that makes law a complicated and contentious field.” It’d be nice to be able to write a Constitution and still recognize it after the AI has spent 300 years thinking about how to interpret it under adversarial pressure, for example.