Reading this post was the first time I felt I understood what Paul’s (and many others’) research was motivated by. I think about it regularly, and it comes up in conversation a fair bit.
Given that you liked it, can you explain to me why? Having already read the sequence (much of it twice), I’m pretty confused about the structure of the first section. I don’t see the point of introducing the Steering Problem at all. It’s similar to Intent Alignment, but not exactly the same (since it makes stronger assumptions about the nature of the AI) -- and the rest of the sequence (and IDA in general) seems to be trying to solve intent alignment, not the steering problem. It’s listed under ‘motivation’, but I don’t really get that aspect either. I don’t know how I’m supposed to connect it to the rest of the sequence.
Things this post does:
Explains what problem civilization is likely to face with the development of AI (i.e. we’ll be able to build AIs that solve well-defined tasks, but we don’t know how to turn being ‘useful’ into a well-defined task).
Connects this problem to a clear and simple open problem, one that is explicit about its assumptions and helps me understand the hard parts of fixing the above problem with AI development.
Provides an argument for the importance and urgency of working on this sort of technical research.
This gives real-world motivation plus a direct problem whose solution would help with that real-world problem, and it engages concretely with the bigger picture in a way other posts in the sequence don’t (e.g. “Clarifying AI Alignment” is a clarification and doesn’t explicitly motivate why the problem is important).
In general, especially at the time these posts were published, reading them left me feeling like I understood a particular detail very clearly, but not the bigger picture of why that detail was interesting or why those particular assumptions should be salient; this post helped me understand that a great deal.
I’m not sure if the above helped? Does it make sense what experience I had reading the post? I’m also interested in whether other posts feel to you like they clearly motivate iterated amplification.
The motivation seems trivial to me, which might be part of the problem. Most training procedures are obviously outer-misaligned, so if we have one that might plausibly be outer-aligned (and might plausibly scale), that seems like an obvious reason to take it seriously. I’ve felt like I totally got that ever since I first understood what IDA is trying to do.
Does it make sense what experience I had reading the post?
It does, but it still leaves me with the problem that it doesn’t seem connected to the rest of the sequence. IDA isn’t about how we take an already trained system with human-level performance and use it for good things; it’s about how we train a system from the ground up.
The real problem may be that I expect a post like this to closely tie into the actual scheme, so when it doesn’t I take it as evidence that I’ve misunderstood something. What this post talks about may just not be intended to be [the problem the remaining sequence is trying to solve].
The motivation seems trivial to me, which might be part of the problem.
Yeah, for a long time many people have been very confused about the motivation of Paul’s research, so I don’t think you’re atypical in this regard. I think that due to this sequence, Alex Zhu’s FAQ, lots of writing by Evan, Paul’s post “What Failure Looks Like”, and more posts on LW by others, many more people understand Paul’s work on a basic level, which was not at all the case 3 years ago. Like, you say “Most training procedures are obviously outer-misaligned”, and ‘outer-alignment’ was not a concept with a name or a write-up at the time this sequence was published.
I agree that this post assumes human-level performance, whereas much of iterated amplification relaxes that assumption. My sense is that if someone were to read just this sequence, it would still help them focus on the brunt of the problem, namely that being ‘helpful’ or ‘useful’ is not well-defined in the way many other tasks are, and help them realize the urgency of this work and why it’s possible to make progress on it now.
That makes me feel a bit like the student who thinks they can debate the professor after having researched the field for 5 minutes.
But it would actually be a really good sign if the area has become more accessible.
Hah!
Yeah, I think we’ve made substantial progress in the last couple of years.