Based on this example and your other comment, it sounds like the intended claim of the post could be expressed as:
I think that this is indeed part of the value proposition for scalable oversight. But in my opinion, it’s missing the more central application of these techniques: situations where the AIs are ~~taking many actions~~ solving many subproblems, where humans would eventually understand any particular ~~action~~ subproblem and its solution if they spent a whole lot of time investigating it, but where that amount of time taken to oversee any ~~action~~ subproblem is prohibitively large. In such cases, the point of scalable oversight is to allow them to oversee ~~actions~~ subproblems at a much lower cost in terms of human time—to push out the Pareto frontier of oversight quality vs cost.
Does that accurately express the intended message?
No, because I’m not trying to say that the humans understand the subproblem well enough to e.g. know what the best answer to it is, I’m trying to say that they understand the subproblem well enough to know how good an answer the answer provided was.
I wasn’t imagining that the human knew the best answer to any given subproblem, but nonetheless that did flesh out a lot more of what it means (under your mental model) for a human to “understand a subproblem”, so that was useful.
I’ll try again:
I think that this is indeed part of the value proposition for scalable oversight. But in my opinion, it’s missing the more central application of these techniques: situations where the AIs are ~~taking many actions~~ solving many subproblems, where humans would eventually understand ~~any particular action~~ how well the AI’s plan/action solves any particular subproblem if they spent a whole lot of time investigating it, but where that amount of time taken to oversee any ~~action~~ subproblem is prohibitively large. In such cases, the point of scalable oversight is to allow them to oversee ~~actions~~ subproblems at a much lower cost in terms of human time—to push out the Pareto frontier of oversight quality vs cost.
(… and presumably an unstated piece here is that “understanding how well the AI’s plan/action solves a particular subproblem” might include recursive steps like “here’s a sub-sub-problem, assume the AI’s actions do a decent job solving that one”, where the human might not actually check the sub-sub-problem.)
Does that accurately express the intended message?
Sort of; I don’t totally understand why you want to phrase things in terms of subproblems instead of actions but I think it’s probably equivalent to do so, except that it’s pretty weird to describe an AI as only solving “subproblems”. Like, I think it’s kind of unnatural to describe ChatGPT as “solving many subproblems”; in some sense I guess you can think of all its answers as solutions to subproblems of the “be a good product” problem, but I don’t think that’s a very helpful frame.