I agree that those are useful pursuits.
Mind gesturing at your disagreements? Not necessarily to argue them, just interested in the viewpoint.
Oh, I disagree with your core thesis that the general intelligence property is binary. (Which then translates into disagreements throughout the rest of the post.) But experience has taught me that this disagreement tends to be pretty intractable to talk through, and so I now try just to understand the position I don’t agree with, so that I can notice if its predictions start coming true.
You mention universality, active adaptability, and goal-directedness. I do think universality is binary, but I expect there are fairly continuous trends in some underlying latent variables (e.g. “complexity and generality of the learned heuristics”), and “becoming universal” occurs when those trends exceed some threshold. For similar reasons I think active adaptability and goal-directedness will likely increase continuously, rather than being binary.
You might think that since I agree universality is binary, that alone is enough to drive agreement with the other points, but:
I don’t expect a discontinuous jump at the point you hit the universality property (because of the continuous trends; see the toy sketch below), and I think it’s plausible that current LLMs already have the capabilities to be “universal”. I’m sure this depends on how you operationalize universality; I haven’t thought about it carefully.
I don’t think that the problems significantly change character after you pass the universality threshold, and so I think you are able to iterate prior to passing it.
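To make the “continuous underlying trend, binary threshold” picture concrete, here is a minimal toy sketch. Everything in it (the latent variable, the threshold value, the logarithmic capability curve) is a hypothetical stand-in introduced for illustration, not something proposed in the thread: a binary flag flips when a smoothly increasing latent crosses a threshold, while the downstream capability curve shows no jump at that step.

```python
# Toy sketch with invented quantities: a smoothly increasing latent variable
# ("generality of learned heuristics") crosses an arbitrary threshold, flipping
# a binary "universality" flag. Capability is a smooth function of the latent,
# so nothing discontinuous happens to capability at the flip.
import numpy as np

steps = np.arange(0, 100)
latent = 0.1 * steps                   # continuous underlying trend
threshold = 5.0                        # hypothetical universality threshold
is_universal = latent >= threshold     # binary property
capability = np.log1p(latent)          # smooth downstream capability

flip = int(np.argmax(is_universal))    # first step at which the flag is True
print(f"flag flips at step {flip}")
print(f"capability just before the flip: {capability[flip - 1]:.3f}")
print(f"capability just after the flip:  {capability[flip]:.3f}")
```

The only point of the sketch is that a binary property can sit on top of continuous dynamics without inducing a jump in anything downstream; whether real systems look like this is the actual disagreement.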
Interesting, thanks.
Agreed that this point (universality leads to discontinuity) probably needs to be hashed out more. Roughly, my view is that universality allows the system to become self-sustaining. Prior to universality, it can’t autonomously adapt to novel environments (including abstract environments, e.g. new fields of science). Its heuristics have to be refined by some external ground-truth signals, like trial-and-error experimentation or model-based policy gradients. But once the system can construct and work with self-made abstract objects, it can autonomously build chains of them, and that causes a shift in the architecture and internal dynamics: its primary method of cognition becomes iterating on self-derived abstraction chains, instead of using hard-coded heuristics/modules.
I agree that there’s a threshold for “can meaningfully build and chain novel abstractions”, and that crossing it can create a positive feedback loop that was not previously present. But there will already be lots of positive feedback loops (such as “AI research → better AI → better assistance for human researchers → AI research”), and it’s not clear why we should expect the new feedback loop to be much more powerful than the existing ones.
(Aside: we’re now talking about a discontinuity in the gradient of capabilities rather than in capabilities themselves, but sufficiently large discontinuities in the gradient have much the same implications; the toy sketch below illustrates the distinction.)
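As a minimal sketch of that aside, with made-up growth rates (none of these numbers come from the discussion): capability compounds under an existing feedback loop, and a second loop switches on once a hypothetical threshold is crossed. The capability level stays continuous at the switch; only the per-step growth jumps.

```python
# Toy growth model with invented rates: an existing feedback loop compounds
# capability throughout, and a second loop switches on at a threshold. The
# capability *level* is continuous at the switch; only the per-step growth
# (the gradient) jumps.
capability = 1.0
existing_loop_rate = 0.01   # invented: e.g. "better AI assists human researchers"
new_loop_rate = 0.05        # invented: e.g. "autonomous abstraction-chaining"
threshold = 2.0             # hypothetical universality threshold

for step in range(120):
    rate = existing_loop_rate
    if capability >= threshold:
        rate += new_loop_rate   # the new loop adds to, rather than replaces, the old one
    growth = rate * capability
    capability += growth
    if step % 15 == 0:
        print(f"step {step:3d}  capability {capability:9.3f}  growth this step {growth:.4f}")
```

With these invented rates the new loop quickly dominates; set new_loop_rate comparable to existing_loop_rate and it barely registers, which is exactly the quantity under dispute.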
it’s not clear why we should expect the new feedback loop to be much more powerful than the existing ones

Yeah, the argument here would rely on the assumption that, e.g., the extant scientific data already uniquely constrains some novel laws of physics/engineering paradigms/psychological manipulation techniques/etc., and that we would eventually be able to figure them out even if science froze right this moment. In that case, the new feedback loop would be faster because superintelligent cognition would be faster than real-life experiments.
And I think there’s a decent amount of evidence for this. Consider that there are already narrow AIs that can solve protein folding more efficiently than our best manually-derived algorithms, which suggests that better algorithms are already uniquely constrained by the extant data and we’ve just been unable to find them. The same may be true for all other domains of science, and thus a superintelligence iterating on its own cognition would be able to outpace human science (a rough illustration of the claimed speed gap is sketched below).
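Here is a back-of-the-envelope illustration of that claimed speed gap, under the (disputed) assumption that the existing data already pins down the answers so that no new experiments are needed; every number is made up.

```python
# Back-of-the-envelope with invented numbers: if progress is bottlenecked on
# real-world experiments, iterations per year are capped by experiment turnaround
# time; if the extant data already uniquely constrains the answers, iterations
# are capped only by thinking time.
hours_per_real_world_experiment = 24 * 7   # invented: one week per experiment
hours_per_cognition_only_step = 0.1        # invented: ~6 minutes of pure reasoning
hours_in_a_year = 24 * 365

experiment_bound_iterations = hours_in_a_year / hours_per_real_world_experiment
cognition_bound_iterations = hours_in_a_year / hours_per_cognition_only_step

print(f"experiment-bound iterations/year: {experiment_bound_iterations:,.0f}")
print(f"cognition-bound iterations/year:  {cognition_bound_iterations:,.0f}")
print(f"ratio: {cognition_bound_iterations / experiment_bound_iterations:,.0f}x")
```

The ratio is driven entirely by the assumed per-iteration times, so the sketch only restates the shape of the argument rather than providing evidence for the assumption itself.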