I think the missing piece here is that people who want to outsource solving alignment to AIs are usually trying to avoid engaging with the hard problems of alignment themselves. So the key difference is that, in B, the people outsourcing usually haven’t attempted to understand the problem very deeply.
I don’t agree with this characterization, at least for myself. I think people should be doing object-level alignment research now, partly (maybe mostly?) to be in better position to automate it later. I expect alignment researchers to be central to automation attempts.
It seems to me like the basic equation is something like: “If today’s alignment researchers would be able to succeed given a lot more time, then they are also reasonably likely to succeed given access to a lot of human-level-ish AIs.” There are reasons this could fail (perhaps future alignment research will require major adaptations and different skills such that today’s top alignment researchers will be unable to assess it; perhaps there are parallelization issues, though AIs can give significant serial speedup), but the argument in this post seems far from a knockdown argument.
Also, it seems worth noting that non-experts work productively with experts all the time. There are lots of shortcomings and failure modes, but the video is a parody.
I don’t agree with this characterization, at least for myself. I think people should be doing object-level alignment research now, partly (maybe mostly?) to be in better position to automate it later.
Indeed, I think you’re a good role model in this regard and hope more people will follow your example.
Also, Plan B is currently being used to justify accelerating various dangerous tech by folks with no solid angles on Plan A...