I found this post a bit odd, in that I was assuming the context was comparing:
“Plan A: Humans solve alignment” -versus-
“Plan B: Humans outsource the solving of alignment to AIs”
If that’s the context, you can say “Plan B is a bad plan because humans are too incompetent to know what they’re looking for, or recognize a good idea when they see it, etc.”. OK sure, maybe that’s true. But if it’s true, then both plans are doomed! It’s not an argument to do Plan A, right?
To be clear, I don’t actually care much, because I already thought that Plan A was better than Plan B anyway (for kinda different reasons from you—see here).
I think the missing piece here is that people who want to outsource the solving of alignment to AIs are usually trying to avoid engaging with the hard problems of alignment themselves. So the key difference is that, in B, the people outsourcing usually haven’t attempted to understand the problem very deeply.
I don’t agree with this characterization, at least for myself. I think people should be doing object-level alignment research now, partly (maybe mostly?) to be in a better position to automate it later. I expect alignment researchers to be central to automation attempts.
It seems to me like the basic equation is something like: “If today’s alignment researchers would be able to succeed given a lot more time, then they also are reasonably likely to succeed given access to a lot of human-level-ish AIs.” There are reasons this could fail (perhaps future alignment research will require major adaptations and different skills such that today’s top alignment researchers will be unable to assess it; perhaps there are parallelization issues, though AIs can give significant serial speedup), but the argument in this post seems far from a knockdown.
Also, it seems worth noting that non-experts work productively with experts all the time. There are lots of shortcomings and failure modes, but the video is a parody.
I don’t agree with this characterization, at least for myself. I think people should be doing object-level alignment research now, partly (maybe mostly?) to be in a better position to automate it later.
Indeed, I think you’re a good role model in this regard and hope more people will follow your example.
Also Plan B is currently being used to justify accelerating various danger tech by folks with no solid angles on Plan A...