One way to reject this case for HRAD work is to argue that imprecise theories of rationality are insufficient for helping to align AI systems. This is what Rohin does in this comment, where he argues that imprecise theories cannot be used to build things "2+ levels above".
I should note that there are some things in world 1 that I wouldn’t reject this way—e.g. one of the examples of deconfusion is “anyhow, we could just unplug [the AGI].” That is directly talking about AGI safety, and so deconfusion on that point is “1 level away” from the systems we actually build, and isn’t subject to the critique. (And indeed, I think it is important and great that this statement has been deconfused!)
It is my impression though that current HRAD work is not “directly talking about AGI safety”, and is instead talking about things that are “further away”, to which I would apply the critique.