There’s the additional objection of “if you’re doing this, why not just have the AI ask HCH what to do?”
Overall, I’m hoping it could be easier for an AI to robustly conclude that a certain plan changes a human’s HCH only via certain informational content than for the AI to reliably calculate that human’s HCH. But I don’t have strong arguments for this intuition.
“Having a well-calibrated estimate of HCH” is the condition you want, not “being able to reliably calculate HCH”.
I should have said “reliably estimate HCH”; I’d also want quite a lot of precision, in addition to calibration, before trusting such an estimate.
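To make the calibration/precision distinction concrete, here’s a toy sketch (my own illustration, nothing specific to HCH): a predictor that always reports the base rate can be well calibrated in aggregate yet carry no information about any particular question, while a predictor with access to relevant evidence can be calibrated *and* much more precise. The Brier score and “sharpness” measure below are standard ways to see the difference.

```python
# Toy sketch: two estimators of a binary question can both be well
# calibrated while differing enormously in precision, i.e. how
# informative each individual estimate is.
import random

random.seed(0)
N = 100_000

# Each "question" has its own latent probability that the answer is yes.
latent = [random.random() for _ in range(N)]
outcomes = [random.random() < p for p in latent]

# Estimator A: always reports the overall base rate (~0.5).
# Calibrated in aggregate, but says nothing about any particular question.
preds_a = [0.5] * N

# Estimator B: sees a slightly noisy version of the latent probability.
# Also roughly calibrated, and far more precise.
preds_b = [min(max(p + random.gauss(0, 0.05), 0.0), 1.0) for p in latent]

def brier(preds):
    # Mean squared error of probabilistic predictions (lower is better).
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / N

def sharpness(preds):
    # Average distance from the maximally uninformative prediction of 0.5.
    return sum(abs(p - 0.5) for p in preds) / N

for name, preds in [("base-rate only", preds_a), ("informed", preds_b)]:
    print(f"{name:>14}  brier={brier(preds):.3f}  sharpness={sharpness(preds):.3f}")
```

The base-rate estimator comes out calibrated but with zero sharpness; “reliably estimate HCH” in the sense I mean would require something much closer to the second estimator.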