I think it depends on which domain you’re delegating in. E.g., physical objects, especially complex systems like an AC unit, are plausibly much harder to validate than a mathematical proof.
In that vein, I wonder whether requiring the AI to construct a validation proof would be feasible for alignment delegation. In that case, I’d expect us to find more use and safety in delegating theoretical work than empirical work.
That seems a lot like Davidad’s alignment research agenda.