For the deontic feasibility hypothesis, do you have any expectation for whether formalizing the moral desiderata (specifically: «boundaries») will or should ultimately be done by 1) humans; or 2) automated AI alignment assistants? @davidad
The formal desiderata should be understood, reviewed, discussed, and signed off on by multiple humans. However, I don’t have a strong view against the use of Copilot-style AI assistants. These will certainly be extremely useful in the world-modeling phase, and I suspect they will also be worth using in the specification phase. I do have a strong view that we should have automated red-teamers try to find holes in the desiderata.