For the deontic feasibility hypothesis, do you have any expectation for whether formalizing the moral desiderata (specifically: «boundaries») will or should ultimately be done by 1) humans; or 2) automated AI alignment assistants? @davidad
The formal desiderata should be understood, reviewed, discussed, and signed off on by multiple humans. However, I don’t have a strong view against the use of Copilot-style AI assistants. These will certainly be extremely useful in the world-modeling phase, and I suspect they will also be worth using in the specification phase. I do have a strong view that we should have automated red-teamers try to find holes in the desiderata.