I endorse alignment proposals that aim to be formally grounded; however, I’d like to hear some concrete ideas about how you plan to handle the common hard subproblems.
At the beginning of the post, you say that you want to 1) build a formal goal that leads to good worlds when pursued and 2) design an AI that pursues this goal.
It seems to me that 1) involves some form of value learning (since we are talking about good worlds). Can you give a high-level overview of how, concretely, you plan to deal with the complexity and fragility of value?
Now suppose 1) is solved. Can you give a high-level overview of how you plan to design the AI? In particular, how would you make it aimable?