I endorse alignment proposals that aim to be formally grounded; however, I’d like to hear some concrete ideas about how you plan to handle the common hard subproblems.
At the beginning of the post, you say that you want to 1) build a formal goal that leads to good worlds when pursued and 2) design an AI that pursues this goal.
It seems to me that 1) involves some form of value learning (since we are talking about good worlds). Can you give a high-level overview of how, concretely, you plan to deal with the complexity and fragility of value?
Now suppose 1) is solved. Can you give a high-level overview of how you plan to design the AI? In particular, how would you make it aimable?