An AI has the objective function you set, not the objective function full of caveats and details that lives in your head, or that you would come up with on reflection.
When a chatbot makes preference decisions based on labeling instructions (as in Constitutional AI or online DPO), its decisions actually are full of caveats and details that live in the chatbot’s model and likely fit what a human would intend, though meaningful reflection is not currently possible.