Isn’t the basic idea of Constitutional AI just having the AI provide its own training feedback using written instructions? My guess is there was a substantial amount of self-evaluation in the o1 training with complicated written instructions, probably kind of similar to a constitution (though this is just a guess).
This is my impression too. See e.g. this recent paper from Google, where LLMs critique and revise their own outputs to improve performance in math and coding.
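To make the critique-and-revise idea concrete, here is a minimal sketch of a constitution-guided loop. `generate` is a hypothetical stand-in for whatever model call the pipeline actually uses, and the principles are placeholders; none of this reflects the real o1 or Constitutional AI setup.

```python
# Hedged sketch of a constitution-guided self-critique loop.
# `generate` is an assumed interface, not a real API.

CONSTITUTION = [
    "Point out any factual errors or unsupported claims in the response.",
    "Point out any reasoning steps that do not follow from the previous ones.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM."""
    raise NotImplementedError  # assumed interface, swap in a real model call

def critique_and_revise(question: str) -> str:
    """Draft an answer, then critique and revise it once per principle."""
    response = generate(question)
    for principle in CONSTITUTION:
        critique = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"Question: {question}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    # The revised outputs (or preferences between original and revised ones)
    # would then serve as the training signal, e.g. RLAIF-style fine-tuning.
    return response
```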