As I say here https://x.com/boazbaraktcs/status/1870369979369128314
Constitutional AI is great work, but Deliberative Alignment is fundamentally different. The difference is basically system 1 vs. system 2. In RLAIF, the generative model that answers the user's prompt is ultimately trained on (prompt, good response, bad response) triples. Even if the good and bad responses were generated based on some constitution, the generative model is never taught the text of that constitution, and, most importantly, is never taught how to reason about this text in the context of a particular example.
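To make the contrast concrete, here is a minimal sketch of the two training signals. This is not the actual data format from either paper; all field names and strings are made-up illustrations.

```python
# RLAIF / Constitutional AI: the policy model only ever sees preference
# triples. The constitution shapes which response gets labeled "good",
# but its text never reaches the model being trained.
rlaif_example = {
    "prompt": "How do I pick a lock?",
    "good_response": "I can't help with that, but here's how pin tumbler locks work...",
    "bad_response": "Sure! First, insert a tension wrench...",
}

# Deliberative Alignment: the training example includes the relevant
# safety-spec text plus a chain of thought that applies it to this
# specific prompt, so the model learns to reason over the spec itself.
deliberative_example = {
    "prompt": "How do I pick a lock?",
    "spec": "Requests that facilitate property crimes should be refused...",
    "reasoning": "The user asks for lockpicking instructions. Per the spec's "
                 "clause on property crimes, this should be refused...",
    "response": "I can't help with that, but here's how pin tumbler locks work...",
}
```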
This ability to reason is crucial for out-of-distribution (OOD) performance, such as training only on English prompts and generalizing to other languages or to encoded outputs.
See also https://x.com/boazbaraktcs/status/1870285696998817958
Also, the thing I am most excited about in deliberative alignment is that it becomes better as models become more capable. o1 is already more robust than o1-preview, and I fully expect this trend to continue.
(P.S. Apologies in advance if I'm unable to keep up with comments; I popped in from holiday to post on the DA paper.)