This does work, but I think in this case the filter is actually doing the right thing. ChatGPT can’t actually cite sources (there were citations in its training set but it didn’t exactly memorize them); if it tries, it winds up making correctly-formatted citations to papers that don’t exist. The filter is detecting (in this case, accurately) that the output is going to be junk, and that an apology would be a better result.