You use RLHF to try and avoid damaging answers to certain questions. In doing so, you necessarily reinforce against accurate maps and logical consistency in the general case, unless you do something highly bespoke to prevent this from happening.
This is a great application of the problem outlined by the Parable of Lightning. Including the “unless you do something highly bespoke” such as cities of academics.
This is a great application of the problem outlined by the Parable of Lightning. Including the “unless you do something highly bespoke” such as cities of academics.