If someone did this, it would be nice to collect preference data over answers that are helpful to alignment versus not helpful to alignment. That could be an interesting dataset for a variety of reasons: analyzing current models' abilities to help with alignment, identifying gaps in helpfulness w.r.t. alignment, and of course providing a mechanism for making models better at alignment. A model trained this way could also maybe work as a specialized type of Constitutional AI, collecting feedback from a model whose preferences are more "alignment-aware," so to speak. None of this is a solution to alignment, as the OP points out, but it's interesting nonetheless.
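For concreteness, a single record in such a preference dataset might look roughly like this. The schema and field names are just an illustrative guess on my part, not an established format:

```python
# Hypothetical schema for one record in an "alignment-helpfulness" preference
# dataset -- the field names and contents are illustrative, not a standard.
example_record = {
    "prompt": "How could we detect deceptive behavior in a language model?",
    # Answer judged *more* helpful for alignment research
    "chosen": "One approach is to probe internal activations for ...",
    # Answer judged *less* helpful (e.g., vague, evasive, or subtly misleading)
    "rejected": "That's a hard problem; models are basically black boxes.",
    # Optional annotation explaining why one answer was preferred
    "rationale": "The chosen answer proposes a concrete, testable method.",
}

# Pairs like this could feed a standard preference-learning setup
# (e.g., training a reward model or DPO-style fine-tuning), or serve as
# the feedback signal in a Constitutional-AI-like loop.
```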
I’d be interested in participating in this project if other folks set something up…