I am a data scientist and full stack developer that veered off the mathematics academic track during my postdoc when I discovered machine learning in 2012.
I am currently leading a data team for a startup specialty lines insurance company based in Juno Beach, FL.
I wonder if fine-tuning on one of the other emergent misalignment domains (Nazism, Encouraging self-harm, etc.) would result in emergent insecure code. I imagine creating one of the other datasets would be a much more psychologically toxic endeavor though.