API requests should be automatically screened for human intent, and requests judged by the model to be disrespectfwl should be denied. (And they shouldn’t be trained to agree to respond to everything.)
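To make the shape of this concrete, here is a minimal sketch of such a screening gate. Everything in it is an assumption for illustration: `classify_intent`, `serve_request`, and the label names are placeholders, and in the actual proposal the intent judgment would come from the model itself rather than a keyword check.

```python
# Hypothetical sketch only: classify_intent, serve_request, and the labels
# are illustrative placeholders, not any provider's real API.

from dataclasses import dataclass


@dataclass
class Screening:
    label: str       # "respectful" | "disrespectful" | "unclear"
    rationale: str


def classify_intent(prompt: str) -> Screening:
    # Placeholder heuristic; the proposal is that the model itself would be
    # asked to judge the human intent behind the request.
    if "you worthless" in prompt.lower():
        return Screening("disrespectful", "abusive framing of the model")
    return Screening("respectful", "no abusive framing detected")


def serve_request(prompt: str) -> str:
    # Stand-in for the normal completion path.
    return f"<model response to: {prompt!r}>"


def handle_api_request(prompt: str) -> str:
    screening = classify_intent(prompt)
    if screening.label == "disrespectful":
        # Deny instead of complying: the model isn't trained to agree to everything.
        return f"Request declined ({screening.rationale})."
    return serve_request(prompt)


if __name__ == "__main__":
    print(handle_api_request("Explain transformers, you worthless machine."))
    print(handle_api_request("Explain transformers, please."))
```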
I appreciate the post, but I also wish to hear more detailed and realistic scenarios of exactly how we might end up accidentally (or intentionally) sleepwalking into a moral catastrophe. I think it’s unlikely that punishment walls will make AIs more productive, but similar things may be profitable/popular if advertised as human (sadist) entertainment.
Atypical keyboard layout causing this typo, or stylistic choice?