I think the important factors for the risk of morally relevant disvalue occurring during inference in ML models are probably more like:
The training algorithm. Unsupervised learning seems less risky than model-free RL (e.g. the RLHF approach currently used by OpenAI maybe?); the latter seems much more similar, in a relevant sense, to the natural evolution process that created us.
The architecture of the model.
Being polite to GPT-n is probably not directly helpful (though it can help indirectly by causing humans to care more about this topic). A user can be extremely polite to a text-generating model, and the model (one produced by model-free RL) can still experience disvalue, particularly during an ‘impossible inference’: one in which the input text (the “environment”) is bad in the sense that there is obviously no way to complete the text in a “good” way.
See also: this paper by Brian Tomasik.