In the language of generative models, “praxis” corresponds to cognitive and “action” disciplines, from rationality (the discipline/praxis of rational reasoning), epistemology, and ethics to dancing and pottery. The generative model (Active Inference) frame and the shard theory frame thus seem to agree that disciplinary alignment (“virtue ethics”) is more important (more fundamental, more robust) than “deontological” and “consequentialist” alignment, which roughly correspond to goal alignment and prediction (“future fact”) alignment, respectively. The generative model frame treats goal alignment and prediction alignment as downstream of disciplinary (a.k.a. generative model, or praxis) alignment: attempts at the former are largely ineffectual or futile without the more fundamental latter type of alignment.
It’s also worth noting that the concept of “cognitive discipline” is vague: cognitive disciplines are not easily disentangled from each other when we look at the actual behaviour/cognition/computation/action/generative model. We can look at the whole behaviour/cognition and say “It was rational”, “It was ethical”, or “It exhibited good praxis of science (i.e., epistemology)”, but we probably cannot say “This exact elementary operation was an execution of rationality, and this next elementary operation was an execution of ethics”. So the “disciplines” I talk about above are really more like “properties” of behaviour/cognition, and the invocation of “ratings” to assess how well behaviour/cognition aligns with these properties/disciplines is probably right (or, perhaps, richer language feedback on these properties should be used instead).
Speaking of the concrete examples of praxis that you give (corrigibility, transparency, and niceness), I think corrigibility is a confused concept that is neither achievable nor desirable in practice (or is achievable only in a form that makes the name ‘corrigibility’ misleading, not reflecting the nature of the realised property). Persuadability seems like a more coherent thing to want in the intelligences we wish to interact with. Transparency and niceness sound like sub-properties of good communication praxis, and I’m also not sure we “want” them a priori; the picture could be more nuanced (transparent and nice with a friend, closed and hostile to a foe, the art of distinguishing friend from foe, etc. See cooperative RL and multi-agent common sense.)