That said, it’s hard to reason about what preferences/morality/meta-ethics/etc. an AI actually converges to if you give it vague deontological injunctions like “be nice” or “produce paperclips”. It’d be really cool if more people were thinking about likely attractors, in addition to or instead of the recognized universal AI drives.
(I’ll also note that I agree with Nesov that problems like logical uncertainty, the grounding problem, and the lack of a low-level language pose difficulties similar to the ‘you can’t just do ethical injunctions’ problem. That said, humans manage to do moral reasoning somehow, so it can’t be crazy difficult.)