Stuart_Armstrong comments on Thoughts on Human Models

Stuart_Armstrong 5 Mar 2019 19:42 UTC
LW: 12 AF: 5

Some existing work that does not rely on human modelling includes the formulation of safely interruptible agents, the formulation of impact measures (or side effects), approaches involving building AI systems with clear formal specifications (e.g., some versions of tool AIs), some versions of oracle AIs, and boxing/containment.

Most of these still require at least a partial specification of human preferences, and hence a partial model of humans: https://www.lesswrong.com/posts/sEqu6jMgnHG2fvaoQ/partial-preferences-needed-partial-preferences-sufficient