The counter-examples are of that type because the examples are often of that type—presented formally, so vulnerable to a formal solution.
If you’re saying that “−10^(10^10) utility on something like turning on a yellow light” is not a reasonable utility function, then I agree with you, and that’s the very point of this post—we need to define what a “reasonable” utility function is, at least to some extent (“partial preferences...”), to get anywhere with these ideas.
“The counter-examples are of that type because the examples are often of that type—presented formally, so vulnerable to a formal solution.”
It does not seem to me that the cluster of concepts in corrigibility, Clarifying AI Alignment, and my comment on it is presented formally. They feel very, very informal (to the point that I think we should try to make them more formal, though I’m not optimistic about getting them to the level of formality you typically use).
(I still need to get a handle on ascription universality, which might make these concepts more formal, but from what I understand of it so far, it’s still much less formal than what you usually work with.)
“we need to define what a ‘reasonable’ utility function is...”
My argument is that we don’t need to define this formally; we can reason about it informally and still get justified confidence that we will get good outcomes, though not justified confidence that the chance of failure is below 1 in a billion.