Yes, but its underlying model is still accurate, even if it doesn’t reveal that to us?
This depends on whether it thinks we would approve more of it having an accurate model and deceiving us, or of it having a model that is inaccurate in the ways we want it to be inaccurate. Some algorithmic bias work is of the form “the system shouldn’t take in inputs X, or draw conclusions Y, because that violates a deontological rule, and simple accuracy-maximization doesn’t incentivize following that rule.”
My point is something like “the genius of approval-directed agency is that it grounds out every meta-level in ‘approval,’ but this is also (potentially) the drawback of approval-directed agency.” Specifically, for any potentially good property the system might have (like epistemic accuracy) you need to check whether that actually in-all-cases for-all-users maximizes approval, because if it doesn’t, then the approval-directed agent is incentivized to not have that property.
[The deeper philosophical question here is something like “does ethics backchain or forwardchain?”, as we’re either grounding things out in what we will believe or in what we believe now; approval-direction is more the latter, and CEV-like things are more the former.]
Note that I wasn’t talking about approval-directed agents in the part you originally quoted. I was saying that normal maximizers will learn to build good models as part of capability generalization.
Oh! Sorry, I missed the “How does this compare with” line.