I have previously criticized value learning for needing to locate the human within some kind of prespecified ontology (this criticism is not new). By taking only the agent itself as primitive, perhaps we could get around this (we don’t need any fancy engineering or arbitrary choices to figure out AUs/optimal value from the agent’s perspective).
Wouldn’t you need to locate the abstract concept of AU within the AI’s ontology? Is that easier? Or sorry if I’m misunderstanding.
To the contrary, an AU is naturally calculated from reward, one of the few things that is ontologically fundamental in the paradigm of RL. As mentioned in the last post, the AU of reward function $R$ is $V^*_R$, which gives the maximum possible $R$-return from a given state.
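Spelling that out with the standard definition (writing reward as a function of state for simplicity, with $\gamma$ the discount factor and $\pi$ ranging over the agent's policies):

$$V^*_R(s) \;=\; \max_{\pi}\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t) \;\middle|\; s_0 = s\right].$$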
This will become much more obvious in the AUP empirical post.
Sure. Looking forward to that. My current intuition is: Humans have a built-in reward system based on (mumble mumble) dopamine, but the existence of that system doesn’t make it easy for us to understand dopamine, or reward functions in general, or anything like that, nor does it make it easy for us to formulate and pursue goals related to those things. It takes quite a bit of education and beautifully-illustrated blog posts to get us to that point :-D
Note that when I said we could get around this by taking only the agent itself as primitive, I meant we could just consider how the agent's AUs are changing, without locating a human in the environment.
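As a rough illustration of that framing (a minimal sketch only, not the post's actual algorithm; the tabular MDP setup, the state-based "before/after" comparison, and all function names are my own assumptions), here is one way to compute "how much did the agent's AUs change" for a set of auxiliary reward functions:

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-8):
    """Compute the optimal state-value function V*_R for a tabular MDP.

    P: transition tensor, shape (S, A, S); P[s, a, s'] = Pr(s' | s, a).
    R: reward vector, shape (S,) (reward received in a state).
    """
    num_states = P.shape[0]
    V = np.zeros(num_states)
    while True:
        # Q[s, a] = R(s) + gamma * sum_{s'} P(s'|s,a) * V(s')
        Q = R[:, None] + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def au_shift(P, aux_rewards, s_before, s_after, gamma=0.99):
    """Average absolute change in attainable utility (V*_R) across a set of
    auxiliary reward functions, between two states of the same MDP."""
    shifts = []
    for R in aux_rewards:
        V = value_iteration(P, R, gamma)
        shifts.append(abs(V[s_after] - V[s_before]))
    return float(np.mean(shifts))
```

Note that nothing in this computation refers to a human or any other object in the agent's ontology; it only uses reward functions and the agent's own value estimates.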
Cool. We’re probably on the same page then.