(sorry I didn’t reply to this when you messaged it to me privately, this has been a low-brain-power week)

“To reason about whether machine learning will result in these mechanistic optimizers, we need to reason about the inductive biases of machine learning algorithms. We mostly don’t yet know how likely they are.”

I think Evan also indirectly appeals to ‘inductive biases’ in the parameter-to-function mapping of neural networks, e.g. the result Joar Skalse contributed to about properties of random nets.

Also, my biggest take-away was the argument for why we shouldn’t expect myopia by default. But perhaps this was already obvious to others.

So my understanding is that there are two arguments:

1. A myopic objective requires an extra distinction to say “don’t continue past the end of the episode”.
2. Something about online learning.

The online learning argument is actually super complicated and depends on a bunch of factors, so I’m not going to summarize it here; I’ve just added the other one:

“Even if training on a myopic base objective, we might expect the mesa objective to be non-myopic, since the non-myopic objective ‘pursue X’ is simpler than the myopic objective ‘pursue X until time T’.”

I did mean to include that; I’m going to delete the word “algorithms”, since that’s what’s causing the ambiguity.
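The simplicity claim above (that “pursue X” is cheaper to specify than “pursue X until time T”) can be made concrete with a toy sketch. This is my own illustration, not code from the discussion: the two reward functions and the condition-counting complexity proxy are assumptions, standing in for whatever simplicity prior the actual argument has in mind.

```python
def nonmyopic_reward(x):
    # "pursue X": reward pursuing X at every step, with no reference to time
    return 1.0 if x == "X" else 0.0

def myopic_reward(x, t, T=10):
    # "pursue X until time T": the same objective, plus an extra
    # episode-end check that the non-myopic objective never needs
    return (1.0 if x == "X" else 0.0) if t < T else 0.0

# A crude complexity proxy: count the distinctions each objective must encode.
NONMYOPIC_CONDITIONS = 1  # is the agent pursuing X?
MYOPIC_CONDITIONS = 2     # ...and is t still before the episode end T?

# Under this (very rough) measure, the myopic objective is strictly
# more complex, which is the direction the simplicity argument points.
assert MYOPIC_CONDITIONS > NONMYOPIC_CONDITIONS
```

The point is only directional: any reasonable description-length measure charges the myopic objective for representing the extra “until time T” distinction, so a simplicity-biased learner has some pressure toward the non-myopic version.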