Even if you were taking D as input and ignoring tractability, IDA still has to decide what to do with D, and that needs to be at least as useful as what ML does with D (and needs to not introduce alignment problems in the learned model). In the post I’m kind of vague about that, just wrapping it up into the philosophical assumption that HCH is good, but really we’d want to do work to figure out what to do with D, even if we were just trying to make HCH aligned (and I think competitiveness matters even for HCH, because it’s needed for HCH to be stable/aligned against internal optimization pressure).
Okay, that makes sense (and seems compelling, though not decisive, to me). I’m happy to leave it here—thanks for the answers!