abramdemski comments on Recursive Quantilizers II

abramdemski 9 Mar 2021 14:54 UTC
> Okay, I think with this elaboration I stand by what I originally said.
You mean with respect to the system as described in the post (in which case I 100% agree), or the modified system which restarts training upon new feedback (which is what I was just describing)?
Because I think this is pretty solidly wrong for the system that restarts.
> Specifically, isn’t it the case that the first few bits of feedback determine $D_{1}$, which might then lock in some bad way of interpreting feedback (whether existing or future feedback)?
All feedback so far determines the new $D_{1}$ when the system restarts training.
(Again, I’m not saying it’s feasible to restart training all the time, I’m just using it as a proof-of-concept to show that we’re not fundamentally forced to make a trade-off between (a) order independence and (b) using the best model to interpret feedback.)
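To make the proof-of-concept concrete, here is a deliberately tiny toy. It is not the recursive quantilizer system from the post; the names `fit_model`, `incremental`, and `restart` are hypothetical, and the single "model" string is only a stand-in for something like $D_{1}$. Feedback is either an explicit vote (`"A"` or `"B"`) or an ambiguous `"?"` that has to be interpreted by the current model. The incremental version interprets each item with whatever model exists at that moment and never revisits it; the restart version retrains on all feedback so far and reinterprets every ambiguous item with the resulting model.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical toy: feedback items are explicit votes "A"/"B" or an ambiguous "?"
# that must be read through some model of what the human meant.


def fit_model(votes: Iterable[str]) -> str:
    """Toy 'model': the majority explicit vote, with ties broken toward 'A'."""
    counts = Counter(v for v in votes if v in ("A", "B"))
    return "A" if counts["A"] >= counts["B"] else "B"


def incremental(feedback: List[str]) -> str:
    """Interpret each item with the model as it stands when the item arrives.

    Earlier interpretations are never revisited, so early feedback can lock in
    how later ambiguous feedback is read.
    """
    interpreted: List[str] = []
    for item in feedback:
        model = fit_model(interpreted)
        interpreted.append(model if item == "?" else item)
    return fit_model(interpreted)


def restart(feedback: List[str]) -> str:
    """Retrain from scratch on all feedback so far, then reinterpret everything.

    fit_model only counts votes, so the result depends on the multiset of
    feedback, not on the order in which it arrived.
    """
    model = fit_model(feedback)
    interpreted = [model if item == "?" else item for item in feedback]
    return fit_model(interpreted)
```

With the same three pieces of feedback in different orders, `incremental(["?", "?", "B"])` returns `"A"` while `incremental(["B", "?", "?"])` returns `"B"`, so the early ambiguous items locked in a reading. `restart` returns `"B"` for both orders, while still using its best current model to interpret the ambiguous feedback.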
- Rohin Shah 9 Mar 2021 18:41 UTC
  I continue to not understand this, but it seems like such a simple question that there must be some deeper misunderstanding about the exact proposal we’re now debating. It doesn’t seem particularly worth it to find this misunderstanding; I don’t think it will really teach us anything conceptually new.
  (If I did want to find it, I would write out pseudocode for the new proposed system and then try to make a more precise claim in terms of the variables in the pseudocode.)
  - abramdemski 9 Mar 2021 19:51 UTC
    Fair.