Algon comments on An observation about Hubinger et al.’s framework for learned optimization

Algon 2 Aug 2022 23:09 UTC
1 point
Note: I skimmed this post because I didn’t want to think about maths right now.

Take a Q-value table for, I don’t know, a 3 state MDP. Apply TD-0 a ton until you get convergence. How has this system not learnt its base objective? Similairly, train any system to completion. How has it not learnt its base objective? If learning the base objective is impossible, then I think I’ve got a contradiction on my hands.