Seems tangentially related to the "train a sequence of reporters" strategy for ELK. They don't phrase it in terms of basins and path dependence, but that's a great frame to look at it through.
Personally, I think supervised learning has low path-dependence, because of exact gradients plus the fact that in high dimensions there is almost always some direction along which to escape a basin. Reinforcement learning, by contrast, seems to have high path-dependence, because updates influence the future training data, which creates attractors/equilibria. (I'm more uncertain about the latter claim, but that's my intuition.)
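To illustrate the second claim, here's a toy sketch (entirely my own construction, nothing from the post): a single-parameter model trained with the exact same gradient step in two conditions, where the only difference is whether labels come from a fixed 50/50 distribution or are sampled from the model itself. The fixed-data runs all end near the same point; the self-generated runs scatter widely, Pólya-urn style, because early noise in the model's outputs gets baked into its own training signal:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(theta, steps, lr, rng, self_generated):
    """One run of logistic-style updates on a single parameter.

    self_generated=False: labels come from a fixed 50/50 distribution (SL-like).
    self_generated=True:  labels are sampled from the model itself (RL-like
                          feedback loop: the policy shapes its own data).
    """
    for _ in range(steps):
        p = sigmoid(theta)
        if self_generated:
            y = 1.0 if rng.random() < p else 0.0   # data depends on current model
        else:
            y = float(rng.integers(0, 2))          # data is fixed, iid 50/50
        theta += lr * (y - p)                      # same log-likelihood gradient step
    return theta

for seed in range(5):
    rng = np.random.default_rng(seed)
    sl = train(theta=0.0, steps=5000, lr=0.1, rng=rng, self_generated=False)
    rl = train(theta=0.0, steps=5000, lr=0.1, rng=rng, self_generated=True)
    print(f"seed {seed}: fixed-data -> {sl:+.2f}   self-generated -> {rl:+.2f}")
```

The fixed-data condition is mean-reverting (the expected update pulls theta back toward 0), while in the self-generated condition the expected update is zero and the noise shrinks as theta drifts to an extreme, so each run freezes wherever its particular path took it.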
So the really out-there take: we want to give the LLM influence over its future training data in order to increase path-dependence and get the attractors we want ;)