This is a really helpful thread for me; thank you both.
in humans… noticing your values drifted in a bad way is probably a negative reinforcement event
Are you hypothesising a shardy explanation for this? Something like: former, now-dwindled shards get activated for some reason and think ‘what have I done?’; they emit a strong negative reinforcement (maybe they predict low value, and some sort of long-horizon temporal-difference credit assignment kicks in?), which squashes/weakens/adjusts the new, drifted shards? (The horizon is potentially very long?) Or just that this is a thing in humans in particular somehow?