Stratified learning and action
A putative new idea for AI control; index here.
It seems that stratification is more flexible than I initially thought.
That’s because the default action or policy ∅, which I was envisioning as a null action (or perhaps the AI turning itself off), can actually be more than that. For instance, ∅ could be an obsessive learning policy, devoted to learning human values, and these human values can form the core of the AI’s value function W.
Then stratification means that the AI will act to maximise human values, while estimating those values in accordance with what it would have calculated had it been a pure value-estimator. This avoids the tension between value-learning and value-maximising that bedevils most value-learners.
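To make the separation concrete, here is a minimal toy sketch in Python (the two-action setup, the candidate value functions, and all numbers are illustrative assumptions, not from the post): the posterior over candidate human value functions is fixed to what the AI would have computed under the learning policy ∅, and actions are then chosen to maximise expected value under that fixed, counterfactual posterior.

```python
# Toy sketch of stratified action selection (all names and numbers are
# illustrative assumptions). Candidate human value functions map an action
# to a value score.
candidate_values = {
    "w1": lambda action: {"explore": 1.0, "exploit": 3.0}[action],
    "w2": lambda action: {"explore": 2.0, "exploit": 0.5}[action],
}

# Posterior over the candidates as it would have been estimated had the AI
# followed the pure learning policy ∅ (illustrative numbers).
posterior_under_null = {"w1": 0.7, "w2": 0.3}

def stratified_choice(actions, values, posterior):
    """Pick the action maximising expected value under the ∅-posterior."""
    def expected_value(action):
        return sum(posterior[name] * values[name](action) for name in values)
    return max(actions, key=expected_value)

print(stratified_choice(["explore", "exploit"], candidate_values, posterior_under_null))
# -> "exploit": the action is chosen to maximise W, but the weights over
#    candidate values are those of the counterfactual pure value-estimator,
#    so acting cannot distort the estimate it acts on.
```

The point of the sketch is only the separation of roles: the estimate of W is never conditioned on the AI’s actual actions, so maximising it creates no incentive to manipulate the learning process.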