You may be interested: the NARS literature describes a system that represents goals as atoms and uses them to shape the pops from a data structure they call a "bag", which is more or less a probabilistic priority queue. It treats "competing priorities" reasoning as a natural first-class citizen, and it supports mutation of goals.
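For concreteness, here is a minimal sketch of what I mean by a bag; the class and method names are my own illustration, not the actual NARS/OpenNARS API:

```python
import random

# Minimal sketch of a NARS-style "bag": a probabilistic priority queue whose
# pop probability is proportional to an item's priority. Illustrative only.
class Bag:
    def __init__(self):
        self.items = {}  # goal -> priority

    def put(self, goal, priority):
        """Insert a goal, or mutate the priority of an existing one."""
        self.items[goal] = priority

    def pop(self):
        """Sample and remove a goal with probability proportional to priority."""
        goals = list(self.items)
        weights = [self.items[g] for g in goals]
        goal = random.choices(goals, weights=weights, k=1)[0]
        del self.items[goal]
        return goal

bag = Bag()
bag.put("recharge battery", 0.9)
bag.put("map the room", 0.4)
bag.put("map the room", 0.7)   # goal mutation: same atom, new priority
print(bag.pop())               # "recharge battery" wins more often than not
```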
But overall your question is something I’ve always wondered about.
I made an attempt to write about it here; I refer to systems with fixed/axiomatic goals as "AIXI-like" and systems with driftable/computational goals as "AIXI-unlike".
I share your intuition that this razor seems critical to mathematizing agency! I can conjecture about why we do not observe it in the literature:
### Goal mutation is a special case of multi-objective optimization, and MOO is just single-objective optimization where the objective is a linear combination of the other objectives
Perhaps agent foundations researchers, via some verbal/tribal knowledge that shows up on the occasional whiteboard in Berkeley but doesn't get written up, reason that if the goal is a function of time, then its image over a sequence of discretized time steps forms a multi-objective optimization problem.
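In case it helps, here is a minimal way to spell that reduction out; the notation ($g$, $T$, $t_i$, $U$, $w_i$, $\pi$) is my own, not anything standard from the literature:

```latex
% Goal as a function of time, discretized, then linearly scalarized (a sketch).
% g : T -> G is the time-indexed goal; t_1, ..., t_n are discretized steps.
\[
  g : T \to G, \qquad
  \big(g(t_1), g(t_2), \dots, g(t_n)\big) \ \text{is the vector of objectives,}
\]
\[
  J(\pi) \;=\; \sum_{i=1}^{n} w_i \, U_{g(t_i)}(\pi), \qquad w_i \ge 0,
\]
% where U_{g(t_i)} scores a policy pi against the goal active at step t_i,
% and the weights w_i recover a single objective via linear scalarization.
```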
### AF under goal mutation is much harder than AF under fixed goals, and we're trying to walk before we run
Maybe agent foundations researchers believe that just fixing the totally borked situation of optimization and decision theory with fixed goals costs 10 to 100 tao-years, and that doing it with unfixed goals costs 100 to 1000 tao-years.
### If my goal is a function of time, instrumental convergence still applies
Self-explanatory.
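Still, here is a hedged sketch of the argument in my own notation (the distribution $P(g_t)$ and utilities $U_g$ are assumptions, not anything from a particular paper):

```latex
% Sketch, not a theorem: let P(g_t) be the agent's distribution over which
% goal is active at time t, and U_g(s) the value of state s under goal g.
% If acquiring generic resources r helps under (almost) every candidate goal,
% the expectation over future goals still favors acquiring them:
\[
  \mathbb{E}_{g \sim P(g_t)}\big[ U_g(s \cup r) - U_g(s) \big]
  \;=\; \sum_{g} P(g_t = g)\,\big( U_g(s \cup r) - U_g(s) \big) \;>\; 0,
\]
% so the usual convergent instrumental pressures (resources, self-preservation,
% optionality) survive the goal being a function of time.
```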
### If my goal is a function of time, what happens to corrigibility?
Incorrigibility is the desire to preserve goal-content integrity, right? This implies that as time goes to infinity, the agent will want its goal to stabilize/converge/become constant. How does it act on this desire? Unclear to me. I'm deeply, wildly confused, as a matter of fact.
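For what it's worth, one way I could try to make "stabilize/converge/become constant" precise (my own formalization, which doesn't resolve the confusion about how the agent acts on the desire):

```latex
% Two candidate readings of "the goal stabilizes as t -> infinity":
\[
  \exists\, g^{*}\ \exists\, T\ \forall\, t > T:\; g(t) = g^{*}
  \quad \text{(eventually constant),}
\]
\[
  \lim_{t \to \infty} d\big(g(t),\, g^{*}\big) = 0
  \quad \text{(convergent w.r.t.\ some metric } d \text{ on goal-space).}
\]
```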
(Edited to make headings H3 instead of H1)