What does it mean to optimize future?
Preamble
Value Deathism by Vladimir Nesov encourages us to fix our values in order to prevent astronomical waste due to an under-optimized future.
When I read it, I found myself thinking about the units in which this astronomical waste is measured. Utilons? It seems so. [edit] Jack suggested the widely accepted word “utils” instead. [/edit]
I tried to define it precisely. It is the difference between the utility of some world-state G as measured by the original (drifting) agent and the utility of the same world-state G as measured by an undrifting version of the original agent, where G is optimal according to the original (drifting) agent.
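Written as a formula, the definition above might look like the sketch below. The symbols are my assumptions and are not from the original post: W is the set of world-states, U_d is the value function of the drifting agent, and U_u is that of the undrifting version.

```latex
G = \arg\max_{w \in W} U_{\mathrm{d}}(w), \qquad
\mathrm{waste} = U_{\mathrm{d}}(G) - U_{\mathrm{u}}(G)
```

The subtraction only makes sense if the two utilities live on a common scale, and the arg max only makes sense once “optimal” is pinned down.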
There are two questions: can we compare the utilities of those agents, and what does it mean that G is optimal?
Question
Preconditions: the world is deterministic, and the agent has full knowledge of the world, i.e. it knows the current world-state, the full list of actions available in every world-state, and the consequence of each action (the world-state it leads to). The agent has no time limit for computing its next action.
Agent’s value is defined as a function from the set of world-states to the real numbers; for the sake of, uhm, clarity, the bigger the better. (Note: it is unnecessary to define value as a function from the set of sequences of world-states, as the history of the world can be deduced from the world-state itself, and if it can’t be deduced, then the agent can’t use history anyway, as the agent is a part of this world-state, so it doesn’t “remember” history either.) [edit] I wasn’t aware that this note includes a hidden assumption: the value of a world-state must be constant. But this assumption doesn’t allow the agent to single out world-states where the agent loses all or part of its memory. Thus value as a function over sequences of world-states has a right to exist. But this value function still needs to be specifically shaped to be independent of the optimization algorithm. [/edit]
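A minimal sketch of this setup, to make the two kinds of value function concrete. Everything named here (World, StateValue, HistoryValue, lift, the string-labelled states) is hypothetical and chosen only for illustration. In particular, scoring a history by its final state is just one possible reading, not something the post specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

State = str  # assumption: world-states are labelled by strings


@dataclass(frozen=True)
class World:
    """Deterministic world: every state lists the states its actions lead to."""
    transitions: Dict[State, List[State]]

    def successors(self, state: State) -> List[State]:
        # Full knowledge: the agent may query any state, not only the current one.
        return self.transitions.get(state, [])


# Value over single world-states (the definition used in the post).
StateValue = Callable[[State], float]
# Value over whole histories (the alternative raised in the [edit]).
HistoryValue = Callable[[Sequence[State]], float]


def lift(value: StateValue) -> HistoryValue:
    """View a state-based value as a history-based one that scores only the final state."""
    return lambda history: value(history[-1])


world = World({"s0": ["s1", "s2"], "s1": [], "s2": []})
state_value: StateValue = {"s0": 0.0, "s1": 1.0, "s2": 3.0}.__getitem__

print(world.successors("s0"))           # ['s1', 's2']
print(lift(state_value)(["s0", "s2"]))  # 3.0
```

A history-based value that looks at more than the last element cannot be expressed through `lift`, which is roughly the gap the [edit] points at.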
Which sequence of world-states is optimal according to the agent’s value?
Edit: Consider agents implementing a greedy search algorithm and an exhaustive search algorithm. For them to choose the same sequence of world-states, the search space should be a greedoid. And that requires a very specific structure of the value function.
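Here is a small sketch of how the two algorithms can disagree. The world graph and the numbers are made up, the graph is assumed to be acyclic so that both searches terminate, and “optimal” is taken to mean “ends in the highest-valued reachable world-state”, which is only one possible reading.

```python
from typing import Dict, List

# Hypothetical world: state -> states reachable by one action (acyclic by assumption).
TRANSITIONS: Dict[str, List[str]] = {
    "start": ["left", "right"],
    "left": ["left_end"],
    "right": ["right_end"],
    "left_end": [],
    "right_end": [],
}
# Hypothetical value function over world-states (bigger is better).
VALUE: Dict[str, float] = {
    "start": 0.0, "left": 2.0, "right": 1.0, "left_end": 3.0, "right_end": 10.0,
}


def greedy_path(state: str) -> List[str]:
    """Always step to the immediately best successor; stop when no action is available."""
    path = [state]
    while TRANSITIONS[state]:
        state = max(TRANSITIONS[state], key=VALUE.__getitem__)
        path.append(state)
    return path


def exhaustive_path(state: str) -> List[str]:
    """Enumerate every path from `state`; return one ending in the highest-valued state."""
    best = [state]
    for nxt in TRANSITIONS[state]:
        candidate = [state] + exhaustive_path(nxt)
        if VALUE[candidate[-1]] > VALUE[best[-1]]:
            best = candidate
    return best


print(greedy_path("start"))      # ['start', 'left', 'left_end']
print(exhaustive_path("start"))  # ['start', 'right', 'right_end']
```

The greedy agent is lured by the locally better "left" and never sees "right_end". For the two agents to agree on every such world, the value function has to rule out this kind of trap.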
Edit2: Alternatively, the value function can be indirectly self-referential via the part of the world-state that contains the agent, thus allowing it to modify the agent’s optimization algorithm by assigning higher utility to world-states where the agent implements the desired optimization algorithm. (I call the agent’s function a ‘value function’ because its meaning can be defined by the function itself; it isn’t necessarily utility.)
My answer (rot13):
Jura inyhr shapgvba bs gur ntrag vfa’g ersyrpgvir, v.r. qbrfa’g qrcraq ba vagrecergngvba bs n cneg bs jbeyq-fgngr bpphcvrq ol ntrag va grezf bs bcgvzvmngvba cebprff vzcyrzragrq ol guvf cneg bs jbeyq-fgngr, gura bcgvzny frdhrapr qrcraqf ba pbzovangvba bs qrgnvyf bs vzcyrzragngvba bs ntrag’f bcgvzvmngvba nytbevguz naq inyhr shapgvba. V guvax va trareny vg jvyy rkuvovg SBBZ orunivbe.
Ohg jura inyhr shapgvba vf ersyrpgvir gura guvatf orpbzr zhpu zber vagrerfgvat.
Edit3:
Implications
I’ll try to analyse the behavior of the classical paperclip maximizer, using the toy model I described earlier. Let the utility function be min(number_of_paperclips_produced, 50).
1. The paperclip maximizer implements a greedy search algorithm. If it can’t produce a paperclip (all available actions lead to the same utility), it performs an action that depends on the implementation of the greedy search. All in all, it acts erratically until it happens to be terminated (it stumbles into a world-state where there are no available actions for it).
2. The paperclip maximizer implements a full-search algorithm. The result depends on the implementation of the full search. If the implementation executes the shortest sequence of actions that leads to the globally maximal value of the utility function, then it produces 50 paperclips as fast as it can [edit] or wireheads itself into a state where its paperclip counter > 50, whichever is faster [/edit], then terminates itself. If the implementation executes the longest possible sequence of actions that leads to the globally maximal value of the utility function, then the agent behaves erratically but is guaranteed to survive as long as its optimization algorithm behaves according to the original plan; however, it will accidentally modify itself and get terminated, as the original plan doesn’t care about preserving the agent’s optimization algorithm or utility function.
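The two cases above can be sketched in a heavily simplified model. The assumptions are mine: a world-state is just the paperclip count, the only actions are the hypothetical "make" and "idle", the count is bounded to keep the search finite, and termination, wireheading and self-modification are left out entirely. Only the utility function min(number_of_paperclips_produced, 50) comes from the post.

```python
from collections import deque
from typing import Dict, List

CAP = 50  # from the post: utility = min(number_of_paperclips_produced, 50)


def utility(paperclips: int) -> int:
    return min(paperclips, CAP)


def actions(paperclips: int) -> Dict[str, int]:
    """Hypothetical action set; the count is bounded so the state space stays finite."""
    result = {"idle": paperclips}
    if paperclips < CAP + 5:
        result["make"] = paperclips + 1
    return result


def greedy_step(paperclips: int, prefer_last: bool = False) -> str:
    """One greedy move. Ties are broken by option order, an implementation detail."""
    options = list(actions(paperclips).items())
    if prefer_last:
        options.reverse()
    name, _ = max(options, key=lambda item: utility(item[1]))
    return name


def shortest_plan_to_max(start: int) -> List[str]:
    """Full search: breadth-first, shortest action sequence reaching the maximal utility."""
    parent = {start: None}  # state -> (previous state, action), None for the start
    order = [start]         # states in the order breadth-first search discovers them
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for name, nxt in actions(state).items():
            if nxt not in parent:
                parent[nxt] = (state, name)
                order.append(nxt)
                queue.append(nxt)
    best = max(utility(s) for s in order)
    # The first state in discovery order with maximal utility needs the fewest actions.
    target = next(s for s in order if utility(s) == best)
    plan: List[str] = []
    while parent[target] is not None:
        target, name = parent[target]
        plan.append(name)
    return plan[::-1]


print(len(shortest_plan_to_max(0)))                          # 50
print(greedy_step(50), greedy_step(50, prefer_last=True))    # idle make
```

Below the cap both agents simply make paperclips. At the cap every action has the same utility, so what the greedy agent does next is decided purely by tie-breaking, which matches the “depends on implementation” remark in case 1.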
It seems that in the full-knowledge case powerful optimization processes don’t go FOOM. A full-search algorithm is maximally powerful, isn’t it?
Maybe it is uncertainty that leads to FOOMing?
Indexical uncertainty can be represented by the assumption that the agent knows the set of world-states it can be in and the set of available actions for the world-state it is actually in. I’ll try to analyze this case later.
Edit4: Edit3 is wrong. The utility function in that toy model cannot be so simple if it uses some property of the agent. However, it seems OK to extend the model by including a high-level description of the agent’s state in the world-state; then Edit3 holds.