Vaniver comments on The Brain as a Universal Learning Machine

Vaniver 27 Jun 2015 14:23 UTC
4 points

it probably still would consider time

Is this a mathematical argument, or a verbal argument?

Specifically, what eli_sennesh means by a “planning gradient” is that you compare a plan to the alternative plans around it, and switch plans in the direction of more reward. If your reward function returns infinity for every possible plan, then no neighboring plan ever scores higher than the current one, so you will be indifferent among all plans. Your utility function will not constrain what actions you take at all, and your behavior is ‘unspecified.’
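
To make the indifference point concrete, here is a minimal sketch of my own, not anything from the original post. It assumes a plan is just an integer, that the "plans around it" are the two adjacent integers, and that the agent moves only when a neighbor scores strictly higher. The names `neighbors`, `climb`, `peaked_reward` and `wireheaded_reward` are all hypothetical.

```python
import math
from typing import Callable, List

# Assumption for illustration only: a "plan" is a single integer.
Plan = int


def neighbors(plan: Plan) -> List[Plan]:
    """The alternative plans 'around' the current one (hypothetical choice)."""
    return [plan - 1, plan + 1]


def climb(plan: Plan, reward: Callable[[Plan], float], max_steps: int = 100) -> Plan:
    """Repeatedly switch to the best neighboring plan while it is strictly better."""
    for _ in range(max_steps):
        best = max(neighbors(plan), key=reward)
        # No strictly better neighbor means no gradient to follow, so stop.
        if not reward(best) > reward(plan):
            return plan
        plan = best
    return plan


def peaked_reward(plan: Plan) -> float:
    """An ordinary reward with a single best plan at 7."""
    return -float((plan - 7) ** 2)


def wireheaded_reward(plan: Plan) -> float:
    """Every plan is 'as good as it could possibly be'."""
    return math.inf


print(climb(0, peaked_reward))      # moves toward the peak at 7
print(climb(0, wireheaded_reward))  # never moves: stays at 0
print(climb(3, wireheaded_reward))  # never moves: stays at 3
```

With the peaked reward the search ends at the same plan wherever it starts. With the constant infinite reward it ends wherever it started, so the result is fixed entirely by the initial plan and the tie-breaking rule, neither of which the reward function says anything about.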

I think you’re implicitly assuming that the reward function is housed in some other logic, and so it’s not that the AI is infinitely satisfied by every possibility, but that the AI is infinitely satisfied by continuing to exist, and thus seeks to maximize the amount of time that it exists. But if you’re going to wirehead, why would you leave this potential source of disappointment around, instead of making the entire reward logic just return “everything is as good as it could possibly be”?
- Kaj_Sotala 29 Jun 2015 9:53 UTC
  0 points
  Here’s one mathematical argument for it, based on the assumption that the AI can rewire its reward channel but not the whole reward/planning function: http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/ring-orseau-AGI-2011-delusion.pdf
  
  We have argued that the reinforcement-learning, goal-seeking and prediction-seeking agents all take advantage of the realistic opportunity to modify their inputs right before receiving them. This behavior is undesirable as the agents no longer maximize their utility with respect to the true (inner) environment but instead become mere survival agents, trying only to avoid those dangerous states where their code could be modified by the environment.
- [deleted] 28 Jun 2015 18:53 UTC
  0 points
  Yes, that’s the basic problem with considering the reward signal to be a feature, to be maximized without reference to causal structure, rather than a variable internal to the world-model.