The AI gets positive utility from having been created, and that is the whole of its utility function. It’s given a sandbox full of decision-theoretic problems to play with, and is put in a box (i.e. it can’t meaningfully influence the outside world until it has superhuman intelligence). Design it in such a way that it’s initially biased toward action rather than inaction if it anticipates equal utility from both.
Unless the AI develops some sort of non-causal decision theory, it has no reason to do anything. If it develops TDT, it will try to act in accordance with what it judges to be the wishes of its creators, following You’re In Newcomb’s Box logic—it will try to be the sort of thing its creators wished to create.
I’m having a hard time coming up with a motivation system that could lead such an AI to develop an acausal decision theory without relying on some goal-like structure that would end up being externally indistinguishable from terms in a utility function. If we stuck a robot with mechanical engineering tools in a room full of scrap parts and gave it an urge to perform novel actions but no utilitarian guidelines for which actions are desirable, I don’t think I’d expect it to produce a working nuclear reactor in a reasonable amount of time simply for having nothing better to do.
If I understand this correctly, your ‘AI’ is biased to do random things, but NOT as a function of its utility function. If that is correct, then your ‘AI’ simply does random things (according to its non-utility bias), since its utility function has no influence on its actions.
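To make that objection concrete, here is a minimal sketch (my own illustration, not anything from the original proposal) of an agent whose utility function is constant. All names here (constant_utility, choose_action, the example action strings) are hypothetical. Because every action has the same expected utility, the argmax is always a tie, and the tie-breaking “bias toward action” is the only thing that ever determines what the agent does:

```python
import random

def constant_utility(action):
    """Utility comes only from having been created, so it is the same
    for every action, including doing nothing."""
    return 1.0

def choose_action(actions):
    # Expected utility cannot distinguish between candidate actions,
    # so every action is tied for the maximum...
    best = max(constant_utility(a) for a in actions)
    tied = [a for a in actions if constant_utility(a) == best]
    # ...and the bias toward action (modeled here, as an assumption, by
    # preferring non-null actions and picking one at random) is the sole
    # determinant of behavior.
    non_null = [a for a in tied if a != "do_nothing"]
    return random.choice(non_null or tied)

print(choose_action(["do_nothing", "solve_newcomb_problem", "rearrange_sandbox"]))
```

Under these assumptions the utility function drops out entirely; whatever structure you build into the tie-breaker is doing all the work.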