“AI is disabled” and “world more similar to the world as it would have been without the AI interfering” are both magical categories. Your qualitative ontology has big, block objects labeled “AI” and “world” and an arrow from “AI” to “world” that can be either present or absent. The real world is a borderless, continuous process of quantum fields in which shaking one electron affects another electron on the opposite side of the universe.
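To make concrete what is being critiqued here, a minimal sketch of the kind of counterfactual-penalty utility function presumably under discussion might look like this (the symbols U, V, d, λ and w_cf are my own labels for illustration, not anything proposed in this thread):

U(w) = V(w) − λ · d(w, w_cf)

where w is the world that actually results, w_cf is the counterfactual world in which the AI never interfered, V is the task objective, d is some distance measure between worlds, and λ trades the task off against impact. The point of the comment above is that w_cf, d, and even the boundary of “the AI interfering” are exactly where the magical categories hide; writing down the formula does nothing to specify them.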
I understand the general point, but “AI is disabled” seems like a special case, in that an AI able to do any sort of reasoning about itself, allocate its internal resources, etc. (I don’t know how necessary this is for it to do anything useful) will have to have concepts in its qualitative ontology of, or sufficient to define, its own disabled state – though perhaps not in a form readily available for framing a goal system (e.g. if it developed those concepts itself, assuming it could build up to them without having them from the start), and probably complicated in some other ways that haven’t occurred to me in two minutes.
Suppose the AI builds devices in the environment, especially computational devices designed to offload cognitive labor. What do you want to happen when the AI is “switched off”? Hence, magical category.
Interesting; I hadn’t thought of this situation. How do you define “lack of intelligence” or “removal of the effect of intelligence” in the environment, so that an AI can implement that state? How is this state best achieved?
Once the system is established, the world will forever be determined by a specific goal system, even if the goal is for the world to appear, from a certain time on, as if no AI is present. The best solution is for the AI to pretend not to be present, “pulling the planets along their elliptical orbits”.
D’oh. Yes, of course, that breaks it.
As an aside, “waiting for Eliezer to find a loophole” probably does not constitute a safe and effective means of testing AI utility functions. This is something we want provable from first principles, not “proven” by “well, I can’t think of a counterexample”.
Of course, hence “...and probably complicated in some other ways that haven’t occurred to me in two minutes.”
Right. I know you realize this, and the post was fine in the context of “random discussion on the internet”. However, if someone wants to actually, seriously specify a utility function for an AI, any description that starts with “here’s a high-level rule to avoid bad things” and then works from there looking for potential loopholes is deeply and fundamentally misguided, completely independently of the rule proposed.