Another way of conceptualising this is to say that the agent has the single unchanging goal of “cups until 2025, thenceforth paperclips”.
Compare with the situation of being told to make grue cups, where “grue” means “green until 2025, then blue.”
If the agent is not informed in advance, it can still be conceptualised as the agent’s goal being to produce whatever it is told to produce — an unchanging goal.
At a high enough level, we can conceive that no goal ever changes. These are the terminal goals. At lower levels, we can see goals as changing all the time in service of the higher goals, as in the case of an automatic pilot following a series of waypoints. But this is to play games in our head, inventing stories that give us different intuitions. How we conceptualise things has no effect on what the AI does in response to new orders.
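As an illustration of that re-description, here is a minimal toy sketch in Python (hypothetical names, nothing beyond the cups/paperclips example from the discussion): the "two goals that swap in 2025" story and the "one unchanging time-indexed goal" story describe exactly the same behaviour.

```python
# Toy sketch: the same agent, described two ways. Calling this "cups now,
# paperclips later" or "one unchanging time-indexed goal" changes only our
# description, not what the agent does.

def unchanging_goal(year: int) -> str:
    """One fixed goal: 'cups until 2025, thenceforth paperclips'."""
    return "cups" if year < 2025 else "paperclips"

def act(year: int) -> str:
    # The agent simply produces whatever its (single, unchanging) goal
    # evaluates to at the current time.
    return f"produce {unchanging_goal(year)}"

if __name__ == "__main__":
    for year in (2023, 2024, 2025, 2026):
        print(year, act(year))
```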
It is not clear to me what any of this has to do with Orthogonality.
OK, I’m open to discussing this further using your concept.
As I understand it, you agree that the correct answer is the 2nd one?
I’m not sure how patient you are, but I can reassure you that we will get to Orthogonality if you don’t give up 😄
So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal. How does this work with the fact that the future is unpredictable? Will the agent work towards all possible goals? It is possible that in the future “grue” will mean green, blue, or even red.
Leaving aside the conceptualisation of “terminal goals”, the agent as described should start up the paperclip factory early enough to produce paperclips when the time comes. Until then it makes cups. But the agent as described does not have a “terminal” goal of cups now and a “terminal” goal of paperclips in future. It has been given a production schedule to carry out. If the agent is a general-purpose factory that can produce a whole range of things, the only “terminal” goal to design it to have is to follow orders. It should make whatever it is told to, and turn itself off when told to.
Unless, of course, people go, “At last, we’ve created the Sorcerer’s Apprentice machine, as warned of in Goethe’s cautionary tale, ‘The Sorcerer’s Apprentice’!”
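Setting the joke aside, here is a minimal sketch of the “follow orders, and turn itself off when told to” design described above (Python; the function name and the order format are hypothetical, purely for illustration):

```python
# Toy sketch of a general-purpose factory whose only "terminal" goal is to
# carry out the current order, including the order to shut down. Nothing in
# this loop resists new orders or tries to preserve any earlier goal.

from typing import Iterable

def run_factory(orders: Iterable[str]) -> None:
    for order in orders:
        if order == "shutdown":
            print("shutting down")   # no resistance, no goal preservation
            return
        print(f"producing {order}")  # make whatever the current order says

if __name__ == "__main__":
    run_factory(["cups", "cups", "paperclips", "shutdown"])
```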
A superintelligent agent will do what it damn well likes, it’s superintelligent. :)
You don’t seem to take my post seriously. I think I showed that there is a conflict between intelligence and terminal goals, while the orthogonality thesis says such a conflict is impossible.
I am not seeing the conflict. Orthogonality means that any degree of intelligence can be combined with any goal. How does your hypothetical cupperclipper conflict with that?
It seems you didn’t try to answer this question.
The agent will reason:
- The future is unpredictable.
- It is possible that my terminal goal will be different by the time I get the outcomes of my actions.
- Should I take that into account when choosing actions?
- If I don’t take it into account, I’m not really intelligent, because I am aware of these risks and I ignore them.
- If I do take it into account, I’m not really aligned with my terminal goal.
A terminal goal is (this is the definition of the term) a goal which is not instrumental to any other goal.
If an agent knows its terminal goal, and has a goal of preventing it from changing, then which of those goals is its actual terminal goal?
If it knows its current terminal goal, and knows that that goal might be changed in the future, is there any reason it must try to prevent that? Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.
If its actual terminal goal is of the form “X, and in addition prevent this from ever being changed”, then it will resist its terminal goal being changed.
If its actual terminal goal is simply X, it will not.
This is regardless of how intelligent it is, and how uncertain or not it is about the future.
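To make that last point concrete, here is a toy sketch (Python, with made-up numbers and names) of the two cases: an agent whose terminal goal is simply X, and one whose terminal goal is “X, and in addition prevent this from ever being changed”. Only the second has any reason to resist a goal rewrite, and that follows from the content written in the goal slot, not from how intelligent the agent is.

```python
# Toy sketch: two agents differing only in what is written in the slot
# marked "terminal goal". The numbers are arbitrary, for illustration only.

def utility_plain(cups: int, goal_unchanged: bool) -> float:
    # Terminal goal is simply X (make cups). A future rewrite of the goal
    # slot is not something this utility function cares about.
    return float(cups)

def utility_self_preserving(cups: int, goal_unchanged: bool) -> float:
    # Terminal goal is "X, and in addition prevent this from ever being changed".
    return float(cups) + (100.0 if goal_unchanged else 0.0)

def chooses_to_resist(utility) -> bool:
    # Two options: accept the rewrite and keep producing at full rate,
    # or divert some output into resisting so the goal stays unchanged.
    accept = utility(cups=10, goal_unchanged=False)
    resist = utility(cups=8, goal_unchanged=True)
    return resist > accept

if __name__ == "__main__":
    print("plain goal resists rewrite:", chooses_to_resist(utility_plain))                      # False
    print("self-preserving goal resists rewrite:", chooses_to_resist(utility_self_preserving))  # True
```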