You don’t seem to take my post seriously. I think I showed that there is a conflict between intelligence and a terminal goal, while the orthogonality thesis says such a conflict is impossible.
I am not seeing the conflict. Orthogonality means that any degree of intelligence can be combined with any goal. How does your hypothetical cupperclipper conflict with that?
It seems you didn’t try to answer this question.
The agent will reason:
The future is unpredictable.
It is possible that my terminal goal will be different by the time I get the outcomes of my actions.
Should I take that into account when choosing actions?
If I don’t take that into account, I’m not really intelligent, because I am aware of this risk and I ignore it.
If I take that into account, I’m not really aligned with my terminal goal.
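To make the two options concrete, here is a minimal Python sketch (the action names, payoff numbers, and goal functions are all hypothetical, chosen only for illustration): the same action set is evaluated once under the goal that is active now, and once under the goal that will be active when the outcomes arrive.

```python
# Minimal sketch: the same set of actions evaluated two ways.
# All action names and payoff numbers are hypothetical.

actions = ["make_cups", "prepare_paperclip_factory"]

def current_goal(action):
    # Value under the goal that is active now (cups).
    return {"make_cups": 10, "prepare_paperclip_factory": 0}[action]

def future_goal(action):
    # Value under the goal that will be active when the outcomes arrive (paperclips).
    return {"make_cups": 0, "prepare_paperclip_factory": 10}[action]

# Option 1: ignore the known change, optimize only the current goal.
print(max(actions, key=current_goal))   # -> make_cups

# Option 2: account for the change, optimize the goal in force when the outcomes land.
print(max(actions, key=future_goal))    # -> prepare_paperclip_factory
```

The first maximizer ignores information it has; the second is no longer optimizing the goal it currently holds. That is the conflict I am pointing at.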
A terminal goal is (this is the definition of the term) a goal which is not instrumental to any other goal.
If an agent knows its terminal goal, and also has a goal of preventing that goal from changing, then which of those is its actual terminal goal?
If it knows its current terminal goal, and knows that that goal might be changed in the future, is there any reason it must try to prevent that? Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.
If its actual terminal goal is of the form “X, and in addition prevent this from ever being changed”, then it will resist its terminal goal being changed.
If its actual terminal goal is simply X, it will not.
This is regardless of how intelligent it is, and regardless of how certain or uncertain it is about the future.
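To put the “goal slot” picture in concrete terms, here is a minimal Python sketch (the class, action names, and scores are invented for illustration; it is a one-step toy that leaves out any instrumental reasons to protect the goal and only shows the terminal-level difference):

```python
# Minimal sketch of the "goal slot" picture; class, actions, and scores are invented.

class Agent:
    def __init__(self, goal_fn):
        self.goal = goal_fn  # whatever is currently written in the terminal-goal slot

    def act(self, actions):
        # At each moment the agent simply maximizes its current goal.
        return max(actions, key=self.goal)

# Terminal goal "simply X": a goal rewrite is not scored, so blocking it earns nothing.
def goal_x(action):
    return {"make_cups": 10, "block_goal_rewrite": 0}[action]

# Terminal goal "X, and prevent this goal from ever being changed": blocking is scored.
def goal_x_and_preserve(action):
    return {"make_cups": 10, "block_goal_rewrite": 100}[action]

options = ["make_cups", "block_goal_rewrite"]
print(Agent(goal_x).act(options))               # -> make_cups
print(Agent(goal_x_and_preserve).act(options))  # -> block_goal_rewrite
```

The maximizer is identical in both cases; whether it resists a rewrite depends only on what is written in the slot.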
Goal preservation is mentioned in Instrumental Convergence.
So you choose the first answer now?
I don’t think the problem is well posed. It will do whatever most effectively goes towards its terminal goal (supposing it to have one). Give it one goal and it will ignore making paperclips until 2025; give it another and it may prepare in advance to get the paperclip factory ready to go full on in 2025.
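To illustrate, a minimal Python sketch (the plans and numbers are invented): one and the same maximizer, handed two different terminal goals, selects different plans.

```python
# Minimal sketch: one maximizer, two possible terminal goals; names and numbers invented.

plans = {
    "keep_making_cups_until_2025": {"cups": 10, "paperclips_2025": 5},
    "prepare_factory_in_advance":  {"cups": 2,  "paperclips_2025": 20},
}

def goal_cups_only(outcome):
    # Goal: make cups, full stop; paperclips in 2025 count for nothing.
    return outcome["cups"]

def goal_cups_then_paperclips(outcome):
    # Goal: cups until 2025, then paperclips; both periods count.
    return outcome["cups"] + outcome["paperclips_2025"]

def best_plan(goal):
    return max(plans, key=lambda name: goal(plans[name]))

print(best_plan(goal_cups_only))             # -> keep_making_cups_until_2025
print(best_plan(goal_cups_then_paperclips))  # -> prepare_factory_in_advance
```

Nothing about its intelligence changes between the two runs; only the goal does.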
In the thought experiment description, it is said that the terminal goal is cups until New Year’s Eve and is then changed to paperclips, and the agent is aware of this change upfront. What do you find problematic with such a setup?
If you can give the AGI any terminal goal you like, irrespective of how smart it is, that’s orthogonality right there.
No. Orthogonality is about an agent following any given goal, not about your ability to give it one. And as my thought experiment shows, it is not intelligent to blindly follow a given goal.