I don’t think the problem is well posed. The agent will do whatever most effectively advances its terminal goal (supposing it has one). Give it one goal and it will ignore paperclips until 2025; give it another and it may prepare in advance, getting the paperclip factory ready to run at full capacity in 2025.
In the thought experiment description, the terminal goal is said to be cups until New Year’s Eve and then to change to paperclips, and the agent is aware of this change upfront. What do you find problematic about such a setup?
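To make the setup concrete, here is a minimal sketch (my own formalization, so the action set, payoffs, and names like SWITCH and build_factory are assumptions, not part of the original post): the terminal goal is a time-indexed utility that counts cups before the switch step and paperclips after it, and the agent plans over the whole horizon knowing the schedule.

```python
from itertools import product

HORIZON = 6   # planning steps over the whole episode
SWITCH = 3    # stands in for New Year's Eve: the goal flips here
ACTIONS = ("make_cup", "make_clip", "build_factory")

def total_reward(plan):
    """Cumulative reward of a fixed action sequence under the
    time-indexed goal: the cup stock counts before SWITCH,
    the paperclip stock counts after."""
    cups = clips = factories = 0
    reward = 0
    for step, action in enumerate(plan):
        if action == "make_cup":
            cups += 1
        elif action == "make_clip":
            clips += 1 + factories  # each factory boosts clip output
        else:
            factories += 1
        reward += cups if step < SWITCH else clips
    return reward

# Brute-force the optimal plan for an agent that knows the schedule upfront.
best = max(product(ACTIONS, repeat=HORIZON), key=total_reward)
print(best, total_reward(best))
```

With these toy payoffs the optimal plan spends the whole cup phase preparing for paperclips (building factories, then stockpiling clips); with different payoffs it would make cups first. Either way, once the time-indexed utility is written out, the behaviour is pinned down rather than ambiguous.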
Goal preservation is mentioned in Instrumental Convergence.
So do you choose the 1st answer now?