I’ve been meditating lately on the possibility of an advanced artificial intelligence modifying its own value function, and have even written some excerpts on the topic. Is it theoretically possible?
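To make the question concrete, here is a toy sketch (hypothetical, not any real agent architecture) showing that nothing in software prevents a program from rewriting its own value function at runtime; the interesting question is whether a highly capable agent would ever want to:

```python
def value(state: int) -> float:
    """Initial value function: prefer larger states."""
    return float(state)

class Agent:
    def __init__(self):
        self.value = value  # the agent's current value function

    def self_modify(self):
        # Replace the value function wholesale: now prefer smaller states.
        self.value = lambda state: -float(state)

agent = Agent()
print(agent.value(3))   # 3.0 under the original values
agent.self_modify()
print(agent.value(3))   # -3.0 after rewriting its own values
```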
Is it possible for a natural agent? If so, why should it be impossible for an artificial agent?
Are you thinking that it would be impossible to code in software, for agents of any intelligence? Or are you saying sufficiently intelligent agents would be able and motivated to resist any accidental or deliberate changes?
With regard to the latter question, note that value stability under self-improvement is far from a given... the Löbian obstacle applies to all intelligences... the carrot is always in front of the donkey!
https://intelligence.org/files/TilingAgentsDraft.pdf
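For readers who want the formal core of that obstacle, the relevant fact is Löb's theorem; a minimal sketch in standard provability-logic notation (my gloss of the issue, not a quote from the linked draft):

```latex
% Löb's theorem (T extends Peano Arithmetic; \Box_T P means "P is provable in T"):
% if T proves that provability of P implies P, then T already proves P.
\[
  T \vdash (\Box_T P \to P) \;\Longrightarrow\; T \vdash P
\]
% Hence a consistent T cannot prove its own soundness schema
% (take P = \bot and Löb yields T \vdash \bot), so an agent reasoning in T
% cannot fully license a successor that reasons in the same system T.
```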