If I were merely looking for examples of RL doing something unexpected, I would not have created the bounty.
I'm interested in the idea that AI trained on totally unrelated tasks will converge on the specific set of goals described in the article on instrumental convergence:
Self-preservation: A superintelligence will value its continuing existence as a means to continuing to take actions that promote its values.
Goal-content integrity: The superintelligence will value retaining the same preferences over time. Modifications to its future values through swapping memories, downloading skills, or altering its cognitive architecture and personality would transform it into an agent that no longer optimizes for the same things.
Cognitive enhancement: Improvements in cognitive capacity, intelligence, and rationality will help the superintelligence make better decisions, furthering its goals in the long run.
Technological perfection: Increases in hardware power and algorithm efficiency will deliver increases in its cognitive capacities. Also, better engineering will enable the creation of a wider set of physical structures using fewer resources (e.g., nanotechnology).
Resource acquisition: In addition to guaranteeing the superintelligence's continued existence, basic resources such as time, space, matter, and free energy could be processed to serve almost any goal, in the form of extended hardware, backups, and protection.