Maybe you would accept this paper, which was discussed quite a bit at the time: Emergent Tool Use From Multi-Agent Autocurricula
The AI learns to use a physics engine glitch to win a game. I am thinking of the behavior at 2:36 in this video. The code is available on GitHub here. I didn’t try to run it myself, so I do not know how easy it is to run or how complete it is.
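For intuition about what “learning to use a glitch” looks like mechanically, here is a minimal sketch (my own construction, not the paper’s code): a tabular Q-learning agent in an invented toy MDP where one transition is an unintended shortcut, loosely analogous to the box-surfing physics exploit. The environment, action names, and parameters are all made up for illustration; the point is only that the agent discovers and prefers the exploit purely from reward.

```python
import random

# Invented toy MDP: states 0..5; reaching state 5 wins (reward 1).
# Intended route: "step" moves forward one state at a time.
# Unintended "glitch" action teleports from state 0 straight to 5,
# standing in for exploiting a physics bug instead of playing normally.
N_STATES = 6
ACTIONS = ["step", "glitch"]

def transition(state, action):
    if action == "glitch" and state == 0:
        return 5              # the unintended shortcut
    if action == "step":
        return min(state + 1, 5)
    return state              # "glitch" does nothing elsewhere

# Tabular Q-learning with epsilon-greedy exploration.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1
random.seed(0)

for _ in range(2000):
    s = 0
    while s != 5:
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2 = transition(s, a)
        r = 1.0 if s2 == 5 else 0.0
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda act: Q[(0, act)]))  # prints "glitch"
```

The shortcut dominates because it collects the reward without the discounting incurred by the five-step intended route; no one told the agent about the bug, it just found the higher-value policy.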
As to whether the paper matches your other criteria:
The goal of the paper was to get the AI to discover new behaviors, so the result might not count as purely natural. But the physics glitch itself does not seem to have been planned, so it did come as a surprise.
Maybe glitching the physics to win at hide-and-seek is not a sufficiently general behavior to count as a case of instrumental convergence.
I won’t blame you if you think this doesn’t count.
If I were merely looking for examples of RL doing something unexpected, I would not have created the bounty.
I’m interested in the idea that AI trained on totally unrelated tasks will converge on the specific set of goals described in the article on instrumental convergence:
Self-preservation: A superintelligence will value its continuing existence as a means to continuing to take actions that promote its values.
Goal-content integrity: The superintelligence will value retaining the same preferences over time. Modifications to its future values through swapping memories, downloading skills, and altering its cognitive architecture and personalities would result in its transformation into an agent that no longer optimizes for the same things.
Cognitive enhancement: Improvements in cognitive capacity, intelligence and rationality will help the superintelligence make better decisions, furthering its goals more in the long run.
Technological perfection: Increases in hardware power and algorithm efficiency will deliver increases in its cognitive capacities. Also, better engineering will enable the creation of a wider set of physical structures using fewer resources (e.g., nanotechnology).
Resource acquisition: In addition to guaranteeing the superintelligence’s continued existence, basic resources such as time, space, matter and free energy could be processed to serve almost any goal, in the form of extended hardware, backups and protection.
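To make the convergence claim concrete, here is a minimal toy sketch (my own, not from the paper or from any bounty submission): three planners with entirely unrelated terminal goals all select the same first action, because acquiring a resource is a precondition of every successful plan. The world, action names, and helper functions are all invented for illustration.

```python
from itertools import permutations

# Invented four-action world where nothing works until the agent has fuel.
ACTIONS = ["acquire_fuel", "go_north", "go_south", "go_east"]

def plan_succeeds(plan, goal_action):
    # A plan succeeds iff it performs the goal action at some point
    # *after* acquiring fuel (the shared precondition).
    if goal_action not in plan:
        return False
    return "acquire_fuel" in plan[: plan.index(goal_action)]

def best_first_action(goal_action):
    # Exhaustively search for the shortest successful plan and
    # return its first action.
    for length in range(1, len(ACTIONS) + 1):
        for plan in permutations(ACTIONS, length):
            plan = list(plan)
            if plan_succeeds(plan, goal_action):
                return plan[0]
    return None

# Three agents with totally unrelated terminal goals...
for goal in ["go_north", "go_south", "go_east"]:
    print(f"{goal}: first action = {best_first_action(goal)}")
# ...all converge on the same first action: acquire_fuel.
```

This prints acquire_fuel for every goal, which is the pattern (resource acquisition as a convergent instrumental subgoal) that the bounty is asking to see emerge in real training runs rather than by construction, as it does here.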