Once you understand how it works, it’s no longer surprising.
Take collecting keys in Montezuma’s Revenge. If it’s framed simply as “I trained an AI to take actions that increase the score, and it learned to collect keys that will only be useful later,” then it plausibly looks like a surprising example of learning instrumentally useful actions. But if it’s “I trained an AI to construct a model of the world and then explore options in that model with the eventual goal of getting high reward, and rewarded it for increasing the score,” then it’s no longer so surprising: you understand why it does what it does.
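To make the mechanism concrete, here is a minimal sketch of depth-limited planning in a toy world model. This is not the actual Montezuma’s Revenge agent; the one-dimensional grid, the reward of 10, the discount of 0.9, and the search depth are all illustrative assumptions. The point is that the key yields no reward by itself, yet lookahead in the model assigns it value, because simulated futures show it opening a rewarded door.

```python
# Toy sketch (hypothetical): planning over a tiny deterministic world model.
# State is (position, has_key, door_opened) on a 1-D grid of positions 0..4.
KEY_POS, DOOR_POS = 1, 3

def step(state, action):
    """One simulated step in the model. Action is -1 (left) or +1 (right).
    The key is picked up in passing; reward arrives only once, when the
    door is opened while holding the key."""
    pos, has_key, opened = state
    pos = max(0, min(4, pos + action))
    if pos == KEY_POS:
        has_key = True
    reward = 0.0
    if pos == DOOR_POS and has_key and not opened:
        reward, opened = 10.0, True
    return (pos, has_key, opened), reward

def plan_value(state, depth, gamma=0.9):
    """Exhaustive depth-limited lookahead over the model: the planner's
    estimate of discounted return achievable from `state`."""
    if depth == 0:
        return 0.0
    return max(
        reward + gamma * plan_value(next_state, depth - 1, gamma)
        for action in (-1, +1)
        for next_state, reward in [step(state, action)]
    )

start = (2, False, False)  # standing between the key (left) and door (right)
for action, name in [(-1, "left toward key"), (+1, "right toward door")]:
    next_state, reward = step(start, action)
    q = reward + 0.9 * plan_value(next_state, depth=4)
    print(f"{name}: estimated value {q:.2f}")
# left toward key: estimated value 8.10
# right toward door: estimated value 6.56
```

Heading for the key scores higher even though only the door ever pays out: the “instrumental” key collection falls straight out of search over the model, which is why the behavior stops being surprising once the training setup is described this way.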