This is an interesting article, though necessarily abstract. How can we take this and implement an actual AI detector?
This is indeed a very preliminary concept.
If this were a friendly AI, shouldn't the utility function of the gathering agent take sharing (not being greedy) into account?
Friendly AIs need not be nice in a game-theoretic sense. They can (and likely would) be ruthless and calculating in pursuit of their goals; it's just that their goals are good/safe/positive. This puts some constraints on means (e.g. the AI will likely not kill everyone just to reach its goals), but it's unlikely that "play nicer than you have to with other AIs" will be such a constraint.
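To make the distinction concrete, here is a minimal toy sketch (hypothetical, not from the article): a "gathering agent" whose utility rewards what it gathers, with an optional sharing term. The point is that friendliness can show up as a hard constraint on means while the sharing weight stays at zero; all names and numbers below are illustrative assumptions.

```python
# Toy "gathering agent": utility rewards gathering, with an optional sharing term.
# A friendly AI's goals being good need not make sharing_weight > 0; the safety
# aspect appears as a hard constraint on means, not as extra generosity.

def utility(resources_gathered: float,
            resources_shared: float,
            sharing_weight: float = 0.0) -> float:
    """Utility of the gathering agent; sharing_weight=0 means purely 'greedy'."""
    return resources_gathered + sharing_weight * resources_shared

def is_permissible(action: dict) -> bool:
    """Constraint on means: rule out catastrophic actions regardless of payoff."""
    return not action.get("harms_humans", False)

# Hypothetical action set for illustration.
actions = [
    {"name": "hoard", "gathered": 10.0, "shared": 0.0, "harms_humans": False},
    {"name": "share", "gathered": 7.0,  "shared": 3.0, "harms_humans": False},
    {"name": "seize", "gathered": 15.0, "shared": 0.0, "harms_humans": True},
]

# The agent picks the highest-utility action among permissible ones.
best = max(
    (a for a in actions if is_permissible(a)),
    key=lambda a: utility(a["gathered"], a["shared"]),
)
print(best["name"])  # -> "hoard": ruthless, but within the constraint on means
```

With sharing_weight=0 the agent hoards rather than shares, yet still refuses the impermissible "seize" option; that is the sense in which a friendly AI can be game-theoretically ruthless without being unsafe.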