tailcalled comments on TurnTrout’s shortform feed

tailcalled 25 Nov 2022 21:31 UTC
2 points
I can totally believe that agents that competently and cooperatively seek out to fulfill a goal, rather than seeking to trick evaluators of that goal to think it gets fulfilled, can exist.
However, whether you get such agents out of an algorithm depends on the details of that algorithm. Current reinforcement learning algorithms mostly don’t create agents that competently do anything. If they were more powerful while still doing essentially the same thing they currently do, most of them would end up tricked by the agents they create, rather than having aligned agents.