tailcalled comments on Many arguments for AI x-risk are wrong

tailcalled 6 Mar 2024 7:51 UTC
5 points

Well, if we’re going to get historical, PPO is a relatively small variation on Williams’s REINFORCE policy gradient model-free RL algorithm from 1992 (or earlier if you count conferences, etc.).

Oops.

I don’t know how you can say that.

Well, I didn’t say it, TurnTrout did.
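
For anyone who wants to see what "relatively small variation" could mean concretely, here is a rough sketch of the two single-sample objectives. This is my own illustration, not anything from TurnTrout's post: the function names, the toy numbers, and the finite-difference check are all hypothetical, and it assumes the commonly described clipped-surrogate form of PPO. The point it tries to show is that when the new policy equals the old one (probability ratio of 1), the PPO objective has the same gradient with respect to the log-probability as the REINFORCE objective, and the clipping only kicks in once the ratio drifts outside the allowed band.

```python
import math


def reinforce_objective(logp, advantage):
    # REINFORCE-style surrogate: log-probability weighted by the return/advantage.
    return logp * advantage


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    # Clipped surrogate (assumed standard form): min(r * A, clip(r, 1-eps, 1+eps) * A).
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def finite_diff(f, x, h=1e-6):
    # Central finite difference, used here instead of an autodiff library.
    return (f(x + h) - f(x - h)) / (2 * h)


# Toy values (hypothetical): one action with old probability 0.3 and advantage 2.0.
logp_old = math.log(0.3)
advantage = 2.0

# Gradients with respect to the log-probability, evaluated at new == old.
g_reinforce = finite_diff(lambda lp: reinforce_objective(lp, advantage), logp_old)
g_ppo = finite_diff(lambda lp: ppo_clip_objective(lp, logp_old, advantage), logp_old)

# Gradient once the ratio has moved to 1.5, outside the [0.8, 1.2] band.
logp_far = logp_old + math.log(1.5)
g_ppo_clipped = finite_diff(lambda lp: ppo_clip_objective(lp, logp_old, advantage), logp_far)

print(g_reinforce, g_ppo, g_ppo_clipped)
```

Under these assumptions the first two gradients agree (both are approximately the advantage), while the third is zero because the clipped branch is flat. Real PPO implementations add more on top (batching, multiple epochs per batch, value and entropy terms), so treat this as a sketch of the core objective only.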