the big questions are just how large a policy you would need to train using existing methods in order to be competitive with a human (my best guess would be a ~trillion to a ~quadrillion)
Curious where this estimate comes from?