I am also not quite clear why North Korea destroying the world would be so much worse than DeepMind doing it.
I think the argument here would be that DeepMind is much more likely (which is not to say “likely” on an absolute scale) to take alignment seriously enough to build (or even use!) interpretability tools, and maybe to revise its plans if those tools show the AGI plotting to kill everyone. So by the time DeepMind actually deploys an AGI (even counting accidental “deployments” due to a foom during testing), it’s less likely to be misaligned than one deployed by North Korea.
Of course, if (as the rest of your comment contends) North Korea can’t develop an AGI anyway, this is somewhat beside the point. Still, it’s much easier for lower-tier research organizations to copy and make minor modifications than to do totally new things, so NK would be more likely to develop AGI to the extent that techniques leading to AGI are already available. Which would plausibly be the case if researchers in democracies are trying to get as close to the AGI line as is allowed (because that gives useful capabilities), which in turn seems much more plausible to me than democracies globally coordinating to avoid anything even vaguely close to AGI.
Imagine you are in charge of choosing how fast DeepMind develops tech. Go too fast and you have a smaller chance of alignment. Go too slow and North Korea may beat you.
There isn’t much reason to go significantly faster than North Korea in this scenario. If you can go a bit faster than them and still make something probably aligned, do that.
In a worse situation, taking your time and hoping for a drone strike on North Korea is probably the best bet.
Which would plausibly be the case if researchers in democracies are trying to get as close to the AGI line as is allowed (because that gives useful capabilities), which in turn seems much more plausible to me than democracies globally coordinating to avoid anything even vaguely close to AGI.
Coordinating on a fuzzy boundary no one can define or measure is really hard. If coordination happens, it will be to avoid something simple, like any project using more than X compute.
I don’t think conceptually close to AGI = profitable. There is simple, dumb, money-making code. And there is code that contains all the ideas needed for AGI but is missing one tiny piece, and so is useless.