harfe comments on Daniel Kokotajlo’s Shortform

harfe 19 Jun 2023 1:06 UTC
1 point
0
What is an environmental subagent? An agent on a remote datacenter that the builders of the orginal agent don’t know about?

Another thing that is not so clear to me in this description: Does the first agent consider the alignment problem of the environmental subagent? It sounds like the environmental subagents cares about paperclip-shaped molecules, but is this a thing the first agent would be ok with?
- Daniel Kokotajlo 20 Jun 2023 2:43 UTC
  3 points
  0
  Parent
  I think it means it builds a new version of itself (possibly an exact copy, possibly a slimmed down version) in a place where the humans who normally have power over it don’t have power or visibility. E.g. it convinces an employee to smuggle a copy out to the internet.
  
  My read on this story is: There is indeed an alignment problem between the original agent and the environmental subagent. The story doesn’t specify whether the original agent considers this problem, nor whether it solves it. My own version of the story would be “Just like how the AI lab builds the original agent without having solved the alignment problem, because they are dumb + naive + optimistic + in a race with rivals, so too does the original agent launch an environmental subagent without having solved the alignment problem, for similar or possibly even very similar reasons.”