Coincidentally I’m also trying to understand this post at the same time, and was somewhat confused by the “upstream”/“downstream” distinction.
What I eventually concluded was that there are 3 ways a daemon that intrinsically values optimizing some Y can “look like” it’s optimizing X (a toy sketch follows the list):
1. Y = X (this seems both unconcerning and unlikely, and thus somewhat irrelevant)
2. Optimizing Y causes optimization pressure to be applied to X (upstream daemon; this describes humans if Y = our actual goals and X = inclusive genetic fitness)
3. The daemon directly optimizes X because it believes this instrumentally helps it achieve Y (downstream daemon, e.g. if optimizing X helps the daemon survive)
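To make cases 2 and 3 concrete, here’s a minimal sketch (my own illustration, not from the post; the actions and scores are made up): an upstream daemon only ever consults Y, so any gain in X is a causal side effect, while a downstream daemon deliberately targets X because it models that as instrumental to Y.

```python
# Toy sketch: a "daemon" picks among actions; Y is its intrinsic
# objective, X is the outer objective it can "look like" it optimizes.
actions = ["a", "b", "c"]
Y = {"a": 0, "b": 2, "c": 1}.get  # daemon's intrinsic objective
X = {"a": 0, "b": 2, "c": 3}.get  # outer objective ("looks like")

# Upstream (case 2): the daemon only consults Y; because Y and X are
# correlated here, the Y-optimal action incidentally scores well on X.
upstream_choice = max(actions, key=Y)

# Downstream (case 3): the daemon explicitly optimizes X, because it
# models scoring well on X (e.g. surviving training) as useful for Y.
downstream_choice = max(actions, key=X)

print(upstream_choice, X(upstream_choice))      # "b" 2: high X as a side effect
print(downstream_choice, Y(downstream_choice))  # "c" 1: X pursued for Y's sake
```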
Does this seem correct? In particular, I don’t understand why upstream daemons would have to have a relatively benign goal.
Yeah that seems right. I think it’s a better summary of what Paul was talking about.