Coincidentally I’m also trying to understand this post at the same time, and was somewhat confused by the “upstream”/“downstream” distinction.
What I eventually concluded was that there are 3 ways a daemon that intrinsically values optimizing some Y can “look like” it’s optimizing X:
Y = X (this seems both unconcerning and unlikely, and thus somewhat irrelevant)
optimizing Y causes optimization pressure to be applied to X (upstream daemon; this describes humans if Y = our actual goals and X = inclusive genetic fitness)
The daemon directly optimizes X because it believes doing so instrumentally helps it achieve Y (downstream daemon; e.g. if optimizing X helps the daemon survive)
Does this seem correct? In particular, I don’t understand why upstream daemons would have to have a relatively benign goal.
(Summarizing/reinterpreting the upstream/downstream distinction for myself):
“upstream”: has a (relatively benign?) goal which actually helps achieve X
“downstream”: doesn’t
Yeah that seems right. I think it’s a better summary of what Paul was talking about.