That only applies to someone trying to predict the value of X before the disciple is created. "It is hard to tell the value of X, given certain observations, even once the disciple already exists (or doesn't)" is pretty similar to many of the measurements of reduced impact described in your post.
If X is not purely random, tricky issues can emerge. For example, if X is to be decided by a politician who has promised to say X=1, then the AI may interpret X=0 as being more likely in a world where that politician has a brain tumour, or something along those lines. We really want X to tell us nothing more about the world than the value of X itself.
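As a rough formalization (my own sketch, not something from your post): writing $W$ for any feature of the world other than the value of $X$ itself, the requirement is

$$P(W \mid X = x) = P(W) \quad \text{for every } x,$$

i.e., $X$ is statistically independent of everything else, so observing $X$ carries zero mutual information about the rest of the world: $I(X; W) = 0$. The politician example violates this because $X$ and the politician's health are correlated, so conditioning on $X=0$ shifts $P(W)$.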
To give a concrete example: suppose the value of X is determined by some random process, and then someone breaks into the facility, takes apart the box where the AI is held, and measures the value of X. Unless the AI considers this event extremely unlikely, it can be used to blackmail the AI.