Comment thread: concerns with Assumption 4
Still wrapping my head around the paper, but...
1) It seems too weak: In the motivating scenario of Figure 3, isn’t it the case that “what the operator inputs” and “what’s in the memory register after 1 year” are “historically distributed identically”?
2) It seems too strong: aren’t real-world features and/or world-models “dense”? Shouldn’t I be able to find features arbitrarily close to F*? If I can, doesn’t that break the assumption?
3) Also, I don’t understand what you mean by: “its on-policy behavior [is described as] simulating X”. It seems like you (rather/also) want to say something like “associating reward with X”?
This assumption isn’t necessary to rule out memory-based world-models (see Figure 4). And yes, you are correct: it indeed doesn’t rule them out.
Yes. Yes. No. There are only finitely many short English sentences. (I think this answers your concern if I understand it correctly).
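To spell out the finiteness point (a rough counting sketch; the alphabet size \(|\Sigma|\) and the length bound \(L\) are placeholders I’m introducing, not quantities from the paper): the number of strings of length at most \(L\) is bounded by

\[
\sum_{k=0}^{L} |\Sigma|^k \;=\; \frac{|\Sigma|^{L+1}-1}{|\Sigma|-1} \;<\; \infty,
\]

so the set of features describable by a short English sentence is finite rather than dense, and within that set you can’t get arbitrarily close to F* without actually hitting it.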
I don’t quite rely on the latter. Associating reward with X means that the rewards are distributed identically to X under all action sequences. Instead, the relevant implication here is: “the world-model’s on-policy behavior can be described as simulating X” implies “for on-policy action sequences, the world-model simulates X” which means “for on-policy action sequences, rewards are distributed identically to X.”
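In symbols (my own sketch of the distinction, with notation I’m assuming rather than quoting from the paper: \(r_t\) is the reward the world-model predicts, \(X_t\) the value of the feature X, \(a_{<t}\) an action sequence, and \(\stackrel{d}{=}\) denotes equality in distribution):

\[
\underbrace{\forall\, a_{<t}:\ r_t \stackrel{d}{=} X_t}_{\text{“associating reward with } X \text{”}}
\qquad\text{versus}\qquad
\underbrace{\forall\ \text{on-policy } a_{<t}:\ r_t \stackrel{d}{=} X_t}_{\text{what the implication above gives}}
\]

Only the weaker, on-policy condition on the right is what the chain of implications delivers, and that is all the argument relies on.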
Also, it’s worth noting that this assumption (or rather, Lemma 3) seems to preclude BoMAI optimizing anything *other* than revealed preferences (which others have noted seems problematic, although I think it’s definitely out of scope).
I don’t understand what you mean by a revealed preference. If you mean “that which is rewarded,” then it seems pretty straightforward to me that a reinforcement learner can’t optimize anything other than that which is rewarded (in the limit).
Yes, that’s basically what I mean. I think I’m trying to refer to the same issue that Paul mentioned here: https://www.lesswrong.com/posts/pZhDWxDmwzuSwLjou/asymptotically-benign-agi#ZWtTvMdL8zS9kLpfu
That’s why I said the “right” thing to do, if you asked about cryonics, would be to say: “I will give you something to deny. I’ll create a perfect reality and you will be cured afterward.”