So I think it is an accurate description, in that it flags that “options” is not just the normal intuitive version of options.
I think the quoted description is not at all what the theorems in the paper show, no matter what concept the word “options” (in scare quotes) refers to. In order to apply the theorems we need to show that an involution with certain properties exists; not that <some set of things after action 1> is larger than <some set of things after action 2>.
To be more specific, the concept that the word “options” refers to here is recurrent state distributions. If the quoted description were roughly correct, there would not be a problem with applying the theorems in stochastic environments. But in fact the theorems can almost never be applied in stochastic environments. For example, suppose action 1 leads to more available “options”, and action 2 causes “immediate death” with probability 0.7515746, and that precise probability does not appear in any transition that follows action 1. We cannot apply the theorems because no involution with the necessary properties exists.
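To make the example above concrete, here is a minimal sketch in Python. It uses a simplified stand-in for the paper's condition, not the actual theorem statement: it assumes the requirement is an involution over states (a relabeling that is its own inverse) that maps every distribution available after action 2 onto some distribution available after action 1. The four-state layout, the specific distributions, and the helper names (`involutions`, `apply_perm`, `find_involution`) are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

def involutions(n):
    """Yield every permutation of range(n) that is its own inverse."""
    for perm in permutations(range(n)):
        if all(perm[perm[i]] == i for i in range(n)):
            yield perm

def apply_perm(perm, dist):
    """Relabel states: the probability mass on state i moves to state perm[i]."""
    out = [0.0] * len(dist)
    for i, mass in enumerate(dist):
        out[perm[i]] = mass
    return tuple(out)

def find_involution(after_1, after_2):
    """Return an involution sending every distribution in after_2 into after_1, else None."""
    n = len(next(iter(after_1)))
    for perm in involutions(n):
        if all(apply_perm(perm, d) in after_1 for d in after_2):
            return perm
    return None

# Hypothetical state layout: 0 = a, 1 = b, 2 = c, 3 = death.
# Deterministic toy case: action 1 can end in a or b, action 2 ends in c.
det_after_1 = {(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)}
det_after_2 = {(0.0, 0.0, 1.0, 0.0)}
det_result = find_involution(det_after_1, det_after_2)

# Stochastic toy case: action 2 leads to death with probability p, otherwise to c.
# Action 1 still offers strictly more distributions than action 2.
p = 0.7515746
stoch_after_1 = {(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.5, 0.5, 0.0, 0.0)}
stoch_after_2 = {(0.0, 0.0, 1.0 - p, p)}
stoch_result = find_involution(stoch_after_1, stoch_after_2)

print("deterministic:", det_result)
print("stochastic:", stoch_result)
```

Under these assumptions the deterministic case admits a suitable involution, while the stochastic case does not: relabeling states only permutes the entries of a distribution, so a distribution containing the value 0.7515746 can never be mapped onto one that lacks it, no matter how many distributions are available after action 1.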
You’re being unhelpfully pedantic. The quoted portion even includes the phrase “As a quick summary (read the paper and sequence if you want more details)”! This reads to me as an attempted pre-emption of “gotcha” comments.
The phenomena you discuss are explained in the paper (EDIT: top of page 9), and in other posts, and discussed at length in other comment threads. But this post isn’t about the stochastic sensitivity issue, and I don’t think it should have to talk about the sensitivity issue.
The phenomena you discuss are explained in the paper (EDIT: top of page 9), and in other posts, and discussed at length in other comment threads. But this post isn’t about the stochastic sensitivity issue, and I don’t think it should have to talk about the sensitivity issue.
I noticed that after my previous comment you’ve edited your comment to include the page number and the link. Thanks.
I still couldn’t find in the paper (top of page 9) an explanation for the “stochastic sensitivity issue”. Perhaps you were referring to the following:
randomly generated MDPs are unlikely to satisfy our sufficient conditions for POWER-seeking tendencies
But the issue is with stochastic MDPs, not randomly generated MDPs.
Re the linked post section, I couldn’t find anything there about stochastic MDPs.
For (3), environments which “almost” have the right symmetries should also “almost” obey the theorems. To give a quick, non-legible sketch of my reasoning:
For the uniform distribution over reward functions on the unit hypercube ($[0,1]^{|S|}$), optimality probability should be Lipschitz continuous on the available state visit distributions (in some appropriate sense). Then if the theorems are “almost” obeyed, instrumentally convergent actions still should have extremely high probability, and so most of the orbits still have to agree.
So I don’t currently view (3) as a huge deal. I’ll probably talk more about that another time.
That quote does not seem to mention the “stochastic sensitivity issue”. In the post that you linked to, “(3)” refers to:
Not all environments have the right symmetries
But most ones we think about seem to
So I’m still not sure what you meant when you wrote “The phenomena you discuss are explained in the paper (EDIT: top of page 9), and in other posts, and discussed at length in other comment threads.”
(Again, I’m not aware of any previous mention of the “stochastic sensitivity issue” other than in my comment here.)
The phenomena you discuss are explained in the paper, and in other posts, and discussed at length in other comment threads.
I haven’t found an explanation of the “stochastic sensitivity issue” in the paper. Can you please point me to a specific section/page/quote? All that I found about this in the paper was the sentence:
Our theorems apply to stochastic environments, but we present a deterministic case study for clarity.
(I’m also not aware of previous posts/threads that discuss this, other than my comment here.)
I brought up this issue as a demonstration of the implications of incorrectly assuming that the theorems in the paper apply when there are more “options” available after action 1 than after action 2.
(I argue that this issue shows that the informal description in the OP does not correctly describe the theorems in the paper, and it’s not just a matter of omitting details.)