Thanks Alex! Your original comment didn’t read as ill-intended to me, though I wish that you’d just messaged me directly. I could have easily missed your comment in this thread—I only saw it because you linked the thread in the comments on my post.
Your suggested rephrase helps to clarify how you think about the implications of the paper, but I’m looking for something shorter and more high-level to include in my talk. I’m thinking of using this summary, which is based on a sentence from the paper’s intro: “There are theoretical results showing that many decision-making algorithms have power-seeking tendencies.”
(Looking back, the sentence I used in the talk was a summary of the optimal policies paper, and then I updated the citation to point to the retargetability paper and forgot to update the summary...)
“There are theoretical results showing that many decision-making algorithms have power-seeking tendencies.”
I think this is reasonable, although I might say “suggesting” instead of “showing.” I think I might also be more cautious about further inferences which people might make from this—like I think a bunch of the algorithms I proved things about are importantly unrealistic. But the sentence itself seems fine, at first pass.