I’ve read this twice and I’m still not sure whether I actually get your critique. My guess is you’re saying something like:
Daniel is treating the PONR too much as a thing; this leads him both to accidentally treat the PONR as a specific point in time, and to [?? mistake planning capability for “objective” feasibility ??]
I agree that the OP’s talk of the PONR as a point in time doesn’t make sense; a charitable read is that it’s a toy model meant to clarify the difference between our ability to prevent X and X actually happening (as in the movie Armageddon: did we nuke the asteroid soon enough for it to miss Earth, vs. has the asteroid actually impacted Earth). I agree that asking about “our planning capability” is vague and gives different answers depending on what counterfactuals you’re using; in an extreme case of “what could we feasibly do”, there’s basically no PONR, because we always “could” just sit down at a computer and type in a highly speed-prior-compressed source code of an FAI.
the AI historian-subprocesses of the future will record some sort of summary of the decision-relevant results of a billion billion ancestor simulations, but the answer is not going to fit in a 64-bit timestamp.
It won’t be a timestamp, but it will contain information about humans’ ability to plan. To extract useful lessons from its experience of coming into power surrounded by potentially hostile weak AGIs, a superintelligence has to compare its own developing models across time. It went from not understanding its situation and not knowing how to take control from the humans, to understanding and knowing, and along the way it was relevantly uncertain about what the humans were able to do.
Anyway, the above feels like it’s sort of skew to the thrust of the OP, which I think is: “notice that your feasible influence will decrease well before the AGI actually kills you with nanobots, so planning under a contrary assumption will produce nonsensical plans”. Maybe I’m just saying: yes, it’s subjective how much we’re doomed at a given point, and yes, we want our reasoning to be, in a sense, grounded in stuff actually happening, but also, in order to usefully model in more detail what’s happening and what plans will work, we have to talk about stuff that’s intermediate in time and in abstraction between the nanobot end of the world and the here-and-now. The intermediate stuff then says something more specific about when and how much influence you’re losing or gaining.
I don’t think we disagree about anything substantive, and I don’t expect Daniel to disagree about anything substantive after reading this. It’s just—
I agree that the OP’s talking of PONR as a point in time doesn’t make sense; a charitable read is that [...]
I don’t think we should be doing charitable readings at yearly review time! If an author uses a toy model to clarify something, we want the post to say “As a clarifying toy model [...]” rather than making the readers figure it out.
If you’re pessimistic about alignment—and especially if you have short timelines like Daniel—I think most of your point-of-no-return-ness should already be in the past.
I unfortunately was not clear about this, but I meant to define it in such a way that this is false by definition—“loss of influence” is defined relative to the amount of influence we currently have. So even if we had a lot more influence 5 years ago, the PONR is when what little influence we have left mostly dries up. :)
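As a purely illustrative sketch of that definition (made-up numbers, nothing more): treat “influence” as a quantity that declines over time, and put the PONR where influence falls below some fraction of what we have now, rather than below some fraction of whatever peak we once had.

```python
# Toy illustration of a PONR defined relative to *current* influence.
# All numbers are made up; "influence" is an arbitrary unit.
influence = {2016: 10.0, 2021: 0.8, 2023: 0.4, 2025: 0.05}
now = 2021
threshold = 0.1 * influence[now]  # "mostly dries up" = below 10% of what we have now

ponr = min(year for year, x in influence.items() if year >= now and x < threshold)
print(ponr)  # 2025: on these numbers the PONR is still ahead of us,
             # even though most of the 2016 influence is already gone.
```

Under an absolute reading (influence below 10% of the 2016 peak), the same made-up numbers would put the PONR at or before 2021, which is exactly the reading the definition above is meant to rule out.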
I don’t think we should be doing charitable readings at yearly review time! If an author uses a toy model to clarify something, we want the post to say “As a clarifying toy model [...]” rather than making the readers figure it out.
If by some chance this post does make it to further stages of the review, I will heavily edit it, and I’m happy to e.g. add in “As a clarifying toy model...” among other changes.