He likely means a formal statement of the claim about decision systems, something like: "Under the following formal definition of a decision system, as long as the following pathological/stupid conditions don't hold, a decision system will not seek to modify its goals." There are a fair number of mathematical theorems with roughly this shape, where we can prove a result for a large class of objects but there remain edge cases we can't handle. That's the sort of thing Eliezer is talking about here (although we don't even have a really satisfactory definition of a decision system at this point, so what Eliezer wants is very optimistic).
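To make the shape concrete, here is a rough sketch (my own notation, not anything Eliezer has actually written down) of what such a statement might look like:

$$\forall D \in \mathcal{D}:\quad \neg\,\mathrm{Pathological}(D)\ \Longrightarrow\ D \text{ does not act to modify its goal } U_D$$

where $\mathcal{D}$ is the (currently missing) formal class of decision systems, $\mathrm{Pathological}$ captures whatever degenerate cases get excluded, and $U_D$ is the system's goal or utility function. The hard part is that we don't yet have a satisfying definition of $\mathcal{D}$ to quantify over in the first place.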