I would say the Corrigibility paper shares the same “feel” as certain cryptography papers. I think it’s true that this feel is distinct, and not true that it means such papers are “not real”.
For example, what does it mean for a cryptosystem to be secure? This is an important topic with impressive achievements, but it does feel different from the nuts and bolts of cryptography, like how to perform differential cryptanalysis. Indistinguishability under chosen-plaintext attack, the standard definition of semantic security in cryptography, does sound like “make up rules and then pretend they describe reality and prove results”.
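To make the analogy concrete, here is a minimal sketch of the IND-CPA “game” that the definition refers to. This is an illustrative toy, not a real cryptographic harness; the `adversary` interface and the broken XOR scheme are my own hypothetical examples, chosen only to show why the definitional framing (“no efficient adversary wins noticeably more than half the time”) has the made-up-rules flavor while still tracking something real:

```python
import secrets

def ind_cpa_game(encrypt, adversary):
    """One round of the IND-CPA game: the adversary picks two equal-length
    messages, the challenger encrypts one chosen at random, and the
    adversary must guess which. A scheme is IND-CPA secure if no efficient
    adversary wins with probability noticeably better than 1/2."""
    key = secrets.token_bytes(16)
    # The adversary gets an encryption oracle for chosen plaintexts.
    oracle = lambda m: encrypt(key, m)
    m0, m1 = adversary.choose_messages(oracle)
    assert len(m0) == len(m1)
    b = secrets.randbelow(2)               # challenger's secret bit
    challenge = encrypt(key, [m0, m1][b])
    guess = adversary.guess(oracle, challenge)
    return guess == b                      # True iff the adversary wins

# Any deterministic scheme is trivially broken under this definition:
# re-encrypting m0 via the oracle and comparing to the challenge
# reveals the challenger's bit every time.
class BreakDeterministic:
    def choose_messages(self, oracle):
        self.m0, self.m1 = b"aaaa", b"bbbb"
        return self.m0, self.m1
    def guess(self, oracle, challenge):
        return 0 if oracle(self.m0) == challenge else 1

# Hypothetical toy cipher: XOR with a fixed key (deterministic, so broken).
xor_cipher = lambda key, m: bytes(k ^ c for k, c in zip(key, m))
```

The point of the sketch is that “secure” ends up meaning “wins this particular game only half the time”, which is exactly the kind of stipulated rule set the comment is gesturing at, yet it has proven to be the right abstraction in practice.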
In a sense, I think all math papers with a focus on definitions (as opposed to proofs) feel like this. Proofs are correct but trivial, so definitions are the real contribution, but applicability of definitions to the real world seems questionable. Proof-focused papers feel different because they are about accepted definitions whose applicability to the real world is not in question.
I suspect one of the reasons OP feels dissatisfied with the corrigibility paper is that it is not the equivalent of Shannon’s seminal results, which generally gave the correct definitions of terms; instead it merely gestures at a problem (“we have no idea how to formalize corrigibility!”).
That being said, I resonate a lot with this part of the reply:
Proofs [in conceptual/definition papers] are correct but trivial, so definitions are the real contribution, but applicability of definitions to the real world seems questionable. Proof-focused papers feel different because they are about accepted definitions whose applicability to the real world is not in question.
I do like the comparison to cryptography, as that is a field I “take seriously”, and one that also has the issue of it being very difficult to “fairly” define terms.
Indistinguishability under chosen-plaintext attack being the canonical definition of “secure” seems a lot more defensible than “properly modeling this random weird utility game maybe means something for AGI??”, but I get why it’s a similar sort of issue.