My point is that SFOT likely never holds in any environment relevant to AI alignment, because such diagonal methods show that any Agent with a fixed Objective Function is crippled by an adequate counter.
Therefore SFOT should not be relied on when exploring AI alignment.
Can SFOT hold in ad-hoc, limited situations that do not represent the real world? Maybe, but that was not my point.
Finding one counter-example showing that SFOT fails in a specific setting (Clippy in my scenario) is enough to refute it as a general claim, which was my goal.
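To make that inference explicit (a minimal sketch, treating SFOT as a universal claim over environments, with $P(E)$ a hypothetical shorthand for "SFOT holds in environment $E$"):

$$\exists E_0\,\neg P(E_0) \;\Longrightarrow\; \neg\,\forall E\, P(E)$$

Here $E_0$ stands for the Clippy scenario: exhibiting one such environment suffices to block the universal reading of SFOT, even if SFOT remains true in other, narrower settings.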