Here I’m using “feature” only with its simplest meaning: a direction in activation space. A truth-like feature only means “a direction in activation space with low CCS loss”, which is exactly what CCS enables you to find. By the example above, I show that there can be exponentially many of them. Therefore, the theorems above do not apply.
Maybe requiring directions found by CCS to be “actual features” (satisfying the conditions of those theorems) might enable you to improve CCS. But I don’t know what those conditions are.
Here I’m using “feature” only with its simplest meaning: a direction in activation space. A truth-like feature only means “a direction in activation space with low CCS loss”, which is exactly what CCS enables you to find. By the example above, I show that there can be exponentially many of them. Therefore, the theorems above do not apply.
Maybe requiring directions found by CCS to be “actual features” (satisfying the conditions of those theorems) might enable you to improve CCS. But I don’t know what those conditions are.