The quantitative experiment you propose is a good idea, and we will be working along these lines, extending the very preliminary experiments in the post on how large an effect edits like this have.
As for the polytopes, you are right that this doesn’t really fit that framework and instead assumes a purely linear-directions picture. We aren’t wedded to any specific viewpoint and are trying a lot of different perspectives to figure out what the correct ontology for understanding neural network internals is.