I agree with the overall point (that this was a solid intellectual contribution and is a reasonable-ish metric), but there’s been a non-zero number of follow-ups to, or at least use cases of, this work, imo. Off the top of my head:
I agree that part of the reason for CaSc’s lack of adoption is that the metric consistently returns “this explanation is not very faithful/complete/etc”. That said, CaSc has been used on lots of toy/tiny models with a decent level of success. For example:
I checked the hypotheses for the toy modular arithmetic/group composition work with my own hand-crafted CaSc implementation and found that the modular arithmetic results held up quite well.
CaSc-style tests were used by Marius and Stefan to confirm their solutions to Stephen Casper’s Mech Interp challenges (challenge 1, challenge 2).
etc.
Erik Jenner’s agenda is pretty closely related to causal scrubbing and is still actively being worked on.
Thanks for the links! I agree that the use cases are non-zero.