So we can ‘train’ a circuit by optimizing the Mask parameters using gradient descent.
Have you tried this in practice? I could imagine an SGD-based circuit finder being pretty efficient compared to brute-force algorithms like ACDC. I’d love to see that comparison some day! (Might be a project I should try!)
Edit: I remember @Buck and @dmz were suggesting something along those lines last year.
Do you have a link to a writeup of Li et al. (2023) beyond the git repo?
Have you tried this in practice? I could imagine an SGD-based circuit finder being pretty efficient compared to brute-force algorithms like ACDC. I’d love to see that comparison some day!
Yes, it does work well! I did a kind of write-up here but decided not to publish it for various reasons.
Do you have a link to a writeup of Li et al. (2023) beyond the git repo?
https://arxiv.org/abs/2309.05973
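For anyone who wants to play with the idea quoted at the top of the thread (training a circuit by optimizing mask parameters with gradient descent), here is a minimal toy sketch. It is not the method from Li et al. (2023) and not a comparison with ACDC. It assumes a stand-in "model" that is just a weighted sum of its components, one sigmoid mask per component, and a loss that is the squared difference from the unmasked output plus a sparsity penalty. The names `train_masks`, `sparsity`, `lr` and `steps` are all made up for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_masks(weights, inputs, sparsity=0.1, lr=0.5, steps=300):
    """Learn one soft mask per component by plain gradient descent.

    Toy assumption: the "model" output is sum_i weights[i] * x[i], and the
    masked model scales component i by sigmoid(logits[i]).
    """
    logits = [0.0] * len(weights)  # every mask starts at sigmoid(0) = 0.5
    n = len(inputs)
    for _ in range(steps):
        masks = [sigmoid(t) for t in logits]
        grads = [0.0] * len(weights)
        for x in inputs:
            full = sum(w * xi for w, xi in zip(weights, x))
            masked = sum(m * w * xi for m, w, xi in zip(masks, weights, x))
            err = masked - full
            # Gradient of the mean squared error with respect to each mask.
            for i, (w, xi) in enumerate(zip(weights, x)):
                grads[i] += 2.0 * err * w * xi / n
        for i, m in enumerate(masks):
            # Add the sparsity gradient, then chain through the sigmoid.
            logits[i] -= lr * (grads[i] + sparsity) * m * (1.0 - m)
    return [sigmoid(t) for t in logits]


rng = random.Random(0)
weights = [3.0, 0.05, -2.0, 0.1]  # components 1 and 3 barely matter
inputs = [[rng.gauss(0.0, 1.0) for _ in weights] for _ in range(500)]
masks = train_masks(weights, inputs)
kept = [i for i, m in enumerate(masks) if m > 0.5]  # the "circuit"
```

In this toy setup the masks on the two large-weight components are pushed towards 1 and the other two towards 0, so thresholding at 0.5 keeps only the components that matter. A real circuit finder would put the masks on edges or activations of an actual network and get gradients from autodiff rather than by hand.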