StefanHex comments on How To Do Patching Fast

StefanHex 13 May 2024 10:20 UTC
1 point
“So we can ‘train’ a circuit by optimizing the Mask parameters using gradient descent.”
Did you try how this works in practice? I could imagine an SGD-based circuit finder could be pretty efficient (compared to brute-force algorithms like ACDC), I’d love to see that comparison some day! (might be a project I should try!)
Edit: I remember @Buck and @dmz suggesting something along those lines last year.
Do you have a link to a writeup of Li et al. (2023) beyond the git repo?
- Joseph Miller 14 May 2024 17:30 UTC
  1 point
  “Did you try how this works in practice? I could imagine an SGD-based circuit finder could be pretty efficient (compared to brute-force algorithms like ACDC), I’d love to see that comparison some day!”
  Yes, it does work well! I did a kind of write-up here but decided not to publish it for various reasons.
  “Do you have a link to a writeup of Li et al. (2023) beyond the git repo?”
  https://arxiv.org/abs/2309.05973
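
For readers following the thread, here is a toy sketch of what "training" a circuit by gradient descent on mask parameters could look like. It is not the code from the post, from Li et al. (2023), or from ACDC. It assumes a made-up linear "model" whose output is a weighted sum of component activations, where each mask interpolates between a clean and a corrupted activation, and the loss is the squared output error plus a sparsity penalty on the masks. All names (`train_masks`, `sparsity`, and so on) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_masks(weights, clean, corrupt, sparsity=0.5, lr=1.0, steps=2000):
    """Learn one mask per component of a toy linear model (illustrative only).

    Masked output: sum_i w_i * (m_i * clean_i + (1 - m_i) * corrupt_i)
    Loss:          (masked output - clean output)^2 + sparsity * sum_i m_i
    """
    # Change in output when a component is switched from corrupt to clean.
    effects = [w * (c - k) for w, c, k in zip(weights, clean, corrupt)]
    # mask = sigmoid(theta), so every mask starts at 0.5.
    thetas = [0.0] * len(weights)
    for _ in range(steps):
        masks = [sigmoid(t) for t in thetas]
        # Output error relative to the clean run: sum_i (m_i - 1) * effect_i.
        error = sum((m - 1.0) * e for m, e in zip(masks, effects))
        for i, (m, e) in enumerate(zip(masks, effects)):
            grad_mask = 2.0 * error * e + sparsity  # d loss / d m_i
            thetas[i] -= lr * grad_mask * m * (1.0 - m)  # chain rule through sigmoid
    return [sigmoid(t) for t in thetas]


if __name__ == "__main__":
    masks = train_masks(
        weights=[2.0, 1.0, 0.1, 3.0],
        clean=[1.0, 2.0, 0.5, -1.0],
        corrupt=[0.0, 0.0, 0.0, -1.0],
    )
    # Keep components whose mask ended above 0.5.
    print([m > 0.5 for m in masks])  # [True, True, False, False]
```

In this toy setup the first two components carry almost all of the clean-versus-corrupt difference, so their masks stay near 1, while the sparsity penalty drives the other two toward 0. A real circuit finder would backpropagate through the actual network instead of using a hand-derived gradient.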