Adria (the other coauthor on the paper) told me that Redwood ported ACDC to Rust early on, which did provide a useful speedup (I don't know how much, but my guess is 10-100x) but made it harder to maintain. I'm currently working in Python. I wouldn't trust those marketing numbers for anything but the Mandelbrot benchmark they test on, which has unusually high interpreter overhead.
The bigger problem with ACDC is that it doesn't use gradients, so it needs a separate forward pass for every edge it tests. Attribution patching fixes this and gets >100x speedups, and there should be even better methods. I don't expect circuit discovery to be usefully ported to a fast language again until it is used in production.
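To make the gradient point concrete, here is a toy sketch (my own illustration, not ACDC's or any attribution-patching codebase). It assumes a made-up metric that is a nonlinear function of a few component activations, with hypothetical weights `WEIGHTS`. Exact patching needs one extra forward pass per component; attribution patching estimates all the effects at once from a single gradient at the clean run, using the first-order approximation (corrupt - clean) * dM/da. Real ACDC patches edges rather than nodes and prunes iteratively, so treat this only as an illustration of the cost difference.

```python
import math

# Hypothetical weights for a toy metric; not taken from any real model.
WEIGHTS = [0.5, -1.2, 2.0, 0.3]


def metric(acts):
    """Toy task metric M(a) = sum_i w_i * tanh(a_i)."""
    return sum(w * math.tanh(a) for w, a in zip(WEIGHTS, acts))


def metric_grad(acts):
    """Analytic gradient dM/da_i = w_i * (1 - tanh(a_i)^2)."""
    return [w * (1.0 - math.tanh(a) ** 2) for w, a in zip(WEIGHTS, acts)]


def activation_patching(clean, corrupt):
    """Exact effect of patching each component: one extra forward pass each."""
    base = metric(clean)
    effects = []
    for i in range(len(clean)):
        patched = list(clean)
        patched[i] = corrupt[i]  # swap in the corrupted activation for component i
        effects.append(metric(patched) - base)
    return effects


def attribution_patching(clean, corrupt):
    """First-order estimate of every effect from one gradient at the clean run."""
    grad = metric_grad(clean)
    return [(c - a) * g for a, c, g in zip(clean, corrupt, grad)]


if __name__ == "__main__":
    clean = [0.1, 0.4, -0.2, 1.5]
    corrupt = [0.15, 0.3, -0.1, 1.4]
    for exact, approx in zip(activation_patching(clean, corrupt),
                             attribution_patching(clean, corrupt)):
        print(f"exact={exact:+.4f}  approx={approx:+.4f}")
```

With n components, the exact loop costs n forward passes, while the estimate costs one forward and one backward pass regardless of n. The approximation is only good when clean and corrupted activations are close, which is the trade-off attribution patching accepts for speed.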