Neel Nanda comments on A circuit for Python docstrings in a 4-layer attention-only transformer