The second paper is only about linear connectivity, and it does seem to suggest that linearly connected models run similar algorithms. But I guess I don't expect neural net training to move in straight lines? (Although I suppose momentum helps with this?)
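For concreteness, here is a minimal sketch of what "linearly connected" usually means in practice. It assumes the common definition: walk the straight line between two trained parameter vectors and check that the loss never rises much above the straight line joining the two endpoint losses (the "loss barrier"). The two toy losses below are hypothetical stand-ins for a real network's loss, chosen so one has no barrier and one does.

```python
import numpy as np

def interpolate(theta_a, theta_b, alpha):
    # Point at fraction alpha along the straight line from theta_a to theta_b.
    return (1.0 - alpha) * theta_a + alpha * theta_b

def loss_barrier(loss_fn, theta_a, theta_b, num_points=51):
    """Largest rise of the loss above the line joining the endpoint losses.

    A barrier near zero is the usual sense in which two solutions are
    called "linearly connected"."""
    alphas = np.linspace(0.0, 1.0, num_points)
    loss_a = loss_fn(theta_a)
    loss_b = loss_fn(theta_b)
    gaps = [
        loss_fn(interpolate(theta_a, theta_b, a)) - ((1.0 - a) * loss_a + a * loss_b)
        for a in alphas
    ]
    return max(gaps)

def convex_loss(theta):
    # Hypothetical convex toy loss: a single bowl, so no barrier is possible.
    return float(np.sum(theta ** 2))

def double_well_loss(theta):
    # Hypothetical non-convex toy loss: minima at +1 and -1 with a hump at 0.
    return float(np.sum((theta ** 2 - 1.0) ** 2))

theta_a, theta_b = np.array([1.0]), np.array([-1.0])
print(loss_barrier(convex_loss, theta_a, theta_b))       # 0.0: no barrier
print(loss_barrier(double_well_loss, theta_a, theta_b))  # ~1.0: midpoint sits on the hump
```

Note that this only probes the straight line between the endpoints; it says nothing about the (generally curved) path training actually took to reach either one.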