[edit: stefan made the same point below earlier than me]
Nice idea! I’m not sure why this would be evidence for residual networks being an ensemble of shallow circuits; it seems more like evidence for the opposite to me? If anything, a low effective layer horizon implies that later layers build more on the outputs of intermediate layers. At one extreme, a network with an effective layer horizon of 1 would consist only of circuits that route through every single layer. Likewise, for there to be any extremely shallow circuits that route directly from the inputs to the final layer, the effective layer horizon must equal the number of layers in the network.
I do agree that low layer horizons would substantially simplify (in terms of compute) searching for circuits.
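To make the path-counting argument concrete, here's a toy sketch under my own simplifying assumptions (not a definition from the post): the input embedding is node 0, the layers are nodes 1..L, and a horizon of k means each layer effectively reads only from the k positions immediately before it. A circuit is then an increasing sequence of nodes from the embedding to the final layer where no hop skips more than k positions. The function names are hypothetical.

```python
from functools import lru_cache


def count_paths(num_layers: int, horizon: int) -> int:
    """Count embedding-to-final-layer paths where each hop spans at most `horizon` positions.

    Toy model (an assumption): node 0 is the embedding, nodes 1..num_layers are layers,
    and a layer only effectively reads from the `horizon` positions right before it.
    """

    @lru_cache(maxsize=None)
    def paths_from(node: int) -> int:
        if node == num_layers:
            return 1  # reached the final layer: one complete path
        # Next node on the path can be any later position within the horizon.
        last = min(node + horizon, num_layers)
        return sum(paths_from(nxt) for nxt in range(node + 1, last + 1))

    return paths_from(0)


def shallowest_path_hops(num_layers: int, horizon: int) -> int:
    """Fewest hops any path can take: ceil(num_layers / horizon)."""
    return -(-num_layers // horizon)


def direct_path_exists(num_layers: int, horizon: int) -> bool:
    """A single hop from the embedding to the final layer needs horizon >= num_layers."""
    return horizon >= num_layers


if __name__ == "__main__":
    layers = 12
    for k in (1, 2, 4, 12):
        print(
            f"horizon={k:2d}  paths={count_paths(layers, k):5d}  "
            f"shallowest={shallowest_path_hops(layers, k):2d} hops  "
            f"direct={direct_path_exists(layers, k)}"
        )
```

With horizon 1 there is exactly one path and it passes through all 12 layers; only at horizon 12 does the one-hop "shallow" path appear. The shrinking path count at low horizons is also what makes circuit search cheaper.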