I like the “cut” framing, and I’m happy someone else is having a go at these sorts of questions from a somewhat different angle.
Let’s say we want to express the following program:
    def program(a, b, c):
        if a:
            return b + c
        else:
            return b - c
I’m not sure I understand the problem. Neural networks can implement operations equivalent to an if. They’re going to be somewhat complicated, but that’s to be expected: an if just isn’t an elementary arithmetic operation, and it takes some non-linearities to build up.
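To make that concrete, here is a minimal sketch (my own construction, not anything from your post) of a two-layer ReLU network computing exactly that program, assuming a is a hard 0/1 gate and that |b + c| and |b - c| stay below some known bound M:

    import numpy as np

    M = 100.0  # assumed bound on |b + c| and |b - c|

    def relu(x):
        return np.maximum(x, 0.0)

    def gate(a, x):
        # Passes x through unchanged when a == 1 and outputs 0 when a == 0,
        # using only affine maps and ReLUs (valid while |x| <= M).
        return relu(x + M * a - M) - relu(-x + M * a - M)

    def program(a, b, c):
        # Branch selection: each candidate output is gated on a.
        return gate(a, b + c) + gate(1.0 - a, b - c)

    assert program(1, 3, 4) == 7    # if-branch: b + c
    assert program(0, 3, 4) == -1   # else-branch: b - c

Clunky, as predicted, but it’s two layers of entirely standard components.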
Layer Activation Space is a generalization of looking at neurons: if we optimize activations for the length of the projection onto the basis vector e_i, then this is the same as disregarding all components except the i-th neuron and maximizing its activation.
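In code the equivalence is almost a tautology; a toy check with made-up activations:

    import numpy as np

    act = np.array([0.3, -1.2, 2.5])   # hypothetical activation vector
    i = 2
    e_i = np.eye(len(act))[i]          # standard basis vector e_i
    # The (signed) length of the projection onto e_i is just the i-th
    # neuron's activation, so maximizing one maximizes the other.
    assert act @ e_i == act[i]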
My current idea is that we should treat layer activation space (or, in your more general framing, cut activation space) as a space of functions mapping the input to the activations. Then, look for an orthogonal basis in this function space, where “orthogonal” means something like “every feature is independent of every other feature”. Right now, I’m trying out the L2 Hilbert space norm for this.
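For concreteness, here is one way I could imagine operationalizing that inner product (a sketch under my own assumptions; acts is a stand-in for one layer’s activations over a dataset): treat each direction v in activation space as the function mapping an input x to f(x) @ v, and estimate the L2 inner product of two such functions by averaging over the data. Eigenvectors of the second-moment matrix of the activations then give directions whose induced functions are pairwise orthogonal in this sense:

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_neurons = 1000, 16
    # Stand-in for the activations f(x) of one layer over a dataset.
    acts = rng.normal(size=(n_samples, n_neurons))

    def l2_inner(u, v, acts):
        # Empirical L2 inner product <f_u, f_v> = E_x[(f(x) @ u) * (f(x) @ v)]
        return np.mean((acts @ u) * (acts @ v))

    # Eigenvectors of the (uncentered) second-moment matrix give directions
    # whose functions are pairwise orthogonal under this inner product.
    second_moment = acts.T @ acts / n_samples
    eigvals, eigvecs = np.linalg.eigh(second_moment)
    u, v = eigvecs[:, -1], eigvecs[:, -2]
    assert abs(l2_inner(u, v, acts)) < 1e-8

Note that this recovers something PCA-like; whether that matches the notion of independence you actually want is, of course, exactly the open question.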
With this method, I’d hope to avoid issues of redundancy and cancellations when e.g. calculating modularity scores. It should also, I hope, effectively prune non-meaningful connections.