Could you elaborate on “For NN Model 1, the belief is encoded in the learned parameters θ∈Θ. For NN Model 2, the belief is encoded in the architecture itself y_”?
If A = Aᵀ (i.e. A is symmetric), then xᵀAy = yᵀAx. The first model would (we suppose) learn a symmetric A, because in reality siblingness is symmetric. The second model uses a matrix that will always be symmetric, no matter what it’s learned.
(In reality the first model presumably wouldn’t learn an exactly-symmetric matrix, but we could talk about “close enough” and/or about behavior in the limit.)
Yep, exactly!
Two things to note:
(1) The distinction between hinge beliefs and free beliefs does not supervene on the black-box behaviour of NNs/LLMs. It depends on how the belief is implemented, how the belief is learned, how the belief might change, etc.
(2) “The second model uses a matrix that will always be symmetric, no matter what it’s learned.” might make it seem that the two models are more similar than they actually are.
You might think that both models store an n×n matrix A, and the architecture of both models is xᵀAy, but Model 1 has an approximately symmetric matrix A whereas Model 2 has an exactly symmetric matrix A. But this isn’t true. The second model doesn’t store a symmetric matrix; it stores only an upper triangle.
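To make the difference concrete, here is a minimal sketch in plain Python. The exact form of Model 2 is not spelled out in this thread, so the code assumes one natural reading: parameters exist only for index pairs i ≤ j, and each off-diagonal parameter is applied to both (i, j) and (j, i). The names score_full and score_upper are hypothetical, chosen only for illustration.

```python
def score_full(A, x, y):
    """Model 1: x^T A y with an unconstrained n x n parameter matrix A."""
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def score_upper(u, x, y):
    """Model 2 (assumed form): parameters u[(i, j)] exist only for i <= j.

    Each off-diagonal parameter multiplies x_i*y_j + x_j*y_i, so swapping
    x and y cannot change the result, whatever values u holds.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i, n):
            if i == j:
                total += u[(i, j)] * (x[i] * y[i])
            else:
                total += u[(i, j)] * (x[i] * y[j] + x[j] * y[i])
    return total


# A deliberately asymmetric parameter setting for Model 1.
A = [[0.0, 1.0],
     [0.0, 0.0]]
# Model 2 has no slot for a separate (1, 0) entry at all.
u = {(0, 0): 0.0, (0, 1): 1.0, (1, 1): 0.0}

x = [1.0, 0.0]
y = [0.0, 1.0]

print(score_full(A, x, y), score_full(A, y, x))    # can differ: depends on A
print(score_upper(u, x, y), score_upper(u, y, x))  # always equal, by construction
print(len(A) * len(A[0]), len(u))                  # n*n vs n*(n+1)/2 parameters
```

Under these assumptions, Model 1 can represent an asymmetric relation and only becomes symmetric if training pushes it there, while Model 2 has no parameter setting that breaks symmetry. The two models also differ in parameter count, which is one way to see that they are not the same architecture with different weights.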