philh comments on Wittgenstein and ML — parameters vs architecture

philh 26 Mar 2023 10:27 UTC
5 points
1
If $A = A^{T}$ (i.e. $A$ is symmetric), then $x^{T} A y = y^{T} A x$. The first model would (we suppose) learn a symmetric $A$, because in reality siblingness is symmetric. The second model uses a matrix that will always be symmetric, no matter what it’s learned.

(In reality the first model presumably wouldn’t learn an exactly-symmetric matrix, but we could talk about “close enough” and/or about behavior in the limit.)
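A quick numerical check of the identity above may help. This is a minimal sketch in plain Python: the one-hot vectors, the toy matrices, and the helper names `score` and `symmetrize` are illustrative assumptions, not either model from the post.

```python
def score(A, u, v):
    """Bilinear score u^T A v for a square matrix A given as nested lists."""
    n = len(A)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


def symmetrize(A):
    """Return (A + A^T) / 2, which is symmetric for any square A."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]


# Hypothetical toy inputs: two of three people, encoded as one-hot vectors.
x = [1.0, 0.0, 0.0]
y = [0.0, 1.0, 0.0]

# An unconstrained matrix may be asymmetric, so argument order matters.
A_free = [[0.0, 1.0, 0.0],
          [0.2, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
gap_free = abs(score(A_free, x, y) - score(A_free, y, x))

# A "close enough" matrix: nearly symmetric, so the gap is small but nonzero.
A_near = [[0.0, 1.0, 0.0],
          [0.999, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
gap_near = abs(score(A_near, x, y) - score(A_near, y, x))

# An exactly symmetric matrix: swapping the arguments changes nothing.
A_sym = symmetrize(A_free)
gap_sym = abs(score(A_sym, x, y) - score(A_sym, y, x))

print(gap_free, gap_near, gap_sym)
```

The gap shrinks as the matrix approaches symmetry and vanishes when $A = A^{T}$ holds exactly, which is the sense of “close enough” used here.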
- Cleo Nardo 26 Mar 2023 12:06 UTC
  3 points
  1
  Yep, exactly!
  Two things to note:
  (1) The distinction between hinge beliefs and free beliefs does not supervene on the black-box behaviour of NNs/LLMs. It depends on how the belief is implemented, how the belief is learned, how the belief might change, etc.
  (2) The sentence “The second model uses a matrix that will always be symmetric, no matter what it’s learned.” might make it seem that the two models are more similar than they actually are.
  You might think that both models store an $n \times n$ matrix $A$, that both compute $x^{T} A y$, and that the only difference is that Model 1 has an approximately symmetric $A$ whereas Model 2 has an exactly symmetric $A$. But this isn’t true. The second model doesn’t store a symmetric matrix; it stores an upper triangle.
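  To make the contrast concrete, here is one possible way to write the two models down. The parameterization of Model 2 is an assumption for illustration (stored entries $u_{ij}$ for $i \le j$, diagonal included); the thread does not spell it out.

  $$f_1(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i \, A_{ij} \, y_j, \qquad f_2(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i \, u_{\min(i,j),\,\max(i,j)} \, y_j$$

  Under that assumption Model 1 has $n^2$ free parameters and Model 2 has $n(n+1)/2$. Swapping $x$ and $y$ in $f_2$ only relabels the summation indices, so $f_2(x, y) = f_2(y, x)$ for every value of the stored parameters. By contrast, $f_1(x, y) = f_1(y, x)$ for all inputs only if training happens to drive every $A_{ij} - A_{ji}$ to zero.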