Can you share the hyperparameters used to make this figure?
Ah, never mind, I believe I found the relevant hyperparameters here: https://github.com/adamimos/epsilon-transformers/blob/main/examples/msp_analysis.ipynb
In particular, what I needed to know is that the model has only a single attention head per layer, and 4 layers.
Actually, I would still really appreciate the training hyperparameters, like batch size, learning rate schedule...
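In the meantime, here is a minimal sketch of how I'd reconstruct the model setup, assuming a TransformerLens-style config. To be clear about what's grounded: only `n_layers=4` and `n_heads=1` are confirmed by the notebook above; every other value, and in particular all of the training hyperparameters at the end, is a placeholder assumption on my part.

```python
# Minimal sketch of the model setup. Only n_layers=4 and n_heads=1 come
# from the linked notebook; all other values are placeholder assumptions.
import torch
from transformer_lens import HookedTransformer, HookedTransformerConfig

cfg = HookedTransformerConfig(
    n_layers=4,     # confirmed: 4 layers
    n_heads=1,      # confirmed: a single attention head per layer
    d_model=64,     # assumption: small residual stream width
    d_head=64,      # assumption: d_model / n_heads
    d_mlp=256,      # assumption: 4 * d_model
    n_ctx=10,       # assumption: short context for a small HMM-generated process
    d_vocab=3,      # assumption: e.g. a 3-symbol emission alphabet
    act_fn="relu",  # assumption
)
model = HookedTransformer(cfg)

# The training hyperparameters below are pure placeholders -- these are
# exactly the values (batch size, learning rate schedule, ...) I'm asking about.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batch_size = 64  # placeholder
```

The optimizer choice, learning rate, and batch size at the end are exactly the unknowns I'm asking about, so treat them as stand-ins rather than the authors' settings.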
A simple suggestion on word usage: change “belief state” to “interpretive state.” This would align your comments better with disciplines more concerned with behavior than cognition. JL Tropea.
I think you may have meant this as a top-level comment rather than a reply to my comment?