Hey, really enjoyed this post, thanks! Did you consider using a binary codebook, i.e. a set of vectors [b_0, …, b_k] where each b_i is binary? This gives the latent space more structure and may endow each dimension of the codes with its own meaning, so we could interpret individual dimensions rather than full codes — much like how SAE latents are interpreted. You note in the post:
There’s notoriously a lot of tricks involved in training a VQ-VAE. For instance:
Do you think this would also make a binary version intrinsically hard to train? In some toy experiments on synthetic data I'm finding the codebook underutilised. (I've since realised FSQ may solve this problem.)
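To be concrete about what I mean by a binary codebook: if each latent dimension is thresholded independently, the codebook is implicitly {0, 1}^d without storing any codebook vectors, which is just FSQ with two levels per dimension. A minimal NumPy sketch (function name is mine; a real training setup would pair this with a straight-through estimator, passing gradients through the threshold as if it were the identity):

```python
import numpy as np

def binary_quantize(z):
    """Threshold each latent dimension at 0 to produce a binary code.

    The implicit codebook is {0, 1}^d, so no codebook vectors are
    stored or learned — equivalent to FSQ with L=2 per dimension.
    """
    return (z > 0).astype(np.float32)

# Each dimension of the code can then be read as its own binary feature.
z = np.array([[-0.3, 1.2, 0.05, -2.0]])
print(binary_quantize(z))  # → [[0. 1. 1. 0.]]
```

The appeal is that codebook collapse can't happen in the usual sense, since there is no learned codebook to underutilise — only the encoder's output distribution matters.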