Nicky Pochinkov
NickyP
Confusing the metric for the meaning: Perhaps correlated attributes are “natural”
I wonder how many of these orthogonal vectors are “actually orthogonal” in effect, once we consider that we are adding two vectors together (the steering vector and the existing activation), and that the model has things like LayerNorm.
If one conditions on downstream mid-layer activations being “sufficiently different”, it seems possible one would find something like 10x degeneracy in the actual effects these vectors have on the model. (A possibly relevant factor: how large is the original activation vector compared to the steering vector?)
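A rough sketch (not from the original discussion) of how one might test this, using GPT-2 via HuggingFace transformers as a stand-in; the layer indices and steering scale are arbitrary placeholders, and each block’s output is assumed to be a tuple whose first element is the hidden states:

```python
# Add two (near-)orthogonal steering vectors at one layer and compare how
# different the downstream mid-layer activations actually end up.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")

STEER_LAYER, READ_LAYER, SCALE = 4, 8, 8.0   # placeholder choices
d_model = model.config.n_embd

v1 = torch.randn(d_model)
v2 = torch.randn(d_model)
v2 -= (v2 @ v1) / (v1 @ v1) * v1             # make v2 orthogonal to v1
captured = {}

def make_steer_hook(v):
    def hook(module, inp, out):
        # record the size of the residual stream where the vector is added
        captured["steer_site_norm"] = out[0].norm(dim=-1).mean().item()
        return (out[0] + v.to(out[0].dtype),) + out[1:]
    return hook

def read_hook(module, inp, out):
    captured["act"] = out[0].detach().clone()

def downstream_act(v):
    h1 = model.transformer.h[STEER_LAYER].register_forward_hook(make_steer_hook(v))
    h2 = model.transformer.h[READ_LAYER].register_forward_hook(read_hook)
    with torch.no_grad():
        model(**tok("The quick brown fox jumps over the lazy dog", return_tensors="pt"))
    h1.remove(); h2.remove()
    return captured["act"]

base = downstream_act(torch.zeros(d_model))
d1 = downstream_act(SCALE * v1 / v1.norm()) - base
d2 = downstream_act(SCALE * v2 / v2.norm()) - base

cos = torch.nn.functional.cosine_similarity(d1.flatten(), d2.flatten(), dim=0)
print(f"cosine similarity of downstream effects: {float(cos):.3f}")
print(f"activation norm at steering site vs steering norm: "
      f"{captured['steer_site_norm']:.1f} vs {SCALE}")
```

If the downstream deltas stay highly similar across many nominally orthogonal choices of the second vector, that would be evidence for the kind of degeneracy described above.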
Comparing Quantized Performance in Llama Models
I think there are already some papers doing similar work, though usually sold as reducing inference costs. For example, the MoEfication paper and Contextual Sparsity paper could probably be modified for this purpose.
Sorry! I have fixed this now
In case anyone finds it difficult to go through all the projects, I have made a longer post where each project title is followed by a brief description, and a list of the main skills/roles they are looking for.
See here: https://www.lesswrong.com/posts/npkvZG67hRvBneoQ9
AISC 2024 - Project Summaries
AISC Project: Modelling Trajectories of Language Models
Cadenza Labs has some video explainers on interpretability-related concepts: https://www.youtube.com/@CadenzaLabs
For example, an intro to Causal Scrubbing:
Machine Unlearning Evaluations as Interpretability Benchmarks
Ideation and Trajectory Modelling in Language Models
Seems to work fine for me, but here are the links to Market One, Market Two and Market Three from the post. (They show the % of customer funds expected to be returned: 46%, 43% and 42% at the time of this comment.)
Maybe I’m not fully understanding, but one issue I see is that without requiring “perfect prediction”, one could potentially Goodhart on the proposal. I could imagine something like:
In training GPT-5, add a term that upweights very basic bigram statistics. In “evaluation”, use your bigram statistics table to “predict” most top-k outputs just well enough to pass.
This would probably have a negative impact on performance, but it could possibly be tuned to be just sufficient to pass. Alternatively, one could use a toy model trained on the side that is easy to understand, and regularise the predictions towards that toy model instead of exact bigram statistics: just enough to pass the test, while still only understanding the toy model.
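To make the worry concrete, here is a minimal, hypothetical sketch of such a term: a KL penalty pulling the model’s next-token distribution toward a precomputed bigram table, with `lam` playing the role of “tuned just enough to pass”. The vocabulary size, table, and names are placeholders, not from any actual training setup.

```python
import torch
import torch.nn.functional as F

vocab, lam = 1_000, 0.1                       # small placeholder vocab; GPT-scale is ~50k
bigram_table = torch.rand(vocab, vocab)       # stand-in for P(next | current) counts
bigram_table /= bigram_table.sum(-1, keepdim=True)

def loss_with_bigram_regularizer(logits, input_ids, labels):
    # logits: (batch, seq, vocab); input_ids, labels: (batch, seq)
    ce = F.cross_entropy(logits.reshape(-1, vocab), labels.reshape(-1))
    bigram_target = bigram_table[input_ids]   # (batch, seq, vocab)
    kl = F.kl_div(F.log_softmax(logits, dim=-1), bigram_target, reduction="batchmean")
    return ce + lam * kl                      # lam tuned "just enough to pass"

# toy usage with random tensors
logits = torch.randn(2, 16, vocab)
input_ids = torch.randint(vocab, (2, 16))
labels = torch.randint(vocab, (2, 16))
print(loss_with_bigram_regularizer(logits, input_ids, labels))
```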
LLM Modularity: The Separability of Capabilities in Large Language Models
While I think this is important, and will probably edit the post, I think even in the unembedding, when getting the logits, the behaviour cares more about direction than distance.
When I think of distance, I implicitly think Euclidean distance:
$$d(x, w) = \sqrt{\textstyle\sum_i (x_i - w_i)^2}$$
But the actual “distance” used for calculating logits looks like this:
$$\text{logit}(x, w) = x \cdot w = \textstyle\sum_i x_i w_i$$
Which is a lot more similar to cosine similarity:
$$\cos(x, w) = \frac{x \cdot w}{\|x\|\,\|w\|}$$
I think that because the metric is so similar to the cosine similarity, it makes more sense to think in terms of sizes and directions instead of distances and points.
This is true. I think that visualising points on a (hyper-)sphere is fine, but it is difficult in practice to parametrise the points that way.
It is more that the vectors on the GPU look like lists of coordinates, $(x_1, x_2, \ldots, x_n)$, but the vectors in the model are treated more like a magnitude and a direction, $\|x\| \cdot \hat{x}$.
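A small numeric illustration of this (random placeholder weights, standard transformer shapes): the logit for token $i$ is the dot product with the unembedding row, i.e. the product of the two norms and the cosine between them, and the Euclidean distance between the two vectors plays no role.

```python
import torch

d_model, vocab = 768, 50_257
W_U = torch.randn(vocab, d_model)      # unembedding matrix (random stand-in)
x = torch.randn(d_model)               # final residual-stream vector

logits = W_U @ x                       # what the model actually computes
i = int(logits.argmax())

dot = W_U[i] @ x
cos = dot / (W_U[i].norm() * x.norm())
print(float(dot), float(W_U[i].norm() * x.norm() * cos))   # equal: logit = sizes * cosine
print(float(torch.dist(W_U[i], x)))                        # Euclidean distance: unused by logits
```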
LLM Basics: Embedding Spaces—Transformer Token Vectors Are Not Points in Space
Thanks for this comment! I think this is one of the main concerns I am pointing at.
I think something like fiscal aid could work, but have people tried modelling responses to things like this? It feels like with COVID the relatively decent response was because the government was both enforcing a temporary lockdown policy and sending checks to keep things “back to normal” despite it. If job automation is more gradual, on the scale of months to years, and specific only to certain jobs at a time, the response could be quite different, and it might be more likely that things end up poorly.
Yeah, though I think it depends on how many people are able to buy the new goods at a better price. If most well-paid employees (i.e. the employees that companies get the most value from automating) no longer have a job, then the number of people who can buy the more expensive goods and services might go down. It seems counter-intuitive to me that GDP would keep growing if the number of people who lost their jobs is high enough. It feels possible that the recent tech developments were only barely net positive for nominal GDP despite rapid improvements, and that fast enough technological progress could push nominal GDP in the other direction.
The unlearning results seem promising!
The author’s results from unlearning MMLU seem slightly rushed but moderately promising (I previously wrote a paper trying similar things; making good comparisons here is difficult). The results from unlearning different coding languages, however, seem very strong compared to my previous attempt: the model seems to be substantially more monosemantic.
I agree with your suspicion that the Gemma SAE performance was poor because of using reconstructed activations; this matches the drop in performance I got when I tried doing this (a rough sketch of the swap is below).
It would be interesting to see whether, e.g., steering performance from MONET expert directions is also comparable to that of SAEs. Using SAEs in practice is quite costly, so I would prefer an approach more similar to MONET.
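A minimal sketch of the reconstruction swap mentioned above, assuming a HuggingFace-style block whose output is a tuple and an SAE object exposing `encode`/`decode`; the layer index and `run_eval` are placeholders rather than anything from the post:

```python
import torch

def patch_with_reconstruction(block, sae):
    """Replace the block's output hidden states with their SAE reconstruction."""
    def hook(module, inp, out):
        hidden = out[0]
        recon = sae.decode(sae.encode(hidden))   # assumed SAE interface
        return (recon,) + out[1:]
    return block.register_forward_hook(hook)

# Example usage (placeholders):
# handle = patch_with_reconstruction(model.transformer.h[12], sae)
# acc_patched = run_eval(model)        # e.g. MMLU or a coding benchmark
# handle.remove()
# print(acc_clean - acc_patched)       # size of the reconstruction gap
```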