Just to check: in the toy scenario, we assume the features in R^n are the coordinates in the standard basis, so we have n features X_1, …, X_n.
Yes, that’s correct.
Separately, do you have intuition for why they allow the network to learn b as well? Why not just set b to zero?
My understanding is that the bias is thought to be useful for two reasons:
1. It lets the model output a non-zero value (namely the expected value) for features it chooses not to represent.
2. A negative bias lets the model zero out small interference: it shifts the pre-activation negative so that the ReLU outputs zero. Empirically, when these toy models exhibit a lot of superposition, the bias vector typically has many negative entries.
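To make the second point concrete, here is a minimal numpy sketch of the toy model's reconstruction ReLU(W^T W x + b). This is my own illustration, not code from the paper, and the specific values of W and b are made up: two features share a single hidden direction, and a small negative bias pushes the interference term below zero so the ReLU clips it away while the represented feature survives.

```python
import numpy as np

# Toy reconstruction x_hat = ReLU(W^T W x + b) with n = 2 features squeezed
# into m = 1 hidden dimension. W and b below are hypothetical values chosen
# only to illustrate the interference-suppression effect of a negative bias.
W = np.array([[0.95, 0.30]])   # shape (1, 2): both features share the one hidden direction
b = np.array([-0.30, -0.30])   # small negative bias on each output feature

def reconstruct(x):
    pre_activation = W.T @ (W @ x) + b      # W^T W x + b
    return np.maximum(pre_activation, 0.0)  # ReLU

x = np.array([1.0, 0.0])   # only feature 1 is active
print(reconstruct(x))      # -> [0.6025, 0.0]
# Feature 1 is recovered (0.95^2 - 0.30 = 0.6025), while the interference that
# feature 1 causes on feature 2 (0.30 * 0.95 = 0.285) is pushed below zero by
# the bias and clipped to zero by the ReLU. Without the bias, the output for
# feature 2 would be a spurious 0.285.
```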
Thanks for the thoughts --
I used the term “importance” since this was the term used in Anthropic’s original paper. I agree that (unlike in a real model) my toy scenario doesn’t contain sufficient information to deduce the context from the input data.
I like your phrasing of the task—it does a great job of concisely highlighting the ‘Mathematical Intuition for why Conditional Importance “doesn’t matter”’
Interesting that the experiment was helpful for you!