I would really like to be able to submit my own explanations even if they can’t be judged right away. Maybe to save costs, you could only score explanations after they’ve been voted highly by users.
Additionally, it seems clear that a lot of these neurons are polysemantic, and it would be cool if there was a way to indicate the meanings separately. As a first thought, maybe something like using | to separate them, e.g. “the letter c in the middle of a word | names of towns near Berlin”.
That’s a good idea. I think I could make a “drafts” explanation list so you can queue explanations up for later. Unfortunately, since the website just launched, there is not yet a reasonable threshold for “voted highly”, because most explanations have no votes or very few. But this is a good workaround for when the site is a bit older.
Re: multiple meanings, this is interesting. I need to experiment with this more, but I don’t think you need any special syntax. If you write “the letter c in a word or names of towns near Berlin”, it should give you a score based on both of those. There is a related question: should these neurons have two highly-voted/rated explanations, or one highly-voted/rated explanation that covers both meanings? I’ll put that on the TODO as an open question.
EDIT: after thinking about this a bit more:
PRO of multiple separate explanations: if a neuron has 4-5 different meanings, a single combined explanation can get unwieldy quickly (and users might then submit one that is identical except for swapping the order of the OR’d parts).
CON of multiple separate explanations: we probably need ranked-choice voting or multi-selection at some point… will put this on the TODO.
Btw—would love to have you in the discord to stay updated and provide additional feedback for Neuronpedia! This is super helpful.
Thanks for the drafts feature!
Yeah, it’s a tricky situation. It may even be worth using a model trained to avoid polysemanticity.
I also think it would make the game both more fun and more useful if you switched to a model like the TinyStories one, which is much smaller and trained on a more focused dataset.
I may join the discord, but the invite on the website is currently expired, fyi.
re: polysemanticity- have a big tweak to the game coming up that may help with this! i hope to get it out by early next week.
lol thanks. i can’t believe the link has been broken for so long on the site. it should be fixed in a few seconds from now. in the meantime if you’re interested: https://discord.gg/kpEJWgvdAx