Johnny Lin comments on Neuronpedia

Johnny Lin Jul 26, 2023, 6:48 PM
6 points
Thanks so much for the feedback! Inline below:
Conceptual Feedback:
- I think it would be better if I could see two explanations and vote on which one I like better (when available).
  - When there are multiple explanations, Neuronpedia does display them.
  - However, I’ve considered a different game mode where all you do is choose between This Vs That (no skipping, no new explanations). That may be a cool possibility!
- Attention heads are where a lot of the interesting stuff is happening, and need lots of interpretation work. Hopefully this sort of approach can be extended to that case.
  - Will put it on the TODO
- The three explanation limit kicked in just as I was starting to get into it. Hopefully you can get funding to allow for more, but in the meantime I would have budgeted my explanations more carefully if I had known this.
  - Sorry, the limit is daily, so you can come back tomorrow. It currently costs $0.24 to do one explanation score.
  - Good idea re: showing limit on number of explanations somehow.
- I don’t feel like I should get a point for skipping, it makes the points feel meaningless.
  - Yeah, I struggled with this a bit. But I didn’t want to incentivize people to vote for a bad explanation. E.g., if you only get a point for voting, then you’re more inclined (even subconsciously) to vote.
  - I’m open to being wrong on this. I’m not a game mechanics expert and happy to change it.
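On the daily limit discussed above, here is a rough back-of-the-envelope sketch of how the $0.24 per-score cost adds up. The $0.24 figure and the three-per-day limit come from this thread; the user counts, the budget, and the function names are hypothetical and only there for illustration.

```python
# Back-of-the-envelope scoring cost. Amounts are in cents to avoid float rounding.
COST_PER_SCORE_CENTS = 24   # from the thread: $0.24 per explanation score
DAILY_LIMIT_PER_USER = 3    # from the thread: three explanations per day


def worst_case_daily_cost_cents(active_users: int,
                                limit_per_user: int = DAILY_LIMIT_PER_USER,
                                cost_per_score: int = COST_PER_SCORE_CENTS) -> int:
    """Daily cost if every active user uses their full quota (an assumption)."""
    return active_users * limit_per_user * cost_per_score


def affordable_limit(daily_budget_cents: int,
                     active_users: int,
                     cost_per_score: int = COST_PER_SCORE_CENTS) -> int:
    """Largest per-user daily limit that fits a given (hypothetical) budget."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return daily_budget_cents // (active_users * cost_per_score)


if __name__ == "__main__":
    # Hypothetical numbers: 100 active users, $100/day budget.
    print(worst_case_daily_cost_cents(100) / 100, "USD per day at the current limit")
    print(affordable_limit(10_000, 100), "explanations per user per day within budget")
```

Under these assumed numbers, the current limit is already most of what a $100/day budget would allow, which is roughly why the limit can't be raised without more funding.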
UX Feedback:
- I didn’t realize that clicking on the previous explanation would cast a vote and take me to the next question. I wanted to go back but I didn’t see a way to do that.
  - Great suggestion. Will add it to TODO.
- After submitting a new explanation and seeing that I didn’t beat the high score, I wanted to try submitting a better explanation, but it glitched out and skipped to the next question.
  - Hmm I’ll try to repro this. Thanks for reporting.
- I would like to know whether the explanation shown was the GPT-4 created one, or submitted by a user.
  - If you click “Simple” at the top right to toggle to Advanced Mode, it will show you the author and score of the explanations being shown.
- The blue area at the bottom takes up too much space at the expense of the main area (with the text samples).
  - Yes, I haven’t had time to optimize this. Currently it has that space because it will fit 3 explanations, and I wanted the UI to stay more static (and not “jump around”) based on the number of explanations. But you are right that this is annoying wasted space most of the time.
- It would be nice to be able to navigate to adjacent or related neurons from the neuron’s page.
  - Good idea. Added to TODO.
- Adele Lopez Jul 26, 2023, 9:47 PM
  6 points
  I would really like to be able to submit my own explanations even if they can’t be judged right away. Maybe to save costs, you could only score explanations after they’ve been voted highly by users.
  
  Additionally, it seems clear that a lot of these neurons have polysemanticity, and it would be cool if there was a way to indicate the meanings separately. As a first thought, maybe something like using | to separate them, e.g. "the letter c in the middle of a word | names of towns near Berlin".
  - Johnny Lin Jul 26, 2023, 11:27 PM
    6 points
    That’s a good idea. I think maybe I could make a “drafts” explanation list so you can queue it up for later. Unfortunately, since the website just launched, there is not yet a reasonable threshold for “voted highly”, since most explanations have no votes or very few. But this is a good workaround for when the site is a bit older.
    Re: multiple meanings: this is interesting. I need to experiment with this more, but I don’t think you need to use any special syntax. By writing “the letter c in a word or names of towns near Berlin”, it should give you a score based on both of those. There is a related question: should these neurons have two highly-voted/rated explanations, or one highly-voted/rated explanation that covers both meanings? I’ll put that on the TODO as an open question.
    EDIT: after thinking about this a bit more:
    PRO of multiple separate explanations: if a neuron has 4-5 different meanings, a single combined explanation can get unwieldy quickly (and then users might submit one that is identical except for swapping the order of the OR clauses).
    CON of multiple separate explanations: we probably need ranked choice voting or multi-selection at some point… will put this on the TODO.
    Btw—would love to have you in the discord to stay updated and provide additional feedback for Neuronpedia! This is super helpful.
    - Adele Lopez Jul 29, 2023, 8:09 AM
      2 points
      Thanks for the drafts feature!
      
      Yeah, it’s a tricky situation. It may even be worth using a model trained to avoid polysemanticity.
      
      I also think it would make the game both more fun and more useful if you switched to a model like the TinyStories one, where it’s much smaller and trained on a more focused dataset.
      
      I may join the discord, but the invite on the website is expired currently fyi.
      - Johnny Lin Jul 29, 2023, 8:52 AM
        3 points
        re: polysemanticity- have a big tweak to the game coming up that may help with this! i hope to get it out by early next week.
      - Johnny Lin Jul 29, 2023, 8:44 AM
        3 points
        lol thanks. i can’t believe the link has been broken for so long on the site. it should be fixed in a few seconds from now. in the meantime if you’re interested: https://discord.gg/kpEJWgvdAx