Johnny Lin

Karma: 466

working on neuronpedia

Johnny Lin Jul 27, 2023, 5:47 PM
2 points
in reply to: Hoagy’s comment on: Neuronpedia—AI Safety Game
Thank you Hoagy. Expanding beyond the neuron unit is a high priority. I’d like to work with you, Logan Riggs, and others to figure out a good way to make this happen in the next major update so that people can easily view, test, and contribute. I’m now creating a new channel on the discord (#directions) to discuss this: https://discord.gg/kpEJWgvdAx, or I’ll DM you my email if you prefer that.

Johnny Lin Jul 27, 2023, 5:33 PM
3 points
in reply to: Logan Riggs’s comment on: Neuronpedia—AI Safety Game
Hi Logan—thanks for your response. Your dictionaries post is on the TODO to investigate and integrate (someone had referred me to it two weeks ago) - I’d love to make it happen.
Thanks for joining the Discord. Let’s discuss when I have a few days to get caught up with your work.

Johnny Lin Jul 27, 2023, 4:59 PM
2 points
in reply to: Neel Nanda’s comment on: Neuronpedia—AI Safety Game
Hi Neel, thanks for playing and thanks for all your incredible work. Neuronpedia uses a ton of your stuff.
Re: polysemantic neurons—yes, I should address this before wider distribution. Some current ideas—if you have a preference please let me know.
1. Your proposed “this is a mess” button
2. Allow voting on more than one option at a time (users can do multiple votes for explanations per neuron on the neuron’s page, but the game automatically moves on to a new neuron after one vote to keep it more “game-like”)
3. Encourage “or” explanations: “cat or tomato or purple”
4. Add a warning for users
Re: Open Web Text tokenizer bug—thank you! this will make the text more legible. i’ll make the display change and footnote.
EDIT: the double unknown chars should now show up as one apostrophe. AFAIK there is no way to tell the difference between a single quote unknown char and a double quote unknown char, but it’s probably ok to just show it as single quote for now
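A minimal sketch of the display change described in the EDIT above, assuming the "unknown char" is the Unicode replacement character U+FFFD and that the affected quotes always arrive doubled. The function name is hypothetical; this is not Neuronpedia's actual code.

```python
REPLACEMENT_CHAR = "\ufffd"  # assumption: the "unknown char" is U+FFFD


def clean_unknown_quotes(text: str) -> str:
    """Show each doubled unknown character as a single apostrophe."""
    return text.replace(REPLACEMENT_CHAR * 2, "'")


print(clean_unknown_quotes("don\ufffd\ufffdt"))  # don't
```

A lone unknown character is left untouched here, since the comment only mentions the doubled case.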

Johnny Lin Jul 27, 2023, 4:52 PM
3 points
in reply to: Pazzaz’s comment on: Neuronpedia—AI Safety Game
Sorry—New Discord link (changed to a “Community Server”) https://discord.gg/kpEJWgvdAx

Johnny Lin Jul 27, 2023, 4:52 PM
2 points
in reply to: Harry Nyquist’s comment on: Neuronpedia—AI Safety Game
Sorry—New Discord link (changed to a “Community Server”) https://discord.gg/kpEJWgvdAx

Johnny Lin Jul 27, 2023, 7:07 AM
1 point
in reply to: mako yass’s comment on: Neuronpedia—AI Safety Game
Should be working now.
Also, thank you for the feedback re: janky tutorial/signin. I will fix that. It is truly a terrible way to have a first experience with a product.
EDIT: the tutorial → sign in friction has been updated.

Johnny Lin Jul 27, 2023, 6:41 AM
5 points
in reply to: mako yass’s comment on: Neuronpedia—AI Safety Game
hey mako—sorry about the issues. i’m looking into it right now. will update asap
edit: looks like the EC2 instance hard crashed. i can’t even restart it from AWS console. i am starting up a new instance with more RAM.
edit2: confirmed via syslog (after taking a long time to restart the old server) it was OOM. new machine has 8x more ram. added monitoring and will investigate potential memory leaks tomorrow

Johnny Lin Jul 27, 2023, 2:33 AM
6 points
in reply to: duck_master’s comment on: Neuronpedia—AI Safety Game
Hi duck_master, thank you for playing and appreciate the tip. Maybe it’s worth compiling these tips and putting it under a “tips” popup/page on the main site. Also—please consider joining the Discord if you’re willing to offer more feedback and suggestions: https://discord.gg/kpEJWgvdAx
Apologies for the limit. It currently costs ~$0.24 to do each explanation score and it’s coming from my personal funds, so I’m capping it daily until I can hopefully get approved for a grant. A few hours ago I raised the limit from 3 new explanations per day to 10 new explanations per day.
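For illustration, a daily cap like the one described above could be sketched as follows. The per-explanation cost and the limit of 10 come from the comment; the class, the per-user keying, and the in-memory storage are assumptions made for this example, not the site's implementation.

```python
from collections import defaultdict
from datetime import date

COST_CENTS_PER_SCORE = 24  # ~$0.24 per explanation score (from the comment)
DAILY_LIMIT = 10           # raised from 3 to 10 new explanations per day


class DailyCap:
    """Hypothetical per-user, per-day counter for new explanations."""

    def __init__(self, limit: int = DAILY_LIMIT) -> None:
        self.limit = limit
        self._counts: dict[tuple[str, date], int] = defaultdict(int)

    def try_submit(self, user: str, today: date) -> bool:
        """Return True and count the submission if the user is under the cap."""
        key = (user, today)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


def max_daily_cost_cents(limit: int = DAILY_LIMIT) -> int:
    """Worst-case scoring cost per user per day, in cents."""
    return limit * COST_CENTS_PER_SCORE
```

Under these assumptions, a user who hits the limit of 10 costs at most 240 cents ($2.40) per day, versus 72 cents at the earlier limit of 3.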

Johnny Lin Jul 27, 2023, 2:27 AM
4 points
in reply to: Nathaniel Monson’s comment on: Neuronpedia—AI Safety Game
Hi Nathan, thanks for playing and pointing out the issue. My apologies for the inappropriate text.
Half the text samples are from Open Web Text, which is scraped web data that GPT2 was trained on. I don’t know the exact details, but I believe some of it was reddit and other places.
If you DM me the neuron’s address the next time you see such text, I can start compiling a filter. I will also try to look for an open source library to categorize text into safe and not safe.
My apologies again. This is a beta experiment, thanks for putting up with this while I fix the issues.

Johnny Lin Jul 27, 2023, 12:43 AM
2 points
in reply to: TinkerBird’s comment on: Neuronpedia—AI Safety Game
Thank you TinkerBird. I hope so too!

Johnny Lin Jul 26, 2023, 11:35 PM
6 points
in reply to: JenniferRM’s comment on: Neuronpedia—AI Safety Game
Hi Jennifer,
Thanks for participating. My apologies for only having GitHub login at the moment. Please feel free to create a throwaway GitHub account if you’d still like to play (I think GitHub allows you to use disposable emails to sign up; I had no problem creating an account using an iCloud disposable email). Email/password login is definitely on the TODO.

Johnny Lin Jul 26, 2023, 11:27 PM
6 points
in reply to: Adele Lopez’s comment on: Neuronpedia—AI Safety Game
That’s a good idea. I think maybe I could make a “drafts” explanation list so you can queue it up for later. Unfortunately, since the website just launched, there is not yet a reasonable threshold for “voted highly” since most explanations have no votes or very few. But this is a good workaround for when the site is a bit older.
Re: multiple meanings, this is interesting. I need to experiment with this more, but I don’t think you need to use any special syntax. By writing “the letter c in a word or names of towns near Berlin”, it should give you a score based on both of those. There is a related question: should these neurons have two highly-voted/rated explanations, or one highly-voted/rated explanation that has both explanations? I’ll put that on the TODO as an open question.
EDIT: after thinking about this a bit more:
PRO of multiple separate explanations: if a neuron has 4-5 different meanings, a single combined explanation can get unwieldy quickly (and then users might submit one that is identical except for swapping the order of each OR explanation)
CON of multiple separate explanations: we probably need ranked choice voting or multi-selection at some point… will put this on the TODO.
Btw—would love to have you in the discord to stay updated and provide additional feedback for Neuronpedia! This is super helpful.

Johnny Lin Jul 26, 2023, 9:56 PM
5 points
in reply to: Martin Fell’s comment on: Neuronpedia—AI Safety Game
Hi Martin,
Thanks for playing! I agree there is some risk of confirmation bias, and the option to hide explanations by default is very interesting.
It is designed the way it is now because I’d prefer to avoid too many duplicate explanations. Currently, you can only submit explanations that are not exact duplicates, though you can submit explanations that are very similar, e.g., “banana” vs “bananas”.
The first downside of hiding explanations would be that duplicate explanations may clutter up the voting options. The second downside is that when someone looks at the two explanations later, the vote may be split between the two similar explanations, meaning a third explanation that is worse might actually win (e.g., “cherry” vs “banana(s)”).
HOWEVER, those are not insurmountable downsides. The server just has to have a better duplicate/similarity check (maybe even asking GPT4), such as checking for plurals, and if you explain similarly to an existing explanation, it just automatically upvotes that. I think it’s definitely worth experimenting. The similarity check would have to not be too loose, otherwise we may lose out on great explanations that appear to only be marginally different but actually score very differently.
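A minimal sketch of the plural-aware duplicate check and auto-upvote described above. The function names, the naive "strip one trailing s" rule, and starting new explanations at zero votes are all assumptions made for illustration; a production check might instead use stemming or a language model, as the comment suggests.

```python
def normalize(explanation: str) -> str:
    """Lowercase, collapse whitespace, and strip a naive plural 's' per word."""
    words = explanation.lower().split()
    # Assumption: drop one trailing "s" from words longer than 3 letters,
    # leaving words that end in "ss" (e.g. "glass") alone.
    return " ".join(
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    )


def submit(votes: dict[str, int], explanation: str) -> str:
    """Upvote a near-duplicate if one exists; otherwise add the new explanation."""
    key = normalize(explanation)
    for existing in votes:
        if normalize(existing) == key:
            votes[existing] += 1  # auto-upvote instead of creating a duplicate
            return existing
    votes[explanation] = 0  # assumption: a new explanation starts with no votes
    return explanation


votes = {"banana": 3, "cherry": 5}
print(submit(votes, "Bananas"))  # banana
print(votes["banana"])           # 4
```

The rule is deliberately strict: “cherries” would not match “cherry” here, which errs on the side of keeping explanations that may look similar but score differently.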
Please keep the feedback coming and join the discord if you’d like to keep updated.

Johnny Lin Jul 26, 2023, 9:42 PM
5 points
in reply to: AdamYedidia’s comment on: Neuronpedia—AI Safety Game
Hi Adam and thanks for your feedback / suggestion. Residual Viewer looks awesome. I have DMed you to chat more about it!

Johnny Lin Jul 26, 2023, 7:11 PM
3 points
in reply to: Chris_Leong’s comment on: Neuronpedia—AI Safety Game
Good idea. I haven’t done enough research on why some forums have upvotes only. I went with my instinct on this, but I should look into the pros/cons.

Johnny Lin Jul 26, 2023, 6:54 PM
4 points
in reply to: Chris_Leong’s comment on: Neuronpedia—AI Safety Game
EDIT: this update was pushed just now. it will warn you on your first vote to confirm that you want to vote.

Thanks for playing, Chris!
I’ll work on the voting thing. I’ll probably just add a “first-timer’s” warning on your first vote to ensure that you want to vote for that.
FYI—if you want to unvote, just go to your profile (neuronpedia.org/user/[username]), click the neuron you voted for, and click to unvote on the left side.

Johnny Lin Jul 26, 2023, 6:48 PM
6 points
in reply to: Adele Lopez’s comment on: Neuronpedia—AI Safety Game
Thanks so much for the feedback! Inline below:
Conceptual Feedback:
- I think it would be better if I could see two explanations and vote on which one I like better (when available).
  - When there are multiple explanations, Neuronpedia does display them.
  - However I’ve considered a different game mode where all you do is choose between This Vs That (no skipping, no new explanations). That may be a cool possibility!
- Attention heads are where a lot of the interesting stuff is happening, and need lots of interpretation work. Hopefully this sort of approach can be extended to that case.
  - Will put it on the TODO
- The three explanation limit kicked in just as I was starting to get into it. Hopefully you can get funding to allow for more, but in the meantime I would have budgeted my explanations more carefully if I had known this.
  - Sorry, the limit is daily, you can come back tomorrow. It currently costs $0.24 to do one explanation score.
  - Good idea re: showing limit on number of explanations somehow.
- I don’t feel like I should get a point for skipping, it makes the points feel meaningless.
  - Yeah I struggled with this a bit. But I didn’t want to incentivize people to vote for a bad explanation. E.g, if you only get a point for voting, then you’re more inclined (even subconsciously) to vote.
  - I’m open to being wrong on this. I’m not a game mechanics expert and happy to change it.
UX Feedback:
- I didn’t realize that clicking on the previous explanation would cast a vote and take me to the next question. I wanted to go back but I didn’t see a way to do that.
  - Great suggestion. Will add it to TODO.
- After submitting a new explanation and seeing that I didn’t beat the high score, I wanted to try submitting a better explanation, but it glitched out and skipped to the next question.
  - Hmm I’ll try to repro this. Thanks for reporting.
- I would like to know whether the explanation shown was the GPT-4 created one, or submitted by a user.
  - If you click “Simple” at the top right to toggle to Advanced Mode, it will show you the author and score of the explanations being shown.
- The blue area at the bottom takes up too much space at the expense of the main area (with the text samples).
  - Yes, I haven’t had time to optimize this. Currently it has that space because it will fit 3 explanations, and I wanted the UI to stay more static (and not “jump around”) based on the number of explanations. But you are right that this is annoying wasted space most of the time.
- It would be nice to be able to navigate to adjacent or related neurons from the neuron’s page.
  - Good idea. Added to TODO.

Neuronpedia

Johnny Lin Jul 26, 2023, 4:29 PM

135 points

51 comments, 2 min read, LW link

(neuronpedia.org)