Maybe in this case it’s a “confusion” shard? While it seems to plan and produce optimizing behavior, it’s not clear that it will behave as a utility maximizer.
Adrià Garriga-alonso
Crafting Polysemantic Transformer Benchmarks with Known Circuits
Thank you!! I agree it’s a really good mesa-optimizer candidate; it remains to be seen exactly how good. It’s a shame that I only found out about it about a year ago :)
Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Asking for an acquaintance. Suppose I know some graduate-level machine learning, have read ~most of the recent mechanistic interpretability literature, and have made good progress understanding a small-ish neural network in the last few months.
Is ARENA for me, or will it teach things I mostly already know?
(I advised this person that they are already at ARENA-graduate level, but I want to check in case I’m wrong.)
Compact Proofs of Model Performance via Mechanistic Interpretability
How did you feed the data into the model and get predictions? Was there a prompt and then you got the model’s answer? Then you got the logits from the API? What was the prompt?
Catastrophic Goodhart in RL with KL penalty
Thank you for working on this, Joseph!
Thank you! Could you please provide more context? I don’t know what ‘E’ you’re referring to.
An evaluation of circuit evaluation metrics
Ophiology (or, how the Mamba architecture works)
That’s a lot of things done, congratulations!
That’s very cool, maybe I should try to do that for important talks. Though I suppose you almost always have slides as an aid, so it may not be worth the time investment.
Maybe being a guslar is not so different from telling a joke 2294 lines long
That’s a very good point! I think the level of ability required is different but it seems right.
The guslar’s songs are (and of course already were in the 1930s to 1950s) also printed, so the analogy may be closer than you thought.
Is there a reason I should want to?
I don’t know, I can’t tell you that. If I had to choose I also strongly prefer literacy.
But I didn’t know there was a tradeoff there! I thought literacy was basically unambiguously positive—whereas now I think it is net highly positive.
Also I strongly agree with frontier64 that the skill that is lost is rough memorization + live composition, which is a little different.
It’s definitely not exact memorization, but it’s almost more impressive than that, it’s rough memorization + composition to fit the format.
They memorize the story, with particular names, and then sing it with consistent decasyllabic metre and rhyme. Here’s an example song transcribed with its recording: Ropstvo Janković Stojana (The Captivity of Janković Stojan)
the collection: https://mpc.chs.harvard.edu/lord-collection-1950-51/
I’m curious what you mean, but I don’t entirely understand. If you give me a text representation of the level I’ll run it! :) Or you can do so yourself
Here’s the text representation for level 53