Yep that’s right, thanks! Corrected.
huh interesting, I wasn’t aware of this, thanks for sending it!
Thanks for the suggestion! I’ve edited the first diagram to clarify things, is this what you had in mind?
The first week of WMLB / MLAB maps quite closely onto the first week of ARENA, with a few exceptions (ARENA includes PyTorch Lightning, plus some more meta stuff like typechecking, VSCode testing and debugging, using GPT in your workflow, etc). I’d say that starting some way through the second week would probably be most appropriate. If you didn’t want to repeat stuff on training / sampling from transformers, the mech interp material would start on Wednesday of the second week.
Resolved by private message, but I’m just mentioning this here for others who might be reading this—we didn’t have confirmation emails set up, but we expect to send out coding assessments to applicants tomorrow (Monday 24th April). For people who apply after this point, we’ll generally try to send out coding assessments no later than 24 hours after your application.
Yeah, I think this would be possible. In theory, you could do something like:
Study relevant parts of the week 0 material before the program starts (we might end up creating a virtual group to accommodate this, which would also contain people who either don’t get an offer or can’t attend but still want to study the material).
Join at the start of the 3rd week—at that point there will be 3 days left of the transformers chapter (which is 8 days long and has 4 days of core content), so you could study (most of) the core content and then transition to RL with the rest of the group (and there would be opportunities to return to the transformers & mech interp material during the bonus parts of later chapters / capstone projects, if you wanted).
How feasible this is would depend on your prereqs and past experience, I imagine. Either way, you’re definitely welcome to apply!
Not a direct answer, but this post has a ton of useful advice that I think would be applicable here: https://www.neelnanda.io/blog/mini-blog-post-19-on-systems-living-a-life-of-zero-willpower
Awesome, really glad to hear it was helpful, thanks for commenting!
Yep, fixed, thanks!
Or “prompting”? It seems short and memorable, it isn’t used in many other contexts so its meaning would become clear, and it fits in with other technical terms that people are currently using in news articles, e.g. “prompt engineering”. (Admittedly though, it might be a bit premature to guess what language people will use!)
This is awesome, I love it! Thanks for sharing (-:
Thank you :-)
Thanks, really appreciate it!
I think some of the responses here do a pretty good job of this. It’s not really what I intended to go into with my post since I was trying to keep it brief (although I agree this seems like it would be useful).
And yeah, despite a whole 16-lecture course on convex optimisation I still don’t really get Bregman divergences either; I skipped the exam questions on them 😆
Oh yeah, I hadn’t considered that one. I think it’s interesting, but the intuitions are better in the opposite direction, i.e. you can build on good intuitions for $D_{KL}$ to better understand MI. I’m not sure if you can easily get intuitions to point in the other direction (i.e. from MI to $D_{KL}$), because this particular expression has MI as an expectation over KL divergences, rather than the other way around. E.g. I don’t think this expression illuminates the asymmetry of $D_{KL}$.
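For anyone following along, the expression in question is (I believe) the standard decomposition of MI as an expectation of KL divergences:

$$I(X;Y) \;=\; \mathbb{E}_{x \sim P(X)}\Big[\,D_{KL}\big(P(Y \mid X = x)\,\big\|\,P(Y)\big)\Big]$$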
The way it’s written here seems more illuminating (not sure if that’s the one that you meant). This gets across the idea that:
$P(X,Y)$ is the true reality, and $P(X)P(Y)$ is our (possibly incorrect) model which assumes independence. The mutual information between $X$ and $Y$ equals $D_{KL}\big(P(X,Y)\,\big\|\,P(X)P(Y)\big)$, i.e. the extent to which modelling $X$ and $Y$ as independent (sharing no information) is a poor way of modelling the true state of affairs (where they do share information).
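Written out as an equation, that identity is:

$$I(X;Y) \;=\; D_{KL}\big(P(X,Y)\,\big\|\,P(X)P(Y)\big) \;=\; \sum_{x,y} P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}$$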
But again I think this intuition works better in the other direction, since it builds on intuitions for $D_{KL}$ to better explain MI. The arguments of the divergence in this expression aren’t arbitrary (i.e. we aren’t working with $D_{KL}(P\,\|\,Q)$ for arbitrary distributions $P$ and $Q$), which restricts the amount this can tell us about $D_{KL}$ in general.
Oh yeah, I really like this one, thanks! The intuition here is again that a unimodal distribution is a bad model for a bimodal one because it misses out on an entire class of events, but the other way around is much less bad because there’s no large class of events that happen in reality but that your model fails to represent.
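Here’s a minimal numerical sketch of that asymmetry (the specific Gaussians and grid are arbitrary choices), comparing both directions of KL between a bimodal reality and a unimodal model sitting on just one of its modes:

```python
import numpy as np
from scipy.stats import norm

# Reality p is bimodal (two well-separated Gaussian modes); the model q is
# unimodal, concentrated on just one of those modes.
x = np.linspace(-10, 10, 2001)
p = 0.5 * norm.pdf(x, loc=-4, scale=1) + 0.5 * norm.pdf(x, loc=4, scale=1)
q = norm.pdf(x, loc=4, scale=1)

# Normalise both to probability mass functions on the grid.
p /= p.sum()
q /= q.sum()

def kl(a, b, eps=1e-12):
    """Discrete KL divergence D_KL(a || b), with a small epsilon for stability."""
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))))

# Large: the unimodal model assigns essentially zero probability to one
# entire mode of reality.
print("KL(bimodal || unimodal):", kl(p, q))

# Much smaller (roughly log 2): everything the unimodal distribution produces
# is still reasonably well represented by the bimodal model.
print("KL(unimodal || bimodal):", kl(q, p))
```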
For people reading here, this post discusses this idea in more detail. The image to have in mind is this one:
Love that this exists! Looks like the material here will make great jumping-off points when learning more about any of these orgs, or discussing them with others.
Thanks Nihalm, also I wasn’t aware it was free! CraigMichael, maybe you didn’t find it because it’s under “Rationality: From AI to Zombies” rather than “Sequences”?
The narration is pretty good imo, although one disadvantage is that it’s a pain to navigate to specific posts because they aren’t titled (it’s the whole thing, not just the highlights).
Thanks, really appreciate it!