I’m leaving it to the moderators to keep the copies mirrored, or just accept that errors won’t be corrected on this copy. Hopefully there’s some automatic way to do that?
Oops, thanks, updated to fix this.
Thanks! I removed the link.
What will GPT-2030 look like?
Glad it was helpful!
Hi Alex,
Let me first acknowledge that your write-up is significantly more thorough than pretty much all content on LessWrong, and that I found the particular examples interesting. I also appreciated that you included a related work section in your write-up. The reason I commented on this post and not others is that it’s one of the few ML posts on LessWrong that seemed like it might teach me something, and I wish I had made that clearer before posting critical feedback (I was thinking of the feedback as directed at Oliver / Raemon’s moderation norms rather than at your work, but I realize in retrospect it probably felt directed at you).
I think the main point is that there is a body of related work in the ML literature that explores fairly similar ideas, that LessWrong readers who care about AI alignment should be aware of this work, and that most readers of the post won’t realize it exists. I think it’s good to point out Dan’s initial mistake, but I took his substantive point to be what I just summarized; it seems correct to me and hasn’t been addressed. (I also think Dan overfocused on Ludwig’s paper; see below for more of my take on related work.)
Here is how I currently see the paper situated in broader work (I think you do discuss the majority but not all of this):
* There is a lot of work studying activation vectors in computer vision models, and the methods here seem broadly similar to the methods there. This seems like the closest point of comparison.
* In language, there’s a bunch of work on controllable generation (https://arxiv.org/pdf/2201.05337.pdf) where I would be surprised if no one looked at modifying activations (at least I’d expect someone to try soft prompt tuning), but I don’t know for sure.
* On modifying activations in language models, there is a bunch of work on patching / swapping activations, and on modifying them in the directions of probes.
I think we would probably both agree that this is the main set of related papers, and also both agree that you cited work within each of these branches (except maybe the second one). Where we differ is that I see all of this as basically variations on the same idea of modifying the activations or weights to control a model’s runtime behavior:
* You need to find a direction, which you can do either by learning a direction or by simple averaging. Simple averaging is more or less the same as one step of gradient descent, so I see these as conceptually similar.
* You can modify the activations or weights. Usually if an idea works in one case it works in the other case, so I also see these as similar.
* The modality can be language or vision. Most prior work has been on vision models, but some of it has also been on vision-language models; e.g., I’m pretty sure there’s a paper on averaging together CLIP activations to get controllable generation. (A minimal sketch of this generic recipe, in the language-model setting, is included below.)

So I think it’s most accurate to say that you’ve adapted some well-explored ideas to a use case that you are particularly interested in. However, the post uses language like “Activation additions are a new way of interacting with LLMs”, which seems to claim that this is entirely new and unexplored, and I think that could mislead readers, as Thomas Kwa’s response, for instance, seems to suggest.
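To make that generic recipe concrete, here is a minimal, hedged sketch in the language-model setting: average activation differences over contrastive prompt pairs to get a direction, then add it to the residual stream at inference time via a forward hook. This is not the post’s actual implementation or the exact setup of any paper cited above; the model choice, layer index, scale, and prompt pairs are illustrative placeholders.

```python
# Minimal sketch (assumptions: GPT-2 via Hugging Face transformers; the layer
# index, scale, and prompt pairs below are hypothetical placeholders).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, SCALE = 6, 4.0  # placeholder choices; real experiments would sweep these

def last_token_acts(text):
    """Residual-stream activations at the output of block LAYER, last token."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden_states = model(ids, output_hidden_states=True).hidden_states
    return hidden_states[LAYER + 1][0, -1]

# "Find a direction by simple averaging": mean activation difference over
# contrastive prompt pairs (placeholder sentiment pairs).
pairs = [("I love this movie", "I hate this movie"),
         ("That was wonderful", "That was terrible")]
direction = torch.stack([last_token_acts(a) - last_token_acts(b)
                         for a, b in pairs]).mean(dim=0)

# "Modify the activations": add the scaled direction to the block's output
# (the residual stream) on every forward pass during generation.
def add_direction(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * direction  # broadcasts over batch and positions
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_direction)
prompt_ids = tok("The food was", return_tensors="pt").input_ids
out = model.generate(prompt_ids, max_new_tokens=20, do_sample=False,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
handle.remove()  # restore the unmodified model
```

Replacing the simple average with a single gradient step on the same pairs, or folding the direction into the weights, gives the nearby variants discussed in the bullets above, which is part of why they feel like variations on one idea.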
I also felt that Dan H brought up reasonable questions (e.g., why should we believe that weights vs. activations is a big deal? Why is fine-tuning vs. averaging important? Have you tried testing the difference empirically?) that haven’t been answered and that would be good to at least acknowledge more clearly. The fact that he was bringing up points that seemed good to me without them being directly engaged with was what most bothered me about the exchange above.
This is my best attempt to explain where I’m coming from in about an hour of work (spent e.g. reading through things and trying to articulate intuitions in LW-friendly terms). I don’t think it captures my full intuitions or the full reasons I bounced off the related work section, but hopefully it’s helpful.
I’ll just note that I, like Dan H, find it pretty hard to engage with this post because I can’t tell whether it’s basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn’t really help in this regard.
I’m not sure what you mean about whether the post was “missing something important”, but I do think you should be pretty worried about LessWrong’s collective epistemics when Dan H is the only one bringing this important point up, and when, rather than being rewarded for doing so or engaged with on his substantive point, he’s being nitpicked by a moderator. It’s not an accident that no one else is bringing these points up: it’s because everyone else with the expertise to do so has given up or judged it not worth their time, largely because of responses like the one Dan H is getting.
Complex Systems are Hard to Control
Principles for Productive Group Meetings
Emergent Deception and Emergent Optimization
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
Here is my take: since there’s so much AI content, it’s not really feasible to read all of it, so in practice I read almost none of it (and consequently visit LW less frequently).
The main issue I run into is that for most posts, on a brief skim it seems like basically a thing I have thought about before. Unlike academic papers, most LW posts do not cite previous related work nor explain how what they are talking about relates to this past work. As a result, if I start to skim a post and I think it’s talking about something I’ve seen before, I have no easy way of telling if they’re (1) aware of this fact and have something new to say, (2) aware of this fact but trying to provide a better exposition, or (3) unaware of this fact and reinventing the wheel. Since I can’t tell, I normally just bounce off.
I think a solution could be to have a stronger norm that posts about AI should say, and cite, what they are building on and how it relates / what is new. This would decrease the amount of content while improving its quality, and also make it easier to choose what to read. I view this as a win-win-win.
I think this might be an overstatement. It’s true that NSF tends not to fund developers, but in ML the NSF is only one of many funders (lots of faculty have grants from industry partnerships, for instance).
Thanks for writing this!
Regarding how surprise on current forecasts should factor into AI timelines, here are two takes I have:
* Given that all the forecasts seem to be wrong in the “things happened faster than we expected” direction, we should probably expect HLAI to happen faster than expected as well.
* It also seems like we should retreat more to outside views about general rates of technological progress, rather than forming a specific inside view (since the inside view seems to mostly end up being wrong).
I think a pure outside view would give a median of something like 35 years (based on my very sketchy attempt at forming a dataset of when technical grand challenges were solved), and then ML progress seems to be happening quite quickly, so you should probably adjust down from that.
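Purely to illustrate the arithmetic in the previous paragraph (not an actual forecast): start from an outside-view distribution with the ~35-year median mentioned above and shift it down to reflect fast ML progress. The lognormal shape, its spread, and the 0.7 discount are hypothetical placeholders.

```python
# Illustrative only: the 35-year median comes from the comment above; the
# lognormal spread (sigma) and the 0.7 "ML is moving fast" discount are
# hypothetical placeholders, not a real forecast.
import numpy as np

rng = np.random.default_rng(0)
outside_view_years = rng.lognormal(mean=np.log(35), sigma=0.8, size=100_000)
adjusted_years = 0.7 * outside_view_years  # crude downward adjustment

print(np.median(outside_view_years))  # ~35 by construction
print(np.median(adjusted_years))      # ~24: "adjust down from that"
```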
I’m actually pretty interested in how you get to medians of 40 years; that seems longer than I’d predict without looking at any field-specific facts about ML, and the field-specific facts mostly push towards shorter timelines.
Forecasting ML Benchmarks in 2023
AI Forecasting: One Year In
Thanks! I just read over it, and assuming I understood correctly, this bottleneck primarily happens for “small” operations like layer normalization and softmax, not for large matrix multiplications. In addition, these small operations are still the minority of runtime (40% in their case). So I think this is still consistent with my analysis, which assumes various things will creep in to keep GPU utilization around 40%, but that they won’t ever drive it to (say) 10%. Is this correct, or have I misunderstood the nature of the bottleneck?
Edit: also, maybe we’re just miscommunicating. I definitely don’t think CPU->HBM is a bottleneck; it’s instead the time to load from HBM, which sounds the same as what you said. Unless I misread the A100 specs, that comes out to 1.5 TB/s, which is the number I use throughout.
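For anyone who wants to sanity-check the bandwidth arithmetic, here is a back-of-the-envelope roofline sketch. It assumes A100-like peak compute (312 TFLOP/s dense BF16) together with the 1.5 TB/s HBM figure used above; the matrix shapes are illustrative and not taken from the paper being discussed.

```python
# Back-of-the-envelope roofline check: is a given matmul compute-bound or
# bound by loading operands from HBM? Peak numbers are A100-like assumptions;
# 1.5 TB/s matches the HBM figure used in the comment above.
PEAK_FLOPS = 312e12   # dense BF16 FLOP/s (A100 tensor-core peak)
HBM_BW = 1.5e12       # bytes/s to/from HBM
BYTES_PER_ELEM = 2    # bf16

def roofline(m, k, n):
    """Estimate compute vs. memory time for an (m x k) @ (k x n) matmul."""
    flops = 2 * m * k * n                               # multiply-adds
    traffic = BYTES_PER_ELEM * (m * k + k * n + m * n)  # read A, B; write C
    t_compute = flops / PEAK_FLOPS
    t_memory = traffic / HBM_BW
    return t_compute, t_memory, ("compute-bound" if t_compute > t_memory
                                 else "memory-bound")

# A large transformer-style matmul: comfortably compute-bound.
print(roofline(4096, 12288, 4096))
# A skinny matmul (tiny m): memory-bound, analogous to the "small" operations
# (layer norm, softmax) discussed above.
print(roofline(1, 12288, 4096))
```

On these rough numbers, big matrix multiplications stay compute-bound while skinny or elementwise-heavy work is limited by HBM traffic, which is consistent with small ops taking a minority of runtime without collapsing overall utilization.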
Short answer: If future AI systems are doing R&D, it matters how quickly the R&D is happening.
Melanie Mitchell and Meg Mitchell are different people. Melanie was the participant in this debate, but you seem to be ascribing Meg’s opinions to her, including linking to video interviews with Meg in your comments.