I’m leaving it to the moderators to keep the copies mirrored, or just accept that errors won’t be corrected on this copy. Hopefully there’s some automatic way to do that?
Oops, thanks, updated to fix this.
Thanks! I removed the link.
What will GPT-2030 look like?
Glad it was helpful!
Hi Alex,
Let me first acknowledge that your write-up is significantly more thorough than pretty much all content on LessWrong, and that I found the particular examples interesting. I also appreciated that you included a related work section in your write-up. The reason I commented on this post and not others is that it’s one of the few ML posts on LessWrong that seemed like it might teach me something, and I wish I had made that clearer before posting critical feedback (I was thinking of the feedback as directed at Oliver / Raemon’s moderation norms rather than at your work, but I realize in retrospect it probably felt directed at you).
I think the main point is that there is a body of related work in the ML literature that explores fairly similar ideas, that LessWrong readers who care about AI alignment should be aware of this work, and that most readers of the post won’t realize it exists. I think it’s good to point out Dan’s initial mistake, but I took his substantive point to be what I just summarized; it seems correct to me and hasn’t been addressed. (I also think Dan overfocused on Ludwig’s paper; see below for more of my take on related work.)
Here is how I currently see the paper situated in broader work (I think you do discuss the majority but not all of this):
* There is a lot of work studying activation vectors in computer vision models, and the methods here seem broadly similar to the methods there. This seems like the closest point of comparison.
* In language, there’s a bunch of work on controllable generation (https://arxiv.org/pdf/2201.05337.pdf) where I would be surprised if no one looked at modifying activations (at least I’d expect someone to try soft prompt tuning), but I don’t know for sure.
* On modifying activations in language models, there is a bunch of work on patching / swapping activations, and on modifying them in the directions of probes.
I think we would probably both agree that this is the main set of related papers, and also both agree that you cited work within each of these branches (except maybe the second one). Where we differ is that I see all of this as basically variations on the same idea of modifying the activations or weights to control a model’s runtime behavior:
* You need to find a direction, which you can do either by learning a direction or by simple averaging. Simple averaging is more or less the same as one step of gradient descent, so I see these as conceptually similar.
* You can modify the activations or weights. Usually if an idea works in one case it works in the other case, so I also see these as similar.
* The modality can be language or vision. Most prior work has been on vision models, but some of it has also been on vision-language models; e.g., I’m pretty sure there’s a paper on averaging together CLIP activations to get controllable generation. (A minimal sketch of this generic recipe, in the language-model setting, is included below.)

So I think it’s most accurate to say that you’ve adapted some well-explored ideas to a use case that you are particularly interested in. However, the post uses language like “Activation additions are a new way of interacting with LLMs”, which seems to claim that this is entirely new and unexplored, and I think that could mislead readers, as Thomas Kwa’s response, for instance, seems to suggest.
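To make that generic recipe concrete, here is a minimal, hedged sketch in the language-model setting: average activation differences over contrastive prompt pairs to get a direction, then add it to the residual stream at inference time via a forward hook. This is not the post’s actual implementation or the exact setup of any paper cited above; the model choice, layer index, scale, and prompt pairs are illustrative placeholders.

```python
# Minimal sketch (assumptions: GPT-2 via Hugging Face transformers; the layer
# index, scale, and prompt pairs below are hypothetical placeholders).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, SCALE = 6, 4.0  # placeholder choices; real experiments would sweep these

def last_token_acts(text):
    """Residual-stream activations at the output of block LAYER, last token."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden_states = model(ids, output_hidden_states=True).hidden_states
    return hidden_states[LAYER + 1][0, -1]

# "Find a direction by simple averaging": mean activation difference over
# contrastive prompt pairs (placeholder sentiment pairs).
pairs = [("I love this movie", "I hate this movie"),
         ("That was wonderful", "That was terrible")]
direction = torch.stack([last_token_acts(a) - last_token_acts(b)
                         for a, b in pairs]).mean(dim=0)

# "Modify the activations": add the scaled direction to the block's output
# (the residual stream) on every forward pass during generation.
def add_direction(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * direction  # broadcasts over batch and positions
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_direction)
prompt_ids = tok("The food was", return_tensors="pt").input_ids
out = model.generate(prompt_ids, max_new_tokens=20, do_sample=False,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
handle.remove()  # restore the unmodified model
```

Replacing the simple average with a single gradient step on the same pairs, or folding the direction into the weights, gives the nearby variants discussed in the bullets above, which is part of why they feel like variations on one idea.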
I also felt that Dan H brought up reasonable questions (e.g., why should we believe that weights vs. activations is a big deal? Why is fine-tuning vs. averaging important? Have you tried testing the difference empirically?) that haven’t been answered and that would be good to at least acknowledge more clearly. The fact that he was bringing up points that seemed good to me without them being directly engaged with was what most bothered me about the exchange above.
This is my best attempt to explain where I’m coming from in about an hour of work (spent e.g. reading through things and trying to articulate intuitions in LW-friendly terms). I don’t think it captures my full intuitions or the full reasons I bounced off the related work section, but hopefully it’s helpful.
I’ll just note that I, like Dan H, find it pretty hard to engage with this post because I can’t tell whether it’s basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn’t really help in this regard.
I’m not sure what you mean about whether the post was “missing something important”, but I do think you should be pretty worried about LessWrong’s collective epistemics when Dan H is the only one bringing this important point up, and when, rather than being rewarded for doing so or engaged with on his substantive point, he’s being nitpicked by a moderator. It’s not an accident that no one else is bringing these points up: it’s because everyone else with the expertise to do so has given up or judged it not worth their time, largely because of responses like the one Dan H is getting.
Complex Systems are Hard to Control
Principles for Productive Group Meetings
Emergent Deception and Emergent Optimization
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
Here is my take: since there’s so much AI content, it’s not really feasible to read all of it, so in practice I read almost none of it (and consequently visit LW less frequently).
The main issue I run into is that for most posts, on a brief skim it seems like basically a thing I have thought about before. Unlike academic papers, most LW posts do not cite previous related work nor explain how what they are talking about relates to this past work. As a result, if I start to skim a post and I think it’s talking about something I’ve seen before, I have no easy way of telling if they’re (1) aware of this fact and have something new to say, (2) aware of this fact but trying to provide a better exposition, or (3) unaware of this fact and reinventing the wheel. Since I can’t tell, I normally just bounce off.
I think a solution could be to have a stronger norm that posts about AI should say, and cite, what they are building on and how it relates / what is new. This would decrease the amount of content while improving its quality, and also make it easier to choose what to read. I view this as a win-win-win.
I think this might be an overstatement. It’s true that NSF tends not to fund developers, but in ML the NSF is only one of many funders (lots of faculty have grants from industry partnerships, for instance).
Thanks for writing this!
Regarding how surprise on current forecasts should factor into AI timelines, here are two takes I have:
* Given that all the forecasts seem to be wrong in the “things happened faster than we expected” direction, we should probably expect HLAI to happen faster than expected as well.
* It also seems like we should retreat more to outside views about general rates of technological progress, rather than forming a specific inside view (since the inside view seems to mostly end up being wrong).
I think a pure outside view would give a median of something like 35 years (based on my very sketchy attempt at forming a dataset of when technical grand challenges were solved), and then ML progress seems to be happening quite quickly, so you should probably adjust down from that.
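Purely to illustrate the arithmetic in the previous paragraph (not an actual forecast): start from an outside-view distribution with the ~35-year median mentioned above and shift it down to reflect fast ML progress. The lognormal shape, its spread, and the 0.7 discount are hypothetical placeholders.

```python
# Illustrative only: the 35-year median comes from the comment above; the
# lognormal spread (sigma) and the 0.7 "ML is moving fast" discount are
# hypothetical placeholders, not a real forecast.
import numpy as np

rng = np.random.default_rng(0)
outside_view_years = rng.lognormal(mean=np.log(35), sigma=0.8, size=100_000)
adjusted_years = 0.7 * outside_view_years  # crude downward adjustment

print(np.median(outside_view_years))  # ~35 by construction
print(np.median(adjusted_years))      # ~24: "adjust down from that"
```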
I’m actually pretty interested in how you get to medians of 40 years; that seems longer than I’d predict without looking at any field-specific facts about ML, and the field-specific facts mostly push towards shorter timelines.
Forecasting ML Benchmarks in 2023
AI Forecasting: One Year In
Thanks! I just read over it, and assuming I understood correctly, this bottleneck primarily happens for “small” operations like layer normalization and softmax, not for large matrix multiplications. In addition, these small operations are still the minority of runtime (40% in their case). So I think this is still consistent with my analysis, which assumes various things will creep in to keep GPU utilization around 40%, but that they won’t ever drive it to (say) 10%. Is this correct, or have I misunderstood the nature of the bottleneck?
Edit: also, maybe we’re just miscommunicating. I definitely don’t think CPU->HBM is a bottleneck; it’s instead the time to load from HBM, which sounds the same as what you said. Unless I misread the A100 specs, that comes out to 1.5 TB/s, which is the number I use throughout.
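For anyone who wants to sanity-check the bandwidth arithmetic, here is a back-of-the-envelope roofline sketch. It assumes A100-like peak compute (312 TFLOP/s dense BF16) together with the 1.5 TB/s HBM figure used above; the matrix shapes are illustrative and not taken from the paper being discussed.

```python
# Back-of-the-envelope roofline check: is a given matmul compute-bound or
# bound by loading operands from HBM? Peak numbers are A100-like assumptions;
# 1.5 TB/s matches the HBM figure used in the comment above.
PEAK_FLOPS = 312e12   # dense BF16 FLOP/s (A100 tensor-core peak)
HBM_BW = 1.5e12       # bytes/s to/from HBM
BYTES_PER_ELEM = 2    # bf16

def roofline(m, k, n):
    """Estimate compute vs. memory time for an (m x k) @ (k x n) matmul."""
    flops = 2 * m * k * n                               # multiply-adds
    traffic = BYTES_PER_ELEM * (m * k + k * n + m * n)  # read A, B; write C
    t_compute = flops / PEAK_FLOPS
    t_memory = traffic / HBM_BW
    return t_compute, t_memory, ("compute-bound" if t_compute > t_memory
                                 else "memory-bound")

# A large transformer-style matmul: comfortably compute-bound.
print(roofline(4096, 12288, 4096))
# A skinny matmul (tiny m): memory-bound, analogous to the "small" operations
# (layer norm, softmax) discussed above.
print(roofline(1, 12288, 4096))
```

On these rough numbers, big matrix multiplications stay compute-bound while skinny or elementwise-heavy work is limited by HBM traffic, which is consistent with small ops taking a minority of runtime without collapsing overall utilization.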
Short answer: If future AI systems are doing R&D, it matters how quickly the R&D is happening.
Melanie Mitchell and Meg Mitchell are different people. Melanie was the participant in this debate, but you seem to be ascribing Meg’s opinions to her, including linking to video interviews with Meg in your comments.