A quick example of how paper reading works in my research:
2017: CycleGAN comes out and produces cool pictures of zebras and horses. I skim the paper because it seems cool and file away the concept, but don’t make an effort to replicate the results because, in my experience, GANs are obnoxious to train.
2018: “Which Training Methods for GANs do actually Converge?” comes out, but even though it contains the crucial insight to making GANs trainable, I don’t read it because it’s not very popular; I never see it.
2019: StyleGAN comes out, and cites “Which Training Methods for GANs do actually Converge?” I read both papers, mostly forget StyleGAN because it seems like a “we have big GPU, do good science” paper, but am very impressed with “Which Training Methods for GANs do actually Converge?” and take a day or two to replicate it.
2020?: Around this time I also read all of Gwern’s anime GAN training exploits, and update my priors towards “maybe large GANs are actually trainable.”
2022: I need to convert unlabeled DXA images into matching radiographs as part of a larger project. I’m generally of the opinion that GANs aren’t actually useful, but the problem matches the problem solved by CycleGAN exactly, and I’m out of options. I initially try the open-source CycleGAN codebase, but as expected it’s wildly unstable and miserable. I recall that “Which Training Methods for GANs do actually Converge?” had pretty strong theory backing up gradient penalties on the discriminator, and that I was able to replicate its experiments, so I dust off that replication code, verify that it still works, add a cycle-consistency loss, and am able to translate my images. Image translator in hand, I slog back into the larger problem.
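For concreteness, here is a minimal sketch of the two ingredients that made this work: an R1-style gradient penalty on the discriminator (evaluated on real samples, as in the 2018 paper) and CycleGAN’s L1 cycle-consistency loss. This is a toy of my own invention, not the papers’ actual code: I use a logistic discriminator, whose input gradient has a closed form, so no autodiff framework is needed, and G/F are stand-in functions rather than real generator networks.

```python
# Toy illustration (NumPy only; names and shapes are illustrative).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def r1_penalty(w, b, real_x, gamma=10.0):
    """R1 = gamma/2 * E[ ||grad_x D(x)||^2 ], taken over *real* samples.
    For D(x) = sigmoid(w.x + b), the chain rule gives
    grad_x D = D(1-D) * w, so the penalty has a closed form here."""
    d = sigmoid(real_x @ w + b)                      # D(x) per sample, shape (n,)
    grad_sq = (d * (1.0 - d)) ** 2 * np.sum(w ** 2)  # ||grad_x D||^2 per sample
    return 0.5 * gamma * grad_sq.mean()

def cycle_consistency(G, F, x, y):
    """L1 cycle loss: x -> G(x) -> F(G(x)) should land back on x,
    and symmetrically for y. G and F are plain functions on arrays."""
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()
```

When G and F are exact inverses the cycle loss vanishes, which is the signal that keeps the translator faithful to its input; the gradient penalty keeps the discriminator smooth around the real data, which is the convergence insight the 2018 paper supplies the theory for.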
--
What does this have to do with the paper reading cargo cult?
- Papers that you can replicate by downloading a codebase are useful, but papers that you can replicate from the text alone, without seeing code, are solid gold. If there are any paper reading clubs out there that ask the presenter to replicate the results without looking at the authors’ code, I would love to join: not just because the replication is valuable, but because it would narrow down the kinds of papers presented in a valuable way.
- Reading all the most hyped GAN papers, which is basically what I did, would probably not get me an awesome research result in the field of GANs. However, it served me pretty well as a researcher in an adjacent field. In particular, the obscure but golden insight eventually filtered its way into the citations of the hyped, fluffy flagship paper. For alignment research, hanging out in a few paper reading groups that are distantly related to alignment should be useful, even if an alignment-specific reading group isn’t.
- I had to read so many papers to come across three useful ones for this problem. However, I retain the papers that haven’t been useful yet; there are decent odds that I’ve already read the paper I’ll need to overcome the next hurdle.
- This type of paper reading, where I gather tools to engineer with, initially seems less relevant for fundamental concepts research like alignment. However, your general relativity example suggests that Einstein also had a tool gathering phase leading up to relativity, so ¯\_(ツ)_/¯
>If there are any paper reading clubs out there that ask the presenter to replicate the results without looking at the author’s code, I would love to join
This is something that I would be interested in as well. I’ve been attempting to reproduce “MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention” from scratch, but I am finding it difficult, partially due to my present lack of experience with reproducing DL papers. The code for MQTransformer is not available, at least to my knowledge. There are also several other papers that use LSTM or Transformer architectures for forecasting that I hope to reproduce and/or employ on Metaculus API data in the coming few months. If reproducing ML papers from scratch and replicating their results (especially DL for forecasting) sounds interesting to anyone (perhaps I could publish these reproductions with additional tests in ReScience C), please DM me, as I would be willing to collaborate.
Hi there—one of the authors of MQTransformer here. Feel free to send us an email and we can help you with this! (Our emails should be on the paper; if you can’t find them, let us know here and we’ll add them.)
This is great; thank you! I will send an email in the coming month. Also, a quick clarification: what’s the relation between “MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention” and “MQTransformer: Multi-Horizon Forecasts with Context Dependent Attention and Optimal Bregman Volatility”?
Looking forward to it!
There’s no difference in the actual model (or its architecture), but we realized that the “trades” (this can be made more precise if you’d like) that MQT would be a martingale against encompass a large class of volatility definitions, so we gave an example of a novel volatility measure (or trade) that isn’t the classical definition and showed that MQT works well against it (Theorem 8.1 and Eqn. 14).
>This type of paper reading, where I gather tools to engineer with, initially seems less relevant for fundamental concepts research like alignment. However, your general relativity example suggests that Einstein also had a tool gathering phase leading up to relativity, so shrugs.
An advisor of mine used to remark that working on applications can lead to directions related to more fundamental research. How it can happen is something like this: 1. Try to apply a method to a domain; 2. Realize the shortcomings of the method; 3. Find and attempt solutions to address the shortcoming; 4. If the shortcoming isn’t well-addressed, or still has room for improvement despite step 3, then you _might_ have a fundamental problem on your hands. Note that while this provides a direction, it doesn’t guarantee that the direction is one that is solvable in the next t months.
Excellent comment. Not everyone needs to push the envelope of the field they read papers on. Applications are just as important as the foundational theory (collectively, even more so!), and replication work is already the biggest step/hurdle towards an application in a more applied field or problem, even if it’s a toy.
I wouldn’t mind that kind of reading club, either :)