[Question] What’s the protocol for if a novice has ML ideas that are unlikely to work, but might improve capabilities if they do work?
This may be a silly question.
Suppose someone (I) who doesn’t have practical experience with machine learning research (but is generally aware of the structure of a number of different NN architectures, has read some papers, has a decent level of mathematical maturity, etc.) has an idea for a (as far as they/I know) novel NN architecture. And suppose they (I) understand that, given their (my) lack of experience in the topic, and how most ideas from outsiders are unlikely to work, the idea almost certainly won’t work.
Suppose also that implementing and testing the idea well would either take this person (me) much longer than it would take people who are versed in the field, or just straight up be beyond their (my) current skill set.
But, suppose that the idea seems, in the unlikely case that it would work, more likely to improve capabilities than to improve interpretability or safety.
(For concreteness, assume that the improvement in capabilities wouldn’t be particularly fundamental. Like, “an idea for potentially increasing the context window length cheaply in LLMs” or something like that.)
What is the appropriate protocol for me/such-a-person to follow?
Some ideas that come to mind are:
1) don’t bother anyone with the idea, as it is unlikely to work and is therefore irrelevant (this option seems safe, of course, but is less satisfying than getting confirmation that it wouldn’t/doesn’t actually work)
2) spend a bunch of time learning how, and then try testing it privately (without making the idea public), and in the unlikely event that it seems to maybe-kinda-work at the small scale that one (I) is/am able to try, approach someone who is more experienced and appropriately concerned about AI safety issues, and ask them to evaluate whether it is likely to actually work when scaled up (or whether it has perhaps already been done before), and, if it is novel and works, what to do about it
3) consult some kind of list or other resource to determine whether capability improvements in that direction are sufficiently safe that, if the idea works, it would be better for it to be publicly available than to keep it private, or vice versa
4) rather than trying to test it oneself (myself), and after spending time searching the literature to check whether it’s already been tried, ask someone knowledgeable whether it sounds likely to work
5) reason that, if the idea would work and such an inexperienced person came up with it, then it is extremely likely that someone experienced at ML research either has already thought of it or would come up with it soon regardless, making it irrelevant whether the inexperienced person shares the idea, to the point that they shouldn’t worry about the risks of sharing it and should just go ahead (provided it doesn’t waste too much of others’ time)
Of course, that’s far from an exhaustive list. An answer does not need to resemble any of those ideas.
Is there a protocol for this? Or, maybe “protocol” is too formal a word. I just mean, is there a consensus here about what a person in such a situation should do?
(not “what is obligatory” so much as “what would be best?”.)
For what it’s worth, if what you’re carefully not discussing actually is related to context length: quite a lot of people have put quite a lot of work into attempting to make context lengths longer, or long context lengths more efficient, by a wide variety of means, some of which looked very plausible when written up as a paper, but very few have had much actual effect so far. Generally the ones that have helped have not changed the actual structure of how attention works, but have just been caching mechanisms that made it more efficient to implement on current hardware. Typically the effect on capabilities hasn’t been to make pretraining much more efficient (since pretraining tends to be done with fairly short context lengths, with longer context lengths added later by finetuning), but just to make inference cheaper (which is rather a smaller capabilities effect).
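(To illustrate the kind of “caching mechanism” meant here, below is a minimal, hypothetical sketch of KV caching for single-head autoregressive attention in Python/numpy. The function name, shapes, and cache layout are my own illustrative assumptions rather than any specific paper’s or library’s API; the point is just that the attention math is unchanged, and the saving comes purely from not recomputing keys and values for past tokens at inference time.)

```python
# Minimal sketch of KV caching for single-head autoregressive attention.
# Assumed setup: the newest token's query/key/value are vectors of dimension d.
# The attention itself is the standard softmax(q K^T / sqrt(d)) V; the only
# change is that K and V for past tokens are kept in a cache instead of
# being recomputed at every generation step.
import numpy as np

def attend_with_cache(q_new, k_new, v_new, cache):
    """Append this step's key/value to the cache, then attend over all steps.

    q_new, k_new, v_new: arrays of shape (d,) for the newest token.
    cache: dict with "K" and "V" lists of previously computed (d,) arrays.
    Returns the attention output for the newest token, shape (d,).
    """
    cache["K"].append(k_new)
    cache["V"].append(v_new)
    K = np.stack(cache["K"])          # (t, d): keys for all tokens so far
    V = np.stack(cache["V"])          # (t, d): values for all tokens so far
    d = q_new.shape[-1]
    scores = K @ q_new / np.sqrt(d)   # (t,): unnormalized attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over past + current tokens
    return weights @ V                # weighted sum of cached values

# Toy usage: generate 5 steps, reusing cached K/V instead of recomputing them.
rng = np.random.default_rng(0)
cache = {"K": [], "V": []}
d = 8
for _ in range(5):
    q, k, v = rng.normal(size=(3, d))
    out = attend_with_cache(q, k, v, cache)
print(out.shape)  # (8,)
```

(Real implementations do this per layer and per head, with the cache kept on the accelerator, but the principle is the same: the attention function itself is untouched, and the gain is just avoiding redundant compute during inference.)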
Thanks! The specific thing I was thinking about most recently was indeed specifically about context length, and I appreciate the answer tailored to that, as it basically fully addresses my concerns in this specific case.
However, I also did mean to ask the question more generally. I kinda hoped that the answers might also be helpful to others with similar questions (as well as to me, if I had another idea meeting the same criteria in the future), but maybe it wasn’t super realistic to think that other people with the same question would find the question + answers here, idk.