1: I expect that it’s easier for authors to write longer thoughtful things that make sense;
I pretty strongly disagree. The key thing I think you are missing here is parallelism: you don’t want one person to write you 100 different 600-page stories, you want one person to organize 100 people to write you one 600-page story each. And it’s a lot easier to scale if you set the barrier to entry lower. There are many more people who can write 60-page stories than 600-page stories, and it’s easier to find 1,000 people to write 60 pages each than it is to find 100 people to write 600 pages each. There’s also much less risk on both your side and theirs: if someone drops out halfway through writing, you lose 30 pages, not 300.
Based on this comment:
I state: we’d be happy, nay, ecstatic, to get nice coherent complete shorter runs, thereby disproving my concern that short runs won’t be possible to complete, and to pay for them proportionally.
I’m now under the impression that you’d be willing to pay out the $20k for 10 runs of 100 steps each (subject to reasonable quality control), and bringing that about was my main goal in commenting.
The other major worry I have about this pitch is the experimental design. I’m still happy you’re doing this, but this doesn’t seem like the best-crafted project to my mind. Briefly, my concerns are:
1. This is a very topically specific ask with unclear generalization. I would prefer a more generic ask that is not directly connected to D&D.
2. In my experience training large language models, the number of examples is more important than the length of examples. Training on 100 shorter sequences is better than training on 10 longer sequences if the total length is the same. In particular, I think “You would also expect scarier systems to have an easier time learning without overnarrowing from 100 big examples instead of 10,000 small examples.” is not clearly true and very plausibly false.
3. Using this dataset in a meaningful fashion requires making a priori unrelated breakthroughs, making it overly inaccessible. I think that your comment “I don’t want to freeze into the dataset the weird limitations of our current technology, and make it be useful only for training dungeons that are weird the same way 2021 dungeons are weird,” is thinking about this the wrong way. The goal should be to maximize the time that we can effectively use this dataset, not be content with the fact that one day it will be useful.
4. This is a pilot for the real thing you’re after, but the “pilot” is a multi-year million-dollar effort. That doesn’t seem like a very well designed pilot to me.
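To make the example-count point concrete, here is a minimal sketch (all numbers are illustrative assumptions of mine, not figures from the post): with a fixed total-token training budget, the run length directly determines how many independent training examples you get.

```python
# Illustrative sketch: fixed total-token budget, varying run length.
# The budget and per-run sizes below are hypothetical, chosen only to
# mirror the 10-long-runs vs. 100-short-runs comparison above.

def num_examples(tokens_per_run: int, token_budget: int) -> int:
    """Number of independent runs that fit in a fixed token budget."""
    return token_budget // tokens_per_run

BUDGET = 600_000  # hypothetical total token budget

long_runs = num_examples(60_000, BUDGET)   # few long runs
short_runs = num_examples(6_000, BUDGET)   # many short runs

print(long_runs, short_runs)  # -> 10 100
```

The same budget buys 10x as many independent examples when each run is 10x shorter, which is the tradeoff the second concern is pointing at.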