The hard part would then be making that list algorithmically. An easier algorithmic method would be to do approximate string matching against previous quote threads, using something like the Smith-Waterman algorithm for pairwise local sequence alignment. This is what biologists do when they have a gene sequence and want to know if something like it is already in the databases, and there’s no reason why the method shouldn’t apply just as well to English text.
The way this would look to users is just a text box where you paste in the quote, and it’ll tell you if the quote has been posted before. Even easier to use than a full list of quotes.
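For anyone who wants to try it, here is a rough sketch of what I mean, not a finished tool. It runs Smith-Waterman over words rather than letters, which is my own simplification, and the function names (`tokenize`, `smith_waterman`, `find_repost`), the scoring values and the 0.7 threshold are all made up for illustration. It also assumes the previous quotes are already sitting in a plain list of strings.

```python
import re

# Illustrative scoring values (assumptions, not tuned on real data).
MATCH, MISMATCH, GAP = 2, -1, -1


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def smith_waterman(a, b):
    """Best local alignment score between two token lists."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def find_repost(quote, previous_quotes, threshold=0.7):
    """Return (old_quote, similarity) for the closest earlier quote,
    or None if nothing reaches the threshold.

    Similarity is the alignment score divided by the score the pasted
    quote would get against an exact copy of itself.
    """
    query = tokenize(quote)
    if not query:
        return None
    best_quote, best_sim = None, 0.0
    for old in previous_quotes:
        sim = smith_waterman(query, tokenize(old)) / (MATCH * len(query))
        if sim > best_sim:
            best_quote, best_sim = old, sim
    if best_sim >= threshold:
        return best_quote, best_sim
    return None


if __name__ == "__main__":
    posted = [
        "The first principle is that you must not fool yourself, "
        "and you are the easiest person to fool.",
        "It is a truth universally acknowledged that a single man in "
        "possession of a good fortune must be in want of a wife.",
    ]
    pasted = ("The first principle is you must not fool yourself - "
              "and you are the easiest person to fool")
    print(find_repost(pasted, posted))
```

The text box would just call something like `find_repost` on whatever gets pasted in. This compares against every old quote in turn, so it is quadratic per pair and would probably need an index or a cheaper pre-filter once the archive gets large.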