This is a problem that machine learning can tackle. Feel free to contact me by PM for technical help.
To make sure I understand your problem:
We have many copies of the Big Book. Each copy is a collection of many sheets. Each sheet was produced by a single tool, but each tool produces many sheets. Each shop contains many tools, but each tool is owned by only one shop.
Each sheet has information in the form of marks. Sheets created by the same tool at similar times have similar marks. It may be the case that the marks monotonically increase until the tool is repaired.
Right now, we have enough to take a database of marks on sheets, figure out how many tools we think there were and how likely it is that each sheet came from each potential tool, and cluster tools into likely shops. (Note that a ‘tool’ here is probably only one repair cycle of an actual tool, if they are able to repair it all the way to freshness.)
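To make that concrete, here is a minimal sketch of the unsupervised core, assuming each sheet's marks have already been boiled down to a fixed-length numeric feature vector (that extraction step is the real work, and is hand-waved here). A Gaussian mixture with BIC-based model selection is just one plausible choice, not the only one:

```python
# Minimal sketch, not a working pipeline. Assumes `sheet_features` is an
# (n_sheets, n_features) array summarizing the marks on each sheet.
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_tools(sheet_features: np.ndarray, max_tools: int = 50):
    """Pick the number of candidate tools by BIC and return per-sheet
    membership probabilities under the best-scoring mixture."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_tools + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              random_state=0).fit(sheet_features)
        bic = gmm.bic(sheet_features)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    # responsibilities[i, j] = P(sheet i came from candidate tool j)
    return best_model, best_model.predict_proba(sheet_features)
```

Grouping tools into shops could then be a second, coarser clustering over the fitted component means.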
We can either do this unsupervised, and then compare to whatever other information we can find (if we have a subcollection of sheets with known origins, we can see how well the estimated probabilities did), or we can try to include that information for supervised learning.
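The "compare to known origins" check can be as light as an agreement score over the labeled subset. Another sketch; `known_idx` and `known_labels` are hypothetical arrays holding the row indices and true origins of the sheets we can actually pin down:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def score_against_known(responsibilities, known_idx, known_labels):
    """Compare hard cluster assignments (argmax of the per-sheet
    probabilities from the sketch above) to the known origins.
    Adjusted Rand Index: 1.0 = perfect agreement, ~0.0 = chance."""
    predicted = np.argmax(responsibilities[known_idx], axis=1)
    return adjusted_rand_score(known_labels, predicted)
```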
That’s a hell of a summary, thanks!
I’m glad you mentioned the repair cycle of tools. There are some tools that are regularly repaired (let’s just call them “Big Tools”) and some that aren’t (“Little Tools”). Both are expensive to acquire and expensive to repair, but it seems the Print Shops chose to repair Big Tools because those were subject to breakage that significantly reduced performance.
I should add another twist since you mentioned sheets of known origins: assume that we can only decisively assign origins to single-sheet publications. Two problems stem from this assumption: first, not all relevant Marks are left on such sheets; second, very few single-sheet publications survive. Collations of more than one sheet are subject to all of the problems of the Big Book.
I’m most interested in the distinction between unsupervised and supervised learning. And I will very likely PM you to learn more about machine learning. Again, thanks for your help!
EDIT: I just noticed a mistake in your summary. Each sheet is produced by a set of tools, not a single tool. Each mark is produced by a single tool.
Okay. Are the classes of marks distinct by tool type (that is, if I see a mark on a sheet, do I know whether it came from tool type X or tool type Y), or do we need to try to discover what sorts of marks the various tools can leave?
Fortunately, we know which tool types leave which marks. We also have a very strong understanding of the ways in which tools break and leave marks.
Thanks again for entertaining this line of inquiry.
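That helps a lot. Since each mark comes from a single tool and the mark-to-tool-type mapping is known, the natural unit to cluster is the mark rather than the sheet: one clustering per tool type, after which each sheet inherits the set of inferred tools behind its marks, matching the correction that a sheet is produced by a set of tools. A rough sketch, where the Mark record and its numeric features are hypothetical stand-ins:

```python
# Sketch under the corrected model: each mark was left by exactly one
# tool, and the tool *type* behind a mark is already known.
from collections import defaultdict
from dataclasses import dataclass

import numpy as np
from sklearn.mixture import GaussianMixture

@dataclass
class Mark:
    sheet_id: str
    tool_type: str        # known a priori, per the discussion above
    features: np.ndarray  # hypothetical numeric description of the mark

def infer_tools_per_sheet(marks: list[Mark], n_tools_guess: int = 20):
    """Cluster marks within each tool type; marks in one cluster were
    plausibly left by the same physical tool (one repair cycle of it)."""
    by_type = defaultdict(list)
    for m in marks:
        by_type[m.tool_type].append(m)
    sheet_tools: dict[str, set] = {}
    for tool_type, group in by_type.items():
        X = np.stack([m.features for m in group])
        k = min(n_tools_guess, len(group))  # no more clusters than marks
        labels = GaussianMixture(n_components=k, covariance_type="diag",
                                 random_state=0).fit(X).predict(X)
        for m, label in zip(group, labels):
            # cluster ids are only unique within a tool type, so key on both
            sheet_tools.setdefault(m.sheet_id, set()).add((tool_type, int(label)))
    return sheet_tools  # sheet_id -> set of inferred tools that touched it
```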
Good point!
Also yay combining multiple fields of knowledge and expertise! applause
Seriously though, the world does need more of it, and I felt the need to explicitly reward and encourage this.
Thanks! I feel explicitly encouraged.