I’m looking for someone to help me, on a paid basis, with statistical analysis. I have problems like the following:
1. When to inspect?
I have 10k documents per month streaming to office staff for data entry in offices scattered around the world. I have trained staff at HQ inspecting the data entry performed by the office staff, detecting errors and updating the fields in which they found them. I will soon have random re-checking by HQ inspectors of entries already checked by other HQ staff. The HQ staff currently detect errors on ~15% of documents (ranging from nearly none to ~6% errors on particular fields). I don’t yet have a good estimate of how many of those detections are false positives, or how many errors go undetected entirely. Users show learning (we detect fewer errors from users who have entered data on more documents) that continues over their first 2000 or so documents (where I start running out of data).
Required: I need to decide when a document can skip secondary inspection. I need to decide when users (HQ or practice users) don’t understand something and need training (i.e. their error rate seems high relative to the difficulty of data entry on that field). When I change the user interface I need to decide whether I helped or hurt, and I need future error prediction to recover quickly after I change the data entry environment.
2. What works?
We have a number of businesses that sell stuff, and we often change how that’s done and how we promote it (promotions, press placements that I can work to get, changes in price, changes in product, changes in business websites, training for our sales people, etc.). I’d like to learn more than I currently do from the things we change, so that I can focus our efforts where they work best.
There is a huge amount of noise in this data.
Proposals should be sent to jobs@trikeapps.com, should reference this comment, and should include answers to the following two questions (and please don’t post your answers to the questions on this site):
In my first example job above, across 200 users the average error rate in their first 10 documents was 12% (that is, of the set of 2000 documents made from the first 10 documents entered by each of 200 users, 12% contained at least one error). Across so few documents from each user (only 10) there is only a small indication that the error rate on the 10th document is lower than the error rate on the first (learning might be occurring, but isn’t large across 10 documents). A new user has entered 9 documents without any errors. What is the probability that they will make an error on their next document?
What question should I ask here to work out who will be good at this work? That is, what question will effectively separate those who understand how to answer questions like this with data from those who don’t understand the relevant techniques?