Thanks again for your interest – your questions have helped me clarify my thinking.
Yes, the latter is the question. I imagine that you might be able to use a collaborative filtering algorithm as described here. In the video, Andrew Ng supposes that you’re matching films with individuals using a sparsely populated matrix of match values, assuming that you know which genres different individuals like. Your problem seems essentially the same, except that you know the features of the people rather than their tastes.
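When the ratees’ features are known, each rater’s tastes can be estimated by a regularized regression on the ratees that rater actually saw, and then used to fill in the missing entries. The Python below is a minimal sketch under stated assumptions: ratings are taken to be roughly linear in a hypothetical feature matrix `X` (first column all ones, as a bias term), and the ridge penalty `lam` and the function names are illustrative, not taken from the video or from any library.

```python
import numpy as np

def fit_rater_weights(X, R, mask, lam=1.0):
    """Fit one weight vector per rater by ridge regression on observed ratings.

    X:    (n_ratees, n_features) known ratee features; column 0 is all ones (bias).
    R:    (n_raters, n_ratees) rating matrix; entries where mask is False are ignored.
    mask: boolean array, True where a rating was observed.
    lam:  ridge penalty (a hypothetical default); for simplicity it also shrinks the bias.
    """
    n_raters, n_features = R.shape[0], X.shape[1]
    theta = np.zeros((n_raters, n_features))
    for k in range(n_raters):
        Xk = X[mask[k]]        # features of the ratees that rater k actually rated
        yk = R[k, mask[k]]     # the corresponding observed ratings
        A = Xk.T @ Xk + lam * np.eye(n_features)
        theta[k] = np.linalg.solve(A, Xk.T @ yk)
    return theta

def predict(theta, X):
    """Predicted rating of every rater (rows) for every ratee (columns)."""
    return theta @ X.T
```

A missing rating `R[k, j]` is then guessed as `predict(theta, X)[k, j]`.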
I had tried this (the code that I used is here). For each date at an event, I took the rating matrix associated with the event, removed the entry corresponding to that date, and then used an R library called recommenderlab to produce a guess for the missing rating.
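The leave-one-out procedure just described can be sketched as follows. recommenderlab’s own algorithms are not reproduced here: as a stand-in, the sketch uses a simple low-rank matrix completion (iterative SVD imputation, often called hard-impute), which is a different technique from whichever recommenderlab method was used. The rank, the iteration count and the function names are assumptions, and every rater is assumed to have at least one observed rating.

```python
import numpy as np

def svd_impute(R, mask, rank=2, n_iter=100):
    """Fill unobserved entries of R (mask False) with a rank-`rank` reconstruction.

    Hard-impute: alternate between a truncated SVD and restoring the observed ratings.
    Assumes every rater (row) has at least one observed rating.
    """
    observed = np.where(mask, R, 0.0)
    row_means = observed.sum(axis=1) / mask.sum(axis=1)
    # Initialise each missing entry at that rater's mean observed rating
    filled = np.where(mask, R, row_means[:, None])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep the observed ratings; update only the missing ones
        filled = np.where(mask, R, low_rank)
    return filled

def leave_one_out(R, mask, rank=2):
    """For each observed rating, hide it and predict it from the remaining ratings."""
    preds = np.full(R.shape, np.nan)
    for i, j in zip(*np.nonzero(mask)):
        m = mask.copy()
        m[i, j] = False
        preds[i, j] = svd_impute(R, m, rank)[i, j]
    return preds
```

Comparing `leave_one_out(R, mask)` with `R` entry by entry gives the actual-versus-guess comparison for one event.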
The problem is that one gets almost nothing beyond what one would get by just using the average rating that the rater gave and the average rating that the ratee received. A typical comparison of the actual ratings and the guesses is given here; in it, the raters are men, the ratees are women, and the rating type is attractiveness. The rows correspond to raters and the columns to ratees. The rows have been normalized so that each rater’s ratings sum to 0. You can see that after controlling for variation in rating scales, the guesses for a given ratee are virtually identical across raters.
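For comparison, the baseline mentioned above (the rater’s average plus the ratee’s average) and the row normalization can be sketched like this. It is a minimal version assuming a fully observed event matrix; combining the two averages additively around the grand mean is one common choice, not necessarily the exact baseline behind the comparison, and the function names are hypothetical.

```python
import numpy as np

def baseline_predict(R, mask, i, j):
    """Predict R[i, j] from rater and ratee averages, with entry (i, j) held out."""
    m = mask.copy()
    m[i, j] = False
    grand = R[m].mean()                # mean of all remaining ratings
    rater_avg = R[i, m[i]].mean()      # average rating rater i gave to the others
    ratee_avg = R[m[:, j], j].mean()   # average rating ratee j received from the others
    return rater_avg + ratee_avg - grand

def center_rows(M):
    """Subtract each row's mean so that every rater's row sums to 0."""
    return M - M.mean(axis=1, keepdims=True)
```

Under this additive form the rater term shifts a whole row, so after `center_rows` the predicted value for a given ratee varies across raters only through which entry was held out; a method that barely improves on this baseline will show the same near-identical columns.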
Yet collaborative filtering appears to have been applied successfully in the context of online dating, for example as reported by Brozovsky and Petricek (2007) and the papers that cite it, even in contexts where the average number of ratings per person is not very large. So I don’t know why I didn’t have more success with this approach.
I don’t know about multilevel modelling.
I’ve been exploring this over the past few days, and will probably write about it in my next post. For simplicity, say we want to model the probability of a decision by the k’th rater as
logit(P) = A(k) + B(k)*attrAvg
where P is the decision probability, A(k) and B(k) are constants specific to the k’th rater, and attrAvg is the average attractiveness rating of the rater’s partner. Rather than determining A(k) and B(k) by simply doing a separate logistic regression for each rater, we can instead fit them using a prior on their distributions: for example, one can assume that the A(k) are drawn from a normal distribution, and likewise the B(k). My first impression is that determining the means of the hypothesized normal distributions is simply a matter of fitting
logit(P) = A + B*attrAvg
where A and B are the same for all raters, and that the nontrivial part is determining the standard deviations of the hypothesized distributions while simultaneously estimating all of the A(k) and B(k).
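The two-stage idea above can be sketched with Newton’s method for logistic regression: a pooled fit supplies the prior means, and each rater’s (A(k), B(k)) is then a MAP estimate under independent normal priors centred on them. This is a sketch under assumptions: `prior_sd` is fixed by hand rather than estimated, which sidesteps exactly the nontrivial part, and for a logistic model the pooled coefficients are only an approximation to the means of the rater-level distributions. A full multilevel fit would estimate the means, the standard deviations and all of the A(k), B(k) jointly. The function names are hypothetical.

```python
import numpy as np

def fit_logistic(x, y, prior_mean=None, prior_sd=None, n_iter=50):
    """Fit logit(P) = A + B*x by Newton's method; returns array([A, B]).

    With prior_sd given, maximizes the posterior under independent normal priors
    on (A, B) centred on prior_mean (a MAP estimate); otherwise plain maximum likelihood.
    """
    X = np.column_stack([np.ones_like(x), x])    # columns: intercept, attrAvg
    mu0 = np.zeros(2) if prior_mean is None else np.asarray(prior_mean, dtype=float)
    # Prior precision per coefficient; zero means no prior
    prec = (np.zeros(2) if prior_sd is None
            else np.broadcast_to(1.0 / np.asarray(prior_sd, dtype=float) ** 2, (2,)))
    w = mu0.copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p) - prec * (w - mu0)                       # log-posterior gradient
        neg_hess = X.T @ (X * (p * (1 - p))[:, None]) + np.diag(prec)
        w = w + np.linalg.solve(neg_hess, grad)
    return w

def partial_pooling(x, y, rater, prior_sd):
    """Pooled fit for the prior means, then per-rater estimates shrunk toward them."""
    pooled = fit_logistic(x, y)
    per_rater = {k: fit_logistic(x[rater == k], y[rater == k], pooled, prior_sd)
                 for k in np.unique(rater)}
    return pooled, per_rater
```

A very small `prior_sd` collapses every rater onto the pooled fit, while a very large one approaches the separate per-rater regressions; the prior also keeps a rater whose decisions are perfectly separated by attrAvg from producing infinite coefficients.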
So, as I understand it, you still used data that you wouldn’t have had in practice? Would it be a viable alternative to take the average only over dates that preceded the one you’re trying to predict? In general, predicting the future from the past seems simple and sound if the data are time-labelled, though I might be missing the issue here.
The reason I hesitated to go in that direction is just that the sample sizes are already small. At a speed dating event with 16 men and 16 women, one can use the rating averages from the first 8 rounds to predict what will happen in the last 8 rounds, but the loss in statistical power could be very large: given that women’s decisions were yes only ~33% of the time, the yes frequency for a woman rater would be based on only 2-3 yes decisions (8 rounds × ~33% ≈ 2.6).
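The past-only alternative discussed above can be sketched as a running average: each date gets the rater’s yes-rate over that rater’s dates in strictly earlier rounds, so no information from the date itself or later dates leaks in. This assumes each rater has at most one date per round; the tuple layout and the function name are hypothetical.

```python
from collections import defaultdict

def prior_yes_rates(dates):
    """dates: list of (round, rater, decision) tuples, decision in {0, 1}.

    Returns, in input order, each rater's yes-rate over earlier rounds,
    or None for a rater's first date. Assumes one date per rater per round.
    """
    order = sorted(range(len(dates)), key=lambda i: dates[i][0])  # process by round
    yes = defaultdict(int)
    seen = defaultdict(int)
    out = [None] * len(dates)
    for i in order:
        _, rater, decision = dates[i]
        if seen[rater]:
            out[i] = yes[rater] / seen[rater]   # uses only previously processed rounds
        yes[rater] += decision
        seen[rater] += 1
    return out
```

The early-round values rest on very few dates, which is the loss of statistical power described above.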
But I should do a cross-check and see whether predictive power diminishes with how late in the event a date occurred. I’ll do this and get back to you.
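One possible way to run that cross-check is to score each prediction and average the score within each round, so that any trend in predictive power across rounds shows up directly. The sketch below uses the mean squared error, which for 0/1 decisions is the Brier score; the choice of score and the function name are assumptions.

```python
from collections import defaultdict

def error_by_round(rounds, predicted, actual):
    """Mean squared error (Brier score for 0/1 outcomes) of predictions, per round."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, p, y in zip(rounds, predicted, actual):
        sums[r] += (p - y) ** 2
        counts[r] += 1
    return {r: sums[r] / counts[r] for r in sorted(sums)}
```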