Thanks for the excellent answers. That all makes sense and clears up a lot in my mind about this post and the previous one. Just two quick questions/comments for now:
And then, are you also inferring that people care different amounts about these traits?
There are two things that you might be asking here. If you’re asking about how much people tend to care about a trait in general, one can do logistic regression and look at regression coefficients.
But maybe you’re asking about whether I’m inferring how much individual people care about the different traits, preempting my next post. One could do logistic regression for each individual, but there are ~15 dates per person and ~5 traits, so one doesn’t really have enough data for this to be informative. My impression is that the right way to go about this is via multilevel modeling, but I haven’t yet figured out how to adapt the general methodology to my particular situation.
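To illustrate why per-individual fits are uninformative at this sample size, here is a toy simulation (all numbers are made up): we repeatedly generate ~15 dates’ worth of decisions from one fixed set of trait weights and refit a logistic regression each time. Even though every simulated “person” has identical preferences, the estimated coefficients swing wildly from fit to fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Made-up "true" preferences: only the 1st and 4th traits matter.
true_w = np.array([1.0, 0.0, 0.0, 0.8, 0.0])

def fit_one_person(n_dates=15, n_traits=5):
    """Simulate one person's dates and fit a per-person logistic regression."""
    X = rng.uniform(1, 10, size=(n_dates, n_traits))   # 1-10 trait ratings
    p = 1 / (1 + np.exp(-(X - 5.5) @ true_w))          # decision probability
    y = rng.random(n_dates) < p
    if y.all() or not y.any():      # all-yes/all-no sample: fit is undefined
        return None
    return LogisticRegression(C=1e6, max_iter=1000).fit(X, y).coef_.ravel()

# 200 simulated people with *identical* preferences still produce
# wildly different estimated coefficients.
coefs = np.array([c for _ in range(200) if (c := fit_one_person()) is not None])
print(np.round(coefs.std(axis=0), 1))   # large spread despite fixed true weights
```

The spread of the estimates dwarfs the true weights themselves, which is the sense in which 15 dates and 5 traits are not enough for per-person regression to be informative.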
Yes, the latter is the question. I imagine that you might be able to use a collaborative filtering algorithm as described here. In the video, Andrew Ng supposes that you’re matching films with individuals, using a sparsely populated matrix of match values, assuming that you know which genres different individuals like. Your problem seems identical, except that you know the features of the people rather than their tastes.
I don’t know about multilevel modelling.
But perhaps surprisingly, these features are contaminated with the decisions we’re trying to predict, to such a degree that if one didn’t notice this, one would end up with a model with far greater apparent predictive power than one would have in practice. This has to do with the fact that R(B) is generically formed using ratings that B was given after the date that A and B went on, ratings that one would not have access to in practice… Rather than using (**), we imagine that at the event, B had been on a date with someone other than A, whom we call a “surrogate” of A.
So as I understand it, you still used data that you wouldn’t have had in practice? Would it be a viable alternative to just take the average from dates that preceded the one you’re trying to predict? In general, predicting the future from the past seems simple and sound when the data is time-labelled, though I might be missing the issue here.
Thanks again for your interest – your questions have helped me clarify my thinking.
Yes, the latter is the question. I imagine that you might be able to use a collaborative filtering algorithm as described here. In the video, Andrew Ng supposes that you’re matching films with individuals, using a sparsely populated matrix of match values, assuming that you know which genres different individuals like. Your problem seems identical, except that you know the features of the people rather than their tastes.
I had tried this (the code that I used is here): for each date at an event, I took the rating matrix associated with the event, removed the entry corresponding to that date, and then used an R library called recommenderlab to produce a guess for the missing rating.
The upshot is that one gets almost nothing more than what one would get by just using the average rating that the rater gave and the average rating that the ratee received. A typical example of how the actual ratings and guesses compare is given here – here the raters are men, the ratees are women, and the rating type is attractiveness. The rows correspond to raters and the columns to ratees. The rows have been normalized, so that the sum of a given rater’s ratings is 0. You can see that after controlling for variation in rating scales, the guesses for a given ratee are virtually identical across raters.
Yet collaborative filtering appears to have been applied successfully in the context of online dating, for example as reported by Brozovsky and Petricek (2007) and the papers that cite it, even in contexts where the average number of ratings per person is not large, so I don’t know why I didn’t have more success with this approach.
I don’t know about multilevel modelling.
I’ve been exploring this over the past few days, and will probably write about it in my next post. For simplicity, say we want to model the probability of a decision by the k’th rater as
logit(P) = A(k) + B(k)*attrAvg
where P is the decision probability, A(k) and B(k) are constants specific to the k’th rater, and attrAvg is the partner’s average attractiveness rating. Rather than determining A(k) and B(k) by doing a separate logistic regression for each rater, we can instead fit them using a prior on the distributions of A(k) and B(k): for example, one can assume that each is normally distributed across raters. My first impression is that determining the means of the hypothesized normal distributions is simply a matter of fitting
logit(P) = A + B*attrAvg
where A and B are shared across all raters, and that the nontrivial part is determining the standard deviations of the hypothesized distributions while simultaneously estimating all of the A(k) and B(k).
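A minimal sketch of that two-step picture on simulated data (the setup and all numbers are made up for illustration): fit the pooled A and B first, then obtain per-rater MAP estimates under a normal prior centered at the pooled fit. The prior standard deviations are fixed by hand here; estimating them jointly with the A(k) and B(k) is exactly the nontrivial part.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)

# Simulated data: 20 raters, 15 dates each, following
# logit(P) = A(k) + B(k) * attrAvg with rater-specific A(k), B(k).
n_raters, n_dates = 20, 15
A_true = rng.normal(-4.0, 0.5, n_raters)
B_true = rng.normal(0.6, 0.2, n_raters)
attr = rng.uniform(3, 9, size=(n_raters, n_dates))
y = (rng.random((n_raters, n_dates))
     < expit(A_true[:, None] + B_true[:, None] * attr)).astype(float)

def neg_log_post(theta, x, yk, prior_mean, prior_sd):
    """Negative log posterior: Bernoulli likelihood plus normal prior on (A, B)."""
    logit = theta[0] + theta[1] * x
    ll = np.sum(yk * logit - np.logaddexp(0.0, logit))
    lp = -0.5 * np.sum(((theta - prior_mean) / prior_sd) ** 2)
    return -(ll + lp)

# Step 1: pooled fit (one A, B for everyone; a very wide prior, so
# effectively maximum likelihood) estimates the means of the
# hypothesized normal distributions.
pooled = minimize(neg_log_post, np.zeros(2),
                  args=(attr.ravel(), y.ravel(),
                        np.zeros(2), np.full(2, 100.0))).x

# Step 2: per-rater MAP fits, shrunk toward the pooled estimate by a
# normal prior whose SDs are fixed by hand here.
prior_sd = np.array([1.0, 0.5])
fits = np.array([minimize(neg_log_post, pooled,
                          args=(attr[k], y[k], pooled, prior_sd)).x
                 for k in range(n_raters)])
print(np.round(pooled, 2), np.round(fits.std(axis=0), 2))
```

With the prior SDs fixed, each per-rater fit is an ordinary convex optimization; a full multilevel treatment would instead estimate those SDs from the data, which is where the simultaneous-estimation difficulty comes in.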
So as I understand it, you still used data that you wouldn’t have had in practice? Would it be a viable alternative to just take the average from dates that preceded the one you’re trying to predict? In general, predicting the future from the past seems simple and sound when the data is time-labelled, though I might be missing the issue here.
The reason I hesitated to go in that direction is just that the sample sizes are already small. If one is talking about a speed dating event with 16 men and 16 women, one can use the rating averages from the first 8 rounds to predict what happens in the last 8 rounds, but the loss in statistical power could be very large: given that women’s decisions were yes only ~33% of the time, the decision frequencies for a given woman would be based on only 2-3 yes decisions.
But I should do a cross-check and see whether predictive power diminishes as a function of how late in the event a date occurred. I’ll do this and get back to you.
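Here is a rough sketch of the time-split version of this on simulated ratings (toy numbers, not the real data): compute each ratee’s average from the first 8 rounds only, then use those averages to predict the ratings in the last 8 rounds of a 16-person round-robin event.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy round-robin event: rater i meets ratee (i + t) % 16 in round t,
# and ratings are a persistent per-ratee quality plus unit noise.
quality = rng.normal(6, 1.5, 16)
order = (np.arange(16)[:, None] + np.arange(16)[None, :]) % 16
R = quality[order] + rng.normal(0, 1, (16, 16))

# Average each ratee's ratings over the first 8 rounds only ...
early = order[:, :8]
early_avg = np.array([R[:, :8][early == j].mean() for j in range(16)])

# ... and use those averages to predict the ratings in the last 8 rounds.
late = order[:, 8:]
pred_err = R[:, 8:] - early_avg[late]
print(np.std(pred_err))
```

With a persistent quality signal, the prediction error from first-half averages is only slightly worse than the irreducible noise, so in a setting like this the past-only version would cost little; the worry above is that with binary decisions at a ~33% yes rate, halving the data hurts much more than it does for ratings.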