No problem, the post is a lot to take in at once. Thanks very much for your interest; more than anything else, I’m happy that someone is reading my posts :-).
In my last post, I made reference to features like “average sincerity rating” without being precise about what I meant. Here I’m just giving precise definitions.
Ultimately, we’re trying to infer the participant’s rating of another participant, right?
Ultimately we’re trying to infer the participant’s decision on another partner.
And then you mention traits. Here are you talking about how attractive, fun, ambitious, et cetera the person is?
The correlation matrices from my last post showed that if you want to predict e.g. how a woman will rate a man’s ambition, you get predictive power by looking at the average of how other women rate his ambition, and moreover the predictive power is greater than the predictive power that one obtains by looking at the average of how women rate his attractiveness, or how fun he is, or how intelligent he is, or how sincere he is.
So the average of the ratings of his ambition is picking up on some underlying trait that he possesses. The simplest guess is that the trait is what we think of when we think of ambition, but in practice it’s really whatever about him makes women perceive him to be ambitious – it could be that he seems very focused on work, it could be that he tends to wear business suits, etc.
Once we form the average, we have a measurement of the underlying trait and we can use it for any number of things. We can try to use it to estimate how desirable women find him on average. We can try to use it to estimate how selective he is on average. We can compare it with the corresponding metric for women and explore whether men tend to prefer women who are of a similar level of ambition in general.
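To make "the average of how other women rate his ambition" concrete, here is a minimal sketch. The nested-dictionary layout, the names, and the toy numbers are all hypothetical and are not taken from the actual dataset; the point is only that the rater whose rating we want to predict is left out of the average.

```python
from statistics import mean

# Hypothetical toy data: ratings[rater][ratee] = ambition rating on a 1-10 scale.
ratings = {
    "w1": {"m1": 7, "m2": 4},
    "w2": {"m1": 8, "m2": 5},
    "w3": {"m1": 6, "m2": 3},
}

def average_from_others(ratings, rater, ratee):
    """Mean rating that `ratee` received from every rater except `rater`."""
    others = [given[ratee] for name, given in ratings.items()
              if name != rater and ratee in given]
    # With no other raters there is nothing to average over.
    return mean(others) if others else None

# Feature for predicting how w1 rates m1: the average of w2's and w3's ratings.
feature = average_from_others(ratings, "w1", "m1")
```

The same function can be reused for any rating type by swapping in the corresponding ratings.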
And then, are you also inferring that people care different amounts about these traits?
There are two things that you might be asking here. If you’re asking about how much people tend to care about a trait in general, one can do logistic regression and look at regression coefficients.
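As a rough illustration of the "logistic regression, then look at the coefficients" idea, here is a self-contained sketch that fits a pooled model by plain gradient ascent. The two features (standing in for standardized attractiveness and ambition averages), the toy decisions, and the function name `fit_logistic` are assumptions made for the example; in practice one would use a statistics package rather than a hand-rolled fit.

```python
import math

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit logit(P) = w[0] + w[1]*x1 + w[2]*x2 + ... by gradient ascent
    on the average log-likelihood. Returns the list of coefficients."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p  # gradient contribution of one observation
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: (attractiveness, ambition) coded as -1 / +1.
# Yes-rate is 3/4 when attractiveness is high and 1/4 when it is low,
# regardless of ambition.
X, y = [], []
for attr, amb, n_yes in [(-1, -1, 1), (-1, 1, 1), (1, -1, 3), (1, 1, 3)]:
    for i in range(4):
        X.append((attr, amb))
        y.append(1 if i < n_yes else 0)

intercept, attr_coef, amb_coef = fit_logistic(X, y)
```

In this toy data the attractiveness coefficient comes out clearly positive and the ambition coefficient close to zero, which is the sense in which the coefficients indicate how much people care about each trait in general.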
But maybe you’re asking about whether I’m inferring how much individual people care about the different traits, preempting my next post. One could do logistic regression for each individual, but there are ~15 dates per person and ~5 traits, so one doesn’t really have enough data for this to be informative. My impression is that the right way to go about this is via multilevel modeling, but I haven’t yet figured out how to adapt the general methodology to my particular situation.
What I did find is that in the special cases of attractiveness and fun, one has enough statistical power to extract nontrivial information, and hence incremental predictive power, just by looking at the correlations between a participant’s decisions and his or her partners’ attractiveness and fun averages.
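A sketch of the per-participant correlation described above: for each rater, correlate the yes/no decisions with the partners' attractiveness averages. The rater names and numbers are made up, and `pearson` is just the textbook formula written out, not the code used for the post.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns None when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # e.g. a rater who said yes (or no) to everyone
    return cov / math.sqrt(vx * vy)

# Hypothetical data: for each rater, (partner's attractiveness average, decision).
dates = {
    "rater_a": [(4.0, 0), (5.0, 0), (7.0, 1), (8.0, 1)],
    "rater_b": [(4.0, 1), (5.0, 0), (7.0, 1), (8.0, 0)],
}

sensitivity = {
    rater: pearson([avg for avg, _ in rows], [dec for _, dec in rows])
    for rater, rows in dates.items()
}
```

Here `rater_a`'s decisions track attractiveness closely while `rater_b`'s do not, so the correlation could serve as a crude per-rater feature. With only ~15 dates per person these estimates are noisy, which is the limitation mentioned above.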
So you’re talking about the average of how all other individuals rated B. Are you averaging across fun, ambition, intelligence and sincerity here to find out B’s overall popularity, or are you trying to figure out how fun, ambitious or intelligent they are?
R is a fixed rating type, so the latter. I do average across averages of ratings on different dimensions to determine overall popularity, as I discuss in the section “A composite index to closely approximate a ratee’s desirability” in my last post, but that’s a separate matter.
Thanks for the excellent answers. That all makes sense and clears up a lot in my mind about this post and the previous one. Just two quick questions/comments for now:
And then, are you also inferring that people care different amounts about these traits?
There are two things that you might be asking here. If you’re asking about how much people tend to care about a trait in general, one can do logistic regression and look at regression coefficients.
But maybe you’re asking about whether I’m inferring how much individual people care about the different traits, preempting my next post. One could do logistic regression for each individual, but there are ~15 dates per person and ~5 traits, so one doesn’t really have enough data for this to be informative. My impression is that the right way to go about this is via multilevel modeling, but I haven’t yet figured out how to adapt the general methodology to my particular situation.
Yes, the latter is the question. I imagine that you might be able to use a collaborative filtering algorithm as described here. In the video, Andrew Ng supposes that you’re matching films with individuals, using a sparsely populated matrix of match values, assuming that you know which genres different individuals like. Your problem seems identical, just you know the features of the people, rather than their tastes.
I don’t know about multilevel modelling.
But perhaps surprisingly, these features are contaminated with the decisions we’re trying to predict, to such a degree that if one didn’t notice this, one would end up with a model with far greater predictive power than one would have in practice. This has to do with the fact that R(B) is generically formed using ratings that B was given after the date that A and B went on, ratings that one would not have access to in practice… Rather than using (**), we imagine that at the event, B had been on a date with someone other than A, who we call a “surrogate” of A.
So as I understand, you still used data that you wouldn’t’ve had in practice? Would it be a viable alternative to just take the average from dates that preceded the one you’re trying to predict? In general, predicting future from past seems simple and good if the data is time-labelled, though I might be missing the issue here.
Thanks again for your interest – your questions have helped me clarify my thinking.
Yes, the latter is the question. I imagine that you might be able to use a collaborative filtering algorithm as described here. In the video, Andrew Ng supposes that you’re matching films with individuals, using a sparsely populated matrix of match values, assuming that you know which genres different individuals like. Your problem seems identical, just you know the features of the people, rather than their tastes.
I had tried this: the code that I used is here: for each date at an event, I looked at the rating matrix associated with the event but with a missing entry corresponding to the date, and then used an R library called recommenderlab to produce a guess for the missing rating.
The situation is that one gets almost nothing beyond what one would get by just using the average rating that the rater gave and the average rating that the ratee received. A typical example of how the actual ratings and guesses compare is given here – here the raters are men, the ratees are women, and the rating type is attractiveness. The rows correspond to raters and the columns to ratees. The rows have been normalized, so that the sum of a given rater’s ratings is 0. You can see that after controlling for variation in rating scales, the guesses for a given ratee are virtually identical across raters.
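To illustrate what "just using the rater's average and the ratee's average" looks like, here is a minimal sketch of an additive baseline together with the row normalization described above. The 3×3 matrix is invented, and for simplicity the sketch uses the full matrix rather than holding out the entry being predicted, so it is not a reproduction of the recommenderlab experiment.

```python
# Hypothetical ratings: rows are raters, columns are ratees.
ratings = [
    [6, 8, 4],
    [3, 6, 3],
    [7, 9, 8],
]

def additive_baseline(m):
    """Guess for (rater i, ratee j) = rater mean + ratee mean - grand mean."""
    n_rows, n_cols = len(m), len(m[0])
    grand = sum(map(sum, m)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in m]
    col_means = [sum(m[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return [[row_means[i] + col_means[j] - grand for j in range(n_cols)]
            for i in range(n_rows)]

def center_rows(m):
    """Subtract each rater's mean so that every row sums to 0."""
    return [[v - sum(row) / len(row) for v in row] for row in m]

guesses = center_rows(additive_baseline(ratings))
actual = center_rows(ratings)
```

After row normalization, the baseline's guess for a given ratee is the same for every rater by construction. So guesses that are virtually identical across raters are what one would expect from a method that adds little beyond this baseline.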
Yet collaborative filtering appears to have been applied successfully in the context of online dating, for example, as reported on by Brozovsky and Petricek (2007) and the papers that cite it, even in contexts where the average number of ratings per person is not so large, so I don’t know why I didn’t have more success with this approach.
I don’t know about multilevel modelling.
I’ve been exploring this over the past few days, and will probably write about it in my next post. For simplicity, say we want to model the probability of a decision by the k’th rater as
logit(P) = A(k) + B(k)*attrAvg
where P is the decision probability, A(k) and B(k) are constants specific to the k’th rater, and attrAvg is the average attractiveness of the rater’s partner. Rather than determining A(k) and B(k) by simply doing a logistic regression for each rater, we can instead fit A(k) and B(k) using a prior on the distributions of A(k) and B(k): for example, one can assume that they’re normally distributed. My first impression is that determining the means of the hypothesized normal distributions is simply a matter of fitting
logit(P) = A + B*attrAvg
where A and B are uniform over raters, and that the nontrivial part is determining the standard deviations of the hypothesized distributions while simultaneously estimating all of the A(k) and B(k).
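Written out, the setup described above would look roughly as follows. The index j for the partner and the symbols for the means and standard deviations are my own notation for the sketch, and treating A(k) and B(k) as independent is an additional simplifying assumption.

```latex
\begin{aligned}
\operatorname{logit}(P_{k,j}) &= A(k) + B(k)\,\mathrm{attrAvg}_j, \\
A(k) &\sim \mathcal{N}(\mu_A,\ \sigma_A^2), \\
B(k) &\sim \mathcal{N}(\mu_B,\ \sigma_B^2).
\end{aligned}
```

Here the pooled model logit(P) = A + B*attrAvg corresponds to the limiting case where both standard deviations are 0, and separate per-rater regressions correspond to letting them grow without bound.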
So as I understand, you still used data that you wouldn’t’ve had in practice? Would it be a viable alternative to just take the average from dates that preceded the one you’re trying to predict? In general, predicting future from past seems simple and good if the data is time-labelled, though I might be missing the issue here.
The reason why I hesitated to go in that direction is just that the sample sizes are already small: if one is talking about a speed dating event with 16 men and 16 women, one can use the rating averages from the first 8 rounds to predict what will happen in the last 8 rounds, but the loss in statistical power could be very large: given that women’s decisions were yes only ~33% of the time, the decision frequencies when the rater is a woman would be based on only 2-3 yes decisions per woman.
But I should do a cross check, and see whether predictive power diminishes as a function of how late a date occurred in the event. I’ll do this and get back to you.
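For reference, a sketch of the "use only preceding dates" alternative and of the sample-size arithmetic above. The function name and the toy ratings are hypothetical; the 8 rounds and the ~33% yes rate are the figures quoted above.

```python
def preceding_averages(ratings_by_round):
    """For each round t, the mean of the ratings received in rounds before t.
    The first round has no history, so its entry is None."""
    averages = []
    total = 0.0
    for t, rating in enumerate(ratings_by_round):
        averages.append(total / t if t > 0 else None)
        total += rating
    return averages

# Hypothetical ratings that one ratee received, in chronological order.
history = preceding_averages([6, 8, 7, 5])

# Expected number of yes decisions a woman makes in the first 8 rounds,
# using the ~33% yes rate quoted above.
rounds_observed = 8
yes_rate = 0.33
expected_yes = rounds_observed * yes_rate
```

The early entries of `history` are based on one or two ratings each, which is the loss of statistical power in question; the expected yes count of between 2 and 3 matches the "2-3 decisions per woman" estimate.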