Thoughts on developing a frontpage Recommendation algorithm for LW
It was striking how much this project felt like experiencing the standard failure modes they warn you about with AI/optimizers. The primary underlying recommendation engine we’ve been using is recombee.com, saving us the work of building out a whole pipeline and possibly benefiting from their expertise. Using their service means that things are an extra degree of black-boxy, because we don’t even really know what algorithms they’re using under the hood.
What was clear is that their models were effective. Our earliest results seemed to be really doing something, with the clickthrough rate of the randomly assigned test group being 20-30% higher than control’s. Only closer investigation showed that this CTR was achieved by showing people the most clickbait-y posts (e.g. “If you weren’t such an idiot...”). The algorithm is good at the stated objective, but not at my actual objective.
Two main tweaks to the algorithm’s default behavior got us most of the way to its current state:
(a) initially I didn’t configure the “rotation rate” very well, so the algorithm would keep showing the same recommendations, i.e. those at the top of its list, i.e. the most clickbait-y ones.
(b) we applied a ~hard constraint that the algorithm serve 50% posts older than a year, 30% posts between 30 and 365 days old, and 20% posts under 30 days old (see the sketch below).
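To make the age constraint concrete, here is a minimal TypeScript sketch of one way to enforce the 50/30/20 split: request each age bucket separately and interleave the results. The `fetchRecommendations` helper and the bucket definitions are hypothetical stand-ins for whatever the underlying engine (Recombee, in our case) actually exposes; this illustrates the bucketing logic rather than describing our production code.

```typescript
// Hypothetical age buckets matching the ~hard constraint described above:
// 50% older than a year, 30% between 30 and 365 days, 20% under 30 days.
type AgeBucket = { share: number; minAgeDays: number; maxAgeDays: number | null };

const AGE_BUCKETS: AgeBucket[] = [
  { share: 0.5, minAgeDays: 365, maxAgeDays: null }, // older than a year
  { share: 0.3, minAgeDays: 30, maxAgeDays: 365 },   // 30-365 days old
  { share: 0.2, minAgeDays: 0, maxAgeDays: 30 },     // under 30 days
];

// Stand-in for a call to the underlying recommendation engine, filtered by post age.
declare function fetchRecommendations(
  userId: string,
  count: number,
  bucket: AgeBucket
): Promise<string[]>; // returns post IDs

async function recommendWithAgeConstraint(
  userId: string,
  totalCount: number
): Promise<string[]> {
  // One filtered request per bucket, sized by its share of the total.
  const perBucket = await Promise.all(
    AGE_BUCKETS.map((bucket) =>
      fetchRecommendations(userId, Math.round(bucket.share * totalCount), bucket)
    )
  );
  // Interleave buckets so old and new posts are mixed rather than grouped.
  const results: string[] = [];
  const maxLen = Math.max(...perBucket.map((r) => r.length));
  for (let i = 0; i < maxLen; i++) {
    for (const bucketResults of perBucket) {
      if (bucketResults[i] !== undefined) results.push(bucketResults[i]);
    }
  }
  return results.slice(0, totalCount);
}
```

The design point is that the quota is enforced outside the optimizer, rather than hoping the learned model respects it on its own.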
These changes, particularly constraining the age, did hurt the clickthrough rate achieved, but they just seem like the obviously correct choice rather than letting LessWrong drift towards clickbait-y-ness. Having worked on a recommendation engine, I begin to feel more deeply why the rest of the Internet looks how it does: a slow decay from algorithms locally pursuing CTR.
This is obviously a very crude way to make the algorithm serve the kinds of posts we think are actually good, but forcing the algorithm to serve a wider variety of posts, including older ones, seems to diversify what it serves in a good way. My own feeling is that post quality is currently 7/10 and personalization is 6/10.
The signals the algorithm is being fed are “did you click on it?”, “did you read it?” (measured as scrolling far enough that you reached the comments section), and “did you vote?”
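As a rough illustration, here is how the “did you read it?” signal might be captured client-side and forwarded to the engine. The `sendInteraction` helper, the `'comments'` element id, and the event names are assumptions made for this sketch, not the actual LessWrong or Recombee interfaces.

```typescript
// Illustrative signal types matching the three feedback events described above.
type InteractionKind = 'click' | 'read' | 'vote';

// Stand-in for however signals actually get shipped to the recommendation engine.
declare function sendInteraction(
  userId: string,
  postId: string,
  kind: InteractionKind,
  value?: number // e.g. +1 / -1 for votes, if the engine distinguishes them
): Promise<void>;

// "Did you read it?" measured as scrolling far enough to reach the comments section.
function watchForRead(userId: string, postId: string) {
  const comments = document.getElementById('comments'); // assumed container id
  if (!comments) return;
  const observer = new IntersectionObserver((entries) => {
    if (entries.some((e) => e.isIntersecting)) {
      void sendInteraction(userId, postId, 'read');
      observer.disconnect(); // only report the first time the reader gets there
    }
  });
  observer.observe(comments);
}
```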
Unfortunately, people seem to vote on the more clickbait-y content at least as much as on newer content, and voting is rare overall, so voting is not a signal that cuts through clickbait-y-ness. People also seem to vote much less on older content; perhaps voting feels like it matters less at that point. I’m not sure that means people value this content less, which makes the signal hard to interpret. These elements mean that we still don’t have a great feedback loop from what actually seems good. Our iteration currently is a mix of “clickthrough rate” and “manually inspecting the recommendations for seeming good”.
Is the algorithm getting a signal on positive vs negative voting, or are all votes treated as a positive signal?
Signal on positive vs. negative voting, I am pretty sure
Can confirm this