I think it’s interesting to note that this is precisely why Google is so insistent on defending its retention of user activity logs. The logs contain proxies under the control of the end user, rather than the content producer, and thus allow a clean estimate of (the end user’s opinion of) search result quality. This lets Google spot manipulation after the fact, and test, counterfactually, whether a new algorithm tweak would have improved the quality of results.
(Disclaimer: I currently work at Google, but not on search or anything like it, and this is a pretty straightforward interpretation starting from Google’s public statements about logging and data retention.)
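To make the counterfactual part concrete, here is a minimal sketch of the kind of offline replay such logs enable. The log schema, the reciprocal-rank metric, and the toy data are all my own illustrative assumptions, not anything about Google’s actual pipeline (which, again, I don’t work on):

```python
# Minimal sketch of offline (counterfactual) evaluation from activity logs.
# All names and data here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class LogEntry:
    query: str
    shown: list[str]      # result URLs in the order they were served
    clicked: str | None   # the result the user ultimately chose, if any

def reciprocal_rank(ranking: list[str], clicked: str | None) -> float:
    """Score a ranking by where it places the result the user chose."""
    if clicked is None or clicked not in ranking:
        return 0.0
    return 1.0 / (ranking.index(clicked) + 1)

def offline_eval(logs: list[LogEntry], rerank) -> float:
    """Replay logged queries through a candidate ranker and measure how
    highly it would have placed the users' chosen results."""
    scores = [reciprocal_rank(rerank(e.query, e.shown), e.clicked) for e in logs]
    return sum(scores) / len(scores)

# Toy example: compare the ranking as served against a hypothetical tweak.
logs = [
    LogEntry("cheap flights", ["spam.example", "airline.example"], "airline.example"),
    LogEntry("cheap flights", ["spam.example", "airline.example"], "airline.example"),
]
baseline = offline_eval(logs, lambda q, shown: shown)        # as served
tweak = offline_eval(logs, lambda q, shown: shown[::-1])     # candidate reorder
print(f"baseline MRR={baseline:.2f}, tweak MRR={tweak:.2f}")  # 0.50 vs 1.00
```

Because the clicked result is a proxy the end user controls, the same logs can score any number of candidate tweaks without running a live experiment on real traffic.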
Thanks for this. I’m constantly amazed at the relevant information that keeps turning up here.
I agree that if anything is to be improved, information from other stakeholder groups with different incentives (such as end users) must be integrated. Given how heavily end users outnumber manipulators, this is a pretty good source of data, especially for high-traffic keywords.
However, what would stop spammers who focus on some low-traffic keyword from feeding innocent-looking user logs into the system? I guess the fundamental question is: beyond raw quantity, how can anyone trust that the user logs come from real end users?
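To put rough numbers on my worry (figures invented purely for illustration): a fixed budget of fake sessions barely nudges a popular keyword’s click signal, but it can dominate an obscure one’s.

```python
# Toy illustration (hypothetical numbers) of why low-traffic keywords are
# the soft target: the same fixed budget of fake sessions barely moves a
# popular query's observed click-through rate but swamps an obscure one's.

def observed_ctr(real_clicks: int, real_views: int,
                 fake_clicks: int, fake_views: int) -> float:
    return (real_clicks + fake_clicks) / (real_views + fake_views)

fake = (100, 100)  # attacker injects 100 sessions, all "clicking" their page

# High-traffic keyword: 50,000 genuine views, 500 clicks on the spam result.
print(observed_ctr(500, 50_000, *fake))  # ~0.012, barely moved from 0.010

# Low-traffic keyword: 200 genuine views, 2 clicks on the spam result.
print(observed_ctr(2, 200, *fake))       # 0.34, the signal is now mostly fake
```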
(I understand that it may not be possible for you to get into a discussion about this; if so, no worries.)
I’m afraid I can’t say much beyond what I’ve already said, except that Google places a fairly high value on detecting fraudulent activity.
I’d be surprised to learn that no bad guys have ever tried to simulate the search behavior of unique users. But (a) assuming those bad guys are a problem, I strongly suspect that the folks worried about search result quality are already on to them; and (b) I suspect bad guys who try such techniques give up in favor of the low-hanging fruit of more traditional bad-guy SEO techniques.
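Purely as a generic illustration of the kind of cheap signal available here, and emphatically not a claim about what Google actually does: simulated users tend to cluster on some session fingerprint in a way a diverse organic population doesn’t, and low entropy over that fingerprint is a red flag.

```python
# Sketch of one generic fraud signal (my assumption, not Google's method):
# genuine users searching the same keyword arrive with diverse contexts,
# while simulated users tend to cluster. Low Shannon entropy over some
# session fingerprint is a cheap anomaly indicator.

import math
from collections import Counter

def fingerprint_entropy(fingerprints: list[str]) -> float:
    """Shannon entropy (bits) of the session-fingerprint distribution."""
    counts = Counter(fingerprints)
    n = len(fingerprints)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical fingerprints: browser plus network block, say.
organic = ["ff/net-a", "chrome/net-b", "safari/net-c", "chrome/net-d",
           "edge/net-e", "ff/net-f", "chrome/net-g", "safari/net-h"]
botnet = ["chrome/net-x"] * 7 + ["chrome/net-y"]

print(f"organic entropy: {fingerprint_entropy(organic):.2f} bits")  # 3.00
print(f"botnet entropy:  {fingerprint_entropy(botnet):.2f} bits")   # ~0.54
```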