Great phrase. It’s a reminder that: you know you have a good proxy when you’re not sure that people who are gaming it are actually doing any harm.
I hereby propose the Turing test for spam—if a human judge cannot reliably distinguish spam from “genuine” content, it passes the test and qualifies as actual content. Or does it?
You may say the same about industrially processed food: yes, you can consume it. Eventually you become malnourished and the receptor sites of your cell membranes have trouble distinguishing the hormones you really need from the hormone inhibitors that get to the receptor sites first. Over time, you will lose vitality.
Spam, of course, is a ‘meat-like’ substance, so the analogy may hold...
Spam content degrades the whole, just like bad food saps your vitality over time.
Indeed, if you can’t tell spam from content, you may have identified the ‘correct’ definition of the quality you are trying to measure. I think one devious aspect of made-for-AdSense content is that it can’t be too informative, otherwise the visitors have no incentive to click on the ads. It has to be informative enough to get users through the door but not informative enough to satisfy them. Normal content is not usually like that. But figuring that out is like judging intent, a task difficult for humans, never mind machines. Would the true definition of quality need to catch even that type of abuse? Hmm...
My cynicism leads me to speculate that Google’s ownership of both the AdWords market and the search market means it may already have the data set it would need to notice people finding a page via search and then moving on to click on the ads because the content didn’t satisfy them.
The “metrics” from the two systems are probably very voluminous and may not be strongly bound to each other (for example, by in-session GUIDs, which would make things really easy), so it wouldn’t be trivial to correlate them in the necessary ways, but it doesn’t strike me as impossible. A simple estimate of the “ad bounce through” (the percentage of users who click on ads at a site within N seconds of arriving there via search) could probably be developed and added to PageRank as a negative factor, if this is not already in the algorithm.
However, despite access to the necessary data set, Google may not have the incentive to do this.
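For concreteness, the “ad bounce through” estimate described above could be sketched roughly like this. It is only an illustration under a generous assumption: that search arrivals and ad clicks can be joined on a shared session ID, which is exactly the binding that may not exist in practice. The `Event` type, the `ad_bounce_through` function, and the example site names are all hypothetical and are not anything Google is known to use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str   # hypothetical key shared by the search log and the ad log
    site: str
    timestamp: float  # seconds since some common reference point


def ad_bounce_through(arrivals, ad_clicks, n_seconds):
    """Per site, the fraction of search arrivals followed by an ad click within n_seconds."""
    # Index ad click times by (session, site) so each arrival is cheap to check.
    clicks = defaultdict(list)
    for click in ad_clicks:
        clicks[(click.session_id, click.site)].append(click.timestamp)

    total = defaultdict(int)
    bounced = defaultdict(int)
    for arrival in arrivals:
        total[arrival.site] += 1
        times = clicks.get((arrival.session_id, arrival.site), [])
        # Count the arrival if any ad click falls inside the N-second window after it.
        if any(0 <= t - arrival.timestamp <= n_seconds for t in times):
            bounced[arrival.site] += 1

    return {site: bounced[site] / total[site] for site in total}


# Made-up sample data: two arrivals at one site, one at another.
arrivals = [
    Event("s1", "example-a.test", 0.0),
    Event("s2", "example-a.test", 5.0),
    Event("s3", "example-b.test", 0.0),
]
ad_clicks = [
    Event("s1", "example-a.test", 8.0),    # 8 seconds after arriving
    Event("s3", "example-b.test", 120.0),  # two minutes after arriving
]
rates = ad_bounce_through(arrivals, ad_clicks, n_seconds=30)
```

With a 30-second window, half of the arrivals at the first site count as bounce-throughs and none at the second. A ratio like this says nothing about intent, so any threshold applied to it would be a judgment call.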
This is a very good thought I hadn’t considered. Thinking about it, on the one hand, I can imagine it being easy to circumvent by switching ad providers. On the other hand, this would drive many spammers to alternative ad providers, which would degrade those services, so it may be strategically good for Google. Or perhaps driving spammers and affiliate marketers to a competitor would help that competitor achieve critical mass, something Google would like to avoid. Also, using some kind of ‘ad bounce through’ ratio may produce an unacceptably high false positive rate, again a bad outcome.
I hope this was not too much rambling. Thanks for the interesting perspective.
I hadn’t thought of that angle. If we end up with a lot of actually good original machine-generated content (somehow) then surely that wouldn’t be a loss.
This is indeed happening. Not so much the machine-generated aspect, but the second biggest question I ask myself about my SEO clients these days is “What interesting media could they author about their field of expertise?” The biggest question is, of course, “How do I persuade them that they need to actually DO this?”
In extremis, of course, we end up with comparethemeerkat. It’s the only way to make a financial services aggregator unboring enough to get people to link to it.
Yes, and imagine if spammers went through the effort to make an android indistinguishable from a human on the outside (in behavior and form), and had it “spam” you after reading your internet postings/websites, on the pretense that it has some questions and wants to collaborate with you.
Then, it fakes an entire friendship, in which it gives you many useful ideas, in order to be able to slip in a few remarks here and there of the form, “Hey, I know a good Mexican pharmacy where you can get cheap Viagra.” (Which you point out to your “friend” is probably a scam.)
If that’s what spam comes to look like one day, I don’t want a filtered inbox!
http://www.smbc-comics.com/index.php?db=comics&id=1024#comic
I think it’s freaking awesome that someone had already made a comic about that concept.
Eliezer has suggested this would be bad.
I expect there would still be a range of spam, since crude spam only needs a very low success rate to continue to be produced, so you’ll still want your filters.
Eh, I was just going for a zinger. You’re right, it would be more accurate to say, “I don’t want my inbox to call that spam!”
Don’t forget your VK couples testing
But it could suggest fake online shops that appear similar to the real ones you use, and you’d be more likely to fall for those than for the Viagra ones.
Kinda sounds like having a useful service and supporting it with an ad-based model (but without clearly delineating the ‘sponsored links’). If I could have someone interact with my work and give me useful ideas, I would probably pay for the privilege.
This reminds me of a short story by O. Henry. I don’t remember many of the specifics, but it’s set in the world of American (or perhaps it was Mexican) small-town politics and graft. There’s a character, a career con-man, who gets to be town mayor by discovering what he says is the best graft of all: honesty. You just do what you say you’re going to do and don’t try to con people. They’ll flock to do business with you, and you make a pile of money without having to steal anything! They can’t even put you in jail for it!
ETA: A quick look at Wikipedia suggests this is from his collection of linked short stories, Cabbages and Kings, set in Central America.