For those who are not sure what I mean by a scraper, it is someone who steals your content without giving you credit for your work. They typically put AdSense all over their pages and use your RSS feed as the mechanism to get your content. I have found many “mirror” sites of my content all over the place. These sites don’t usually have a long shelf life, but they are very annoying, hence my battle to stop scraping.
I have found some interesting ways to find sites that scrape your content and republish it as their own, and to stop the scraping.
- Include links to your own site inside the post, so that when they republish the content you end up with a link back from their site to yours, which is funny. Watch your comments for these links. I then usually leave a friendly comment about how if they keep stealing my content I’ll come over and pound sand up their tuchoses, or something friendly like that. What is even funnier is that sometimes they scrape Carnivals too, so I end up with more links from these “Cloned Carnivals” as well.
- Insert your own advertising inside your RSS feed. Most of these sites are automated, so they will include the advertising in their feed as well, and you end up advertising your own site on their site.
- Search Bing or Google for posts you have done in the past and see if the exact same posts appear elsewhere. That seems obvious, but you would be surprised how many cloned copies of my articles appear ahead of the original posts on my site! Very annoying.
- Include HTML comments in the header and other tags like that; it makes your content that much easier to find. Most of my own posts have a copyright marking at the top pointing out who really owns the content. If the scraper doesn’t notice, it’s easy to find your cloned content this way as well.
- Run tools like Wordfence to see what “bots” are invading your site. The ability to block by country might slow some scrapers down as well.
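
For anyone who wants to automate the link-back and copyright-comment tricks from the list above, here is a rough Python sketch. It is only an illustration, not how my own site does it. The function names, the marker wording, and the example.com URL are all made up for the example. It also assumes you already have the suspect page's HTML in hand, since fetching it is left out, and a scraper that strips comments or rewrites links will slip past one or both checks.

```python
import html


def build_marker(post_url: str, site_name: str) -> str:
    """Hidden HTML comment naming the real owner (hypothetical wording)."""
    return f"<!-- Copyright {site_name}. Original: {post_url} -->"


def build_link_back(post_url: str, site_name: str) -> str:
    """Visible link back to the original post, escaped for use in HTML."""
    href = html.escape(post_url, quote=True)
    return f'<a href="{href}">{html.escape(site_name)}</a>'


def tag_feed_item(content_html: str, post_url: str, site_name: str) -> str:
    """Wrap a feed item's HTML with the hidden marker and a link-back footer."""
    marker = build_marker(post_url, site_name)
    footer = f"<p>This post first appeared on {build_link_back(post_url, site_name)}.</p>"
    return f"{marker}\n{content_html}\n{footer}"


def find_signals(page_html: str, post_url: str, site_name: str) -> list[str]:
    """List which of our markers survive in a suspected copy of the post."""
    signals = []
    if build_marker(post_url, site_name) in page_html:
        signals.append("copyright comment")
    if build_link_back(post_url, site_name) in page_html:
        signals.append("link back")
    return signals


if __name__ == "__main__":
    # Hypothetical post; an automated scraper would republish `item` verbatim.
    item = tag_feed_item("<p>Hello</p>", "https://example.com/post-1", "Example Blog")
    print(find_signals(item, "https://example.com/post-1", "Example Blog"))
```

Returning a list of signals rather than a plain yes or no is deliberate: a page that only carries the link back might just be someone quoting you with credit, while the hidden comment showing up on another site is a much stronger sign of a wholesale copy.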