Dealing With Blog Content Scraping

Check out my new guide to starting a blog to learn how to set up your own blog properly.

When you’re managing a blog, it’s not all about writing content and marketing. There is also the dreaded technical side that has some bloggers cringing in fear. Yes, some technical work is best left to professionals, but you should also understand the basics and be able to handle some of it on your own.

If you’re new to this series, check out the previous blogging tips posts.

And now the scary technical lesson…

How to Deal with Blog Content Scrapers

Lately several blogger friends have e-mailed me, worried that a blog is stealing their content. In nerd lingo, that blog is ‘scraping’ their content. In this post I’ll explain exactly what this is, why you shouldn’t worry, and how you can deal with it.

What Is Content Scraping?

This is when a blog is set up to publish your new posts on their blog too. Essentially, they are using your handy RSS feed against you. Some content scrapers cite the source, and some display only an excerpt of your post. The highly unethical ones republish your entire posts with no mention of where they came from. They might even try to pass them off as their own content.

While this strategy comes in different flavors, it all boils down to someone trying to benefit from the writing you put your blood, sweat and tears into. Sorry, scammers, but that is not what RSS feeds are meant for, so don’t even try to justify it.

Some content scraping blogs put their own ads on your scraped content to profit directly. Others have a more manipulative strategy: they deliberately collect trackback links. Once they have built up their stats, they may well remove the scraped content and exploit those stats.

Should You Be Concerned?

Some of you may have heard that Google doesn’t like duplicate content. Because of this, you may be worried about your content being duplicated on other blogs: instead of being unique to your blog, it could show up all over the place.

Luckily, search engines are smart enough to tell where content was published first and which site is the original source. The more established your blog is, the more often search engines check it for new content, so they usually find your version before any scraped copy. Content scrapers tend to target established blogs because those publish consistently and make a reliable content source. Scraper blogs, on the other hand, are almost never established. They’re likely to piss off too many people along the way and end up either getting shut down or landing in the search engines’ bad books.

Sure, it sucks that they are ripping off your posts and possibly profiting from it, but ultimately it doesn’t affect your own blog. There’s no reason to stress out about it.

You can even get some minor benefit from scraping by making a point of including internal links in every single post. You should be doing that anyway as an SEO best practice. If anyone is going to scrape your content, you might as well get some extra backlinks, even if they are low quality. They also reinforce who actually owns the content.
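If you want to check whether a scraped copy really carries your links, a quick script can count them. This is a minimal sketch under a few assumptions: you have already saved the scraper’s page HTML, and `links_back_to` and `myblog.example` are hypothetical names used for illustration. It only counts absolute links whose host is your domain or one of its subdomains; a relative link like /older-post/ would resolve to the scraper’s own site, so it is ignored.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:  # value is None for a bare href
                    self.hrefs.append(value)


def links_back_to(html: str, my_domain: str) -> List[str]:
    """Return links in `html` whose host is `my_domain` or a subdomain of it.

    Hypothetical helper: relative links have no host, so they never match.
    """
    my_domain = my_domain.lower()
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    matches = []
    for href in parser.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(href)
    return matches
```

For example, `links_back_to(scraped_html, "myblog.example")` returns the links in the copied post that still point back to you.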

Can You Do Something About It Anyway?

While I personally choose to ignore this nonsense, I realize that you may be more protective of your work. On principle alone, you may want to put an end to the digital theft. So what can you do?

Contacting the Sketchy Blogger: The first thing you can try is politely e-mailing the offending blogger. In some cases, they may somehow not have realized they were offending anyone… not that I buy that lame excuse. Still, go along with their excuses and stay civil if you want the issue resolved. If you like, you can swear at your computer monitor while keeping up a polite front in your e-mail.

One problem is that many of these content thieves don’t offer an easy way to contact them on their blogs. They have probably received too much hate mail. When you can’t find a working contact page, try to find the e-mail address attached to their domain by entering the domain name into a WHOIS website such as Domain Tools. With these kinds of unethical bloggers, that probably won’t work either, because the contact info will likely be private. As a last resort, try e-mailing generic addresses that may exist on that domain, such as webmaster@ or info@.
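For the curious, the lookup that WHOIS websites perform can be sketched in a few lines of Python. This is a minimal sketch, not how Domain Tools works: it sends a plain query over TCP port 43 (the WHOIS protocol described in RFC 3912), assumes whois.verisign-grs.com as the server (it answers for .com and .net; other extensions use other servers), and the helper names are hypothetical. For .com domains the registry reply often holds little more than the registrar’s details, so any address it finds may belong to the registrar rather than the site owner. The fallback helper only builds the generic webmaster@ and info@ guesses; nothing guarantees those mailboxes exist.

```python
import re
import socket
from typing import List

# Assumption: this server answers WHOIS queries for .com/.net domains.
DEFAULT_WHOIS_SERVER = "whois.verisign-grs.com"

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def whois_query(domain: str, server: str = DEFAULT_WHOIS_SERVER,
                timeout: float = 10.0) -> str:
    """Send a plain-text WHOIS query (TCP port 43) and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")  # query ends with CRLF
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_emails(whois_text: str) -> List[str]:
    """Return unique e-mail addresses in a WHOIS reply, in order of appearance."""
    seen: List[str] = []
    for match in EMAIL_RE.findall(whois_text):
        addr = match.lower()
        if addr not in seen:
            seen.append(addr)
    return seen


def fallback_addresses(domain: str) -> List[str]:
    """Generic addresses to try when WHOIS data is private (guesses only)."""
    return [f"{prefix}@{domain}" for prefix in ("webmaster", "info")]
```

A typical call would be `extract_emails(whois_query(domain))`, falling back to `fallback_addresses(domain)` when that returns an empty list.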

Just remember that if you get visibly angry with these people, it might hurt your blog more. An online enemy can come back to haunt you, watching your every move and waiting for a chance to strike. Or they might just laugh at you, which will piss you off even more. It’s just not worth being rude to them.

Notifying Google: To make sure you are protected with Google, you can notify Google about content that violates your copyright through its DMCA reporting form (DMCA = Digital Millennium Copyright Act). You fill out a short form listing the specific pages that contain stolen content and the URLs of the original sources.

Once Google confirms that the scraper is not the original publisher, it removes the stolen pages from its index. I don’t know for sure, but if enough of a scraper’s content is stolen, Google might de-index the entire site. Does anyone know if this is the case? Either way, Google has ways to determine who most likely owns the content.

Reporting to Their Web Host: This strategy can have a much bigger impact, because it could result in the web hosting company taking the blog offline. With any luck, they won’t pop up again with a different web host.

You may be asking, “How do I find out which company is hosting their website?” Try a website called Who Is Hosting This. Enter the domain name and it shows which company it thinks is hosting the website. Unfortunately, this is not foolproof, because some web hosting companies lease servers from other companies. For my own blog it got the answer wrong, likely because of such a setup.
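Services like Who Is Hosting This don’t document exactly how they work, but one plausible starting point is to resolve the domain to an IP address and see who runs that address. The sketch below uses hypothetical function names and only Python’s standard library: a forward DNS lookup, a reverse (PTR) lookup on the resulting address, and a crude guess from the last two labels of the reverse name. It shares the weakness described above. If the site sits behind a CDN or a reseller, the answer points at that company rather than the real host, and the two-label guess breaks for suffixes like .co.uk.

```python
import socket
from typing import Optional, Tuple


def lookup_server(domain: str) -> Tuple[str, Optional[str]]:
    """Resolve a domain to an IPv4 address, then try a reverse-DNS (PTR) lookup."""
    ip = socket.gethostbyname(domain)
    try:
        ptr_name, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        ptr_name = None  # many servers have no PTR record at all
    return ip, ptr_name


def provider_hint(ptr_name: Optional[str]) -> Optional[str]:
    """Crude guess at the provider: the last two labels of the reverse-DNS name.

    Hypothetical heuristic: wrong for suffixes like .co.uk, and blind to
    resellers or CDNs sitting in front of the real host.
    """
    if not ptr_name:
        return None
    labels = ptr_name.rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else None
```

For example, `provider_hint(lookup_server("scraper.example")[1])` gives a starting guess that you can then confirm by searching for the provider’s name.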

If you can find out who hosts the website, you should be able to report the issue to them. Larger web hosting companies sometimes have their own DMCA reporting form, which routes your report to the right department for quick results. If googling the host’s name plus “DMCA” doesn’t turn up such a form, use their website contact form instead. Explain the situation, citing the specific URLs of the stolen content and of your original posts.

Web hosting companies don’t want to host any kind of illegal content, including copyright infringements, so a reputable host will take action if the infringement is obvious.

Summary

How you react to this behavior is ultimately up to you, but be aware of what each approach is likely to accomplish. Sometimes the effort you put into dealing with it only makes the situation worse or wastes your time. It’s nice when someone fights back against these guys, but ignoring them won’t hurt your blog. Thank you, though, to the people who do choose to fight the fight.
