Will Google Think I’m a “Scraper”?

Filed in Blogging, Featured, SEO by Matt McGee on October 12, 2011

About six weeks ago, Google put webmasters on alert about more upcoming changes to how it fights spam. Specifically, Google asked for help in identifying “scraper” sites — sites that copy content from somewhere else and republish it on their own site. I wrote about this on Search Engine Land and, in the comments there, a reader named Trent asked a good question that I wanted to answer here in a little more detail.

Trent’s question was as follows:

Do you think Google is only looking for those that copy content verbatim? Sometimes we’ll summarize an article (and rewrite the title), then cite/link to the original source.

In other words, how much “scraping” is too much?

This is an important question because when I give blogging advice to small business owners, I always point out that it’s a good practice to write about interesting articles from other sites/blogs that will benefit your own readers — and in the process of doing that, it’s okay to quote part of the article you’re writing about.

But, as Trent asks … how much is too much? When does quoting become scraping? I can’t speak for Google (or Bing), but here’s my opinion:

1.) If the only thing you do on your site/blog is summarize other people’s articles and link to the original, you might be considered a “scraper.”

A local real estate agency recently started blogging and, unfortunately, many of their early articles were copied word-for-word from another real estate site. Here’s one of the articles on their site:

[Screenshot: the agency’s blog post, copied word-for-word from the original article]

The original version of the article was published here: http://www.kcmblog.com/2011/09/15/how-to-pick-your-lender/ — that blog belongs to a company that offers various training and guidance to real estate agencies and agents. So, for all I know, both parties might be perfectly fine with having the articles copied in full.

But here’s the thing: Google and Bing won’t be perfectly fine with it. Search engines don’t want to crawl and index multiple versions of the same content. So, business relationship or not, the local real estate agency is at risk of getting the “scraper” label because many of its blog posts are direct copies from another site.

2.) If you actually write something original about the other article in the process of linking to it, that should be fine.

I just did this very thing last week in this post: Two Must-Reads About Blogging.

There were two articles I wanted to share with readers. I quoted a small portion of each one, wrote some of my own commentary and linked to the originals. There’s nothing wrong with doing that. The Internet was built on the idea of linking to interesting content on other sites; you shouldn’t get penalized for doing that, as long as you’re not copying/quoting too much of the original article.

3.) Ultimately, as long as you have plenty of your own high-quality, original content being published alongside these shorter pieces that link to other content, you’ll be fine.

Great content that’s successfully promoted creates trust, and when you’ve earned the search engines’ trust, you shouldn’t have to worry about being labeled a “scraper.”

(Stock photo via Shutterstock and used with permission.)

Comments (8)


  1. Kevin says:

    Great write-up on scrapers. A well-coded site with well-structured, quality content will always win. You can be an SEO genius, and it won’t make up for working with crappy, copied content.

  2. Liz says:

    Quality content is one thing, but I think quantity is important too, because the more quality content you have, the more any quotations or similarities will be dwarfed – hopefully!

  3. Ben Dover says:

    I agree with Liz; quantity really does get you on the map faster. But if most of that material isn’t very good, then it’s useless. What I’m saying is that weighting quantity a little more heavily than quality, say 60/40, will do.

  4. Matt B. says:

    In cases like this, my instinct says the spammers and scrapers will always stay a step ahead of the search engines. Many people think Google just made the situation worse with its efforts earlier this year. Until Google improves or perfects its detection, companies should lean toward more original content to make sure they aren’t being penalized.

  5. james ascot says:

    Do you think this applies to blog and Twitter/Facebook content?
    We often put similar content on both, as creating different original content for each site would be too time-consuming for a small business.

    • Matt McGee says:

      Not sure what you’re asking, James. Twitter has a 140-character limit, so how could you be posting the same content there as on your blog? You’re not posting tweets on your blog, are you? Likewise, Facebook isn’t really designed for blog-style posts. The stuff you post on social networking sites should support what you post on your blog, but should not be the same.