With the latest Google update for thin, duplicate and poor-quality content (called Panda) currently rolling out, I thought I would do a post on duplicate content vs syndicated content and spell out exactly what constitutes an issue for Google.

Unless you have been living under a rock for the last ten years, you must be used to hearing the phrase "content is king!" And it is! Google loves it! Why? Because Google uses its spiders to crawl the web, reading your content to decide what your website is about, and, amongst other signals, uses this to decide which websites should rank highly for which key phrases.

So What’s The Issue?

Google’s index is massive, and before 2011 it was quite easy to manipulate the rankings using a mixture of lots of content and spammy links. Webmasters got lazy and would basically write content for search engines without users in mind. You would find lots of pages so heavily laced with keywords that they were almost unreadable to the human eye, with poor grammar and punctuation. This became a big issue with the rise of content farms, which were heavy on content but very thin on actual factual information and leaned heavily on keyword phrases to manipulate the rankings. On top of this we had scraper sites, which would scrape other websites’ content (plagiarism) and use it to generate their own search traffic.
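To make "heavily laced with keywords" a bit more concrete, here is a rough self-audit sketch in Python that measures what share of a page's words are taken up by one target phrase. The function name `keyword_density` and the sample copy are hypothetical, and this is not how Google scores pages; it simply shows how lopsided stuffed copy looks next to copy written for people.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Share of the words in `text` taken up by exact occurrences of `phrase`.

    A rough self-audit only (hypothetical helper): Google does not publish a
    "safe" density and does not rank pages with this formula.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's words appear in sequence.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


# Hypothetical sample copy for a local plumber.
stuffed = ("Cheap plumber Sheffield. Our plumber Sheffield team is the best "
           "plumber Sheffield has. Call a plumber Sheffield today.")
natural = ("We are a family-run plumbing firm based in Sheffield, fixing "
           "leaks, boilers and blocked drains across South Yorkshire.")

print(f"{keyword_density(stuffed, 'plumber Sheffield'):.0%}")  # 44%
print(f"{keyword_density(natural, 'plumber Sheffield'):.0%}")  # 0%
```

There is no magic threshold here; the point is that copy written for readers rarely repeats an exact phrase anywhere near this often.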

Google needed a way to filter out this poor content, which was quite effective at clogging up its search results, and released the Panda update in 2011 (named after Google engineer Navneet Panda). It set out to penalise poor content and affected around 12% of searches! It targeted content farms, scraper sites, sites with too many ads relative to content, and other factors. With subsequent Panda updates the focus has shifted to "thin" content: content that is poorly written, duplicated or just too keyword-centric, with no real value to visitors.
For example:

  • Having pages for “plumber Sheffield”, “plumber York” and “plumber Doncaster” with the same content on each page and just the city changing
  • Mentioning a keyword excessively, or hiding text in the footer or header
  • Copying manufacturers’ descriptions for a product word for word
  • Showing the same page at various URLs (an issue for ecommerce sites). For example, when using a content management system there may be many ways to reach the same page, e.g. products/blue-widget, search/products/blue-widget, /blue-widget
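Google hasn't published exactly how it spots duplicate or near-duplicate pages, but a common general-purpose technique is to split each page into overlapping word sequences ("shingles") and measure how many they share (Jaccard similarity). The Python sketch below uses that technique as a stand-in, assuming plain text has already been extracted from each page, to compare two hypothetical city pages built from one template, just like the “plumber Sheffield” / “plumber York” example. The `shingles` and `jaccard` helpers and the template text are illustrative only.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Break text into the set of overlapping k-word sequences ("shingles")."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical location-page template where only the city name changes.
template = ("Need a reliable plumber in {city}? Our qualified engineers cover "
            "every postcode, offer same-day call-outs and fix leaks, boilers "
            "and blocked drains at fair prices.")
sheffield = template.format(city="Sheffield")
york = template.format(city="York")

print(round(jaccard(sheffield, york), 2))  # 0.79
```

The two pages share 22 of 28 distinct three-word shingles even though only the city differs. Scores that close to 1.0 across a set of location pages are a sign each page needs genuinely different content; the shingle size of 3 is an arbitrary choice for this sketch.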

Some sites, especially ecommerce sites, took major hits and have never quite recovered from the first Panda update.

How To Fix

I love SEO in its present-day form because it is a return to common sense. We are not focusing on how many times we repeat a keyword or making sure we italicise, bold or underline words; we're employing copywriters, figuring out buyer personas and finding ways to increase conversion using surveys, heatmapping and lead generation forms. The truth is it’s easy: forget search engines and concentrate on users. Your pages need to have value and engage your audience. Search engines have got very clever at working out what you do. I am not trying to talk myself out of a job here, but my job is to point Google in the right direction: what you want to be found for and what your buyers look for when using your website, giving Google a gentle nudge.

Canonical URLs

As I mentioned above, content management systems and ecommerce sites especially took a hit with Panda, sometimes unfairly. This came about because of the way pages are built on dynamic sites. On a static site you have actual pages that exist as files, for example downloads.html, services.html and products.html. Dynamic sites are often written in a programming language called PHP, and the page doesn’t actually exist as a file but is generated when a request is made. This is great for managing large sites, but it can often lead to many ways of reaching the same page, as in the example I gave above. Google realised this was an issue and came up with a tag you can place in the <head> of your pages, rel="canonical". This lets you tell Google which pages are duplicates and which version it should actually index. For more information on canonicalization see this post from Moz.
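As a rough illustration of rel="canonical" in practice, the Python sketch below maps the three example URLs for the blue widget onto one preferred URL and builds the tag that would go in each page's <head>. The domain `www.example.com`, the `CANONICAL_PATHS` lookup table and the `canonical_link` function are all hypothetical; a real CMS would normally generate this from its own routing rather than a hand-written table.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical preferred domain and route table for the blue-widget example.
CANONICAL_BASE = "https://www.example.com"
CANONICAL_PATHS = {
    "/products/blue-widget": "/products/blue-widget",
    "/search/products/blue-widget": "/products/blue-widget",
    "/blue-widget": "/products/blue-widget",
}


def canonical_link(requested_url: str) -> str:
    """Build the <link rel="canonical"> tag for whichever URL was requested."""
    path = urlsplit(requested_url).path.rstrip("/") or "/"
    # Paths not in the table are treated as their own canonical version.
    canonical_path = CANONICAL_PATHS.get(path, path)
    # Query string and fragment are deliberately dropped (assumption: they
    # never change what the page actually shows).
    href = CANONICAL_BASE + canonical_path
    return f'<link rel="canonical" href="{escape(href)}">'


for url in (
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/search/products/blue-widget?sort=price",
    "http://example.com/blue-widget/",
):
    print(canonical_link(url))
```

All three variants print the same tag pointing at /products/blue-widget. Dropping the query string only makes sense when parameters such as sorting don't change the page's content; pages that differ genuinely (e.g. page 2 of a category) should keep their own canonical URL.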

Syndicated Content

Before we end this article on duplicate content, I just want to touch on syndicated content. If you’re not sure what syndicated content is, think of PR: articles are often written up and then distributed across many channels to get as much exposure as possible. The key here is exposure, not manipulation, and so Google doesn’t have a problem with it. So if you want to host your article on many sites to spread the net, then do it, but realise that Google will usually only show one version of the article, and if the sites you host it on have more authority than you, their version will show, not yours! I would much rather someone read an article on my site, where I can use calls-to-action to other areas of the site or hopefully get them to sign up to our newsletter, than lose them to a third-party site that isn’t interested in my bottom line.