Duplicate content is content that appears on multiple sites across the internet, or multiple times on a single domain.
Some people believe that duplicate content penalties, or the idea that a site can be adversely affected by copies of its content elsewhere on the web, are a myth.
This is not true. Other copies of your content on the web can hurt your site, and sometimes do.
Google’s bots can get confused about who created the content or which copy should rank higher.
Sometimes Google will even rank another site above the original source of the content. When that happens, some original creators believe they have received a “duplicate content penalty”.
Take, for instance, this article in which Matt Cutts, a Google engineer, responds to the many webmasters complaining that sites that scrape (copy and republish) content are ranking higher than the original: Source
Improved Scraper Detection
The next update will target a common webmaster complaint related to the original Panda/Farmer update: sites that scrape and re-publish content and are out-ranking the original source of the content.
“A change has been approved that should help with that issue,” Cutts said.
Google acknowledges that some sites whose content is stolen and republished elsewhere on the web are being hurt by it. Call it bad luck, call it a duplicate content penalty, call it whatever you like, but your site can be affected by other people republishing your content.
Is this temporary? Will it be “fixed” in the next Google update? I’m sure Google will try to do better, but there is no need to take chances. Let’s do what we can to prevent duplicate content.
How We Can Prevent Duplicate Content
1. Don’t Publish Full RSS Feeds – I like reading full RSS feeds as much as the next reader, but publishing a full feed hands your content to scrapers, gift-wrapped.
2. Check For Stolen Content Regularly – Search Google for phrases from your blog posts, wrapped in quotes, to find copies of your work. Spot check from time to time.
3. Remove Duplicate Content From Other Sites – Half of the time an email to the webmaster will get the content removed. You might also have to contact the violator’s web host. Getting rid of duplicated content on other sites is a tough game, but email is a good start.
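Tip 2 can be partially automated. As a minimal sketch, the snippet below builds an exact-match (quoted) Google search URL for a distinctive phrase from one of your posts; the function name and example phrase are my own, not from any library:

```python
from urllib.parse import quote_plus

def quoted_search_url(phrase: str) -> str:
    """Build a Google search URL that looks for the phrase as an exact match.

    Wrapping the phrase in double quotes tells Google to search for the
    words in that exact order, which is how you spot verbatim copies.
    """
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')

# Pick a distinctive sentence from your post and open the resulting URL
# in a browser to spot-check for scraped copies.
url = quoted_search_url("a distinctive sentence from your post")
print(url)
```

You could feed this a sentence from each recent post and check the results by hand, or on a schedule.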
What About Duplicate Content On The Same Site?
Sometimes you can have multiple copies of the same article on the same site: on the home page, the permalink page, archive pages, a popular-posts page, and so on.
Google is getting better at picking the best version, but I would still reduce duplicate content as much as possible.
Never trust that Google’s bot will pick the same copy to show to searchers that you would want shown.
The easiest way to exclude duplicate content on the same site from search engines is to use the NOINDEX robots meta tag on pages like category and archive pages. The easiest way to do that is with a plugin like Yoast SEO.
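For reference, this is what the NOINDEX instruction looks like in a page’s HTML head (the “follow” value, which lets bots keep following links on the page, is a common pairing; a plugin like Yoast SEO emits a tag along these lines on archive pages when configured to):

```html
<head>
  <!-- Tell search engines not to index this page, but still follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```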
Duplicate Content Is NOT A Myth
Websites are affected by multiple copies of their pages being found around the web. Not always, but it is a potential problem. You can do things to protect yourself, like not publishing full RSS feeds, checking for copies regularly, and using plugins.
Not every duplicated web page is hurt, but do you really want to gamble that it won’t hurt you and ignore it?
Here is Matt Cutts talking about duplicate content: