Prevent duplicate content by blocking archives from search engines

Why would you want to block archive pages anyway? The answer is to prevent Google from penalizing your blog for duplicate content. Archive pages have no content of their own; what you see on them is duplicated from the individual post pages. If both the individual post and the archive page carrying the same content are indexed by Google, that constitutes duplicate content.

That’s exactly what happened to this blog: duplicate content due to archive pages. I paid the price for it during the last Google PageRank update, when this blog’s PR went down from 4 to 3.

[Image: archive duplicate in SERP]

The problem probably started when I added the Blogger Archive widget to the sidebar. The widget gave web crawlers access to the archive pages, which eventually led to those pages being indexed by Google.

Preventing duplicate content

You can prevent duplicate content by telling search engines not to index archive pages. This can be achieved by adding a “noindex” robots meta tag to the archive pages (and archive pages only).

Here’s how:

  1. Log in to your Blogger account.
  2. Go to Dashboard > Design > Edit HTML.
  3. Find the <head> tag and add the following code below it:
<b:if cond='data:blog.pageType == &quot;archive&quot;'>
<meta content='noindex,noarchive' name='robots'/>
</b:if> 
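
To check that the tag is actually being served, you could save the HTML of an archive page and of a post page and look for the robots meta tag in each. Here is a minimal Python sketch of that check. It is not part of the original tip: `robots_directives` is a hypothetical helper name, and the two sample HTML strings are made up for illustration rather than taken from a real blog.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up samples: an archive page with the tag, a post page without it.
archive_html = (
    "<html><head><meta content='noindex,noarchive' name='robots'/>"
    "</head><body></body></html>"
)
post_html = "<html><head><title>A post</title></head><body></body></html>"

print("archive:", sorted(robots_directives(archive_html)))
print("post:", sorted(robots_directives(post_html)))
```

If the conditional tag works as intended, the archive page should report both directives and the post page none.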

What happens to the archive pages already listed in SERPs?

The archive pages will eventually drop out of the search result pages. However, if you want them gone quickly, remove them from the SERPs using the URL removal tool.

62 comments to "Prevent duplicate content by blocking archives from search engines"

Yana July 11, 2011 9:20 PM    

Hi Greenlava..

This is very useful. I have found duplicated content so many times. Thank you for sharing this!

Yana July 11, 2011 9:23 PM    

Wait.. if you put "NOINDEX", doesn't that mean your blog will not be indexed by Google???

zulkbo July 12, 2011 2:29 AM    

Greetings..
I once wrote a post about this because I had read on a Western forum that our posts would last longer if we turned off the blog archive. I didn't know for sure and didn't dare to do it until I read this post of yours. Thanks, Bro..

http://www.zulkbo.com/2011/01/no-archive-perlukah-tip-blogging.html

Omemee News July 12, 2011 11:24 AM    

Great tip! Used it right away on our village newspaper !

Greenlava July 12, 2011 2:58 PM    

@Yana
This code will only prevent indexing of archive pages. The rest of the blog will be indexed as usual.

Greenlava July 12, 2011 3:12 PM    

@zulkbo
Zulkbo's method works too. But if you want to use the archive widget, you need to turn archives on and then apply the meta tag as shown above.

Yana July 12, 2011 7:58 PM    

@Greenlava: Are you sure? What's the difference between them? I'm still confused: your meta tag contains robots and NOINDEX. I asked on the Google webmaster forum about that "noindex" robots meta tag, and John from Google webmaster said I shouldn't use a robots meta tag with "NOINDEX", so I removed it and replaced it immediately...

Please let me know which one is right? :S

Thanks in advance!

ohcikgu July 13, 2011 1:16 AM    

I see. I never knew about that. This is new info to me.

Chris July 13, 2011 4:02 PM    

I just added this code to my blog. How long does it take to activate? Google still shows my blog's archive pages. It's a really good trick. Many thanks!

Greenlava July 13, 2011 9:05 PM    

@Yana
Yes, I'm sure. The robots meta tag is wrapped in an archive-page conditional tag, so it won't affect other pages.

@Chris
The code activates instantly, but SERPs will only update the next time Googlebot crawls those pages.
If you're in a hurry, use the URL removal tool.

Chris July 15, 2011 5:04 PM    

Manually removing archive pages is a hard process. Never mind, thanks for your reply!

Classier Corn July 24, 2011 7:41 PM    

Thanks for this useful tip!
Best Regards
Classier Corn

Osho @ Latest Tips And Tricks July 30, 2011 3:20 PM    

Thanks Greenlava For Tips :)

Kun,  July 30, 2011 11:45 PM    

Any idea how to use this on WordPress?

Greenlava July 31, 2011 6:23 PM    

@Kun
Install the All in One SEO plugin. You then simply tick the "Use noindex for Archives" checkbox.

Hannah August 10, 2011 7:10 AM    

If you use a table of contents, wouldn't it be the same thing as the archive?

Can you use this on a static page for a table of contents as well?

Greenlava August 11, 2011 12:49 PM    

@Hannah
No, it wouldn't be the same thing. To the search engine, your table of contents IS unique. (And even if it isn't, the table of contents is just one page, so it won't harm your site the way archives do.)

Aftab Ahmad August 11, 2011 6:14 PM    

This Is A GREAT Blog...

Rajib Kumar August 14, 2011 9:49 PM    

Nice and helpful tutorial. I am going to use this. Thanks for sharing.

Guduru Pradeep Kumar August 18, 2011 2:38 AM    

Thanks for sharing nice tutorial. Keep it up dude

Rika Susan At Home August 19, 2011 8:09 PM    

Thanks for this tip. I have been wondering how to prevent Google from indexing the archives.

Mary August 23, 2011 7:50 AM    

I was sceptical when I saw this issue mentioned in the help-forum first, but recently saw evidence of the very problem, and just applied your fix to the problem blog now.

Isn't it rather crazy that we have to do this? It seems odd that Google manages to tell itself not to index based on Labels, but we have to tell it not to do Archive.

teach August 26, 2011 4:56 PM    

Same experience here. I encountered this problem too: duplicate content for my articles.

ponselbaru August 28, 2011 11:59 PM    

Thanks Greenlava For Tips :)

Naser @ Best Tips For Blogging August 29, 2011 4:34 PM    

I use All in One SEO Pack for this on my WordPress blog.

It's absolutely incredible sharing. Honestly, before reading your post I was unaware of this; now, after reading it, I can say that I have a good understanding of it.

Garold walker September 19, 2011 4:26 PM    

Does this affect your Alexa rank? I added it and my rank has been dropping. I am going to remove it and see what happens. I will keep you posted.

Greenlava September 20, 2011 9:38 AM    

@Garold walker
No, it doesn't affect Alexa.

LC Hunt September 26, 2011 2:25 PM    

Thank you for this info, but I am confused because Google says the following on their site: "Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools."

Doesn't this mean that we should handle the noindex / noarchive of the Archive pages in this fashion?

Greenlava September 27, 2011 8:57 PM    

@LC Hunt
I believe the excerpt refers to blocking crawlers from accessing a web page using "Disallow" in robots.txt as explained here. This method is discouraged because it doesn't really prevent the duplicate page from being indexed (i.e. appearing in search results). You don't need to worry about robots.txt anyway, since Blogger doesn't give access to it.
The noindex meta tag, on the other hand, allows the bots to crawl the page but FORCES them not to index it. Search engines won't list the unindexed page, hence duplication won't take place.
As for rel="canonical", the canonical link only points to the original page. It doesn't promise to do anything. As I understand it, the duplicate page will get indexed and may or may not appear in search results. Besides, it's nearly impossible to implement rel="canonical" in this particular case (due to Blogger's limitations).
So in the end, I believe the noindex meta tag is the best tool for the job.

ayurvedic india October 7, 2011 4:07 PM    

Found this article very useful. I am going to use this for my blog.

Forex Guard October 10, 2011 7:14 PM    

Great one, I will use it. My blog is still new and Google has never visited it..

Rohit shukla November 7, 2011 2:51 AM    

I followed your steps and completed it successfully. I hope it will prevent the duplicate content problem in my blog. Today I also implemented the post-title-before-blog-title code in my blog.

iMi November 9, 2011 4:55 PM    

nice

shah Blogger November 14, 2011 8:27 PM    

Greetings greenlarva.. Nice info. I also avoid duplicate content, and this is indeed the right thing to do. Anyway, I turned off the archives function altogether and use another approach.

As an extra, in some situations you can also add rel="canonical" ;)

nice nice dude.

Asif Icbal December 2, 2011 12:13 AM    

thank you so much for this SEO tip.

wbxpress December 3, 2011 12:47 AM    

I have followed your instructions and now I can see that the number of pages in the Google index has drastically reduced. Still, I am happy that only quality links from my blog will appear in search engines. Thank you.

Best SEO Company December 17, 2011 3:47 AM    

This is such helpful information. THANK YOU for writing this and sharing your knowledge with the world. You just made my day. :) Keep up the great work and have a great day!

Nirmal Tamilnadu December 17, 2011 8:39 PM    

Thanks for the tip. I have added this code in my blog.

Steve Daily December 18, 2011 5:26 AM    

Thanks for the tip. Well-written and helpful.

Aj Banda January 6, 2012 10:11 AM    

Thanks for this tip. I never knew that widgets could affect the SERP like that.

Laurence Norah January 7, 2012 5:19 PM    

Great help. I'd been wondering why people kept hitting up my archive pages instead of my main post pages from searches, and this should solve that issue :) Many thanks!

Clommot Fatan January 12, 2012 4:12 PM    

Thanks for the tips. I hope that after implementing this trick, the index will not show my archive anymore.

Anonymous,  January 14, 2012 5:39 PM    

On my site todayprice.in I put the code in my HTML, but archives are still appearing in search engines.

Rudy Hartono January 19, 2012 6:02 PM    

Thank you for the useful and easy-to-digest information.

Supriya.P @finediningindian.com February 3, 2012 6:57 AM    

This is a good suggestion. It prevents our site from being downgraded.

Dungeoncrawler February 6, 2012 10:20 PM    

Hi, I've wondered about this issue for quite some time. I didn't want to remove my Blog Archive widget and make the blog hard to browse. Thanks for a great tip! I'm putting this to the test right now :)

Fatan February 29, 2012 7:19 AM    

Thank you sir, Now I will just give it a test to see what will happen.

Sagar Nargolkar March 9, 2012 2:46 PM    

How do I block search engines from indexing labels?

Greenlava March 11, 2012 8:11 PM    

@Sagar Nargolkar
Label-search pages are blocked by Blogger by default in robots.txt.

napnipnop March 12, 2012 10:23 PM    

Hi, I really need your help. I want to prevent several pages on my Blogger blog from being crawled and indexed by robots. Could you give me some tips about this?

Greenlava March 27, 2012 10:19 AM    

@napnipnop
Use this for each page you don't want to index:
<b:if cond='data:blog.url == &quot;PUT_PAGE_URL_HERE&quot;'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>

nptechs.blogspot.com April 13, 2012 9:51 PM    

Thanks Greenlava, but how do I block label pages from search engines?

Greenlava April 16, 2012 11:05 AM    

@nptechs.blogspot.com
Label-search pages are already blocked via Blogger's robots.txt by default.

nptechs.blogspot.com April 30, 2012 7:39 PM    

@Greenlava If you think so, look at my blog: the label search pages there are indexed by Google. See this.

Greenlava April 30, 2012 11:30 PM    

@nptechs.blogspot.com
This is your robots.txt:
User-agent: *
Allow: /

which is different from Blogger's default:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /

It looks like you're using a custom robots.txt (via Settings > Search Preferences > Crawlers and indexing).
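
For anyone who wants to compare the two files themselves, here is a rough Python sketch using the standard library's `urllib.robotparser`. Some assumptions: `is_allowed` is a hypothetical helper, `example.blogspot.com` is a placeholder domain, and the label URL follows the usual `/search/label/...` pattern. Python applies the first matching rule while Google picks the most specific one, but for these two files the outcome should be the same.

```python
from urllib.robotparser import RobotFileParser

# The custom robots.txt quoted above.
CUSTOM = """\
User-agent: *
Allow: /
"""

# Blogger's default robots.txt quoted above.
BLOGGER_DEFAULT = """\
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URLs for illustration only.
LABEL_URL = "http://example.blogspot.com/search/label/SEO"
POST_URL = "http://example.blogspot.com/2011/07/some-post.html"

for name, text in (("custom", CUSTOM), ("default", BLOGGER_DEFAULT)):
    print(name, "label page allowed:", is_allowed(text, "Googlebot", LABEL_URL))
    print(name, "post page allowed:", is_allowed(text, "Googlebot", POST_URL))
```

With the custom file the label page is crawlable, while the default file blocks it and still leaves ordinary post pages open.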

Anonymous,  May 3, 2012 2:46 AM    

Hey Greenlava,

Will this help hide the archive from being shown instead of individual posts on google search?

e.g. http://i.imgur.com/DAVqO.jpg

Greenlava May 3, 2012 7:24 PM    

@Anonymous
Yes, exactly. Once you add the meta tag, those monthly archive links will be removed (albeit not immediately) from search results.

Anonymous,  May 3, 2012 8:11 PM    

^ great! thank you very much.

AbHi May 6, 2012 2:29 PM    

Excellent article. It seems I was penalized by Google on account of having archives in the search index. Now I have removed them.. Hoping to see my traffic rise again :-) Earlier I associated this with a Penguin/Panda penalty.. little worries BTW

Geekonweb May 18, 2012 1:05 PM    

This is exactly what I was looking for.

Thanks for your tip.
