Prevent duplicate content by blocking archives from search engines

Why would you want to block archive pages anyway? To prevent Google from penalizing your blog for duplicate content. Archive pages have no content of their own; what you see on them is duplicated from the individual post pages. If both the individual post page and the archive page carrying the same content are indexed by Google, that constitutes duplicate content.

That’s exactly what happened to this blog: duplicate content due to archive pages. And I paid the price for it during the last Google PageRank update, when this blog’s PR went down from 4 to 3.

[Image: archive duplicate in SERP]

The problem probably started when I added the Blogger Archive widget to the sidebar. The widget gave web crawlers access to the archive pages, which eventually led to Google indexing them.

 

Preventing duplicate content

You can prevent duplicate content by telling search engines not to index archive pages. This can be achieved by adding a “noindex” robots meta tag to the archive pages (and archive pages only).

Here’s how:

  1. Log in to your Blogger account.
  2. Go to Dashboard > Design > Edit HTML.
  3. Find the <head> tag and add the following code below it:
<b:if cond='data:blog.pageType == &quot;archive&quot;'>
<meta content='noindex,noarchive' name='robots'/>
</b:if> 
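
One way to confirm the tag is doing its job is to look at the rendered HTML of an archive page and of a normal post page. The sketch below is only an illustration in Python and is not part of the original tutorial: the two HTML strings are made-up stand-ins for what the template would render, and robots_directives is a hypothetical helper name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> also end up here.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical rendered pages (assumption: simplified for illustration).
archive_page = (
    "<html><head>"
    "<meta content='noindex,noarchive' name='robots'/>"
    "</head><body>archive listing</body></html>"
)
post_page = "<html><head><title>A post</title></head><body>post</body></html>"

print("archive page:", sorted(robots_directives(archive_page)))
print("post page:", sorted(robots_directives(post_page)))
```

If the conditional tag works as intended, only the archive page should report noindex; a post page should report no robots directives at all.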

 

What happens to the archive pages already listed in SERPs?

The archive pages will eventually drop out of the search result pages. However, if you want them removed quickly, use the URL removal tool.

94 comments to "Prevent duplicate content by blocking archives from search engines"

Unknown July 11, 2011 at 9:20 PM    

Hi Greenlava..

This is very useful. I have come across duplicated content so many times. Thank you for sharing this!

Unknown July 11, 2011 at 9:23 PM    

Wait.. doesn't adding "NOINDEX" mean your blog will not be indexed by Google???

zulkbo July 12, 2011 at 2:29 AM    

Greetings..
I once wrote a post about this, because I had read on a Western forum that our posts would last longer if we turned off the blog archive. I wasn't sure and didn't dare try it until I read this post of yours. Thanks, bro..

http://www.zulkbo.com/2011/01/no-archive-perlukah-tip-blogging.html

Omemee News July 12, 2011 at 11:24 AM    

Great tip! Used it right away on our village newspaper !

Greenlava July 12, 2011 at 2:58 PM    

@Yana
This code will only prevent indexing of archive pages. The rest of the blog will be indexed as usual.

Greenlava July 12, 2011 at 3:12 PM    

@zulkbo
Zulkbo's method works too. But if you want to use the archive widget, you need to turn archiving on and then apply the meta tag as shown above.

Unknown July 12, 2011 at 7:58 PM    

@Greenlava: Are you sure? What's the difference between them? I'm still confused; your meta tag contains robots and NOINDEX. I asked about the "noindex" robots meta tag on the Google webmaster forum, and John from Google said I shouldn't add a robots meta tag with "NOINDEX", so I removed it and replaced it immediately...

Please let me know which one is right? :S

Thanks in advance!

ohcikgu July 13, 2011 at 1:16 AM    

I see. I never knew about that. This is new info to me.

Chris July 13, 2011 at 4:02 PM    

I just added this code to my blog. How long does it take to activate? Google still shows my blog's archive pages. It's a really good trick. Many thanks!

Greenlava July 13, 2011 at 9:05 PM    

@Yana
Yes, I'm sure. The robots meta tag is wrapped in an archive-page conditional tag, so it won't affect other pages.

@Chris
The code takes effect instantly, but the SERPs will only update the next time Googlebot crawls those pages.
If you're in a hurry, use the URL removal tool.

Chris July 15, 2011 at 5:04 PM    

Removing archive pages manually is a hard process. Never mind, thanks for your reply!


Classier Corn July 24, 2011 at 7:41 PM    

Thanks for this useful tip!
Best Regards
Classier Corn

Osho @ Latest Tips And Tricks July 30, 2011 at 3:20 PM    

Thanks Greenlava For Tips :)

Kun,  July 30, 2011 at 11:45 PM    

Any idea how to use this on WordPress?

Greenlava July 31, 2011 at 6:23 PM    

@Kun
Install the All in One SEO plugin. Then simply tick the "Use noindex for Archives" checkbox.

Hannah August 10, 2011 at 7:10 AM    

If you use a table of contents, it would be the same thing as the archive, no?

Can you use this on a static page for a table of contents as well?

Greenlava August 11, 2011 at 12:49 PM    

@Hannah
No, it wouldn't be the same thing. To the search engine, your table of contents IS unique. (And even if it isn't, the table of contents is just one page, so it won't harm your site the way archives do.)

Aftab Ahmad August 11, 2011 at 6:14 PM    

This Is A GREAT Blog...

Rajib Kumar August 14, 2011 at 9:49 PM    

Nice and helpful tutorial. I am going to use this. Thanks for sharing.

Guduru Pradeep Kumar August 18, 2011 at 2:38 AM    

Thanks for sharing nice tutorial. Keep it up dude

Rika Susan At Home August 19, 2011 at 8:09 PM    

Thanks for this tip. I have been wondering how to prevent Google from indexing the archives.

Mary August 23, 2011 at 7:50 AM    

I was sceptical when I saw this issue mentioned in the help-forum first, but recently saw evidence of the very problem, and just applied your fix to the problem blog now.

Isn't it rather crazy that we have to do this? It seems odd that Google manages to tell itself not to index based on Labels, but we have to tell it not to do Archive.

teach August 26, 2011 at 4:56 PM    

Same experience here. I encountered this problem too, with duplicate content for my articles.

ponselbaru August 28, 2011 at 11:59 PM    

Thanks Greenlava For Tips :)

Naser @ Best Tips For Blogging August 29, 2011 at 4:34 PM    

I use the All in One SEO Pack for this on my WordPress blog.

It's an absolutely incredible post. Honestly, before reading it I was unaware of this; now that I have read it, I can say I have good knowledge of it.

Garold walker September 19, 2011 at 4:26 PM    

Does this affect your Alexa page rank? I added it and my rank has been dropping. I am going to remove it and see what happens. I will keep you posted.

Greenlava September 20, 2011 at 9:38 AM    

@Garold walker
No, it doesn't affect Alexa.

Lea Marie September 26, 2011 at 2:25 PM    

Thank you for this info, but I am confused because Google says the following on their site: "Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools."

Doesn't this mean that we should handle the noindex / noarchive of the Archive pages in this fashion?

Greenlava September 27, 2011 at 8:57 PM    

@LC Hunt
I believe the excerpt refers to blocking crawlers from accessing a web page using "Disallow" in robots.txt, as explained here. This method is discouraged because it doesn't really prevent the duplicate page from being indexed (i.e. appearing in search results). You need not worry about robots.txt, though, since Blogger doesn't give access to it.
The noindex meta tag, on the other hand, allows the bots to crawl the page but FORCES them not to index it. Since the page is kept out of the index, duplication won't take place.
As for rel="canonical", the canonical link only points to the original page. It doesn't promise to do anything. As I understand it, the duplicate page will still get indexed, and it may or may not appear in search results. Besides, it's nearly impossible to implement rel="canonical" in this particular case (due to Blogger's limitations).
So in the end, I believe the noindex meta tag is the best tool for the job.

ayurvedic india October 7, 2011 at 4:07 PM    

I found this article very useful. I am going to use this for my blog.

Forex Guard October 10, 2011 at 7:14 PM    

Great one, I will use it. My blog is still new and Google has never visited it..

Rohit shukla November 7, 2011 at 2:51 AM    

I followed your steps and completed them successfully. I hope this will prevent the duplicate content problem on my blog. Today I also implemented the post-title-before-blog-title code on my blog.

shah Blogger November 14, 2011 at 8:27 PM    

Greetings Greenlava.. Nice info. I avoid duplicate content too; this is definitely the way to do it. Anyway, I turned off the archive function altogether and use another approach.

As an extra, in some cases you can also add rel="canonical" ;)

nice nice dude.

wbxpress December 3, 2011 at 12:47 AM    

I have followed your instructions, and now I can see that the number of pages Google has indexed is drastically reduced. Still, I am happy that only quality links from my blog will appear in search engines. Thank you.

Best SEO Company December 17, 2011 at 3:47 AM    

This is such helpful information. THANK YOU for writing this and sharing your knowledge with the world. You just made my day. :) Keep up the great work and have a great day!

Nirmal Tamilnadu December 17, 2011 at 8:39 PM    

Thanks for the tip. I have added this code in my blog.

Steve Daily December 18, 2011 at 5:26 AM    

Thanks for the tip. Well-written and helpful.

AJ Banda January 6, 2012 at 10:11 AM    

Thanks for this tip. I never knew that widgets could affect the SERP like that.

Laurence January 7, 2012 at 5:19 PM    

Great help. I'd been wondering why people kept hitting up my archive pages instead of my main post pages from searches, and this should solve that issue :) Many thanks!

Clommot Fatan January 12, 2012 at 4:12 PM    

Thanks for the tips, I hope after implementing this trick, The index will not show my archive anymore.

Anonymous,  January 14, 2012 at 5:39 PM    

On my site todayprice.in I put the code in my HTML, but archives are still appearing in search engines.

Anonymous,  January 19, 2012 at 6:02 PM    

Thank you for the useful and easy-to-digest information.

Supriya.P @finediningindian.com February 3, 2012 at 6:57 AM    

This is a good suggestion. It prevents our site from being downgraded.

Tane Norther February 6, 2012 at 10:20 PM    

Hi, I've wondered about this issue for quite some time. I didn't want to remove my Blog Archive widget and make the blog hard to browse. Thanks for a great tip! I'm putting this to the test right now :)

Fatan February 29, 2012 at 7:19 AM    

Thank you sir, Now I will just give it a test to see what will happen.

Sahil Patel March 9, 2012 at 2:46 PM    

How do I block search engines from indexing labels?

Greenlava March 11, 2012 at 8:11 PM    

@Sagar Nargolka
Label-search pages are blocked by Blogger by default in robots.txt

napnipnop March 12, 2012 at 10:23 PM    

Hi, I really need your help. I want to stop several pages on my Blogger blog from getting indexed and crawled by robots. Could you give me some tips about this?

Greenlava March 27, 2012 at 10:19 AM    

@napnipnop
Use this for each page you don't want to index:
<b:if cond='data:blog.url == &quot;PUT_PAGE_URL_HERE&quot;'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>

nptechs.blogspot.com April 13, 2012 at 9:51 PM    

Thanks Greenlava, but how do I block label pages from search engines?

Greenlava April 16, 2012 at 11:05 AM    

@nptechs.blogspot.com
Label-search pages are already blocked via Blogger's robots.txt by default.

nptechs.blogspot.com April 30, 2012 at 7:39 PM    

@Greenlava If you think so, look at my blog: the label search pages there are indexed by Google. See this.

Greenlava April 30, 2012 at 11:30 PM    

@nptechs.blogspot.com
This is your robots.txt:
User-agent: *
Allow: /

which is different from Blogger's default:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /

It looks like you're using a custom robots.txt (via Settings > Search Preferences > Crawlers and indexing).
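
To see the difference between the two files, they can be run through the robots.txt parser in Python's standard library. This is only a sketch: the example.blogspot.com URLs and the allows helper are hypothetical, and real crawlers may interpret the rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The custom robots.txt quoted above.
CUSTOM = """\
User-agent: *
Allow: /
"""

# Blogger's default robots.txt quoted above.
BLOGGER_DEFAULT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

# Hypothetical URLs, used only for illustration.
label_url = "http://example.blogspot.com/search/label/news"
post_url = "http://example.blogspot.com/2012/04/some-post.html"

print("custom, label page: ", allows(CUSTOM, label_url))
print("default, label page:", allows(BLOGGER_DEFAULT, label_url))
print("default, post page: ", allows(BLOGGER_DEFAULT, post_url))
```

Under the custom file the label-search URL is fetchable, while under the default file it is blocked by the "Disallow: /search" rule and ordinary post URLs stay open.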

Anonymous,  May 3, 2012 at 2:46 AM    

Hey Greenlava,

Will this help hide the archive from being shown instead of individual posts on google search?

e.g. http://i.imgur.com/DAVqO.jpg

Greenlava May 3, 2012 at 7:24 PM    

@Anonymous
Yes exactly. Once you add the meta tag, those monthly archive links will be removed (albeit not immediately) from search results.

Anonymous,  May 3, 2012 at 8:11 PM    

^ great! thank you very much.

Abhi May 6, 2012 at 2:29 PM    

Excellent article. It seems I have been penalized by Google on account of having archives in the search index. Now I have removed them. Hoping to see my traffic rise again :-) Earlier I associated this with a Penguin/Panda penalty.. little worries BTW

AK May 18, 2012 at 1:05 PM    

This is exactly what I was looking for.

Thanks for your tip.

Adam from KeywordLuv May 28, 2012 at 1:07 AM    

Hi,
I understand very well that archive pages cause duplicate content to be indexed by search engines and that we can eliminate this problem with noindex. But I have another question about this. Is it related to PR in any way? Will it cause a decrease or an increase in PR?

Greenlava June 5, 2012 at 2:35 AM    

@Adam from KeywordLuv
No-index has no effect on PR.

Kumar June 5, 2012 at 11:16 PM    

Nice article. Thank you for sharing.

michkhoo June 7, 2012 at 12:15 PM    

I found out that "archive months" are my top pages in the Google index, so I'm going to use this for better SEO. It's a pity that Blogger templates don't do this automatically, as archive months don't seem to generate much traffic in Google Analytics.

Shafiq Mujahid June 28, 2012 at 1:18 PM    

Awesome.. so useful.. thanks ^^

Unknown July 4, 2012 at 1:58 PM    

Hello!

At SETTINGS> SEARCH PREFERENCES> CUSTOM HEADER TAGS,

I already ticked 'noindex' at "Archive and Search pages"..
I also entered the code you posted.

will it have conflicts?

Thanks

Greenlava July 7, 2012 at 1:35 PM    

@carlo
No, it will not conflict.

Dracuula July 21, 2012 at 6:18 PM    

Why are you not using noindex, noarchive yourself?? I looked at your archive pages but couldn't find either of them.

Greenlava July 23, 2012 at 6:58 PM    

@Dracuula
I am using them, but with another method: Custom robots header tags which is accessible via Settings > Search preferences.
Blogger introduced this feature in March 2012. It basically does the same thing, so you can use either method.

Abhi July 31, 2012 at 10:54 PM    

hi greenlava. Is it necessary to put this piece of code right after the head tag ?
Can i move this code a little bit down within the head tag ?
plz reply

Greenlava August 1, 2012 at 10:14 AM    

@AbHi Shek
You can place it anywhere within the head tag.

Angraj October 25, 2012 at 7:00 PM    

I hope this will work for me. Thanks!!!

software promo November 23, 2012 at 3:08 PM    

You suggested 'noarchive' in the robots tag. What does noarchive mean? Is it necessary?

Greenlava November 23, 2012 at 3:32 PM    

@Nina Octoviana
It will prevent Google from showing a cached copy of your archive pages in search results.

Umair Butt January 26, 2013 at 12:50 PM    

This is a very nice tool for webmasters. Duplicate content is a big issue in search engines. I have applied this on my blog. Thanks a lot for sharing this.

gwilson February 3, 2013 at 5:26 AM    

Thanks much. This post has been very helpful for me and my Blogger blog.

Leo February 13, 2013 at 5:53 AM    

Excellent :D thanks

Unknown February 16, 2013 at 12:48 PM    

Awesome, this is what I was looking for. Luckily, not many of my archive pages are indexed by Google yet (only two of them). Thanks for the nice info.

Anonymous,  February 26, 2013 at 11:09 PM    

Hello
I have the same duplicate issue under HTML Suggestions in Webmaster Tools. I am also using Blogger, but the duplicate tag issue comes from a parameter that is automatically added to the end of my post URLs, for example:

christmas-twister-2012.html?m=0

christmas-twister-2012.html?m=1

When I went to configure the URL parameters, I was shocked to find the m parameter already added in Webmaster Tools and set to let Google decide, with no delete option to remove it. Could you please tell me the best setting for this "m" parameter under URL Parameters in Webmaster Tools? Can I set this parameter to "No URLs"?

I am really confused about what to do now and what the best setting is to remove all these duplicates.

Please Assist me …………
Sania

Greenlava February 27, 2013 at 9:57 PM    

@Anonymous
Yes, setting it to "No URLs" should solve the duplicate issue.

jula February 27, 2013 at 11:26 PM    

hello,

Thank you for sharing. I really need this information. Yesterday when I checked my URL in Google Webmaster Tools, it reported 15 duplicate descriptions and 32 duplicate titles. All these duplicates are because of labels and archives. I'll copy the code and paste it into my template. Again, thank you for helping.

Waseem Rahmani April 11, 2013 at 12:05 AM    

Could it be fixed if we choose not to display the post body on archive pages by using css??

Greenlava April 12, 2013 at 11:46 PM    

@Waseem Rahmani
With CSS you only hide the content from humans; search engine spiders will still be able to see it.
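
A rough way to see this: text hidden with CSS is still present in the HTML source, so anything that reads the markup can extract it. The Python sketch below is only an illustration; the page string is made up, TextCollector is a hypothetical name, and real crawlers are far more sophisticated than this parser.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects all text nodes, ignoring any CSS styling."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Hypothetical archive-page fragment with the post body hidden via CSS.
page = "<div class='post-body' style='display:none'>Full text of the post</div>"

collector = TextCollector()
collector.feed(page)
print(collector.chunks)
```

The hidden text is collected just like visible text, which is why hiding the post body does nothing about duplicate content.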

Waseem Rahmani April 15, 2013 at 1:18 AM    

@Greenlava
What about using instead of on archive and label pages? The post.snippet tag brings in only a few characters of the blog post.

or you could also choose not to include either of and on archive pages and the result would be archive pages containing only links to the blogposts. Would it give any SEO advantage, I mean the link juice?

Sorry for all the questions. I'm quite new to Blogger, and advice from Blogger ninjas like you would be great. :)

Seine April 28, 2013 at 11:21 PM    

Thanks a lot for sharing this info. I hope it helps.

Anonymous,  May 26, 2013 at 6:28 PM    

Really great blog with all the seo optimized posts..
This site really help my blog to improve its visibility in search results...

Ankur Choudhary June 17, 2013 at 2:37 AM    

How to prevent duplicate content by blocking label links?
I have a lot of label links indexed in google.

Greenlava June 18, 2013 at 12:50 PM    

@Ankur Choudhary
To exclude Archive and Label pages from index:
1. Go to Settings > Search Preferences > Crawlers and Indexing > Custom robots header tags.
2. Click Edit and enable it by checking "Yes" radio button.
3. Check 'noindex' and 'noarchive' checkboxes under "Archive and Search pages".

Shariful Islam Razu September 6, 2013 at 1:44 AM    

Hello Greenlava. First, thanks for your helpful article for newbies like me. I successfully used it on my blog. Now I have another question: how do I block my labels from search engines? I found that indexed labels are not helpful for Google.

Greenlava September 24, 2013 at 12:34 PM    

@Shariful Islam Razu
Read reply #90.

Oz October 17, 2013 at 3:58 AM    

Very informative and helpful.
Thanks for sharing.

Arshad Amin January 7, 2014 at 3:38 AM    

Shariful Islam says that labels indexed from Blogger aren't good for Google. What's your take on that, GREENLAVA? I will go with your recommendation. I have excluded archives and labels from being indexed by Google, following the approach you gave in comment number 90, but I still want to make sure that indexing them is not good for Google.

Kindly share your thoughts on this one and I AM YOUR FAN FOREVER!!!

Arshad Amin January 7, 2014 at 4:07 AM    

One last thing: I applied your recommendation from comment number 90 for excluding archives and labels. However, Settings > Search Preferences > Crawlers and Indexing > Custom robots header tags was disabled before that.

I just want to know whether I need to check any other boxes in there or leave everything else as it is. Please don't mind; being a newbie, I just don't know about all these things :)

Greenlava January 15, 2014 at 10:33 AM    

@Arshad Amin
Indexed label pages add more of your pages to the SERPs. However, they risk duplicate content and add extra steps for searchers before they reach the post.
With labels page: SERP > label page > scroll and find > the post
Without labels page: SERP > the post

Just check the 'noindex' and 'noarchive' checkboxes under "Archive and Search pages". Don't touch anything else.