Prevent duplicate content by blocking archives from search engines
Why would you want to block archive pages anyway? The answer is to prevent Google from penalizing your blog for duplicate content. You see, archive pages have no content of their own. What you see on them are duplicates taken from individual post pages. If both the individual post page and the archive page carrying the same content are indexed by Google, that constitutes duplicate content.
That’s exactly the case for this blog: duplicate content due to archive pages. And I paid the price for it during the last Google PageRank update, when this blog’s PR went down from 4 to 3.

The problem probably started when I added the Blogger Archive widget to the sidebar. This widget gave web crawlers access to the archive pages, which eventually led to the pages being indexed by Google.
Preventing duplicate content
You can prevent duplicate content by telling search engines not to index archive pages. This can be achieved by adding a “noindex” robots meta tag to the archive pages (and archive pages only).
Here’s how:
- Log in to your Blogger account.
- Go to Dashboard > Design > Edit HTML.
- Find the <head> tag and add the following code below it:
<b:if cond='data:blog.pageType == "archive"'>
  <meta content='noindex,noarchive' name='robots'/>
</b:if>
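To confirm the change took effect, you can view the source of an archive page and of a post page and look for the robots meta tag. The Python sketch below automates that check on HTML you have already saved. The class name, the function names and the two sample pages are made up for illustration and are not part of Blogger.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are also routed here.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_noindexed(html):
    """True if the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)


# Hypothetical page sources, standing in for real "view source" output.
ARCHIVE_PAGE = (
    "<html><head><meta content='noindex,noarchive' name='robots'/>"
    "</head><body>archive listing</body></html>"
)
POST_PAGE = "<html><head><title>A post</title></head><body>post</body></html>"

print(is_noindexed(ARCHIVE_PAGE))  # True: the archive page carries the tag
print(is_noindexed(POST_PAGE))     # False: post pages are left untouched
```

If the archive page reports False, the snippet was probably pasted outside the <head> section or the template was not saved.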
What happens to the archive pages already listed in SERPs?
The archive pages will eventually drop out of the search result pages. However, if you want them gone quickly, remove them from SERPs using the URL removal tool.
62 comments to "Prevent duplicate content by blocking archives from search engines"
Hi Greenlava..
This is very useful. I've found duplicated content so many times. Thank you for sharing this!
Wait... doesn't putting "NOINDEX" mean your blog will not be indexed by Google???
Greetings..
I once wrote a post about this, because I had read on a Western forum that our posts will last longer if we turn off the blog archive. I didn't know for sure and didn't dare to do it until I read this post of yours. Thank you, Bro..
http://www.zulkbo.com/2011/01/no-archive-perlukah-tip-blogging.html
Great tip! Used it right away on our village newspaper !
@Yana
This code will only prevent indexing of archive pages. The rest of the blog will be indexed as usual.
@zulkbo
Zulkbo's method works too. But if you want to use the archive widget, you need to turn archives on and then apply the meta tag as shown above.
@Greenlava: Are you sure? What's the difference between them? I'm still confused, because your meta tag contains both robots and NOINDEX. I asked about the "noindex" robots meta tag on the Google Webmaster forum, and John from Google Webmaster said I shouldn't add a robots meta tag with "NOINDEX", so I removed it and replaced it immediately...
Please let me know which one is right. :S
Thanks in advance!
I see. I never knew about that. This is new info to me.
I just added this code to my blog. How long does it take to activate? Google still shows my blog's archive pages. It's a really good trick. Many thanks!
@Yana
Yes, I'm sure. The robots meta tag is wrapped in an archive-page conditional tag, so it won't affect other pages.
@Chris
The code activates instantly, but SERPs will only update the next time Googlebot crawls those pages.
If you're in a hurry, use the URL removal tool.
Removing archive pages manually is a hard process. Never mind, thanks for your reply!
Does this affect your Alexa page rank? I added it and my rank has been dropping. I am going to remove it and see what happens. I will keep you posted.
Thanks for this useful tip!
Best Regards
Classier Corn
Thanks Greenlava For Tips :)
any idea how to use this on wordpress?
@Kun
Install All in One SEO plugin. You then simply tick the "Use noindex for Archives" checkbox.
If you use a table of contents, it would be the same thing as the archive, no?
Can you use this on a static page for a table of contents as well?
@Hannah
No, it wouldn't be the same thing. To the search engine, your table of contents IS unique. (And even if it isn't, the table of contents is just one page, so it won't harm your site like archives do.)
This Is A GREAT Blog...
Nice and helpful tutorial. I am going to use this. Thanks for sharing.
Thanks for sharing nice tutorial. Keep it up dude
Thanks for this tip. I have been wondering how to prevent Google from indexing the archives.
I was sceptical when I saw this issue mentioned in the help-forum first, but recently saw evidence of the very problem, and just applied your fix to the problem blog now.
Isn't it rather crazy that we have to do this? It seems odd that Google manages to tell itself not to index based on Labels, but we have to tell it not to do Archive.
Same experience here. I encountered this problem too, having duplicate content for my articles.
I use All in One SEO Pack for this on my WordPress blog.
This is an absolutely incredible share. Honestly, before reading your post I was unaware of this; now, after reading it, I can say that I have a good knowledge of it.
@Garold walker
No, it doesn't affect Alexa.
Thank you for this info, but I am confused because Google says the following on their site: "Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools."
Doesn't this mean that we should handle the noindex / noarchive of the Archive pages in this fashion?
@LC Hunt
I believe the excerpt refers to blocking crawlers from accessing a web page using "Disallow" in robots.txt as explained here. This method is discouraged because it doesn't really prevent the duplicate page from being indexed (i.e. appearing in search results). You don't need to worry about robots.txt anyway, since Blogger doesn't give access to it.
The noindex meta tag, on the other hand, allows bots to crawl the page but forces them not to index it. The page never enters the index, hence duplication won't take place.
As for rel="canonical", the canonical link only points to the original page. It doesn't promise to do anything. As I understand it, the duplicate page will get indexed and may or may not appear in search results. Besides, it's nearly impossible to implement rel="canonical" in this particular case due to Blogger's limitations.
So in the end, I believe the noindex meta tag is the best tool for the job.
Found this very useful article. I am going to use this for my blog.
Great one, I will use it. My blog is still new and Google has never visited it..
I followed your steps and completed them successfully. I hope this will prevent the duplicate content problem on my blog. Today I also implemented the post-title-before-blog-title code on my blog.
nice
Greetings, Greenlava.. Nice info. I avoid duplicate content too, and this is definitely the right way to do it. Anyway, I simply turned off the archives function and use another approach.
As an extra, in some cases you can also add rel="canonical" ;)
nice nice dude.
thank you so much for this SEO tip.
I have followed your instructions, and now I can see that the number of pages in Google's index is drastically reduced. Still, I am happy that only quality links from my blog will appear in search engines. Thank you.
This is such helpful information. THANK YOU for writing this and sharing your knowledge with the world. You just made my day. :) Keep up the great work and have a great day!
Thanks for the tip. I have added this code in my blog.
Thanks for the tip. Well-written and helpful.
Thanks for this tip. I never knew that widgets could affect the SERP like that.
Great help. I'd been wondering why people kept hitting up my archive pages instead of my main post pages from searches, and this should solve that issue :) Many thanks!
Thanks for the tips, I hope after implementing this trick, The index will not show my archive anymore.
On my site todayprice.in I put the code in my HTML, but the search engines are still showing archives.
Thank you for the useful and easy-to-digest info.
This is a good suggestion. It prevents our site from being downgraded.
Hi, I've wondered about this issue for quite some time. I didn't want to remove my Blog Archive widget and make the blog hard to browse. Thanks for a great tip! I'm putting this to the test right now :)
Thank you sir, Now I will just give it a test to see what will happen.
How do I block search engines from indexing labels?
@Sagar Nargolka
Label-search pages are blocked by Blogger by default in robots.txt
Hi, I really need your help. I want to prevent several pages on my Blogger blog from being indexed and crawled by robots. Could you give me some tips about this?
@napnipnop
Use this for each page you don't want to index:
<b:if cond='data:blog.url == "PUT_PAGE_URL_HERE"'>
  <meta content='noindex,noarchive' name='robots'/>
</b:if>
Thanks Greenlava, but how do I block label pages from search engines?
@nptechs.blogspot.com
Label-search pages are already blocked via Blogger's robots.txt by default.
@Greenlava If you think so, look at my blog: its label-search pages are indexed by Google. See this.
@nptechs.blogspot.com
This is your robots.txt:
User-agent: *
Allow: /
which is different from Blogger's default:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
It looks like you're using a custom robots.txt (via Settings > Search Preferences > Crawlers and indexing).
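The difference between the two files can be checked with Python's standard urllib.robotparser module. This is only a rough sketch: the helper name, the label path and the post path are hypothetical, and Python's parser is not Googlebot, so treat the result as an approximation of how a crawler reads these rules.

```python
from urllib.robotparser import RobotFileParser

# Blogger's default rules, as quoted above.
BLOGGER_DEFAULT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

# The custom rules found on the blog in question.
CUSTOM = """\
User-agent: *
Allow: /
"""


def can_crawl(robots_txt, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical paths: a label-search page and an ordinary post page.
LABEL_PAGE = "/search/label/SEO"
POST_PAGE = "/2011/01/some-post.html"

print(can_crawl(BLOGGER_DEFAULT, LABEL_PAGE))  # False: /search is disallowed
print(can_crawl(BLOGGER_DEFAULT, POST_PAGE))   # True: posts stay crawlable
print(can_crawl(CUSTOM, LABEL_PAGE))           # True: nothing is blocked
```

With the custom file, nothing stops crawlers from reaching label-search pages, which matches what shows up in the search results for that blog.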
Hey Greenlava,
Will this help hide the archive from being shown instead of individual posts on google search?
e.g. http://i.imgur.com/DAVqO.jpg
@Anonymous
Yes exactly. Once you add the meta tag, those monthly archive links will be removed (albeit not immediately) from search results.
^ great! thank you very much.
Excellent article. It seems I have been penalized by Google on account of having archives in the search index. Now I have removed them. Hoping to see my traffic rise again :-) Earlier I associated this with a Penguin/Panda penalty. Less to worry about now, BTW.
This is exactly what I was looking for.
Thanks for your tip.