Zen Cart robots.txt Tutorial


Is there a robots.txt with Zen Cart?

Do you have a default robots.txt?

The answer is no, and here is why. Every cart is different; the very nature of Zen Cart's flexibility prevents the creation of a robots.txt that could be applied to even most carts. Things like EZ-Pages, additional pages, and installation paths make the process essentially a custom procedure for each and every shop. The robots protocol isn't really difficult; as a matter of fact, it's quite logical once you get to know it a bit.

First, let's cover the basic function and abilities of a robots.txt.

A robots.txt is a plain text file named robots.txt (exactly), which resides in the root of the domain. You can hide your robots.txt from ordinary visitors by having your server (Apache, for example) serve the file only to qualified user agents. This, however, is more work than necessary, since password-protected directories, such as admin login areas, cannot be indexed unless there is a crawlable link to them somewhere.
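
If you do want to gate the file as described above, here is a minimal sketch for an .htaccess in the document root, assuming Apache with mod_rewrite enabled; the crawler names in the pattern are examples only:

RewriteEngine On
# Forbid robots.txt unless the user agent matches one of the
# named crawlers (example list; adjust to taste).
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Slurp|bingbot) [NC]
RewriteRule ^robots\.txt$ - [F]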

robots.txt is an exclusion protocol: well-behaved web spiders (robots) fetch it to receive directives about where they may and may not crawl your site's pages. Blocking pages from crawl in robots.txt WILL NOT prevent them from being indexed or referenced in the search engine results; for that you would need something like a meta robots noindex tag on a crawlable page. It only blocks them from being crawled.

Below, I will cover the most common robots.txt directives and their uses for your Zen Cart.

The first thing to cover is that there is NO Allow directive in robots.txt; allowing is the default behavior, so only Disallow is valid. If I have a page foo.html which I wish not to be crawled by any well-behaved crawler, the syntax is as follows:

User-agent: *
Disallow: /foo.html
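
A quick way to confirm a directive does what you expect is Python's standard-library robotparser. This small sketch parses the rules above and checks whether the page may be fetched (the domain is a placeholder):

from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /foo.html",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)  # load the directives above

# False: every user agent is disallowed from /foo.html
print(rp.can_fetch("*", "http://www.domain.com/foo.html"))
# True: other pages remain crawlable
print(rp.can_fetch("*", "http://www.domain.com/bar.html"))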

If I wish that a specific crawler, such as Googlebot, not crawl this page, then the syntax is as follows:

User-agent: Googlebot
Disallow: /foo.html

You can view a list of crawlers in the Web Robots Database at robotstxt.org. For reference, the most common ones are Googlebot (Google) and Slurp (Yahoo!), both of which appear in the examples below.

So we know how to block a single page from crawl; how about a page in a directory, but not the whole directory?

User-agent: *
Disallow: /folder/foo.html

To block all pages within the directory:

User-agent: *
Disallow: /folder/

To block every path that begins with /folder, not just the directory itself:

User-agent: *
Disallow: /folder
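
To make the difference between the last two forms concrete, here is how each matches a few hypothetical URLs:

Disallow: /folder/   blocks /folder/foo.html and /folder/sub/bar.html,
                     but not /folder.html or /folder-news.html
Disallow: /folder    blocks all of the above, because it matches any
                     path that simply begins with /folder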

Blocking all pages within a directory can also be accomplished with a wildcard. However, not all crawlers support wildcards: Google does, Yahoo does, Ask.com does not, and although Bing claims to have partial support for wildcards, they don't. So we must first name a supporting crawler to apply a directive that uses a wildcard.

User-agent: Googlebot
Disallow: /folder/*

The wildcard gives us greater control and flexibility with robots.txt directives, especially on our dynamically generated Zen Cart pages. (For a supporting crawler, /folder/* is equivalent to the plain /folder/ form, since directives already match by prefix; the wildcard is most useful in the middle of a pattern.) The example below blocks a set of duplicate pages from being crawled in your Zen Cart shop:

User-agent: Googlebot
Disallow: /*&alpha_filter_id=

This blocks the URIs created by the alpha sorter on your product index pages. Essentially, every URL containing the parameter &alpha_filter_id= is blocked from crawl for Googlebot.
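
For example, with that directive in place, the first (hypothetical) URL below is blocked for Googlebot while the second remains crawlable:

Blocked: /index.php?main_page=index&cPath=12&alpha_filter_id=65
Crawled: /index.php?main_page=index&cPath=12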

Lastly, the Sitemaps protocol is supported by all major search engines and should be included. This is simply the correct way to let the search engines know where your XML sitemap resides; the Sitemap line is independent of the User-agent sections and may appear anywhere in the file:

Sitemap: http://pro-webs.net/sitemap.xml

Against my better judgment, I am going to post what some may consider a generic/default robots.txt for a standard Zen Cart installed into the root of your domain. I strongly suggest you read and learn to properly use the robots protocol and directives instead. I take no responsibility for this; it is a suggestion, and you have been warned that every Zen Cart is different. Additionally, this WILL NOT work at all if you have rewritten your pages' URLs.

# Robots.txt file for http://www.domain.com/

Sitemap: http://www.domain.com/sitemap.xml

User-agent: Googlebot
Disallow: /*&action=notify$
Disallow: /*&number_of_uploads=0&action=notify
Disallow: /index.php?main_page=discount_coupon
Disallow: /index.php?main_page=checkout_shipping
Disallow: /index.php?main_page=shippinginfo
Disallow: /index.php?main_page=privacy
Disallow: /index.php?main_page=conditions
Disallow: /index.php?main_page=contact_us
Disallow: /index.php?main_page=advanced_search
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=unsubscribe
Disallow: /index.php?main_page=shopping_cart
Disallow: /index.php?main_page=product_reviews_write&cPath=
Disallow: /index.php?main_page=tell_a_friend&products_id=
Disallow: /index.php?main_page=product_reviews_write&products_id=
Disallow: /index.php?main_page=popup_shipping_estimator
Disallow: /index.php?main_page=account
Disallow: /index.php?main_page=password_forgotten
Disallow: /index.php?main_page=checkout_shipping_address
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=gv_faq
Disallow: /gv_faq.html?faq_item=
Disallow: /*&sort=
Disallow: /*alpha_filter_id=
Disallow: /*&disp_order=

User-agent: Slurp
Disallow: /*&action=notify$
Disallow: /*&number_of_uploads=0&action=notify
Disallow: /index.php?main_page=discount_coupon
Disallow: /index.php?main_page=checkout_shipping
Disallow: /index.php?main_page=shippinginfo
Disallow: /index.php?main_page=privacy
Disallow: /index.php?main_page=conditions
Disallow: /index.php?main_page=contact_us
Disallow: /index.php?main_page=advanced_search
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=unsubscribe
Disallow: /index.php?main_page=shopping_cart
Disallow: /index.php?main_page=product_reviews_write&cPath=
Disallow: /index.php?main_page=tell_a_friend&products_id=
Disallow: /index.php?main_page=product_reviews_write&products_id=
Disallow: /index.php?main_page=popup_shipping_estimator
Disallow: /index.php?main_page=account
Disallow: /index.php?main_page=password_forgotten
Disallow: /index.php?main_page=checkout_shipping_address
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=gv_faq
Disallow: /gv_faq.html?faq_item=
Disallow: /*&sort=
Disallow: /*alpha_filter_id=
Disallow: /*&disp_order=

User-agent: *
Disallow: /index.php?main_page=faqs_new
Disallow: /index.php?main_page=discount_coupon
Disallow: /index.php?main_page=checkout_shipping
Disallow: /index.php?main_page=shippinginfo
Disallow: /index.php?main_page=privacy
Disallow: /index.php?main_page=conditions
Disallow: /index.php?main_page=contact_us
Disallow: /index.php?main_page=advanced_search
Disallow: /index.php?main_page=login
Disallow: /index.php?main_page=unsubscribe
Disallow: /index.php?main_page=shopping_cart
Disallow: /index.php?main_page=popup_shipping_estimator
Disallow: /index.php?main_page=account
Disallow: /index.php?main_page=password_forgotten
Disallow: /index.php?main_page=checkout_shipping_address
Disallow: /index.php?main_page=logoff
Disallow: /index.php?main_page=gv_faq
Disallow: /gv_faq.html?faq_item=1
Disallow: /gv_faq.html?faq_item=2
Disallow: /gv_faq.html?faq_item=3
Disallow: /gv_faq.html?faq_item=4
Disallow: /gv_faq.html?faq_item=5

Once again, every Zen Cart has other pages that should be blocked from crawl, and many have pages within those blocked sections that should be allowed. This is best used as a guide once you have learned to use the Robots Exclusion Protocol correctly.
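
If you adapt the file above, it is worth sanity-checking before you deploy it. Here is a minimal sketch using Python's standard-library robotparser; the domain and test URLs are placeholders, and note that this parser does plain prefix matching, so it will not evaluate the wildcard rules the way Googlebot or Slurp do:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.domain.com/robots.txt")  # placeholder domain
rp.read()  # fetch and parse the live file

# One path the generic file blocks, and one that should stay open.
tests = [
    "http://www.domain.com/index.php?main_page=shopping_cart",
    "http://www.domain.com/index.php?main_page=product_info&products_id=1",
]

for url in tests:
    for agent in ("Googlebot", "Slurp", "SomeOtherBot"):
        print(agent, rp.can_fetch(agent, url), url)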

