robots.txt for Zen Cart


Is there a robots.txt with Zen Cart?

Do you have a default robots.txt?

The answer is no… but here is why. Every cart is different, and the very nature of Zen Cart’s flexibility prevents creating a robots.txt that would apply to even most carts. Things like EZ-Pages, additional pages, and installation paths make the process essentially a custom procedure for each and every shop. The robots protocol isn’t really difficult; as a matter of fact, it’s quite logical once you get to know it a bit.

First, let’s cover the basic function and abilities of a robots.txt file.

A robots.txt is a plain text file named robots.txt (exactly), which resides in the root of the domain. You can actually hide your robots.txt from casual visitors by having the web server (Apache, for example) serve the file only to qualified user agents. This, however, is really more work than necessary, since password-protected directories such as your admin login area cannot be indexed unless there is a crawlable link to them somewhere.

robots.txt is an exclusion protocol that well-behaved web spiders (robots) check in order to receive directives about where they may and may not crawl your site’s pages. Blocking a page from crawl in robots.txt WILL NOT prevent it from being indexed or referenced in the search engine results… it only blocks it from being crawled.

Below I will cover the most common robots.txt directives and their uses for your Zen Cart.

First to cover: the original robots.txt protocol has NO Allow directive; allowing is the default behavior, so only Disallow is needed. If I have a page foo.html which I do not want crawled by any well-behaved crawler, the syntax is as follows:

User-agent: *
Disallow: /foo.html
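
Conversely, because allowing is the default, leaving the Disallow value empty (a standard part of the protocol) tells every crawler that nothing is off limits:

User-agent: *
Disallow: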

If I wish that a specific crawler, such as Googlebot, not crawl this page, then the syntax is as follows:

User-agent: Googlebot
Disallow: /foo.html
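
Groups can also be combined in a single file. A well-behaved crawler obeys the most specific User-agent group that matches it, so in the sketch below (foo.html and bar.html are placeholders, not real Zen Cart pages) Googlebot would skip only foo.html, while every other crawler would skip only bar.html:

User-agent: Googlebot
Disallow: /foo.html

User-agent: *
Disallow: /bar.html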

You can view a full list of crawlers in the robots database. For reference, the ones you will run into most often are Googlebot (Google), Bingbot (Microsoft), and Slurp (Yahoo).

So we know how to block a single page from crawl; how about a page inside a directory, but not the whole directory?

User-agent: *
Disallow: /folder/foo.html

All pages within the directory.

User-agent: *
Disallow: /folder/

All pages whose path begins with /folder blocked from crawl (this matches /folder/, /folder.html, and even /folders/).

User-agent: *
Disallow: /folder

So blocking all pages within a directory can also be accomplished with this shorter form, since every URL under /folder/ also begins with /folder; just keep the extra matches noted above in mind.
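
To bring this back to Zen Cart, here is a sketch only, assuming a stock install sitting in the web root with its default directory names (exactly the sort of detail that varies from shop to shop). Blocking a couple of the non-content directories might then look like this:

User-agent: *
Disallow: /includes/
Disallow: /cache/

If your cart were installed under /shop/ instead, every line would need that prefix (Disallow: /shop/includes/), which is precisely why no single default robots.txt fits every store.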