{"id":764,"date":"2009-12-14T10:12:02","date_gmt":"2009-12-14T14:12:02","guid":{"rendered":"http:\/\/pro-webs.net\/blog\/?p=764"},"modified":"2010-10-05T06:33:55","modified_gmt":"2010-10-05T10:33:55","slug":"zen-cart-robots-txt-tutorial","status":"publish","type":"post","link":"https:\/\/pro-webs.net\/blog\/2009\/12\/14\/zen-cart-robots-txt-tutorial\/","title":{"rendered":"Zen Cart robots.txt Tutorial"},"content":{"rendered":"<figure id=\"attachment_771\" aria-describedby=\"caption-attachment-771\" style=\"width: 135px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-771 \" title=\"robots-txt\" src=\"http:\/\/pro-webs.net\/blog\/wp-content\/uploads\/2009\/12\/robots-txt.gif\" alt=\"robots.txt for Zen Cart\" width=\"135\" height=\"143\" \/><figcaption id=\"caption-attachment-771\" class=\"wp-caption-text\">Zen Cart robots.txt<\/figcaption><\/figure>\n<p><strong>Is there a robots.txt with Zen Cart?<\/strong><\/p>\n<p><strong>Do you have a default robots.txt?<\/strong><\/p>\n<p>The answer is no&#8230; but here is why. Every cart is different; the very nature of Zen Cart&#8217;s flexibility prevents the creation of a robots.txt that can be applied to even most carts. Features such as EZ-Pages, additional pages, and varying installation paths make the process essentially a custom procedure for each and every shop. The robots protocol isn&#8217;t really difficult; as a matter of fact, it&#8217;s quite logical once you get to know it a bit.<\/p>\n<h2><strong>First, let&#8217;s cover the basic function and abilities of a robots.txt.<\/strong><\/h2>\n<p>A robots.txt is a plain text file named robots.txt (exactly), which resides in the root of the domain. 
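Crawlers will only request it from that single, fixed location; for a shop at <em>http:\/\/www.domain.com<\/em> (the same placeholder domain used in the example file further below), that location is:<\/p>\n<p><em>http:\/\/www.domain.com\/robots.txt<\/em><\/p>\n<p>A robots.txt placed anywhere else, such as in a subdirectory, is simply ignored. 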
You can actually hide your robots.txt from casual Internet users by configuring your web server (Apache, for example) to grant only qualified user agents access to the file. This, however, is more work than necessary, as password-protected directories, such as admin login areas, cannot be indexed unless there is a crawlable link to them somewhere.<\/p>\n<p>robots.txt is an exclusion protocol that well-behaved web spiders, or robots, read in order to receive directives regarding where they may and may not crawl your site&#8217;s pages. Blocking a page (or pages) from crawl in robots.txt WILL NOT prevent them from being indexed or referenced in the search engine results&#8230; it only blocks them from being crawled.<\/p>\n<p>I will cover the most common uses of robots.txt and its directives below for your Zen Cart.<\/p>\n<p>First to cover: there is NO Allow directive in the robots.txt standard; allowing crawl is in fact the default behavior&#8230; so only Disallow is valid. So if I have a page foo.html which I do not wish to be crawled by any well-behaved crawler, the syntax is as follows:<\/p>\n<p><em>User-agent: *<br \/>\nDisallow: \/foo.html<\/em><\/p>\n<p>If I wish that a specific crawler, such as Googlebot, not crawl this page, the syntax is as follows:<\/p>\n<p><em>User-agent: Googlebot<br \/>\nDisallow: \/foo.html<\/em><\/p>\n<p>You can view a list of crawlers in the robots database <a title=\"Robots Database of Crawlers\" href=\"http:\/\/www.robotstxt.org\/db.html\" target=\"_blank\">here<\/a>. 
But for reference, here are the most common ones.<\/p>\n<ul>\n<li><a title=\"GoogleBot crawler info\" href=\"http:\/\/www.google.com\/support\/webmasters\/bin\/answer.py?answer=80553\" target=\"_blank\">Google Search is <strong>Googlebot<\/strong><\/a><\/li>\n<li><a title=\"Teoma Crawler\" href=\"http:\/\/about.ask.com\/en\/docs\/about\/webmasters.shtml\" target=\"_blank\">Ask.com is <strong>Teoma<\/strong><\/a><\/li>\n<li><a title=\"Bing Crawler Info\" href=\"http:\/\/help.live.com\/help.aspx?mkt=en-us&amp;project=wl_webmasters\" target=\"_blank\">Bing is <strong>MSNBot<\/strong><\/a><\/li>\n<li><a title=\"Slurp Crawler information\" href=\"http:\/\/help.yahoo.com\/l\/us\/yahoo\/search\/webcrawler\/\" target=\"_blank\">Yahoo is <strong>Slurp<\/strong><\/a><\/li>\n<\/ul>\n<p><strong>So we know how to block a single page from crawl; how about a page in a directory&#8230; but not the whole directory?<\/strong><\/p>\n<p><em>User-agent: *<\/em><br \/>\n<em>Disallow: \/folder\/foo.html<\/em><\/p>\n<p><strong>All pages within the directory.<\/strong><\/p>\n<p><em>User-agent: *<\/em><br \/>\n<em>Disallow: \/folder\/<\/em><\/p>\n<p><strong>Any path that begins with \/folder blocked from crawl.<\/strong><\/p>\n<p><em>User-agent: *<\/em><br \/>\n<em>Disallow: \/folder<\/em><\/p>\n<p>Blocking all pages within a directory can also be accomplished with a wildcard. However, not all crawlers support wildcards. Google does, Yahoo does&#8230; Ask.com does not, and although Bing claims partial support for wildcards&#8230; it doesn&#8217;t. So we must first name a supporting crawler to apply a directive with a wildcard.<\/p>\n<p><em>User-agent: Googlebot<br \/>\nDisallow: \/folder\/*<\/em><\/p>\n<p>The wildcard gives us greater control and ability with regard to robots.txt directives&#8230; especially on our dynamically generated Zen Cart pages. 
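<\/p>\n<p>The wildcard-supporting crawlers also honor the <em>$<\/em> character, which anchors a pattern to the end of the URL; the generic robots.txt further below uses it. For example, to block only URLs that end exactly in <em>&amp;action=notify<\/em>, with no further parameters after it, the syntax is as follows:<\/p>\n<p><em>User-agent: Googlebot<br \/>\nDisallow: \/*&amp;action=notify$<\/em><\/p>\n<p>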
The example below blocks duplicate pages from crawl in your Zen Cart shop.<\/p>\n<p><em>User-agent: Googlebot<br \/>\nDisallow: \/*&amp;alpha_filter_id=<\/em><\/p>\n<p>This blocks the URIs created by the alpha sorter on your product index pages. Essentially, every URL containing the parameter <em>&amp;alpha_filter_id=<\/em> is blocked from crawl for Googlebot.<\/p>\n<p>Lastly, the <a title=\"Universal Sitemap Protocol\" href=\"http:\/\/www.seocog.com\/universal-sitemap-protocol\/\" target=\"_blank\">Universal Sitemap Protocol<\/a> is supported by all major search engines and should be included. This is simply the correct way to let the search engines know where your XML sitemap resides.<\/p>\n<p><em>Sitemap: http:\/\/pro-webs.net\/sitemap.xml<\/em><\/p>\n<p>Against my better judgment, I am going to post what some may consider a generic\/default robots.txt for standard Zen Carts installed into the root of your domain. I strongly suggest you read and learn to properly use the robots protocol and directives instead. I will take no responsibility for this, as it is a suggestion, and you have been warned that every Zen Cart is different. 
Additionally, this WILL NOT work at all if you have rewritten your pages&#8217; URLs.<\/p>\n<blockquote><p># Robots.txt file for http:\/\/www.domain.com\/<\/p>\n<p>Sitemap: http:\/\/www.domain.com\/sitemap.xml<\/p>\n<p>User-agent: Googlebot<br \/>\nDisallow: \/*&amp;action=notify$<br \/>\nDisallow: \/*&amp;number_of_uploads=0&amp;action=notify<br \/>\nDisallow: \/index.php?main_page=discount_coupon<br \/>\nDisallow: \/index.php?main_page=checkout_shipping<br \/>\nDisallow: \/index.php?main_page=shippinginfo<br \/>\nDisallow: \/index.php?main_page=privacy<br \/>\nDisallow: \/index.php?main_page=conditions<br \/>\nDisallow: \/index.php?main_page=contact_us<br \/>\nDisallow: \/index.php?main_page=advanced_search<br \/>\nDisallow: \/index.php?main_page=login<br \/>\nDisallow: \/index.php?main_page=unsubscribe<br \/>\nDisallow: \/index.php?main_page=shopping_cart<br \/>\nDisallow: \/index.php?main_page=product_reviews_write&amp;cPath=<br \/>\nDisallow: \/index.php?main_page=tell_a_friend&amp;products_id=<br \/>\nDisallow: \/index.php?main_page=product_reviews_write&amp;products_id=<br \/>\nDisallow: \/index.php?main_page=popup_shipping_estimator<br \/>\nDisallow: \/index.php?main_page=account<br \/>\nDisallow: \/index.php?main_page=password_forgotten<br \/>\nDisallow: \/index.php?main_page=checkout_shipping_address<br \/>\nDisallow: \/index.php?main_page=logoff<br \/>\nDisallow: \/index.php?main_page=gv_faq<br \/>\nDisallow: \/gv_faq.html?faq_item=<br \/>\nDisallow: \/*&amp;sort=<br \/>\nDisallow: \/*alpha_filter_id=<br \/>\nDisallow: \/*&amp;disp_order=<\/p>\n<p>User-agent: Slurp<br \/>\nDisallow: \/*&amp;action=notify$<br \/>\nDisallow: \/*&amp;number_of_uploads=0&amp;action=notify<br \/>\nDisallow: \/index.php?main_page=discount_coupon<br \/>\nDisallow: \/index.php?main_page=checkout_shipping<br \/>\nDisallow: \/index.php?main_page=shippinginfo<br \/>\nDisallow: \/index.php?main_page=privacy<br \/>\nDisallow: \/index.php?main_page=conditions<br \/>\nDisallow: 
\/index.php?main_page=contact_us<br \/>\nDisallow: \/index.php?main_page=advanced_search<br \/>\nDisallow: \/index.php?main_page=login<br \/>\nDisallow: \/index.php?main_page=unsubscribe<br \/>\nDisallow: \/index.php?main_page=shopping_cart<br \/>\nDisallow: \/index.php?main_page=product_reviews_write&amp;cPath=<br \/>\nDisallow: \/index.php?main_page=tell_a_friend&amp;products_id=<br \/>\nDisallow: \/index.php?main_page=product_reviews_write&amp;products_id=<br \/>\nDisallow: \/index.php?main_page=popup_shipping_estimator<br \/>\nDisallow: \/index.php?main_page=account<br \/>\nDisallow: \/index.php?main_page=password_forgotten<br \/>\nDisallow: \/index.php?main_page=checkout_shipping_address<br \/>\nDisallow: \/index.php?main_page=logoff<br \/>\nDisallow: \/index.php?main_page=gv_faq<br \/>\nDisallow: \/gv_faq.html?faq_item=<br \/>\nDisallow: \/*&amp;sort=<br \/>\nDisallow: \/*alpha_filter_id=<br \/>\nDisallow: \/*&amp;disp_order=<\/p>\n<p>User-agent: *<br \/>\nDisallow: \/index.php?main_page=faqs_new<br \/>\nDisallow: \/index.php?main_page=discount_coupon<br \/>\nDisallow: \/index.php?main_page=checkout_shipping<br \/>\nDisallow: \/index.php?main_page=shippinginfo<br \/>\nDisallow: \/index.php?main_page=privacy<br \/>\nDisallow: \/index.php?main_page=conditions<br \/>\nDisallow: \/index.php?main_page=contact_us<br \/>\nDisallow: \/index.php?main_page=advanced_search<br \/>\nDisallow: \/index.php?main_page=login<br \/>\nDisallow: \/index.php?main_page=unsubscribe<br \/>\nDisallow: \/index.php?main_page=shopping_cart<br \/>\nDisallow: \/index.php?main_page=popup_shipping_estimator<br \/>\nDisallow: \/index.php?main_page=account<br \/>\nDisallow: \/index.php?main_page=password_forgotten<br \/>\nDisallow: \/index.php?main_page=checkout_shipping_address<br \/>\nDisallow: \/index.php?main_page=logoff<br \/>\nDisallow: \/index.php?main_page=gv_faq<br \/>\nDisallow: \/gv_faq.html?faq_item=1<br \/>\nDisallow: \/gv_faq.html?faq_item=2<br \/>\nDisallow: 
\/gv_faq.html?faq_item=3<br \/>\nDisallow: \/gv_faq.html?faq_item=4<br \/>\nDisallow: \/gv_faq.html?faq_item=5<\/p><\/blockquote>\n<p>Once again, every Zen Cart has other pages that should be blocked from crawl, and many have pages within these blocked areas that should remain crawlable. This is best used as a guide once you have learned to use the <a title=\"Robots Exclusion Protocol\" href=\"http:\/\/www.robotstxt.org\/robotstxt.html\" target=\"_blank\">Robots Exclusion Protocol<\/a> correctly.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am going to post what some may consider a generic\/default robots.txt for standard Zen Carts installed into the root of your domain. I strongly suggest you read and learn to properly use the robots protocol and directives instead. I will take no responsibility for this, as it is a suggestion and you have been warned that every Zen Cart is different. Additionally, this WILL NOT work at all if you have rewritten your pages&#8217; URLs.<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[1442,1433,1447,689,1431,1445,1430,1441,1438,2626],"class_list":["post-764","post","type-post","status-publish","format-standard","hentry","category-ecommerce-seo","tag-crawler","tag-exclusion-protocol","tag-robots-exclusion-protocol","tag-robots-txt","tag-search-engines-results","tag-syntax","tag-uses-of-robots","tag-web-robots","tag-web-spider","tag-zen-cart"],"_links":{"self":[{"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/posts\/764","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/pro-webs.net\/blog\/
wp-json\/wp\/v2\/comments?post=764"}],"version-history":[{"count":0,"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/posts\/764\/revisions"}],"wp:attachment":[{"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/media?parent=764"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/categories?post=764"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pro-webs.net\/blog\/wp-json\/wp\/v2\/tags?post=764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}