Increasing your crawl rate and crawl budget from Google is as simple as training GoogleBot. Crawl rate is simply how often GoogleBot visits your website to crawl it. Crawl budget is a harder metric to define, but in its simplest terms it is the amount of data transfer Google allocates to crawling your site. These metrics are based on many factors, but most of all on the need to crawl: if you always have fresh content, Google needs to crawl more often.
Why would we seek to increase our crawl rate and crawl budget?
Very simply, a more frequent crawl rate and a deeper crawl budget allow your pages to be indexed more quickly and your site to be crawled more thoroughly. While neither of these directly helps you rank, both contribute to your site's trust and freshness scores, which are part of the algorithm. Additionally, much information has come out recently about the relationship between content freshness and PageRank, so even those "old time" PR watchers have reason to take notice.
The fact is, aside from the time needed to create fresh, unique content for your website, this is the easiest thing you can accomplish for your site's SEO campaign. Creating content regularly is key for these metrics. Note that new pages generally have the best effect; freshening existing content is also important, but not as effective as a new page. Like the old saying goes, "If you're not growing, you're dying."
A great example is a client who pays for content for his blog. He buys and receives 30 blog posts in a batch. He would make the most effective use of this content by releasing the posts over a longer period of time instead of publishing them all at once on his website. Again, this builds the habit of crawling more frequently, thus training GoogleBot.
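The drip-release idea above can be sketched as a simple scheduler. The post names, start date, and three-day interval here are my own illustrative assumptions, not figures from the article:

```python
from datetime import date, timedelta

def spread_posts(posts, start, interval_days):
    """Assign each post a publish date at a fixed cadence
    instead of releasing the whole batch at once."""
    return [(post, start + timedelta(days=i * interval_days))
            for i, post in enumerate(posts)]

# Hypothetical example: 30 purchased posts, one every three days
posts = [f"post-{n:02d}" for n in range(1, 31)]
schedule = spread_posts(posts, date(2024, 1, 1), interval_days=3)
print(schedule[0])   # ('post-01', datetime.date(2024, 1, 1))
print(schedule[-1])  # ('post-30', datetime.date(2024, 3, 28))
```

The same 30 posts now give GoogleBot a reason to return steadily for roughly three months instead of one big spike.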
Many ask me what the magic number is: how many pages a week should I create?
I suspect that this number is a minimum of 1 to 5% of your total pages per week. Higher percentages are obviously more effective, but if you cannot keep the pace on a regular schedule, the effect is lost. You also do not want to create so much new content that Google sandboxes you or you start to see huge fluctuations in your indexing and rank; I would put the high end at 10% for this purpose. Sometimes you will have more, such as when you launch a new section on your site or have a huge influx of seasonal products. That's okay; just remember to continue at your regular rate, forever.
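As a rough sketch of that arithmetic (the example site sizes and the rounding are my own assumptions, not figures from the article):

```python
def weekly_page_target(total_pages, low=0.01, high=0.05, cap=0.10):
    """Rough weekly new-page range from the 1-5% guideline,
    with a 10% ceiling to avoid a sudden content spike."""
    return (round(total_pages * low),
            round(total_pages * min(high, cap)))

# Hypothetical 400-page site: aim for roughly 4 to 20 new pages a week
print(weekly_page_target(400))   # (4, 20)
print(weekly_page_target(1000))  # (10, 50)
```

The point is less the exact numbers than the steady, sustainable cadence.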
How does Google know I have new pages?
Believe it or not, Google knows. Google finds new pages from links, from crawling, from your sitemap, and from social media platforms. There was an interesting answer to a sitemap question posted at Google Webmaster Help, where Googler John Mu responded as follows:
Google’s Sitemaps crawler usually reacts to the update frequency of your Sitemap files. If we find new content there every time we crawl, we may choose to crawl more frequently. If you can limit updates of the Sitemap files to daily (or whatever suits your sites best), that may help. Similarly, if you create a shared Sitemap file for these subdomains, that could help by limiting the number of requests we have to make for each subdomain — you could let us know about the Sitemap file by mentioning it in your robots.txt file using a “Sitemap:” directive (the Sitemap file does not have to be on the same host or domain as the site itself). If we’re generally crawling your sites too frequently, you can also set the crawl rate in Webmaster Tools for those sites.
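The "Sitemap:" directive John mentions looks like this in a robots.txt file (the domain here is a placeholder; per the sitemaps.org protocol, the sitemap file does not have to live on the same host):

```
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```

One line is all it takes to let crawlers discover your sitemap without you submitting it anywhere.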
So if you use an XML sitemap, make sure your change frequency (changefreq) and last-modified (lastmod) stamps are accurate. If you have a blog on your website, Google can be prompted to visit more often with a pinging service. Once Google gets the link, it comes to check it out, following links from the new post deeper into your website.
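A minimal sitemap entry showing those stamps, per the sitemaps.org protocol (the URL and values are placeholders for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/new-post/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Keeping lastmod honest matters more than the changefreq hint; as John notes above, Google reacts to how often it actually finds new content there.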
The end result of training GoogleBot is a healthier site, stronger organic ranking ability, fast indexing of new pages (sometimes within 10 minutes!), and of course a better trust score from Google. It is also worth noting that duplicate content and non-textual pages do not help this "fresh" metric at all; in fact, they hurt your overall crawl rate. Why would GoogleBot crawl your duplicate content more frequently?