June 19, 2008
10:06 pm
In order to show you the most relevant results, we have omitted some entries
So have you seen this ugly message in your Google site command results?
In order to show you the most relevant results, we have omitted some entries very similar to the ### already displayed.
If you like, you can repeat the search with the omitted results included.
What exactly does this daunting phrase mean? I think the key words here are “relevant” and “similar”. Google expressly wishes to display highly relevant and unique pages in its search results. This blog post from Google is a MUST read to help you understand duplicate content.
This is a two-part problem… The first part is fairly simple: if a page has no value to a search user, block it from indexing. Pages in your store such as shipping, conditions and contact pages present no end value to a searcher and thus should not be allowed to be indexed. As with anything else, there are a few ways to accomplish this; given the dynamic nature of most e-commerce shops, blocking them in your robots.txt is both easy and highly effective.
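As a rough sketch of the robots.txt approach, the snippet below holds a small set of rules and checks them with Python's standard robots.txt parser before you upload them. The file names (shipping.php, conditions.php, contact_us.php, product_info.php) are assumptions made for illustration, so substitute whatever paths your shop really uses. Keep in mind that a Disallow rule tells crawlers not to fetch a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths for illustration; use the ones your own shop serves.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shipping.php
Disallow: /conditions.php
Disallow: /contact_us.php
"""


def is_blocked(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the rules disallow `agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/shipping.php", "/contact_us.php", "/product_info.php"):
        print(path, "blocked" if is_blocked(path) else "crawlable")
```

Running a check like this before publishing the file is a cheap way to make sure you have not blocked a product or category page by accident.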
Part 2… Duplication is far more complicated as the duplicate content can come from a variety of sources. The first step is to understand what duplicate content is and why Google finds it less than useful to the main index.
Duplicate Content: 2 pages within the search engine’s index which are substantially similar in their content.
You see, there are in fact many different ways to end up with duplicate content… Here are some common ones we see in the shops.
- Using manufacturers' descriptions, which are likely to be indexed already on other sites.
- Allowing your session IDs to be crawled and thus indexed.
- Creating a separate product page for each style, color, etc. of a product and using mostly the same text on each.
- Canonical duplication and lack of a proper default index page.
- Not creating enough unique product/category copy within the template to set the pages apart from each other.
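To make the session ID item concrete: the same product page reached with two different session IDs looks like two different URLs to a crawler. The sketch below strips session parameters so you can see which URLs collapse into one. The parameter names in SESSION_PARAMS and the example URLs are assumptions for illustration; check what your cart software actually appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust to match your own shop.
SESSION_PARAMS = {"osCsid", "sid", "PHPSESSID"}


def strip_session_id(url):
    """Return `url` with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/product_info.php?products_id=42&osCsid=abc123",
        "http://www.example.com/product_info.php?products_id=42&osCsid=def456",
    ]
    # Both crawled URLs reduce to the same page once the session ID is gone.
    print({strip_session_id(url) for url in crawled})
```

If a list of indexed URLs shrinks noticeably after this kind of cleanup, session IDs are a likely source of duplicate pages.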
The problem with duplicate content is that it puts Google in the driver's seat instead of you. When Google determines that a page is too similar to others, it will decide from the available versions which one is most appropriate. There is much theory and discussion as to how much duplicate content causes a page to be dumped from the main search results. I honestly think this number is around 25-30% duplicate.
The duplication is measured across the WHOLE page, not just the part you enter as a product description. Let’s say you are creating a new product and you have little or no textual content specific to that product… This page is VERY likely to be duplicate, because your template is pretty much the same on every page. You need enough page-specific, relevant text to set it apart, not only to prevent duplication but to help it rank better as well.
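One way to get a feel for whole-page duplication is to compare the full text of two pages, template included. The sketch below uses word shingles and Jaccard similarity, a common textbook measure chosen here purely for illustration. It is not Google's method, and its scores are not comparable to any threshold Google may use. The sample template and product text are made up.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(page_a, page_b, size=3):
    """Jaccard similarity of two pages' word shingles, from 0.0 to 1.0."""
    a, b = shingles(page_a, size), shingles(page_b, size)
    if not a and not b:
        return 0.0  # Nothing to compare; treated as no overlap by convention.
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Made-up template text shared by every page in the shop.
    template = "free shipping on all orders contact us for details about returns"
    thin_a = template + " red widget"
    thin_b = template + " blue gadget"
    print(round(similarity(thin_a, thin_b), 2))
```

Two pages that carry only a product name on top of the shared template score high, and the score drops as each page gains its own copy.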
Using a supplier's or manufacturer's descriptions verbatim creates duplication in the same manner… Remember, Google is not just comparing your pages against your own pages for quality and uniqueness; it compares them against other sites as well. The best way to avoid this is to take a minute and write some good copy.

