On March 22, 2011, Google was granted a new US patent. The patent, named “Determining semantically distinct regions of a document,” describes how Google may decide the importance of structured containers on a web page based on their location, their commonality across web pages, and their display. This patent may give us some great insight into how content is valued in various positions and containers on our web pages.
Google grabs the page’s source and attempts to simulate a typical browser display. This allows Google to reasonably surmise the different page elements, their purpose, and thus their relative importance. So common areas such as the header, footer, sidebars, ads, etc. are almost certainly given less weight than the main content area of the page. One cool idea I would like to see come from this is that Google downgrades the duplication issues for these common areas, such as the header and footer. Taking these site-wide areas of duplication out of the duplication filter would allow us to use a highly usable, site-wide template, which users consistently favor, without the need to “pump up” each page’s unique main content text just to outweigh the common template text. Wouldn’t that be awesome?
The new patent (7,913,163) lists several “modules” that are used to analyze a web document. The work of these modules (link analysis, text analysis, image captioning, and snippet construction) is weighted relative to an element's position on the page. For example, a footer link would carry less weight than a link in the main content; by the same rule, a link inside a paragraph in the main content would carry even more weight. Important search terms shoved in the footer or header will not help you nearly as much as the same terms in the main content area. By the same token, images surrounded by relevant text are easier to rank in image search, because the text in close proximity to the image lends it more weight.
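To make the idea of position-based weighting concrete, here is a minimal sketch in Python. The region names, the numeric weights, and the Link, link_weight, and rank_links names are all made up for illustration; the patent does not publish actual values, and this is not Google's implementation.

```python
from dataclasses import dataclass

# Hypothetical weights per page region. These numbers are assumptions
# chosen only to show the ordering described in the text.
REGION_WEIGHTS = {
    "main_paragraph": 1.0,  # link inside a paragraph in the main content
    "main": 0.8,            # link elsewhere in the main content
    "sidebar": 0.4,
    "header": 0.3,
    "footer": 0.2,
}


@dataclass
class Link:
    url: str
    region: str  # the page region the link was found in


def link_weight(link, default=0.1):
    """Return the assumed weight for a link based on its page region."""
    return REGION_WEIGHTS.get(link.region, default)


def rank_links(links):
    """Sort links from most to least heavily weighted."""
    return sorted(links, key=link_weight, reverse=True)


links = [
    Link("/about", "footer"),
    Link("/flags/gadsden", "main_paragraph"),
    Link("/flags", "main"),
]
ranked = rank_links(links)
for link in ranked:
    print(link.url, link_weight(link))
```

With these assumed weights, the link inside a main-content paragraph comes first and the footer link comes last, which matches the ordering described above.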
Something else to consider is the use of proper containers throughout your content. A great example we see quite often is a product description that is a list, or that lacks proper paragraph tags to logically break the content into separate thoughts. The issue here is that without a proper textual container such as a paragraph, Google has little to generate a snippet from. Google doesn’t always use our own meta description; instead, it often pulls a snippet from a container that contains or is relevant to the searcher’s query. Here is an example:
This product, the Gadsden US Flag, is searched for using many different terms. In the illustration below, you can see that an effort was made not only to cover all the ways people search for this flag, but to include each one in a container that