Google Quality & Removal for Hidden Text


Deindexed!

Recently a shop owner contacted us about an email from Google reporting a quality violation. The violation was spam, specifically hidden text. The website was not set up in Google’s Webmaster Tools, but Google still managed to reach the site owners by emailing common contact addresses such as info and webmaster.

What exactly is hidden text, and why is it spam?

Hidden text is simply text that search engines can see but human visitors cannot. It can occur for many reasons, and most of them are in no way malicious. Text can be set in the same color as its background, tucked into HTML comments, stuffed into alt and title attributes, or hidden with any number of CSS styles. All of these common forms of hidden text, malicious or not, violate Google’s quality guidelines.
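As a rough illustration of what a cleanup check might look for, the Python sketch below flags two of the patterns listed above: text inside elements hidden with inline CSS, and long HTML comments. The function name find_hidden_text, the style patterns and the five-word comment threshold are all assumptions made for this example, not anything Google publishes. It does not catch same-color text or rules in external stylesheets, and it assumes reasonably well-formed markup.

```python
import re
from html.parser import HTMLParser

# Inline-style patterns that commonly hide text (illustrative, not exhaustive).
HIDING_STYLES = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?![.\d])"
    r"|text-indent\s*:\s*-\d{3,}",
    re.IGNORECASE,
)

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Assumed threshold: comments with at least this many words are reported.
MIN_COMMENT_WORDS = 5


class HiddenTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # one flag per open element: hidden by itself or an ancestor
        self.findings = []  # (kind, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = bool(self.stack) and self.stack[-1]
        self.stack.append(parent_hidden or bool(HIDING_STYLES.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1]:
            self.findings.append(("hidden-by-style", text))

    def handle_comment(self, data):
        text = data.strip()
        if len(text.split()) >= MIN_COMMENT_WORDS:
            self.findings.append(("long-comment", text))


def find_hidden_text(html):
    """Return a list of (kind, text) findings for one HTML document."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

Anything it reports still needs a human look: plenty of legitimate pages use display:none for menus and tabs, so a finding is a prompt to inspect the page, not proof of spam.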

In this particular case the text was injected by a third party through a software vulnerability. The email from Google contained a sample of the offending text and the procedure to recover. I will post the edited letter below. But Google sent this letter along with another message: removal. On the same day the letter arrived, the entire site (4k+ pages) was completely removed from the index.

So the procedure is to secure the vulnerability, clean up and quarantine the affected files, check the site thoroughly for ANY quality violation, and then request that Google review the site and reconsider indexing it.
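One possible starting point for the cleanup step is to list the files under the web root that changed around the time of the hack. The Python sketch below does that; the function name recently_modified and the idea of using a fixed window of days are assumptions for illustration only. Keep in mind that an attacker can reset a file's modification time, so an empty result does not prove the site is clean.

```python
import time
from pathlib import Path


def recently_modified(root, days):
    """Return files under root whose modification time falls within the last `days` days."""
    cutoff = time.time() - days * 86400  # 86400 seconds per day
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime >= cutoff:
            hits.append(path)
    return sorted(hits)
```

Calling recently_modified with the path to your web root and a window of a few days gives a list of candidates to review and, where needed, quarantine.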

Subject: Removal from Google’s Index

Dear site owner or webmaster of examplesite.com,

While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/support/webmasters/bin/answer.py?answer=35769&hl=en. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index.

The following is some example hidden text we found at http://examplesite.com/:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tatio.

In order to preserve the quality of our search engine, pages from examplesite.com are scheduled to be removed temporarily from our search results for at least 30 days.

We would prefer to keep your pages in Google’s index. If you wish to be reconsidered, please correct or remove all pages (may not be limited to the examples provided) that are outside our quality guidelines. One potential remedy is to contact your web host technical support for assistance. For more information about security for webmasters, see http://googlewebmastercentral.blogspot.com/2008/04/my-sites-been-hacked-now-what.html. When such changes have been made, please visit https://www.google.com/webmasters/tools/reconsideration?hl=en to learn more and submit your site for reconsideration.

Sincerely,
Google Search Quality Team

Note: if you have an account in Google’s Webmaster Tools, you can verify the authenticity of this message by logging into https://www.google.com/webmasters/tools/siteoverview?hl=en and going to the Message Center.

So what does this mean for you?

Is all of the software on your site and hosting current? Are you an easy target for hackers?

You see, keeping your software up to date is part of the cost of doing business. It is also inevitably cheaper than paying for the lost sales, the cleanup and the reinclusion request.

Many times when I discuss the status of their software with site owners, I am met with the “we cannot afford it” rebuttal. Ask yourself this: can you afford all of this? Worse yet, can you afford to be fined tens of thousands of dollars by the credit card companies for failure to maintain PCI compliance?

Just like anything else, you get back what you put in. If you never change your oil, your car will have much higher maintenance costs and more downtime. This is just common business sense, right?

For those who wondered: Google found the injected text and acted on it with full removal within 3 days of the hack!