Google just added new features to Webmaster Tools that allow a web site owner to better control the way that a site is crawled by Google’s search engine spiders. The examples that they show in the video are very real problems for a large number of e-commerce sites, as well as many older web sites.
The issue is that multiple variations of a URL can represent a single page in many web sites — especially in e-commerce sites. This can cause each variation to be crawled and recorded separately in search engine indexes (their databases). Multiple URLs not only make it harder for search engines to crawl sites, they also create a problem with duplicate content. Duplicate content can create ranking problems.
Not all web sites experience a problem with multiple URLs representing the same page. For example, a properly configured WordPress site will not allow this to happen. However, many older web sites and custom-built e-commerce sites have a very serious problem that can prevent the sites from ranking as well as they should in search results pages.
With search engine optimization, the rule for dealing with URLs is, “One and only one URL should represent a page. No exceptions.” That means no variations with or without the www subdomain, no variations with different mixtures of upper and lower case characters, no variations with different query strings (name-value pairs) appended to the end of URLs, etc. We most often see problems with URL variations on sites running on Microsoft web servers, because those servers typically treat URLs as case-insensitive, so different mixes of upper and lower case characters all resolve to the same page. Linux and Unix servers typically treat URL paths as case-sensitive, so a differently cased URL does not resolve to the same page.
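To make the rule concrete, here is a minimal sketch in Python of collapsing URL variations into one canonical form. It is an illustration only, not how Google or any particular platform does it. The function name `canonical_url`, the `IGNORED_PARAMS` set, and the choice to prefer the www form and lowercase paths are all assumptions made for this example; lowercasing paths is only safe on a site whose real URLs are all lowercase.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters that do not change what the page shows.
# Adjust this set for your own site.
IGNORED_PARAMS = {"sessionid", "utm_source", "ref"}


def canonical_url(url, prefer_www=True):
    """Return one canonical form for the many variations of a URL."""
    parts = urlsplit(url)

    # Host names are case-insensitive; pick one www / non-www form.
    host = (parts.hostname or "").lower()
    if prefer_www and not host.startswith("www."):
        host = "www." + host
    elif not prefer_www and host.startswith("www."):
        host = host[4:]
    if parts.port:
        host = f"{host}:{parts.port}"

    # Assumption: the site's real paths are all lowercase.
    path = parts.path.lower() or "/"

    # Drop parameters that do not affect content, and sort the rest
    # so that parameter order does not create a new URL.
    query = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    )

    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))
```

For example, `canonical_url("http://Example.com/Products/Widget.ASP?sessionid=123&id=7")` and `canonical_url("http://www.example.com/products/widget.asp?id=7")` both produce the same string. On a live site, the usual next step is to redirect every variation to that one form rather than just compute it.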
Sometimes this cannot be easily controlled with the system used to build a web site. If you have a site with about 100 pages, and you see many hundreds or thousands of URLs when you use the “site” query command in Google, the web site has a problem. A site query indicates roughly how many of a site’s URLs are in a search engine’s index, and it works in most major search engines.
To use the site query, simply enter “site:” in a Google search box, followed by a site’s domain name with no spaces in between. The following is an example. Just substitute your site’s domain name.
site:mydomainname.com
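If the site query returns far more URLs than the site has pages, it can help to see which pages have multiple variations. The following is a rough Python sketch, assuming you have copied the URLs into a list yourself (from search results, a crawl, or a server log). The helper names `page_key` and `find_duplicates` are made up for this example, and the grouping ignores the query string entirely, which is only appropriate when the query string does not select a different page.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def page_key(url):
    """Loose key: host without www, plus lowercased path, no query string."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/page" and "/page/" as the same page.
    return host + (parts.path.lower().rstrip("/") or "/")


def find_duplicates(urls):
    """Group URLs by page key and keep only pages with more than one URL."""
    groups = defaultdict(set)
    for url in urls:
        groups[page_key(url)].add(url)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


urls = [
    "http://mydomainname.com/Widget.asp",
    "http://www.mydomainname.com/widget.asp?sort=price",
    "http://www.mydomainname.com/about.html",
]

for page, variants in find_duplicates(urls).items():
    print(page, "has", len(variants), "URL variations")
```

Any page that shows up in the output is being represented by more than one URL and is a candidate for cleanup.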
If you have not signed up for a free Google Webmaster Tools account, it would be a good idea to do so. Webmaster Tools will give you useful insight into crawling issues that Google’s Googlebot spider experiences with your site. It also provides a list of web pages on other sites that link to your site.