Crawling a Magento site and robots.txt

28th January, 2011 - Less than a minute read

If you have ever needed to restrict search engine access to web pages on a Magento site (and we have), but are unsure how best to do it, then look no further than this post from Ecommerce Web Design.

It shows in detail which of the Magento folders to restrict and, more importantly, which of the parameterised query-string URLs to restrict. Magento generates a large number of internal links through its search functionality and the paging links on search results, and these can cause duplication problems for meta data and page titles.

Blocking crawlers from these URLs via robots.txt helps prevent this duplication, as well as the crawl errors flagged in your Webmaster Tools console.
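As a rough illustration, a Magento robots.txt along these lines blocks the system folders and the parameterised search and paging URLs that tend to cause duplication (the exact paths here are examples, not the full list from the linked post — adjust them to your own store):

```
User-agent: *
# Magento system folders that should not be crawled (illustrative)
Disallow: /app/
Disallow: /lib/
Disallow: /skin/
Disallow: /var/
# Internal search results
Disallow: /catalogsearch/
# Parameterised sorting/paging URLs that duplicate category pages
Disallow: /*?dir=
Disallow: /*?mode=
Disallow: /*?limit=
Disallow: /*?p=
```

Note that the wildcard (`*`) patterns are honoured by Google and Bing but are not part of the original robots.txt standard, so test the rules in your Webmaster Tools console before relying on them.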