<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Using the robots.txt File</title>
	<atom:link href="http://www.tech-evangelist.com/2008/10/18/robotstxt/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tech-evangelist.com/2008/10/18/robotstxt/</link>
	<description>Technical Articles, Musings and Opinions from Tech-Evangelist</description>
	<lastBuildDate>Wed, 21 Jul 2010 17:18:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
	<item>
		<title>By: Doogie</title>
		<link>http://www.tech-evangelist.com/2008/10/18/robotstxt/comment-page-1/#comment-11302</link>
		<dc:creator>Doogie</dc:creator>
		<pubDate>Wed, 17 Jun 2009 20:08:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.tech-evangelist.com/?p=281#comment-11302</guid>
		<description>Hi Joseph

You do not need to disallow anything other than directories that you do not want spiders to index. If there are no links in your site to a scripts directory and it does not appear in any URL found on your site, then you do not have to include that directory. The only way a spider will get into a directory is when someone provides a link to it or it appears in a path in a URL.  

One important rule is to never list directories or folders in the robots.txt file that reveal a secret area, such as the administration area of a site. The robots.txt file is publicly readable, so if you list a directory there in order to hide it, anyone can open the file and see the path.

You are correct. Do not disallow the images directory if you want Google to index your images. I always block the images directory because too many people think that anything they find in Google Images is free to use. It is not. The overwhelming majority of images in Google&#039;s index are copyrighted images taken from web sites without permission. I am a bit surprised that there hasn&#039;t been a legal challenge to that. However, the way to keep your own images out of the index is to block the images directory.</description>
		<content:encoded><![CDATA[<p>Hi Joseph</p>
<p>You do not need to disallow anything other than directories that you do not want spiders to index. If there are no links in your site to a scripts directory and it does not appear in any URL found on your site, then you do not have to include that directory. The only way a spider will get into a directory is when someone provides a link to it or it appears in a path in a URL.  </p>
<p>One important rule is to never list directories or folders in the robots.txt file that reveal a secret area, such as the administration area of a site. The robots.txt file is publicly readable, so if you list a directory there in order to hide it, anyone can open the file and see the path.</p>
<p>You are correct. Do not disallow the images directory if you want Google to index your images. I always block the images directory because too many people think that anything they find in Google Images is free to use. It is not. The overwhelming majority of images in Google&#8217;s index are copyrighted images taken from web sites without permission. I am a bit surprised that there hasn&#8217;t been a legal challenge to that. However, the way to keep your own images out of the index is to block the images directory.</p>
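<p>If you want to check that a rule does what you expect, here is a rough sketch using the urllib.robotparser module from Python's standard library. The rules, the example.com URLs, and the is_allowed helper are made up for illustration and are not taken from any real site. Note that this only tells you what a well-behaved spider should do; it does not prove that a particular spider obeys the file.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only an images directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Illustrative URLs only: one inside the blocked directory, one outside.
    for url in (
        "http://www.example.com/images/photo.jpg",
        "http://www.example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, url))
```

<p>With these rules, the URL under /images/ should come back as blocked and the other as allowed.</p>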
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Carringer</title>
		<link>http://www.tech-evangelist.com/2008/10/18/robotstxt/comment-page-1/#comment-11297</link>
		<dc:creator>Joseph Carringer</dc:creator>
		<pubDate>Tue, 16 Jun 2009 15:20:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.tech-evangelist.com/?p=281#comment-11297</guid>
		<description>One follow-up:

If I want my images to be indexed by Google Images, I should not disallow my images directory, correct?</description>
		<content:encoded><![CDATA[<p>One follow-up:</p>
<p>If I want my images to be indexed by Google Images, I should not disallow my images directory, correct?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Carringer</title>
		<link>http://www.tech-evangelist.com/2008/10/18/robotstxt/comment-page-1/#comment-11296</link>
		<dc:creator>Joseph Carringer</dc:creator>
		<pubDate>Tue, 16 Jun 2009 15:18:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.tech-evangelist.com/?p=281#comment-11296</guid>
		<description>Hello,

Two questions:

1) What is a short list of recommended files or folders to disallow in robots.txt? e.g. /images, /styles, /cgibin, /subdomains, /scripts, etc.

2) If you disallow subdomain site folders in the root directory of the main site, will they still be accessible to the robots through their individual URLs?

Thank you,

Joseph</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>Two questions:</p>
<p>1) What is a short list of recommended files or folders to disallow in robots.txt? e.g. /images, /styles, /cgibin, /subdomains, /scripts, etc.</p>
<p>2) If you disallow subdomain site folders in the root directory of the main site, will they still be accessible to the robots through their individual URLs?</p>
<p>Thank you,</p>
<p>Joseph</p>
]]></content:encoded>
	</item>
</channel>
</rss>
