What is a robots.txt file?
A robots.txt file is the first file that a search engine robot looks for when it visits your website. Like a snooty nightclub bouncer with a velvet rope, the robots.txt file decides which robots are welcome and which need to move on to the less-exclusive joint down the street. Robots.txt can accept or reject robots for the whole site, directory by directory, or page by page.
Do you need a robots.txt file?
Not every site needs one. Without a robots.txt file, all robots are free to crawl every page on your site. To decide whether you need a robots.txt file, answer these questions first.
• Are there any pages or directories on my site that I do not want listed on the search engines, such as an intranet or internal phone list?
• Are there any specific search engines that I do not want to display my site?
• Do I know of any dynamic pages or programming features that might cause problems for spiders, like getting caught in a loop (infinitely bouncing between two pages)?
• Does my website contain pages with duplicate content?
• Are there directories on the site that contain programming scripts only, not viewable pages?
If the answer to any of these questions is yes, you should create one.
How to make a robots.txt file?
If you decide to make a robots.txt file and are looking for a sample, here is one. A robots.txt file usually looks something like this:
User-agent: googlebot
Disallow: /private-files/
Disallow: /more-private-files/

User-agent: *
Disallow: /cgi-scripts/
In this example, Google's spider (called Googlebot) is excluded from indexing files in two directories, /private-files/ and /more-private-files/, and all other robots (designated by the wildcard asterisk *) are excluded from indexing the directory /cgi-scripts/.
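One way to check rules like these before uploading them is Python's standard-library urllib.robotparser module. The sketch below is only an illustration: the bot name SomeOtherBot and the file paths are made up, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, one directive per line.
EXAMPLE_RULES = """\
User-agent: googlebot
Disallow: /private-files/
Disallow: /more-private-files/

User-agent: *
Disallow: /cgi-scripts/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(EXAMPLE_RULES)

# Hypothetical paths, used only to exercise the rules.
googlebot_private = parser.can_fetch("Googlebot", "/private-files/report.html")
other_cgi = parser.can_fetch("SomeOtherBot", "/cgi-scripts/form.cgi")
other_private = parser.can_fetch("SomeOtherBot", "/private-files/report.html")

print(googlebot_private, other_cgi, other_private)
```

Here Googlebot is refused in /private-files/, while the hypothetical SomeOtherBot is refused only in /cgi-scripts/.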
Here’s a bonus: robots.txt can also be used to tell search engines where your XML Sitemap is located.
If you use Drupal, you can fix its robots.txt file as follows:
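As a rough illustration, the Sitemap line sits in robots.txt next to the other rules, and Python's urllib.robotparser can read it back (the site_maps() method needs Python 3.8 or later). The domain www.example.com and the sitemap path are placeholders, not values from any real site.

```python
from urllib.robotparser import RobotFileParser

# A small robots.txt with a Sitemap line; the URL is a placeholder.
RULES_WITH_SITEMAP = """\
User-agent: *
Disallow: /cgi-scripts/

Sitemap: https://www.example.com/sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(RULES_WITH_SITEMAP.splitlines())

# site_maps() returns a list of Sitemap URLs, or None if there are none.
sitemap_urls = sitemap_parser.site_maps()
print(sitemap_urls)
```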
1. Make a backup of the robots.txt file.
2. Open the robots.txt file for editing. If necessary, download the file and open it in a local text editor.
3. Find the Paths (clean URLs) section and the Paths (no clean URLs) section. Both sections appear whether or not you’ve turned on clean URLs, so Drupal covers you either way. They look like this:
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
4. Duplicate the two sections (simply copy and paste them) so that you have four sections—two of the # Paths (clean URLs) sections and two of # Paths (no clean URLs) sections.
5. Add ‘fixed!’ to the comment of the new sections so that you can tell them apart.
6. Delete the trailing / at the end of each Disallow line in the fixed! sections. You should end up with four sections that look like this:
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
# Paths (clean URLs) – fixed!
Disallow: /admin
Disallow: /comment/reply
Disallow: /contact
Disallow: /logout
Disallow: /node/add
Disallow: /search
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
# Paths (no clean URLs) – fixed!
Disallow: /?q=admin
Disallow: /?q=comment/reply
Disallow: /?q=contact
Disallow: /?q=logout
Disallow: /?q=node/add
Disallow: /?q=search
Disallow: /?q=user/password
Disallow: /?q=user/register
Disallow: /?q=user/login
7. Save your robots.txt file, uploading it if necessary, replacing the existing file (you backed it up, didn’t you?).
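The copy-and-trim steps above can also be scripted. The following is a minimal sketch, not part of Drupal: the helper name make_fixed_section and the bot name AnyBot are invented for illustration. It also uses Python's urllib.robotparser to show the likely reason for the fix, namely that rules are matched as path prefixes, so a rule ending in / does not cover the same path without the slash.

```python
from urllib.robotparser import RobotFileParser


def make_fixed_section(section: str) -> str:
    """Copy a section, dropping the trailing slash from each Disallow path
    and appending 'fixed!' to each comment line (hypothetical helper)."""
    fixed = []
    for line in section.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            fixed.append(stripped + " - fixed!")
        elif stripped.lower().startswith("disallow:"):
            path = stripped.split(":", 1)[1].strip()
            # Keep a bare "/" untouched; only trim longer paths.
            if len(path) > 1 and path.endswith("/"):
                path = path[:-1]
            fixed.append("Disallow: " + path)
        else:
            fixed.append(stripped)
    return "\n".join(fixed)


def is_allowed(rules: str, path: str) -> bool:
    """Check a path against rules for a made-up robot called AnyBot."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("AnyBot", path)


original = "# Paths (clean URLs)\nDisallow: /admin/\nDisallow: /?q=admin/"
fixed_section = make_fixed_section(original)
print(fixed_section)

# With only the original rule, /admin (no slash) is still crawlable.
before = is_allowed("User-agent: *\n" + original, "/admin")
# With the fixed copy added, /admin is blocked as well.
after = is_allowed("User-agent: *\n" + original + "\n" + fixed_section, "/admin")
print(before, after)
```

Keeping the original sections and adding the trimmed copies, as the steps describe, leaves the file easy to compare against the version Drupal shipped.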
Robots meta tag
You may want to use the robots meta tag instead of a robots.txt file, either because it is easier to add a tag to your web page template than to maintain a robots.txt file, or because you just want to make a short, temporary exclusion. Another possible reason is that you do not have access to the root directory of the site.
To exclude the robots from a page using the robots meta tag, simply include the following code in the HTML head of the page:
<meta name="robots" content="noindex, nofollow">
This will prevent search engine robots from listing the page on which the tag resides.
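To confirm that a page template really emits the tag, a small check along these lines may help. It is a sketch using Python's standard html.parser module; the class name RobotsMetaParser, the function robots_directives, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Split "noindex, nofollow" into individual lowercase directives.
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in an HTML document."""
    meta_parser = RobotsMetaParser()
    meta_parser.feed(html)
    return meta_parser.directives


# A made-up page that carries the tag shown above.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Internal phone list</title>
</head><body></body></html>"""

found = robots_directives(SAMPLE_PAGE)
print(sorted(found))
```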















