Robots.txt Tutorial


Robots.txt is the file defined by the Robots Exclusion Protocol, a standard that tells web crawlers which files and directories they should not crawl. It is a plain text file placed in the root directory of the website under the special name robots.txt. Almost every search engine crawler (also called a spider) looks for this file and follows the instructions in it. If the file is not present in the root directory, or is left blank, the crawler assumes that every link may be downloaded and indexed. The file uses a special format: instructions can be given for a specific crawler, or default settings can be made for all crawlers.
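The default behaviour described above can be checked with Python's standard urllib.robotparser module. This is a minimal sketch, not a definitive test of how any particular search engine behaves; the helper name can_crawl, the agent name "examplebot" and the path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# A blank robots.txt places no restrictions on any crawler.
# "examplebot" and the path are hypothetical values.
empty_file_result = can_crawl("", "examplebot", "/any/page.html")
print(empty_file_result)
```

Here the file content is passed in as a string, so nothing is fetched over the network.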

User-agent

The name of the search engine crawler is specified in this field. For example, to give special instructions to Google's crawler, "googlebot", write:

User-agent: googlebot

To make a default entry for all robots, use the wildcard character "*":

User-agent: *

Disallow

This field tells the crawler which file or directory must not be crawled. Every User-agent entry can have one or more Disallow fields, each on a separate line. Paths are written relative to the site root and begin with "/". For example, to keep crawlers away from the file restricted.html in the root directory, write:

Disallow: /restricted.html

If the file is not in the root directory but in a directory called "private", use the following entry:

Disallow: /private/restricted.html

To disallow a whole directory, write:

Disallow: /private/

Rules are matched by prefix. If Disallow: /private is used (without the trailing slash), both /private.html and /private/restricted.html are restricted.

If Disallow: is left blank, all files may be crawled. To restrict the whole website, use Disallow: /.
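The difference between Disallow: /private and Disallow: /private/ is easy to get wrong, so it can help to try both. The following is a small sketch using Python's urllib.robotparser, which applies the same prefix matching; the helper can_crawl, the agent name "examplebot" and the path /public.html are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Without the trailing slash the rule is a plain prefix match.
no_slash = "User-agent: *\nDisallow: /private\n"
# With the trailing slash only the directory contents match.
with_slash = "User-agent: *\nDisallow: /private/\n"

for label, rules in (("/private", no_slash), ("/private/", with_slash)):
    for path in ("/private.html", "/private/restricted.html", "/public.html"):
        verdict = "allowed" if can_crawl(rules, "examplebot", path) else "blocked"
        print(f"Disallow: {label:<10} {path:<28} {verdict}")
```

With the trailing slash, /private.html is no longer caught by the rule, while everything inside the directory still is.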

Some Examples

The following example specifies that no robot should visit or download any file on the website:

User-agent: *
Disallow: /

The following example specifies that every robot can download every file on the website:

User-agent: *
Disallow:

The following example specifies that no robot should crawl the file /private/restricted.html, the /temp/ directory, or the file /restricted.html in the root directory:

User-agent: *
Disallow: /private/restricted.html
Disallow: /temp/
Disallow: /restricted.html
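A rule set with several Disallow lines can be checked path by path before it is uploaded. The sketch below feeds the example above to Python's urllib.robotparser; the helper blocked_paths, the agent name "examplebot" and the sample paths /temp/cache.txt and /private/other.html are hypothetical.

```python
from urllib.robotparser import RobotFileParser

RULES = "\n".join([
    "User-agent: *",
    "Disallow: /private/restricted.html",
    "Disallow: /temp/",
    "Disallow: /restricted.html",
])

def blocked_paths(robots_txt: str, agent: str, paths: list[str]) -> list[str]:
    """Return the subset of `paths` that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]

sample = [
    "/private/restricted.html",
    "/private/other.html",   # hypothetical file, not covered by any rule
    "/temp/cache.txt",       # hypothetical file inside the blocked directory
    "/restricted.html",
]
print(blocked_paths(RULES, "examplebot", sample))
```

Only the one file in /private/ that is named explicitly is blocked; other files in that directory stay crawlable.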

The following example specifies that one robot is barred from the whole site, while all other robots are permitted to crawl it:

User-agent: cybercracker
Disallow: /

User-agent: *
Disallow:
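To see how the per-robot entry and the default entry interact, the last example can be run through Python's urllib.robotparser. This is only an illustration of the matching logic under that module's rules; the second agent name "examplebot" and the path /index.html are assumptions.

```python
from urllib.robotparser import RobotFileParser

RULES = "\n".join([
    "User-agent: cybercracker",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow:",
])

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The named robot matches its own entry and is blocked everywhere.
cybercracker_ok = parser.can_fetch("cybercracker", "/index.html")
# Any other robot falls through to the "*" entry, which allows everything.
other_ok = parser.can_fetch("examplebot", "/index.html")

print("cybercracker:", cybercracker_ok)
print("examplebot:", other_ok)
```

A robot is governed by the entry that names it; the "*" entry applies only to robots that have no entry of their own.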