Robots Txt Tutorial

Robots.txt is the standard (Robots Exclusion Protocol) that tells web crawlers which files and directories should not be crawled. It is a plain text file placed in the root directory of the site under the special name robots.txt. Almost every search engine spider or crawler looks for this file and follows the instructions in it. If the file is missing from the root directory or is left blank, the crawler assumes every link may be downloaded and indexed.

The file uses a special format. Instructions can be given to a specific crawler, or default settings can be made for all crawlers.

User-agent

This field specifies the name of the search engine crawler. For example, to give special instructions to the Google search engine crawler "googlebot", write:

    User-agent: googlebot

To make a default entry for all robots, use the wildcard character "*":

    User-agent: *

Disallow

This field tells the crawler which file or directory must not be crawled. Every User-agent can have one or more Disallow fields, each on a separate line. Paths start with "/" and are relative to the root directory. For example, to keep a crawler away from the restricted.html file in the root directory, write:

    Disallow: /restricted.html

If the file is not in the root directory but in another directory called "private", make the following entry:

    Disallow: /private/restricted.html

To disallow a whole directory:

    Disallow: /private/

If Disallow: /private is used (without the trailing slash), both /private.html and /private/restricted.html are restricted, because the value is matched as a prefix of the path. If Disallow: is left blank, all files can be crawled. To restrict the whole web site, use Disallow: /
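
The difference between Disallow: /private and Disallow: /private/ described above can be checked with Python's standard urllib.robotparser module. This is a minimal sketch, not a full crawler: the helper name is_allowed, the site www.example.com and the crawler name examplebot are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical site and crawler name, used only for illustration.
SITE = "http://www.example.com"
AGENT = "examplebot"

# Same rule with and without the trailing slash.
without_slash = "User-agent: *\nDisallow: /private"
with_slash = "User-agent: *\nDisallow: /private/"

for path in ("/private.html", "/private/restricted.html", "/index.html"):
    print(
        path,
        "| /private:", is_allowed(without_slash, AGENT, SITE + path),
        "| /private/:", is_allowed(with_slash, AGENT, SITE + path),
    )
```

With these inputs, only the trailing-slash form leaves /private.html fetchable, while /private/restricted.html is blocked by both forms.
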

Some Examples

The following example specifies that no robot should visit or download any file on the website:

    User-agent: *
    Disallow: /

The following example specifies that every robot can download every file on the website:

    User-agent: *
    Disallow:

The following example specifies that no robot should crawl the /private/restricted.html file, the temp directory, or the restricted.html file on the website:

    User-agent: *
    Disallow: /private/restricted.html
    Disallow: /temp/
    Disallow: /restricted.html

The following example specifies that one robot is restricted from crawling and the rest are permitted to crawl the whole site:

    User-agent: cybercracker
    Disallow: /

    User-agent: *
    Disallow:
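
The last example, where one robot is shut out and all others are allowed, can be tried out with Python's standard urllib.robotparser module. This sketch assumes the rules are written as Disallow: / for cybercracker and an empty Disallow: for every other robot, following the Disallow rules described earlier. The URL is a made-up placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed form of the per-robot example: block one crawler, allow the rest.
ROBOTS_TXT = """\
User-agent: cybercracker
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URL, used only for illustration.
url = "http://www.example.com/index.html"

cybercracker_ok = parser.can_fetch("cybercracker", url)
googlebot_ok = parser.can_fetch("googlebot", url)

print("cybercracker may fetch:", cybercracker_ok)
print("googlebot may fetch:", googlebot_ok)
```

The parser picks the entry whose User-agent matches the crawler name and falls back to the "*" entry for everyone else, so the order of the two blocks in the file does not change the result here.
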