The Web Robots and The Robot Exclusion Standard

The Web Robots

Web Robots, also known as Web Wanderers, Crawlers, or Spiders, are programs that traverse the Web automatically. Search engines such as Google and Bing use these spiders to build the indexes for their search databases. The robots traverse HTML trees by loading pages and following hyperlinks, and they report the text and/or meta-tag information back to create search indexes. ROBOTS.TXT is the file that spiders consult for instructions on how the site is to be cataloged. It is an ASCII text file that sits in the document root of the server and defines which documents and/or directories conforming spiders are forbidden to index.
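To make the mechanism concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt rules and the robot name `BadBot` are made up for illustration; a conforming spider would fetch the real file from the server's document root and check each URL before requesting it.

```python
from urllib import robotparser

# A hypothetical robots.txt: forbid all conforming robots from /private/,
# and forbid one robot (here called "BadBot") from the entire site.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())  # normally rp.set_url(...) then rp.read()

print(rp.can_fetch("*", "/index.html"))       # allowed for everyone
print(rp.can_fetch("*", "/private/a.html"))   # forbidden to all robots
print(rp.can_fetch("BadBot", "/index.html"))  # BadBot is banned everywhere
```

A polite spider simply calls `can_fetch()` with its own User-agent string before loading any page; directories not mentioned in the file remain fair game.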

In 1993 and 1994 there were occasions when robots visited WWW servers where they weren't welcome, for various reasons. Sometimes these reasons were robot-specific, e.g. certain robots swamped servers with rapid-fire requests, or retrieved the same files repeatedly. In o…