What Robots.txt Is and Search Engine Robots Explained
You have a website, and it has to be indexed by search engines so that users can find it through search. This task is accomplished by search engine robots.
Search engine robots are also known as crawlers or spiders. These robots are software programs, written by people, that automatically and continually visit millions of websites every day and add what they find to search engine databases. This process is called crawling or spidering.
When a robot visits your site, the very first file it looks for is robots.txt, which should be located in your web root directory. That is the directory where your home page is located. For example, http://www.yoursite.com/robots.txt
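The location rule above can be sketched in Python: whatever page a robot starts from, the robots.txt address is the scheme and host of that page followed by /robots.txt. The helper name robots_txt_url and the sample page address are assumptions made for illustration, not part of any crawler's API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address on the example site used in this article.
print(robots_txt_url("http://www.yoursite.com/products/index.html?page=2"))
```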
Simply put, robots.txt gives you total control of
Using robots.txt is not compulsory. If the file is missing, search engine robots assume that your entire site may be visited and indexed by any crawler.
Search engine robots are automated crawling programs that visit websites and travel the web by following links. Commercial robots follow the Robots Exclusion Protocol. Not all robots comply with the Protocol, but the majority do.
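As a rough sketch of what complying with the Protocol looks like from the robot's side, the Python below filters the links a crawler has found against a site's robots.txt before visiting them. It uses the standard library module urllib.robotparser. The function name allowed_links, the robot name ExampleBot, and the sample rules and links are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_links(robots_txt, user_agent, links):
    """Keep only the links that robots_txt lets user_agent visit."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [link for link in links if parser.can_fetch(user_agent, link)]


# Hypothetical rules and links, for illustration only.
rules = "User-agent: *\nDisallow: /cgi-bin/\n"
found = [
    "http://www.yoursite.com/index.html",
    "http://www.yoursite.com/cgi-bin/search",
]
print(allowed_links(rules, "ExampleBot", found))
```

A robot that ignores the Protocol simply skips this check, which is why robots.txt cannot stop a badly behaved crawler on its own.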
You can create robots.txt with a simple text editor such as Notepad. In this file, you control the following two things:
1. Which search engine robots can visit your site.
robots.txt code examples:
If the robots.txt file does not exist, or exists but is empty, all robots are allowed to access any part of the site.
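The empty-file case can be checked with Python's standard urllib.robotparser module. This is a minimal sketch: the robot names and paths are hypothetical, and the missing-file case is not exercised because it would need a network request.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # an empty robots.txt: no rules at all

# With no rules, every robot may fetch every path.
everything_allowed = all(
    parser.can_fetch(agent, "http://www.yoursite.com" + path)
    for agent in ("Googlebot", "ExampleBot")          # hypothetical robots
    for path in ("/", "/private/report.html")         # hypothetical paths
)
print(everything_allowed)
```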
Note: in the following syntax, anything after # is a comment. Robots ignore that part.
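To illustrate both the comment rule and how a file chooses which robots may visit, here is a small sketch that feeds a sample robots.txt to Python's standard urllib.robotparser module. The file contents, the robot names BadBot and Googlebot as used here, and the /private/ directory are assumptions chosen for the example, not rules taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt held in a string; every rule here is illustrative.
ROBOTS_TXT = """\
# Keep one hypothetical crawler out of the whole site.
User-agent: BadBot
Disallow: /

# Every other robot may visit everything except /private/.
User-agent: *
Disallow: /private/   # text after # on a rule line is ignored too
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask whether the named robot may fetch the path on the example site."""
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


print(allowed("BadBot", "/index.html"))            # banned everywhere
print(allowed("Googlebot", "/index.html"))         # falls under the * rules
print(allowed("Googlebot", "/private/data.html"))  # blocked directory
```

The comment lines and the trailing comment change nothing about the result: only the User-agent and Disallow lines are read as rules.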
More robots.txt tools and resources
Ban bad web robots
Not all search engine robots are good. Anyone with good programming skills in a language such as C++, or in a scripting language such as Python or PHP, can write a robot program.
Because the barrier to entry is low, there are hundreds of web crawlers out on the internet. Some of these robots are written to download every page of a website onto a computer hard drive for the purpose of plagiarism. These robots sometimes do not even obey the rules you set up in robots.txt. Such spambots can seriously eat into your website's monthly bandwidth allowance.
If you are concerned about spambots, you can set up a bad-robot trap and then ban the robots it catches.
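This page does not spell out how such a trap works. One common approach, assumed here, is to disallow a directory in robots.txt that no well-behaved robot will ever request, and then treat any client that requests it as a bad robot. The sketch below shows only the bookkeeping; the path /bot-trap/, the class name BotTrap, and the sample addresses are hypothetical, and a real setup would apply the ban in the web server or firewall.

```python
TRAP_PATH = "/bot-trap/"  # hypothetical directory that exists only as bait

# The rule that tells compliant robots to stay away from the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


class BotTrap:
    """Remember every client address that requests the forbidden trap path."""

    def __init__(self, trap_path):
        self.trap_path = trap_path
        self.banned = set()

    def record(self, client_ip, path):
        # A compliant robot never requests a disallowed path, so a hit here
        # marks the client as one that ignores robots.txt.
        if path.startswith(self.trap_path):
            self.banned.add(client_ip)

    def is_banned(self, client_ip):
        return client_ip in self.banned


trap = BotTrap(TRAP_PATH)
trap.record("203.0.113.7", "/bot-trap/index.html")  # ignores robots.txt
trap.record("198.51.100.4", "/index.html")          # ordinary visit
print(trap.is_banned("203.0.113.7"), trap.is_banned("198.51.100.4"))
```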
Read the following two articles for some great information:
Copyright © 2014 GeeksEngine.com. All Rights Reserved.