Website designers and search engine marketers often ask, “Do I really need a robots.txt file?” The simple answer is yes, and here is why.
A robots.txt file tells search engine robots which directories (folders) and files you do not want them to crawl, and keeping robots out of low-value areas can help your search engine positioning. Let’s say you are a medical professional with a newly created website/domain. Your website designer integrated your patient files into your website so you can keep track of your patients while you are away from the office. You don’t want GoogleBot to crawl those sensitive files, because exposing them would be a violation of HIPAA. Keep in mind that robots.txt is only a request to well-behaved robots, not a lock: the file itself is publicly readable and it does not stop anyone from opening those URLs, so sensitive files also need real protection such as a login.
There are plenty of scenarios like the one I just explained, and I think you get the idea.
A robots.txt file can also point robots to your sitemap with a single line: “Sitemap:” followed by the full URL of your sitemap.
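If you want to see the sitemap line in action, here is a minimal sketch in Python using the standard library’s urllib.robotparser module. It assumes Python 3.8 or newer (for the site_maps() method). The www.example.com address and the list_sitemaps helper are made up for illustration; the sitemap line is the last line of the sample file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; swap in your own domain.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
"""


def list_sitemaps(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(list_sitemaps(ROBOTS_TXT))
```

The Sitemap line is independent of the User-agent lines, so it can sit anywhere in the file.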
Basically, when a robot like GoogleBot crawls your website, it first looks for a file called robots.txt (all lowercase) in the root of your site. If the robot cannot find your robots.txt file, it assumes it has full access to your entire site and indexes everything it finds. A missing robots.txt file also generates unnecessary 404 errors in your server logs every time a robot asks for it.
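To make that lookup concrete, here is a small Python sketch of where a robot goes looking: always /robots.txt at the root of the host, no matter which page it is about to crawl. The robots_txt_url helper and the example.com address are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Build the robots.txt address a crawler would request for a page."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post.html?page=2"))
```

This is why the file has to live in the root directory: a robots.txt placed in a subfolder is never requested.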
This simple step-by-step guide will show you how to create a robots.txt file, assuming you want GoogleBot and other web crawling robots to index your entire site. It will also go over the “Disallow” command so you can weed out directories you don’t want indexed/crawled.
- First, create a new text file on your desktop named robots.txt.
- Edit the file so it looks like this:
User-agent: *
Disallow:
- Save the file and then upload it to the root directory of your website.
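This file just told all robots/crawlers that they may index every part of your website! The “User-agent: *” line means the rules apply to every robot, and a “Disallow:” line with nothing after it means nothing is off limits. (By contrast, “Disallow: /” would block your entire site.)
So why bother? Because if you don’t have a robots.txt file, all of your domain is indexed by default anyway, and now robots find the file they are looking for instead of getting a 404 error.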
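If you would rather not take anyone’s word for what the two-line file permits, one way to check it is with Python’s built-in urllib.robotparser module. This is only a sketch: the is_allowed helper and the example.com URL are invented for illustration, and real search engines may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the steps above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# For comparison: a single slash blocks the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, url, user_agent="Googlebot"):
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://www.example.com/any/page.html"
print(is_allowed(ALLOW_ALL, page))  # True: an empty Disallow blocks nothing
print(is_allowed(BLOCK_ALL, page))  # False: "Disallow: /" blocks everything
```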
So how do you keep robots from indexing certain parts of your site?
That’s easy!
Your robots.txt file will look something like this:
User-Agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~private/
As you can see, you just add a “Disallow: /directory/” line for each directory, where “/directory/” is the path of the directory you don’t want indexed.
It’s that simple, and keeping robots focused on the pages you want found can help with your search engine rankings!
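Before you upload a file like the one above, you can test it. The sketch below feeds the three Disallow lines to Python’s standard urllib.robotparser module and lists which sample paths get blocked. The blocked_paths helper and the sample paths are hypothetical, and this parser only approximates how a real robot such as GoogleBot behaves.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this guide.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~private/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the robots.txt text tells user_agent to skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Made-up paths: some inside the disallowed directories, some not.
samples = [
    "/index.html",
    "/tmp/cache.html",
    "/cgi-bin/form.cgi",
    "/~private/notes.html",
    "/tmpfiles/a.html",
]
print(blocked_paths(ROBOTS_TXT, samples))
```

Notice that /tmpfiles/a.html is not blocked: a rule matches from the start of the path, so “Disallow: /tmp/” only covers what sits inside the /tmp/ directory.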
If you need help creating a robots.txt file, feel free to email me or visit www.XTELWEB.com for more information on contacting me!