What If You Don't Want Your Pages To Be Crawled and Cached by Search Engines



Nowadays there are many ways to create web pages, and you don't need a full-blown website to publish one. You may have web pages that you want to hide from the general public. Such pages meet the following criteria:

  • They are accessible only to trusted users who know the page URLs.
  • No links anywhere on the web point to them.
  • No username or password is required; knowing the page URL is enough to gain access.

Consider this scenario:

One day you created a page and didn't put a link to it on your site. Then you told your family members the page's URL. You thought nobody else would find it.

You just made a mistake. Google and Yahoo will find your page if you or anyone in your family ever visits it with either the Google Toolbar (with PageRank enabled) or the Yahoo Companion Toolbar installed.

Google PageRank function records the URL you're visiting

When you use the Google Toolbar with PageRank enabled, the toolbar automatically sends the URL of each page you visit to Google, where it is recorded in Google's database. If a page's URL is not already in the database, Googlebot, Google's web crawler, will visit the page later to index it.

When you install the Google Toolbar, Google does remind you that information about the web pages you visit will be sent to Google. The following is quoted from Google Toolbar installation step 2, Choose Your Configuration:

By using the advanced features of the Google Toolbar, you may be sending information about the sites you visit to Google. This is needed to make some features of the Toolbar available to you.

In order to show you more information about a site, the Google Toolbar's PageRank feature has to tell us what site you're visiting, which it does by sending us the URL. Google will not provide personally identifiable information to any third parties except as described in the Google Privacy Policy. To learn more about the privacy protections we have built into this system, read our Toolbar Privacy Policy.


Your visits are recorded whether you search the web through the Google Toolbar or type a page's URL directly into the Google search page.

One day, when you check which pages on your site have been indexed by Google, your hidden page comes up and you are worried. Furthermore, the page is cached. Even if you remove the page from your site, it can still be found and viewed from the cached version.

How to check which pages have been indexed

Go to Google and type in "site:www.yoursite.com" without the quotes. This query lists the pages on your site that have been indexed, but it displays only up to 999 records, as this is the limit set by Google for any query.
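
If you check several sites regularly, the query above can be turned into a bookmarkable URL. The following is a minimal Python sketch; the function name site_query_url is hypothetical, and it assumes Google's usual search URL form with the query passed in the q parameter.

```python
from urllib.parse import urlencode


def site_query_url(host: str) -> str:
    """Build a Google search URL for the query site:<host>.

    Assumption: Google accepts the query in the "q" parameter of /search.
    """
    return "https://www.google.com/search?" + urlencode({"q": f"site:{host}"})


# www.yoursite.com is the placeholder host used in the article.
print(site_query_url("www.yoursite.com"))
```

Open the printed URL in a browser to see the same result list as typing the query by hand.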

How to prevent your hidden pages from being indexed and cached

One simple but unreliable solution is to disable the PageRank function on the toolbar. To stop Google from automatically tracking the pages you visit, uncheck the PageRank checkbox.

Steps to disable the PageRank function:

  1. Click Settings in the toolbar (at the right end of the bar) and then click Options in the dropdown.
  2. In the pop-up window, click the tab More and uncheck the PageRank checkbox.



See the Google Toolbar Privacy Policy for what information Google collects.

Unfortunately, disabling the PageRank function will not completely solve your problem because, in our example, your other family members could still have PageRank enabled.

A sound solution

Your problem can be tackled with the robots meta HTML tag. The following two tags are what you need. Put them in the <head> section of your HTML documents.

<meta name="robots" content="noindex,nofollow,noimageindex">
Search engines will read this page but will not index its content or any images on it, and they will not follow any links on it through to other pages.

<meta name="robots" content="noarchive">
Search engines will not archive/cache the page content.
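
Before relying on these tags, it is worth confirming that they actually made it into the page. The following is a minimal Python sketch that reads an HTML string and collects the directives from every robots meta tag. The names RobotsMetaParser and robots_directives, and the sample page, are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical hidden page carrying both tags from the article.
page = """<html><head>
<meta name="robots" content="noindex,nofollow,noimageindex">
<meta name="robots" content="noarchive">
</head><body>Family photos</body></html>"""

found = robots_directives(page)
missing = {"noindex", "nofollow", "noarchive"} - found
print("found:", sorted(found))
print("missing:", sorted(missing))
```

An empty "missing" list means the page carries every directive checked for here.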

How to remove an indexed and cached page

If your page has already been indexed and cached, use one of the following two methods to remove it from search engine databases:

Method #1:
Add <meta name="robots" content="noindex,nofollow,noarchive,noimageindex"> to the <head> section of your page. The next time Googlebot or another robot visits your page, the page will be removed from its index and cache.
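
If several pages need the tag, Method #1 can be scripted. The following is a minimal Python sketch that inserts the tag right after the opening <head> tag of an HTML string. The function name add_robots_tag is hypothetical, and the sketch assumes the page has a single, well-formed <head> tag; pages without one are returned unchanged.

```python
import re

# The combined tag from Method #1.
ROBOTS_TAG = '<meta name="robots" content="noindex,nofollow,noarchive,noimageindex">'


def add_robots_tag(html: str) -> str:
    """Insert ROBOTS_TAG on a new line right after the first opening <head> tag."""
    return re.sub(
        r"(<head\b[^>]*>)",
        lambda match: match.group(1) + "\n" + ROBOTS_TAG,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


print(add_robots_tag("<html><head><title>Hidden page</title></head></html>"))
```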

Method #2:
If you need a speedy removal, use the URL Removals tool in Google Webmaster Tools. I had a great experience with it: my individual URLs were removed within a few hours of submitting the removal request.

For a detailed explanation of the URL Removals tool, read the Google Webmaster Central Blog post Requesting removal of content from our index.

One last note

Is your page now 100% hidden? Not really. If the hidden page has outbound links and you click them to navigate to other websites, the hidden page's URL will appear in those sites' web traffic logs as the HTTP referrer.

You can remove outbound links from your hidden pages if that suits you.
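
To find the links that could leak a hidden page's URL, you can list every link on the page that points to a different host. The following is a minimal Python sketch; the names LinkCollector and outbound_links, the sample markup, and the example URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_links(html: str, page_url: str) -> list:
    """Return absolute http(s) links whose host differs from page_url's host."""
    own_host = urlparse(page_url).netloc.lower()
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links first
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() != own_host:
            result.append(absolute)
    return result


page = '<a href="/album.html">Album</a> <a href="http://www.example.org/">Elsewhere</a>'
links = outbound_links(page, "http://www.yoursite.com/hidden/page.html")
print(links)
```

Each URL printed is a site that would see the hidden page's address as the referrer if a visitor clicked through.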

What If...
Now you know how to safeguard any page on your site. What if you don't want search engines to crawl any of the files in a particular directory? Read the article Robots.txt And Search Engine Robots to find out.
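
As a small preview, the following Python sketch uses the standard library's robots.txt parser to check how a rule set would be read by a crawler that honors robots.txt. The /private/ directory and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all robots to stay out of /private/.
rules = """User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() reports whether the named robot may crawl the given URL.
private_ok = parser.can_fetch("Googlebot", "http://www.yoursite.com/private/page.html")
public_ok = parser.can_fetch("Googlebot", "http://www.yoursite.com/index.html")
print("private page crawlable:", private_ok)
print("public page crawlable:", public_ok)
```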


Copyright© GeeksEngine.com