Saturday, March 21, 2009

Thought of sharing about web crawlers

Great! After almost six months, I thought of writing a little piece on web crawlers. To be honest, I never knew much about them until my project manager shot me a mail.
This article is based on my search into web crawlers and the standard for excluding them from a domain.

What is a web crawler?

A web crawler is an automated computer program that searches the World Wide Web according to some algorithm. To be more precise, it is an automated robot that browses the web and indexes the pages it finds, so that a search engine can show them when a user searches for a matching term.

So what's the big deal? Well, most websites are designed to be viewable by all, but a few of them want to protect their content and images (from image search, for example).


For example, I have a portal that shows rights-managed photographs, and I do not want these images to show up in any search engine results. Here web crawlers can become the villains.


So is there a way to block web crawlers from searching my website? Thank God, the answer is yes.


Robots.txt

Most bots (web crawlers or robots) honor the robots.txt file. When a web crawler comes to search your domain, it first looks for robots.txt and then crawls according to the access rules and restrictions it finds there.
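To make "first looks for robots.txt" concrete: the file sits at the root of the host, so a crawler can work out where it is from any page URL on the site. Here is a minimal Python sketch of that step. The function name robots_url and the example.com address are my own placeholders, not taken from any real crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url.

    robots_url is a hypothetical helper name used only for this sketch.
    """
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host, whatever page we start from,
    # so keep the scheme and host and drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# www.example.com is a placeholder domain.
print(robots_url("http://www.example.com/images/photo.jpg"))
# prints http://www.example.com/robots.txt
```

A well-behaved crawler would fetch this URL once per site and read the rules before requesting any other page.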


How to write a robots.txt file

The contents of robots.txt are basically made up of two directives:

1. User-agent:
2. Disallow:

User-agent: the name of the search engine's web crawler. For example, Google's crawler is known as Googlebot, AltaVista has Scooter, and so on. The entire list can be seen at http://www.robotstxt.org/db.html.

Disallow: a statement that names the path which the crawler must not access. An empty value means nothing is disallowed.

Examples:
User-agent: *
Disallow:
This allows all web crawlers to access all the files in the domain. * means all search engine robots, and an empty value after Disallow means everything is allowed.

User-agent: *
Disallow: /
This blocks all the files from all web crawlers. The forward slash stands for the root of the site, so every path under it is disallowed.

User-agent: *
Disallow: /images/
This blocks all the files in the images folder for all web crawlers.

User-agent: Googlebot-Image
Disallow: /images/
This blocks all the files in the images folder for the Googlebot-Image web crawler only. Other crawlers are not affected.
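If you want to check rules like the four examples above before putting them on a live site, Python's standard urllib.robotparser module can read them and answer "may this bot fetch this URL?". This is only a rough sketch: the helper name can_crawl, the bot name ExampleBot and the example.com URLs are assumptions I made up for illustration. Also, real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url.

    can_crawl is a hypothetical helper name used only for this sketch.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The four examples from this post, written as strings.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_IMAGES = "User-agent: *\nDisallow: /images/\n"
BLOCK_IMAGES_FOR_GOOGLE = "User-agent: Googlebot-Image\nDisallow: /images/\n"

# Placeholder URLs on a placeholder domain.
photo = "http://www.example.com/images/photo.jpg"
home = "http://www.example.com/index.html"

print(can_crawl(ALLOW_ALL, "ExampleBot", photo))                     # True
print(can_crawl(BLOCK_ALL, "ExampleBot", home))                      # False
print(can_crawl(BLOCK_IMAGES, "ExampleBot", photo))                  # False
print(can_crawl(BLOCK_IMAGES, "ExampleBot", home))                   # True
print(can_crawl(BLOCK_IMAGES_FOR_GOOGLE, "Googlebot-Image", photo))  # False
print(can_crawl(BLOCK_IMAGES_FOR_GOOGLE, "ExampleBot", photo))       # True
```

Keep in mind that this only tells you what a polite crawler would do. As said earlier, robots.txt is honored by most bots, not enforced on them.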

Wednesday, November 5, 2008

Monday, June 2, 2008

Placement of logo in a website

Face of a Website
People say that the logo is the identity of a website. Of course it is; otherwise it is hard to tell which site we are viewing.
In real life we have a name and a face. Likewise, in website terms, the domain is the name of the site and the logo is its face (people do remember a face rather than a name).

Importance of the placement of logo
People remember a face rather than a name, and that in itself shows why the placement of the logo matters.

Left-top
Mostly the logo should be at the top left of the site, because the user's eye usually lands on the top left first (humans are like this by nature).
So the safest place for a logo on a website is on the left, but if you would like to do it another way (to break from the common path), then you need to work a bit harder.

Right-top
When you place the logo at the top right, try to put a background element or a tab-style background behind it, so that the logo stands out on top and makes it clear to the user which site it belongs to.

Bottom
This has to be avoided. You don't know the height of the user's browser window or how much of it is taken up by tabs and toolbars,
so your logo may end up hidden below the fold of the browser.

If you are making a fixed-height website, then you can keep the logo at the bottom (e.g. Flash sites).
But I strongly feel this position has to be avoided.