Microsoft Advertising Community
pubCenter Blog

Crawlers and Robots.txt Files

posted Wed, Mar 25 2009
by Harlan

What are crawlers and robots.txt files?
A crawler (or spider) is a program designed to visit a web site and determine what that site contains. It does this by following the links throughout the site and pulling relevant keywords from the pages.

Microsoft Crawlers
Microsoft pubCenter uses two different crawlers to review web sites. We recommend that you explicitly allow both Microsoft crawlers to access your site.

User-agent: MSNBot and User-agent: MSNPTC
Giving Microsoft crawlers access to your property allows your site content to be read so that more relevant ads can be provided. If your robots.txt file grants access to all crawlers under User-agent: *, or if you have no robots.txt file at all, Microsoft crawlers will still have access to your site.
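
As a rough illustration of what explicitly allowing both crawlers could look like, the sketch below embeds a sample robots.txt and checks it with Python's standard urllib.robotparser module. The file contents, the example.com URLs, the /private/ path, and the "OtherBot" name are assumptions made up for this example; the post itself does not show an exact file. An empty Disallow line is the conventional way to say that nothing is off limits for that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: explicit entries for both Microsoft crawlers,
# plus a catch-all entry that keeps every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: MSNBot
Disallow:

User-agent: MSNPTC
Disallow:

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/private/page.html"  # hypothetical URL
    for agent in ("MSNBot", "MSNPTC", "OtherBot"):
        print(agent, allowed(ROBOTS_TXT, agent, page))
```

With these sample rules, both Microsoft crawlers may read every page, while the catch-all entry applies only to crawlers that have no entry of their own.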

Robots.txt File
Instructions for crawlers should be placed in the robots.txt file. This file resides at the top level of a domain and tells automated crawlers where they may or may not look for information relevant to that property. This is important for two reasons:

1. It allows a web site to direct what information is automatically gathered about the site, and hence to control the image of the site presented to search engines and advertisers such as Microsoft.

2. It allows the site to block specific crawlers that are found to visit particularly often or to place a heavy load on the site.
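
The two points above can be sketched in Python, again using the standard urllib.robotparser module. The first helper shows where a crawler would look for the file (the top level of the domain); the second checks a sample rule set that blocks one crawler and leaves the rest alone. The "HeavyBot" name, the www.example.com URLs, and the rules themselves are hypothetical and are not taken from any Microsoft documentation.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one load-intensive crawler ("HeavyBot" is a
# made-up name) while leaving the site open to every other crawler.
BLOCKING_RULES = """\
User-agent: HeavyBot
Disallow: /

User-agent: *
Disallow:
"""


def robots_url(page_url: str) -> str:
    """Build the top-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_blocked(rules: str, user_agent: str, url: str) -> bool:
    """Return True if the rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/blog/post.html"  # hypothetical URL
    print(robots_url(page))
    print(is_blocked(BLOCKING_RULES, "HeavyBot", page))
    print(is_blocked(BLOCKING_RULES, "MSNBot", page))
```

A well-behaved crawler honors these rules voluntarily, so this approach limits cooperative crawlers but does not enforce anything against ones that ignore robots.txt.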
