FP98: Limiting the Access of the Import Web Wizard
ID: Q193942
The information in this article applies to:
- Microsoft FrontPage 98 for Windows
SUMMARY
This article describes a method you can use to prevent Web robots, also
called Web spiders or Web crawlers (such as the FrontPage Import Web
Wizard), from searching through your Web and retrieving files meant to be
private.
This article also provides examples of how to use this method on your
server to prevent a FrontPage 98 user (or any Web robot) from bypassing
your security.
MORE INFORMATION
Overview
Web robots are programs that traverse many pages on the World Wide Web by
recursively retrieving linked pages. FrontPage 98 has an Import Web Wizard
that works just like a Web robot.
There have been occasions when Web robots have visited Web servers where
they were not allowed.
Situations like this have required many Web server administrators to
implement a method to prevent Web robots, such as the Import Web Wizard,
from accessing areas where they are not allowed or wanted.
The Method
The method used to exclude Web robots from a server is to create a file on
the server that specifies an access policy for them.
For this method to be effective, the following criteria must be met:
- The file is called robots.txt.
- The file is located in the Root Web.
This approach was chosen because it can be easily implemented on any
existing Web server, and a Web robot can find the access policy with only a
single document retrieval.
The Format of Robots.txt
The record starts with one or more User-agent lines, followed by one or
more Disallow lines. The following lines describe the structure for your
"robots.txt" file.
NOTE: Unrecognized headers are ignored.
# The pound sign (#) is used for comments.
User-agent: *
Disallow: /folder name/
The following example restricts access to the "_private" folder located in
the subweb named "myweb" and to the "bak" folder located in the Root Web.
To use the example, follow these steps:
- Create a file in the Root Web called robots.txt.
- Place the following lines of text in the "robots.txt" file:
# do not access these 2 folders
User-agent: *
Disallow: /myweb/_private/ # This is my private URL space
Disallow: /bak/ # these are backup folders
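A well-behaved Web robot reads these rules before fetching any page. As an
illustration only (Python's standard urllib.robotparser module is a modern
tool, not part of FrontPage 98, and "example.com" and the file names below
are placeholders), the following sketch parses the same rules and checks a
few URLs against them:

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the "robots.txt" example above
# (inline comments removed for clarity).
rules = [
    "# do not access these 2 folders",
    "User-agent: *",
    "Disallow: /myweb/_private/",
    "Disallow: /bak/",
]

parser = RobotFileParser()
parser.parse(rules)

# A compliant robot asks before fetching each URL.
print(parser.can_fetch("*", "http://example.com/myweb/_private/notes.htm"))  # False
print(parser.can_fetch("*", "http://example.com/bak/index.htm"))             # False
print(parser.can_fetch("*", "http://example.com/myweb/index.htm"))           # True
```

Note that the protection is purely voluntary: the server does not enforce
these rules, so a robot that ignores "robots.txt" can still retrieve the
files.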
If you want to restrict access to the entire web site, follow these steps:
- Create a file in the Root Web called robots.txt.
- Place the following lines of text in the "robots.txt" file:
# do not access anything on the web
User-agent: *
Disallow: /
NOTE: An empty "robots.txt" file specifies no explicit access
restrictions. It is treated as if it were not present, and Web robots will
be allowed throughout the web.
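The two cases above can be contrasted with the same hypothetical robot-side
check: a "Disallow: /" record blocks every URL, while an empty file
restricts nothing. This is a sketch using Python's standard
urllib.robotparser module (not a FrontPage component; "example.com" is a
placeholder host):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" excludes robots from the entire web site.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

# An empty robots.txt file imposes no restrictions at all.
allow_all = RobotFileParser()
allow_all.parse([])

print(block_all.can_fetch("*", "http://example.com/index.htm"))  # False
print(allow_all.can_fetch("*", "http://example.com/index.htm"))  # True
```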
REFERENCES
For more information about Web robots, go to a search engine on the World
Wide Web (for example, www.yahoo.com or www.infoseek.com) and search on
"robots.txt".
For more information about Web robots, please visit the following World
Wide Web site:
http://info.webcrawler.com/mak/projects/robots/robots.html
Additional query words:
99 Import Web Wizard WWW bot
Keywords:
Version: WINDOWS:
Platform: WINDOWS
Issue type:
Last Reviewed: July 30, 1999