Computer Sciences Department, University of Technology/Baghdad


The World Wide Web (WWW) has grown from a few thousand pages in
1993 to more than eight billion pages at present. Due to this explosion in size,
web search engines are becoming increasingly important as the primary means
of locating relevant information.
This research aims to build a crawler that crawls the most important web
pages. The crawling system built for this purpose consists of three main
techniques. The first is the Best-First Technique, which is used to select the most
important page. The second is the Distributed Crawling Technique, which is based on
UbiCrawler and is used to distribute the URLs of the selected web pages across
several machines. The third is the Duplicated Pages Detecting Technique, which
uses a proposed document fingerprint algorithm.
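
As an illustration only, the three techniques can be pictured with the minimal Python sketch below. It is not the system described in this paper. The names BestFirstFrontier, assign_agent and fingerprint are hypothetical, the page score is assumed to come from some external importance measure, the host-to-machine mapping uses a plain hash-and-modulo rule as a stand-in for UbiCrawler's assignment function, and a SHA-1 hash of whitespace-normalized text stands in for the proposed document fingerprint algorithm.

```python
import hashlib
import heapq
from urllib.parse import urlsplit


class BestFirstFrontier:
    """Crawl frontier that always returns the highest-scored URL next."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        """Queue a URL with its importance score; ignore URLs already seen."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        """Remove and return (url, score) for the most important URL."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def assign_agent(url, num_agents):
    """Map the URL's host to a machine index in [0, num_agents).

    Assumption: simple hash-and-modulo, used here only as a stand-in
    for the assignment function of a UbiCrawler-style system.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_agents


def fingerprint(text):
    """Return a hex digest of the page text after case and whitespace
    normalization. Assumption: a stand-in for the proposed algorithm."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()
```

In such a sketch, a crawler loop would pop a URL from the frontier, send it to the machine given by assign_agent, and skip any downloaded page whose fingerprint has already been recorded.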