A web crawler stores fixed-length representations of document addresses in a buffer and a disk file, and optionally in a cache. When the web crawler downloads a document from a host computer, it identifies URLs (document addresses) in the downloaded document. Each identified URL is converted into a...

Patent US6952730 - System and method for efficient filtering of data set addresses in a web crawler
http://www.google.com.au/patents/US6952730?utm_source=gb-gplus-share
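
To illustrate the idea in the abstract, here is a minimal Python sketch of filtering already-seen addresses using fixed-length representations held in a buffer and a disk store. It is not the patented method: the abstract is cut off before it says what each URL is converted into, so the truncated SHA-256 fingerprint, the 8-byte width, the `SeenAddresses` class, and the in-memory set standing in for the disk file are all assumptions made for this example. The optional cache is omitted.

```python
import hashlib

FINGERPRINT_BYTES = 8  # assumed width; the abstract only says "fixed length"


def fingerprint(url: str) -> bytes:
    """Return a fixed-length representation of a URL.

    Assumption: a truncated SHA-256 digest. The source does not name
    the conversion it uses.
    """
    return hashlib.sha256(url.encode("utf-8")).digest()[:FINGERPRINT_BYTES]


class SeenAddresses:
    """Hypothetical filter: a small buffer backed by a larger 'disk' store."""

    def __init__(self, buffer_limit: int = 1000) -> None:
        self.buffer: set[bytes] = set()  # recently added fingerprints
        self.disk: set[bytes] = set()    # stand-in for the disk file
        self.buffer_limit = buffer_limit

    def add_if_new(self, url: str) -> bool:
        """Record the URL and return True only if it was not seen before."""
        fp = fingerprint(url)
        if fp in self.buffer or fp in self.disk:
            return False
        self.buffer.add(fp)
        if len(self.buffer) >= self.buffer_limit:
            self.flush()
        return True

    def flush(self) -> None:
        """Move buffered fingerprints into the disk stand-in."""
        self.disk |= self.buffer
        self.buffer.clear()
```

A crawler loop could call `add_if_new(url)` for every URL found in a downloaded document and schedule only those for which it returns `True`. Because every fingerprint has the same size, memory and disk use per address are predictable regardless of URL length, at the cost of a small chance that two different URLs share a fingerprint.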