Strings, such as Web pages or other documents, are fingerprinted in order to detect substantially similar strings, so as to avoid processing duplicate strings. At the same time, a computerized method estimates the probability of a collision among fingerprints of dissimilar strings. As fingerprints...

Patent US5974481 - Method for estimating the probability of collisions of fingerprints
http://www.google.com.au/patents/US5974481?utm_source=gb-gplus-share
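
The abstract above is truncated, so the patent's own estimator is not reproduced here. As a rough illustration of the idea only, the sketch below uses the standard birthday-bound approximation, P ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n distinct strings and b-bit fingerprints that are assumed to be uniformly distributed. This is a stand-in, not the patented method. The function names `fingerprint` and `collision_probability`, the choice of BLAKE2b, and the 64-bit width are all assumptions made for the example.

```python
import hashlib
import math


def fingerprint(text: str, bits: int = 64) -> int:
    """Return a fixed-width fingerprint of a string.

    Assumption: a truncated BLAKE2b digest stands in for whatever
    fingerprinting function the patent actually uses.
    """
    if bits % 8 != 0 or not 8 <= bits <= 512:
        raise ValueError("bits must be a multiple of 8 between 8 and 512")
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")


def collision_probability(num_strings: int, fingerprint_bits: int) -> float:
    """Birthday-bound estimate that at least two of `num_strings` distinct
    strings share a fingerprint, assuming uniformly distributed fingerprints.
    """
    if num_strings < 2:
        return 0.0
    space = 2.0 ** fingerprint_bits               # number of possible fingerprints
    pairs = num_strings * (num_strings - 1) / 2.0  # number of string pairs
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    docs = ["<html>page A</html>", "<html>page A</html>", "<html>page B</html>"]
    seen = set()
    for doc in docs:
        fp = fingerprint(doc)
        status = "duplicate, skipped" if fp in seen else "new"
        seen.add(fp)
        print(f"{fp:016x} {status}")
    print(collision_probability(10**9, 64))
```

Note that an exact hash like this only detects identical strings; detecting "substantially similar" strings, as the abstract describes, would need a similarity-preserving fingerprint, which this sketch does not attempt.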