I need to create unique numerical ids for some short strings.
some.domain.com -> 32423421
another.domain.com -> 23332423
yet.another.com -> 12131232
Is there a Perl CPAN module that will do something like this?
I've tried using Digest::MD5 but the resulting numbers are too long:
some.domain.com -> 296800572457176150356613937260800159845
-
Just take the first 8 digits of the MD5 hash. This works because MD5 output is, for practical purposes, uniformly distributed over its hash address space, so any consecutive run of hex digits taken from the digest is itself a uniformly distributed hash.
Alternatively, just use some other uniformly distributed hashing mechanism that returns 8 digits. Whatever's easiest for you.
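A minimal sketch of the truncation idea, assuming the core Digest::MD5 module. The helper names short_id and collision_probability are made up for illustration and do not come from any CPAN module. The modulo step that forces the result into 8 decimal digits adds a small bias, and the collision estimate is the usual birthday approximation, not an exact figure.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a string to an integer below 10**8 by
# keeping the first 8 hex digits (32 bits) of its MD5 digest.
sub short_id {
    my ($string) = @_;
    my $prefix = substr(md5_hex($string), 0, 8);   # 8 hex digits = 32 bits
    return hex($prefix) % 100_000_000;             # at most 8 decimal digits
}

# Hypothetical helper: birthday-bound estimate of the chance that at
# least two of $n strings share an id in a space of $space values.
sub collision_probability {
    my ($n, $space) = @_;
    return 1 - exp(-$n * ($n - 1) / (2 * $space));
}

for my $host (qw(some.domain.com another.domain.com yet.another.com)) {
    printf "%s -> %d\n", $host, short_id($host);
}
```

With 10,000 strings and 10**8 possible ids, collision_probability(10_000, 1e8) comes out at roughly 0.39, so 8 digits only stay practically unique for small sets.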
git-noob : But then the probability of a collision goes up?
John Feminella : That's right, but your probability of a collision always goes up when you reduce the address space. You'd have precisely the same problem using a shorter hash no matter how it's created.
-
Either Digest::CRC or String::CRC32. The first gives you the option to calculate 8-, 16- and 32-bit checksums, while the second only supports 32-bit.
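A short sketch of the 32-bit variant, assuming String::CRC32 is installed from CPAN (it is not a core module). Its exported crc32 function returns an unsigned 32-bit integer, so the ids can be up to 10 decimal digits long, not 8.

```perl
use strict;
use warnings;
use String::CRC32;   # non-core CPAN module; exports crc32()

for my $host (qw(some.domain.com another.domain.com yet.another.com)) {
    my $id = crc32($host);   # unsigned 32-bit checksum of the string
    print "$host -> $id\n";
}
```

A CRC is a checksum, not a cryptographic hash, so treat the ids as convenient labels and not as collision-resistant keys.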
-
Given that the strings look like host names, perhaps you could just resolve them to IP addresses and present each IP as an integer?
Kind of like:
perl -le 'my $ip = gethostbyname("depesz.com"); my $num = unpack("N", $ip); print $num'
1311657670
innaM : What if they all point to the same IP? There are IPs out there that serve some 10 million host names.
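The one-liner above can be written as a small script that also handles names that fail to resolve. This is only a sketch: host_to_int is a hypothetical helper name, it uses inet_aton from the core Socket module, and it covers IPv4 only. The example call passes a dotted-quad literal so that no DNS lookup is needed.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical helper: return the IPv4 address of $host as an unsigned
# 32-bit integer, or undef if the name does not resolve.
sub host_to_int {
    my ($host) = @_;
    my $packed = inet_aton($host);   # 4 packed bytes, or undef on failure
    return defined $packed ? unpack('N', $packed) : undef;
}

print host_to_int('127.0.0.1'), "\n";   # prints 2130706433
```

As the comment above points out, host names served from the same IP address get the same number, so this only yields unique ids when every name resolves to a distinct address.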