fastest hashing in a unix environment?

The cksum utility calculates a non-cryptographic CRC checksum.
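If you do go the checksum route, a minimal sketch of using cksum to detect a change might look like this (the domain and the .crc file name are just placeholders):

# cksum prints "<CRC> <byte count>" for data read from stdin
new=$(dig microsoft.com +short | cksum)
old=$(cat microsoft.com.crc 2>/dev/null)
# on the very first run there is no stored value, so this reports a change once
[ "$new" != "$old" ] && echo "record changed"
echo "$new" >microsoft.com.crc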


How big is the output you're checking? A hundred lines max. I'd just save the entire original file, then use cmp to see if it's changed. Given that a hash calculation has to read every byte anyway, the only way you'll get an advantage from a checksum-type calculation is if the cost of computing it is less than the cost of reading two files of that size.

And cmp won't give you any false positives or negatives :-)

pax> echo hello >qq1.txt
pax> echo goodbye >qq2.txt
pax> cp qq1.txt qq3.txt
pax> cmp qq1.txt qq2.txt >/dev/null
pax> echo $?
1
pax> cmp qq1.txt qq3.txt >/dev/null
pax> echo $?
0

Based on your question update:

I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement, hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
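For what it's worth, the save-and-cmp approach from above, applied per domain, might look roughly like this (the state directory, the domains.txt list and the handle-change.sh trigger script are hypothetical names):

#!/bin/bash
# For each domain: save the current dig output, cmp it against last run's copy,
# and call the trigger script only when the two differ.
mkdir -p state
while read -r domain ; do
        dig "$domain" +short >"state/$domain.new"
        # on the very first run the old copy won't exist, so the trigger fires once per domain
        if ! cmp -s "state/$domain.new" "state/$domain.txt" 2>/dev/null ; then
                ./handle-change.sh "$domain"    # hypothetical trigger script
        fi
        mv "state/$domain.new" "state/$domain.txt"
done <domains.txt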

I'm not sure you need to worry too much about the file I/O. The following script executes dig microsoft.com +short 5000 times, first with file I/O and then with output to /dev/null (switch between the two by changing which line is commented out).

#!/bin/bash
rm -rf qqtemp
mkdir qqtemp
((i = 0))
while [[ $i -ne 5000 ]] ; do
        #dig microsoft.com +short >qqtemp/microsoft.com.$i
        dig microsoft.com +short >/dev/null
        ((i = i + 1))
done
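To reproduce this sort of measurement, you can run the script under time (qqtest.sh is just a placeholder name for the script above):

# the "real" figure is the elapsed wall-clock time reported below
time ./qqtest.sh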

The elapsed times at 5 runs each are:

File I/O  |  /dev/null
----------+-----------
    3:09  |  1:52
    2:54  |  2:33
    2:43  |  3:04
    2:49  |  2:38
    2:33  |  3:08

After removing the outliers (the fastest and slowest run in each column) and averaging the remaining three, the results are 2:49 for the file I/O and 2:45 for /dev/null. The difference is four seconds over 5000 iterations, or only 1/1250th of a second per item.

However, since one pass over the 5000 takes up to three minutes, that's the maximum time it will take to detect a change (a minute and a half on average). If that's not acceptable, you'll need to move away from bash to another tool.

Given that a single dig only takes about 0.012 seconds, you should theoretically be able to do 5000 of them in about sixty seconds (0.012 × 5000 = 60), assuming your checking tool takes no time at all. You may be better off doing something like this in Perl and using an associative array to store the output from dig.

Perl's semi-compiled nature means that it will probably run substantially faster than a bash script, and Perl's fancy features will make the job a lot easier. However, you're unlikely to get that sixty-second time much lower, simply because that's how long it takes to run the dig commands themselves.