fastest hashing in a unix environment?

The cksum utility calculates a non-cryptographic CRC checksum.
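If you do go the checksum route, a minimal sketch of using cksum to detect a change might look like this (the domain and the .crc file name are just placeholders):

# cksum prints "<CRC> <byte count>" for data read from stdin
new=$(dig microsoft.com +short | cksum)
old=$(cat microsoft.com.crc 2>/dev/null)
# on the very first run there is no stored value, so this reports a change once
[ "$new" != "$old" ] && echo "record changed"
echo "$new" >microsoft.com.crc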


How big is the output you're checking? A hundred lines max. I'd just save the entire original file, then use cmp to see if it's changed. Given that a hash calculation has to read every byte anyway, the only way you'll get an advantage from a checksum-type calculation is if the cost of computing it is less than the cost of reading two files of that size.

And cmp won't give you any false positives or negatives :-)

pax> echo hello >qq1.txt
pax> echo goodbye >qq2.txt
pax> cp qq1.txt qq3.txt
pax> cmp qq1.txt qq2.txt >/dev/null
pax> echo $?
1
pax> cmp qq1.txt qq3.txt >/dev/null
pax> echo $?
0

Based on your question update:

I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement, hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
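For what it's worth, the save-and-cmp approach from above, applied per domain, might look roughly like this (the state directory, the domains.txt list and the handle-change.sh trigger script are hypothetical names):

#!/bin/bash
# For each domain: save the current dig output, cmp it against last run's copy,
# and call the trigger script only when the two differ.
mkdir -p state
while read -r domain ; do
        dig "$domain" +short >"state/$domain.new"
        # on the very first run the old copy won't exist, so the trigger fires once per domain
        if ! cmp -s "state/$domain.new" "state/$domain.txt" 2>/dev/null ; then
                ./handle-change.sh "$domain"    # hypothetical trigger script
        fi
        mv "state/$domain.new" "state/$domain.txt"
done <domains.txt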

I'm not sure you need to worry too much about the file I/O. The following script executes dig microsoft.com +short 5000 times, first with file I/O and then with output to /dev/null (switch between the two by changing which line is commented out).

#!/bin/bash
rm -rf qqtemp
mkdir qqtemp
((i = 0))
while [[ $i -ne 5000 ]] ; do
        #dig microsoft.com +short >qqtemp/microsoft.com.$i
        dig microsoft.com +short >/dev/null
        ((i = i + 1))
done
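To reproduce this sort of measurement, you can run the script under time (qqtest.sh is just a placeholder name for the script above):

# the "real" figure is the elapsed wall-clock time reported below
time ./qqtest.sh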

The elapsed times at 5 runs each are:

File I/O  |  /dev/null
----------+-----------
    3:09  |  1:52
    2:54  |  2:33
    2:43  |  3:04
    2:49  |  2:38
    2:33  |  3:08

After removing the outliers (the fastest and slowest run in each column) and averaging the remaining three, the results are 2:49 for the file I/O and 2:45 for /dev/null. The difference is four seconds over 5000 iterations, or only 1/1250th of a second per item.

However, since one pass over the 5000 takes up to three minutes, that's the maximum time it will take to detect a change (a minute and a half on average). If that's not acceptable, you'll need to move away from bash to another tool.

Given that a single dig only takes about 0.012 seconds, you should theoretically be able to do 5000 of them in about sixty seconds (0.012 × 5000 = 60), assuming your checking tool takes no time at all. You may be better off doing something like this in Perl and using an associative array to store the output from dig.

Perl's semi-compiled nature means that it will probably run substantially faster than a bash script, and Perl's fancy features will make the job a lot easier. However, you're unlikely to get that sixty-second time much lower, simply because that's how long it takes to run the dig commands themselves.