What is a Hash Function?
A hash function uses a specific calculation to convert a piece of data of any size to a numerical value with a fixed number of bits. The value calculated by a hash function is called a Hash Value. The hash function is designed to always return the same hash value for the same data.
Uses of Hashing
If there are many large pieces of data to be compared, it will take a long time to compare them all individually. In this situation, it would be effective to find and compare the hash values of each piece of data. If the hash values are different, the data they refer to is also different, so mismatches can be found in a short period of time and with a high rate of probability. When the hash values are the same, the original data must be compared. Even when the hash values are the same, the data might be different. However, overall, the time required for comparison can be greatly reduced.
You can also use a hash value to confirm that data has not been corrupted. For example, when saving data to a backup device, a hash value for all the saved data can be appended to the data string. When the data is loaded, the hash value for the loaded data is obtained and compared with the value stored at the end of the data string to confirm that the data is not corrupted. Similarly, this method can be used to confirm the integrity of data sent over a network.
The Message Digest
There are cases in which a cryptographic hash function is desired. A cryptographic hash function makes it impossible to restore the original data from the hash value. The hash value has a minimum size of 80 bits, and might be around 512 bits, depending on how it is to used. A cryptographic hash function is designed so that it is impossible to determine the original data from a given hash value. By impossible, we mean that it would take an extremely long period of time to compute the original data using current technology and thus is computationally infeasible. When a cryptographic hash function is used to detect tampering, the hash function is called a Message Digest Function and the hash value is called a Message Digest Value.
Using Message Digests to Detect Tampering
It is possible to use the message digest value to prevent data tampering by malicious third parties. Since a message digest value is short and has a fixed length, it is easy to distribute safely. Assuming a message digest value can be distributed safely, if the message digest for a large data set that is sent separately matches the original message digest value, we know that the data sent or received is authentic. This is because the one-way nature of the function makes it impossible to create false data that generates the same message digest value.
Using HMAC for Detection of Tampering
What do you do if you need to send the message digest value with the data? The message digest function will always return the same message digest value for the same data no matter who uses it. Accordingly, if you send the message digest value with the data, a malicious third party can substitute falsified data along with a falsified message digest value. In this case, you can use HMAC (Keyed Hashing for Message Authentication Code). In simple terms, HMAC is a mechanism that provides additional encryption of the hash value. If you provide an HMAC function with the data to hash, the data size, and the encrypted key data and key size, the function can produce a hash value encrypted with the key. Because this value cannot be generated as long as the encrypted key data is not known, if the keyed hash values match when the data is received, one can be sure that the data was sent by someone who knows the key. However, with HMAC, both the sender and the receiver use the same key (the shared-key system), so that a recipient program could analyze and determine the key, and then falsify the data. To prevent this, an electronic signature system that uses public-key encryption is needed. This library is not included in the NITRO-SDK.
Hash Functions Included in the SDK
NITRO-SDK includes the following hash functions (non-cryptographic).
| 8-bit Checksum | MATH_Checksum8* | Computes the one's complement of the one's complement sum in 8-bit units. | |
|---|---|---|---|
| 16-bit Checksum | MATH_Checksum16* | Computes the one's complement of the one's complement sum in 16-bit units. | Used in IP, TCP, and UDP protocols. |
| CRC-8 | MATH_CRC8* | Computes the 8-bit CRC. The generator polynomial is x8+x2+x1+10 and NOT is not applied to the output. |
|
| CRC-16 | MATH_CRC16* | Computes the 16-bit CRC. The generator polynomial is x16+x15+x2+10 and NOT is not applied to the output. |
Used in ARC, LHA, and other tools. |
| CRC-16 / CCITT | MATH_CRC16CCITT* | Computes the 16-bit CRC. The generator polynomial is x16+x12+x5+10xffff |
Defined in CCITT X.25, and used in communication frames of various standards. |
| CRC-32 | MATH_CRC32* | Computes the 32-bit CRC. The generator polynomial is x32+x26+x23+x22+x16+x12+x11+x10+x8+x7+x5+x4+x2+x1+10xffffffff |
Used in PKZIP, PNG, and Ethernet. See ISO 3309, RFC 2083 etc. |
| CRC-32 / POSIX | MATH_CRC32POSIX* | Computes the 32-bit CRC. The generator polynomial is x32+x26+x23+x22+x16+x12+x11+x10+x8+x7+x5+x4+x2+x1+10. NOT is applied to output. |
Used in the POSIX-compliant Unix cksum command. Defined in POSIX 1003.2 (IEEE Std 1003.2-1992). |
checksum is about twice as fast as CRC, but it generates the same value even when the data order is changed. This makes it a poor hash function. The checksum function included in the SDK will result in a value of 0 if the original data is added to the checksum value to obtain the checksum for all the data. Also, in the one's complement world, 0xffff and 0 express the same value. Therefore, when a checksum value is 0, it can be replaced by 0xffff and the original data plus the checksum value will still be 0. Based on this fact, you can prevent 0 from being used for normal checksum values and use it for a null value (such as uncalculated). For details, see the general UDP Checksum description.
When the data length is below a certain value, CRC is mathematically designed so that it is guaranteed to find errors up to a certain number of bits. For example, with CRC-16 and CRC-16/CCITT, when the data length (including the CRC) is up to 32,767 bits, the CRC will find errors in up to any 3 bits or an error of contiguous 16 bits or less. Of course, even when these conditions are not met, it can find errors with a high probability. CRC has error correction functionality but in many cases, CRC is used only for error detection and not for correction. The SDK includes SVC_GetCRC16, which returns the same results as MATH_CalcCRC16, but the MATH CRC function creates the calculation tables in advance, which speeds up operation.
Note that these are not cryptographic hash functions, so it is easy to falsify data so that it returns the same hash value.
The following cryptographic hash functions are included.
| SHA-1 | MATH_CalcSHA1 |
The message digest value length is 160 bits. | Used widely as a successor to MD5. See RFC 3174. |
|---|---|---|---|
| HMAC-SHA-1 | MATH_CalcHMACSHA1 |
HMAC that uses SHA-1. The message digest value length is 160 bits. | See FIPS PUB 180-2. |
| SHA-256 | MATH_SHA256* | The message digest value length is 256 bits. | Used as a successor to SHA-1. See RFC 3174. |
| HMAC-SHA-256 | MATH_CalcHMACSHA256 | HMAC that uses SHA-256. The message digest value length is 256 bits. | See FIPS PUB 180-2. |
MD5 has come into wide use as a one-way hash function, but as research into it has progressed, there have been a number of reports regarding its flaws under certain specific conditions. Therefore, the use of SHA-1 as a one-way hash function has currently become more widespread. When putting a new one-way hash function into use, it is probably best to use SHA-1.
These cryptographic hash functions will still provide sufficient utility as hash functions if bits that are computed partway through computation are used. (Saving and using the first 80 bits in HMAC-SHA-1
MATH Function List (Hash Value), MATH Function List (Message Digest)
2007/10/31 Based on recent trends, removed MD5-related descriptions.
2006/05/24 Based on recent trends, explicitly stated that SHA-1 is preferable to MD5.
2005/07/08 Updated output samples from the dgt-2 demo following changes to the SHA-1 implementation.
2005/06/24 Updated output samples from the dgt-2 demo following changes to the MD5 implementation.
2005/04/18 Initial version.
CONFIDENTIAL