Thursday, August 30, 2012

Compile time MD5 using constexpr

Today, someone on stackoverflow asked how to perform compile-time hashing in C++. After providing a small, naive, example in my answer, I thought it would actually be interesting to implement some well known hashing algorithm using C++11 features, such as constexpr. So I gave MD5 a try, and after some hours of trying stuff, I was able to create compile-time MD5 hashes.

The MD5 hashing algorithm.


Implementing the algorithm itself was not as hard as I thought it would be. The algorithm is roughly like this:
  1. It starts with 4 32-bits integers which contain a predefined value.
  2. The algorithm contains 4 "rounds". On each of them, a different function is applied to those integers and the input string.
  3. The result of those rounds serves as input to the next ones.
  4. Add the 4 resulting 32-bit integers with the original, predefined integer values.
  5. Interpret the resulting integers as a chunk of 16 bytes.
That's it. Note that I've somehow stripped down the algorithm to short strings. A real implementation would iterate several times doing the same stuff on the entire buffer. But since nobody is going to hash large strings on compile time, I've simplified the algorithm a little bit.

The implementation


So basically the implementation was not that hard, some template metaprogramming and several constexpr functions. Since there was a pattern on the way the arguments were provided as input to the functions in each round, I used a specialization which avoided lots of repeated code.

The worst part was generating the input for that algorithm. The input is not just the string which is to be hashed. The steps to generate that input is roughly as follows:
  1. Create a buffer of 64 bytes, filled with zeros.
  2. Copy the input string into the start of the buffer.
  3. Add a 0x80 character on buffer[size_of_input_string].
  4. On buffer[56], the value sizeof_input_string * 8 must be stored.
That's all. The algorithm will only work with strings of no more than 31 bytes. It could be generalized by modifying the appropriate bytes on buffer[57:60], but that was not my objective.

In order to achieve this buffer initialization, I had to somehow decompose the input string into characters, and then join them and those special bytes, into an array which could be used during compile time. In order to achieve this, I implemented a constexpr_array, which is just a wrapper to a built-in array, but provides a data() constexpr member, something that std::array does not have(I'm not really sure why):
template<typename T, size_t n>
struct constexpr_array {
    const T array[n];
    
    constexpr const T *data() const {
        return array;
    }
};
The decomposition ended up pretty simple, but I had to struggle to figure out how the hell to implement it.

Finally, the interface for the compile time MD5 would be the following:
template<size_t n> constexpr md5_type md5(const char (&data)[n]) 

The typedef for the function's result is the following:
typedef std::array<char, 16> md5_type;

Note that everything in the implementation resides in namespace ConstexprHashes.

As an example, this code generates a hash and prints it:
#include <iostream>
#include "md5.h"

int main() {
    constexpr auto value = ConstexprHashes::md5("constexpr rulz");
   
    std::cout << std::hex;
    for(auto v : value) {
        if(((size_t)v & 0xff) < 0x10)
            std::cout << '0';
        std::cout << ((size_t)v & 0xff);
    }
    std::cout << std::endl;
}

This prints out: "b8b4e2be16d2b11a5902b80f9c0fe6d6", which is the right hash for "constexpr rulz".

Unluckily, the only compiler that is able to compile this is clang 3.1(I haven't tested it on newer versions). GCC doesn't like the code, and believes some constexpr stuff is actually non-constexpr. I'm not 100% sure that clang is the one that's right, but it looks like it is.

You can find the MD5 implementation here. Maybe I'll implement SHA1 soon, whenever I have some time.

Well, this was the first time I used constexpr, it was a nice experience. Implementing this using only TMP would be an extreme pain in the ass.

3 comments:

  1. Very cool. Works perfectly with Visual Studio 2015 Update 1

    ReplyDelete
  2. Is the implementation only working for strings upto 32 bytes? Would it be hard to extend it?

    ReplyDelete