The Wright Way – DRY Preprocessing

Introduction

I have always felt like there is a dearth of cohesive C/C++ education after school. Each software engineer seems to be on their own when it comes to learning new features, how to use a new library, or what the best practices are for common situations.

This newsletter is my attempt to share some of what I have learned; the way I do things. It is not meant to be authoritative, but, rather, a launching pad for further discussion and investigation. Often, I’ll try something new just so that I can vet it for future use or just to experiment. Not everything pans out, so take what you will and leave the rest.

The Preprocessor – A Four Letter Word?

Somewhere along the line I was told to avoid macros and it stuck. I think I missed the point. Nowadays, I’m rehabilitated, and the preprocessor and I are good friends again. It is another tool that I can reach for when the need arises. So far, I find that I use the preprocessor mainly as a code generator.

Code generation is one of the tools that can help with adhering to the DRY (Don’t Repeat Yourself) principle. In The Pragmatic Programmer, Andrew Hunt and David Thomas spend quite a few pages on the “evils of duplication”, and the concept underlies many useful programming practices. I won’t repeat their work here; it distills down to doing whatever is necessary to avoid duplicating data or logic. The preprocessor can help in this regard.

Real World Application

The beauty of the preprocessor is that it is already part of the build; no extra tool or build step is needed. While the preprocessor can obviously be used directly, Boost provides a much nicer and safer API that makes it much more effective. I didn’t even know that many of the things the Boost Preprocessor library can do were possible.

Repetition

The feature of the Boost Preprocessor library that I use the most is repetition. For example, I have a log() function in my code that takes variable arguments, variable in both type and number, like so:

void log(int code);
template< typename T0 > void log(int code, T0 const& t0);
template< typename T0, typename T1 > void log(int code, T0 const& t0, T1 const& t1);
...

That gets old really fast, and this is essentially repeating the same information over and over again. There’s no value in each individual function. It’s just more of the same. In the future, when more arguments are needed, and that’ll always happen, additional copy and pastes will need to be made.

The simplest way to repeat code is with the BOOST_PP_REPEAT macro. When working with the preprocessor I tend to write the code normally, and then convert it piece by piece into generated code. To start with, define the number of log() functions needed:

#define MAX_LOG_ARGS 5

When more arguments are needed, all that needs to change is this one define. It could even be passed in on the command line to the compiler.
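If the value is to come from the command line, one way to set that up is to guard the define so that it only acts as a default. This is a minimal sketch: the -D flag in the comment is the GCC/Clang spelling (other compilers have their own), and the max_log_args constant is a hypothetical name used only to show the value reaching ordinary code.

```cpp
// Keep a value supplied by the build, e.g. `g++ -DMAX_LOG_ARGS=8 ...`
// (GCC/Clang spelling); otherwise fall back to a default of 5.
#ifndef MAX_LOG_ARGS
#define MAX_LOG_ARGS 5
#endif

// Catch nonsensical values early instead of deep inside a macro expansion.
static_assert(MAX_LOG_ARGS >= 0, "MAX_LOG_ARGS must not be negative");

// Hypothetical constant, only here to show the value reaching ordinary code.
constexpr int max_log_args = MAX_LOG_ARGS;
```

With no flag on the command line the default of 5 is used.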

The next step is to define the function itself and start repeating it:

#include "boost/preprocessor/repetition/repeat.hpp"

#define MAX_LOG_ARGS 5
#define LOG(z, i, _) void log(int code);

BOOST_PP_REPEAT(MAX_LOG_ARGS, LOG, _)

#undef LOG
#undef MAX_LOG_ARGS

It’s considered best practice to undefine the macros as soon as you don’t need them anymore. As seen above, the macros aren’t needed again after the expansion of BOOST_PP_REPEAT.

This will produce:

void log(int code);
void log(int code);
void log(int code);
void log(int code);
void log(int code);

Not even close to what is needed, but it is a good start. The arguments to the BOOST_PP_REPEAT macro are:

  • MAX_LOG_ARGS – The number of times to repeat.
  • LOG – The macro to repeat.
  • _ – This is any extra data that should be passed to the macro that is being repeated. The underscore is generally used when there isn’t any extra data.

The arguments to the LOG macro are:

  • z – The next available repetition depth, which the library uses to make nested calls to BOOST_PP_REPEAT more efficient. This can be ignored.
  • i – The current repetition index, e.g. 0, 1, 2, 3, and 4 in this example.
  • _ – This is whatever was passed in as the last argument to the BOOST_PP_REPEAT macro.

I won’t cover all the arguments for all the macros, as that would just be repeating the information in the library documentation, but the above should be a good primer.
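To make the repetition mechanism less magical, here is a hand-rolled sketch of the same idea using only the standard preprocessor. It is not how Boost is implemented in detail, and every name in it (MY_REPEAT, MY_CAT, SLOT, SLOT_COUNT) is made up for illustration. Unlike BOOST_PP_REPEAT, the repeated macro here receives only the index and the extra data, with no z argument.

```cpp
// Hand-rolled stand-ins for illustration only; these are NOT the Boost macros.
// MY_REPEAT_n calls m(0, d), m(1, d), ... m(n-1, d), one after the other.
#define MY_REPEAT_0(m, d)
#define MY_REPEAT_1(m, d) m(0, d)
#define MY_REPEAT_2(m, d) MY_REPEAT_1(m, d) m(1, d)
#define MY_REPEAT_3(m, d) MY_REPEAT_2(m, d) m(2, d)

// Two-step paste so that the count is macro-expanded before it is glued on.
#define MY_CAT_I(a, b) a ## b
#define MY_CAT(a, b) MY_CAT_I(a, b)

// Selects MY_REPEAT_<n> by pasting the count onto the name.
#define MY_REPEAT(n, m, d) MY_CAT(MY_REPEAT_, n)(m, d)

#define SLOT_COUNT 3
// Invoked as SLOT(0, slot), SLOT(1, slot), SLOT(2, slot).
#define SLOT(i, prefix) constexpr int prefix ## i = i * 10;

MY_REPEAT(SLOT_COUNT, SLOT, slot)
// Expands to:
//   constexpr int slot0 = 0 * 10;
//   constexpr int slot1 = 1 * 10;
//   constexpr int slot2 = 2 * 10;

#undef SLOT
#undef SLOT_COUNT
```

The sketch only goes up to three repetitions because each count needs its own helper macro. A library can simply ship a much longer table of them.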

The log() function needs template parameter lists, so let’s add them in:

#include "boost/preprocessor/repetition/repeat.hpp"
#include "boost/preprocessor/repetition/enum_params.hpp"

#define MAX_LOG_ARGS 5
#define LOG(z, i, _) \
   template < BOOST_PP_ENUM_PARAMS(i, typename T) > \
   void log(int code);

BOOST_PP_REPEAT(MAX_LOG_ARGS, LOG, _)

#undef LOG
#undef MAX_LOG_ARGS

The BOOST_PP_ENUM_PARAMS macro is perfect for this situation; think of it as a specialized BOOST_PP_REPEAT. It generates a comma-separated list by repeating its second argument the number of times given by the first, appending the repetition index each time (typename T0, typename T1, and so on). This lets us use the i argument passed into the LOG macro. This will produce:

template < > void log(int code); 
template < typename T0 > void log(int code); 
template < typename T0, typename T1 > void log(int code); 
template < typename T0, typename T1, typename T2 > void log(int code); 
template < typename T0, typename T1, typename T2, typename T3 > void log(int code);

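This would be great except for when i equals 0. In that case we are left with an empty template parameter list, template < >, which the compiler treats as the start of an explicit specialization rather than an ordinary function declaration. Luckily, the preprocessor can also do conditionals: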

#include "boost/preprocessor/control/expr_if.hpp"
#include "boost/preprocessor/repetition/repeat.hpp"
#include "boost/preprocessor/repetition/enum_params.hpp"

#define MAX_LOG_ARGS 5
#define LOG(z, i, _) \
   BOOST_PP_EXPR_IF(i, template < BOOST_PP_ENUM_PARAMS(i, typename T) >) \
   void log(int code);

BOOST_PP_REPEAT(MAX_LOG_ARGS, LOG, _)

#undef LOG
#undef MAX_LOG_ARGS

The BOOST_PP_EXPR_IF macro expands to its second argument if the first argument is nonzero, and to nothing otherwise. That works perfectly for this situation. There is also a BOOST_PP_IF macro that provides if-else functionality.
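The two kinds of conditional can be pictured with another hand-rolled sketch. The names (MY_EXPR_IIF, MY_IIF, FORMAL, WITH_ADJECTIVE, greeting) are invented for this example, and the sketch only understands the conditions 0 and 1, whereas the Boost macros accept any number in their supported range.

```cpp
#include <string>

// Hand-rolled stand-ins for illustration only; these are NOT the Boost macros.
#define MY_CAT_I(a, b) a ## b
#define MY_CAT(a, b) MY_CAT_I(a, b)

// "Expression if": keep expr when bit is 1, drop it when bit is 0.
#define MY_EXPR_IIF_0(expr)
#define MY_EXPR_IIF_1(expr) expr
#define MY_EXPR_IIF(bit, expr) MY_CAT(MY_EXPR_IIF_, bit)(expr)

// "If-else": choose t when bit is 1, f when bit is 0.
#define MY_IIF_0(t, f) f
#define MY_IIF_1(t, f) t
#define MY_IIF(bit, t, f) MY_CAT(MY_IIF_, bit)(t, f)

#define FORMAL 0
#define WITH_ADJECTIVE 1

// Adjacent string literals are concatenated by the compiler, so the macros
// only decide which literals are present.
inline std::string greeting()
{
   return MY_IIF(FORMAL, "Dear ", "Hi ")
          MY_EXPR_IIF(WITH_ADJECTIVE, "esteemed ")
          "reader";
}

#undef WITH_ADJECTIVE
#undef FORMAL
```

With the settings shown, greeting() returns "Hi esteemed reader". A limitation of this sketch, and as far as the standard expansion rules go probably of any conditional macro built this way, is that the expression is expanded first and then handed on to a helper macro. If the expanded text contains top-level commas (a parameter list with two or more entries, say), a strictly conforming preprocessor can count them as extra arguments. If a compiler complains about a macro being passed too many arguments, that is the first place to look.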

Next, the template types should be used to add arguments to the log() function. The BOOST_PP_ENUM_PARAMS macro might seem useful here again, but this situation is slightly different: both the type and the argument name need to be enumerated. In addition, a comma needs to be added after the code argument when the number of arguments is greater than 0. This is all much easier than it sounds:

#include "boost/preprocessor/control/expr_if.hpp"
#include "boost/preprocessor/repetition/repeat.hpp"
#include "boost/preprocessor/repetition/enum_params.hpp"
#include "boost/preprocessor/repetition/enum_trailing_binary_params.hpp"

#define MAX_LOG_ARGS 5
#define LOG(z, i, _) \
   BOOST_PP_EXPR_IF(i, template < BOOST_PP_ENUM_PARAMS(i, typename T) >) \
   void log(int code BOOST_PP_ENUM_TRAILING_BINARY_PARAMS(i, T, const& t));

BOOST_PP_REPEAT(MAX_LOG_ARGS, LOG, _)

#undef LOG
#undef MAX_LOG_ARGS

This is a common enough practice that Boost comes with one macro to do all of that, BOOST_PP_ENUM_TRAILING_BINARY_PARAMS. The result:

void log(int code); 
template < typename T0 > 
   void log(int code, T0 const& t0); 
template < typename T0, typename T1 > 
   void log(int code, T0 const& t0, T1 const& t1); 
template < typename T0, typename T1, typename T2 > 
   void log(int code, T0 const& t0, T1 const& t1, T2 const& t2); 
template < typename T0, typename T1, typename T2, typename T3 > 
   void log(int code, T0 const& t0, T1 const& t1, T2 const& t2, T3 const& t3);
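For readers who want to see what the two enumeration macros are doing, the following sketch spells out simplified equivalents by hand. These are not the Boost definitions; MY_ENUM_n, MY_TRAILING_n, and the total() function are hypothetical names, and total() returns a sum only so that the expansion can be checked.

```cpp
// Hand-rolled stand-ins for illustration only; these are NOT the Boost macros.
// MY_ENUM_n(p) lists p0 .. p(n-1), separated by commas.
#define MY_ENUM_1(p) p ## 0
#define MY_ENUM_2(p) MY_ENUM_1(p), p ## 1
#define MY_ENUM_3(p) MY_ENUM_2(p), p ## 2

// MY_TRAILING_n(a, b) lists ", a0 b0, a1 b1, ..." including the leading comma,
// and expands to nothing at all when n is 0.
#define MY_TRAILING_0(a, b)
#define MY_TRAILING_1(a, b) , a ## 0 b ## 0
#define MY_TRAILING_2(a, b) MY_TRAILING_1(a, b), a ## 1 b ## 1
#define MY_TRAILING_3(a, b) MY_TRAILING_2(a, b), a ## 2 b ## 2

// Same shape as the generated log() declarations. After preprocessing:
//   template <typename T0, typename T1, typename T2>
//   double total(double first, T0 const& t0, T1 const& t1, T2 const& t2)
template <MY_ENUM_3(typename T)>
double total(double first MY_TRAILING_3(T, const& t))
{
   return first + t0 + t1 + t2;
}
```

Only the last token of each argument is pasted with the index, which is why passing "typename T" or "const& t" works: the index lands on T and t, and the rest is carried along untouched.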

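Something isn’t quite right, though. MAX_LOG_ARGS says the maximum number of log arguments should be 5, but the largest log() generated here takes only four (t0 through t3), because the repetition index runs from 0 to 4 and index 0 produces the function with no extra arguments. Yet again, the preprocessor amazes; it can do math: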

#include "boost/preprocessor/arithmetic/inc.hpp"
#include "boost/preprocessor/control/expr_if.hpp"
#include "boost/preprocessor/repetition/repeat.hpp"
#include "boost/preprocessor/repetition/enum_params.hpp"
#include "boost/preprocessor/repetition/enum_trailing_binary_params.hpp"

#define MAX_LOG_ARGS 5
#define LOG(z, i, _) \
   BOOST_PP_EXPR_IF(i, template < BOOST_PP_ENUM_PARAMS(i, typename T) >) \
   void log(int code BOOST_PP_ENUM_TRAILING_BINARY_PARAMS(i, T, const& t));

BOOST_PP_REPEAT(BOOST_PP_INC(MAX_LOG_ARGS), LOG, _)

#undef LOG
#undef MAX_LOG_ARGS

BOOST_PP_INC does not modify MAX_LOG_ARGS; it expands to the value one greater (6 here), so the repetition index now runs from 0 to 5 and one additional log() function is generated:

void log(int code); 
template < typename T0 > 
   void log(int code, T0 const& t0); 
template < typename T0, typename T1 > 
   void log(int code, T0 const& t0, T1 const& t1); 
template < typename T0, typename T1, typename T2 > 
   void log(int code, T0 const& t0, T1 const& t1, T2 const& t2); 
template < typename T0, typename T1, typename T2, typename T3 > 
   void log(int code, T0 const& t0, T1 const& t1, T2 const& t2, T3 const& t3);
template < typename T0, typename T1, typename T2, typename T3, typename T4 > 
   void log(
      int code, 
      T0 const& t0, 
      T1 const& t1, 
      T2 const& t2, 
      T3 const& t3, 
      T4 const& t4);

Other arithmetic operators are available, including BOOST_PP_ADD, BOOST_PP_SUB, BOOST_PP_MUL, and BOOST_PP_DIV.
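Outside of #if directives the preprocessor cannot evaluate expressions, so "math" on macro values has to be done some other way. The sketch below does it with a lookup table selected by token pasting. This is a simplified illustration rather than Boost's actual source, and MY_INC, MY_STR, and ARG_LIMIT are made-up names.

```cpp
#include <string>

// Hand-rolled stand-ins for illustration only; these are NOT the Boost macros.
#define MY_CAT_I(a, b) a ## b
#define MY_CAT(a, b) MY_CAT_I(a, b)
#define MY_STR_I(x) #x
#define MY_STR(x) MY_STR_I(x)

// One table entry per supported input; a real library carries many more.
#define MY_INC_0 1
#define MY_INC_1 2
#define MY_INC_2 3
#define MY_INC_3 4
#define MY_INC_4 5
#define MY_INC_5 6
#define MY_INC(x) MY_CAT(MY_INC_, x)

#define ARG_LIMIT 5

// The table lookup yields the single token 6 ...
const std::string incremented_text = MY_STR(MY_INC(ARG_LIMIT));
// ... whereas ordinary arithmetic is passed through as three tokens: 5 + 1.
const std::string expression_text = MY_STR(ARG_LIMIT + 1);

#undef ARG_LIMIT
```

The distinction matters for repetition: a count that is pasted onto a macro name has to be a single number token, so writing MAX_LOG_ARGS + 1 as the count would not work where an increment macro does.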

To finish off this example, this is a simple definition of the log() function:

#include <iostream>

#include "boost/preprocessor/arithmetic/inc.hpp"
#include "boost/preprocessor/cat.hpp"
#include "boost/preprocessor/control/expr_if.hpp"
#include "boost/preprocessor/repetition/repeat.hpp"
#include "boost/preprocessor/repetition/enum_params.hpp"
#include "boost/preprocessor/repetition/enum_trailing_binary_params.hpp"
#include "boost/preprocessor/stringize.hpp"

#define MAX_LOG_ARGS 5
#define STREAM_PARAM(z, i, _) \
   BOOST_PP_EXPR_IF(i, << ",") << BOOST_PP_STRINGIZE(i=) << BOOST_PP_CAT(t, i)
#define LOG(z, i, _) \
   BOOST_PP_EXPR_IF(i, template < BOOST_PP_ENUM_PARAMS(i, typename T) >) \
   void log(int code BOOST_PP_ENUM_TRAILING_BINARY_PARAMS(i, T, const& t)) \
   { \
      std::cout << "code=" << code BOOST_PP_EXPR_IF(i, << ",") \
         BOOST_PP_REPEAT(i, STREAM_PARAM, _) \
         << std::endl; \
   }

BOOST_PP_REPEAT(BOOST_PP_INC(MAX_LOG_ARGS), LOG, _)

#undef LOG
#undef STREAM_PARAM
#undef MAX_LOG_ARGS

This merely streams the arguments to standard output, but it shows how BOOST_PP_REPEAT can be nested inside a macro that is itself being repeated. It also uses two new macros: BOOST_PP_CAT, which is a safer ## (concatenation) operator, and BOOST_PP_STRINGIZE, which is a safer # (stringification) operator. They are safer because they macro-expand their arguments before pasting or stringizing, which the raw operators do not. They also state the intent much more clearly.
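The difference between the raw operators and the wrapped versions is easiest to see side by side. In this sketch RAW_STR, RAW_CAT, MY_STR, MY_CAT, and DEMO_VERSION are all hypothetical names; the two-step MY_ macros stand in for the way wrappers such as BOOST_PP_STRINGIZE and BOOST_PP_CAT let their arguments expand first.

```cpp
#include <string>

#define DEMO_VERSION 3

// Raw operators: the argument is used exactly as spelled, without expansion.
#define RAW_STR(x) #x
#define RAW_CAT(a, b) a ## b

// Two-step versions: the outer macro expands its arguments, and only then
// does the inner macro apply # or ##.
#define MY_STR_I(x) #x
#define MY_STR(x) MY_STR_I(x)
#define MY_CAT_I(a, b) a ## b
#define MY_CAT(a, b) MY_CAT_I(a, b)

const std::string raw_text = RAW_STR(DEMO_VERSION);      // "DEMO_VERSION"
const std::string expanded_text = MY_STR(DEMO_VERSION);  // "3"

constexpr int RAW_CAT(value_, DEMO_VERSION) = 1;  // declares value_DEMO_VERSION
constexpr int MY_CAT(value_, DEMO_VERSION) = 2;   // declares value_3
```

The raw versions silently produce the macro's name instead of its value, which is rarely what was meant and can be hard to spot in generated code.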

We now have a log function that is easy to use:

#include <cstdlib>

#include "log.h"

int main(int argc, char* argv[])
{
   log(0);
   log(1, "Hello");
   log(2, "Hello", "World", 6, 7.0f);

   return EXIT_SUCCESS;
}

Output:

code=0
code=1,0=Hello
code=2,0=Hello,1=World,2=6,3=7

Sequences

With the same goal of reducing repetition, preprocessor sequences can help. If many very similar functions are required, the parts that are unique can be extracted, and the parts that remain can be implemented just once.

To demonstrate, consider these functions:

template< typename T > T add(T t0, T t1)
{
   return t0 + t1;
}

template< typename T > T sub(T t0, T t1)
{
   return t0 - t1;
} 

template< typename T > T mul(T t0, T t1)
{
   return t0 * t1;
} 

template< typename T > T div(T t0, T t1)
{
   return t0 / t1;
}

They are pretty repetitive. To eliminate the repetition, a sequence can be used to define just the parts that differ, while the structure or algorithm of the functions is defined once, elsewhere.

Building this up from scratch is left to the reader; the finished solution looks like this:

#include "boost/preprocessor/seq/elem.hpp"
#include "boost/preprocessor/seq/for_each.hpp"

#define OPERATIONS \
   ((add)(+)) \
   ((sub)(-)) \
   ((mul)(*)) \
   ((div)(/))

#define INIT_OPERATION(r, _, elem) \
   template< typename T > T BOOST_PP_SEQ_ELEM(0, elem)(T t0, T t1) \
   { \
      return t0 BOOST_PP_SEQ_ELEM(1, elem) t1; \
   }

BOOST_PP_SEQ_FOR_EACH(INIT_OPERATION, _, OPERATIONS)

#undef INIT_OPERATION
#undef OPERATIONS

As can be seen, the names of the functions and the corresponding operators can be defined in the OPERATIONS macro, and the body of the function in the INIT_OPERATION macro, which is repeated once for each operation. Adding a new operation, perhaps %, would only take a second.
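The same separation of "what differs" from "what stays the same" can be tried without Boost by using a plain-preprocessor technique often called an X-macro. To be clear, this is a different technique from Boost sequences, shown only as a sketch that compiles on its own; the ops namespace is an assumption added to keep div from colliding with the standard library function of the same name. The % operation has been added as the fifth row.

```cpp
namespace ops {

// Each row is (function name, operator). mod/% is the newly added row.
#define OPERATIONS(X) \
   X(add, +) \
   X(sub, -) \
   X(mul, *) \
   X(div, /) \
   X(mod, %)

// The shared function body, written exactly once.
#define DEFINE_OPERATION(name, op) \
   template <typename T> T name(T t0, T t1) \
   { \
      return t0 op t1; \
   }

// Stamps out one function template per row of the table.
OPERATIONS(DEFINE_OPERATION)

#undef DEFINE_OPERATION
#undef OPERATIONS

} // namespace ops
```

Calling ops::mod(7, 2) then works like any of the other four functions. Since % is not defined for floating-point types, instantiating mod with double would fail to compile, which is the usual behavior for a template like this.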

Wrap-up

Duplication, whether of data, logic, algorithm, or code, should be avoided whenever possible. The preprocessor is only one of many tools that can assist in that regard; however, it is one of the easiest to integrate into existing code, and it already has support from IDEs and the build system.

Obviously, the preprocessor must be used carefully. The pitfalls of macros that I was warned about early on must still be avoided, but the utility of the preprocessor should be embraced to drive out repetition and to enhance code when possible.

The Boost Preprocessor library has many more macros that I haven’t touched on as I only covered ones that I have personally used in code. I hope that even this small subset can find a place in your tool belt and use in your code.