Safe C Macros

Introduction

Macros in C are notoriously tricky, which gives a bad reputation to an otherwise useful tool. GCC, GNU's compiler, has an extension, also supported by the Clang compiler, that makes macros safer than what the standard offers.

The documentation calls this extension statement exprs (short for statement expressions), which isn't an easy name to remember. So even though this article is merely a less technical copy of the manual page, hopefully it will give the feature a more memorable name.

This article will talk only about C. If you need this feature for C++, read the actual documentation.

Unsafe Macros

Of course before talking about safe macros we have to define what's unsafe: simply put, it's a macro that evaluates its arguments more than once. The “unsafety” comes from the fact that if one or more of the macro arguments have side effects, those effects will be applied multiple times.

Let's make a very simple example. Consider the following macro (based on GCC's manual):

#include <stdio.h>

#define max(a, b) (((a) > (b)) ? (a) : (b))

int main(void) {
     int i = 1;
     int j = 2;

     printf("%d\n", max(++i, ++j));

     return 0;
}

Without knowing the definition of max, one would expect the code to print 3; however, running it prints 4. This is because the ++j expression is evaluated twice, as the macro expansion shows:

printf("%d\n", (((++i) > (++j)) ? (++i) : (++j)));
	  

This is a trivial example, but imagine a bigger macro whose arguments write to disk or send a network request! Because evaluating an argument twice can have so many consequences, macros like max as defined earlier are unsafe.
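
To make the double evaluation visible, the following sketch counts how often an argument is evaluated. The names UNSAFE_MAX, bump and count_unsafe_calls are made up for this illustration and do not come from the GCC manual.

```c
/* Unsafe on purpose: the argument that wins is evaluated twice. */
#define UNSAFE_MAX(a, b) (((a) > (b)) ? (a) : (b))

static int calls = 0;   /* number of times bump() has run */

/* Hypothetical helper with a visible side effect. */
static int bump(int value) {
     calls++;
     return value;
}

/* Returns how many times bump() ran for a single use of UNSAFE_MAX. */
static int count_unsafe_calls(void) {
     calls = 0;
     (void)UNSAFE_MAX(bump(1), bump(2));
     return calls;
}
```

Two arguments are passed, yet count_unsafe_calls() reports three calls: bump(2) runs once for the comparison and once more when it is selected as the result.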

A Step Towards Safety

Standard C already offers an idiom that gives macros some safety. Consider this definition:

#define foo(x, y) do {           \
     int _tmpx = (x);            \
     int _tmpy = (y);            \
                                 \
     do_something(_tmpx, _tmpy); \
} while (0)

Contrary to what one might expect, this doesn't require C99. The ANSI/C89 standard requires declarations to come at the start of a block, before any statement, and the temporaries above sit at the top of the do block. C99 merely relaxed the rule, allowing declarations after statements as well.

Because the body of the do statement is a block with its own scope, it can declare temporary variables (which disappear at the end of the block) to hold the values of the arguments. Each argument is then evaluated exactly once, so its side effects happen only once, as expected.
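
As a rough check of that claim, the sketch below counts argument evaluations for a do-based macro. The helpers bump, add_pair (standing in for do_something) and count_foo_calls are hypothetical names chosen for this example.

```c
static int calls = 0;   /* number of times bump() has run */
static int total = 0;   /* accumulated by add_pair() */

/* Hypothetical helper with a visible side effect. */
static int bump(int value) {
     calls++;
     return value;
}

/* Hypothetical stand-in for do_something(). */
static void add_pair(int x, int y) {
     total += x + y;
}

#define foo(x, y) do {         \
     int _tmpx = (x);          \
     int _tmpy = (y);          \
                               \
     add_pair(_tmpx, _tmpy);   \
} while (0)

/* Returns how many times bump() ran for a single use of foo. */
static int count_foo_calls(int flag) {
     calls = 0;
     if (flag)
          foo(bump(1), bump(2));   /* expands to a single statement */
     else
          foo(bump(3), bump(4));
     return calls;
}
```

Each branch evaluates its two arguments once, so the function returns 2 either way. The example also shows why the wrapper is do ... while (0) rather than a bare block: together with the trailing semicolon it forms one statement, so the macro can sit in an unbraced if/else.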

Unfortunately, this method still has a major issue: it has no return value. A do loop is a statement, not an expression, so it yields no value, and using it to define the max macro causes a compilation error:

#define max(a, b) do {                \
     int _tmpa = (a);                 \
     int _tmpb = (b);                 \
     (_tmpa > _tmpb) ? _tmpa : _tmpb; \
} while(0)

int main(void) {
     int i = 1;
     int j = 2;

     printf("%d\n", max(++i, ++j));

     return 0;
}
	  

For many macros this method is fine, since there are only so many reasons to prefer a value-returning macro over a function. For the cases that do need a value, though, it isn't viable.

Safe Macros With a Return Value

This is where the GCC extension comes in. With it, it's possible to define a safe macro that yields a value, or a “void” one that yields nothing.

The extension has a syntax similar to the do method above, but instead of placing the new block between two keywords, it's wrapped in parentheses:

#define max(a, b) ({                  \
     int _tmpa = (a);                 \
     int _tmpb = (b);                 \
     (_tmpa > _tmpb) ? _tmpa : _tmpb; \
})
	  

The value of the last expression in the block is the value of the whole construct, and therefore of the macro. In this case it's an int, but if the last expression were a call to a function returning a pointer to a struct, the macro would yield that pointer. The construct can even have type void, for instance when the block ends with a for loop or with a call to a function that has no return value.
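
The following sketch shows a block that runs ordinary statements before producing its value. The macro clamped_square and the function square_next_input are hypothetical names used only for illustration; the code needs GCC or Clang.

```c
/* Hypothetical macro: clamps x to 100, then squares it. */
#define clamped_square(x) ({   \
     int _v = (x);             \
     if (_v > 100)             \
          _v = 100;            \
     _v * _v;                  \
})

static int next_input = 6;

/* The argument has a side effect, which happens exactly once. */
static int square_next_input(void) {
     return clamped_square(++next_input);
}
```

The if statement in the middle contributes nothing to the result; only the final expression statement, _v * _v, determines the value of the macro.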

While we're at it, let's consider another problem that arises from writing macros this way: how can we preserve the type of the arguments? In the macro above, the temporary variables are declared as int, which makes the macro unsuitable for float, struct or anything else that an int can't safely represent.

GCC (and Clang) has another extension for this kind of situation, typeof, which yields the type of its operand. (C23 has since adopted typeof into the standard.) The type is resolved at compile time, so in the following example both variables have the uint64_t type:

uint64_t hello = 13;
typeof(hello) world = 26;
	  

Normally typeof doesn't evaluate its operand, so it's safe to use even when the operand has side effects; the exception is an operand with a variably modified type, such as a variable-length array. For more information, read the manual.
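
A small sketch of that behavior follows. The function name is made up, and typeof is spelled __typeof__, the alternate keyword GCC and Clang accept even in strict ISO modes such as -std=c99.

```c
/* Shows that the operand of typeof is not evaluated. */
static int typeof_leaves_operand_alone(void) {
     int i = 1;
     __typeof__(i++) copy = 41;   /* copy is an int; i++ never runs */
     return i + copy;
}
```

Since i is never incremented, the function returns 42.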

If we substitute typeof for int in the macro above, we can now use a type-agnostic max (as long as the arguments are numbers, that is):

#define max(a, b) ({                  \
     typeof(a) _tmpa = (a);           \
     typeof(b) _tmpb = (b);           \
     (_tmpa > _tmpb) ? _tmpa : _tmpb; \
})
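
As a quick check, the sketch below uses a macro of this shape with arguments that have side effects and with doubles. The two function names are hypothetical, and typeof is spelled __typeof__ so that the code also compiles in strict ISO modes.

```c
#define max(a, b) ({                  \
     __typeof__(a) _tmpa = (a);       \
     __typeof__(b) _tmpb = (b);       \
     (_tmpa > _tmpb) ? _tmpa : _tmpb; \
})

/* Each increment happens exactly once. */
static int max_with_side_effects(void) {
     int i = 1;
     int j = 2;
     return max(++i, ++j);
}

/* The temporaries take the type of the arguments, here double. */
static double max_of_doubles(void) {
     return max(2.5, 1.25);
}
```

The first function returns 3, the value one would expect from the very first example, and the second returns 2.5 without truncating it to an int.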
	  

Alas, the manual page for statement exprs warns us that the temporary variables can shadow variables used at the call site. In fact, if we execute this program, using the macro above:

int main(void) {
     int _tmpa = 1;
     int _tmpb = 2;

     printf("%d\n", max(_tmpa, _tmpb));

     return 0;
}
	  

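the printed value is unpredictable, because the first declaration expands to typeof(_tmpa) _tmpa = (_tmpa); and the _tmpa in the initializer is the new, still uninitialized variable rather than the one declared in main.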

There's not much that can be done about it syntactically, and since C doesn't have gensym, the only thing we can do is rely on some more preprocessor magic to make those temporary variables unique:

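#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)
#define UNIQ(name) CONCAT(name, __LINE__)

#define max(a, b) ({                                          \
     typeof(a) UNIQ(_tmpa) = (a);                             \
     typeof(b) UNIQ(_tmpb) = (b);                             \
     (UNIQ(_tmpa) > UNIQ(_tmpb)) ? UNIQ(_tmpa) : UNIQ(_tmpb); \
})

A bit tedious to write, but since __LINE__ expands to the number of the line containing the invocation, invocations on different lines get variables with different names. The extra CONCAT layer is needed because ## pastes its operands before they are macro-expanded: writing _tmpa##__LINE__ directly would always produce the identifier _tmpa__LINE__. All of this holds only as long as the macro is not used recursively: nested invocations such as max(max(a, b), c) sit on the same line, so, aside from some lucky exceptions, their variables end up with the same name.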
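
For invocations that share a line, one possible alternative is __COUNTER__, a predefined macro offered as an extension by GCC and Clang that expands to a new integer each time it is used. The sketch below is not from the manual page; MAX, MAX_IMPL, CONCAT and the two demo functions are hypothetical names. The generated names are passed to a helper macro as parameters, so each name is produced once and reused throughout the body.

```c
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)

/* The temporary names arrive as parameters, already expanded. */
#define MAX_IMPL(a, b, ta, tb) ({   \
     __typeof__(a) ta = (a);        \
     __typeof__(b) tb = (b);        \
     (ta > tb) ? ta : tb;           \
})

/* __COUNTER__ is a compiler extension, not standard C. */
#define MAX(a, b)                                  \
     MAX_IMPL(a, b, CONCAT(_tmpa_, __COUNTER__),   \
                    CONCAT(_tmpb_, __COUNTER__))

/* Nested invocations on one line get distinct temporaries. */
static int max_of_three(int x, int y, int z) {
     return MAX(MAX(x, y), z);
}

/* Names like _tmpa at the call site no longer clash. */
static int max_with_clashing_names(void) {
     int _tmpa = 1;
     int _tmpb = 2;
     return MAX(_tmpa, _tmpb);
}
```

A clash is still possible if the caller happens to use a name such as _tmpa_0, so this narrows the problem rather than eliminating it.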

Conclusions

The situations in which this extension falls short are few, especially when the macro isn't as generic as max (mostly in the sense that there's no real need to call it recursively). All in all, using it makes macros safe regardless of the arguments passed to them.