Introduction
Macros in C are notoriously tricky, giving a bad reputation to an otherwise useful tool. GCC, GNU's compiler, has an extension, apparently also supported by the clang compiler, to make macros safer than what the standard offers.
The documentation calls this extension “statement exprs”, which is not an easy name to remember; even though this article is merely a less technical copy of the manual page, hopefully it will give the feature a more memorable name.
This article will talk only about C. If you need this feature for C++, read the actual documentation.
Unsafe Macros
Of course before talking about safe macros we have to define what's unsafe: simply put, it's a macro that evaluates its arguments more than once. The “unsafety” comes from the fact that if one or more of the macro arguments have side effects, those effects will be applied multiple times.
Let's make a very simple example. Consider the following macro (based on GCC's manual):
    #include <stdio.h>

    #define max(a, b) ((a) > (b)) ? (a) : (b)

    int main(void)
    {
        int i = 1;
        int j = 2;
        printf("%d\n", max(++i, ++j));
        return 0;
    }
Without knowing the definition of max, one would expect the code to print 3; however, after executing it the printed number is actually 4. This is because the ++j expression is evaluated two times, as shown by the macro expansion:
printf("%d\n", ((++i) > (++j)) ? (++i) : (++j));
This is a trivial example, but imagine if the macro was something bigger and the arguments were to write to disk or send a network request! Because of the many implications of evaluating the argument two times, macros like max as defined earlier are unsafe.
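To make the double evaluation visible without relying on ++, here is a minimal sketch that counts how often an argument runs. The names unsafe_max, next_value and count_evaluations are hypothetical helpers made up for this illustration, and the macro is the one above with an extra pair of outer parentheses.

```c
#include <assert.h>
#include <stdio.h>

/* The unsafe definition from the text, wrapped in outer parentheses. */
#define unsafe_max(a, b) (((a) > (b)) ? (a) : (b))

static int calls = 0;   /* how many times next_value() has run */

/* Hypothetical helper with a visible side effect: it bumps a counter. */
static int next_value(void)
{
    calls++;
    return calls * 10;
}

/* Returns how many times the single next_value() argument was evaluated. */
static int count_evaluations(void)
{
    calls = 0;
    int result = unsafe_max(next_value(), 5);
    /* First call returns 10 (the comparison), second returns 20 (the value). */
    assert(result == 20);
    printf("result = %d, calls = %d\n", result, calls);
    return calls;
}
```

Calling count_evaluations() from main reports two calls, even though next_value() appears only once in the source.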
A Step Towards Safety
Even without extensions, standard C offers a way to get some safety. Consider this definition:
    #define foo(x, y) do { \
        int _tmpx = (x); \
        int _tmpy = (y); \
        \
        do_something(_tmpx, _tmpy); \
    } while (0)
ANSI C89 only allows variable declarations at the start of a block, before any statement. The temporaries above are declared at the top of the do block, so this form is accepted by C89 compilers too; C99 merely lifts the restriction and lets declarations be mixed with statements.
Because the do statement creates a new block, it's possible to declare some temporary variables (which will disappear after the end of the do) to hold the computation of the arguments. This way, side effects will happen only once as expected.
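As a rough illustration of the point above, the following sketch uses a macro whose body needs its argument twice, yet evaluates it only once thanks to the temporary. The names store_square, next_value and square_once are invented for this example, and the result is written through the hypothetical dst parameter.

```c
#include <assert.h>

static int calls = 0;

/* Hypothetical helper with a side effect, used to count evaluations. */
static int next_value(void)
{
    return ++calls;
}

/* x is needed twice in the body, but is evaluated only into the temporary. */
#define store_square(dst, x) do { \
    int _tmpx = (x); \
    (dst) = _tmpx * _tmpx; \
} while (0)

static int square_once(int enabled)
{
    int out = -1;
    calls = 0;
    /* The do/while (0) wrapper behaves as one statement in an unbraced if/else. */
    if (enabled)
        store_square(out, next_value() + 2);
    else
        out = 0;
    assert(calls == (enabled ? 1 : 0));
    return out;
}
```

With enabled set to 1, next_value() runs exactly once and out ends up as 9.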
Unfortunately, this method still has a major issue: it has no return value. A do statement is not an expression and so has no value, which means that using it to define the max macro will generate a compilation error:
    #define max(a, b) do { \
        int _tmpa = (a); \
        int _tmpb = (b); \
        (_tmpa > _tmpb) ? _tmpa : _tmpb; \
    } while(0)

    int main(void)
    {
        int i = 1;
        int j = 2;
        printf("%d\n", max(++i, ++j));
        return 0;
    }
For many macros this method is still fine, as there are only so many reasons to use a value-returning macro instead of a function; but for the cases where you do need one, it is not viable.
Safe Macros With a Return Value
This is where the GCC extension comes in. By using it, it's possible to define a safe macro with a return value, including a “void” one.
The extension has a syntax similar to the do method above, but instead of placing the new block within two keywords, it's placed within two parentheses:
    #define max(a, b) ({ \
        int _tmpa = (a); \
        int _tmpb = (b); \
        (_tmpa > _tmpb) ? _tmpa : _tmpb; \
    })
The value of the last expression in the block is the value of the whole macro; in this case it's an int, but if the last expression were a call to a function returning a pointer to a struct, that would be its type. It can even “return” void if the last thing in the block has no value, say because it's a for loop or a call to a function with no return value.
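Here is a small sketch of the int version in use, checking both the value and the number of evaluations. The names max_int, next_value and max_demo are made up for this illustration; the block needs GCC or clang, since statement expressions are not standard C.

```c
#include <assert.h>

static int calls = 0;

/* Hypothetical helper with a side effect, used to count evaluations. */
static int next_value(void)
{
    return ++calls;
}

/* GNU C statement expression: the last expression is the value of the block. */
#define max_int(a, b) ({ \
    int _tmpa = (a); \
    int _tmpb = (b); \
    (_tmpa > _tmpb) ? _tmpa : _tmpb; \
})

static int max_demo(void)
{
    int i = 1;
    int j = 2;
    calls = 0;
    int m = max_int(++i, ++j);        /* each increment happens once */
    int n = max_int(next_value(), 0); /* next_value() runs once */
    assert(i == 2 && j == 3);
    assert(calls == 1 && n == 1);
    return m;
}
```

This time max_demo() returns 3, the value one would expect from reading the call site.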
While we're at it, let's consider another problem that arises from writing macros this way: how can we preserve the type of the arguments? In the macro above, the temporary variables are defined to have an int type, making them unsuitable for use with float, struct or anything that an int can't safely represent.
GCC (and clang) has another extension for this kind of situation, typeof, whose purpose is to yield the type of its argument. This happens at compile time, so in the following example, both variables have the uint64_t type:
    uint64_t hello = 13;
    typeof(hello) world = 26;
Normally the typeof keyword doesn't evaluate its argument, so it's safe to use it even when the argument would generate a side effect. For more information, read the manual.
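The claim that the argument is not evaluated can be checked with a short sketch. It uses the alternate spelling __typeof__, which GCC and clang also accept when a strict -std= option is given; typeof_demo is a name made up for this example.

```c
#include <assert.h>
#include <stdint.h>

static int typeof_demo(void)
{
    uint64_t hello = 13;
    __typeof__(hello) world = 26;   /* world is a uint64_t */

    int i = 0;
    __typeof__(i++) copy = 5;       /* only the type of i++ is used: i stays 0 */

    assert(sizeof(world) == sizeof(uint64_t));
    assert(i == 0);
    return (int)(hello + world) + copy + i;
}
```

The increment inside __typeof__ never runs, so the function returns 13 + 26 + 5 + 0.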
If we substitute typeof for int in the macro above, we can now use a type-agnostic max (as long as the arguments are numbers, that is):
    #define max(a, b) ({ \
        typeof(a) _tmpa = (a); \
        typeof(b) _tmpb = (b); \
        (_tmpa > _tmpb) ? _tmpa : _tmpb; \
    })
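A quick sketch of the type-agnostic version with a few numeric types follows. The wrapper functions are hypothetical and exist only to make the types explicit; __typeof__ is used as the alternate spelling of typeof. When the two arguments have different types, the usual arithmetic conversions of the comparison and conditional operators decide the result type.

```c
#include <assert.h>

#define max(a, b) ({ \
    __typeof__(a) _tmpa = (a); \
    __typeof__(b) _tmpb = (b); \
    (_tmpa > _tmpb) ? _tmpa : _tmpb; \
})

static double max_of_doubles(double x, double y)
{
    return max(x, y);   /* temporaries are doubles */
}

static unsigned long max_of_ulongs(unsigned long x, unsigned long y)
{
    return max(x, y);   /* temporaries are unsigned longs */
}

static double max_mixed(int x, double y)
{
    return max(x, y);   /* int and double: the result is a double */
}

static void max_examples(void)
{
    assert(max_of_doubles(2.5, 1.0) == 2.5);
    assert(max_of_ulongs(3UL, 4000000000UL) == 4000000000UL);
    assert(max_mixed(1, 2.5) == 2.5);
}
```

None of these values would survive the int-only definition: 2.5 would be truncated to 2, and 4000000000 does not fit in a 32-bit int.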
Alas, the manual page for statement exprs warns us that the temporary variables can cause shadowing. In fact, if we execute this program, using the macro above:
    int main(void)
    {
        int _tmpa = 1;
        int _tmpb = 2;
        printf("%d\n", max(_tmpa, _tmpb));
        return 0;
    }
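it's noticeable that something's wrong: the expansion contains typeof(_tmpa) _tmpa = (_tmpa);, so the new variable shadows the outer one and is initialized with its own indeterminate value. There's not much that can be done about it syntactically, and since C doesn't have gensym, the only thing we can do is rely on some more preprocessor magic to make those temporary variables unique: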
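    /* ## pastes before expanding, so __LINE__ needs one extra macro level. */
    #define CAT_(a, b) a##b
    #define CAT(a, b) CAT_(a, b)
    #define max(a, b) ({ \
        typeof(a) CAT(_tmpa, __LINE__) = (a); \
        typeof(b) CAT(_tmpb, __LINE__) = (b); \
        (CAT(_tmpa, __LINE__) > CAT(_tmpb, __LINE__)) \
            ? CAT(_tmpa, __LINE__) : CAT(_tmpb, __LINE__); \
    })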
A bit tedious to write, but since the __LINE__ macro has a different value for each line (obviously), invocations on different lines will have variables with different names. Well, as long as the macro is not used recursively: aside from some lucky exceptions, recursive macros will end up with variables with the same name.
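For the cases where two invocations share a line, GCC and clang also provide __COUNTER__, a predefined macro that expands to a new integer each time it is used. The following is only a sketch of the same idea built on it, not something taken from the manual page discussed here; CONCAT, UNIQUE and max_impl are hypothetical helper names. The two-step CONCAT is needed because ## pastes its operands before they are macro-expanded.

```c
#include <assert.h>

/* Two-step paste: the extra level lets __COUNTER__ expand before ##. */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)
#define UNIQUE(prefix) CONCAT(prefix, __COUNTER__)

/* The names of the temporaries arrive, already expanded, as ta and tb. */
#define max_impl(a, b, ta, tb) ({ \
    __typeof__(a) ta = (a); \
    __typeof__(b) tb = (b); \
    (ta > tb) ? ta : tb; \
})
#define max(a, b) max_impl(a, b, UNIQUE(_tmpa), UNIQUE(_tmpb))

/* The caller's variables no longer clash with the temporaries. */
static int shadow_demo(void)
{
    int _tmpa = 1;
    int _tmpb = 2;
    int m = max(_tmpa, _tmpb);
    assert(m == 2);
    return m;
}

/* Nested use on one line: every expansion gets its own pair of names. */
static int nested_demo(void)
{
    return max(max(1, 7), 3);
}
```

A clash is still possible if the caller happens to use a name such as _tmpa0, so this narrows the problem rather than removing it.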
Conclusions
The situations in which this extension falls short aren't many, especially if the macro itself isn't as generic as max is (mostly in the sense that there's no real need to call it recursively), so all in all using it will make macros safe regardless of the arguments passed to them.