Variadic Functions in C without va_list

The term "variadic" (or "variable", see footnote) functions refers to functions whose number and types of arguments are not fixed by the function's declaration. Instead, the function infers them at run time from its input. This is the kind of C hackery that one encounters early in one's exposure to the C language (the classic example being the printf() function), and yet for many non-C programmers and beginning C programmers, the mechanism by which such functions operate remains shrouded in mystery. This post is not meant to be a tutorial on how to use the C library interface for variadic functions (although I describe that too). The point is to make educated guesses about what the implementation of such an interface might look like, and to see if I can write a variadic function that works without using this interface.

The Ingredients

The most fundamental document on which discussions about the C language are based is the ISO/IEC 9899 C standard. Throughout the article, I will refer to the latest draft of it (known as C2x), because that is the version I have access to.

I am going to use environment-specific features heavily. The code that I am going to reference works on my system, but I do not guarantee it will work on any other system. I am going to assume that:

I am allowing myself a certain degree of sloppiness, as it is possible that not every item in that list is necessary for what I am talking about, and I might accidentally make additional assumptions not explicitly listed above. I did my best to document every environment-specific extension I am using, but your mileage may vary.

The Calling

Let me start with a sample program:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("%d %d %d %d %d %d %d %d %d\n", 111, 222, 333, 444, 555, 666, 777,
        888, 999);
    return 0;
}

Back in the old days, when Intel machines had 32-bit general purpose registers, the standard convention for passing arguments to a function was to push them onto the stack in reverse order, so that the arguments were laid out linearly, starting at a known base address, in the same order as they appear in the source code (because the stack grows toward lower addresses). The printf() call in the program above, compiled with gcc -m32 -masm=intel -S filename.c, is translated to (irrelevant parts omitted):

	sub	esp, 8               ; padding: 8 + 10*4 = 48 keeps the stack 16-byte aligned
	push	999              ; push arguments onto stack
	push	888
	push	777
	push	666
	push	555
	push	444
	push	333
	push	222
	push	111
	push	OFFSET FLAT:.LC0 ; format string pointer
	call	printf           ; call the variadic function (printf)
	add	esp, 48              ; clean up

We see that the arguments are pushed onto the stack by the caller, and that the cleanup is done by the caller too. This is important because whoever cleans up the stack has to know by how much to move the stack pointer (see the last line), and therefore has to know the composition (number and types) of the arguments. Because cleaning up is done by the caller (the code around the function call) and not by the callee (the printf function), the callee does not have to know the precise composition of the arguments.*
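To convince myself of the layout, it helps to model it with a toy stack. The sketch below is not real machine state: struct toy_stack, toy_push(), toy_push_args() and toy_cleanup() are names I made up for illustration, and every argument is assumed to occupy exactly one int-sized word.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_WORDS 16

/* Toy model of a downward-growing stack; not real machine state. */
struct toy_stack {
    int words[TOY_STACK_WORDS];
    size_t sp; /* index of the top word; TOY_STACK_WORDS means "empty" */
};

/* Push one word: move the stack pointer down, then store. */
static void toy_push(struct toy_stack *s, int value)
{
    assert(s->sp > 0);
    s->words[--s->sp] = value;
}

/* Caller side: push the arguments last-to-first and return the index
 * of the first argument (the "known base address"). */
static size_t toy_push_args(struct toy_stack *s, const int *args, size_t n)
{
    for (size_t i = n; i > 0; i--) {
        toy_push(s, args[i - 1]);
    }
    return s->sp;
}

/* Caller side: discard n argument words, like "add esp, 4*n". */
static void toy_cleanup(struct toy_stack *s, size_t n)
{
    assert(s->sp + n <= TOY_STACK_WORDS);
    s->sp += n;
}
```

After toy_push_args(), the word at the returned index holds the first argument, the next word holds the second, and so on, which is exactly what lets a callee walk the arguments in source order. Only the caller needs to know n in order to call toy_cleanup().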

The convention of placing the arguments on the stack, rather than the theoretically faster alternative of placing them in registers, was presumably chosen because x86 has few general purpose registers (and most of the time, the stack will be in the CPU cache anyway). The x86_64 case is a bit trickier. Because this architecture has more general purpose registers, the first few arguments are passed in registers, and the rest are pushed onto the stack. Our sample program compiles to:

	push	999               ; push arguments onto stack
	push	888
	push	777
	push	666
	mov	r9d, 555
	mov	r8d, 444
	mov	ecx, 333
	mov	edx, 222
	mov	esi, 111
	mov	edi, OFFSET FLAT:.LC0 ; format string pointer
	mov	eax, 0                ; AL = number of vector registers used for arguments (none)
	call	printf            ; call the variadic function (printf)
	add	rsp, 32               ; clean up
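The register assignment in this listing matches the System V AMD64 calling convention, which I am assuming is what my system uses. As a rough sketch of the rule for integer and pointer arguments only (floating-point arguments go elsewhere and are ignored here), the hypothetical helpers int_arg_register() and int_arg_stack_slot() below map a zero-based argument index to its location:

```c
#include <assert.h>
#include <stddef.h>

/* Registers used, in order, for integer and pointer arguments.
 * Assumption: the System V AMD64 calling convention. */
static const char *const int_arg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

#define INT_ARG_REG_COUNT (sizeof int_arg_regs / sizeof int_arg_regs[0])

/* Name of the register holding argument idx (zero-based), or NULL if
 * the argument is passed on the stack. */
static const char *int_arg_register(size_t idx)
{
    return idx < INT_ARG_REG_COUNT ? int_arg_regs[idx] : NULL;
}

/* For a stack-passed argument, its zero-based 8-byte slot counted from
 * the lowest-addressed stack argument. */
static size_t int_arg_stack_slot(size_t idx)
{
    assert(idx >= INT_ARG_REG_COUNT);
    return idx - INT_ARG_REG_COUNT;
}
```

In the sample program, index 0 is the format string (rdi), indices 1 to 5 are 111 through 555 (rsi through r9), and index 6 (the value 666) is the first stack slot, which is why it is pushed last.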

(*) What I have written here is not entirely true. The callee will have to figure out the composition of variadic arguments based on its input (in this case, the format string), but this is part of the user program logic. As far as the compiler is concerned, by placing the responsibility to clean up the stack on the caller, this calling convention makes generating cleanup code easy for the compiler.

The Hacking

Equipped with this knowledge, we can hack together our own version of the printf() function. We will need to figure out how many arguments we have to pop off the stack. This is done based on the format string: every time we encounter a '%' character in the format string, we know we need to pop one more argument. The type of the argument is encoded in the conversion specifier that follows (e.g. '%d' for integers).

It is the caller's responsibility to make sure that the variadic arguments match the format string. If the format string expects more data than is passed, the function will read past the end of the arguments. If there is more data than the format string requires, the additional arguments won't be used.
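The counting rule can be sketched in isolation. The helper below, count_format_args(), is a hypothetical name of mine, and it makes one assumption beyond the simplified rule above: like the real printf(), it treats "%%" as a literal percent sign that consumes no argument. It does not understand flags, widths or length modifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many arguments a format string consumes under a simplified
 * rule: each '%' followed by a conversion character consumes one
 * argument, except "%%", which consumes none. */
static size_t count_format_args(const char *fmt)
{
    size_t count = 0;

    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            continue;
        }
        p++; /* look at the character after '%' */
        if (*p == '\0') {
            break; /* lone '%' at the end of the string */
        }
        if (*p != '%') {
            count++;
        }
    }
    return count;
}
```

A caller could compare this count against the number of arguments it is about to pass, which is roughly the check compilers perform when they warn about mismatched printf() calls.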

Our simplified printf() looks like this:

void millonis_printf(const char *fmt, ...)
{
    const char *fmt_curr = fmt;
    char ch;
    int val;
    int arg_idx = 0;
    register int esi asm ("esi");
    register int edx asm ("edx");
    register int ecx asm ("ecx");
    register long r8 asm ("r8");
    register long r9 asm ("r9");
    register long rbp asm ("rbp");
    int esi_saved;
    int edx_saved;
    int ecx_saved;
    int r8d_saved;
    int r9d_saved;
    void *stack_arg_ptr;
    char buf[16] = { '\0' };

    /* registers may be written to, so save the arguments */
    esi_saved = esi;
    edx_saved = edx;
    ecx_saved = ecx;
    r8d_saved = (int)r8;
    r9d_saved = (int)r9;
    stack_arg_ptr = (void *)rbp;
    stack_arg_ptr += sizeof(void *); /* skip saved rbp value */
    stack_arg_ptr += sizeof(void *); /* skip saved return address */

    ch = *fmt_curr;
    while (ch != '\0') {
        if (ch == '%') {
            fmt_curr++;
            ch = *fmt_curr;
            if (ch == '\0') {
                break;
            }
            if (ch == 'd') {
                switch (arg_idx) {
                  case 0:
                    val = esi_saved;
                    arg_idx++;
                    break;
                  case 1:
                    val = edx_saved;
                    arg_idx++;
                    break;
                  case 2:
                    val = ecx_saved;
                    arg_idx++;
                    break;
                  case 3:
                    val = r8d_saved;
                    arg_idx++;
                    break;
                  case 4:
                    val = r9d_saved;
                    arg_idx++;
                    break;
                  default:
                    val = *((int *)stack_arg_ptr);
                    /* each stack argument occupies an 8-byte slot on
                     * x86_64, so skip 8 bytes instead of 4 */
                    stack_arg_ptr += sizeof(long);
                    break;
                }
                int_to_string(buf, val);
                fputs(buf, stdout);
            }
            /* and so on for other modifiers... */
        } else {
            putchar(ch);
        }
        fmt_curr++;
        ch = *fmt_curr;
    }
}

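The implementation of int_to_string() is left to the reader. The declaration register int esi asm ("esi"); is a GCC extension for accessing a register through a variable. Arithmetic on a void * pointer, as done with stack_arg_ptr, is also a GCC extension (it treats the pointed-to size as 1).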

Of course this is not a serious implementation, and I wouldn't recommend that anyone use it. It is probably not going to be very efficient. Just looking at the assembly generated from this code made me cringe. Not only that, but the code is not portable. It makes assumptions about the CPU architecture and uses GCC compiler extensions. Simply recompiling it into 32-bit assembly would break it.

The Tooling

From looking at the last section, it is easy to see why there exists in the standard C library an interface to abstract away operating on variable arguments. The interface is defined in the stdarg.h header and consists of a data type called va_list and four macros. I will describe three of these macros: va_start, va_arg and va_end. What this interface essentially does is encapsulate the procedure described in the last section.

va_start initializes an object of type va_list, and va_end is supposed to be called once you are done with it. va_start is additionally passed the name of the rightmost parameter just before the ... (in the case of printf(), this is the format string). This is presumably so that va_start can figure out at what base address to start popping things off the stack (remember that earlier arguments are at lower addresses). va_arg is passed the type of the next argument (which makes sense, since the macro has to know how many bytes to pop off the stack). Each invocation of va_arg returns the next argument and advances the argument "pointer" so that it points to the following one. A simple printf function constructed using these primitives may look like this:

void millonis_printf2(const char *fmt, ...)
{
    va_list ap;
    const char *fmt_curr = fmt;
    char ch;
    int val;
    char buf[16] = { '\0' };

    va_start(ap, fmt);
    ch = *fmt_curr;
    while (ch != '\0') {
        if (ch == '%') {
            fmt_curr++;
            ch = *fmt_curr;
            if (ch == '\0') {
                break;
            }
            if (ch == 'd') {
                val = va_arg(ap, int);
                int_to_string(buf, val);
                fputs(buf, stdout);
            }
            /* and so on for other modifiers... */
        } else {
            putchar(ch);
        }
        fmt_curr++;
        ch = *fmt_curr;
    }
    va_end(ap);
}
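A format string is not the only way for the callee to learn the composition of its arguments. As a smaller illustration of the same three macros, the hypothetical sum_ints() below takes an explicit count as its last named parameter and assumes every variadic argument is an int:

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` variadic int arguments. The caller must pass exactly
 * `count` ints after the count itself. */
static long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    assert(count >= 0);
    va_start(ap, count); /* count is the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int); /* fetch the next int argument */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 111, 222, 333) adds up the three values. As with printf(), nothing stops the caller from passing a count that does not match the arguments, and the function has no way to detect it.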

The Wrapping Up

Originally this post was supposed to include an investigation of the implementation of these macros in GCC (glibc defines them as simply calling the compiler's built-in functions), but I got tired of looking at the GCC source code. In any case, the general implementation is quite sophisticated and architecture-dependent. For the interested reader, gcc/builtins.c contains the base implementation (which calls architecture-specific code).

And so this was my attempt at a simple printf() without va_list. My intention was to dive into GCC's implementation of it, but as I finish writing this I am almost as ignorant of the internals as I was when I started. If someone who has knowledge of it is reading this and would like to enlighten me, please do. I could use an expert. :)

Footnote: The C standard section that describes the va_list type and its macros is called "Variable arguments". No mention is made of "variable functions". I have seen functions that make use of va_list and company be called "variadic functions", and I have seen the name "variadic arguments" be used.