Understanding printf

October 29, 2012

Introduction

Although printf is one of the functions they use most often, most beginners in the C programming language do not really understand how it works. This article attempts to demystify it; to follow along you will need some basic knowledge of C (or C++). I don’t propose to produce a list of the various % formatters that printf uses – any C library reference has a full list of them.

What It Is

printf is a function declared in the stdio.h header file. It’s not a reserved word or part of the C language itself, and there is nothing magical about it. The function is in fact a member of a family of functions, which includes sprintf, fprintf and vsprintf, which are probably all somewhat more useful and more commonly used in real programs than is printf itself.

The Parameters

The printf function makes use of what are known in C as variable argument lists. This means you can call the function with different numbers of parameters:

printf( "Hello World\n" );
printf( "Hello %s\n", "fred" );
printf( "First 3 primes are %d, %d and %d\n", 2, 3, 5 );

In C, variable argument lists are represented by three dots, also known as the ellipsis, so the declaration of the function looks like this:

int printf( const char * fmt, ... );

As with all C functions, you have to provide the parameters that have an actual type – in this case the format string. However, you can then supply as many other parameters as you want, including none, and it is up to the function to somehow work out how many you provided.

Many beginners when first coming across the ellipsis notation think "Cool! I can write functions like this:"

int sum( ... ) {
   // add up parameters somehow
   // return sum of parameters
}

which you would call like this:

int total = sum( 1, 42, 16 );

Unfortunately, you can’t do that. The C language provides no built-in way of finding out how many parameters are represented by the ellipsis, and no way of finding out what the types (i.e. ints, floats, longs, pointers) of the ellipsis parameters are. It is up to the programmer writing functions that use the ellipsis to provide some means of telling the function both the number of parameters and their types – this is what the format string does for printf.
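One common way round this, shown here only as a minimal sketch, is to make the first named parameter a count of the values that follow, and to agree by convention that they are all ints. The count parameter is an assumption of this example and not something the language provides; the va_ macros come from the standard <stdarg.h> header.

```c
#include <stdarg.h>

// Hypothetical variant of sum(): 'count' says how many int values follow,
// because the ellipsis itself cannot tell us.
int sum( int count, ... ) {
    va_list args;
    int total = 0;
    va_start( args, count );            // begin after the last named parameter
    for ( int i = 0; i < count; i++ ) {
        total += va_arg( args, int );   // we have to state the type ourselves
    }
    va_end( args );
    return total;
}
```

With this version, sum( 3, 1, 42, 16 ) returns 59. Passing a count that is larger than the number of values actually supplied is undefined behaviour, just as it is with a printf format string that asks for too much.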

The Format String

As indicated above, the format string is used by printf to find out how many parameters are being passed to the function, and what their types are, together with other information needed by the function. I don’t propose to go into detail here about the mini-language that printf uses to do this, but I’ll point out a couple of gotchas you may not be aware of.

Firstly, the number of %-prefixed placeholders in the format string should exactly match the number of parameters passed to the function via the ellipsis. The C Standard says that providing too few ellipsis parameters leads to undefined behaviour, whereas surplus ones are evaluated and then ignored, but really you want to get it just right. It’s easy to see why not providing enough parameters is a problem:

printf( "%d %d", 42 );    // bad news!

The printf function works by walking along the format string and for every place-holder grabbing the next ellipsis parameter. In this case, the first one it grabs is 42, which is fine, but it then tries to grab a parameter that is not there. This is equivalent to dereferencing a bad pointer (you are trying to access memory you don’t own), and the results are similar – the program may crash, it may output garbage, or (worst of all) it may appear to "work", but be in a corrupted state. Such is the magic of undefined behaviour.

Secondly, the types of the place-holders in the format string must match up with the corresponding parameters in the ellipsis list. Once again, if things don’t match up you will be off in Undefined Behaviour Land, and once again the reason is fairly easy to see. Suppose you have this code:

printf( "%s", 42 );      // more bad news!

The format string tells the function that the first ellipsis parameter is actually a pointer to a C-style, null-terminated string. To deal with such a type, the first thing that printf has to do is to dereference that pointer. In this case, it tries to dereference the value 42, which is extremely unlikely to point at a C-style, null-terminated string, or indeed at anything legally accessible. If by some cosmic chance it _does_ point at such a thing, the program may appear to "work", but really you have undefined behaviour.

Some compilers will warn you about problems like this, but others won’t – why not? Well, the compiler doesn’t necessarily really understand what the format string of printf is actually doing. The C language standard certainly does not require it to, so if it does, it’s a compiler extension. The GCC C compiler will check it for you if you compile with the -Wall option to enable common warnings. With that enabled, code like this:

int a = 42;
printf( "%d %d %d", a );

will produce warnings like this:

warning: format '%d' expects a matching 'int' argument

which is a very good reason for always using the -Wall option (and its companion option -Wextra) when compiling with GCC.

Promotions

Something that even quite knowledgeable C programmers don’t realise is that the things that you pass to printf as parameters are quite often not the same things that the function implementation actually sees. For example, given this code:

float val = 1.23f;
printf( "%f", val );

then printf will never see a float parameter – instead, the value passed to the function will always be a double. This is because when arguments are passed via the ellipsis (to any function, not just to printf) they always have some standard promotions applied to them. The rules are that chars, shorts and bit-fields narrower than int are converted to int (or to unsigned int if int cannot hold all their values), and floats are promoted to doubles – this is all handled silently by the compiler. It does explain why there are no special place-holder formatting characters for floats (%f, %g and %e all work on doubles) – it’s because printf can never see a float value.
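The promotions can be seen from the receiving side in a small sketch like the following. The function name and the 'c'/'f' codes are made up for this example; they play the role of a very small format string. Note that the function has to ask va_arg for int and double, not char and float, because those are the types that actually arrive.

```c
#include <stdarg.h>

// Hypothetical helper: 'kind' is 'c' if the caller passed a char,
// or 'f' if the caller passed a float. Anything else yields 0.0.
double promoted_value( int kind, ... ) {
    va_list args;
    double result = 0.0;
    va_start( args, kind );
    if ( kind == 'c' ) {
        result = va_arg( args, int );      // a char arrives as an int
    } else if ( kind == 'f' ) {
        result = va_arg( args, double );   // a float arrives as a double
    }
    va_end( args );
    return result;
}
```

Calling promoted_value( 'f', 1.5f ) returns 1.5, even though the caller wrote a float. Asking for va_arg( args, float ) or va_arg( args, char ) instead would be undefined behaviour, since no argument of those types can ever come through the ellipsis.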

Return Value

I suspect that if asked, most people who have heard of it would tell you that printf returns void – i.e. it returns nothing. In fact this isn’t so; its return type is int. The function returns the number of characters that it printed, or a negative value if some sort of error occurred (note that errors do not cover undefined behaviour). This looks pretty useless, and for printf it is very rarely used. However, for some of the other members of the function family, such as sprintf and fprintf, the return value can sometimes be quite handy.
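As a small illustration of where the return value earns its keep, here is a sketch that builds a string in two steps with sprintf, using the count returned by the first call to work out where the second should start writing. The function name and its parameters are invented for this example, and the caller is assumed to supply a buffer that is big enough.

```c
#include <stdio.h>

// Hypothetical helper: writes "Hello <name>, you are <age>" into out.
// The caller must supply a large enough buffer; sprintf does not check.
// Returns the total number of characters written, excluding the null.
int build_greeting( char * out, const char * name, int age ) {
    int used = sprintf( out, "Hello %s, ", name );     // characters written so far
    used += sprintf( out + used, "you are %d", age );  // append where we left off
    return used;
}
```

After build_greeting( buf, "fred", 42 ) the buffer holds "Hello fred, you are 42" and the function returns 22. Real code would also want to check for a negative return before using it as an offset.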

Side story: some years ago when I was an instructor with a commercial training company, we had an "Advanced C" course, which I was jointly responsible for. This wasn’t really so advanced (mostly it was about data structures), but it was quite hard to teach. One Monday morning I walked into the classroom to start teaching, only to discover that there had been some mysterious "improvements" to the course text and OHP slides: all the occurrences of printf (of which there were many) had suddenly sprouted a cast like this:

(void) printf( "hello world" );

This slightly confused me (though I guessed the reason) and completely threw the attendees, who kept asking if all their code had to have these casts too. I assured them it didn’t, and at lunchtime went searching for the obvious culprit, the other instructor responsible for the course. After I had finished hitting him, he confessed that he did it because the lint static analysis tool complained about an "unused return value" for printf. I pointed out that the correct thing to do was to create an exclusion list for lint, not to befuddle our clients, who were having enough problems implementing AVL trees without worrying about stylistically dubious casts.

The printf Family

As I said before, the printf function is a member of a family of functions. The most notable of these are fprintf, sprintf and vsprintf.

The fprintf function is the simplest to understand. Its declaration looks like this:

int fprintf( FILE * f, const char * fmt, ... );

where the FILE pointer f refers to a file opened for writing using the fopen function. The function works exactly like printf, except that the formatted output produced is written to the file instead of to standard output.
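A minimal sketch of fprintf in use follows. The helper name and the "name=value" line format are made up for illustration; the point is only that the same formatting can be sent to any stream the caller has open, including the standard streams stdout and stderr.

```c
#include <stdio.h>

// Hypothetical helper: writes one "name=value" line to any open stream.
// Returns whatever fprintf returns: the character count, or a negative
// value on error.
int write_setting( FILE * f, const char * name, int value ) {
    return fprintf( f, "%s=%d\n", name, value );
}
```

So write_setting( stdout, "answer", 42 ) behaves like the equivalent printf call and returns 10, while passing a FILE pointer obtained from fopen sends the same text to that file.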

The sprintf function looks like this:

int sprintf( char * s, const char * fmt, ... );

In this case, the function performs the formatting as per printf, but writes its output to the character array pointed to by its first parameter. So after this:

int n = 42;
char meaning[100];
sprintf( meaning, "The meaning of life is %d", n );

the array meaning will contain the null-terminated string "The meaning of life is 42". The sprintf function is very handy for creating strings, and is often a better bet than the more frequently used strcat and strcpy functions when copying and concatenating. Note that sprintf does not magically create the storage for the array – it is up to you to provide one of a suitable size. Also, sprintf does not check the array size, and writing past the end will put you in undefined behaviour territory. You can restrict the number of characters written by using the closely related snprintf function, which takes the size of the array as an extra parameter.
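Here is a short sketch of snprintf doing that job. The helper name is invented for this example. The C99 snprintf never writes more than size characters including the terminating null, and it returns the length the full string would have had, so comparing the return value with the size tells you whether the output was cut short.

```c
#include <stdio.h>

// Hypothetical helper: formats into buf, which has room for 'size' chars.
// Returns 1 if the whole string fitted, 0 if it was truncated or an
// error occurred.
int format_meaning( char * buf, size_t size, int n ) {
    int needed = snprintf( buf, size, "The meaning of life is %d", n );
    return needed >= 0 && (size_t) needed < size;
}
```

With a 100-character array this returns 1 and produces the full string. With an 8-character array it returns 0 and leaves the safely terminated fragment "The mea" in the array.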

The last of the printf family is the most complicated, but in some ways the most powerful. This is the declaration for the vsprintf function (there are also related vprintf and vsnprintf functions):

int vsprintf( char * s, const char * fmt, va_list args );

The first two parameters are the same as those of sprintf, and do the same thing, but the last looks peculiar. The va_list type effectively represents a variable length argument list – in other words, it stands in for the ellipsis. It is declared in the standard C header file <stdarg.h>, and what it actually is underneath (a pointer, a structure, an array) is up to the implementation. Its purpose is to allow you to write your own functions that use the ellipsis, something that is more difficult than it seems. You might think you could create your own functions like this:

int myprintf( const char * fmt, ... ) {
    printf( fmt, ... );
}

but unfortunately, you cannot – you have to use the macros in <stdarg.h>. The vsprintf family of functions provides a handy way of using these macros for string formatting tasks.
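For comparison, a working version of the myprintf idea might look like the sketch below. It turns the ellipsis into a va_list with va_start and hands that to vprintf, the va_list-taking sibling of printf. This is one straightforward way to do the forwarding, not the only one.

```c
#include <stdarg.h>
#include <stdio.h>

// A wrapper that forwards its arguments to the printf machinery.
int myprintf( const char * fmt, ... ) {
    va_list args;
    int n;
    va_start( args, fmt );       // capture everything after fmt
    n = vprintf( fmt, args );    // vprintf is printf taking a va_list
    va_end( args );
    return n;                    // pass printf's result back to the caller
}
```

A call such as myprintf( "Hello %s\n", "fred" ) then prints exactly what printf would and returns 11.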

A tutorial on the use of the va_ macros is probably a bit out of the scope of this article, but I’ll demonstrate a rather advanced use in a function that formats and allocates a null-terminated string directly (note that memory exhaustion is not dealt with, to keep the code reasonably simple):

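#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

char * mprintf( const char * fmt, ... ) {
    int size = 40, n;
    char * buffer = NULL;
    do {
        size *= 2;                          // first attempt uses 80 bytes
        va_list valist;
        va_start( valist, fmt );            // restart the list on every attempt
        buffer = realloc( buffer, size );   // allocation failure not checked
        n = vsnprintf( buffer, size, fmt, valist );
        va_end( valist );
    } while( n >= size );                   // n is the length needed, minus the null
    if( n < 0 ) {                           // formatting error: buffer is unusable
        free( buffer );
        return NULL;
    }
    return buffer;
}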

With a function like this, there is no need to worry about whether the array you are outputting to is big enough. You can simply say things like:

char * p = mprintf( "The meaning of life is %d", 42);

and the mprintf function will allocate memory for you which will contain the null-terminated string "The meaning of life is 42". Of course, you do need to remember to call free() to dispose of the memory when you are done.
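An alternative to the grow-and-retry loop, offered here only as a sketch, is to ask vsnprintf for the required length first and then allocate exactly that much. The name mprintf_exact is invented for this example. It relies on two C99 features: vsnprintf accepting a null buffer with a size of zero, and va_copy, which is needed because a va_list cannot be reused once a function has walked through it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical variant of mprintf: measure first, then allocate exactly.
// Returns NULL on a formatting error or if malloc fails.
char * mprintf_exact( const char * fmt, ... ) {
    va_list args, copy;
    char * buffer = NULL;
    va_start( args, fmt );
    va_copy( copy, args );                       // keep a second list for pass two
    int n = vsnprintf( NULL, 0, fmt, args );     // pass one: count only
    va_end( args );
    if ( n >= 0 ) {
        buffer = malloc( (size_t) n + 1 );       // +1 for the terminating null
        if ( buffer != NULL ) {
            vsnprintf( buffer, (size_t) n + 1, fmt, copy );  // pass two: format
        }
    }
    va_end( copy );
    return buffer;
}
```

It is called in the same way, for example char * p = mprintf_exact( "The meaning of life is %d", 42 ); and the result must likewise be released with free(). The trade-off is that the string is always formatted twice, whereas the loop version formats short strings only once.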

Conclusion

This article has presented an introduction to the printf family of functions, going into perhaps a little more detail than most tutorials do. 
