Inline Functions

July 4, 2011

Introduction

One of the (many?) features of C++ and other compiled languages about which there is some confusion is the use of inline functions. This article attempts to explain inline functions, and to show what the compiler does and does not do when it comes across one. I’ll also have a look at where the compiler can and cannot perform inlining, and at the specific uses of inline in C++.

Firstly, what do we mean by inline? Well, historically, inlining a function means taking the body of the function and using it to replace calls to it at the function’s call sites. For example, consider this small program:

int add1( int n ) {
    return n + 1;
}

int main() {
    int x = 0, y = 0;
    y = add1( x );     //1
    return y;
}

Without inlining, the compiler will generate a call instruction at the line marked with //1. With inlining, the compiler will insert the code that forms the body of the add1() function at that point, producing the same (or at least similar) output to what would have been generated had the original code actually read:

int main() {
    int x = 0, y = 0;
    y = x + 1;     //1
    return y;
}

Why would we want this? Well, a function call has historically been an expensive thing to do – if we can get rid of it, we can maybe make our code run faster. However, inlining has its downsides, which I’ll discuss later, and function calls are no longer as expensive as they used to be.

What The Compiler Produces

Let’s have a look at some real, compiler-produced assembler code. Don’t worry if you don’t know assembler – I’ll point out the salient bits and try to explain them. If you want to have a go at compiling the code and examining the output yourself, you will need the GCC compiler and the gdb debugger. If you are on Windows, take a look at this article for details of how to install them; if you are on *nix, you probably already have them.

Assuming we have the code above in a file called inlf.cpp, we can compile it from a command line prompt and produce an executable called inlf, like this:

g++ inlf.cpp -o inlf

and we can examine the file with the gdb debugger:

gdb inlf

which produces some reams of text and the gdb prompt, which looks like (gdb). At the prompt, we type in the command:

disas main

to disassemble the machine code of main into assembler mnemonics. It will look something like this (the markers //1 et al are added by me):

0x0040134d :    push   ebp
0x0040134e :    mov    ebp,esp
0x00401350 :    and    esp,0xfffffff0
0x00401353 :    sub    esp,0x20
0x00401356 :    call   0x4019b0                    //1
0x0040135b :    mov    DWORD PTR [esp+0x1c],0x0    //2
0x00401363 :    mov    DWORD PTR [esp+0x18],0x0
0x0040136b :    mov    eax,DWORD PTR [esp+0x1c]    //3
0x0040136f :    mov    DWORD PTR [esp],eax
0x00401372 :    call   0x401344                    //4
0x00401377 :    mov    DWORD PTR [esp+0x18],eax    //5
0x0040137b :    mov    eax,DWORD PTR [esp+0x18]
0x0040137f :    leave                              //6
0x00401380 :    ret

The numbers on the left are addresses – the interesting stuff is on the right. Incidentally, if you are trying this yourself, and the output looks very different, you need to type the command

set disassembly-flavor intel

before disassembling.

OK, to explain the code: all the stuff up to and including the line labelled //1 is to do with setting up the C++ runtime environment, and getting the parameters for main – it needn’t concern us. The next bit of code, at //2, creates the two variables x and y on the stack, and sets them to zero. Then at //3 the code places a copy of the x variable on the stack as the parameter of the add1 function, and at //4 calls the function (we won’t concern ourselves with what goes on in the function here). Then at //5 the code moves the return value of the function into the variable y, and at //6, main is exited.

You can see that quite a lot of the code is concerned with making the function call – first the compiler needs to push the parameters of the function on to the stack, then make the call, and then deal with the return value. If some or all of this could be avoided, there might be significant performance gains. So let’s see what happens when we inline the function. That seems like it should be easy; surely all we need to do is change add1 so it looks like this:

inline int add1( int n ) {
    return n + 1;
}

and recompile? Well, if you did that you would be disappointed – the code for main would almost certainly look exactly the same. This is because the inline keyword is only a hint to the compiler that inlining might be a good idea, and in this case the compiler has seen fit to ignore it. Note that the compiler could have gone the other way and chosen to inline the function even without the inline keyword being used. However, it is unlikely to have done this, because we are not doing an optimised build, and most compilers won’t inline anything if the build is not optimised.

Forcing Inlining

So the obvious solution is to optimise the build? Well, you might think so, and if you do this:

g++ -O2 inlf.cpp -o inlf

then the code almost certainly would be inlined, if it were not for the fact that the optimiser can do away with the call to add1 altogether: every value involved is known at compile time, so the whole computation can be folded into a constant. We would end up with a main that does little more than return 1, with almost no code generated. There are clever ways to defeat the optimiser, but rather than look at them we’ll use a non-optimised build plus a bit of non-standard code. We will re-write add1 like this:

int add1( int n ) __attribute__((always_inline));

int add1( int n ) {
    return n + 1;
}

The non-standard __attribute__ keyword is a note to the compiler, telling it in this case that we really want add1 to be inlined, come what may. You do not want to be using this kind of thing in your day-to-day code (you should use the -Ox optimisation switches) but it is handy to let us see what is going on here. If we now recompile and re-examine the code with gdb, we get this:

0x0040134d :    push   ebp
0x0040134e :    mov    ebp,esp
0x00401350 :    and    esp,0xfffffff0
0x00401353 :    sub    esp,0x10
0x00401356 :    call   0x4019b0                  //1
0x0040135b :    mov    DWORD PTR [esp+0xc],0x0   //2
0x00401363 :    mov    DWORD PTR [esp+0x8],0x0
0x0040136b :    mov    eax,DWORD PTR [esp+0xc]   //3
0x0040136f :    mov    DWORD PTR [esp+0x4],eax
0x00401373 :    mov    eax,DWORD PTR [esp+0x4]
0x00401377 :    inc    eax                       //4
0x00401378 :    mov    DWORD PTR [esp+0x8],eax   //5
0x0040137c :    mov    eax,DWORD PTR [esp+0x8]
0x00401380 :    leave                            //6
0x00401381 :    ret

As before, the code up to //1 is to do with getting main set up properly, and that at //2 with creating and initialising the x and y variables. Now at //3, instead of setting up for a function call, the compiler sets up to increment the x value (i.e. to add 1 to it) and actually performs the increment at //4. Then the new value is stored in the y variable at //5 and main is exited, as before, at //6. Notice that there is no function call to add1 – instead the code that add1 would have performed has been placed in the body of main.
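
As an aside, one of the “clever ways to defeat the optimiser” mentioned earlier can be sketched with volatile. This is only an illustration, not what was used to produce the listings in this article: the function name compute is made up for the example (it stands in for the body of main), and the code you actually get will depend on your compiler and flags. Because x is read through a volatile variable, the optimiser is not allowed to assume its value, so it cannot fold the result into a constant.

```cpp
// Sketch only: 'compute' is a hypothetical stand-in for the body of main.
inline int add1(int n) {
    return n + 1;
}

int compute() {
    volatile int x = 0;  // volatile: the read of x must really happen at run time
    int y = add1(x);     // the optimiser does not know x's value, so it cannot fold this to a constant
    return y;
}
```

If you build something like this with g++ -O2 and disassemble compute, you would expect to see the addition performed in place with no call instruction, though the exact instructions will vary.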

Discussion

Will this speed things up? Well, that depends. Notice that the code for main with inlining is actually longer, by one instruction, than that with the function call. That will be insignificant in a small program like this, but can cause problems in larger programs, where if many functions are inlined the size of the executable can increase. This means that the code might no longer fit into the CPU cache, and thus inlining could actually slow things down! However, inlining does not always increase code size – depending on the CPU architecture (specifically the number of instructions needed to make a function call) and what the inlined function actually does, inlining can reduce code size.

The bottom line is that when it comes to performance, if you think inlining will help you, you should test it with your actual code. As you can see from the above code, it’s relatively easy to switch inlining on and off, and you should do performance profiling of your performance-sensitive code both with and without it – the results may surprise you.
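
To give a flavour of what such a comparison might look like, here is a rough sketch. It assumes GCC or Clang (the noinline and always_inline attributes are compiler-specific) and a C++11 compiler for the chrono header. All of the names (add1_call, add1_inlined, run_with_call, run_inlined, time_us, compare) are invented for this example. It is no substitute for profiling your real code, and a loop this trivial may well be transformed by the optimiser in ways that make the numbers meaningless.

```cpp
#include <cassert>
#include <chrono>

// GCC/Clang-specific attributes, used here only to pin the inlining decision.
__attribute__((noinline)) int add1_call(int n) { return n + 1; }
inline __attribute__((always_inline)) int add1_inlined(int n) { return n + 1; }

// Each loop starts from a volatile read so the starting value is not a compile-time constant.
int run_with_call(int iterations) {
    volatile int start = 0;
    int acc = start;
    for (int i = 0; i < iterations; ++i) acc = add1_call(acc);
    return acc;
}

int run_inlined(int iterations) {
    volatile int start = 0;
    int acc = start;
    for (int i = 0; i < iterations; ++i) acc = add1_inlined(acc);
    return acc;
}

// Times one run in microseconds; 'result' receives the computed value so it is not discarded.
long long time_us(int (*run)(int), int iterations, int& result) {
    auto t0 = std::chrono::steady_clock::now();
    result = run(iterations);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

// Runs both versions and reports their timings through the two reference parameters.
void compare(int iterations, long long& call_us, long long& inline_us) {
    int a = 0, b = 0;
    call_us = time_us(run_with_call, iterations, a);
    inline_us = time_us(run_inlined, iterations, b);
    assert(a == b);  // both versions must compute the same value
}
```

Calling compare from main with a large iteration count and printing the two timings, at several optimisation levels, is one way to see how much (or how little) difference inlining makes on your machine.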

What Cannot Be Inlined?

Not all functions can be inlined. The compiler may (or may not) flat-out refuse to inline the following kinds of functions:

  • Functions whose address is taken and stored in a function pointer. To do this a function must have an address, so an ordinary out-of-line copy has to exist, and calls made through the pointer generally cannot be inlined.
  • Recursive functions generally, although certain specific functions may be inlineable.
  • Functions in another translation unit. To perform its inlining magic, the compiler typically has to see the body of the function at the same time as it is compiling the calls to that function.
  • Functions over a certain “complexity”. The compiler typically has a concept of function complexity – if a function is too complicated, it won’t be inlined.
  • Any function the compiler doesn’t fancy inlining – remember inline is only a hint! The compiler can always ignore it.
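
The first two cases in the list above can be illustrated with a short sketch. The function names direct, apply and factorial are made up for this example, and whether a particular compiler inlines any of these calls still depends on the compiler and its settings.

```cpp
inline int add1(int n) { return n + 1; }

// Calls add1 directly: the compiler can see the target and may inline it.
int direct(int value) { return add1(value); }

// Calls through a pointer: in general the target is only known at run time,
// so an out-of-line copy of the pointed-to function has to exist.
int apply(int (*fn)(int), int value) { return fn(value); }

// Recursive: the body cannot simply be pasted into itself without end.
int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }
```

Passing add1 to apply, as in apply(add1, 1), is what forces add1 to have an address, even though the direct call in direct remains a perfectly good inlining candidate.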

Inlining In C++

Note that using the inline keyword is not the only way that a function can be declared as inline. In C++, any member function defined inside its class definition is an inline candidate. So in a class like this:

class A {
    public:
    int f() {
        return 1;
    }
};

then f() is being implicitly declared inline. Of course, whether any given member function actually is inlined is subject to the constraints I just described.
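
To make the distinction concrete, here is a small sketch using a made-up Counter class (not from the example above). A member function defined in the class body is implicitly inline, whereas one defined outside the class body is not, and needs the keyword spelled out if its definition lives in a header.

```cpp
// counter.h (hypothetical header, for illustration only)
class Counter {
public:
    Counter() : count_(0) {}
    int next() { return ++count_; }  // defined in the class body: implicitly inline
    int current() const;             // declared only; defined below
private:
    int count_;
};

// Defined outside the class body, so not implicitly inline. The explicit
// inline keyword lets this definition live in a header included by several .cpp files.
inline int Counter::current() const { return count_; }
```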

Lastly, a common use of inline in C++ is to prevent multiple definition errors. If we have a function which for the sake of convenience we have placed in a header file, like this:

// dice.h
#ifndef DICE_H
#define DICE_H
#include <cstdlib>
int roll() {
    return (rand() % 6) + 1;
}
#endif

which we include in several translation units (i.e. in several .cpp files), the linker will give us “multiple definition of roll” errors. We can prevent that by declaring the function inline:

// dice.h
#ifndef DICE_H
#define DICE_H
#include <cstdlib>
inline int roll() {
    return (rand() % 6) + 1;
}
#endif

This works because the C++ standard allows an inline function to be defined in more than one translation unit, provided that every definition consists of the same sequence of tokens.

Summary

To summarise:

  • Inlining replaces a function call with the function body at the call site.
  • Whether or not to inline is up to the compiler.
  • Inlining may or may not boost performance – always measure your own code.
  • Some functions cannot be inlined.
  • In C++, member functions defined in the class body are implicitly inline, and inline is often used to prevent multiple definition errors.