
Lambdas and std::function

Lambdas

A lambda is an anonymous function object:

int i = 0, j = 1;
auto func = [i, &j](bool b, float f){++j; std::cout << i << ", " << b << ", " << f << std::endl; };

func(true, 1.0f); // 0, 1, 1

There are three parts to a lambda:

  • capture list [] of variables to be used
  • arguments () passed in at execution time
  • function logic {}

Fun fact: [](){}(); is a useless but valid expression that declares an empty lambda with 0 captured variables and 0 arguments and immediately executes it.

What do we mean by anonymous function object? Every lambda expression generates a unique, unnamed class at compile time, so lambdas behave like functors: objects that can be called like a function because they overload operator(). This is an example of a zero-overhead abstraction -- you pay nothing extra for using this higher-level syntactic sugar, because it compiles down to the same machine code as the hand-written equivalent:

// what you write
auto add = [](int x, int y) { return x + y; };
add(1, 2);

// roughly what the compiler generates (the real class has no usable name)
struct __lambda {
    int operator()(int x, int y) const { return x + y; }
};

__lambda add;
add(1, 2);

By the way, you'll often see the word "closure" used, especially around lambdas. The closure is the actual object the compiler creates when you write a lambda. Here, the lambda add leads to the compiler generating something like the struct __lambda. The closure is the struct instance add, and the closure type is __lambda.
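As a quick sanity check on the "unique, unnamed class" idea, here's a small sketch. The names Adder, closure_types_are_distinct and lambda_matches_functor are made up for illustration. It shows that two textually identical lambdas still get different closure types, and that a closure is called exactly like a hand-written functor:

```cpp
#include <cassert>
#include <type_traits>

// Hand-written functor, roughly what the compiler generates for the lambdas below.
struct Adder {
    int operator()(int x, int y) const { return x + y; }
};

// True if two textually identical lambdas have different closure types.
inline bool closure_types_are_distinct() {
    auto add_a = [](int x, int y) { return x + y; };
    auto add_b = [](int x, int y) { return x + y; };
    return !std::is_same_v<decltype(add_a), decltype(add_b)> &&
           add_a(1, 2) == add_b(1, 2);
}

// The closure and the functor are called the same way and give the same result.
inline bool lambda_matches_functor() {
    auto add = [](int x, int y) { return x + y; };
    Adder functor;
    return add(1, 2) == functor(1, 2) && add.operator()(3, 4) == 7;
}
```

Both helpers return true: the types differ even though the bodies match, and add(1, 2) is just shorthand for add.operator()(1, 2).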

Captures and mutable

The arguments and function logic are perhaps self-explanatory, but the capture list is a little more interesting. The captures become data members in the unnamed class that is generated from every lambda. This means lambdas don't share a single size: each closure is as large as its captures require. As you might expect, there are three ways variables can be captured:

  • capture-by-value: variables are stored as copies in the closure object
  • capture-by-reference: variables are stored as reference members (here there's a dangling reference risk!)
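  • capture-by-move: the variable is moved into a new data member of the closure, using init-capture syntax such as [p = std::move(ptr)] (C++14 and later)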
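To make the "captures become data members" point concrete, here's a sketch that puts a lambda next to a hand-written struct with the same members. HandWrittenClosure and both helper functions are illustrative names; the real closure type is unnamed and its exact layout is up to the compiler.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hand-written stand-in for the closure type of
//   [i, &j, p = std::move(owned)]() { return i + j + *p; }
struct HandWrittenClosure {
    int i;                   // capture-by-value: a copy
    int& j;                  // capture-by-reference: an alias, which can dangle
    std::unique_ptr<int> p;  // capture-by-move: a new member that owns the object

    int operator()() const { return i + j + *p; }
};

inline int lambda_version() {
    int i = 1, j = 2;
    auto owned = std::make_unique<int>(40);
    auto f = [i, &j, p = std::move(owned)]() { return i + j + *p; };
    assert(owned == nullptr);  // ownership now lives inside the closure
    j = 3;                     // visible through the reference capture
    return f();                // 1 + 3 + 40
}

inline int struct_version() {
    int i = 1, j = 2;
    auto owned = std::make_unique<int>(40);
    HandWrittenClosure f{i, j, std::move(owned)};
    j = 3;
    return f();                // 1 + 3 + 40
}
```

Both versions return 44. Note that the unique_ptr member makes the closure move-only, just like the struct.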

There's specific syntax for some common kinds of captures:

  • [&](){ i=0; j=0; } captures all variables in use by reference
  • [=](){ return k; } captures all variables in use by value
  • [&, i, j](){} captures all vars by ref except i and j by value
  • [=, &i, &j](){} captures all vars by value except i and j by ref
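Here's a small sketch of how the default capture modes differ in practice. The function name default_capture_demo is made up for illustration. By-value captures take a snapshot when the lambda is created, while by-reference captures see later changes to the original variables:

```cpp
#include <array>
#include <cassert>

inline std::array<int, 3> default_capture_demo() {
    int i = 1, j = 1;

    auto all_by_value = [=]() { return i + j; };      // copies i and j now
    auto all_by_ref   = [&]() { return i + j; };      // refers to the live i and j
    auto mixed        = [=, &j]() { return i + j; };  // copy of i, reference to j

    i = 10;
    j = 20;

    // all_by_value still sees 1 + 1, all_by_ref sees 10 + 20, mixed sees 1 + 20.
    return {all_by_value(), all_by_ref(), mixed()};
}
```

The result is {2, 30, 21}.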

Now, by default operator() is const, meaning it can't modify variables captured by value. Even though the lambda owns its copy (or moved-in object), those member variables can't be mutated. This is a safety choice. Reference captures are different: the reference itself can't be reseated, but the object it refers to can still be modified through it, so be careful here as well. In cases where you do actually intend to modify the copy (or moved-in object), you explicitly specify the mutable keyword:

auto f = [x]() mutable {
	x = 20; // modifies the closure's own copy, not the original x
};

Needing to use mutable on a lambda is relatively rare in practice, but useful for things like accumulating state across calls, consuming a move-only resource (then you could only meaningfully call the lambda once), or lazy initialization like the following:

auto get = [cache = std::optional<Result>{}]() mutable -> Result {
	if (!cache) cache = expensive_compute(); // computed on the first call only
	return *cache;
};

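To be clear, the lambda's cache is specific to that closure object: copying the lambda copies the cache, and each copy then evolves independently. This differs from a static cache inside a named function or struct, which is shared by every caller.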

If you find a need for mutable often, that could be a code smell. Consider using a named struct or class for long-lived objects. Mutable lambdas are better for short-lived, localized stateful callbacks.
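As a sketch of the "accumulating state across calls" case, and of the point that state belongs to each closure object, here's a mutable counter. The function name mutable_counter_demo is made up for illustration.

```cpp
#include <cassert>
#include <utility>

inline std::pair<int, int> mutable_counter_demo() {
    // `count` is an init-capture: a member of the closure, starting at 0.
    auto counter = [count = 0]() mutable { return ++count; };

    counter();                 // count is now 1
    counter();                 // count is now 2
    auto copy = counter;       // copies the closure, including count == 2
    copy();                    // the copy's count becomes 3

    int original = counter();  // the original's count becomes 3, independently
    int copied = copy();       // the copy's count becomes 4
    return {original, copied};
}
```

The result is {3, 4}: after the copy, the two closures count separately.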

Lambda decay to function pointer, and unary trick

Lambdas that don't have captures have no state, so they can be represented as a plain function. For these lambdas, the compiler also generates an implicit conversion operator to a function pointer. This is what is meant by "lambda decay to function pointer".

struct __lambda {
    bool operator()(int a, int b) const { return a > b; }
    // compiler also generates something like this:
    using FnPtr = bool(*)(int, int);
    static bool invoke(int a, int b) { return a > b; } // a static forwarder
    operator FnPtr() const { return &invoke; }
};

This implicitly gets used when the context requires a function pointer, e.g.

auto f = [](int a, int b) { return a > b; };
bool (*fp)(int, int) = f;  // conversion operator called here  

This is useful for compatibility with C-style function pointers. In general, the conversion happens automatically,

void set_callback(void (*cb)(int));
set_callback([](int x) {});  // implicit decay, no forcing needed 

but there are some ambiguous cases when you'll need to force the conversion.

  1. overloaded functions with ambiguity (here, a generic lambda that could convert to either pointer type)
    void foo(void (*)(int));
    void foo(void (*)(double));
    foo([](auto x) {});  // ERROR: which overload? (a non-generic [](int x) {} converts only to void (*)(int), so it would pick the first)
  2. template type deduction (deduction doesn't consider implicit conversions)
    template <typename T>
    void foo(T* fn); // expects a pointer
    
    foo([](int x) {}); // ERROR: T can't be deduced -- lambda is not a pointer
  3. auto in a pointer context
    void (*fp)(int) = [](int x) {}; // implicit decay works here -- target type is known
    auto f = [](int x) {}; // auto deduces the closure type, not a pointer

How do you force a conversion? You could use a static_cast:

auto func = static_cast<int(*)(int)> ([](int x) { return x * x; });

Or you could use the unary plus trick to trigger the implicit conversion operator without a verbose cast. This works because unary operator+ isn't defined for the lambda's closure type, but the built-in unary + is defined for pointers. The compiler looks for a conversion that makes + applicable, finds the implicit conversion operator to a function pointer, applies it, and then applies unary + to the resulting pointer (which leaves it unchanged). The type of decayed_ptr below is int(*)(int). The * is the pointer declarator, and as a sanity check, the parens are needed for grouping: int* (int) would mean a function taking int and returning int*.

auto decayed_ptr = +[](int x) { return x * x; };
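To check that both ways of forcing the conversion really produce a plain function pointer, here's a small sketch. takes_pointer and forced_conversion_demo are hypothetical names, and this assumes a C++17 compiler.

```cpp
#include <cassert>
#include <type_traits>

// Deduces T from a pointer argument; a closure object on its own won't match.
template <typename T>
constexpr bool takes_pointer(T*) { return true; }

inline int forced_conversion_demo() {
    auto square = [](int x) { return x * x; };

    // takes_pointer(square);  // error: T can't be deduced from a closure type
    auto via_plus = +square;                            // unary plus
    auto via_cast = static_cast<int (*)(int)>(square);  // explicit cast

    static_assert(std::is_same_v<decltype(via_plus), int (*)(int)>);
    static_assert(std::is_same_v<decltype(via_cast), int (*)(int)>);
    static_assert(!std::is_same_v<decltype(square), int (*)(int)>);

    bool deduced = takes_pointer(+square);  // T is deduced as int(int)
    return deduced ? via_plus(3) + via_cast(4) : -1;  // 9 + 16
}
```

The function returns 25, and the static_asserts confirm that only the converted values have type int(*)(int).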

But calls through a function pointer are hard to inline: unless the compiler can prove which function the pointer targets, it has to emit an indirect call at runtime. On the other hand, a lambda passed to e.g. std::sort can be fully inlined because the closure type is unique and operator() is statically known.

std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
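For contrast, here's a sketch of the same descending sort done both ways. The two function names are made up for illustration. std::qsort only accepts a C-style function pointer, so the captureless lambda is converted and called indirectly, while std::sort is instantiated on the closure type itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C API route: the comparator is passed as int (*)(const void*, const void*).
inline std::vector<int> sort_descending_qsort(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int),
               [](const void* a, const void* b) {
                   int lhs = *static_cast<const int*>(a);
                   int rhs = *static_cast<const int*>(b);
                   return (lhs < rhs) - (lhs > rhs);  // negative when lhs sorts first
               });
    return v;
}

// Template route: the comparator's type is part of the std::sort instantiation.
inline std::vector<int> sort_descending_std(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
    return v;
}
```

Both return the same ordering for the same input. The difference is only in how the comparator gets called, and whether the optimizer can see through that call depends on the compiler.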

When we pass callables in templates, we let the compiler see through the callable and optimize. This is why std algorithms are all templated -- our std::sort example above instantiates with the concrete lambda type, which lets the compiler inline the body into the loop and can enable vectorization. We also might want to use templates to customize behavior of a class or algorithm at compile time:

template <typename Comparator>
class PriorityQueue {
	Comparator cmp;
public:
	void push(int x) { /* uses cmp */ }
};

// a lambda inside decltype requires C++20
PriorityQueue<decltype([](int a, int b) { return a > b; })> pq;

In this example the comparator is baked into the type, so choosing it costs nothing at runtime.
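Here's a fuller sketch of that idea that also compiles as C++17, where the lambda is named first and its type is taken with decltype. The heap-based implementation and the pop/empty members are assumptions added for illustration, not part of the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

template <typename Comparator>
class PriorityQueue {
    std::vector<int> heap_;
    Comparator cmp_;

public:
    explicit PriorityQueue(Comparator cmp) : cmp_(std::move(cmp)) {}

    void push(int x) {
        heap_.push_back(x);
        std::push_heap(heap_.begin(), heap_.end(), cmp_);
    }

    int pop() {
        assert(!heap_.empty());  // precondition: the queue is not empty
        std::pop_heap(heap_.begin(), heap_.end(), cmp_);
        int top = heap_.back();
        heap_.pop_back();
        return top;
    }

    bool empty() const { return heap_.empty(); }
};

inline std::vector<int> drain_smallest_first() {
    auto greater = [](int a, int b) { return a > b; };  // gives min-heap ordering
    PriorityQueue<decltype(greater)> pq(greater);
    for (int x : {3, 1, 2}) pq.push(x);

    std::vector<int> out;
    while (!pq.empty()) out.push_back(pq.pop());
    return out;
}
```

With the a > b comparator the smallest element sits on top of the heap, so the queue drains as 1, 2, 3.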

In some cases we can't or don't want to use templates: when you want to store the callable in a class that isn't itself a template, when you want different callables chosen at runtime (runtime polymorphism), or across compilation boundaries where you don't want to expose the implementation in headers. How can we still pass in a callable?

Type erasure via std::function

When you want the ability to store a callback as a class member, or flexibility for runtime polymorphism, or cleaner separations across API and TU boundaries, you can use std::function, which is basically a generalized box for callable things.

struct Button {
	std::function<void()> on_click;
};

Button b;
b.on_click = []() { printf("clicked\n"); };
b.on_click = []() { printf("Something else\n"); }; //reassignable at runtime
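One detail worth sketching: a default-constructed std::function is empty, and calling it throws std::bad_function_call. The click() helper and button_demo below are hypothetical additions that guard against that.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct Button {
    std::function<void()> on_click;

    // Hypothetical helper: invoke the handler only if one is set.
    bool click() const {
        if (!on_click) return false;  // empty std::function converts to false
        on_click();
        return true;
    }
};

inline std::string button_demo() {
    std::string log;
    Button b;
    assert(!b.click());  // no handler yet, so nothing is called

    b.on_click = [&log]() { log += "A"; };
    b.click();
    b.on_click = [&log]() { log += "B"; };  // different closure type, same member
    b.click();
    return log;
}
```

button_demo returns "AB". The two lambdas have different closure types, but both fit the single std::function<void()> member.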

The class member's type must be fixed at class definition time, so unless you make the whole class a template, the member can't take on each lambda's unique type. std::function gives you one general type that can hold any matching callable. Similarly, callbacks that vary at runtime can't be expressed with compile-time templates, but can use std::function. And here's an example of how to use std::function to compartmentalize implementation details:

void register_handler(std::function<void(Event)> handler); // lives in a header

If this were a template, the whole implementation would have to live in the header, and every TU would get its own instantiation. With std::function, we can put the implementation in one .cpp file and keep a stable ABI -- if you ship a compiled library, callers compiled against an old version of the header should still work with a new version of the library binary without recompilation. With templates, every caller would have to recompile.
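Here's a single-file sketch of that split. The Event struct, the dispatch function, the handlers() storage and the header name are all assumptions for illustration; only register_handler comes from the text above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// ---- would live in a header (say events.h, a hypothetical name) ----
struct Event { int id; };
void register_handler(std::function<void(Event)> handler);
void dispatch(Event e);

// ---- would live in one .cpp file; callers never see this part ----
namespace {
std::vector<std::function<void(Event)>>& handlers() {
    static std::vector<std::function<void(Event)>> instance;
    return instance;
}
}  // namespace

void register_handler(std::function<void(Event)> handler) {
    handlers().push_back(std::move(handler));
}

void dispatch(Event e) {
    for (auto& h : handlers()) h(e);
}

// ---- caller code, compiled only against the declarations ----
inline int caller_demo() {
    auto sum = std::make_shared<int>(0);  // shared, so stored handlers never dangle
    register_handler([sum](Event e) { *sum += e.id; });
    register_handler([sum](Event e) { *sum += 10 * e.id; });
    dispatch(Event{2});
    return *sum;  // 2 + 20
}
```

The storage and dispatch loop can change without touching the header. The handlers capture a shared_ptr rather than a reference because they outlive the function that registered them.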

Another example worth mentioning is when you want to store heterogeneous callables in a container. Since std::vector needs a single element type, and each lambda has a unique type, you can't make a vector of raw lambdas.

std::vector<std::function<void(int)>> callbacks;
callbacks.push_back([](int x) { printf("%d\n", x); });
callbacks.push_back([](int x) { log(x); });
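The container isn't limited to lambdas either. As a sketch (twice, AddOffset and run_all are made-up names, and the signature is int(int) instead of void(int) so the results can be checked), one vector can hold a free function, a hand-written functor, and both kinds of lambda:

```cpp
#include <cassert>
#include <functional>
#include <vector>

inline int twice(int x) { return 2 * x; }  // free function

struct AddOffset {                         // hand-written functor
    int offset;
    int operator()(int x) const { return x + offset; }
};

inline std::vector<int> run_all(int input) {
    int scale = 3;
    std::vector<std::function<int(int)>> callbacks;
    callbacks.push_back(twice);                                 // function pointer
    callbacks.push_back(AddOffset{10});                         // functor
    callbacks.push_back([](int x) { return x * x; });           // captureless lambda
    callbacks.push_back([scale](int x) { return x * scale; });  // capturing lambda

    std::vector<int> results;
    for (const auto& cb : callbacks) results.push_back(cb(input));
    return results;
}
```

run_all(4) gives {8, 14, 16, 12}: four different underlying types, one element type.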

To be clear, this fun flexibility does not come for free! Under the hood, std::function stores the callable either inline (small buffer optimization, for small callables) or on the heap (for larger ones), and also stores a table of function pointers telling std::function how to operate on the object: a function pointer that knows how to invoke the underlying callable, and function pointers to copy or destroy it.

// std::function internally holds something like this
struct vtable {
	void (*invoke)(void* storage, args...);  // how to call it
	void (*copy)(void* dst, void* src);       // how to copy it
	void (*destroy)(void* storage);           // how to destroy it
};  

So when you call a std::function, it makes one call into std::function::operator(), one indirect call through invoke, and then calls the underlying callable. Because the concrete type is erased, the compiler generally can't inline these calls, and you're going through a layer of indirection regardless of how simple the underlying callable is.
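To make the table-of-function-pointers idea concrete, here's a deliberately simplified sketch of a type-erased wrapper for int(int) callables. IntFunction and everything inside it are illustrative assumptions: it always heap-allocates, has no empty state and no small buffer, and it is not how any particular standard library implements std::function.

```cpp
#include <cassert>
#include <utility>

class IntFunction {
    struct VTable {
        int (*invoke)(void* storage, int arg);  // how to call it
        void* (*copy)(const void* src);         // how to copy it
        void (*destroy)(void* storage);         // how to destroy it
    };

    // One static table per stored type F; this is where the erased type is remembered.
    template <typename F>
    static const VTable* table_for() {
        static const VTable table{
            [](void* s, int arg) -> int { return (*static_cast<F*>(s))(arg); },
            [](const void* src) -> void* { return new F(*static_cast<const F*>(src)); },
            [](void* s) { delete static_cast<F*>(s); },
        };
        return &table;
    }

    const VTable* vt_;
    void* storage_;

public:
    template <typename F>
    IntFunction(F f) : vt_(table_for<F>()), storage_(new F(std::move(f))) {}

    IntFunction(const IntFunction& other)
        : vt_(other.vt_), storage_(other.vt_->copy(other.storage_)) {}

    IntFunction& operator=(const IntFunction& other) {
        if (this != &other) {
            void* fresh = other.vt_->copy(other.storage_);
            vt_->destroy(storage_);
            vt_ = other.vt_;
            storage_ = fresh;
        }
        return *this;
    }

    ~IntFunction() { vt_->destroy(storage_); }

    // operator() -> indirect call through the table -> the stored callable.
    int operator()(int arg) const { return vt_->invoke(storage_, arg); }
};

inline int erased_demo() {
    int offset = 5;
    IntFunction f = [offset](int x) { return x + offset; };
    IntFunction g = f;                // copies the callable via the table
    f = [](int x) { return x * 2; };  // different closure type, same IntFunction type
    return f(10) + g(10);             // 20 + 15
}
```

erased_demo returns 35. Every call goes through vt_->invoke, which is the indirection described above.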

Small buffer/function optimization (SBO/SFO)

Heap allocation is expensive: it goes to the allocator and might cause cache misses. The allocated memory also probably lives far away from the std::function object itself. So the idea of small buffer optimization is to have a small fixed-size buffer directly inside the std::function object and store small callables there. Otherwise it heap allocates and heap_ptr points to the allocation:

// conceptually what std::function might look like internally
struct function {
	vtable* vt;
	union {
		void* heap_ptr;                   // pointer to heap if callable is large
		char  buffer[sizeof(void*) * 3];  // inline storage if callable is small:
		                                  // can hold 3 pointers worth of bytes
	} storage;
};

The size of the buffer is up to the standard library implementation, but it's commonly something like 16-32 bytes. So for instance, a lambda capturing a few ints probably fits, but a lambda capturing a std::vector by value may not. Note that SBO doesn't eliminate the vtable indirection cost! It just removes the heap allocation cost. SBO also makes the std::function object itself larger. And to be clear, SBO isn't just an optimization for std::function and callables -- the same kind of buffer optimization is used for strings and other types as well.
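The inline-or-heap decision can be sketched as a compile-time check. Everything here is an assumption for illustration: kBufferSize mirrors the three-pointer buffer in the sketch above, and fits_inline is a hypothetical trait. Real implementations pick their own sizes and conditions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical inline buffer: three pointers' worth of bytes.
inline constexpr std::size_t kBufferSize = 3 * sizeof(void*);

// A callable can live in the buffer if it fits, is not over-aligned, and can be
// moved without throwing (so moving the wrapper itself can stay noexcept).
template <typename F>
inline constexpr bool fits_inline =
    sizeof(F) <= kBufferSize &&
    alignof(F) <= alignof(std::max_align_t) &&
    std::is_nothrow_move_constructible_v<F>;

inline bool small_capture_fits() {
    int a = 1, b = 2;
    auto small = [a, b](int x) { return a + b + x; };  // closure holds two ints
    return fits_inline<decltype(small)> && small(0) == 3;
}

inline bool large_capture_fits() {
    std::array<char, 256> big{};
    auto large = [big](int x) { return big[0] + x; };  // closure holds 256 bytes
    return fits_inline<decltype(large)> && large(0) == 0;
}
```

Under these assumptions the two-int closure fits and the 256-byte one doesn't, so only the second would pay for a heap allocation.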

Performance tradeoffs

Let's recap when to use what. Lambdas allow for inlining, and the compiler is free to optimize across the call. There's no indirection, so they're the fastest kind of callable. Function pointers generally don't allow inlining, but they're faster than std::function due to no vtable, no heap, and no type erasure overhead. std::function generally doesn't allow for inlining, and depending on the size of the stored callable, may come with heap allocation costs. It also incurs two layers of indirection at call time.

You should use std::function for runtime polymorphism, storage, or separation of TUs and stable ABI boundaries. But if you can, you should use lambdas + templates, especially for performance-critical code or hot loops. Function pointers are useful mainly for C API compatibility.
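As a side-by-side sketch of the three options (the apply_* names and recap_demo are made up for illustration), here is the same call expressed through a template parameter, a function pointer, and a std::function:

```cpp
#include <cassert>
#include <functional>

// Template: the callable's concrete type is visible at the call site.
template <typename F>
int apply_template(F&& f, int x) { return f(x); }

// Function pointer: works for free functions and captureless lambdas only.
inline int apply_pointer(int (*f)(int), int x) { return f(x); }

// std::function: accepts any callable with a matching signature, via type erasure.
inline int apply_function(const std::function<int(int)>& f, int x) { return f(x); }

inline int recap_demo() {
    int offset = 1;
    auto stateless = [](int x) { return x * 2; };
    auto stateful = [offset](int x) { return x + offset; };

    int total = 0;
    total += apply_template(stateless, 10);  // 20
    total += apply_template(stateful, 10);   // 11
    total += apply_pointer(stateless, 10);   // 20, via implicit conversion
    // apply_pointer(stateful, 10);          // error: capturing lambdas don't convert
    total += apply_function(stateless, 10);  // 20
    total += apply_function(stateful, 10);   // 11
    return total;
}
```

recap_demo returns 82. All three give the same answers; what differs is what the callee knows about the callable, and therefore what the compiler can do with it.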

Sources

https://shaharmike.com/cpp/lambdas-and-functions/#capture-by-value-vs-by-reference

https://www.youtube.com/watch?v=aC-aAiS5Wuc

https://www.youtube.com/watch?v=qmd_yxSOsAE