Tech Notes: Return by value

December 15, 2011

So you've been writing some C after using Python or Go or Haskell or pretty much anything other than C and you're jealous of being able to return more than one thing from a function. How do you do it in C?

The standard way is to return one thing (perhaps the "primary" thing you're returning) as the return value, and then have the caller provide pointers for the rest as "output parameters". You could return everything by value in a struct, but it feels like all those copies might be bad, maybe?

Let's check. To reduce inlining confusion, let's split it across multiple object files. So here's the interface, lib.h:

typedef struct {
  int a;
  int b;
} Pair;

Pair return_pair(void);
void fill_pair(Pair* p);
void fill_ints(int* a, int* b);

And the trivial implementation:

#include "lib.h"
Pair return_pair(void) {
  Pair p = { 3, 5 };
  return p;
}
void fill_pair(Pair* p) {
  p->a = 3;
  p->b = 5;
}
void fill_ints(int* a, int* b) {
  *a = 3;
  *b = 5;
}

And finally here's main to run it, including calls to the functions so we can see what work the caller must do.

#include "lib.h"
int main(int argc, char** argv) {
  Pair p1 = return_pair();
  Pair p2;
  fill_pair(&p2);
  int a, b;
  fill_ints(&a, &b);
  return p1.a + p2.a + a;
}

And now to a disassembler.

Starting with the last function, fill_ints(). Passing in two pointers means that two registers get addresses put into them:

   0x00000000004004e7 <+23>:	lea    0x8(%rsp),%rsi
   0x00000000004004ec <+28>:	lea    0xc(%rsp),%rdi
   0x00000000004004f1 <+33>:	callq  0x400530 <fill_ints>

and the implementation of fill_ints() fills in the pointees. Pretty much what you'd expect.

Dump of assembler code for function fill_ints:
   0x0000000000400530 <+0>:	movl   $0x3,(%rdi)
   0x0000000000400536 <+6>:	movl   $0x5,(%rsi)
   0x000000000040053c <+12>:	retq

The fill_pair implementation is similar, but with just one pointer and two offsets.

return_pair is quite different:

Dump of assembler code for function return_pair:
   0x0000000000400510 <+0>:	movabs $0x500000003,%rax
   0x000000000040051a <+10>:	retq

Because two ints fit in a 64-bit register, the whole function can be implemented with one immediate load and no memory accesses!

But surely, you say, that's just because your Pair type is simple. How about pointers? If the second field were a pointer, it wouldn't fit into a single register.

Here's what a pair of an int and a pointer compiles down to:

Dump of assembler code for function return_pair2:
   0x0000000000400510 <+0>:	mov    $0x40062c,%edx
   0x0000000000400515 <+5>:	mov    $0x3,%eax
   0x000000000040051a <+10>:	retq

Again no memory references, just registers.

Ok, how about something that can't fit in multiple registers? Like say a buffer.

typedef struct {
  int a;
  int big[1024];
} Pair;

Pair return_pair3(void);

and the associated code:

Pair return_pair3(void) {
  Pair p;
  p.a = 3;
  p.big[0] = 5;
  return p;
}

Here's the dump:

Dump of assembler code for function return_pair3:
   0x0000000000400510 <+0>:	sub    $0xfa0,%rsp
   0x0000000000400517 <+7>:	mov    %rdi,%rax
   0x000000000040051a <+10>:	movl   $0x3,(%rdi)
   0x0000000000400520 <+16>:	movl   $0x5,0x4(%rdi)
   0x0000000000400527 <+23>:	add    $0xfa0,%rsp
   0x000000000040052e <+30>:	retq

To "return" a large structure, the caller provides stack space for it and the function fills in the caller's copy — sorta like the return value optimization. This code is the same as the code that explicitly passes a pointer. (I don't get why this function adjusts %rsp, it seems like it doesn't even use it...)

In each of these cases, returning by value seems to be equal to or better than the pointer-based approaches in terms of generated code. So why not do it?

Here are some reasons. (Note that I'm avoiding C++ here, which has its own additional complicated rules, as described in the Wikipedia article on the return value optimization mentioned above.)

Most importantly, you need to create a new tuple type whenever you want to pass more than one value around. It is inconvenient, especially when the caller already has a variable handy for the value it wants to get back from the function and could just pass its address.
Returning structures via registers appears to be a newer ABI; gcc has -fpcc-struct-return and -freg-struct-return to select between them. But my system appears to be built with return-via-registers on (it appears that it was introduced into gcc around the year 2000), and even when I manually select returning via memory, it just means return_pair and return_pair2 decompose into the behavior of return_pair3.
If your structure contains any character buffer, the function gains a bunch of checking code due to -fstack-protector, removing the benefit.

For larger structures, you may have to worry about stack space. But such things don't belong on the stack in the first place; you are working with pointers to them to start with so functions that fill in those pointers are more convenient anyway.
(This point and the following were contributed by Jeffrey Yasskin after the post was first published.) If you return several different variables, depending on conditions inside the function, NRVO (the named return value optimization) doesn't kick in. This is often a missed optimization in the compiler, but we still have to deal with it.
If the return value owns some allocated space, you can often save allocation time by passing in a variable that already has the space allocated.
Insert your reason here. What else am I missing?