Return by value

December 15, 2011

So you've been writing some C after using Python or Go or Haskell or pretty much anything other than C and you're jealous of being able to return more than one thing from a function. How do you do it in C?

The standard way is to return one thing (perhaps the "primary" thing you're returning) as the return value, and then have the caller provide pointers for the rest as "output parameters". You could return everything by value in a struct, but it feels like all those copies might be bad, maybe?

Let's check. To reduce inlining confusion, let's split it across multiple object files. So here's the interface, lib.h:

typedef struct {
  int a;
  int b;
} Pair;

Pair return_pair(void);
void fill_pair(Pair* p);
void fill_ints(int* a, int* b);

And the trivial implementation:

#include "lib.h"
Pair return_pair(void) {
  Pair p = { 3, 5 };
  return p;
}

void fill_pair(Pair* p) {
  p->a = 3;
  p->b = 5;
}

void fill_ints(int* a, int* b) {
  *a = 3;
  *b = 5;
}

And finally here's main to run it, including calls to the functions so we can see what work the caller must do.

#include "lib.h"
int main(int argc, char** argv) {
  Pair p1 = return_pair();
  Pair p2;
  fill_pair(&p2);
  int a, b;
  fill_ints(&a, &b);
  return p1.a + p2.a + a;
}

And now to a disassembler.

Let's start with the last function, fill_ints(). Passing in two pointers means the caller loads two addresses into registers:

   0x00000000004004e7 <+23>:	lea    0x8(%rsp),%rsi
   0x00000000004004ec <+28>:	lea    0xc(%rsp),%rdi
   0x00000000004004f1 <+33>:	callq  0x400530 <fill_ints>

and the implementation of fill_ints() fills in the pointees. Pretty much what you'd expect.

Dump of assembler code for function fill_ints:
   0x0000000000400530 <+0>:	movl   $0x3,(%rdi)
   0x0000000000400536 <+6>:	movl   $0x5,(%rsi)
   0x000000000040053c <+12>:	retq

The fill_pair implementation is similar, but with just one pointer and two offsets.

return_pair is quite different:

Dump of assembler code for function return_pair:
   0x0000000000400510 <+0>:	movabs $0x500000003,%rax
   0x000000000040051a <+10>:	retq

Because two ints fit in a 64-bit register, the whole function can be implemented with one immediate load and no memory accesses!
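
To see where the constant 0x500000003 comes from, here's a small sketch that reinterprets a Pair's bytes as a single 64-bit integer. It assumes a little-endian target where int is 4 bytes and Pair has no padding, as on x86-64 Linux; pair_bits is a made-up helper, not part of the code above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  int a;
  int b;
} Pair;

/* Hypothetical helper: view the bytes of a Pair as one 64-bit integer.
   Assumes sizeof(Pair) == 8, i.e. 4-byte ints and no padding. */
uint64_t pair_bits(Pair p) {
  uint64_t bits;
  assert(sizeof p == sizeof bits);
  memcpy(&bits, &p, sizeof bits);
  return bits;
}
```

On such a target, pair_bits applied to { 3, 5 } gives 0x500000003: a sits in the low 32 bits and b in the high 32 bits, which matches the immediate that return_pair loads into %rax.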

But surely, you say, that's just because your Pair type is simple. How about pointers? If the second field were a pointer, it wouldn't fit into a single register.

Here's what a pair of an int and a pointer compiles down to:

Dump of assembler code for function return_pair2:
   0x0000000000400510 <+0>:	mov    $0x40062c,%edx
   0x0000000000400515 <+5>:	mov    $0x3,%eax
   0x000000000040051a <+10>:	retq

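Again no memory references, just registers: the int is returned in %eax and the pointer in %rdx (written through its lower half, %edx, since the address fits in 32 bits).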
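
The source for return_pair2 isn't shown here, so the following is only a guess at what it might look like: a struct holding an int and a pointer, with the pointer aimed at a string literal (which would explain the fixed address in the dump). The type name Pair2, the field names, and the literal are all assumptions.

```c
/* Hypothetical reconstruction; names and the string are made up. */
typedef struct {
  int a;
  const char* s;
} Pair2;

Pair2 return_pair2(void) {
  /* The literal lives in read-only data, so its address is a constant
     the compiler can load as an immediate. */
  Pair2 p = { 3, "hello" };
  return p;
}
```

With 8-byte pointers this struct is 16 bytes, small enough to be handed back in two registers rather than through memory.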

Ok, how about something that can't fit in multiple registers? Like say a buffer.

typedef struct {
  int a;
  int big[1024];
} Pair;

Pair return_pair3(void);

and the associated code:

Pair return_pair3(void) {
  Pair p;
  p.a = 3;
  p.big[0] = 5;
  return p;
}

Here's the dump:

Dump of assembler code for function return_pair3:
   0x0000000000400510 <+0>:	sub    $0xfa0,%rsp
   0x0000000000400517 <+7>:	mov    %rdi,%rax
   0x000000000040051a <+10>:	movl   $0x3,(%rdi)
   0x0000000000400520 <+16>:	movl   $0x5,0x4(%rdi)
   0x0000000000400527 <+23>:	add    $0xfa0,%rsp
   0x000000000040052e <+30>:	retq

To "return" a large structure, the caller provides stack space for it and the function fills in the caller's copy — sorta like the return value optimization. This code is the same as the code that explicitly passes a pointer. (I don't get why this function adjusts %rsp, it seems like it doesn't even use it...)
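
Written out by hand, the compiled return_pair3 behaves roughly like a function that takes the destination as a pointer and returns that same pointer, which is what the mov %rdi,%rax in the dump is doing. Here's a sketch of the two side by side; return_pair3_explicit is a made-up name for illustration.

```c
typedef struct {
  int a;
  int big[1024];
} Pair;

/* By-value version, as in the post. */
Pair return_pair3(void) {
  Pair p;
  p.a = 3;
  p.big[0] = 5;
  return p;
}

/* Hypothetical hand-written equivalent: the caller supplies the
   storage, and the function fills it in and hands the address back. */
Pair* return_pair3_explicit(Pair* out) {
  out->a = 3;
  out->big[0] = 5;
  return out;
}
```

Both leave the same values in the caller's struct; the by-value form just lets the compiler manage the pointer for you.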

In each of these cases, returning by value seems to be equal to or better than the pointer-based approaches in terms of generated code. So why not do it?

Here are some reasons. (Note that I'm avoiding C++ here, which has its own additional complicated rules, as described in the Wikipedia article on the return value optimization mentioned above.)