Rust trait object layout
Rust makes a distinction between values and references that you generally learn to work with while using the language. This week I learned an interesting new corner around how that distinction applies to trait objects, despite using the language for quite a long time. Maybe it will surprise you too!
Background: dynamically sized types
When you have some x: usize you can say x is the usize itself, while some
y: &usize = &x is a reference to that value. y's concrete value is a pointer.
Similarly, the type [u8; 40] means 40 bytes itself. If you put it in a struct
or pass it to a function, you're moving around 40 bytes.
Finally, the type [u8] means a series of bytes of (compile-time) unknown size.
You don't interact with these often because you can't usually put these in a
struct or pass them to a function because of their unknown size. Instead you
typically work with references to these as &[u8], which concretely are a
pointer and a length.
(cheats.rs has nice pictures of this.)
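To make the sizes above concrete, here is a minimal sketch that asks the compiler for them. The function names are made up for illustration; only std::mem::size_of comes from the standard library.

```rust
use std::mem::size_of;

/// Size in bytes of a thin reference such as `&usize`: a single pointer.
pub fn thin_ref_size() -> usize {
    size_of::<&usize>()
}

/// Size in bytes of a slice reference `&[u8]`: a pointer plus a length.
pub fn slice_ref_size() -> usize {
    size_of::<&[u8]>()
}

/// Size in bytes of the array type `[u8; 40]`: the 40 bytes themselves.
pub fn array_size() -> usize {
    size_of::<[u8; 40]>()
}
```

On a 64-bit target the thin reference is 8 bytes and the slice reference is 16, while the array is 40 bytes regardless of target.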
But you still do sometimes see [u8] as a type without a reference, in types
like Box<[u8]>. And further, you can wrap a dynamically sized type in a struct
as the last member, making that struct dynamically sized as well:
struct Test<X: ?Sized> {
    inner: X,
}
// the type Test<[u8]> is now also dynamically sized
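As a rough sketch of how you would actually get hold of a Test<[u8]>: you can borrow a sized Test<[u8; 4]> and let the reference coerce. The helper function names below are hypothetical; the struct is the one from above, with its field made public so the snippet stands alone.

```rust
use std::mem::{size_of, size_of_val};

pub struct Test<X: ?Sized> {
    pub inner: X,
}

/// Borrows a sized `Test<[u8; 4]>` as the dynamically sized `Test<[u8]>`
/// and returns the slice length carried by the wide reference.
pub fn unsized_len() -> usize {
    let sized: Test<[u8; 4]> = Test { inner: [1, 2, 3, 4] };
    let wide: &Test<[u8]> = &sized; // unsized coercion
    wide.inner.len()
}

/// Size in bytes of the value behind the wide reference, computed at run time.
pub fn unsized_value_size() -> usize {
    let sized: Test<[u8; 4]> = Test { inner: [1, 2, 3, 4] };
    let wide: &Test<[u8]> = &sized;
    size_of_val(wide)
}

/// Size in bytes of `&Test<[u8]>` itself: a pointer plus a length.
pub fn unsized_ref_size() -> usize {
    size_of::<&Test<[u8]>>()
}
```

The reference to the struct is laid out like a slice reference: the length travels with the pointer, not with the struct.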
Background: trait objects
The type File represents an open file. Concretely, it's a struct with some
private fields.
The trait Read represents things that can be read from, with a .read() method.
The trait implies a trait object type dyn Read, which is the type of things
that implement the trait. File implements Read, so I will use File and Read
for the following examples.
Concretely, the layout of a trait object dyn Read is the same as the
underlying value it's wrapping, e.g. a File (spec ref).
(This is possibly only useful to know as compiler trivia; even cheats.rs doesn't
document this!) Much like [u8], because their concrete size is not known at
compile time, you don't typically interact with these directly but instead via
references.
The type &dyn Read is a reference to a trait object. Concretely it's a pointer
to the object and a pointer to a static vtable of the methods for that type that
implement the trait.
(More pictures from cheats.rs.) Also like [u8], you might more often use
Box<dyn Read>, which holds ownership over the underlying Read-able type.
(It was useful for my understanding to contrast these with C++ objects and vtables. In C++, the vtable pointer is always embedded in the struct itself. In Rust, the struct never contains a vtable pointer, and instead the vtable pointer is part of the reference to the trait object, which has a separate pointer to the struct.)
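Here is a small sketch that checks those layout claims with size_of and size_of_val. To keep it runnable without touching the filesystem I'm assuming an in-memory Cursor<Vec<u8>> as a stand-in for File; the function names are invented for this example.

```rust
use std::io::{Cursor, Read};
use std::mem::{size_of, size_of_val};

/// Size of a reference to a trait object: data pointer plus vtable pointer.
pub fn dyn_ref_size() -> usize {
    size_of::<&dyn Read>()
}

/// Size of an owning box of a trait object: also a (pointer, vtable) pair.
pub fn dyn_box_size() -> usize {
    size_of::<Box<dyn Read>>()
}

/// Returns (size reported through the trait object, size of the concrete type).
pub fn dyn_value_sizes() -> (usize, usize) {
    // Cursor<Vec<u8>> stands in for File here (an assumption of this sketch).
    let cursor: Cursor<Vec<u8>> = Cursor::new(vec![1, 2, 3]);
    let r: &dyn Read = &cursor;
    (size_of_val(r), size_of::<Cursor<Vec<u8>>>())
}
```

The two numbers from dyn_value_sizes agree, which matches the idea that the dyn Read value is just the underlying value; everything dynamic lives in the reference.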
Background: coercion
Though it's relatively rare in Rust, there are a few places where one type will
silently convert to another. One you may have used without thinking is using a
&mut T in a place that needs a &T.
When using trait objects there is another coercion:
let f: File = ...;
let r: &dyn Read = &f;
Here, &f is &File, but the compiler converts it to &dyn Read.
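A runnable version of the same coercion might look like the sketch below. It assumes a Cursor<Vec<u8>> in place of File so it doesn't need a real file, and it uses &mut because the Read methods take &mut self; read_all and coerce_and_read are hypothetical names.

```rust
use std::io::{Cursor, Read};

/// Reads everything from any reader passed as a trait object reference.
pub fn read_all(r: &mut dyn Read) -> Vec<u8> {
    let mut out = Vec::new();
    r.read_to_end(&mut out)
        .expect("reading from memory should not fail");
    out
}

/// Passes a concrete reader where a trait object reference is expected.
pub fn coerce_and_read() -> Vec<u8> {
    let mut cursor = Cursor::new(b"hello".to_vec());
    // `&mut cursor` is `&mut Cursor<Vec<u8>>`; the compiler converts it
    // to `&mut dyn Read` at the call site.
    read_all(&mut cursor)
}
```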
Finally, the surprise
Did you know that there is another trait-related coercion involving generics? Consider:
let f: File = ...;
let b: BufReader<File> = BufReader::new(f);
let r: &BufReader<dyn Read> = &b; // !!! legal
Here, the File inside the BufReader<> was able to coerce to a trait object.
Concretely, r here is like a reference to a trait object, in that it is a
pair of a pointer to the BufReader along with a pointer to a Read vtable.
(Poking at the compiler, it uses the same Read vtable as you would get from a
plain File.)
The underlying spec reference ([coerce.unsized.composite]) for why this is
allowed is pretty involved! But at a hand-wavy level it's allowed because
BufReader<dyn Read> is a dynamically sized type where the last field is the
place where the dyn Read is used. Note that, for example, you cannot have a
similar coercion to two traits &SomeType<dyn A, dyn B> because the reference
can only carry a single vtable.
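To see the "last field" rule without relying on BufReader's private layout, here is a sketch with a hand-written struct. Wrapper and its fields are hypothetical stand-ins, and Cursor<Vec<u8>> again replaces File so the code runs without a file on disk.

```rust
use std::io::{Cursor, Read};
use std::mem::size_of;

/// Hypothetical stand-in for BufReader: the generic parameter is used
/// only in the last field, which is what lets the struct be unsized.
pub struct Wrapper<R: ?Sized> {
    pub label: u32,
    pub inner: R,
}

/// Coerces `&mut Wrapper<Cursor<..>>` to `&mut Wrapper<dyn Read>` and
/// reads through the unsized field.
pub fn read_through_wrapper() -> Vec<u8> {
    let mut w: Wrapper<Cursor<Vec<u8>>> = Wrapper {
        label: 1,
        inner: Cursor::new(b"abc".to_vec()),
    };
    let r: &mut Wrapper<dyn Read> = &mut w; // unsized coercion inside the struct
    let mut out = Vec::new();
    r.inner
        .read_to_end(&mut out)
        .expect("reading from memory should not fail");
    out
}

/// Size of `&Wrapper<dyn Read>`: pointer to the struct plus a Read vtable pointer.
pub fn wrapper_ref_size() -> usize {
    size_of::<&Wrapper<dyn Read>>()
}
```

The reference is two words wide, the same as a plain &dyn Read, which fits the single-vtable limitation described above.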
(How is this useful? It came up in some code I was reviewing from someone new to Rust; I'm not sure.)
(Bonus question: The above doesn't compile if you substitute Box or Rc for
BufReader. Why not? Something about the CoerceUnsized impls on those? I don't
know the answer.)
Unsized coercion
This does compile though:
let f: File = ...;
let b: Box<File> = Box::new(f);
let r: Box<dyn Read> = b; // consumes b
This is due to some magic traits implemented by Box (and also Rc, etc.).
I believe this behavior is the ultimate reason for these corners of support in
the compiler. You sometimes want to be able to implicitly convert between a
struct and a trait object, and you also sometimes want to be able to wrap things
(in e.g. Box or Rc) of unknown size, and then you want those two features to
combine.
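For completeness, a runnable sketch of the owning-pointer version. As before, Cursor<Vec<u8>> is an assumed stand-in for File, the Rc half uses the Debug trait only because it is easy to observe, and the function names are invented for the example.

```rust
use std::fmt::Debug;
use std::io::{Cursor, Read};
use std::rc::Rc;

/// Converts a `Box` of a concrete reader into a `Box<dyn Read>` and reads from it.
pub fn boxed_read() -> Vec<u8> {
    let b: Box<Cursor<Vec<u8>>> = Box::new(Cursor::new(b"xyz".to_vec()));
    let mut r: Box<dyn Read> = b; // consumes b
    let mut out = Vec::new();
    r.read_to_end(&mut out)
        .expect("reading from memory should not fail");
    out
}

/// The same kind of conversion through `Rc`, using `Debug` as the trait.
pub fn rc_debug() -> String {
    let rc: Rc<u32> = Rc::new(5);
    let d: Rc<dyn Debug> = rc; // consumes rc
    format!("{:?}", d)
}
```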
PS: Playground link, if you'd like to poke at it yourself.