Rust trait object layout
Rust makes a distinction between values and references that you generally learn to work with while using the language. This week, despite having used the language for quite a long time, I learned an interesting new corner of how that distinction applies to trait objects. Maybe it will surprise you too!
Background: dynamically sized types
When you have some x: usize you can say x is the usize itself, while some
y: &usize = &x is a reference to that value. y's concrete value is a
pointer.
Similarly, the type [u8; 40] means 40 bytes itself. If you put it in a struct
or pass it to a function, you're moving around 40 bytes.
Finally, the type [u8] means a series of bytes of (compile-time) unknown size.
You rarely interact with these directly: because their size is unknown, you
usually can't put them in a struct or pass them to a function. Instead you
typically work with references to them as &[u8], which concretely are a
pointer and a length.
(cheats.rs has nice pictures of this.)
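One way to check these sizes is with std::mem::size_of. This is only a minimal sketch: the helper name sizes is my own, and the pointer sizes are compared against usize rather than hard-coded, so the checks hold on both 32-bit and 64-bit targets.

```rust
use std::mem::size_of;

/// Returns the sizes in bytes of ([u8; 40], &usize, &[u8]).
/// `sizes` is a hypothetical helper name used only for this illustration.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<[u8; 40]>(), // the 40 bytes themselves
        size_of::<&usize>(),   // a plain ("thin") pointer
        size_of::<&[u8]>(),    // a pointer plus a length
    )
}

fn main() {
    let (array, thin, fat) = sizes();
    println!("[u8; 40]: {} bytes", array);
    println!("&usize:   {} bytes", thin);
    println!("&[u8]:    {} bytes", fat);
}
```

The slice reference comes out at twice the size of the plain reference, which is the "pointer and a length" described above.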
But you still do sometimes see [u8] as a type without a reference, in types
like Box<[u8]>. And further, you can wrap a dynamically sized type in a struct
as the last member, making that struct dynamically sized as well:
struct Test<X: ?Sized> {
    inner: X,
}
// the type Test<[u8]> is now also dynamically sized
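To make that concrete, here is a small sketch that builds a sized Test<[u8; 3]> and then views it through a reference as the dynamically sized Test<[u8]>. The function name and the byte values are my own assumptions for illustration.

```rust
use std::mem::size_of_val;

struct Test<X: ?Sized> {
    inner: X,
}

/// Returns (length of the inner slice, size in bytes of the whole struct),
/// both measured through a reference to the dynamically sized Test<[u8]>.
/// The function name is hypothetical, chosen for this example.
fn unsized_len_and_size() -> (usize, usize) {
    let sized: Test<[u8; 3]> = Test { inner: [1, 2, 3] };
    // The array in the last field is viewed as a slice, so this reference
    // carries a pointer and a length rather than just a pointer.
    let unsized_ref: &Test<[u8]> = &sized;
    (unsized_ref.inner.len(), size_of_val(unsized_ref))
}

fn main() {
    let (len, size) = unsized_len_and_size();
    println!("inner has {} bytes, the struct occupies {} bytes", len, size);
}
```

The size is only known at run time, recovered from the length stored in the reference.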
Background: trait objects
The type File represents an open file. Concretely, it's a struct with some
private fields.
The trait Read represents things that can be read from, with a .read()
method. The trait implies a trait object type dyn Read, which is the type of
things that implement the trait. File implements Read, so I will use File
and Read for the following examples.
Concretely, the layout of a trait object dyn Read is the same as the
underlying value it's wrapping, e.g. a File
(spec ref).
(This is possibly only useful to know as compiler trivia; even cheats.rs doesn't
document this!) Much like [u8], because their concrete size is not known at
compile time, you don't typically interact with these directly but instead via
references.
The type &dyn Read is a reference to a trait object. Concretely it's a pointer
to the object and a pointer to a static vtable of the methods for that type that
implement the trait.
(More pictures from cheats.rs.) Also like
[u8], you might more often use Box<dyn Read>, which holds ownership over the
underlying Read-able type.
(It was useful for my understanding to contrast these with C++ objects and vtables. In C++, the vtable pointer is always embedded in the struct itself. In Rust, the struct never contains a vtable pointer. Instead the reference to the trait object is two pointers, to the value and the vtable.)
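The layout claims above can be poked at with size_of and size_of_val. A minimal sketch, assuming Cursor<Vec<u8>> as a stand-in for File so that it runs without touching the filesystem; the helper names are mine.

```rust
use std::io::{Cursor, Read};
use std::mem::{size_of, size_of_val};

/// Size in bytes of a reference to a trait object.
fn dyn_ref_size() -> usize {
    size_of::<&dyn Read>()
}

/// Size in bytes of an owning Box around a trait object.
fn dyn_box_size() -> usize {
    size_of::<Box<dyn Read>>()
}

/// Returns (size of the trait object as seen through a reference,
/// size of the concrete type it wraps).
fn dyn_value_size() -> (usize, usize) {
    // Cursor<Vec<u8>> stands in for File here; both implement Read.
    let c: Cursor<Vec<u8>> = Cursor::new(vec![1, 2, 3]);
    let r: &dyn Read = &c;
    // size_of_val on a trait object looks the size up via the vtable.
    (size_of_val(r), size_of::<Cursor<Vec<u8>>>())
}

fn main() {
    println!("&dyn Read:     {} bytes", dyn_ref_size());
    println!("Box<dyn Read>: {} bytes", dyn_box_size());
    let (erased, concrete) = dyn_value_size();
    println!("dyn Read value: {} bytes, concrete type: {} bytes", erased, concrete);
}
```

Both the reference and the Box are two pointers wide, while the trait object itself is exactly as large as the value it wraps.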
Background: coercion
Though it's relatively rare in Rust, there are a few places where one type will
silently convert to another. One you may have used without thinking is using a
&mut T in a place that needs a &T.
When using trait objects there is another coercion:
let f: File = ...;
let r: &dyn Read = &f;
Here, &f is &File, but the compiler converts it to &dyn Read.
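Here is a runnable sketch of that coercion at a function call. Two assumptions of mine: Cursor<Vec<u8>> stands in for File so no real file is needed, and I use &mut dyn Read rather than &dyn Read because the Read methods take &mut self. The function names are hypothetical.

```rust
use std::io::{Cursor, Read};

/// Reads everything from any reader passed as a trait object.
fn read_all(r: &mut dyn Read) -> Vec<u8> {
    let mut out = Vec::new();
    r.read_to_end(&mut out)
        .expect("reading from an in-memory cursor should not fail");
    out
}

/// Builds an in-memory reader and hands it to read_all.
fn demo() -> Vec<u8> {
    // Cursor<Vec<u8>> is a stand-in for File.
    let mut c = Cursor::new(vec![10u8, 20, 30]);
    // &mut Cursor<Vec<u8>> is silently converted to &mut dyn Read here.
    read_all(&mut c)
}

fn main() {
    println!("{:?}", demo());
}
```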
Finally, the surprise
Did you know that there is another trait-related coercion involving generics? Consider:
let f: File = ...;
let b: BufReader<File> = BufReader::new(f);
let r: &BufReader<dyn Read> = &b; // !!! legal
Here, the File inside the BufReader<> was able to coerce to a trait object.
Concretely, r here is like a reference to a trait object, in that it is a
pair of a pointer to the BufReader along with a pointer to a Read vtable.
(Poking at the compiler, it uses the same Read vtable as you would get from a
plain File.)
The underlying spec reference for why this is allowed,
[coerce.unsized.composite], is pretty involved! But at a hand-wavy level it's
allowed because BufReader<dyn Read> is a dynamically sized type where the last
field is the place where the dyn Read is used. Note that, for example, you
cannot have a similar coercion to two traits &SomeType<dyn A, dyn B> because
the reference can only carry a single vtable.
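The same shape can be reproduced with a hand-written struct. In this sketch Wrapper is a hypothetical stand-in for BufReader (it is not BufReader's real definition), Cursor<Vec<u8>> again stands in for File, and I use &mut so that read can actually be called through the coerced reference.

```rust
use std::io::{Cursor, Read};
use std::mem::size_of;

// Hypothetical stand-in for BufReader: a generic struct whose type
// parameter appears only in the last field.
struct Wrapper<R: ?Sized> {
    reads: usize,
    inner: R,
}

/// Size in bytes of a reference to the type-erased wrapper.
fn wrapper_ref_size() -> usize {
    size_of::<&Wrapper<dyn Read>>()
}

/// Coerces a concrete wrapper to Wrapper<dyn Read> and reads through it.
fn read_through_wrapper() -> (usize, Vec<u8>) {
    let mut w: Wrapper<Cursor<Vec<u8>>> = Wrapper {
        reads: 0,
        inner: Cursor::new(vec![1, 2, 3]),
    };
    // The concrete reader type is erased; the reference now carries a
    // pointer to the Wrapper plus a pointer to a Read vtable.
    let r: &mut Wrapper<dyn Read> = &mut w;
    let mut out = Vec::new();
    r.inner
        .read_to_end(&mut out)
        .expect("reading from an in-memory cursor should not fail");
    r.reads += 1;
    (r.reads, out)
}

fn main() {
    println!("&Wrapper<dyn Read>: {} bytes", wrapper_ref_size());
    println!("{:?}", read_through_wrapper());
}
```

The sized fields before the last one stay directly accessible, and only the last field is reached through the vtable.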
(How is this useful? I'm not sure; it came up in some code I was reviewing from someone new to Rust.)
(Bonus question: The above doesn't compile if you substitute Box or Rc for
BufReader. Why not? Something about the CoerceUnsized impls on those? I don't
know the answer.)
Unsized coercion
This does compile though:
let f: File = ...;
let b: Box<File> = Box::new(f);
let r: Box<dyn Read> = b; // consumes b
This is due to some magic traits, notably CoerceUnsized, implemented by Box (and also Rc, etc.).
I believe this behavior is the ultimate reason for these corners of support in
the compiler. You sometimes want to be able to implicitly convert between a
struct and a trait object, and you also sometimes want to be able to wrap things
(in e.g. Box or Rc) of unknown size, and then you want those two features to
combine.
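A runnable sketch of the owning version, for both Box and Rc. As before, Cursor<Vec<u8>> is my stand-in for File, and the function names are hypothetical.

```rust
use std::io::{Cursor, Read};
use std::rc::Rc;

/// Converts a Box of a concrete reader into a Box<dyn Read> and reads from it.
fn read_boxed() -> Vec<u8> {
    let b: Box<Cursor<Vec<u8>>> = Box::new(Cursor::new(vec![7, 8, 9]));
    // Consumes b; the Box itself becomes a pointer plus a vtable pointer.
    let mut r: Box<dyn Read> = b;
    let mut out = Vec::new();
    r.read_to_end(&mut out)
        .expect("reading from an in-memory cursor should not fail");
    out
}

/// Does the same conversion on a cloned Rc and reports the strong count.
fn shared_reader_count() -> usize {
    let rc: Rc<Cursor<Vec<u8>>> = Rc::new(Cursor::new(Vec::new()));
    // The clone is an Rc<Cursor<Vec<u8>>> that is then converted to Rc<dyn Read>;
    // both handles still point at the same allocation.
    let erased: Rc<dyn Read> = rc.clone();
    Rc::strong_count(&erased)
}

fn main() {
    println!("{:?}", read_boxed());
    println!("strong count: {}", shared_reader_count());
}
```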
PS: Playground link, if you'd like to poke at it yourself.