No User Serviceable Parts Inside

Published on 2023, Feb 12

Estimated reading time: .

I’m having troubles with Miri (again) and this is a snapshot of my understanding of the problem at time of writing.

Contents

Introduction

People periodically come to me and ask me, “myrrlyn, is what you’re doing in this crate actually legal Rust?” or sometimes directly tell me, “what you’re doing in this crate is definitely not legal Rust”.

I put a lot of work into being right when I tell the askers that yes it is, and into finding satisfactory changes when the experts slap my wrist a little too hard. It’s something about which I care a great deal, though by and large it’s not something I’ll make a hard blocker of my work on bitvec. My philosophy here is that the standard library and/or language built-ins don’t get to complain that I’m smuggling work past them when they don’t give me interfaces fluent enough for me to communicate what I’m trying to do. My code has satisfied rustc since late 2018, so bitvec produces correct machine code. Higher-level analyzers like Miri complain when they get smart enough to understand me, at which point I either adjust to fit their rules, or accept that the complaint is neither solvable nor preventing my use in well-formed programs.

Unfortunately, I like being able to claim that my code survives analysis by strict inspectors such as Miri, and therefore can guarantee correct compilation as rustc incorporates more of Miri’s rules. Plus, Miri has genuinely helped me track problems with shared-element mutation in bitvec’s alias analysis, so I would prefer to be able to use it.

Experiments are ongoing in the standard library and in the evolution of Miri’s concept of “Stacked Borrows” to follow developments such as the CHERI architecture by attaching metadata to pointers during interpretation, and observing whether a program obeys rules about how pointers are produced, moved around, and dereferenced. Miri is getting smarter about following pointers through program execution, and the standard library is sketching out APIs that too-clever library authors such as myself can use as official cover when doing pointer value manipulation that the compiler doesn’t inherently understand.

Unfortunately, at the time of writing, Miri is intelligent enough to see past the mechanism I use to access structural components of pointers, and the standard library hasn’t stabilized the APIs I need to satisfy Miri that I am obeying its rules. All ferrilab crates are committed to using the stable release series, so I can’t use unstable APIs in released code, and I am not yet willing to pull the Volkswagen trick of detecting when I am under test and using the correct API then and the incorrect API on the road.

A Brief Review of the `bitvec` Problem

Go read The Ad-Dressing of Bits if you want more detail. I really do mean “brief” this time.

Until Rust stabilizes the pointer-metadata feature that allows us to declare how pointers to our own types look, bitvec works by packing extra data into the components of a slice pointer. It modifies both the address and the length, which means that it can’t be typed as a &[T] slice reference or the compiler will make some assumptions about the validity of the region to which it refers, and those assumptions are wrong. Even for &[u8], where bitvec doesn’t modify the address component, it still multiplies the length component by 8. This is a buffer overrun, which is UB.

So bitvec uses &[()] for everything, which definitionally can point anywhere for any length, because () has no size and is never dereferenceable. Rust shouldn’t care about the bit-patterns of these values.

They Wouldn’t Seal It Up So Tight If There Wasn’t Cool Stuff In There

The standard library doesn’t actually publicly guarantee that slice-pointers have the representation

#[repr(C)]
union PtrRepr<T: ?Sized> {
  const_ptr: *const T,
  mut_ptr: *mut T,
  components: PtrComponents<T>,
}

#[repr(C)]
struct PtrComponents<T: ?Sized> {
  data_address: *const (),
  metadata: <T as Pointee>::Metadata,
}

Source code for Rust pointers

even though I just copied that out of the standard library definition, where it’s been (more or less) stable for the five years I’ve been looking at it and will almost certainly never change.

Frustratingly, the [T] slice lang item does not implement the Pointee trait in core, because it’s a lang item. So there’s no ordinary function definition for accessing the slice-pointer metadata – <[T]>::len forwards to a slice_len_fn lang item, which is implemented in HIR as a custom lowering instruction.

In practice, wide-pointer representation is likely ossified deeply into Rust’s and LLVM’s understanding of how these pointers experience codegen, making layout changes unlikely due to wide-spread ramifications, so technically mem::transmute isn’t off the table.

But because the whole point of this article and my work is that I’m trying to convince the compiler pipeline that everything is above-board, it is. Plus mem::transmute doesn’t tolerate sizing inside generic contexts very well.

Because core refuses to make wide-pointer type definitions publicly available for people like me to interact with the components directly (a good and correct decision), I am restricted to going solely through the public APIs exposed in core::ptr, and because #![feature(ptr_metadata)] is still unstable, I can’t use the correct solution of

pub struct BitSlice<T, O> {
  _ty: PhantomData(T, O),
}

impl<T, O> Pointee for BitSlice<T, O> {
  type Metadata = BitSpanMetadata;
}

struct BitSpanMetadata(usize);

to instruct the compiler that pointers to BitSlice regions have metadata that I know how to use properly. Since dyn Trait dispatch tables are statically generated by the compiler and cannot ever be modified, the only way to store state in a wide-pointer component is to use <[()]>’s usize field and hope for the best.

Point Not Into the `void` Lest the `void` Point Also Into You

You’ll notice up above that the PtrComponents struct stores the address as a *const (). This is vaguely Rust’s answer to the C void* type, which is to say that it marks a memory address but it doesn’t say anything about what data is stored at that address. We don’t know about write permissions or how to interpret what is read from it (and, because it’s a pointer to (), we aren’t actually even able to read from it, because () doesn’t exist).

So I follow the standard library’s example and use pointers to () everywhere I need to indicate that I am holding a memory address that I don’t want the compiler to try to look through.

Unfortunately, while the compiler seems to accept that unit-pointers are unknowable and should be left alone until better typed, Miri takes the opposite perception: there’s nothing to know about unit-pointers, and so it discards its memory of how you got there, which means you can never go back.

This strikes me as weird, because the standard library does it, and presumably Miri is fine with that.

But the Miri errors I’m getting in bitvec at the moment are happening in the same part of the code as always – trying to dereference a pointer I decoded and re-attached typing information to – with an error that I wildly misunderstood to start:

error: Undefined Behavior: trying to retag from <217630> for SharedReadOnly
permission at alloc86438[0x0], but that tag does not exist in the borrow stack
for this location
--> /.../rust/library/core/src/slice/raw.rs:100:9
     |
100  |         &*ptr::slice_from_raw_parts(data, len)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     |         |
     |         trying to retag from <217630> for SharedReadOnly permission at
     |         alloc86438[0x0], but that tag does not exist in the borrow
     |         stack for this location
     |         this error occurs as part of retag at alloc86438[0x0..0x8]
help: <217630> would have been created here, but this is a zero-size retag
(`[0x0..0x0]`) so the tag in question does not exist anywhere

The “zero-size retag” part made me worry that I was improperly trying to create a one-past-the-end pointer during bit-slice traversal: due to the paradigms inherited from C, programs frequently need to create a pointer whose address is to the next location after a region and then not dereference it. bitvec does this as well, and I thought that maybe the zero-size area was a symptom of creating a pointer beyond the source region and Miri alarming on it before any memory attempt occurs. After all, the fault was occurring inside BitPtr::read, so surely this is because of an invalid memory dereference, right? Plus, the permission tag should only vanish when I depart the original provenance location.

Spoilers: my pointer arithmetic, which has not changed in years, is still correct.

When the arithmetic turned out to be correctly breaking loops before creating any invalid pointers, I turned to the only other possible culprit: the part where I turn a &BitSlice reference into a slice reference by doing, essentially, unsafe { &*(ptr as *const BitSlice<_, _> as *const [()])} to get an untyped slice reference that I can then call .as_ptr() and .len() on.

This was it. Casting through [()] causes Miri to decide that the pointer is now to a zero-sized region, which means any later casts where I restore typing information (&*(slice.as_ptr().offset(something) as *const T)) no longer have access to the provenance history.

This is, to put it lightly, a pretty serious problem.

Minimal Reproduction

In my quest to rip as many components as I can out of bitvec that aren’t directly related to managing individual bits, my pointer manipulation code largely lives in funty. bitvec’s only raw-pointer work is the encoding used to pack data into a slice pointer. Conveniently, this means that I can create a test in funty to trigger the same failure, and not have to deal with all of bitvec’s baggage in trying to get a small program running.

#[test]
fn cast_through_zero() {
  let data = [4u32, 8u32];
  let slice_ref: &[u32] = &data[..];
  let slice_ptr = slice_ref as *const [u32];
  // let slice_ptr = slice_ptr as *const [()];
  let unit_slice = unsafe { &*slice_ptr };
  let slice_ref = unsafe {
    core::slice::from_raw_parts(
      unit_slice.as_ptr() as *const u32,
      unit_slice.len(),
    )
  };
  assert_eq!(slice_ref[0], 4);
}

Minimal reproduction of the bug, on the Rust Playground

The “source” link takes you to a Rust Playground where you can run this directly. Use Tools > Miri to run the program, then uncomment the line and run it under Miri again.

With the cast through unit-slice commented out, this test passes successfully. With that line enabled, Miri alarms with the error I described above. Note that nowhere in this test do I modify the address value of the pointer, or even route it through as usize. The only transformation that occurs is opacifying the referent type in order to mark the value as not dereferenceable.

What Is To Be Done

Well, I’m not getting a Ph.D. in how to model pointers in an abstract machine in a way that allows programs to mathematically prove that they aren’t conjuring addresses out of thin air and disobeying the rules of legal access, and people on the Miri team are. So I don’t exactly feel qualified to give them instructions.

But it would be nice if Miri would treat *() and *[()] as pointing to opaque regions, rather than transparently empty regions. Casting through unit or slice-of-unit (or any other zero-width type) should not discard provenance information and cause all subsequent pointer manipulation to become unable to regain access to real memory.

I fully understand the desire to limit the set of well-formed programs so that we can better reason about the ones that survive. That is, after all, Rust’s entire value proposition. I may be able to sidestep the problem by forcing Miri to assume that I am exposing addresses to systems it can’t see and then retrieving them later, but I would feel much more comfortable if I were acting under Miri’s supervision. It’s a genuinely useful tool that has helped me out a lot, and intentionally blinding it exposes me to risks of my own fallibility.

I want Miri to be intelligent and useful. Ideally, intelligent enough to understand and allow the things I am doing that it was not programmed to expect. I’m not the only project that encodes non-address data into pointers, and we’re going to want Miri to be able to understand all of these.

So please. Until the standard library stabilizes pointer-value-manipulation APIs, at least let us have *() and *[()]. I think merely preserving history when encountering these types will suffice – returning back out of the untyped pointer will require conformance with the original history information in order for the program to be well-formed, so programs that do this correctly should still be understandable.

But that’s what happens when you’re on the bleeding edge of an experiment. Sometimes the blood is yours.