Bitfields in Rust
Estimated reading time: .
A walkthrough of the recent bitfield behavior I implemented in bitvec
If you don’t care about bit collections in other languages, use the table of contents to jump ahead.
I am the author of a Rust library called bitvec
. This is the most powerful memory manipulation crate in the Rust ecosystem and, to my knowledge, the world.
Almost every language that is used in the “systems programming” domain has some form of capability, either in the language itself or in a specialized library, for manipulating memory as sequences of raw bits, rather than of typed values.
The specific implementations in each language are largely overlapping, with some advantages and disadvantages for each, but they are all largely similar to each other. They all define a single, fixed, ordering of bits in a memory element; most of them do not permit users to specify the type of memory element in use to aggregate bits in memory, and only a few (C and C++ bitfields, Erlang bitstrings) permit users to treat an arbitrary bit region as a value location where you can write or read typed numeric data against it.
bitvec
can do all of the things I just said other languages can’t.
Enough about everyone else. Let’s talk about me.
Creating bitfields with bitvec
In order to cook an apple pie, you must first create the universe, and in order for me to explain something, I must first deliver a CS101 lecture.
This section is the CS102.
Treat some memory as bits
In order to have a region of memory we can use as bitfields, we must first allocate a region of memory, either on the heap, or the stack, or in the static section.
use bitvec::prelude::*;
let mut stack_raw = [0u16; 4];
let stack_bits = stack_raw.bits_mut::<BigEndian>();
let mut heap = BitVec::<Local, Word>::with_capacity(64);
static mut STATIC_RAW: [u32; 2] = [0; 2];
let static_bits = unsafe {
STATIC_RAW.bits_mut::<LittleEndian>()
};
We now have 64 mutable, contiguous, bits in each of the local stack frame, the heap, and the static memory segment. It doesn’t matter where they are; the main working type of this crate is the &/mut BitSlice<C, T>
reference, which applies equally to them all.
Notice that each of those three allocations uses a different Rust fundamental: u16
on the stack, Word
on the heap, and u32
in static. bitvec
allows you to use the unsigned integer types that correspond to register widths in your CPU as storage types: u8
, u16
, u32
, and (only on 64-bit-word processors) u64
. The Word
type aliases to your local usize
2.
You may also notice that each of the three allocations uses a different first type parameter. The first type parameter is an implementation of the Cursor
trait. The LittleEndian
type means “counts from the least significant bit first to the most significant bit last”, and the BigEndian
type means “counts from the most significant bit first to the least significant bit last”, in whatever integer type the slice is using as its group size. The Local
type aliases to LittleEndian
on little-endian byte-order architectures, and BigEndian
on big-endian byte-order architectures.
Other languages restrict you from one and/or both of these options. This is unfortunate, because as it turns out, there is not a universal convention for these among all I/O protocols.
Choose a region of contiguous bit indices within that memory
I emphasized the word indices in that heading, because bitvec
does not expose bit positions in memory to you. The two type parameters in all of the data structures the library exposes map from abstract unsigned integers into the actual shift-and-mask procedures used to access memory. Pick the combination that works for you, and then forget all about memory, and just pretend that the memory in a slice is a one-dimensional sequence of individual bits, starting at [0]
.
BitSlice
has absolutely no restrictions on where in memory you start or end a region (except for bounds checks, which it strictly enforces). We have 64 bits available. Grab any start and end number you want. I’m going to roll some dice offscreen:
- 27
- 13
bitvec
currently requires that ranges are strictly in the increasing direction, from lower numbers to higher3, which means that we are interested in the memory range [13 .. 27]
. That’s 14 bits. bitvec
disallows storing a type whose bit width is smaller than a region, so we can’t store u8
in it, but u16
, u32
, and u64
are all fair game.
Put some data in that region
stack_bits [13 .. 27].store(0x3123u16);
heap [13 .. 27].store(0x0000_3456u32);
static_bits[13 .. 27].store(0x00000000_0000_3789u64);
That’s it. That’s the whole API.
Truncation is from the most-significant-bit downward. For an n
-bit region, the n
least significant bits of the value are transferred into or out of the bit slice. This means that the highest two bits of a u16
are discarded in a 14-bit region.
This is why the first non-zero digit in the numbers above is 3
: anything higher would get truncated, and will not be written into the region.
Pull that data back out
let s: u16 = stack_bits [13 .. 27].load().unwrap();
let m: u32 = heap [13 .. 27].load().unwrap();
let l: u64 = static_bits[13 .. 27].load().unwrap();
The load
method returns an Option
, because I elected to be calm rather than panicky when presented with a BitSlice
of length 0
or more than the type being returned.
assert!(stack_bits[13 .. 27].load::<u8>().is_none());
The region has 14 bits available; a u8
can’t fill them when store
ing or receive them when load
ing. store
exits without effect, load
returns None
.
I’m not going to demean myself by posting a decompiled example here to show that the s
, m
, and l
values all match exactly what we put in. They do.4
More than just variable-width data storage
So bitvec
can compress data storage. If you know you have a number that will never surpass 1023
, you can treat it as a u16
when holding it and pack it into a u10
when storing it. That doesn’t impress you; C can do that:
struct three_tens {
uint16_t eins : 10;
uint16_t zwei : 10;
uint16_t drei : 10;
uint16_t _pad : 2;
};
and so can Erlang:
three_tens = <<
eins:10,
zwei:10,
drei:10,
_pad:2
>>
This is the part where I remind you that C can’t store u16
s in a byte array, or in a word array, only in a u16
array. That struct is two u16
s. Also, you don’t get to choose the storage order. It’s from the LSbit on little-endian architectures and from the MSbit on big-endian5.
I have absolutely no idea what the backing memory of Erlang bitstrings is, or of any other language that has this functionality.
Compacted machine memory isn’t cool. You know what’s cool?
Declaring the layout of an I/O protocol in your type system.
I/O Packet Destructuring
Let’s pick an example out of thin air, like, for instance, an IPv4 packet.
How would we use BitSlice
to describe memory we know contains it?
type Ipv4Pkt = BitSlice<(/* ??? */), (/* ??? */)>;
According to the Wikipedia table I linked above, the IPv4 packet uses 32-bit words as its logical stride, so that’s a guess as to the backing element type.
Let’s skip the exploration and I’ll tell you why BitSlice<_, u32>
is the wrong answer: the kernel I/O interface gives you a sequence of u8
, and does not promise that they’re aligned to the 4-byte step that u32
requires. Also, the bytes are in network order (big-endian) and your CPU is probably little-endian, so casting the bytes as u32
is not only undefined behavior, but also gives you the wrong numeric values.
The IPv4 table explains that it is enumerated in MSB-0 order, so, most-significant bit on the left. This means that the packet uses the <BigEndian, u8>
type parameters6:
type Ipv4Pkt = BitSlice<BigEndian, u8>;
Let’s pretend that our program has just received a raw socket buffer from the operating sysetm, and parse it as IPv4. To start, we’ll grab the IHL field, as that holds a dynamic partition point between the IPv4 header and payload:
let bytes: &[u8] = recv();
let bits: Ipv4Pkt = bytes.bits();
let ihl = bits[4 .. 8].load::<u8>() as usize;
if ihl < 5 {
return Err(InvalidIhl);
}
let split = ihl * 32;
let (ipv4_hdr, payload) = bits.split_at(split);
We can do the same behavior for most of the other fields of the packet: look up their range in the protocol, then call .load()
with the appropriate type on that range.
There is one field in the IPv4 header that stymies this approach, and I’ll cover it now: Fragment Offset.
Byte Endianness Gotchas
Fragment Offset is in word [1]
, bits [19 .. 32]
. This translates to bits [51 .. 64]
of the bit slice. Note that, in the protocol diagram, bits [51 .. 56]
are in byte [6]
, and bits [56 .. 64]
are in byte [7]
. As I mentioned above, the bytes are in big-endian order as u32
, which means byte [6]
is more significant than byte [7]
.
However, your processor almost certainly uses little-endian byte ordering, and bitvec
respects this. The implementation of load
means that it will take the five bits in byte [6]
and treat them as the five least significant bits of the field, then load the eight bits of byte [7]
as more significant than them in the produced u16
value.
This is not what the IPv4 protocol wants. The five bits of byte [6]
are the most significant bits of the value, and the eight bits of byte [7]
are the least significant bits.
Writing this article made me realize I need to add specific methods for correctly processing big- and little- endian memory, independently of the local machine architecture. At the time that I publish this, I have not done so; I will update this article once I do.
So you have to do the endian switch yourself, sorry:
let mut bytes = [0u8; 2];
bytes[0] = bits[51 .. 56].load().unwrap();
bytes[1] = bits[56 .. 64].load().unwrap();
let fragment_offset = u16::from_be_bytes(bytes);
In the future, .load_be()
will interpret the memory as big-endian, and .load_le()
will interpret it as little-endian.
Building a Bitfield Struct
Rust does not have bitfield syntax. bitvec
does not provide this; it is purely a library, not a syntax extension. This means that access to bitfields in a struct, such as for a protocal packet or matching a C type API, requires using methods, rather than fields.
For a C structure such as this:
struct SixFlags {
uint16_t eins : 3;
uint16_t zwei : 2;
uint16_t drei : 3;
uint16_t vier : 3;
uint16_t funf : 2;
uint16_t seis : 3;
};
“six” in German is “sechs”, which is too many letters.
You might write a corresponding Rust structure like this:
type SixFlagsBits = BitSlice<Local, u16>;
#[repr(C)]
#[derive(Copy, Clone, Default)]
pub struct SixFlags {
inner: u16,
};
impl SixFlags {
pub fn eins(&self) -> &SixFlagsBits {
&self.inner.bits()[0 .. 3]
}
pub fn eins_mut(&mut self) -> &mut SixFlagsBits {
&mut self.inner.bits()[0 .. 3]
}
pub fn zwei(&self) -> &SixFlagsBits {
&self.inner.bits()[3 .. 5]
}
pub fn zwei_mut(&mut self) -> &mut SixFlagsBits {
&mut self.inner.bits()[3 .. 5]
}
// you get the idea…
}
Filling out such a structure in Rust:
let mut flags = SixFlags::default();
flags.eins_mut().store(2u8);
flags.zwei_mut().store(0u8);
flags.drei_mut().store(4u8);
flags.vier_mut().store(5u8);
flags.funf_mut().store(1u8);
flags.seis_mut().store(7u8);
is guaranteed to be binary-compatible with its equivalent C structure:
struct SixFlags flags = get_from_rust();
flags.eins; // 2
flags.zwei; // 0
// …etc
whenever you use the Local
ordering, and match your interior layout to the C ABI with #[repr(C)]
and faithful transcription of the memory types.
Summary
Rust has bitfields now. More flexible than C, about as capable as Erlang, though without the language support, and miles beyond the sequence libraries in every other language.
I fully intend for bitvec
to be the universal Rust library for lowest-level direct construction and interpretation of memory segments. If bitvec
does not work for you, please get in touch with me directly or file an issue.
bitvec
optimizes fairly well. The steps I’ve taken to implement the library in a manner that fits in the existing Rust language and library pattern means it has certain unavoidable performance costs that just have to be paid for a fully capable bit-slice type. The assembly, even in --release
, for a .store()
call is far larger than an equivalent hand-written shift-and-mask operation would be.
This non-zero-cost abstraction is due to the runtime computations that must be done for correctness, and cannot yet be moved into the compiler. As the compiler’s constant evaluator gets more powerful, it will be able to perform ahead-of-time range computations on BitSlice
handles, reducing the runtime load on statically-known slice boundaries.
Personally, I am of the opinion that offloading shift/mask and split computations to the machine in favor of much simpler source code is a worthwhile trade. If you need to tighten a hot loop, BitSlice
offers you access to the raw memory elements, and you can drop down to directly-computed shift/mask operations.
And if your processor can afford a hundred-instruction store function (whose actual runtime will be significantly less; load
and store
branche heavily based on runtime conditions of the slice, and must include code for all paths), the comprehension gain in the source code – clear text, automatic bounds checks, and idiomatic Rust patterns – is a benefit you do not want to miss.
- ↩
Ruby’s
Integer
class is, in fact, implemented as a hybrid between ani31
and a bit-vector so that it can have arbitrary-sized integers with minimal cost. No, you are not tricking me into explaining what ani31
is in this article. Footnotes don’t nest. - ↩
For technical reasons, including but not limited to the fact that
usize
is a discrete type and not an alias tou32
oru64
,bitvec
disallowsusize
as backing storage. I might remove this restriction later. I figured out how to do this in0.17
. - ↩
I might change that in the future, but
std
has the same requirement, so why get wild too soon? It would be pretty neat to have[high .. low]
provide reversed directionality, though. - ↩
This is, of course, checked by the test suite.
- ↩
Matching the (bad) behavior of existing C code is the other reason I chose
<Local, Word>
as the default type parameter. - ↩
<BigEndian, u8>
used to be the default parameter choice inbitvec
types, as it appears to be a very common sequence type. I changed it since<Local, Word>
gives better performance for users who don’t care about layout, and users who do care about layout will specify it.