Type Solidity

Published on

If you already know the general concept of generics, skip section one. If you already know how Rust generic rules work, skip section two. If you really know how Rust generics work, skip section three.

  1. Overview of Parametric Types
  2. Rust Type Rules
  3. What Really Is A Type
  4. My Point
    1. My Point, Concisely
  5. Feedback

Overview of Parametric Types

Brief overview and background: of the programming languages that have a concept of generics or parametric types, I am most well versed in Rust, somewhat versed in C++ and Java and C♯, and not at all versed in Haskell or anything like it.

In case you’re somehow reading this and don’t know what generic types are, they are a system by which you are able to build a class or struct or datatype that has components (generally interior fields) whose type is provided by the user of your type, and not by you.

C1 does not2 have them, so let me provide a negative example:

1
2
3
4
5
6
7
8
9
10
typedef struct {
    unsigned char* ptr;
    size_t len;
    size_t cap;
} vec_uchar;
typedef struct {
    signed int* ptr;
    size_t len;
    size_t cap;
} vec_sint;

If I want a vector of unsigned char, that is a different type than a vector of signed int, and I have to build them twice and implement them twice (and again for any other type I might want to store in a vector).

Yes, the simpler solution is to make a single vector of unsigned char and have the user store and retrieve memory representations as slabs of bytes but this is Bad, Actually, and I don’t need to go into that here. It also falls down on use cases other than vectors.

C++ solves this with templating, where you declare a class template and then the compiler uses that template to make finished classes behind the scenes.

1
2
3
4
5
6
7
template <class T>
class vector {
private:
    T* ptr;
    size_t len;
    size_t cap;
};

Now, when I use a vector<unsigned char>, the above template gets specialized by dropping unsigned char in every T spot, and the same happens when I use a vector<signed int>, and they’re two different classes that only had to be written once.

Rust calls them generics instead of templates, but is otherwise more or less identical (except C++ templates are more powerful; as of 1.28 Rust can only use types where C++ can use values also).

1
2
3
4
5
6
//  This is not really what std::vec::Vec looks like but whatever
struct Vec<T> {
    ptr: *mut T,
    len: usize,
    capacity: usize,
}

And I don’t know about other languages so I’m not going to give examples there but also this post is about the Rust type system! So I’m going to actually talk about that now.

Rust Type Rules

Rust’s separation of information (struct Struct) from behavior (impl Struct, impl Trait for Struct) means that it needs some rules for who can attach behavior to a type. The rules are this:

  • only the crate that defines a type can provide inherent (impl Struct) method blocks.
  • a crate that defines a type can implement any trait on it, no matter what crate defined the trait
  • a crate that defines a trait can implement it for any type, no matter what crate defined the type
  • a crate cannot implement a non-local trait on a non-local type

So if you have a trait that you wrote, you can impl MyTrait for Vec<T> for the Vec<T> type from the crate std, and if you have a type that you wrote, you can impl Display for MyType with the Display trait from the crate std, but you cannot implement new methods for Vec<T> directly, nor can you impl<T> Display for Vec<T> where both the trait and the type are not from your local crate.

This is overall a good decision, and not one that I want to change in any way.

What Really Is A Type

I had a nicely illuminating conversation with ubsan while this post was sitting in my drafts folder, and I’m very glad for it because she prodded my understanding of what types are in abstract and how they could be represented in Rust in ways I hadn’t exercised thorougly before.

My main takeaway from that chat was a confirmation of what I’d been thinking, expressed more clearly and succinctly than I had yet managed: Rust has multiple levels of types.

At the zeroth level, we have the primitives, empty structs, and the empty tuple. These are as concrete as can be, type-wise. They are the atoms of all our type algebra.

1
2
3
4
bool
i32
struct Aleph;
()

At the first level, we have non-empty structs, and traits (marker or not).

1
2
3
4
5
6
7
8
struct Coord {
    x: i32,
    y: i32,
}
trait Useful {
    fn use(&self);
}
trait Useless {} // but not for long

Tuples and arrays are specialized structs; I’m not going to bother with them here because they’re beside the point.

Above these, we have the generic structs and traits:

1
2
3
4
5
6
7
struct Pair<T> {
    left: T,
    right: T,
}
trait Joinable<T=Self> {
    fn join(&self, other: &T);
}

These aren’t types themselves; they’re types of types. They describe the shape of types to come, and implement functions for creating them. Generic types are to concrete types what concrete types are to piles of bits.

The struct Pair<T> declaration above is the type of all possible Pair<_> instances. The type Pair<i32> is of type Pair<T>, as is Pair<bool>, but Example<i32> is not.

Rust currently offers some reasoning about types at multiple levels of abstraction. It is already perfectly possible to define a generic type, and work with it in varying levels of abstract precision. Let me pull from a project of mine:

1
2
3
4
5
6
7
8
9
10
pub struct Macro<T> where
    T: Clone + Debug + Default + Parsed,
{ /* ... */ }

impl<T> Parsed for Macro<T> where
    T: Clone + Debug + Default + Parsed
     + Describable + Described + Nameable,
{ /* ... */ }

impl Macro<Paramater> { /* ... */ }

The struct definition creates a type of types, and lays out the constraints on the whole type family in a small where-clause.

The Parsed implementation only applies to a subset of all possible Macro<T> types, because of the larger where-clause. The Parsed trait is only live when a Macro is built with a T that satisfies all those constraints. It is perfectly okay to build a Macro with a T that doesn’t satisfy the second line of the where-clause; as long as it satisfies the clause on the struct definition, it is acceptable. The Parsed implementation is just missing on Macro types that aren’t good enough.

And lastly, Macro<Parameter> is a concrete type which can be the subject of an impl block. (Parameter must, of course, satisfy the four traits in the struct definition, but need not necessarily satisfy any others.)

Macro<Parameter> is a type of real objects that take up real addresses at runtime with real bits. Macro<T> is not.

Ignore runtime of our program, for a moment. We are now only concerned with the runtime of the compiler.

During compilation, Macro<Parameter> is a specific value, whose type is Macro<T>3. Macro<T> is a function that, when called with the value Parameter, creates a specific value of its type. That function is called wherever we write expressions that use a concrete instance of Macro<T>, or wherever we write impl blocks that bind to it somehow.

Because the Rust type system is very good, we have the ability to write generic impl blocks. These are functions that generate concrete slabs of code when invoked with a specific type, and these can be made as general or as narrow as we like.

My Point

Enough prelude. I trust I’ve laid enough ground work for what generics are in compiled code, and what they are during compilation.

Here’s what I want from the Rust type system:

Given that a fully-qualified type (Pair<i32>) is one value in a type family (Pair<T>), and given that Rust’s type algebra knows how to measure the specificity of a type, I think it is perfectly feasible to relax the rules about type collision and coherence in a manner that can be useful without becoming chaotic.

Specifically, I want the following behavior:

A crate, pear, defines a type, Pear<T>. Inside crate pear, any code can be written that works with Pear<T>, including blanket implementations, general implementations with trait bounds, everything the current rules permit.

1
2
3
4
5
6
7
8
9
//  crate pear
pub struct Pair<T> { }
pub struct Pear;

impl<T: Clone> Clone for Pair<T> { }

impl<'a> From<&'a T> for Pair<'a, &'a T> { }

impl Pair<Pear> { }

Crate pear does not know about crate peach, and so no code in pear refers to or depends on code from peach. It is wholly impossible for pear::Pear as defined in its own crate to use peach types in any way.

Crate peach depends on pear.

1
2
3
4
5
6
7
8
9
10
11
//  crate peach
extern crate pear;
use pear::{Pair, Pear};

#[derive(Clone)]
pub struct Peach;
pub trait Peachy { }

impl Peachy for Pear { }
impl Pair<Peach> { }
impl Clone for Pair<Peach> { }

This is impossible in current Rust. Rust currently considers Pair<Peach> to belong to the pear crate, even though pear cannot create that type or any values of it because it does not know what Peach is, and thanks to the strict acyclic nature of crate layouts, it never can.

This does, of course, lead to problems. Peach is Clone, and pear defines a Clone implementation on Pair for all interior types that are themselves Clone. This means that there are two impl Clone for Pair<Peach>: one provided by pear’s impl<T: Clone> Clone for Pair<T> and one provided by peach’s impl Clone for Pair<Peach>.

The solution here is simple: Pair<Peach> is a more concrete type than Pair<T: Clone>, and so all Pair<Peach> instances use the implementation from peach. Other Pair<_> types use the implementation from pear unless they also provide an override.

This is sound because pear can never work directly with any code that involves Peachs. It cannot depend on the crate; it cannot load the symbol; it will never ever see a Peach and peach is therefore free to ignore parts of pear if it wants to do so by overriding them with itself.

This also extends to a third crate, plum.

1
2
3
4
5
6
7
8
9
10
//  crate plum
extern crate pear;
extern crate peach;
use pear::{Pair, Pear};
use peach::{Peach, Peachy};

pub struct Plum;

impl Peachy for Plum { }
impl Peachy for Pair<Plum> { }

The peach crate exported a trait, Peachy. The plum crate can implement Peachy on its own internal types, but it cannot implement it on foreign types, ever. This is in keeping with Rust’s current coherence rules, which do not need to be changed.

However, while Pair<T> is a foreign type from pear, Pair<Plum> is not. As I discussed for Peach, pear can’t see the types from crates that depend on it, and so can’t ever work with them directly. Therefore, the concrete type Pair<Plum> can only exist in the plum crate, and the plum crate is able to implement Peachy for it without trouble (because peach can’t see plum either).

My Point, Concisely

A crate which defines a generic type has absolute ability to define generic or specific implementations on that type. However, generic implementations should be considered “soft”

A crate which uses foreign generic types should be able to consider concrete versions of those types, finished with types local to itself, as also being local. Current Rust considers Pair<Plum> to belong to pear, when it could correctly be considered to belong to plum.

Note that visibility rules still apply unchanged: only the Pair symbol was exported from pear, which means plum’s Pair<Plum> still cannot see the interior private fields of Pair<_>.

A crate may implement a foreign trait on a local type, including a local type that has a foreign outer component, so long as the total concrete type is strictly local. The coherence rules are not modified to permit the implementation of a foreign trait on a fully foreign type.

In short: make Foreign<Local> a local type, not a foreign one, for coherence purposes only. This does not alter visibility, and brings Foreign<Local> in line with Local<Foreign> in terms of which crate may implement on a mixed type.

Blanket conflicts are resolved by using order of specificity:

  1. impl Foreign<Local> can only be done in the local crate, and has high precedence
  2. impl<T> Foreign<T> can only be done in the foreign crate, and has low precedence

When Rust searches for an impl block on a type, it first checks the fully instantiated type (Foreign<Local>), and only searches the generic type (Foreign<T>) if it did not find anything suitable. This permits specialization without conflict, and collisions are still avoided by the existing coherence system.

Feedback

I am not a gifted or learned type theorist. I cannot be the first person to have had these ideas, and the ongoing debate in Rust about how to permit specialized trait implementations clearly indicates that this is a hard problem.

So I actively look forward to having holes poked in my understanding of how type algebra currently works in Rust, and why this wouldn’t work. The fact that I have this mental model means I am likely missing some important information about the concept in general or Rust’s behavior in particular, and that absence makes me misinformed on the topic.

Or maybe I don’t have tunnel vision, and I’m not horribly off base, and this might be useful! I don’t know. I look forward to hearing from you.

And this isn’t even 2,500 words, so no complaints about length.

  1. I considered using Go as my negative example, but in the time it took me to get this piece written they finally drafted some for Go 2

  2. C11 does, but it’s weird, and I can’t use C11 at work so I don’t know it that well (and nobody means C11 when they talk about C, they mean C99). 

  3. That is not at all how rustc works, but I’m not a compiler dev. I’m talking about abstract principles on whiteboards and paper, not in real programs.