My Random Posts

Yet another blog

Rust, self-referential structs and futures: part 1

Tags: rust

In this post, we'll introduce self-referential structs, pinning, futures, and executors, as groundwork for a follow-up post.

Self-referential structs

Normally, moving a struct you own in Rust is unproblematic: all of its fields are moved together:

struct MyStruct {
    field: String,
}

fn main() {
    let my_struct = MyStruct {
        field: "hei".to_string(),
    };
    let my_struct_addr = &raw const my_struct;
    let field_addr = &raw const my_struct.field;

    // move the struct to the heap, then back to the stack
    let my_struct = *Box::new(my_struct);

    // in a typical build, the address of the struct and its field
    // before and after the move will have changed
    assert_ne!(my_struct_addr, &raw const my_struct);
    assert_ne!(field_addr, &raw const my_struct.field);
}

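Things get trickier when a struct holds a pointer to one of its own fields. You can build such a struct with a raw pointer in safe Rust, but moving it leaves the pointer dangling, and Rust must not let you violate memory safety without the use of unsafe: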

struct MyStruct {
    field: String,
    pointer_field: *const String,
}

fn main() {
    let mut my_struct = MyStruct {
        field: "hei".to_string(),
        pointer_field: std::ptr::null(),
    };
    // my_struct.pointer_field is pointing to my_struct.field
    // so the struct is self-referential
    my_struct.pointer_field = &raw const my_struct.field;

    let field_addr = &raw const my_struct.field;
    let pointer_before_move = my_struct.pointer_field;

    // move the struct to the heap, then back to the stack
    let my_struct = *Box::new(my_struct);

    // in a typical build, the address of the struct field
    // before and after the move will have changed
    assert_ne!(field_addr, &raw const my_struct.field);

    // confirm that pointer didn't change even though the field
    // address changed, so this pointer is now dangling!
    assert_eq!(pointer_before_move, my_struct.pointer_field);
}

The code above is still sound because a raw pointer is inert as long as it isn't dereferenced, and dereferencing it requires unsafe. If you added code that dereferenced this pointer, that code would be responsible for upholding the invariant that the pointer still points to a valid String.
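To make that invariant concrete, here is a minimal sketch in which the pointer is re-pointed at the field's new address after the move, and only then dereferenced inside unsafe. The helper name move_and_repair is hypothetical, not something from the post:

```rust
struct MyStruct {
    field: String,
    pointer_field: *const String,
}

// Hypothetical helper: build the self-referential struct, move it,
// repair the self-pointer, and only then dereference it.
fn move_and_repair() -> String {
    let mut my_struct = MyStruct {
        field: "hei".to_string(),
        pointer_field: std::ptr::null(),
    };
    my_struct.pointer_field = &raw const my_struct.field;

    // move the struct to the heap, then back to the stack;
    // pointer_field may now be dangling
    let mut my_struct = *Box::new(my_struct);

    // re-establish the invariant: point at the field's current address
    my_struct.pointer_field = &raw const my_struct.field;

    // SAFETY: pointer_field was just set to the address of
    // my_struct.field, and my_struct has not moved since
    let value: &String = unsafe { &*my_struct.pointer_field };
    value.clone()
}

fn main() {
    let value = move_and_repair();
    println!("{value}");
}
```

Forgetting the repair step would not be caught by the compiler; the unsafe block is where the programmer takes over that responsibility.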

If you try to do something similar with safe Rust, you will find yourself unable to move the struct:

struct MyStruct<'a> {
    field: String,
    pointer_field: &'a String,
}

fn main() {
    let x = "".to_string();
    let mut my_struct = MyStruct {
        field: "hei".to_string(),
        pointer_field: &x,
    };
    my_struct.pointer_field = &my_struct.field;

    // this won't work: compilation error
    // cannot move the struct because it is borrowed
    // let my_struct = *Box::new(my_struct);
}

Pinning

So what is Pin? Pin<Ptr> is a wrapper around a pointer type such as &mut T or Box<T>. It helps us build safe interfaces for values whose validity depends on their address staying stable, such as self-referential structs.

Let's start with the Unpin trait. It is an auto trait: most types in Rust are not self-referential and implement Unpin automatically. However, a type does not automatically implement Unpin if any of its fields doesn't implement it (std::marker::PhantomPinned is a marker type whose purpose is exactly to opt out of Unpin). So there are two cases to consider:

  • Implements Unpin: the first case is easy; if a type T implements Unpin, then pinning doesn't impose extra restrictions, and you can, with safe Rust, convert &mut T to or from Pin<&mut T> with Pin::new and Pin::get_mut/Pin::into_inner. This is useful if some method expects a Pin<&mut T> but we have a &mut T.

  • Does not implement Unpin: then safe Rust won't let you convert &mut T to Pin<&mut T> with Pin::new. Instead, the std::pin::pin! macro takes ownership of a value of type T, moves it into a local place that you can no longer name, and gives you a Pin<&mut T> to it. Another common option is Box::pin, which moves the value to the heap and gives you a Pin<Box<T>>. Safe Rust also won't let you convert Pin<&mut T> back to &mut T. So, as long as you no longer own the T (i.e. you didn't use unsafe Rust to get the Pin<&mut T> to begin with), there is no way to move the value out of its location: that place in memory is pinned.
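A quick way to see the auto-trait rule at compile time is a generic function with a T: Unpin bound. The types Plain, Pinned, and Wrapper and the helper assert_unpin below are hypothetical, made up for this sketch. It also shows Pin::get_mut for the Unpin case:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical types, named for this sketch only.
struct Plain {
    field: String,
}

struct Pinned {
    field: String,
    _pin: PhantomPinned,
}

// Unpin propagates through fields: Wrapper is !Unpin because
// its field `inner` is !Unpin.
struct Wrapper {
    inner: Pinned,
}

// Compiles only when T implements Unpin.
fn assert_unpin<T: Unpin>() {}

fn main() {
    assert_unpin::<Plain>();
    assert_unpin::<String>();

    // won't work: compilation error, PhantomPinned is !Unpin
    // assert_unpin::<Pinned>();
    // won't work: compilation error, Wrapper contains a Pinned
    // assert_unpin::<Wrapper>();

    // Plain is Unpin, so Pin::new and Pin::get_mut are both safe
    let mut plain = Plain { field: "hei".to_string() };
    let pinned: Pin<&mut Plain> = Pin::new(&mut plain);
    let back: &mut Plain = pinned.get_mut();
    back.field.push('!');
    println!("{}", plain.field);
}
```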

So let's illustrate this with an example:

use std::{marker::PhantomPinned, pin::Pin};

#[derive(Debug)]
struct MyStruct {
    field: String,
}

#[derive(Debug)]
struct MyStructPP {
    field: String,
    pin: PhantomPinned,
}

fn main() {
    let mut my_struct = MyStruct {
        field: "hei".to_string(),
    };

    let pinned: Pin<&mut MyStruct> = Pin::new(&mut my_struct);
    let mut_ref: &mut MyStruct = Pin::into_inner(pinned);
    dbg!(mut_ref);

    let my_struct = MyStructPP {
        field: "hei".to_string(),
        pin: PhantomPinned,
    };

    // won't work: compilation error
    // let pinned: Pin<&mut MyStructPP> = Pin::new(&mut my_struct);

    let pinned: Pin<&mut MyStructPP> = std::pin::pin!(my_struct);

    // won't work: compilation error
    // the macro took away our owned value of my_struct
    // dbg!(my_struct);

    // won't work: compilation error
    // let mut_ref: &mut MyStructPP = Pin::into_inner(pinned);

    // we can only get back to a &mut with unsafe
    unsafe {
        let pointer_back: &mut MyStructPP = Pin::into_inner_unchecked(pinned);
        dbg!(pointer_back);
    }
}
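Box::pin is only mentioned in passing above, so here is a small sketch of it with a MyStructPP like the one in the example. Unlike pin!, the pinned value lives on the heap, so the Pin<Box<T>> can be returned from a function and moved around: only the Box pointer moves, while the value keeps its address. make_pinned is a hypothetical helper:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct MyStructPP {
    field: String,
    _pin: PhantomPinned,
}

// Returning a pinned value is fine here because it lives on the heap,
// not in this function's stack frame (which pin! would use).
fn make_pinned() -> Pin<Box<MyStructPP>> {
    Box::pin(MyStructPP {
        field: "hei".to_string(),
        _pin: PhantomPinned,
    })
}

fn main() {
    let pinned = make_pinned();
    let addr_before: *const String = &pinned.field;

    // moving the Pin<Box<_>> (here, into a Vec) only moves the pointer
    let moved: Vec<Pin<Box<MyStructPP>>> = vec![pinned];
    let addr_after: *const String = &moved[0].field;
    assert_eq!(addr_before, addr_after);

    // shared access goes through Deref
    println!("{}", moved[0].field);
}
```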

Futures, executors

To begin, here's an ultra-quick intro to futures and executors, assuming you already have some experience with async Rust.

A future in Rust can be created by an async function, an async block, or an async closure (async closures are stable since Rust 1.85):

async fn give_me_some_future() {}

fn main() {
    let _some_future = give_me_some_future();
    let _another_future = async {};
    let an_async_closure = async || {};
    let _another_future = an_async_closure();
}
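As a rough mental model, an async fn behaves like a plain function returning an async block (the real desugaring also deals with parameters and lifetimes). The sketch below assumes Rust 1.85 or later, for async closures and std::task::Waker::noop, and uses a hypothetical poll_once helper to poll each kind of future a single time:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// An async function...
async fn three() -> u8 {
    3
}

// ...is roughly equivalent to a plain function returning an async block
fn three_desugared() -> impl Future<Output = u8> {
    async { 3 }
}

// Hypothetical helper: pin the future on the stack and poll it once
// with a waker that does nothing (Waker::noop needs Rust 1.85+).
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let mut cx = Context::from_waker(Waker::noop());
    let fut = pin!(fut);
    fut.poll(&mut cx)
}

fn main() {
    println!("{:?}", poll_once(three()));
    println!("{:?}", poll_once(three_desugared()));

    let an_async_closure = async || 3u8;
    println!("{:?}", poll_once(an_async_closure()));
}
```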

Or manually by implementing the Future trait:

use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

fn main() {
    let _some_future = SomeFuture(3);
}

struct SomeFuture(u8);
impl Future for SomeFuture {
    type Output = u8;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u8> {
        Poll::Ready(self.0)
    }
}

The manually implemented future makes it clear that futures are lazy: in general, just creating a future doesn't do anything by itself. Something has to poll it, which is what awaiting it inside an async context driven by an executor does. For example, add tokio = { version = "1", features = ["rt", "macros"] } to your Cargo.toml (or run cargo add tokio --features=rt,macros), and then you can run this:

use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let some_future = SomeFuture(3);
    assert_eq!(some_future.await, 3);
}

struct SomeFuture(u8);
impl Future for SomeFuture {
    type Output = u8;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u8> {
        Poll::Ready(self.0)
    }
}
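If you'd like to observe the laziness without tokio, here is a std-only sketch (again assuming Rust 1.85+ for Waker::noop; laziness_demo is a hypothetical name) in which an async block's body only runs once the future is polled:

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// Returns how many times the async block's body had run
// (before polling, after one poll).
fn laziness_demo() -> (u32, u32) {
    let runs = Cell::new(0);
    let fut = async {
        runs.set(runs.get() + 1);
        42
    };
    // creating the future has not run its body
    let before = runs.get();

    // polling it once runs the body to completion
    let mut cx = Context::from_waker(Waker::noop());
    let mut fut = pin!(fut);
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(42));
    let after = runs.get();

    (before, after)
}

fn main() {
    let (before, after) = laziness_demo();
    println!("body ran {before} time(s) before polling, {after} after");
}
```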

What an async executor does, in simple terms, is poll futures until they return Poll::Ready.

Note: in practice, it's not a busy loop that keeps polling until it gets Poll::Ready. Instead, the executor gives the future a Waker inside a Context. A future that returns Poll::Pending is responsible for arranging for that Waker to be woken once it can make progress, and the executor waits for that signal before polling it again. The details are mostly beyond the scope of this post.

So we can simulate an executor polling a future once, by hand:

use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

// run
// cargo add futures
// or add
// futures = "0.3"
// to your Cargo.toml
use futures::task::noop_waker;

fn main() {
    // create a dummy waker and context
    // just to make the type system work
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);

    let mut fut = SomeFuture(3);
    let pfut = Pin::new(&mut fut);
    assert_eq!(pfut.poll(&mut cx), Poll::Ready(3))
}

struct SomeFuture(u8);
impl Future for SomeFuture {
    type Output = u8;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u8> {
        Poll::Ready(self.0)
    }
}

Or, to visualize it with something a little more complex, here is a busy loop that keeps polling a future until it returns Poll::Ready:

use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
    time::SystemTime,
};

use futures::task::noop_waker;

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);

    let mut fut = SomeFuture(0);
    let mut pfut = Pin::new(&mut fut);
    loop {
        match pfut.as_mut().poll(&mut cx) {
            Poll::Pending => {}
            Poll::Ready(val) => {
                println!("\nFinal result: {val}");
                break;
            }
        }
    }
}

struct SomeFuture(u64);
impl Future for SomeFuture {
    type Output = u64;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // Unix epoch timestamp in nanoseconds
        let timestamp = SystemTime::now()
            .duration_since(SystemTime::UNIX_EPOCH)
            .unwrap()
            .as_nanos();
        // A rudimentary random number generator
        let remainder_of_timestamp_divided_by_200 = timestamp % 200;
        if remainder_of_timestamp_divided_by_200 == 0 {
            Poll::Ready(self.0)
        } else {
            self.get_mut().0 += 1;
            // Tells the async executor that
            // we are ready to be polled again
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
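The loop above is a busy loop, which real executors avoid. A minimal sketch of a waker-aware executor, similar in spirit to the block_on example in the std::task::Wake documentation, parks the thread on Poll::Pending and lets the Waker unpark it. block_on, ThreadWaker, and CountDown are hypothetical names, and this only drives a single future on the current thread:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A waker that unparks the thread running the executor.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Poll; on Pending, sleep until the waker is used, then poll again.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(val) => return val,
            // park returns after unpark was called (or spuriously),
            // so looping back to poll again is always correct
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical future: Pending a fixed number of times, waking itself each time.
struct CountDown(u32);

impl Future for CountDown {
    type Output = &'static str;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        if this.0 == 0 {
            Poll::Ready("done")
        } else {
            this.0 -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    println!("{}", block_on(CountDown(3)));
    println!("{}", block_on(async { 1 + 2 }));
}
```

Because CountDown wakes itself before returning Poll::Pending, the unpark happens before park, and park then returns immediately instead of blocking forever.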

Self-referential futures

Not every future is self-referential; in fact, none of the manually defined futures in the previous sections are.

Compiler-generated futures (from async functions, async blocks, and async closures), on the other hand, are always !Unpin in current Rust, even when they aren't actually self-referential, so even this won't compile:

use std::pin::Pin;

fn main() {
    let mut fut = async {};
    let _x = Pin::new(&mut fut); // compile error: `async {}` is !Unpin
}
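To actually poll a compiler-generated future by hand, you pin it first, for example with std::pin::pin! on the stack or Box::pin on the heap. A minimal std-only sketch, assuming Rust 1.85+ for Waker::noop (poll_both is a hypothetical helper):

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, Waker};

fn poll_both() -> (Poll<u8>, Poll<u8>) {
    let mut cx = Context::from_waker(Waker::noop());

    // pin! pins the !Unpin async block on the stack
    let mut on_stack = pin!(async { 1u8 });
    let a = on_stack.as_mut().poll(&mut cx);

    // Box::pin pins it on the heap (here as a trait object)
    let mut on_heap: Pin<Box<dyn Future<Output = u8>>> = Box::pin(async { 2u8 });
    let b = on_heap.as_mut().poll(&mut cx);

    (a, b)
}

fn main() {
    let (a, b) = poll_both();
    println!("{a:?} {b:?}");
}
```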

To conclude, here is an example of a truly self-referential future, which we will explore in a follow-up post:

use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

// This future is self-referential
async fn some_future() -> String {
    let mut mstr = "Hello".to_string();
    mstr.push('1');

    // let's call this the child future
    async {
        // On the first poll, execution reaches pending_once().await
        // and returns Poll::Pending. At that point, the parent future's
        // state contains mstr and also contains the child future, which
        // has borrowed mstr. Moving the parent future after this point
        // could invalidate that internal reference.
        pending_once().await;
        mstr.push('2');
    }
    .await;

    mstr.push('3');
    mstr
}

// This future is not self-referential
// So this compiles and runs:
// fn main() {
//     let mut fut = pending_once();
//     let _x = Pin::new(&mut fut);
// }
fn pending_once() -> impl Future<Output = ()> {
    struct PendingOnce {
        polled_once: bool,
        completed: bool,
    }
    impl Future for PendingOnce {
        type Output = ();

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            let this = self.get_mut();
            if this.completed {
                panic!("future resumed after completion")
            } else if this.polled_once {
                this.completed = true;
                Poll::Ready(())
            } else {
                cx.waker().wake_by_ref();
                this.polled_once = true;
                Poll::Pending
            }
        }
    }
    PendingOnce {
        polled_once: false,
        completed: false,
    }
}
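To see this future in action, here is a sketch that pins it with Box::pin before the first poll and drives it with a no-op waker, recording each poll result. To stay self-contained it repeats some_future and uses a condensed PendingOnce without the completion panic check. drive is a hypothetical helper, and Waker::noop requires Rust 1.85+:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

async fn some_future() -> String {
    let mut mstr = "Hello".to_string();
    mstr.push('1');
    async {
        pending_once().await;
        mstr.push('2');
    }
    .await;
    mstr.push('3');
    mstr
}

// Condensed version of the post's PendingOnce: Pending once, then Ready.
struct PendingOnce(bool);

impl Future for PendingOnce {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut();
        if this.0 {
            Poll::Ready(())
        } else {
            this.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn pending_once() -> PendingOnce {
    PendingOnce(false)
}

// Polls some_future to completion, collecting every poll result.
fn drive() -> Vec<Poll<String>> {
    let mut cx = Context::from_waker(Waker::noop());
    // Box::pin gives the future a stable heap address before the first
    // poll; after that poll, its state holds a borrow of its own mstr
    let mut fut = Box::pin(some_future());
    let mut results = Vec::new();
    loop {
        let poll = fut.as_mut().poll(&mut cx);
        let done = poll.is_ready();
        results.push(poll);
        if done {
            return results;
        }
    }
}

fn main() {
    for poll in drive() {
        println!("{poll:?}");
    }
}
```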

Follow-up post