Rust, self-referential structs and futures: part 1

In this post, we'll introduce self-referential structs, pinning, futures and executors, all of which will be used in a follow-up post.

Self-referential structs

Normally, if you move a struct that you own in Rust, everything is all right: all the fields it contains are moved together with it:

struct MyStruct {
    field: String,
}

fn main() {
    let my_struct = MyStruct {
        field: "hei".to_string(),
    };
    let my_struct_addr = &raw const my_struct;
    let field_addr = &raw const my_struct.field;

    // move the struct to the heap, then back to the stack
    let my_struct = *Box::new(my_struct);

    // confirm that the address of the struct and its field
    // before and after the move have changed
    assert_ne!(my_struct_addr, &raw const my_struct);
    assert_ne!(field_addr, &raw const my_struct.field);
}

And Rust must not let you violate memory safety without the use of unsafe:

struct MyStruct {
    field: String,
    pointer_field: *const String,
}

fn main() {
    let mut my_struct = MyStruct {
        field: "hei".to_string(),
        pointer_field: std::ptr::null(),
    };
    // my_struct.pointer_field is pointing to my_struct.field
    // so the struct is self-referential
    my_struct.pointer_field = &raw const my_struct.field;

    let field_addr = &raw const my_struct.field;
    let pointer_before_move = my_struct.pointer_field;

    // move the struct to the heap, then back to the stack
    let my_struct = *Box::new(my_struct);

    // confirm that the address of the struct field
    // before and after the move has changed
    assert_ne!(field_addr, &raw const my_struct.field);

    // confirm that pointer didn't change even though the field
    // address changed, so this pointer is now dangling!
    assert_eq!(pointer_before_move, my_struct.pointer_field);
}

The code above is still okay because a raw pointer is inert as long as it's not dereferenced, and dereferencing requires unsafe. If you added code that dereferenced this pointer, though, then either that code should live in an unsafe function (so the caller has to take care of the unsafety) or you need Pin.
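
To make the safe/unsafe boundary concrete, here is a minimal sketch. The helper name read_len_through_pointer is made up for illustration. Creating and copying the raw pointer is safe; only the dereference needs an unsafe block, and it is sound here only because the String has neither moved nor been dropped.

```rust
// Hypothetical helper, for illustration only.
fn read_len_through_pointer() -> usize {
    let text = "hei".to_string();

    // creating and copying a raw pointer needs no unsafe
    let pointer: *const String = &raw const text;
    let copy = pointer;

    // dereferencing does; this is sound only because `text` is still
    // alive and has not been moved since we took its address
    let text_ref: &String = unsafe { &*copy };
    text_ref.len()
}

fn main() {
    assert_eq!(read_len_through_pointer(), 3);
}
```

Had text been moved between taking the address and the dereference, the same unsafe block would read through a dangling pointer, which is undefined behavior.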

If you try to do something similar with safe Rust, you find yourself unable to move the struct:

struct MyStruct<'a> {
    field: String,
    pointer_field: &'a String,
}

fn main() {
    let x = "".to_string();
    let mut my_struct = MyStruct {
        field: "hei".to_string(),
        pointer_field: &x,
    };
    my_struct.pointer_field = &my_struct.field;

    // this won't work: compilation error
    // cannot move the struct because it is borrowed
    // let my_struct = *Box::new(my_struct);
}

Pinning

So what is Pin? It's basically a struct that helps us create a safe interface for interacting with self-referential objects.

Let's start with the Unpin trait. Most types in Rust are not self-referential and implement Unpin automatically. However, a type will not automatically implement Unpin if it has a field that doesn't implement it. So, let's consider the two cases.

Implements Unpin: the first case is easy. If a type T implements Unpin, then you can, with safe Rust, convert &mut T to Pin<&mut T> with Pin::new, and back with Pin::get_mut or Pin::into_inner. This is useful when a method expects a Pin<&mut T> but we have a &mut T.
Does not implement Unpin: then it's impossible, with safe Rust, to convert &mut T to Pin<&mut T> directly, but the macro std::pin::pin! allows us (with safe Rust) to convert an owned value T to Pin<&mut T>, permanently giving up direct access to the owned value (except through unsafe). In this case, it's also impossible, with safe Rust, to convert Pin<&mut T> back to &mut T. So, assuming you don't have ownership of T anymore (i.e. you didn't use unsafe Rust to get the Pin<&mut T> to begin with), it's not possible to move the value out of its location, and that place in memory is pinned.

So let's illustrate this with an example:

use std::{marker::PhantomPinned, pin::Pin};

#[derive(Debug)]
struct MyStruct {
    field: String,
}

#[derive(Debug)]
struct MyStructPP {
    field: String,
    pin: PhantomPinned,
}

fn main() {
    let mut my_struct = MyStruct {
        field: "hei".to_string(),
    };

    let pinned: Pin<&mut MyStruct> = Pin::new(&mut my_struct);
    let mut_ref: &mut MyStruct = Pin::into_inner(pinned);
    dbg!(mut_ref);

    let mut my_struct = MyStructPP {
        field: "hei".to_string(),
        pin: PhantomPinned,
    };

    // won't work: compilation error
    // let pinned: Pin<&mut MyStructPP> = Pin::new(&mut my_struct);

    let pinned: Pin<&mut MyStructPP> = std::pin::pin!(my_struct);

    // won't work: compilation error
    // the macro took away our owned value of my_struct
    // dbg!(my_struct);

    // won't work: compilation error
    // let mut_ref: &mut MyStructPP = Pin::into_inner(pinned);

    // we can only get back to a &mut with unsafe
    unsafe {
        let pointer_back: &mut MyStructPP = Pin::into_inner_unchecked(pinned);
    }
}
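
Putting the two sections together, here is a minimal sketch of how Pin can give the raw-pointer struct from the first section a safe interface. SelfRef, init and read_via_pointer are hypothetical names chosen for illustration, and this is one possible design rather than a canonical one. The idea is that the pointer is only ever set through a Pin<&mut Self>, and PhantomPinned prevents safe code from moving the value afterwards.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

// Hypothetical type for illustration; not from any library.
struct SelfRef {
    field: String,
    // null until `init` runs, then points at `field`
    pointer_field: *const String,
    // opts out of Unpin, so safe code cannot move a pinned SelfRef
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(text: &str) -> Self {
        SelfRef {
            field: text.to_string(),
            pointer_field: std::ptr::null(),
            _pin: PhantomPinned,
        }
    }

    // Taking Pin<&mut Self> means the value will not move afterwards,
    // so the stored address cannot go stale.
    fn init(self: Pin<&mut Self>) {
        // SAFETY: we only write to a field and never move the value out.
        let this = unsafe { self.get_unchecked_mut() };
        this.pointer_field = &raw const this.field;
    }

    fn read_via_pointer(self: Pin<&Self>) -> Option<&str> {
        if self.pointer_field.is_null() {
            return None; // `init` has not been called yet
        }
        // SAFETY: `init` ran on a pinned value, so the pointer
        // still points at `self.field`.
        let field: &String = unsafe { &*self.pointer_field };
        Some(field.as_str())
    }
}

fn main() {
    let mut pinned = pin!(SelfRef::new("hei"));
    assert_eq!(pinned.as_ref().read_via_pointer(), None);
    pinned.as_mut().init();
    assert_eq!(pinned.as_ref().read_via_pointer(), Some("hei"));
}
```

The unsafe code is confined to the two methods; callers only see safe functions that take a pinned reference.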

Futures, executors

To begin, here's an ultra-quick intro to futures and executors, assuming you already have some experience with async Rust.

A future in Rust can be created by an async function or an async block/closure:

async fn give_me_some_future() {}

fn main() {
    let _some_future = give_me_some_future();
    let _another_future = async {};
    let an_async_closure = async || {};
    let _another_future = an_async_closure();
}

Or manually by implementing the Future trait:

use std::{
    pin::Pin,
    task::{Context, Poll},
};
fn main() {
    let _some_future = SomeFuture(3);
}

struct SomeFuture(u8);
impl Future for SomeFuture {
    type Output = u8;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u8> {
        Poll::Ready(self.0)
    }
}

The manually implemented future makes it quite clear that futures are lazy: just creating a future doesn't, in general, do anything by itself. We need to await them, and for that we need an executor. For example, add tokio = { version = "1", features = ["rt", "macros"] } to your Cargo.toml, or run cargo add tokio --features=rt,macros, and then you can run this:

use std::{
    pin::Pin,
    task::{Context, Poll},
};

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let some_future = SomeFuture(3);
    assert_eq!(some_future.await, 3);
}

struct SomeFuture(u8);
impl Future for SomeFuture {
    type Output = u8;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u8> {
        Poll::Ready(self.0)
    }
}

What an async executor does, in simple terms, is keep calling the future's poll function until it gets a Poll::Ready.

Note: in practice it's not a busy loop that polls until it gets a Poll::Ready. Rather, the executor gives the future a Waker inside a Context, which the future uses to tell the executor that it's ready to be polled again and make progress, but this is (almost) beyond the scope of this post.

So we can simulate this with:

use std::{
    pin::Pin,
    task::{Context, Poll},
};

// run
// cargo add futures
// or add
// futures = "0.3"
// to your Cargo.toml
use futures::task::noop_waker;

fn main() {
    // create a dummy waker and context
    // just to make the type system work
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);

    let mut fut = SomeFuture(3);
    let pfut = Pin::new(&mut fut);
    assert_eq!(pfut.poll(&mut cx), Poll::Ready(3))
}

struct SomeFuture(u8);
impl Future for SomeFuture {
    type Output = u8;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u8> {
        Poll::Ready(self.0)
    }
}
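
Polling by hand also makes the laziness of futures observable. The following is a minimal sketch using only the standard library, so it needs no extra crates. NoopWake, CountingFuture and polls_before_and_after are hypothetical names for illustration, and the do-nothing waker is built with std's Wake trait instead of futures::task::noop_waker.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing, just to satisfy the type system.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Increments a shared counter every time it is polled.
struct CountingFuture<'a>(&'a Cell<u32>);
impl Future for CountingFuture<'_> {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        self.0.set(self.0.get() + 1);
        Poll::Ready(self.0.get())
    }
}

// Returns the poll count right after creation and after one poll.
fn polls_before_and_after() -> (u32, u32) {
    let polls = Cell::new(0);
    let mut fut = CountingFuture(&polls);
    // creating the future has not run any of its code
    let before = polls.get();

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let _ = Pin::new(&mut fut).poll(&mut cx);
    (before, polls.get())
}

fn main() {
    assert_eq!(polls_before_and_after(), (0, 1));
}
```

The counter only changes once poll is called, which is exactly the work an executor does on our behalf.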

To visualize the polling loop with something a little more complex:

use std::{
    pin::Pin,
    task::{Context, Poll},
    time::SystemTime,
};

use futures::task::noop_waker;

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);

    let mut fut = SomeFuture(0);
    let mut pfut = Pin::new(&mut fut);
    loop {
        match pfut.as_mut().poll(&mut cx) {
            Poll::Pending => {}
            Poll::Ready(val) => {
                println!("\nFinal result: {val}");
                break;
            }
        }
    }
}

struct SomeFuture(u64);
impl Future for SomeFuture {
    type Output = u64;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // Unix epoch timestamp in nanoseconds
        let timestamp = SystemTime::now()
            .duration_since(SystemTime::UNIX_EPOCH)
            .unwrap()
            .as_nanos();
        // A rudimentary random number generator
        let remainder_of_timestamp_divided_by_200 = timestamp % 200;
        if remainder_of_timestamp_divided_by_200 == 0 {
            Poll::Ready(self.0)
        } else {
            self.get_mut().0 += 1;
            // Tells the async executor that
            // we are ready to be polled again
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
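
The loop above polls continuously, whether or not the future asked for it. As the earlier note says, a real executor waits for the Waker instead. Here is a minimal sketch of that idea using only the standard library: the thread parks after a Poll::Pending and the waker unparks it. ThreadWaker, block_on and CountDown are hypothetical names for illustration, and this is a toy, not how tokio is implemented.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waking means unparking the thread that is driving the future.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy single-future executor: poll, then sleep until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // If the future already called wake, park returns at once.
            Poll::Pending => thread::park(),
        }
    }
}

// Returns Pending the given number of times before completing.
struct CountDown(u32);
impl Future for CountDown {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            // ask to be polled again
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    assert_eq!(block_on(CountDown(3)), "done");
}
```

If CountDown returned Poll::Pending without calling the waker, block_on would sleep forever, which is why a future must arrange a wake-up before returning Pending.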

Self-referential futures

Not every future is self-referential per se. In fact, none of the manually defined futures in the previous sections are self-referential.

Compiler-generated futures (from async functions and async blocks/closures) are always marked as self-referential (they don't implement Unpin), though, even when they technically are not, so even this won't compile:

let mut fut = async {};
let x = Pin::new(&mut fut); // compile error!
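
Such a future can still be pinned in safe Rust with the std::pin::pin! macro from the Pinning section. A minimal sketch, using only the standard library; NoopWake and poll_async_block_once are hypothetical names for illustration:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing, just to satisfy the type system.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_async_block_once() -> Poll<u8> {
    let fut = async { 7u8 };
    // Pin::new(&mut fut) would be rejected here;
    // pin! takes ownership of the future and pins it in place
    let mut fut = pin!(fut);

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    assert_eq!(poll_async_block_once(), Poll::Ready(7));
}
```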

To conclude, here is an example of a truly self-referential future, which we will explore in a follow-up post:

use std::{
    pin::Pin,
    task::{Context, Poll},
};

// This future is self-referential
async fn some_future() -> String {
    let mut mstr = "Hello".to_string();
    mstr.push('1');

    // let's call this the child future
    async {
        // when first polling the future, we will stop
        // at this point. Here the child future has borrowed mstr
        // from the parent future so if we move the parent future
        // it will move mstr too, but the child future will have
        // the old address of mstr
        pending_once().await;
        mstr.push('2');
    }
    .await;

    mstr.push('3');
    mstr
}

// This future is not self-referential
// So this compiles and runs:
// fn main() {
//     let mut fut = pending_once();
//     let _x = Pin::new(&mut fut);
// }
fn pending_once() -> impl Future<Output = ()> {
    struct PendingOnce {
        polled_once: bool,
        completed: bool,
    }
    impl Future for PendingOnce {
        type Output = ();

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            let this = self.get_mut();
            if this.completed {
                panic!("future resumed after completion")
            } else if this.polled_once {
                this.completed = true;
                Poll::Ready(())
            } else {
                cx.waker().wake_by_ref();
                this.polled_once = true;
                Poll::Pending
            }
        }
    }
    PendingOnce {
        polled_once: false,
        completed: false,
    }
}
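
To see this future run, here is a minimal, self-contained sketch that pins it once and polls it to completion by hand. It repeats some_future with a simplified stand-in for pending_once (a single flag, no panic on re-polling). NoopWake and drive are hypothetical names for illustration.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing, just to satisfy the type system.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Simplified stand-in for pending_once:
// Pending on the first poll, Ready on the second.
struct PendingOnce {
    polled_once: bool,
}
impl Future for PendingOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.polled_once {
            Poll::Ready(())
        } else {
            self.polled_once = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn some_future() -> String {
    let mut mstr = "Hello".to_string();
    mstr.push('1');
    async {
        PendingOnce { polled_once: false }.await;
        mstr.push('2');
    }
    .await;
    mstr.push('3');
    mstr
}

// Pins the future once, then polls it until it completes.
// Returns the number of polls and the output.
fn drive() -> (u32, String) {
    let mut fut = pin!(some_future());
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (polls, out);
        }
    }
}

fn main() {
    assert_eq!(drive(), (2, "Hello123".to_string()));
}
```

The first poll stops inside the child future, while it holds a borrow of mstr; because the parent stays pinned between the two polls, that borrow is still valid when the second poll resumes.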

Tags: #rust