In this post, we dive deeper into self-referential futures in Rust, manually implemented with unsafe Rust.
To get a quick introduction (or recap) to futures, executors, and pinning, check my previous post Rust, self-referential structs and futures: part 1.
A simple non self-referential future
Not every future is self-referential. Here is an example of a non-self-referential future, which we will reuse in the next section:
use std::{
pin::Pin,
task::{Context, Poll},
};
use futures::task::noop_waker;
fn main() {
let mut fut = PendingOnce::new();
let waker = noop_waker();
let mut cx = Context::from_waker(&waker);
// This can only be called for a type that
// implements Unpin (i.e. it is not self-referential)
// otherwise we need to call an unsafe function
let mut pfut = Pin::new(&mut fut);
assert!(pfut.as_mut().poll(&mut cx).is_pending());
assert!(pfut.as_mut().poll(&mut cx).is_ready());
// we can even call this to get out of the Pin safely
// pfut.get_mut();
}
struct PendingOnce {
polled_once: bool,
completed: bool,
}
impl Future for PendingOnce {
type Output = ();
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
let this = self.get_mut();
if this.completed {
panic!("future resumed after completion")
} else if this.polled_once {
this.completed = true;
Poll::Ready(())
} else {
cx.waker().wake_by_ref();
this.polled_once = true;
Poll::Pending
}
}
}
impl PendingOnce {
fn new() -> Self {
PendingOnce {
polled_once: false,
completed: false,
}
}
}
Note that not a single unsafe block was needed here: we were able to define and use a future that implements Unpin, i.e. an object that can be moved in memory, even after it has been pinned, without risking memory-safety violations.
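To make the Unpin point a bit more concrete, here is a minimal sketch using only the standard library. The names Plain, Opted, assert_unpin and pin_then_move are hypothetical and exist only for this illustration. It shows that a struct whose fields are all Unpin can be pinned, mutated through the Pin, and then moved again using safe code only, while a struct containing PhantomPinned fails the same compile-time check.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical stand-in for PendingOnce: every field is Unpin,
// so the struct is automatically Unpin as well.
struct Plain {
    polled_once: bool,
}

// Hypothetical type that opts out of Unpin through the PhantomPinned marker.
#[allow(dead_code)]
struct Opted {
    _pin: PhantomPinned,
}

// This function only compiles for T: Unpin, so calling it
// acts as a compile-time assertion.
fn assert_unpin<T: Unpin>() {}

pub fn pin_then_move() -> bool {
    assert_unpin::<Plain>();
    // assert_unpin::<Opted>(); // would not compile: Opted is not Unpin

    let mut value = Plain { polled_once: false };
    // Pin::new is a safe function because Plain is Unpin.
    let mut pinned = Pin::new(&mut value);
    // get_mut is also safe for Unpin types.
    pinned.as_mut().get_mut().polled_once = true;
    // Moving the value after it has been pinned is fine for an Unpin type.
    let moved = value;
    moved.polled_once
}
```

Calling pin_then_move() from main should return true, and uncommenting the Opted line should turn into a compile error rather than a runtime problem.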
The consequences of moving a self-referential future
Now let's define a truly self-referential future and move it with unsafe in an unsound way to see what happens:
use std::{
pin::{Pin, pin},
task::{Context, Poll},
};
use futures::task::noop_waker;
async fn some_future() -> String {
// let's call this the parent future
let mut mstr = "Hello".to_string();
mstr.push('1');
println!("mstr addr: {:p}", &mstr);
dbg!(&mstr);
// let's call this the child future
async {
// when first polling the future, we will stop
// at this point. Here the child future has borrowed mstr
// from the parent future so if we move parent future
// it will move mstr too, but the child future will have
// the old address of mstr
pending_once().await;
mstr.push('2');
println!("mstr addr: {:p}", &mstr);
dbg!(&mstr);
}
.await;
mstr.push('3');
println!("mstr addr: {:p}", &mstr);
dbg!(&mstr);
mstr
}
fn run(with_move: bool) -> String {
println!("==== Run with with_move set to {with_move} ====");
let waker = noop_waker();
let mut cx = Context::from_waker(&waker);
let mstr =
// branch with UB
if with_move {
let mut fut = some_future();
let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
let _ = pfut.as_mut().poll(&mut cx);
let fut_addr = &raw const fut;
// UB (undefined behaviour) is created here
// move the future to the heap, then back to the stack
let mut fut = *Box::new(fut);
// confirm that the address of the future
// before and after the move has changed
assert_ne!(fut_addr, &raw const fut);
let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
pfut.as_mut().poll(&mut cx)
// branch without UB
} else {
let fut = some_future();
// this macro would be equivalent to calling
// let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
// as before and then making sure that
// we never use the owned fut variable again
let mut pfut = pin!(fut);
let _ = pfut.as_mut().poll(&mut cx);
pfut.as_mut().poll(&mut cx)
};
println!("==== Done ====");
if let Poll::Ready(mstr) = mstr {
mstr
} else {
panic!()
}
}
fn main() {
let string_with_move = run(true);
let string_without_move = run(false);
dbg!(string_with_move);
dbg!(string_without_move);
}
fn pending_once() -> impl Future<Output = ()> {
struct PendingOnce {
polled_once: bool,
completed: bool,
}
impl Future for PendingOnce {
type Output = ();
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
let this = self.get_mut();
if this.completed {
panic!("future resumed after completion")
} else if this.polled_once {
this.completed = true;
Poll::Ready(())
} else {
cx.waker().wake_by_ref();
this.polled_once = true;
Poll::Pending
}
}
}
PendingOnce {
polled_once: false,
completed: false,
}
}
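Which outputs something like the following (the addresses will vary from run to run and, because the first branch contains UB, nothing about its output is guaranteed):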
==== Run with with_move set to true ====
mstr addr: 0x7ffdd3c244b8
[src/main.rs:13:5] &mstr = "Hello1"
mstr addr: 0x7ffdd3c244b8
[src/main.rs:24:9] &mstr = "Hello12"
mstr addr: 0x7ffdd3c24510
[src/main.rs:29:5] &mstr = "Hello13"
==== Done ====
==== Run with with_move set to false ====
mstr addr: 0x7ffdd3c24600
[src/main.rs:13:5] &mstr = "Hello1"
mstr addr: 0x7ffdd3c24600
[src/main.rs:24:9] &mstr = "Hello12"
mstr addr: 0x7ffdd3c24600
[src/main.rs:29:5] &mstr = "Hello123"
==== Done ====
[src/main.rs:78:5] string_with_move = "Hello13"
[src/main.rs:79:5] string_without_move = "Hello123"
[Finished running. Exit status: 0]
Notice how, in the UB branch, the address of mstr changed in the parent future after the move, while the child future kept using the old address.
It might be surprising that the UB branch didn't end in a segmentation fault: push('2') apparently modified the string successfully despite going through the stale address. However, the result of the parent future's next push is as if the previous push had never happened.
This is probably because a String is a small header that holds a pointer to a heap-allocated buffer together with the length (and the capacity, but that is not relevant here). Moving the String copies this header to a new address but does not move the heap buffer itself. The child's push('2') wrote into the shared buffer and bumped the length stored in the old header, while the moved header still recorded a length of 6, so push('3') overwrote the '2'. Since this is UB, none of this behaviour is guaranteed.
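The part of this explanation that does not involve UB can be checked with safe code. The sketch below (move_string_to_heap is a hypothetical name, not something from the examples above) moves a String from the stack into a Box and compares both the address of the String value itself and the address of its heap buffer before and after the move.

```rust
/// Returns (header address changed, heap buffer address unchanged).
pub fn move_string_to_heap() -> (bool, bool) {
    let s = String::from("Hello1");
    // Address of the String value itself (pointer, length, capacity).
    let header_before = &s as *const String as usize;
    // Address of the heap buffer that stores the bytes.
    let buffer_before = s.as_str().as_ptr() as usize;

    // Move the String into a Box: the header now lives on the heap.
    let boxed = Box::new(s);
    let header_after = &*boxed as *const String as usize;
    let buffer_after = boxed.as_str().as_ptr() as usize;

    (
        header_before != header_after,
        buffer_before == buffer_after,
    )
}
```

Both components should come out as true: the move relocates the header but leaves the bytes in place, which is consistent with a stale reference to the old header still reaching the same buffer.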
Manually implement our previous future
For didactic purposes, let's now manually implement the previous future:
use std::{
marker::PhantomPinned,
pin::{Pin, pin},
task::{Context, Poll},
};
use futures::task::noop_waker;
fn run(with_move: bool) -> String {
println!("==== Run with with_move set to {with_move} ====");
let waker = noop_waker();
let mut cx = Context::from_waker(&waker);
let mstr =
// branch with UB
if with_move {
let mut fut = some_future_fancy();
let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
let _ = pfut.as_mut().poll(&mut cx);
let fut_addr = &raw const fut;
// UB (undefined behaviour) is created here
// move the future to the heap, then back to the stack
let mut fut = *Box::new(fut);
// confirm that the address of the future
// before and after the move has changed
assert_ne!(fut_addr, &raw const fut);
let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
pfut.as_mut().poll(&mut cx)
// branch without UB
} else {
let fut = some_future_fancy();
// this macro would be equivalent to calling
// let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
// as before and then making sure that
// we never use the owned fut variable again
let mut pfut = pin!(fut);
let _ = pfut.as_mut().poll(&mut cx);
pfut.as_mut().poll(&mut cx)
};
println!("==== Done ====");
if let Poll::Ready(mstr) = mstr {
mstr
} else {
panic!()
}
}
fn main() {
let string_with_move = run(true);
let string_without_move = run(false);
dbg!(string_with_move);
dbg!(string_without_move);
}
fn some_future_fancy() -> impl Future<Output = String> {
struct MainFuture<'a> {
mstr: String,
subfuture: Option<SubFuture<'a>>,
state: u8,
_pin: PhantomPinned,
}
impl<'a> MainFuture<'a> {
fn new(mstr: String) -> Self {
MainFuture {
mstr,
subfuture: None,
state: 0,
_pin: PhantomPinned,
}
}
}
impl<'a> Future for MainFuture<'a> {
type Output = String;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
let this = unsafe { self.get_unchecked_mut() };
if this.state == 0 {
this.mstr.push('1');
println!("mstr addr: {:p}", &this.mstr);
dbg!(&this.mstr);
let raw_mstr = &raw mut this.mstr;
this.subfuture = Some(SubFuture {
mstr: unsafe { &mut *raw_mstr },
state: 0,
});
this.state = 1;
}
if this.state == 1 {
match Pin::new(this.subfuture.as_mut().unwrap()).poll(cx) {
Poll::Ready(()) => {
this.subfuture = None;
this.mstr.push('3');
println!("mstr addr: {:p}", &this.mstr);
dbg!(&this.mstr);
this.state = 2;
return Poll::Ready(this.mstr.clone());
}
Poll::Pending => {
return Poll::Pending;
}
}
}
panic!()
}
}
struct SubFuture<'a> {
mstr: &'a mut String,
state: u8,
}
impl<'a> Future for SubFuture<'a> {
type Output = ();
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
let this = self.get_mut();
let mstr = &mut *this.mstr;
if this.state == 0 {
this.state = 1;
cx.waker().wake_by_ref();
return Poll::Pending;
}
if this.state == 1 {
mstr.push('2');
println!("mstr addr: {mstr:p}");
dbg!(&mstr);
this.state = 2;
return Poll::Ready(());
}
panic!()
}
}
MainFuture::new("Hello".to_string())
}
This should give us output similar to before.
More efficiency, more unsafety
While we are at it, we can use some extra unsafe Rust and make the code more efficient:
- Use MaybeUninit for storing the future instead of an Option. This is slightly better because, depending on the layout of the future, Option<SubFuture<'a>> can take more space than SubFuture<'a>, which is never the case with MaybeUninit<SubFuture<'a>>. Moreover, MaybeUninit has some potentially faster methods like assume_init (although Option has unwrap_unchecked, which should be equivalent or close to it, but at that point it is unsafe anyway).
- Also use MaybeUninit for storing the String. This avoids the String clone when the future returns Ready (I wouldn't be surprised if the compiler avoided the cloning operation altogether through optimizations, though). We could also avoid the clone with Option<String>::take, but that would spoil the fun of this section. Well, speaking of spoiling the fun, I have to mention that Option<String> has the same size as String due to niche optimizations, so my previous excuse falls short here.
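The size claims in the list above can be checked with std::mem::size_of, and the Option<String>::take alternative is easy to sketch in safe code. Everything named here (option_overhead, maybe_uninit_overhead, Holder, finish) is hypothetical and only serves this illustration; it is not how the futures in this post are written.

```rust
use std::mem::{size_of, MaybeUninit};

/// Extra bytes that wrapping T in an Option costs.
pub fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

/// Extra bytes that wrapping T in a MaybeUninit costs.
pub fn maybe_uninit_overhead<T>() -> usize {
    size_of::<MaybeUninit<T>>() - size_of::<T>()
}

// Hypothetical stand-in for a future that owns the String it will return.
pub struct Holder {
    mstr: Option<String>,
}

impl Holder {
    pub fn new(mstr: String) -> Self {
        Holder { mstr: Some(mstr) }
    }

    /// Moves the String out and leaves None behind, so no clone is needed.
    /// Panics if called a second time.
    pub fn finish(&mut self) -> String {
        self.mstr.take().expect("finish called twice")
    }
}
```

A u64 has no spare bit pattern, so option_overhead::<u64>() should be greater than zero, whereas a reference can never be null and option_overhead::<&mut String>() should be zero. maybe_uninit_overhead should be zero for any type, since MaybeUninit<T> is documented to have the same size as T.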
use std::{
marker::PhantomPinned,
mem::MaybeUninit,
pin::{Pin, pin},
task::{Context, Poll},
};
use futures::task::noop_waker;
fn run(with_move: bool) -> String {
println!("==== Run with with_move set to {with_move} ====");
let waker = noop_waker();
let mut cx = Context::from_waker(&waker);
let mstr =
// branch with UB
if with_move {
let mut fut = some_future_ultra_fancy();
let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
let _ = pfut.as_mut().poll(&mut cx);
let fut_addr = &raw const fut;
// UB (undefined behaviour) is created here
// move the future to the heap, then back to the stack
let mut fut = *Box::new(fut);
// confirm that the address of the future
// before and after the move has changed
assert_ne!(fut_addr, &raw const fut);
let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
pfut.as_mut().poll(&mut cx)
// branch without UB
} else {
let fut = some_future_ultra_fancy();
// this macro would be equivalent to calling
// let mut pfut = unsafe { Pin::new_unchecked(&mut fut) };
// as before and then making sure that
// we never use the owned fut variable again
let mut pfut = pin!(fut);
let _ = pfut.as_mut().poll(&mut cx);
pfut.as_mut().poll(&mut cx)
};
println!("==== Done ====");
if let Poll::Ready(mstr) = mstr {
mstr
} else {
panic!()
}
}
fn main() {
let string_with_move = run(true);
let string_without_move = run(false);
dbg!(string_with_move);
dbg!(string_without_move);
}
fn some_future_ultra_fancy() -> impl Future<Output = String> {
struct MainFuture<'a> {
mstr: MaybeUninit<String>,
subfuture: MaybeUninit<SubFuture<'a>>,
state: u8,
_pin: PhantomPinned,
}
impl<'a> MainFuture<'a> {
fn new(mstr: String) -> Self {
MainFuture {
mstr: MaybeUninit::new(mstr),
subfuture: MaybeUninit::uninit(),
state: 0,
_pin: PhantomPinned,
}
}
}
impl<'a> Future for MainFuture<'a> {
type Output = String;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
let this = unsafe { self.get_unchecked_mut() };
if this.state == 0 {
unsafe { this.mstr.assume_init_mut().push('1') };
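// go through assume_init_ref so that dbg! prints the String
// itself rather than the MaybeUninit wrapper
println!("mstr addr: {:p}", unsafe { this.mstr.assume_init_ref() });
dbg!(unsafe { this.mstr.assume_init_ref() });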
let raw_mstr = unsafe { &raw mut *this.mstr.assume_init_mut() };
this.subfuture.write(SubFuture {
mstr: unsafe { &mut *raw_mstr },
state: 0,
});
this.state = 1;
}
if this.state == 1 {
match Pin::new(unsafe { this.subfuture.assume_init_mut() }).poll(cx) {
Poll::Ready(()) => {
unsafe { this.subfuture.assume_init_drop() };
unsafe { this.mstr.assume_init_mut().push('3') };
println!("mstr addr: {:p}", unsafe { this.mstr.assume_init_ref() });
dbg!(unsafe { this.mstr.assume_init_ref() });
this.state = 2;
return Poll::Ready(unsafe { this.mstr.assume_init_read() });
}
Poll::Pending => {
return Poll::Pending;
}
}
}
panic!()
}
}
struct SubFuture<'a> {
mstr: &'a mut String,
state: u8,
}
impl<'a> Future for SubFuture<'a> {
type Output = ();
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
let this = self.get_mut();
let mstr = &mut *this.mstr;
if this.state == 0 {
this.state = 1;
cx.waker().wake_by_ref();
return Poll::Pending;
}
if this.state == 1 {
mstr.push('2');
println!("mstr addr: {mstr:p}");
dbg!(&mstr);
this.state = 2;
return Poll::Ready(());
}
panic!()
}
}
MainFuture::new("Hello".to_string())
}
Which again should produce roughly the same output as before.
Note 1: the efficiency increase here should be really tiny, so this is more for didactic purposes (it would make more sense if we were creating or cloning a lot of Options in a tight loop).
Note 2: I ended up creating a lot of unsafe blocks, to the point that it might have been more readable to just put the whole thing in an unsafe block, but for didactic purposes I wanted to make it more explicit where unsafe code was being called.
Note 3: All unsafe code in this article is Miri-approved, except, of course, for the intentional UB inside the run function. You can try it yourself with cargo +nightly miri run.