Integrating a large Tokio-based Rust library with Haskell

One of the hallmarks of Rust is its relative ease of interoperability with other languages. Python, Ruby, and Node.js all have fairly robust interoperability bridges with Rust. Haskell, on the other hand, currently has a fairly limited set of options for complex integration. Sure, there are examples of calling Rust from Haskell, but they tend to be small, synchronous libraries.

That level of support is great for projects that are already written in Haskell and just need to leverage some Rust code for narrow performance-sensitive problems like compression, encryption, or parsing. But what if you want to integrate with a large Rust library that uses more advanced features like async/await? That takes a bit more elbow grease. Let's take a look at some of the challenges and solutions for how to integrate a Tokio-based Rust library with Haskell.

In my case, I am working on a Haskell library for the Temporal workflow engine. Temporal is an orchestration engine that allows you to write long-running, stateful, distributed applications. The neat thing is that it is built so that you can use your language of choice to write workflow code– but the guarantees it provides make writing a client per language quite difficult.

In order to alleviate the burden of writing a client per language, Temporal provides an sdk-core project that abstracts away much of the complexity of the Temporal protocol. The sdk-core project is written in Rust and uses Tokio for its async runtime. Clients for each language both call into the Rust library and may be reactivated by the Rust library when a blocking call is resumed.

At a high level, the GHC multithreaded runtime and Tokio are very similar. Both use a thread pool to execute tasks. Both maintain a run queue of tasks that want to be executed, and have a method to suspend and resume tasks as they become blocked and unblocked. In theory, this should allow us to integrate the two runtimes fairly easily. In practice, we need to do a bit of work to make sure that cooperatively suspending and resuming between the two doesn't incur too much overhead.

On the Haskell side, the cheapest mechanism that we have for intentionally suspending a thread is an MVar. It is a synchronizing mutable variable that blocks the thread when it is empty and unblocks it when it is full. They are often used to lock access to a shared resource even within native Haskell code.

Naively the easiest way to suspend and resume a Haskell thread from a foreign library would be to do something like this:

import Control.Concurrent.MVar
import Foreign.Ptr
import Foreign.Storable

type ResultCallback a = Ptr a -> IO ()

foreign import ccall "wrapper"
  wrap_resumeHaskellWithResult :: ResultCallback a -> IO (FunPtr (ResultCallback a))

data BlockingThingResultStruct = BlockingThingResultStruct
  { result :: Int
  }

instance Storable BlockingThingResultStruct where
  sizeOf _ = 8
  alignment _ = 8
  peek ptr = do
    result <- peekByteOff ptr 0
    pure BlockingThingResultStruct { result }
  poke ptr BlockingThingResultStruct { result } = do
    pokeByteOff ptr 0 result

foreign import ccall "do_blocking_thing"
  do_blocking_thing :: FunPtr (ResultCallback a) -> IO ()

doBlockingThing :: IO a
doBlockingThing = do
  mvar <- newEmptyMVar
  callback <- wrap_resumeHaskellWithResult $ \result -> do
    putMVar mvar result
  do_blocking_thing callback
  rawResult <- takeMVar mvar
  freeHaskellFunPtr callback
  peek rawResult

Here, we are allocating an MVar, creating a dynamic callback that will put the result into the MVar, and then passing the callback to the Rust library. The Rust library will call the callback when it has completed a blocked operation, and from there the Haskell code can continue on its merry way. This is a pretty common pattern for Haskell FFI code, and it works.

However, the bookkeeping that GHC needs to do when wrapping a callback is not free. In fact, it is quite expensive. Calling back into Haskell too, is pretty pricy. Creating a one-shot callback for every blocking operation is perhaps the easiest thing to do, but for frequent blocking operations, it is not the most efficient.

Luckily, GHC has a mechanism that lets us avoid using callbacks entirely. The GHC runtime provides a C function called hs_try_putmvar that allows a foreign library to resolve an MVar by placing () into it. This is a much cheaper operation, as it does not require any bookkeeping on the Haskell side. It simply enqeues the thread to be resumed on the next available RTS context switch. If we were to write the function signature in Haskell, it would be MVar () -> IO ().

Okay, that's great, but how do we use it if the MVar doesn't carry any information in it? The key is that we only use the MVar as a synchronizing barrier. We can allocate memory in Haskell that the Rust library can write to, and then use read the result out once the MVar is resolved:

import GHC.Conc (newStablePtrPrimMVar, PrimMVar)

makeExternalCall = mask_ $ do
  mvar <- newEmptyMVar
  sp <- newStablePtrPrimMVar mvar
  fp <- mallocForeignPtr
  withForeignPtr fp $ \presult -> do
    cap <- threadCapability =<< myThreadId
    scheduleCallback sp cap presult
    takeMVar mvar `onException`
      forkIO (do takeMVar mvar; touchForeignPtr fp)
    peek presult

foreign import ccall "scheduleCallback"
    scheduleCallback :: StablePtr PrimMVar
                     -> Int
                     -> Ptr Result
                     -> IO ()

Sweet, now we have a way to call into Rust without using callbacks. How do we wire this up on the Rust side? Let's start by defining the basic constructs that mirror the Haskell side:

#[repr(C)]
pub struct MVar {
    _data: [u8; 0],
    _marker:
        core::marker::PhantomData<(*mut u8, core::marker::PhantomPinned)>,
}

#[repr(C)]
pub struct Capability {
  pub cap_num: c_int
}

#[link(name ="HSrts", kind="dylib")]
extern "C" {
  pub fn hs_try_putmvar(capability: Capability, mvar: *mut MVar);
}

Here, we're defining a Rust struct that is an opaque representation of an MVar on the Haskell side. For the purposes of Rust integration, we don't need to know anything about the MVar other than its address. We also define a Capability struct that is a newtype of a c_int. When we make a call into Rust from Haskell that is going to block, we pass in the Capability that the MVar is going to suspend on. A capability in GHC RTS parlance is a thing that holds all the state an OS thread/task needs to run Haskell code. This allows us to provide a hint to the GHC runtime about what Haskell thread to resume once the MVar is resolved.

Now that we have the basic constructs, let's think about how to map a Rust Future to Haskell-side results.

A Future is a type that represents a value that will be available at some point in the future. It is a bit like a lazy IO action in Haskell. It has the ability to resolve with a value or an error. Conceptually, as long as we know how to convert the value and error types to and from pointers, we can use turn a future Result<A, E> into a Haskell Either E A. To make it easier to carry around the MVar and the result/error pointers, let's make a happy little struct:

pub struct HsCallback<A, E> {
  pub cap: Capability,
  pub mvar: *mut MVar,
  pub result_slot: *mut*mut A,
  pub error_slot: *mut*mut E,
}

Now, we have a small problem issue to resolve:

Some of the types we want to pass into Haskell are going to be data that we want to turn into a Haskell equivalent, such as numbers, strings, boolean values, etc., but other types are going to be opaque pointers / handles to Rust data structures that we use to operate the sdk-core library via FFI calls. We need a way to convert both of these types into a pointer that we can pass into Haskell, but they have fairly different memory management needs.

Sonos has a really handy Rust library called ffi_convert that provides a number of utilities that make it easy to convert Rust types to and from C-compatible structures. Here, we leverage the RawPointerConverter trait to signify that as long as we know how to put a type into a pointer, we can pass it into Haskell:

use ffi_convert::*;
use std::future::Future;

impl <A, E> HsCallback<A, E> {
  pub fn put_success(self, result: A) 
  where
    A: RawPointerConverter<A>,
  {
    unsafe {
      *self.result_slot = result.into_raw_pointer_mut();
      *self.error_slot = std::ptr::null_mut();
      hs_try_putmvar(self.cap, self.mvar);
    }
  }

  pub fn put_failure(self, error: E) 
  where
    E: RawPointerConverter<E>,
  {
    unsafe {
      *self.error_slot = error.into_raw_pointer_mut();
      *self.result_slot = std::ptr::null_mut();
      hs_try_putmvar(self.cap, self.mvar);
    }
  }

  pub fn put_result(self, result: Result<A, E>) 
  where
    A: RawPointerConverter<A>,
    E: RawPointerConverter<E>,
  {
    match result {
      Ok(result) => self.put_success(result),
      Err(error) => self.put_failure(error),
    }
  }
}

impl Runtime {
  pub fn future_result_into_hs<F, T, E>(&self, callback: HsCallback<T, E>, fut: F)
  where
      F: Future<Output = Result<T, E>> + Send + 'static,
      T: RawPointerConverter<T>,
      E: RawPointerConverter<E>
  {
    let handle = self.core.tokio_handle();
    let _guard = handle.enter();
    let result = handle.block_on(fut);
    callback.put_result(result);
  }
}

That's us sorted on the Rust side. When our Rust library wants to call into Haskell, it can use the future_result_into_hs method on the Runtime struct to convert the outcome of a future into a success or failure pointer, and then resolve the MVar once we have some data to work with.

Now, let's take a look at how we can use this on the Haskell side. Similarly to RawPointerConverter on the Rust side, we need to introduce a notion of a mapping between the raw data that we get from Rust and the Haskell types that we want to use.

class ManagedRustValue r where
  type RustRef r :: Type
  type HaskellRep r :: Type
  fromRust :: proxy r -> RustRef r -> IO (HaskellRep r)

data CArray a = CArray
  { cArrayPtr :: Ptr a
  , cArrayLen :: CSize
  }

instance ManagedRustValue (CArray Word8) where
  type RustRef (CArray Word8) = Ptr (CArray Word8)
  type HaskellRep (CArray Word8) = ByteString
  fromRust _ rustPtr = mask_ $ do
    (CArray bytes len) <- peek rustPtr
    bs <- ByteString.packCStringLen (castPtr bytes, fromIntegral len)
    rust_drop_byte_array rustPtr
    pure bs

Why do we need this? In the case of primitive types like String, we want to copy the data out of the Rust memory and into memory that is managed by the Haskell runtime. After that, we go ahead and free the Rust memory in the fromRust function so that we don't leak memory. In the case of handle-esque types, we need to keep the value on Rust side alive to do anything meaningful.

For these types, we instead want the ability to put manage the result using a ForeignPtr. ForeignPtr allows us to associate finalizers with a pointer, so that when the pointer is garbage collected in Haskell, we can call the Drop implementation on the Rust side. This requires us to write a Rust FFI wrapper for each type that we want to use in Haskell that we then have to import on the Haskell side, but it greatly simplifies the memory management story.

This brings us to the final piece of the puzzle, initiating a call from Haskell that uses Tokio:

newtype TokioResult a = TokioResult (Ptr (RustRef a))

withTokioResult :: (RustRef a ~ Ptr a) => (TokioResult a -> IO b) -> IO b
withTokioResult f = alloca $ \ptr -> do
  poke ptr nullPtr
  f (TokioResult ptr)

peekTokioResult :: (ManagedRustValue a, RustRef a ~ Ptr a) => TokioResult a -> (RustRef a -> IO b) -> IO (Maybe b)
peekTokioResult (TokioResult ptr) f = do
  inner <- peek ptr
  if (inner == nullPtr)
    then return Nothing
    else Just <$> f inner

type TokioCall e a = StablePtr PrimMVar -> Int {- the capability -} -> TokioResult e -> TokioResult a -> IO ()

-- | Dropping can't be done automatically if the result is returned without async exceptions
-- intervening, because we don't want to drop things like `Client` while they're still in use.
-- So we should return ForeignPtrs for things that need to stay alive, and then we can drop when we're done.
makeTokioAsyncCall :: (ManagedRustValue e, RustRef e ~ Ptr e, ManagedRustValue a, RustRef a ~ Ptr a) 
  => TokioCall e a 
  -> (RustRef e -> IO f)
  -> (RustRef a -> IO b)
  -> IO (Either f b)
makeTokioAsyncCall call readErr readSuccess = mask_ $ do
  mvar <- newEmptyMVar
  sp <- newStablePtrPrimMVar mvar
  withTokioResult $ \errorSlot -> withTokioResult $ \resultSlot -> do
    let peekEither = do
          e <- peekTokioResult errorSlot readErr
          case e of
            Nothing -> do
              r <- peekTokioResult resultSlot readSuccess
              case r of
                Nothing -> error "Both error and result are null"
                Just r -> return (Right r)
            Just e -> return (Left e)

    (cap, _) <- threadCapability =<< myThreadId
    call sp cap errorSlot resultSlot
    () <- takeMVar mvar `onException`
      forkIO (takeMVar mvar >> void peekEither)
    peekEither

There we have it! We can now call into Rust from Haskell, free memory at the appropriate times, and cooperatively interoperate between the GHC and Tokio runtimes. Hopefully the ideas here provide a useful starting point for your own Rust-Haskell interop adventures. If you want to see a more complete example, check out the Temporal Haskell SDK– it also includes a custom Setup.hs configuration that uses cargo to build the Rust library and link against it, as well as a basic Nix configuration.