The idea of writing multithreaded code targeting an IoT device is probably off-putting, to say the least. With integrated debugging facilities virtually non-existent, long code upload times leading to long build cycles, limited RAM, and usually a single CPU core to work with, the downsides tend to outweigh the upsides.
However, there's significant untapped opportunity here, especially given that devices like the ESP32 and various ARM-based parts come in multicore configurations, and even in single core configurations virtually all of these devices are severely bottlenecked at most I/O points. At the very least, some sleepy I/O threads could really help us out.
We'll be relying on FreeRTOS to provide the thread scheduling, along with the device specific "SMP" support to utilize all the cores. FreeRTOS is available as a "library" in the Arduino library manager, but that fork doesn't work for our purposes. FreeRTOS is already baked into the ESP32's ESP-IDF codebase, so it's always available on the ESP32.
FreeRTOS is great, but we can improve it by adding a wrapper around threads, and by providing our own pooling and synchronization facilities, so that's what the FreeRTOS Thread Pack brings to the table.
You can use this library to make your code utilize multiple threads and cores more easily and flexibly.
Notes on Compatibility
The ESP32 variant of FreeRTOS is what this code was written and tested on. There are certain functions it uses which are ESP32 specific, like the CPU affinity functions. Those must be removed for other implementations. I have not yet tested this on other platforms, but when I do, I'll add conditional compiles.
The Arduino version of FreeRTOS is a fork of the original code, and doesn't support a lot of FreeRTOS. I started to add support for it before realizing how little of the OS it actually implements, at which point I decided not to continue. The synchronization contexts in particular are just not feasible on an AVR device with the Arduino framework version of FreeRTOS. Add to that the lack of CPU support for optimized atomic operations on some chips like the ATmega2560, and I don't have any efficient way to replace the things the Arduino version lacks.
A Different Approach
I typically describe concepts before getting to code, but here I assume familiarity with concepts like multithreading, thread pooling, and synchronization. I don't want to waste your time with this library if you are new to these things. This is for an audience that already understands the concepts, so this article is divided basically into "How to use it" and "How I made it" sections.
First off, I'd like to start with the perhaps impolitic suggestion that Microsoft got a number of things right when it comes to threading in .NET. Thread pooling is a great thing. Synchronization contexts are a fantastic idea. We're going to be leveraging these good ideas in our own code, despite it not being .NET. As such, using this code will feel somewhat familiar if you already know C#. Of course, there are differences, given that these are vastly different programming environments, but "familiar" is the term that carries the day here.
When to Avoid Using This
Let's talk about when you shouldn't be using this. Threads are not lightweight objects. They need room for a stack, itself allocated from the heap. Each thread imposes a burden in terms of CPU overhead on the thread scheduler. On a tiny device, CPU cycles and RAM are precious. Furthermore, heap allocations themselves are burdensome. Threads are basically heavy relative to these little devices. Using default parameters, on an ESP32 for example, each thread consumes a little over 4kB of heap, and forces the scheduler for that core to add the thread to the list of context switches it must make.
Due to this, you do not want to use a thread unless you're going to get some sort of benefit out of it. If you need to wait on slow I/O while doing other things, a thread might be a good option. If you need to offload some CPU intensive work onto another core while continuing to operate on the primary core, a thread might be a good option. Otherwise, there are more efficient ways to time slice on an IoT device, such as using fibers or even coroutines. Fibers aren't currently supported by this library, but FreeRTOS supports them, and they may be added to this library in the future.
A Threading Hat Trick
We have three major concerns here - thread manipulation, pool management, and synchronization. Covering those areas allows us to be "feature complete" in terms of using threads in the wild, so they're important.
Thread Manipulation with the FRThread Class
FRThread is quite a bit like .NET's Thread class. FRThread::create() and FRThread::createAffinity() create a foreground thread (though they can create it at idle priority if indicated). You can get the current thread with FRThread::current() or sleep the current thread with FRThread::sleep(). You can also get the CPU's idle thread with FRThread::idle().
Thread Lifetime
A thread is a foreground thread whose lifetime is dictated by the lifetime of the code passed to the creation function. If the code spins an infinite loop, the thread will live until FRThread.abort() is called. Note that the lifetime is not tied to the FRThread instance itself. If the instance goes out of scope, the thread remains alive until abort() is called, or the code passed to the creation function runs to completion.
Creation Function
A thread creation function has a signature of void(const void*). It may be a lambda, or wrapped with std::function<void(const void*)>. The void* argument is application defined state passed to the function upon execution. Any code in this function will be run on the target thread, and synchronization should be used on any shared data it has access to. The thread is automatically destroyed when the code exits.
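For illustration, here's a small sketch of a creation function that actually makes use of the state argument. The BlinkArgs struct and the pin and timing values are purely hypothetical - the important point is that whatever you pass as state must outlive the thread that uses it:

#include <FRThreadPack.h>

// a hypothetical chunk of application defined state
struct BlinkArgs {
    int pin;
    int delayMs;
};

// the creation function - it recovers the state that was passed to create()
void blinkProc(const void* state) {
    const BlinkArgs* args = (const BlinkArgs*)state;
    pinMode(args->pin, OUTPUT);
    while (true) {
        digitalWrite(args->pin, HIGH);
        delay(args->delayMs);
        digitalWrite(args->pin, LOW);
        delay(args->delayMs);
    }
}

// the state is global so it outlives the thread that uses it
BlinkArgs g_blinkArgs = { LED_BUILTIN, 250 };

void setup() {
    FRThread blinker = FRThread::create(blinkProc, &g_blinkArgs);
    blinker.start();
    // the FRThread instance going out of scope here does not stop the thread
}

void loop() {}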
Thread Lifecycle
When FRThread::create() or FRThread::createAffinity() is called, a thread is created and suspended before any of the passed in code is executed. Under FreeRTOS, threads are not created in the suspended state, so FRThread uses a thunk that causes the thread to suspend itself immediately upon creation. That is why I say the thread is created and then suspended, rather than created in the suspended state like .NET's Thread is. The difference is subtle, but not necessarily insignificant, as the thread is technically "alive" for a brief blip upon creation before going to sleep. For most scenarios, this is no different than being created suspended.
None of the code in the thread will run until FRThread.start() is called. At any point, FRThread.suspend() can be called to put the thread to sleep until FRThread.start() is called again.
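As a quick sketch of that start/suspend cycle (the fuller example in the next section shows abort() as well):

#include <FRThreadPack.h>

void setup() {
    Serial.begin(115200);
    FRThread worker = FRThread::create([](const void*) {
        while (true) {
            Serial.println("tick");
            delay(1000);
        }
    }, nullptr);
    // nothing has printed yet - the thread was created and then
    // immediately suspended itself via the thunk described above
    worker.start();   // now the lambda begins executing
    delay(3500);      // let it tick a few times
    worker.suspend(); // back to sleep - no more ticks
    delay(2000);
    worker.start();   // and awake again
    // when worker goes out of scope at the end of setup(), the thread
    // keeps ticking until abort() is called or the lambda returns
}

void loop() {}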
An Example
Here is some example code for using a thread. It's admittedly contrived, but it demonstrates the basics:
#include <FRThreadPack.h>

void setup() {
    Serial.begin(115200);
    // create some mischief
    FRThread mischief = FRThread::create([](const void*) {
        while (true) {
            Serial.print(".");
        }
    }, nullptr);
    // start the mischief
    mischief.start();
    // print out a string. We don't do so all at once, because
    // depending on the platform, Serial.print/println are atomic,
    // meaning a thread can't interrupt them.
    const char* sz = "The quick brown fox jumped over the lazy dog\r\n";
    char szch[2];
    szch[1] = 0;
    while (*sz) {
        szch[0] = *sz;
        Serial.print(szch);
        ++sz;
    }
    // that's enough foolishness!
    mischief.abort();
}

void loop() {}
The output will be something like this, but probably with more dots:
...The quic...k brown ....f...ox jumpe...d over th...e lazy d...og
Thread Pooling with the FRThreadPool Class
FRThreadPool allows you to enqueue work to be dispatched by a thread from the pool as one becomes available. Thread pools are great for managing certain kinds of long running operations.
Thread Allocation and Destruction
Under .NET, the ThreadPool class already has several "preallocated" threads waiting in the wings. This is fine for some environments, but not so appropriate for an IoT device where threads are so expensive to even keep around, relatively speaking. Also, with the limited capabilities of an RTOS scheduler, you kind of need to know your hardware and allocate your threads to your different cores yourself. In my experience, FreeRTOS isn't great at auto assigning new threads to cores. We don't use affinity in the example code because some platforms don't have more than one core, but otherwise you should consider it. Also, it's entirely likely that you'll want to use higher priority threads on one core, and lower priority threads on your primary core, and perhaps in the same pool.
Consequently, FRThreadPool demands that you create threads for it yourself. The pool uses special "dispatcher" threads rather than general purpose FRThread threads; the dispatchers are more efficient because they are "pool aware". To create them, you use FRThreadPool.createThread() and FRThreadPool.createThreadAffinity(). These threads are automatically destroyed when the FRThreadPool goes out of scope. Note that when you create them, you could in theory specify different stack sizes for each one, but in practice doing so just wastes memory, since the code executed in the pool will be constrained by the smallest stack size specified.
These methods return an FRThread, but you should not call abort() on these threads. Use FRThreadPool.shutdown() if you want to explicitly destroy all of the pool's threads. shutdown() returns immediately, but the teardown isn't complete until every thread has finished its current operation.
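For example, on a dual core ESP32 you might pin one dispatcher to each core. Fair warning: the exact parameters of createThreadAffinity() (core index, stack size, priority) are assumptions in this sketch - check the header for the real signature:

FRThreadPool pool;
// pin one "pool aware" dispatcher thread to each core - the single
// core-index argument here is an assumption; the real method may also
// take stack size and priority parameters
pool.createThreadAffinity(0);
pool.createThreadAffinity(1);
// ... queue work items as usual ...
// tear the pool threads down explicitly rather than calling abort():
pool.shutdown();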
Enqueuing Work
Much like .NET's ThreadPool, you can use FRThreadPool.queueUserWorkItem() to dispatch work to one of the waiting pool threads. If there are no threads waiting, the work will be placed in the backlog queue. If that gets full, queueUserWorkItem() blocks until there's room. The function signature and state parameters are the same as with FRThread::create().
An Example
Some of that probably made this sound a bit more complicated than it is. If so, an example should clear it up:
#include <FRThreadPack.h>

void setup() {
    Serial.begin(115200);
    // create a thread pool
    // note that this thread pool
    // will go out of scope and all
    // created threads will be exited
    // once setup() ends
    FRThreadPool pool;
    // create three threads for the pool
    // all of these threads are now waiting on incoming items.
    // once an item becomes available, one of the threads will
    // dispatch it. When it's complete, it will return to the
    // listening state. You do not use this thread pool the way
    // you use .NET's thread pool. .NET's thread pool
    // has lots of reserve threads created by the system.
    // This threadpool has no threads unless you create them.
    pool.createThread();
    pool.createThread();
    pool.createThread();
    // now queue up 4 work items. The first 3 will start executing immediately
    // the 4th one will start executing once one of the others exits.
    // this is because we have 3 threads available.
    pool.queueUserWorkItem([](void* state) {
        delay(3000);
        Serial.println("Work item 1");
    }, nullptr);
    pool.queueUserWorkItem([](void* state) {
        delay(2000);
        Serial.println("Work item 2");
    }, nullptr);
    pool.queueUserWorkItem([](void* state) {
        delay(1000);
        Serial.println("Work item 3");
    }, nullptr);
    pool.queueUserWorkItem([](void* state) {
        Serial.println("Work item 4");
    }, nullptr);
    // the thread pool goes out of scope here, waiting for pool threads
    // to complete, so this can take some time.
}

void loop() {}
This will most likely output the following - threading is not deterministic by nature:
Work item 3
Work item 4
Work item 2
Work item 1
Synchronization with FRSynchronizationContext
And now for something completely different. If you're familiar with .NET WinForms or WPF development, you've probably used a SynchronizationContext before, albeit indirectly, but you may have never encountered one up close and personal. They are an unusual, but clever abstraction around a thread safe message passing scheme.
What they do is give you the ability to dispatch code to be executed on a particular thread - usually the application's main thread.
A Different Approach to Synchronization
Normally, when we think of synchronization of multithreaded code, we think about creating read/write barriers around data and acquiring or releasing resources.
That's a great way to do things if you have the patience for it. It can be as efficient as you make it. It's also a nightmare to chase down bugs when things go wrong, especially since they so often manifest as intermittent race conditions.
There's another way to do things that involves message passing. Basically, we use thread safe ring buffers to hold messages. We post messages to them from one or more threads to be picked up and acted upon by another thread. Windows uses something like this for its ... windows.
The messages are the data. The synchronization has already been done on the message itself. There are downsides to this, one of which is a lack of flexibility. How general purpose can a message be? It's usually hard to come up with a message general purpose enough to handle every situation, but follow along here, because I promise you that's exactly what a synchronization context provides.
Before we get there, we need to talk about lifetimes of code.
Crouching Entry Point, Hidden Main Loop: Thread "Lifetime" Loops
Almost all IoT applications loop in their main thread. In Arduino apps, you can't see the loop itself in your .ino file, but that loop is there. It's hidden by the IDE's source mangling "feature" but it is what calls the loop() method in your code.
In essence, if we translate what the Arduino framework does into a classic C application, it would look something like this**:

// forward declarations:
void setup();
void loop();

// "real" entry point:
int main(int argc, char** argv) {
    setup();
    while (true) {
        loop();
    }
    return 0;
}

// your code is inserted here
#include "yourcode.ino"
...
** I'm not saying it literally translates to this code. This is simply for demonstration. What your platform probably does is use FreeRTOS to create a new "task" which calls setup() and loop(), but it achieves the same thing.
The point here is that your app will exit absent something to prevent it, and on an IoT device, you ultimately don't exit main(). Ever. Because doing so leads to the abyss. There is nothing after main(), eternal. This way lies dragons. Because of that, there's a loop somewhere instead, or the equivalent, that prevents that exit from happening.
If it didn't, your device would almost certainly reboot itself (or failing that, halt) every time it got to the end of main() because there's nothing else to do. This is not a PC. There's no command line or desktop to drop back to. There's no concept of a "process" - just the code that runs on boot. "Process exit" is effectively an undefined condition! There's absolutely nothing to do other than reboot or halt - I mean, unless some devs got cheeky and snuck Space Invaders or a flight simulator onto the die itself. After some of the things I've seen, I wouldn't rule it out. Maybe you'll find treasure.
But for now we loop, one way or another.
You'll note that this is very similar to how the lifetimes of other threads are managed as well, absent the rebooting on exit - what I mean is that it's similar in that the thread lives as long as its code does. Once the code exits, the thread it kept alive is effectively "dead" as well. Out of scope is out of scope, is ... something I'll leave the nihilists to ponder.
The loops that prevent all of this philosophy and the existential problems it brings are what I call "lifetime loops". These loops define when your application lives and dies - they are the pulse of your thread and keep it alive - and that includes your main application/thread loop. This is true of Windows GUI applications, of interactive console applications on desktop systems, of daemons on servers, of anything that needs to live more than a "do a task and done" sort of life, regardless of platform.
A synchronization context "lives" in lifetime loops like these, whether the main thread's lifetime loop or a secondary thread's. It steals cycles from its host loop to process messages coming in from other threads. If we were to modify the hypothetical classic C app from above to insert a synchronization context into our main loop, it would look like this:

#include <FRThreadPack.h>

FRSynchronizationContext g_mainSync;

// forward declarations:
void setup();
void loop();

// "real" entry point:
int main(int argc, char** argv) {
    setup();
    while (true) {
        loop();
        // process an incoming
        // message from a thread,
        // if there is one:
        g_mainSync.processOne();
    }
    return 0;
}

// your code is inserted here
#include "yourcode.ino"
...
Now obviously, the Arduino IDE will not let us do that. PlatformIO might, but it's still a hack. Fortunately, we don't need to do it at all. It was just to illustrate the concept. We can accomplish the exact same thing by moving the relevant code - the #include and the g_mainSync global declaration - to the .ino file itself, and then calling g_mainSync.processOne() from inside loop():

#include <FRThreadPack.h>

FRSynchronizationContext g_mainSync;

void setup() {
}

void loop() {
    // process incoming messages
    g_mainSync.processOne();
}
But now you see, what we're really doing is "injecting" g_mainSync into the main lifetime loop of the application.
You can do something similar to inject them into the lifetime loops of secondary threads. You can create as many as you need, but most often you'll just have the one, living in the main application's thread, no matter how many secondary threads you have.
The Common Use Case
The typical scenario is to have the main thread create and dispatch long running tasks on secondary threads (perhaps using a pool) and have those threads report their results back to the main thread using message passing.
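Here's a rough sketch of that pattern. One caveat: the post() call below is an assumption on my part, standing in for "hand this functor to the context to run on its thread" - check FRSynchronizationContext's actual methods (discussed below) for the real name and signature:

#include <FRThreadPack.h>

FRSynchronizationContext g_mainSync;
FRThreadPool g_pool;

void setup() {
    Serial.begin(115200);
    g_pool.createThread();
    // the long running work happens on a pool thread...
    g_pool.queueUserWorkItem([](void* state) {
        delay(5000); // stand-in for slow I/O or heavy computation
        // ...but the result is reported back to the main thread by
        // handing it a functor to execute in its lifetime loop.
        // NOTE: post() is a hypothetical name here - see above.
        g_mainSync.post([](void* result) {
            Serial.println("work finished (printed from the main thread)");
        }, nullptr);
    }, nullptr);
}

void loop() {
    // the main thread's lifetime loop pumps the context
    g_mainSync.processOne();
}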
A More Complicated Use Case
A rarer scenario is having secondary threads that must orchestrate among themselves. For example, reading from one I/O source, doing some post-processing, and writing to another I/O source might involve more than one thread communicating with each other, and possibly with the main thread as well. In this case, you might have a synchronization context "living" in the writer thread's lifetime loop that the reader thread posts messages to. You'd also have the main synchronization context in the main thread that both secondary threads can post messages to.
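Structurally, the writer thread's side of that arrangement is just another lifetime loop pumping its own context - something like this skeleton, which only uses the pieces already shown:

// a second synchronization context, owned by the writer thread's loop
FRSynchronizationContext g_writerSync;

// the writer thread's creation function - its lifetime loop processes
// messages that the reader thread posts into g_writerSync
void writerProc(const void* state) {
    while (true) {
        g_writerSync.processOne();
    }
}

// created elsewhere, e.g. in setup():
//   FRThread writer = FRThread::create(writerProc, nullptr);
//   writer.start();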
One Message That Says Everything
I've mentioned passing messages around a lot, but I've only hinted at what the messages actually consist of.
We only have one kind of message in a synchronization context, but the message is as flexible as flexible can be. The message holds a std::function<void(void*)> functor. If this were C#, I'd say it held an Action