Updated on 2024-02-24
Got an embedded or IoT widget with a screen but no real memory or flash space to speak of? Read this.
I'm working on a "Tamagotchi"-like game with some friends, which we plan on implanting into a PC keyboard that has little monochrome OLED screens on it. Specifically, the keyboard is a Boardsource Lulu. We want it to run alongside the existing firmware so that the keyboard remains fully functional. There are two varieties of the Lulu, and we happen to be working with the AVR model, which has 2.5KB of RAM and 32KB of flash memory, of which 18KB or so is used by existing code. I'm not as sure about the RAM usage of the existing firmware, but where we're going we won't really need any.
I don't expect you to have one of these keyboards. That would just be mean, as they are pretty expensive, niche-interest devices. Instead, I've crafted a PlatformIO** project that runs on an ESP32-S3 or other Arduino-compatible device. You'll need something like that, and an SSD1306 screen wired up to it over I2C. On the ESP32-S3, I used GPIO 16 for SDA and GPIO 17 for SCL.
** Sorry to folks who are still using the Arduino IDE. I gave up on it for being too limited, and I strongly recommend installing PlatformIO even if you don't use it all the time, as it's much more realistic for projects with multiple source files, or that need to support multiple devices.
The code is by default configured to drive a 128x32 "bandaid" form factor screen, but if you have the 128x64 model, change the SSD1306_HEIGHT define in the code to reflect that. You'll also need to generate larger images using the cigen tool, since the current ones are 128x32.
The first thing to keep in mind is that the framebuffer format on these little displays is weird, and that's putting it charitably. It's a monochrome display, so there are 8 pixels per byte, but the pixels are packed vertically rather than horizontally into those bytes. The bytes themselves, however, are arranged traditionally, left to right and top to bottom, so pixels (0,0) through (0,7) make up the first byte and (1,0) through (1,7) make up the second. Each row of 128 bytes therefore covers a band 8 pixels tall.
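To make that layout concrete, here is a minimal sketch of the pixel-to-byte mapping, assuming a 128 pixel wide panel and the usual SSD1306 convention that bit 0 is the topmost pixel of each byte. The function names are mine for illustration; they aren't part of the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SSD1306_WIDTH 128

/* Index of the framebuffer byte that holds pixel (x, y). Each byte is a
   column of 8 vertically stacked pixels; bytes run left to right, and each
   row of bytes covers a band of 8 pixel rows. */
size_t ssd1306_byte_index(unsigned x, unsigned y) {
    return (size_t)(y / 8) * SSD1306_WIDTH + x;
}

/* Mask selecting pixel row y inside its byte (assumes bit 0 = top pixel). */
uint8_t ssd1306_bit_mask(unsigned y) {
    return (uint8_t)(1u << (y % 8));
}

/* Turn one pixel on in a RAM framebuffer, if you can afford to keep one. */
void ssd1306_set_pixel(uint8_t *fb, unsigned x, unsigned y) {
    assert(x < SSD1306_WIDTH);
    fb[ssd1306_byte_index(x, y)] |= ssd1306_bit_mask(y);
}
```

So pixel (5,10) lives in byte 133 (the second band, sixth column) under bit 2.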
Also, because pixels share bytes and the display is write only, you can't change a single pixel without knowing its neighbors. That means you either need to keep a framebuffer in RAM (512 bytes for 128x32, 1024 bytes for 128x64), or you can just stream a framebuffer off of flash directly over I2C to the display, requiring no RAM, but limiting you to static images.
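The framebuffer sizes fall straight out of the one-bit-per-pixel packing. A tiny sketch of the arithmetic, again assuming a 128 pixel wide panel (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define SSD1306_WIDTH 128

/* One bit per pixel, so a full frame takes width * height / 8 bytes. */
size_t ssd1306_frame_bytes(unsigned height) {
    assert(height % 8 == 0); /* pixels are stored in whole 8-row bands */
    return (size_t)SSD1306_WIDTH * height / 8;
}
```

This is also where the SSD1306_HEIGHT * 16 loop bound in the streaming code later on comes from: 128 / 8 is 16.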
For our project, we only have 2.5KB of RAM in total, and it's shared with other firmware components, so we're taking the latter approach. We won't need any sort of dynamic rendering, and we don't even have the flash space to store that kind of logic anyway.
To save more flash space, we can compress the images using run length encoding. The tool supports no compression plus 3 styles of RLE, and for each image it picks whichever yields the smallest result at generation time. Run length encoding is simple, lightweight, and for this type of data, it's typically highly effective.
What we need is an application that will take images and generate RLE compressed uint8_t[] arrays, and then some code to spit those to a display.
Enter cigen. This is a little C# command line application that takes a series of images and generates RLE compressed C array content containing a framebuffer for each passed in image.
cigen v1.0 Copyright (c) 2024 by honey the codewitch

Usage: cigen {<infile1> [<infileN>]} [/output <outfile>] [/threshold <threshold>]

    <infile>     The input files
    <outfile>    The output file - defaults to <stdout>
    <threshold>  The luminosity threshold (0-255, defaults to 127)
- or -
    /help        Displays this screen and exits
It's pretty simple to use. You pass it a series of images, each the same size as your display panel (ours is 128x32 in this demo), an optional output file, and an optional luminosity threshold.
When you run it with the Debug arguments provided with the project, you'll get the following output:
#ifndef OUTPUT_H
#define OUTPUT_H
#include <stdint.h>
#include "progmem.h"
const uint8_t output_frame_1[] PROGMEM = {
0xff, 131, 0x3f, 0x9f, 0xcf, 0xef,
0xe7, 0xe7, 0xf3, 0xf3, 0xfb, 0xfb,
0xf9, 0xf9, 0xf9, 0xf9, 0xf9, 0xf9,
0xf9, 0xfb, 0xfb, 0xf3, 0xf3, 0xe7,
0xe7, 0xef, 0xcf, 0x9f, 0x3f, 0x7f,
0xff, 98, 0x00, 2, 0xff, 6, 0x83,
0x01, 0x83, 0xc7, 0xff, 7, 0x83,
0x01, 0x83, 0xc7, 0xff, 7, 0x00, 2,
0xff, 97, 0xfc, 0xf9, 0xf3, 0xe7,
0xef, 0xcf, 0xcf, 0x9f, 0x9f, 0xbf,
0xbf, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f,
0x3f, 0x3f, 0xbf, 0xbf, 0x9f, 0x9f,
0xcf, 0xcf, 0xef, 0xe7, 0xf3, 0xf9,
0xfc, 0xfe, 0xff, 96
};
#define OUTPUT_FRAME_1_COMPRESSION 3
// [Compressed to 16.40625% of original. Len = 84 vs 512]
const uint8_t output_frame_2[] PROGMEM = {
0xff, 166, 0x3f, 0x3f, 0xbf, 0x9f,
0x9f, 0xdf, 0xdf, 0xdf, 0xcf, 0xcf,
0xcf, 0xcf, 0xcf, 0xdf, 0xdf, 0xdf,
0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x7f,
0xff, 101, 0x03, 0xf1, 0xfc, 0xfe,
0xfe, 0xff, 3, 0x0f, 0x07, 0x0f,
0x9f, 0xff, 7, 0x0f, 0x07, 0x0f,
0x9f, 0xff, 4, 0xfe, 0xfc, 0xf1,
0x03, 0x0f, 0xff, 97, 0xf8, 0xf3,
0xf7, 0xe7, 0xcf, 0xcf, 0xdf, 0x9f,
0x9e, 0xbf, 0xbf, 0xbf, 0x3f, 0x3f,
0x3f, 0x3f, 0x3f, 0xbf, 0xbf, 0xbe,
0x9f, 0x9f, 0xdf, 0xcf, 0xcf, 0xe7,
0xf7, 0xf3, 0xf8, 0xfc, 0xff, 64
};
#define OUTPUT_FRAME_2_COMPRESSION 3
// [Compressed to 16.40625% of original. Len = 84 vs 512]
const uint8_t output_frame_3[] PROGMEM = {
0xff, 73, 0x7f, 0x3f, 0xbf, 0xbf,
0x9f, 0x9f, 0x9f, 0x9f, 0x9f, 0x9f,
0x9f, 0xbf, 0xbf, 0x3f, 0x7f, 0x7f,
0xff, 105, 0x8f, 0xe7, 0xf3, 0xf9,
0xfc, 0xfe, 0xfe, 0xff, 2, 0xff, 11,
0xff, 3, 0xfe, 0xfc, 0xf9, 0xf3,
0xe7, 0x8f, 0x1f, 0xff, 97, 0x00, 2,
0xff, 6, 0xe0, 0xc0, 0xe0, 0xf1,
0xff, 7, 0xe0, 0xc0, 0xe0, 0xf1,
0xff, 7, 0x00, 2, 0xff, 98, 0xfc,
0xf9, 0xf3, 0xe7, 0xef, 0xcf, 0xdf,
0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x3f,
0x3f, 0x3f, 0x3f, 0x3f, 0xbf, 0x9f,
0x9f, 0xdf, 0xcf, 0xef, 0xe7, 0xf3,
0xf9, 0xfc, 0xfe, 0xff, 33
};
#define OUTPUT_FRAME_3_COMPRESSION 3
// [Compressed to 17.96875% of original. Len = 92 vs 512]
const uint8_t output_frame_4[] PROGMEM = {
0xff, 239, 0x7f, 0x7f, 0x7f, 0x7f,
0xff, 110, 0x0f, 0xe7, 0xe7, 0xf3,
0xf9, 0xf9, 0xfd, 0x3c, 0x1c, 0x1e,
0x1e, 0x3e, 0xfe, 0xfe, 0xfe, 0xfe,
0xfe, 0xfe, 0x3e, 0x1e, 0x1e, 0x1e,
0x3c, 0xfc, 0xfd, 0xf9, 0xf9, 0xf3,
0xe7, 0xe7, 0x0f, 0xff, 97, 0xf8,
0xf3, 0xf3, 0xe7, 0xcf, 0xcf, 0xdf,
0x9e, 0x9c, 0xbc, 0xbc, 0xbe, 0xbf,
0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0xbe,
0xbc, 0xbc, 0xbc, 0x9e, 0x9f, 0xdf,
0xcf, 0xcf, 0xe7, 0xf3, 0xf3, 0xf8
};
#define OUTPUT_FRAME_4_COMPRESSION 3
// [Compressed to 14.0625% of original. Len = 72 vs 512]
const uint8_t* output_images[] = {
output_frame_1,
output_frame_2,
output_frame_3,
output_frame_4
};
const int output_images_compression[] = {
OUTPUT_FRAME_1_COMPRESSION,
OUTPUT_FRAME_2_COMPRESSION,
OUTPUT_FRAME_3_COMPRESSION,
OUTPUT_FRAME_4_COMPRESSION
};
#endif // OUTPUT_H
This header is geared for QMK, but you can just copy what you need from it, such as the arrays, into your own program.
The meat of this application's functionality is in Program.cs in the Run() method. In broad strokes, it loads all of the inputs into System.Drawing.Bitmap instances. That's why this is a .NET Framework app: it relies on GDI+, which is "Windows only", although I think Mono will run it on Linux too.
Once it has those bitmaps, it creates a byte array corresponding to each bitmap's dimensions, and packs the bitmap data as monochrome pixels. It does this in the weird format that the SSD1306 uses so we don't have to do any post translation. To convert to monochrome, each pixel has its luminosity computed, and then compared against a Threshold value (typically 127).
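The tool itself is C#, but the packing step is easy to sketch in C. This is not cigen's actual code, just an illustration under two assumptions of mine: the input is already an 8-bit luminosity value per pixel in row-major order, and a pixel brighter than the threshold becomes a set bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack a row-major 8-bit luminosity image into the SSD1306 byte layout.
   out must hold width * height / 8 bytes.
   Assumption: luminosity above the threshold means the bit is set. */
void pack_mono(const uint8_t *luma, unsigned width, unsigned height,
               uint8_t threshold, uint8_t *out) {
    assert(height % 8 == 0);
    memset(out, 0, (size_t)width * height / 8);
    for (unsigned y = 0; y < height; ++y) {
        for (unsigned x = 0; x < width; ++x) {
            if (luma[(size_t)y * width + x] > threshold) {
                /* byte = band (y / 8) and column x, bit = row within band */
                out[(size_t)(y / 8) * width + x] |= (uint8_t)(1u << (y % 8));
            }
        }
    }
}
```

Because the bytes come out already in display order, nothing on the device ever has to shuffle bits around.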
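Now, with the monochrome bitmap data in hand, the app tries to compress it with each of 3 different RLE variants and picks the one that yields the smallest size, or leaves it uncompressed if every variant comes out larger than the original. In one variant, both black and white runs are encoded. In another, only white runs. In the last, only black runs. The choice is recorded in the generated COMPRESSION define for each frame: as the decoder further down shows, 0 means uncompressed, 1 means runs of 0x00 bytes are encoded, 2 means runs of 0xFF bytes, and 3 means both. In a compressed frame, every occurrence of an encoded byte value is followed by a count byte, even when the run is only one byte long.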
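Here's a rough C sketch of that encode-and-pick-the-smallest idea. It is my own illustration rather than a port of cigen, and it assumes the mode numbering implied by the decoder in the Arduino code (bit 0 covers 0x00 runs, bit 1 covers 0xFF runs). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mode bit 0: run-length encode 0x00 bytes; bit 1: run-length encode 0xFF. */
static int is_run_byte(uint8_t b, int mode) {
    return ((mode & 1) && b == 0x00) || ((mode & 2) && b == 0xFF);
}

/* Encode src and return the encoded length. If dst is NULL the data is
   only measured, which is all we need to compare the modes. */
size_t rle_encode(const uint8_t *src, size_t len, int mode, uint8_t *dst) {
    assert(src != NULL || len == 0);
    size_t out = 0;
    size_t i = 0;
    while (i < len) {
        uint8_t b = src[i];
        if (is_run_byte(b, mode)) {
            /* Count the run, capped at 255 so it fits in one count byte. */
            size_t run = 1;
            while (i + run < len && src[i + run] == b && run < 255) ++run;
            if (dst) { dst[out] = b; dst[out + 1] = (uint8_t)run; }
            out += 2;
            i += run;
        } else {
            if (dst) dst[out] = b; /* any other byte is stored as is */
            out += 1;
            i += 1;
        }
    }
    return out;
}

/* Try modes 1..3 and return the best one, or 0 if nothing beats raw. */
int rle_best_mode(const uint8_t *src, size_t len) {
    int best = 0;
    size_t best_len = len;
    for (int mode = 1; mode <= 3; ++mode) {
        size_t n = rle_encode(src, len, mode, NULL);
        if (n < best_len) { best_len = n; best = mode; }
    }
    return best;
}
```

Note the cost of the scheme: a lone 0x00 or 0xFF in an encoded mode takes two bytes instead of one, which is exactly why it's worth trying all of the modes and keeping the winner.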
Once this data is crunched, producing the actual header text is trivial.
As I said, I won't force QMK and a Lulu keyboard on you. Instead, we're using an Arduino-compatible dev kit to prototype this, with the same screen attached, but to an ESP32-S3 instead of the Lulu's AVR ATmega32U4. The main thing to bear in mind if working this way is that QMK is C and Arduino is C++, so stick to code that compiles as both so that it can be ported to your final environment. You can use some other Arduino board if you don't have an ESP32-S3. I have so many ESP32-S3s lying around that it made sense to use this one. You'll just have to change the board setting in platformio.ini to match your hardware.
This code is barebones. The interest was small size, not happy abstractions. I avoided all but the most utilitarian abstractions because I didn't want to waste flash space on them.
#include <Arduino.h>
#include <Wire.h>
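#ifdef ESP32
#define I2C_SDA 16
#define I2C_SCL 17
#endif
#ifndef I2C_BUFFER_LENGTH
// The ESP32 core defines I2C_BUFFER_LENGTH, but not every core does.
// Fall back to 32 bytes, the Wire buffer size on classic AVR boards.
#define I2C_BUFFER_LENGTH 32
#endif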
#define SSD1306_HEIGHT 32
const uint8_t output_frame_1[] PROGMEM = {
0xff, 131, 0x3f, 0x9f, 0xcf, 0xef,
0xe7, 0xe7, 0xf3, 0xf3, 0xfb, 0xfb,
0xf9, 0xf9, 0xf9, 0xf9, 0xf9, 0xf9,
0xf9, 0xfb, 0xfb, 0xf3, 0xf3, 0xe7,
0xe7, 0xef, 0xcf, 0x9f, 0x3f, 0x7f,
0xff, 98, 0x00, 2, 0xff, 6, 0x83,
0x01, 0x83, 0xc7, 0xff, 7, 0x83,
0x01, 0x83, 0xc7, 0xff, 7, 0x00, 2,
0xff, 97, 0xfc, 0xf9, 0xf3, 0xe7,
0xef, 0xcf, 0xcf, 0x9f, 0x9f, 0xbf,
0xbf, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f,
0x3f, 0x3f, 0xbf, 0xbf, 0x9f, 0x9f,
0xcf, 0xcf, 0xef, 0xe7, 0xf3, 0xf9,
0xfc, 0xfe, 0xff, 96
};
#define OUTPUT_FRAME_1_COMPRESSION 3
// [Compressed to 16.40625% of original. Len = 84 vs 512]
const uint8_t output_frame_2[] PROGMEM = {
0xff, 166, 0x3f, 0x3f, 0xbf, 0x9f,
0x9f, 0xdf, 0xdf, 0xdf, 0xcf, 0xcf,
0xcf, 0xcf, 0xcf, 0xdf, 0xdf, 0xdf,
0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x7f,
0xff, 101, 0x03, 0xf1, 0xfc, 0xfe,
0xfe, 0xff, 3, 0x0f, 0x07, 0x0f,
0x9f, 0xff, 7, 0x0f, 0x07, 0x0f,
0x9f, 0xff, 4, 0xfe, 0xfc, 0xf1,
0x03, 0x0f, 0xff, 97, 0xf8, 0xf3,
0xf7, 0xe7, 0xcf, 0xcf, 0xdf, 0x9f,
0x9e, 0xbf, 0xbf, 0xbf, 0x3f, 0x3f,
0x3f, 0x3f, 0x3f, 0xbf, 0xbf, 0xbe,
0x9f, 0x9f, 0xdf, 0xcf, 0xcf, 0xe7,
0xf7, 0xf3, 0xf8, 0xfc, 0xff, 64
};
#define OUTPUT_FRAME_2_COMPRESSION 3
// [Compressed to 16.40625% of original. Len = 84 vs 512]
const uint8_t output_frame_3[] PROGMEM = {
0xff, 73, 0x7f, 0x3f, 0xbf, 0xbf,
0x9f, 0x9f, 0x9f, 0x9f, 0x9f, 0x9f,
0x9f, 0xbf, 0xbf, 0x3f, 0x7f, 0x7f,
0xff, 105, 0x8f, 0xe7, 0xf3, 0xf9,
0xfc, 0xfe, 0xfe, 0xff, 2, 0xff, 11,
0xff, 3, 0xfe, 0xfc, 0xf9, 0xf3,
0xe7, 0x8f, 0x1f, 0xff, 97, 0x00, 2,
0xff, 6, 0xe0, 0xc0, 0xe0, 0xf1,
0xff, 7, 0xe0, 0xc0, 0xe0, 0xf1,
0xff, 7, 0x00, 2, 0xff, 98, 0xfc,
0xf9, 0xf3, 0xe7, 0xef, 0xcf, 0xdf,
0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x3f,
0x3f, 0x3f, 0x3f, 0x3f, 0xbf, 0x9f,
0x9f, 0xdf, 0xcf, 0xef, 0xe7, 0xf3,
0xf9, 0xfc, 0xfe, 0xff, 33
};
#define OUTPUT_FRAME_3_COMPRESSION 3
// [Compressed to 17.96875% of original. Len = 92 vs 512]
const uint8_t output_frame_4[] PROGMEM = {
0xff, 239, 0x7f, 0x7f, 0x7f, 0x7f,
0xff, 110, 0x0f, 0xe7, 0xe7, 0xf3,
0xf9, 0xf9, 0xfd, 0x3c, 0x1c, 0x1e,
0x1e, 0x3e, 0xfe, 0xfe, 0xfe, 0xfe,
0xfe, 0xfe, 0x3e, 0x1e, 0x1e, 0x1e,
0x3c, 0xfc, 0xfd, 0xf9, 0xf9, 0xf3,
0xe7, 0xe7, 0x0f, 0xff, 97, 0xf8,
0xf3, 0xf3, 0xe7, 0xcf, 0xcf, 0xdf,
0x9e, 0x9c, 0xbc, 0xbc, 0xbe, 0xbf,
0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0xbe,
0xbc, 0xbc, 0xbc, 0x9e, 0x9f, 0xdf,
0xcf, 0xcf, 0xe7, 0xf3, 0xf3, 0xf8
};
#define OUTPUT_FRAME_4_COMPRESSION 3
// [Compressed to 14.0625% of original. Len = 72 vs 512]
const uint8_t* output_images[] = {
output_frame_1,
output_frame_2,
output_frame_3,
output_frame_4
};
const int output_images_compression[] = {
OUTPUT_FRAME_1_COMPRESSION,
OUTPUT_FRAME_2_COMPRESSION,
OUTPUT_FRAME_3_COMPRESSION,
OUTPUT_FRAME_4_COMPRESSION
};
#if SSD1306_HEIGHT == 32
const uint8_t ssd1306_init[] PROGMEM = {
17,
0xAE, 0,
0xA8, 1, 0x1F,
0x20, 1, 0x00,
0x40, 0,
0xD3, 1, 0x00,
0xA1, 0,
0xC8, 0,
0xDA, 1, 0x02,
0x81, 1, 0x7F,
0xA4, 0,
0xA6, 0,
0xD5, 1, 0x80,
0xD9, 1, 0xc2,
0xDB, 1, 0x20,
0x8D, 1, 0x14,
0x2E, 0,
0xAF, 0};
#endif
#if SSD1306_HEIGHT == 64
const uint8_t ssd1306_init[] PROGMEM = {
17,
0xAE, 0,
0xA8, 1, 0x3F,
0x20, 1, 0x00,
0x40, 0,
0xD3, 1, 0x00,
0xA1, 0,
0xC8, 0,
0xDA, 1, 0x12,
0x81, 1, 0x7F,
0xA4, 0,
0xA6, 0,
0xD5, 1, 0x80,
0xD9, 1, 0xc2,
0xDB, 1, 0x20,
0x8D, 1, 0x14,
0x2E, 0,
0xAF, 0};
#endif
void ssd1306_send_screen(int index)
{
    const uint8_t *data = output_images[index];
    int comp = output_images_compression[index];
    // Reset the page and column address windows so the write starts at (0,0)
    Wire.beginTransmission(0x3C);
    Wire.write(0x00); // control byte: commands follow
    Wire.write(0x22); // set page address range
    Wire.write(0x00);
    Wire.write(0xFF);
    Wire.write(0x00);
    Wire.write(0x21); // set column address range
    Wire.write(0x00);
    Wire.write(0x7F);
    Wire.endTransmission();
    // Room left in the Wire buffer, less one byte for the 0x40 control byte
    size_t rem = I2C_BUFFER_LENGTH - 1;
    int len = 0;
    Wire.beginTransmission(0x3C);
    Wire.write(0x40); // control byte: display data follows
    while (len < (SSD1306_HEIGHT * 16))
    {
        uint8_t b = pgm_read_byte(data++);
        uint8_t count = 1;
        // Depending on the mode, 0x00 and/or 0xFF is followed by a run length
        if (((comp == 1 || comp == 3) && b == 0) ||
            ((comp == 2 || comp == 3) && b == 255))
        {
            count = pgm_read_byte(data++);
        }
        while (count--)
        {
            Wire.write(b);
            ++len;
            --rem;
            if (rem == 0)
            {
                // Buffer is full: flush it and start a new data transmission
                rem = I2C_BUFFER_LENGTH - 1;
                Wire.endTransmission();
                Wire.beginTransmission(0x3C);
                Wire.write(0x40);
            }
        }
    }
    Wire.endTransmission();
}
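void setup()
{
#ifdef ESP32
    Wire.begin(I2C_SDA, I2C_SCL, 800 * 1000);
#else
    Wire.begin();
#endif
    Serial.begin(115200);
    // The init table is: command count, then for each command its opcode,
    // its argument count, and the argument bytes.
    const uint8_t *p = ssd1306_init;
    uint8_t len = pgm_read_byte(p++);
    while (len--)
    {
        // One transmission per command, so the whole sequence (43 bytes)
        // never has to fit in the Wire buffer, which is only 32 bytes on AVR.
        Wire.beginTransmission(0x3C);
        Wire.write(0x00); // control byte: command bytes follow
        Wire.write(pgm_read_byte(p++));
        uint8_t arglen = pgm_read_byte(p++);
        while (arglen--)
            Wire.write(pgm_read_byte(p++));
        Wire.endTransmission();
    }
}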
void loop()
{
    static int index = 0;
    ssd1306_send_screen(index++);
    delay(100);
    if (index == 4)
    {
        index = 0;
    }
}
What's of primary interest here is ssd1306_send_screen(). This routine takes the contents of our images, decompressing them as necessary, and sends them straight to the screen. It doesn't really take any SRAM to operate other than what's used for the stack frame, since we decompress everything straight to the display. As you can see, the decompression is stupid simple, allowing us to support all methods with a single if() test. The more complicated bit is actually making sure we don't overrun the I2C transmission buffer in Arduino. If we're about to, we simply end the current transmission and start a new one.
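If you want to check your generated arrays on a PC before flashing anything, the same decode logic can be pulled out into plain C with the I2C part swapped for a callback. This is a sketch of mine, not code from the project: rle_sink, rle_decode, and the collect helper are hypothetical names, and on an AVR you'd read the source through pgm_read_byte() instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Called once per decoded byte, e.g. to forward it to the display. */
typedef void (*rle_sink)(uint8_t b, void *ctx);

/* Decode exactly out_len bytes from src. comp uses the same values as
   output_images_compression: 0 = raw, 1 = runs of 0x00, 2 = runs of 0xFF,
   3 = both. No intermediate buffer is needed. */
void rle_decode(const uint8_t *src, int comp, size_t out_len,
                rle_sink sink, void *ctx) {
    assert(sink != NULL);
    size_t len = 0;
    while (len < out_len) {
        uint8_t b = *src++;
        uint8_t count = 1;
        if (((comp == 1 || comp == 3) && b == 0x00) ||
            ((comp == 2 || comp == 3) && b == 0xFF)) {
            count = *src++; /* an encoded byte is always followed by a count */
        }
        /* Extra guard (not in the original loop): never emit past out_len. */
        while (count-- && len < out_len) {
            sink(b, ctx);
            ++len;
        }
    }
}

/* Example sink for testing off-device: append to a plain array. */
struct collect {
    uint8_t *buf;
    size_t n;
};

void collect_byte(uint8_t b, void *ctx) {
    struct collect *c = (struct collect *)ctx;
    c->buf[c->n++] = b;
}
```

Feeding it one of the generated frames with out_len set to 512 should consume the whole array and produce exactly one screen's worth of bytes, which makes for a cheap sanity check on new images.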
On the ESP32-S3, this code, including the 4 embedded images, takes 3.5KB of flash according to the build statistics. I arrived at this figure by comparing the build sizes for an empty project versus an empty project with this code: 277765 - 274181 = 3584 bytes of flash, plus 24 bytes of RAM (18904 - 18880).
(empty project)
RAM: [= ] 5.8% (used 18880 bytes from 327680 bytes)
Flash: [= ] 8.2% (used 274181 bytes from 3342336 bytes)
(project with code)
RAM: [= ] 5.8% (used 18904 bytes from 327680 bytes)
Flash: [= ] 8.3% (used 277765 bytes from 3342336 bytes)