Blog for my various projects, experiments, and learnings

Talking to Hardware

“Bare Metal” STM32 Programming (Part 9): Fun With DMA

I’ve written a few basic tutorials about bare-metal STM32 development in the past, and even though I’m still learning as I write them, I think that there’s enough groundwork to start covering some ‘real world’ scenarios now. I’d like to start with a very important technique for designing efficient applications: the Direct Memory Access (DMA) peripheral. DMA is important because it lets you move data from one area of memory to another without using CPU time. After you start a DMA transfer, your program will continue to run normally while the data is moved around ‘in the background’.

That’s the basic idea, but the devil is always in the details. So in this post, we’re going to review how the three main types of STM32 DMA peripherals work. Different STM32 chips can have similar peripherals which behave slightly differently, and usually more expensive / newer chips have more fully-featured peripherals. I think that this is how the peripherals are grouped, but I didn’t test every type of STM32 chip and corrections are always appreciated:

  • ‘Type 1’ Simple DMA: F0, L0, F1, L1, F3, L4
  • ‘Type 2’ Double-buffered DMA: F2, F4, F7
  • ‘Type 3’ DMA + DMA multiplexer: G0, G4, L4+

Once we’ve reviewed the basics of how DMA works, I’ll go over how to use it in a few example applications to show how it works with different peripherals and devices. The required hardware for each example will be discussed later, but I’ll present code to:

  • Generate an audio tone by sending a sine wave to the DAC peripheral at a specific frequency.
  • Map an array of colors to a strip of WS2812 or SK6812 ‘NeoPixel’ LEDs.
  • Map a small region of on-chip RAM to a monochrome SSD1306 OLED display.
  • Map a a region of RAM to an ILI9163C or ILI9341 TFT display.

The key to these examples is that the communication with an external device will happen ‘in the background’ while your microcontroller’s CPU is doing other things. Most of the examples won’t even use interrupts; the data transmission is automatic once you start it. But be aware that DMA is not magic. Every DMA ‘channel’ or ‘stream’ shares a single data bus which is also used by the CPU for memory transfers, so there is a limit to how much data you can actually send at once. In practice this probably won’t be a problem unless you have multiple high-priority / high-speed DMA transfers with tight timing requirements, but it’s something to be aware of.

So let’s get started!

“Bare Metal” STM32 Programming (Part 8): Learn to Debug Timing Issues with Neopixels

I haven’t written about STM32 chips in a little while, but I have been learning how to make fun displays and signage using colorful LEDs,  specifically the WS2812B and SK6812 ‘Neopixels’. I talked about the single-wire communication standard that these LEDs use in a post about running them from an FPGA, and I mentioned there that it is a bit more difficult for microcontrollers to run the communication standard. If you don’t believe me, take a look at what the poor folks at Adafruit needed to do to get it working on a 16MHz AVR core. Yikes!

When you send colors, the 1 bits are fairly easy to encode but the 0 bits require that you reliably hold a pin high for just 250-400 nanoseconds. Too short and the LED will think that your 0 bit was a blip of noise, too long and it will think that your 0 is a 1. Using timer peripherals is a reasonable solution, but it requires a faster clock than 16MHz and we won’t be able to use interrupts because it takes about 20-30 clock cycles for the STM32 to jump to an interrupt handler. At 72MHz it takes my code about 300-400 nanoseconds to get to an interrupt handler, and that’s just not fast enough.

There are ways to make it faster, but this is also a good example of how difficult it can be to calculate how long your C code will take to execute ahead of time. Between compiler optimizations and hardware realities like Flash wait-states and pushing/popping functions, the easiest way to tell how long your code takes to run is often to simply run it and check.

Working Neopixel Timing

Pulseview diagram of ‘101’ in Neopixel. I can’t be sure, but I think the ‘0’ pulse might be about 375 nanoseconds long.

Which brings us to the topic of this tutorial – we are going to write a simple program which uses an STM32 timer peripheral to draw colors to ‘Neopixel’ LEDs. Along the way, we will debug the program’s timing using Sigrok and Pulseview with an affordable 8-channel digital logic analyzer. These are available for $10-20 from most online retailers; try Sparkfun, or Amazon/eBay/Aliexpress/etc. I don’t know why Adafruit doesn’t sell these; maybe they don’t want to carry cheap generics in the same category as Salae analyzers. Anyways, go ahead and install Pulseview, brush up a bit on STM32 timers if you need to, and let’s get started!

Learning how to FPGA with ‘Neopixel’ LEDs

Whenever I talk to someone about FPGAs, the conversation seems to follow a familiar routine. It is almost a catechism to say that ‘FPGAs are very interesting niche products that, sadly, rarely make sense in real-world applications’. I often hear that organizations with Money can afford to develop ASICs, while hobbyists are usually better served by today’s affordable and powerful microcontrollers except in some very specific circumstances like emulating old CPU architectures. I don’t have enough experience to know how accurate this is, but I do have a couple of projects that seem like they could benefit from an FPGA, so I decided to bite the bullet and learn the basics of how to use one.

I chose a popular $25 development board called the ‘Icestick‘ to start with. It uses one of Lattice’s iCE40 chips, which is nice because there is an open-source toolchain called Icestorm available for building Verilog or VHDL code into an iCE40 bitstream. Most FPGA vendors (including Lattice) don’t provide a toolchain that you can build from source, but thanks to the hard work of Clifford Wolf and the other Icestorm contributors, I can’t use “maddeningly proprietary tools” as a reason not to learn about this anymore.

One thing that FPGAs can do much better than microcontrollers is running a lot of similar state machines in parallel. I’d eventually like to make a ‘video wall’ project using individually-addressable LEDs, but the common ‘Neopixel’ variants share a maximum data rate of about 800kbps. That’s probably too slow to send video to a display one pixel at a time, but it might be fast enough to send a few hundred ‘blocks’ of pixel data in parallel. As a small step towards that goal, I decided to try lighting up a single strip of WS2812B or SK6812 LEDs using Verilog. Here, I will try to describe what I learned.

Icestick with lights

Blue Icestick

And while this post will walk through a working design, I’m sorry that it will not be a great tutorial on writing Verilog or VHDL; I will try to gloss over what I don’t understand, so I would encourage you to read a more comprehensive tutorial on the subject like Al Williams’ series of Verilog and Icestorm tutorials on Hackaday. Sorry about that, but I’m still learning and I don’t want to present misleading information. This tutorial’s code is available on Github as usual, but caveat emptor.

More Fun with Four-Wire SPI: Drawing to “E-Ink” Displays

In previous tutorials, I covered how to use the STM32 line of microcontrollers to draw to small displays using the SPI communication standard. First with software functions and small ‘SSD1331’ OLED displays, and then with the faster SPI hardware peripheral and slightly larger ‘ILI9341’ TFT LCD displays. Both of those displays are great for cheaply displaying data or multimedia content, because they can show 16 bits of color per pixel and have enough space to present a moderate amount of information. But if you want to design a very low-power application, you might want a display which does not need to constantly drain energy to maintain an image.

Enter ‘E-Ink’ displays, sometimes called “Electrophoretic Displays“. As the name implies, they use the same basic operating principle as techniques like Gel Electrophoresis, which separates polarized molecules such as DNA based on their electric charge. Each pixel in one of these displays is a tiny hollow sphere filled with oppositely-charged ink molecules, and they are separated between the top and bottom of their capsules to make the pixel light or dark. The ink remains in place even after power is removed; I think that they are suspended in a solid gel or something. Modern E-Ink modules sometimes have a third color such as red or yellow, but this post will only cover a humble monochrome display.

Eink Logo

E-ink 😀

Drawing to a Small TFT Display: the ILI9341 and STM32

As you learn about more of your microcontroller’s peripherals and start to work with more types of sensors and actuators, you will probably want to add small displays to your projects. Previously, I wrote about creating a simple program to draw data to an SSD1331 OLED display, but while they look great, the small size and low resolution can be limiting. Fortunately, the larger (and slightly cheaper) ILI9341 TFT display module uses a nearly-identical SPI communication protocol, so this tutorial will build on that previous post by going over how to draw to a 2.2″ ILI9341 module using the STM32’s hardware SPI peripheral.

An ILI9341 display being driven by an STM32F0 chip.

An ILI9341 display being driven by an STM32F0 chip. Technically this isn’t a ‘Nucleo’ board, but the code is the same.

We’ll cover the basic steps of setting up the required GPIO pins, initializing the SPI peripheral, starting the display, and then finally drawing pixel colors to it. This tutorial won’t read any data from the display, so we can use the hardware peripheral’s MISO pin for other purposes and leave the TFT’s MISO pin disconnected. And as with my previous STM32 posts, example code will be provided for both the STM32F031K6 and STM32L031K6 ‘Nucleo’ boards.

When is Now? The DS3231 Real-Time Clock

Time may be an artificial construct, but try telling your boss that ‘Monday’ has no meaning. It is useful for a program to be able to schedule actions for a certain date, display the current time on a clock or calendar, or perform other tasks which use weird units of time like ‘seconds’ or ‘days’. For those types of tasks, an embedded developer might reach for an ‘RTC’ device, which stands for ‘Real-Time Clock’. They provide a way to keep accurate time, often with features like backup power supplies. Many RTCs also offer ‘wakeup’ alarms for other devices, so they are especially useful in energy-efficient designs.

The STM32 line of chips which I’ll continue to use in this tutorial have a built-in RTC peripheral, but they require an external 32.768KHz ‘LSE’ (Low-Speed External) crystal oscillator to keep accurate time. Also, managing the STM32’s backup power supply is sort of complicated.

Instead, this tutorial will walk through using the ‘I2C’ peripheral on an STM32 chip to communicate with a cheap DS3231 RTC module. Specifically, I will talk about a widely-available board labeled ZS-042, which includes 4KB of EEPROM on its I2C bus and space for a “coin cell” battery to provide several years of backup power. But the same commands should work with other DS3231 boards, such as the smaller ones in the upper-left here:

DS3231 Modules

A handful of DS3231 modules and their backup batteries.

An example project demonstrating the concepts outlined in this post using either an STM32F031K6 or STM32L031K6 “Nucleo” board is available on Github.

STM32 Software SPI SSD1331 Sketch

In a previous post, I wrote about designing a ‘breakout board’ for an SSD1331 OLED display with 96×64 pixels and 16 bits of color per pixel. With the hardware already put together, this post will cover writing a basic software driver for the displays. To keep things simple, we will talk to the display using software SPI functions instead of the STM32’s SPI hardware peripheral.

If you want to skip assembling your own boards, you can also buy a pre-made display such as this one sold by Adafruit. They have also written a library for these displays which works with several common types of microcontrollers, if you just want to use them without worrying about the display settings. But if you want to try understanding this sort of communication at a lower level, read on!

Working SSD1331/STM32

The finished program will display a predefined framebuffer, like this little logo!

Since many small microcontrollers – including the STM32F031K6 discussed in this example – don’t have 12KB of RAM available to store a 96×64 display at 16 bits per pixel, I’ll use a framebuffer with just 4 bits per pixel in this example (3KB), and map those 16 values to a palette. This example builds on the first few “Bare Metal STM32 Programming” tutorials that I’ve been writing, so here is a Github repository with the entire example project (including supporting files) if you don’t want to read those.