Using Tabula to Parse PDF Tables
Recently, I wrote about trying to figure out a way to automatically produce visualizations of STM32 peripheral mappings from their datasheets. Unfortunately, it didn’t go too well; I had trouble parsing the output from the command-line pdftotext
utility, so I ended up having to manually clean up each peripheral table after it was half-parsed by a script.
I was thinking of trying to write a better parsing script, but before diving into that rabbit hole I took another look at open-source PDF-parsing programs and found Tabula. It is an MIT-licensed utility with one goal: extracting tables from PDF files. And it seems to work very well with ST’s datasheets.
Step 1: Download Tabula
Tabula runs on Java, so it’s simple to set up on just about any platform. They have good installation instructions on their website and GitHub readme file. If you are using Linux, you can download and unzip the tabula-jar-<version>.zip
version of the latest release to get a runnable JAR file.
Following the instructions in the README.md
file, once you unzip the file and cd
into its tabula/
directory, you can start running a local server with the command that the readme file suggests:
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula.jar
Once it starts up, you should be able to navigate to http://127.0.0.1:8080 in a browser to access the Tabula UI. There is also a command-line version of the project, which will probably be better for setting up an automatic process. But for now, it’s nice to use the visual selection tools which the UI provides.
Making Pretty(ish) Reference Images for Microcontrollers
When you want to pick a microcontroller for a project, it can be useful to have a bunch of quick references to find the smallest/cheapest one that will fit the parts you want to use. Usually the information is freely available in datasheets, but those are usually PDF files with pinout/peripheral information spanning several pages.
So I’ve been playing with ways to partially automate the process of generating some nice reference images. I’m not exactly happy with what I have yet, but it’s not the sort of thing that merits spending a ton of time on, and I think I scripted away enough of the tedium that it isn’t too frustrating to make these sorts of tables. You wind up with reference images like this:
Honestly, this post would be sort of moot if I could find nice CSV files or spreadsheets of STM32 pinouts and peripherals (see below). Maybe other vendors distribute information in that format, but I couldn’t find many for ST’s chips. And that means that I ended up with a sort of convoluted process.
A fragile Python script runs the pdftotext
utility to convert an STM32 datasheet into a pinout table containing peripheral information. The automatic formatting doesn’t include table cell boundaries, so I couldn’t figure out how to put all of the peripherals associated with a pin on the same line quickly; someone needs to manually perform that step before running a second Python script which generates LaTeX files representing formatted peripheral tables. The PDF files generated by pdflatex
can then be imported into Inkscape, where the table columns can be connected to the pins on an image of the chip’s outline. So, let’s get started!
Note: I ended up finding a better way to parse STM32 datasheets using a project called Tabula – see this newer post for more information. My first attempt presented in this post ended up working pretty poorly, but that’s life for you. Sometimes it takes a few tries to find a good solution.
Getting Started with Bare Metal ESP32 Programming
The ESP32 modules sold by Espressif are very popular in the IoT and embedded development space. They are very cheap, they are quite fast, they include radios and peripherals for WiFi and Bluetooth communication, and in some ways they even appear to bridge the gap between MCU and CPU. And Espressif provides pre-built modules with built-in antennas and external Flash memory, both of which appear to be required for general-purpose ‘IoT’ application development. They can be a bit power-hungry when they are using their wireless communication modules though, and I haven’t found much information on how to develop ESP32 applications without using the heavyweight (but very functional and well-written) “ESP-IDF” toolchain which is distributed by Espressif.
Usually, avoiding bulky and proprietary HALs is a worthwhile goal in and of itself. But Espressif has actually released their ESP-IDF toolchain under a very permissive Apache license, and it looks like a well-thought-out system with solid ongoing support. So if you are looking at starting a new project with the ESP32, I would personally recommend using the ESP-IDF to save time and effort. But sometimes it is nice to learn about how chips work at a deeper level, and ESP-IDF projects are often quite large, and they can take a long time to build depending on your environment.
The large code size also discourages what appears to be one use case that the chip was designed for: to load new instructions into RAM every time that it reboots from an external ‘socket’. The current crop of ESP32 modules use a SPI Flash chip as that ‘socket’, but if you put them in a factory or a field you might want to use Ethernet, or RS-232, or who knows what. I’m not sure how extensible the chip’s ROM bootloader actually is yet, but let’s take a look at what it takes to get a simple C program running on the ESP32 without using the ESP-IDF build system.
Unlike the STM32 and MSP430 microcontrollers which I have written about previously, there are not many software tools available for the ESP32 core. The ESP32’s dual-core architecture uses two ‘Xtensa LX6’ CPU cores which Espressif licenses from Cadence, and I haven’t seen them in any other mainstream microcontrollers. It looks like a core that is intended to be customized for the needs of an application as a step between general-purpose microcontrollers and something like an ASIC, so maybe it is more common in application-specific environments than general-purpose ones. In this case, it looks like the specific application which Espressif chose is wireless communication, and apparently a lot of the WiFi and Bluetooth code is burned directly into the ESP32’s ROM.
The ESP32 also looks more like a proper CPU than many microcontroller cores, with a few hundred kilobytes of on-chip RAM, a 240MHz top speed, an MMU, and support for up to 8 process IDs (2 privileged / 6 unprivileged) per core. People used to make do with much less, but since the ESP32 is complex and somewhat unique, Espressif provides the only toolchain that I know about which can build code for it. That means that while this tutorial will not use the full ESP-IDF development environment, it will still use Espressif’s ports of GCC and OpenOCD for compilation and debugging, as well as their esptool
utility for formatting and flashing the compiled code. The target hardware will be either the ESP32-WROVER-KIT
board which includes a JTAG debugging chip, or any of the smaller generic ESP32 dev boards (such as the ESP32-DevKitC
) combined with an FTDI C232HM
cable.
And like most of my previous tutorials, the software presented here is all open-source and you should be able to build and run it on the platform of your choice. It won’t have any colorful LEDs this time – sorry – but the code is available on GitHub. I’m also still learning about this chip and there is a lot that I don’t know, so corrections and comments are definitely appreciated. So if you’re still interested after those disclaimers, let’s get started by building and installing the toolchain!
Building a Bare-Metal ARM GCC Toolchain from Source
The GCC toolchain is a nice way to get started with bare-metal ARM platforms because it is free, well-supported, and something that many C programmers are already familiar with. It is also constantly improving, and while most Linux distributions include a version of the arm-none-eabi
GCC toolchain in their default package managers, those versions are sometimes old enough that I run into bugs with things like multilib/multiarch support.
So after trying and failing a few times to build and install this toolchain, I thought I would share what eventually worked for me. If you are building this on a system without much free disk space, this did take about 16GB of disk space when I ran it – but you shouldn’t need to keep more than ~500MB of that. I ended up building it on an ext3-formatted 32GB USB drive, and that seemed to work fine. You will probably run into errors if you try to run this on a FAT-formatted drive like a new USB stick, though, because the build process will run chmod
and that will fail on a filesystem that doesn’t support file permissions. You might be able to make it work with an NTFS drive, but I digress.
Anyways, the first step is downloading the code, so let’s get started!
Step 1: Download the Source Code
The first step is to download the arm-none-eabi-gcc
source code from ARM’s website – you want the “Source Invariant” version. If you are using a common architecture such as x86_64
, you can also download a pre-built version of the latest release, but you probably want to build the thing yourself if you are reading this guide. If you want to make sure that the file you downloaded was not corrupted or tampered with, you can check it against the MD5 checksum listed under the file’s name on the download page:
> md5sum gcc-arm-none-eabi-8-2018-q4-major-src.tar.bz2 d6071d95064819d546fe06c49fb9d481 gcc-arm-none-eabi-8-2018-q4-major-src.tar.bz2
That looks like it matches as of the time of writing. The archive contains build scripts and code for the whole toolchain, so create a new directory to build the toolchain in – I’ll use ~/arm_gcc/
as an example. Once it finishes downloading, move the archive to that folder and extract it:
> mkdir ~/arm_gcc > cd ~/arm_gcc > mv [Your_Downloads_Folder]/gcc-arm-none-eabi-8-2018-q4-major-src.tar.bz2 . > tar -xvf gcc-arm-none-eabi-8-2018-q4-major-src.tar.bz2
The file that I downloaded was called gcc-arm-none-eabi-8-2018-q4-major-src.tar.bz2
, but yours will depend on where you are in time. The extracted archive will make a new directory with a similar name – take a look inside, and you’ll see a few scripts and a ‘how-to’ document alongside the source code:
> cd gcc-arm-none-eabi-8-2018-q4-major/ > ls build-common.sh build-toolchain.sh install-sources.sh python-config.sh release.txt build-prerequisites.sh How-to-build-toolchain.pdf license.txt readme.txt src
Learning how to FPGA with ‘Neopixel’ LEDs
Whenever I talk to someone about FPGAs, the conversation seems to follow a familiar routine. It is almost a catechism to say that ‘FPGAs are very interesting niche products that, sadly, rarely make sense in real-world applications’. I often hear that organizations with Money can afford to develop ASICs, while hobbyists are usually better served by today’s affordable and powerful microcontrollers except in some very specific circumstances like emulating old CPU architectures. I don’t have enough experience to know how accurate this is, but I do have a couple of projects that seem like they could benefit from an FPGA, so I decided to bite the bullet and learn the basics of how to use one.
I chose a popular $25 development board called the ‘Icestick‘ to start with. It uses one of Lattice’s iCE40 chips, which is nice because there is an open-source toolchain called Icestorm available for building Verilog or VHDL code into an iCE40 bitstream. Most FPGA vendors (including Lattice) don’t provide a toolchain that you can build from source, but thanks to the hard work of Clifford Wolf and the other Icestorm contributors, I can’t use “maddeningly proprietary tools” as a reason not to learn about this anymore.
One thing that FPGAs can do much better than microcontrollers is running a lot of similar state machines in parallel. I’d eventually like to make a ‘video wall’ project using individually-addressable LEDs, but the common ‘Neopixel’ variants share a maximum data rate of about 800kbps. That’s probably too slow to send video to a display one pixel at a time, but it might be fast enough to send a few hundred ‘blocks’ of pixel data in parallel. As a small step towards that goal, I decided to try lighting up a single strip of WS2812B or SK6812 LEDs using Verilog. Here, I will try to describe what I learned.
And while this post will walk through a working design, I’m sorry that it will not be a great tutorial on writing Verilog or VHDL; I will try to gloss over what I don’t understand, so I would encourage you to read a more comprehensive tutorial on the subject like Al Williams’ series of Verilog and Icestorm tutorials on Hackaday. Sorry about that, but I’m still learning and I don’t want to present misleading information. This tutorial’s code is available on Github as usual, but caveat emptor.