Blog for my various projects, experiments, and learnings

“Bare Metal” STM32 Programming (Part 1): Hello, ARM!

The STM32 line of ARM Cortex-M microcontrollers are a fun way to get started with embedded programming. The nice thing about these chips is that they don’t require much setup, so you can start to learn about them bit by bit, starting with almost no code. And they are much more capable than the 8-bit processors used in many ‘Arduino’-type boards – some can run at over 400MHz, and they can have advanced peripherals up to and including simple graphics accelerators.

But in this tutorial, we will just learn the absolute minimum required to get a program running on one of the simpler STM32 chips. We’ll cover how to support multiple chips in a later post, but this post will use the STM32F031K6 as an example. ST makes an affordable ‘Nucleo’ development board with this chip, which costs just over $10 from somewhere like Digikey, Mouser, etc.

This guide will assume some familiarity with C programming and the popular GCC compiler + GDB debugger, but I will try to explain all of the parts specific to coding for microcontrollers. I’d also like to make these posts more accessible, and would welcome feedback if anything is unclear or could be better explained.

On the bright side, the very low-level starting code demonstrated in these first few examples is something that you won’t have to worry about once it is set up. If you want to skip these examples, there are tools such as ST’s CubeMX which can generate these sorts of empty starting projects. But it’s nice to have some idea of what goes on inside of the chip, so let’s get started! You can view the entire minimal example project described in this post in this GitHub repository.

The Toolchain: ‘arm-none-eabi-gcc’

Before we start writing a program for the chip, we need to have a toolchain for compiling and debugging. Fortunately, this is very easy – the same GCC that you know and love is available for the various ARM Cortex-M platforms, and so are all of its accompanying programs like GDB. That’s all that we’ll need for now.

Your package manager should have the gcc-arm-none-eabi and gdb-arm-none-eabi packages – just install them, and you’re good to go! If you need to build it from source or want to download a pre-built version, it is available for download here:

A couple of other useful and recommended packages for more complicated applications are libnewlib-arm-none-eabi and libstdc++-arm-none-eabi.

The ‘Linker Script’:

There are a lot of different kinds of ARM Cortex-M chips, with a lot of different capabilities. The compiler’s linker needs to know, at the very least, how much space the chip has for a program and how much RAM will be available. Without that information, it can’t tell if your program will fit on the chip, if there will be enough space for all the variables you want to define, or stuff like that.

In most cases we will want to define a bit more information to do things like copy variables’ initial values into RAM, but that will be the subject of a future post. For now, we’ll just say how much program memory and RAM there is; 32KB and 4KB respectively, on an STM32F031K6. This is the closest we’ll come to ‘ignore the magic code behind the curtain’, but it’s still fairly simple:

/* Define the end of RAM and limit of stack memory */
/* (4KB SRAM on the STM32F031x6 line, 4096 = 0x1000) */
/* (RAM starts at address 0x20000000) */
_estack = 0x20001000;

MEMORY
{
    FLASH ( rx )      : ORIGIN = 0x08000000, LENGTH = 32K
    RAM ( rxw )       : ORIGIN = 0x20000000, LENGTH = 4K
}

The first non-comment line defines a value called _estack, which represents the end (hence the ‘e’) of the program’s stack. We set this value to point to the very end of the chip’s RAM. The STM32 chips map their on-chip RAM to 0x20000000 in memory, and this chip has 4096 bytes of RAM (0x1000), so 0x20001000 is just past the boundary of what we can address.

Then we define the ‘MEMORY’ block, which tells the linker how much memory the chip has. We mark ‘flash’ memory as readable and executable (rx) but not writable, since that is where the program lives. STM32 chips map their flash memory to start at 0x08000000, and we have 32KB of flash memory available. We mark the RAM as readable, writable, and executable (rxw), and as mentioned above it is 4KB long starting at address 0x20000000.
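To double-check the address arithmetic behind those numbers, here is a small C sketch that can be compiled on a regular computer. The helper names (region_end, in_region) are made up for illustration and are not part of any toolchain; the constants are copied from the linker script above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory regions copied from the linker script above (STM32F031K6). */
#define FLASH_ORIGIN 0x08000000u
#define FLASH_LENGTH (32u * 1024u)
#define RAM_ORIGIN   0x20000000u
#define RAM_LENGTH   (4u * 1024u)

/* Hypothetical helper: first address just past the end of a region. */
uint32_t region_end(uint32_t origin, uint32_t length) {
    /* A region must not wrap around the 32-bit address space. */
    assert(length <= UINT32_MAX - origin);
    return origin + length;
}

/* Hypothetical helper: true if addr lies in [origin, origin + length). */
bool in_region(uint32_t addr, uint32_t origin, uint32_t length) {
    return addr >= origin && addr < region_end(origin, length);
}
```

Calling region_end(RAM_ORIGIN, RAM_LENGTH) gives the same 0x20001000 that we assigned to _estack, and in_region() reports that this address is one step outside of RAM, which matches the ‘just past the boundary’ description.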

In the next tutorial we will create a ‘SECTIONS’ block which gives the linker some more specific information, but we can ignore that for now. Also, linker scripts usually have a .ld file extension.

The ‘Vector Table’:

One thing that makes microcontrollers so cool is that they have ‘hardware interrupts’. When certain conditions are met, they can immediately jump to an ‘interrupt’ function. And when that function returns, the chip goes back to what it was doing before the interrupt triggered.

That ability comes with a small cost – we need to write a ‘vector table’ to define the locations in memory that the chip should jump to when each specific interrupt triggers. But since most interrupts are disabled by default, we can just ignore them for now. The ‘reset’ handler is the only one that we care about to start with – it defines the function that is run when the system resets or powers on. So let’s start a new file to hold the vector table and reset handler – I’ll call it core.S. Just like .c and .h are for C source and header files, the .S file extension is often used for assembly files.

If you aren’t familiar with assembly, the ‘Thumb’ instruction set used by these chips is fairly simple and contains only a handful of basic commands. Here’s a quick reference if assembly code doesn’t look familiar to you:

A core.S assembly file with a very basic vector table could look something like this:

// These instructions define attributes of our chip and
// the assembly language we'll use:
.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb

// Global memory locations.
.global vtable
.global reset_handler

/*
 * The actual vector table.
 * Only the size of RAM and 'reset' handler are
 * included, for simplicity.
 */
.type vtable, %object
vtable:
    .word _estack
    .word reset_handler
.size vtable, .-vtable

The first few lines just tell the assembler what sort of syntax it should expect, and what sorts of machine commands it can generate. The Cortex-M0 line has no floating-point hardware, so we use .fpu softvfp to make sure that floating-point calculations are done in software. And as mentioned above, ‘Thumb’ is the name of the instruction set used by ARM Cortex-M cores.

The .global lines ensure that the labels we use are available to other files, although we’ll only have this one file for now.

Then we just define the vector table using the arbitrary label, vtable. Only two entries are populated – the first entry marks the ‘end of stack’ address we defined earlier, and the second defines the ‘reset handler’ address. The .word command places a 4-byte value in the program; the assembler and linker will replace our labels with the addresses in memory that they correspond to.
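If it helps to picture those two words as data, here is a rough C model that can be run on a regular computer. The struct, function names, and sample bytes are illustrative assumptions rather than anything from ST or ARM. The sample assumes the reset handler ends up at 0x08000008, and that the stored address has its lowest bit set, which is how Cortex-M marks an address as pointing to Thumb code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the two-entry table; field names are illustrative. */
typedef struct {
    uint32_t initial_sp;    /* word 0: the 'end of stack' address */
    uint32_t reset_handler; /* word 1: where execution starts after reset */
} vector_table_t;

/* Assumed first 8 bytes of the program image, stored little-endian:
 * _estack = 0x20001000, reset_handler = 0x08000008 with the Thumb bit set. */
const uint8_t SAMPLE_IMAGE[8] = {
    0x00, 0x10, 0x00, 0x20,
    0x09, 0x00, 0x00, 0x08
};

/* Read one little-endian 32-bit word from a raw byte image. */
uint32_t read_word_le(const uint8_t *image, size_t offset) {
    assert(image != NULL);
    return (uint32_t)image[offset]
         | ((uint32_t)image[offset + 1] << 8)
         | ((uint32_t)image[offset + 2] << 16)
         | ((uint32_t)image[offset + 3] << 24);
}

/* Decode the first two words of an image into the table layout above. */
vector_table_t decode_vectors(const uint8_t *image) {
    vector_table_t vt;
    vt.initial_sp = read_word_le(image, 0);
    vt.reset_handler = read_word_le(image, 4);
    return vt;
}
```

Decoding SAMPLE_IMAGE yields 0x20001000 for the stack entry and 0x08000009 for the handler entry; clearing the lowest bit of the latter gives the handler's actual location.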

The ‘Hello, World’ Program:

We can write a minimal program in either C or the ‘Thumb’ assembly language used by these microcontrollers – all we have to do is define the main ‘reset handler’ method which we pointed to from the vector table. This code can go after the vector table in the core.S assembly file.

For now, we’ll just load a recognizable hex value (0xDEADBEEF) into the r7 register, and then count up from 0 on the r0 register forever. In ‘Thumb’ assembly, that looks like this:

/*
 * The Reset handler. Called on reset.
 */
.type reset_handler, %function
reset_handler:
  // Set the stack pointer to the end of the stack.
  // The '_estack' value is defined in our linker script.
  LDR  r0, =_estack
  MOV  sp, r0

  // Set some dummy values. When we see these values
  // in our debugger, we'll know that our program
  // is loaded on the chip and working.
  LDR  r7, =0xDEADBEEF
  MOVS r0, #0
  main_loop:
    // Add 1 to register 'r0'.
    ADDS r0, r0, #1
    // Loop back.
    B    main_loop
.size reset_handler, .-reset_handler

Some details about the assembly commands, if you aren’t familiar:

The MOV and MOVS commands copy a value into a register, either from another register or from a constant. Constants are written with a # prefix, but only small ‘immediate’ numbers are allowed, generally in the range of 0-255. For larger values, we need to use the LDR command, which loads an entire 4-byte word from memory into a register. The = symbol in front of a value is shorthand which tells the assembler to place the given word nearby in memory, then load the word stored at that address into the register.

The ADDS command is simple addition; you can think of the command, ADDS a, b, c as the equation, a = b + c. So here, we just set r0 = r0 + 1.

The B command is short for ‘Branch’. It tells the program to jump somewhere else. Here, we set a main_loop label before the addition command, and then jump back to it afterwards to make an infinite loop.
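As a rough illustration of the MOVS-versus-LDR choice described above, here is a C sketch which can be run on a regular computer. The function names are invented for this example, and the rule is simplified to the 0-255 range mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Largest constant that fits the 8-bit immediate of `MOVS Rd, #imm`. */
#define MOVS_IMM8_MAX 0xFFu

/* Hypothetical helper: can this constant be written as `MOVS Rd, #value`? */
bool fits_movs_immediate(uint32_t value) {
    return value <= MOVS_IMM8_MAX;
}

/* Hypothetical helper: which command would we reach for to load a constant? */
const char *load_instruction_for(uint32_t value) {
    return fits_movs_immediate(value) ? "MOVS" : "LDR";
}

/* The two constants used in the reset handler above. */
void check_examples(void) {
    assert(strcmp(load_instruction_for(0u), "MOVS") == 0);
    assert(strcmp(load_instruction_for(0xDEADBEEFu), "LDR") == 0);
}
```

The _estack address (0x20001000) is also far too large for an immediate, which is why the handler loads it with LDR before moving it into sp.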

Compiling the Code:

Okay, we’re done writing code now – our minimal program will simply count a number up forever. When the number gets to 0xFFFFFFFF in hexadecimal, adding 1 more ‘overflows’ the number and it goes back to 0.
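If you want to see that wrap-around without any hardware, the loop body can be mimicked in C on a regular computer. This is only a sketch of the arithmetic; step_counter and run_loop are made-up names, and a uint32_t stands in for the 32-bit r0 register.

```c
#include <assert.h>
#include <stdint.h>

/* C stand-in for `ADDS r0, r0, #1` on a 32-bit register. */
uint32_t step_counter(uint32_t r0) {
    return r0 + 1u; /* unsigned arithmetic wraps around modulo 2^32 */
}

/* Run the loop body `iterations` times, starting from `start`. */
uint32_t run_loop(uint32_t start, uint32_t iterations) {
    uint32_t r0 = start;
    for (uint32_t i = 0; i < iterations; ++i) {
        r0 = step_counter(r0);
    }
    return r0;
}

/* Quick self-check of the overflow case described above. */
void check_wraparound(void) {
    assert(step_counter(0xFFFFFFFFu) == 0u);
}
```

Unlike the real program, run_loop() stops after a fixed number of steps so that it can be tested; the assembly version simply never exits.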

So now, we have to compile and link the program to make a file which we can upload to the microcontroller. We’ll use the arm-none-eabi-gcc toolchain that you installed earlier. The STM32F0 line of chips uses an “ARM Cortex-M0” architecture, so the following command should produce a usable object file from the core.S file that we created:

arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall core.S -o core.o

That will create an ‘object file’ called core.o, which we can use to produce a final program. Later, when we have more complex logic across multiple files, we will produce a different object file for each one and combine them together in this final step:

arm-none-eabi-gcc core.o -mcpu=cortex-m0 -mthumb -Wall --specs=nosys.specs -nostdlib -lgcc -T./STM32F031K6T6.ld -o main.elf

This will create a main.elf file. ELF stands for ‘Executable and Linkable Format’, and it is basically a file which we can upload to our chip. Since we gave GCC the correct options (like -mcpu=cortex-m0) and the linker script we wrote earlier (the -T option), it should be set up to work with our specific chip.
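For the curious, an ELF file announces what it is in its first few bytes. The C sketch below checks a header for the ‘32-bit, little-endian, ARM’ markers that I would expect main.elf to carry; the function name and the sample bytes are made up for illustration, and only the first 20 bytes of the header are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EM_ARM_MACHINE 40u /* 'machine' code for ARM in the ELF format */

/* Illustrative header bytes: magic, 32-bit class, little-endian data,
 * version 1, padding, then type = 2 (executable) and machine = 40 (ARM). */
const uint8_t SAMPLE_ELF_HEADER[20] = {
    0x7F, 'E', 'L', 'F', 1, 1, 1, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    2, 0, 40, 0
};
static_assert(sizeof(SAMPLE_ELF_HEADER) >= 20,
              "sample must reach the 'machine' field");

/* True if `bytes` starts like a 32-bit little-endian ARM ELF header. */
bool looks_like_arm_elf32(const uint8_t *bytes, size_t len) {
    if (bytes == NULL || len < 20) {
        return false; /* too short to contain the 'machine' field */
    }
    bool magic = bytes[0] == 0x7F && bytes[1] == 'E' &&
                 bytes[2] == 'L' && bytes[3] == 'F';
    bool is_32bit = bytes[4] == 1;  /* class: 32-bit */
    bool is_little = bytes[5] == 1; /* data: little-endian */
    uint16_t machine = (uint16_t)(bytes[18] | (bytes[19] << 8));
    return magic && is_32bit && is_little && machine == EM_ARM_MACHINE;
}
```

To try it on a real file, you could read the first 20 bytes of main.elf with fread() and pass them to looks_like_arm_elf32().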

We can see a rough outline of what will get written to the chip with the nm command – try running the command, arm-none-eabi-nm main.elf – the output should look like this:

20001000 A _estack
08000010 t main_loop
08000008 T reset_handler
08000000 T vtable

We can see that the vector table – vtable – is at memory offset 0x08000000, which is the very beginning of program memory. If that is not the case, the chip can get confused about what code it should be executing.
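That check is easy to automate. Here is a small C sketch which scans nm-style text for a symbol and compares its address; parse_nm_line and symbol_at are hypothetical helpers, and the sample listing is just the output shown above. It assumes well-formed ‘address type name’ lines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The sample `arm-none-eabi-nm main.elf` output from this post. */
const char SAMPLE_NM_OUTPUT[] =
    "20001000 A _estack\n"
    "08000010 t main_loop\n"
    "08000008 T reset_handler\n"
    "08000000 T vtable\n";

/* Parse one "address type name" line. Returns 1 on success, 0 otherwise. */
int parse_nm_line(const char *line, uint32_t *addr, char *type, char name[64]) {
    unsigned long value;
    if (sscanf(line, "%lx %c %63s", &value, type, name) != 3) {
        return 0;
    }
    *addr = (uint32_t)value;
    return 1;
}

/* Return 1 if `symbol` is listed at address `expected`, else 0. */
int symbol_at(const char *nm_output, const char *symbol, uint32_t expected) {
    assert(nm_output != NULL && symbol != NULL);
    const char *line = nm_output;
    while (*line != '\0') {
        uint32_t addr;
        char type;
        char name[64];
        if (parse_nm_line(line, &addr, &type, name) &&
            strcmp(name, symbol) == 0) {
            return addr == expected;
        }
        const char *next = strchr(line, '\n');
        if (next == NULL) {
            break; /* last line had no trailing newline */
        }
        line = next + 1;
    }
    return 0;
}
```

With the sample listing, symbol_at(SAMPLE_NM_OUTPUT, "vtable", 0x08000000) reports a match, which is the property we care about here.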

Uploading, Running, and Debugging:

Finally, we just have to upload our code to the chip and verify that it works. We don’t have an LED blinking quite yet, but we can still check that the program is running correctly by using GDB to read the r0 and r7 registers which are set by our test program. If you are completely unfamiliar with using GDB to debug a C program, this guide looks like a pretty good crash course.

You can use the programs provided by ST for flashing code, but I usually use the open-source ‘STLink’ tooling written by Texane. With that project, you can simply plug a USB cable into the Nucleo board and enter st-util on a command line; it will open a debugging port on localhost:4242 and print some basic information about the chip. It also works just as well with a USB ‘STLink/V2’-style debugger if you use cheap boards (or your own designs) which only expose the CLK/IO programming pins.

Anyways, once you have the chip plugged in and connected, you can run:

arm-none-eabi-gdb main.elf

…which will start the debugger using our program. Once the debugger is open, we can connect it to the chip with the command, target extended-remote <port>:

(gdb) target extended-remote :4242
Remote debugging using :4242

If GDB cannot connect to the chip at this step, double-check the output of your st-util program and make sure that it found the chip and is listening for a debugger on port 4242. If it’s a different port, use that number instead.

Once you’ve connected to the chip, load the program using the load command – here’s a sample output:

(gdb) load
Loading section .text, size 0x1c lma 0x8000000
Start address 0x8000000, load size 28
Transfer rate: 70 bytes/sec, 28 bytes/write.

If GDB doesn’t know what to load, make sure that your main.elf file exists and that you passed it in as an argument to arm-none-eabi-gdb. You could also try load main.elf if you are running GDB from the directory where the program was compiled.

With the program successfully loaded, we can use the debugger normally. If you are familiar with debugging on GDB already, there aren’t many differences between using ‘regular GDB’ and ‘bare-metal GDB’. It’s good to have a basic familiarity with some sort of debugger for finding problems, but that is a large topic on its own. For now, you can just type continue, wait a few seconds, and then hit Control+C a few times. If it asks, ‘Give up waiting?’, enter y for ‘yes’. After the program has run for a bit and then stopped, you can enter the info registers command, and you should see the values that our program sets in registers r0 and r7:

(gdb) continue
^C^CInterrupted while waiting for the program.
Give up waiting? (y or n) y
(gdb) info registers
r0             0x189ff2         1613810
r1             0x8000400        134218752
r2             0x0              0
r3             0x0              0
r4             0x40022000       1073881088
r5             0x1              1
r6             0x4              4
r7             0xdeadbeef       3735928559
r8             0xffffffff       4294967295
r9             0xffffffff       4294967295
r10            0xffffffff       4294967295
r11            0xffffffff       4294967295
r12            0xffffffff       4294967295
sp             0x20001000       0x20001000
lr             0xffffffff       4294967295
pc             0x8000010        0x8000010 <reset_handler+8>
cpsr           0x1000000        16777216

We can see that the r0 register has a number value which counts up if we step through the program, and r7 has the recognizable value 0xDEADBEEF. Registers r1 through r6 hold values which we didn’t define, so they could be anything. And with that, you have a basic assembly program up and running on the chip!


So in this post, we covered the bare minimum amount of code required to upload and run a program on an STM32 chip. You can find the full code with a Makefile on Github here:

In the next post, we will talk about the different ‘sections’ of memory which most programs use, and extend the linker script to account for them. We will also write some simple ‘boot code’ for copying important data into RAM when the chip starts up, and write a Makefile to compile the project for us. After that, we will finally get around to writing a C program to blink an LED.

I hope this was helpful or informative, and please feel free to let me know if any of the information presented here is inaccurate or could be explained more clearly.

I should also mention the ‘STM32CubeMX’ tooling provided by ST – it can auto-generate initialization and peripheral code for you, and it has a lot of useful examples. But it’s also nice to learn about how these chips work at a low level, for debugging and writing performant code.

Comments (11):

  1. Eli

    October 19, 2018 at 9:36 am


    I have a question but first, great article.

    this post has inspired me to buy an STM32 MCU, because I’ve been working with the TI-RSLK (robotic kit) that uses the MSP432 controller , with Code Composer Studio, but the course and projects, etc.. all have to do with C and I really want to learn as much about the lowest layers as possible. I’ve been researching how to create an assembly project from bare bones, and stumbled on your article.

    Hopefully what I learn here will help me when I get back to the MSP432.

    My question: in the “core.s” file, the directive “.cpu cortex-m0”.. the Nucleo I bought is the STM32F303RE and what I found says that it has the Cortex-M4. So I’m not sure about this part.

    • Vivonomicon

      November 12, 2018 at 10:21 am

      Oh, cool – I hope you find these introductions helpful. I am hoping to come back and update some of the toolchain and assembly instructions with some more complete information.

      ARM’s Cortex-M (‘microcontroller’) cores come in a few different varieties. The STM32F0 chips use a Cortex-M0 core, which is designed to be cheap and simple. The STM32F3 and MSP432 chips both use a Cortex-M4F core which is faster, can do floating-point math much more quickly, etc.

      Usually you can just change the ‘cpu’ or ‘mcpu’ option to the type of core used in your chip, which would be ‘cortex-m4’ in this case. But the more advanced Cortex-M cores have special hardware for floating-point math, so you might also need a few more options in your build script. I’m still not 100% clear on how that works, but here is an example of which GCC settings worked for me when switching to an STM32F3 core from an STM32F0.

      You might also be able to adapt this GCC/Make build system to the MSP432 since it can use the same ‘arm-none-eabi-gcc’ compiler, you’d just have to figure out the linker script/vector table/register macros/etc. I hope that helps – good luck!

  2. Fatih

    April 27, 2019 at 3:21 am


    great article,


  3. Hoang Duong

    May 1, 2019 at 6:23 pm

    This is the clearest tutorial I have ever read about ARM bare metal on the internet. Thanks so much.

  4. tubo

    June 30, 2019 at 1:59 am

    thank you very much for this information. can i contact you privately?

    • Vivonomicon

      July 6, 2019 at 2:45 pm

      Feel free to reach out to vivonomicon @ (gmail). Sorry if I don’t respond in a timely manner, sometimes life happens and I have to put this blog and these projects on the backburner for a little bit.

  5. Ngô Hùng Cường

    July 9, 2019 at 8:02 am

    Many Thanks.

  6. Aditya

    August 4, 2019 at 7:10 am

    Hi! I am planning to do bare-metal programming on the FRDM K82F, which has an M4 core unlike your Cortex-M0. Can you please guide me on how I can upload code to the board, since the STM software won’t work? I can manage the rest of the things like updating linker scripts based on the K82F, thanks

    • Vivonomicon

      August 22, 2019 at 10:32 am

      Different types of microcontrollers usually require different tooling for debugging and uploading code – one reason why I like the STM32 chips is that they have a pretty good set of open-source tools to do that. Also, the Cortex-M0 / M4 / etc core is only a small part of the microcontroller – there are STM32 chips with Cortex-M4 cores, and there are Kinetis chips with Cortex-M0(+) cores.

      I haven’t used NXP’s ARM cores, so I don’t know much about the software ecosystem which supports those chips. You’ll probably need to either use whatever software NXP provides, or search for open-source alternatives if any exist. Good luck!

  7. Pat

    September 22, 2019 at 7:49 am

    Small error that happened on my setup that no-one has commented but in the ‘The ‘Linker Script’:’ section there is an error on line 3 where there should be an end of comment ‘*/’ but there is not.

    This causes an error with the ld.exe for me – a simple addition of the end of comment fixes it.

    • Vivonomicon

      October 15, 2019 at 2:17 pm

      Oh, you’re right – thank you! I’ve updated the post to close that comment.

