
“Bare Metal” STM32 Programming (Part 2): Making it to ‘Main’

In a previous post, I tried to walk through some very minimal code to get an STM32 chip to boot into a simple “Hello, World” assembly program. But that quick introduction left out some important concepts, and usually people don’t want to write an entire program in an assembly language.

So in this tutorial, we’re going to build on that ‘absolute minimum’ example, and write some more complete ‘startup’ code which will run a familiar C program’s “main” method when it finishes.

We’ll use the STM32F031K6 chip as an example again; it is one of ST’s simpler ARM chips, and you can buy a pre-made ‘Nucleo’ board for just a little over $10.

What Will We Write?

This example project will consist of a few different files, but there’s still a good chance you can count them on one hand:

  • A more complete ‘Linker Script’ to map our C program’s individual sections of memory onto the chip.
  • A ‘Vector Table’ file which will point every possible hardware interrupt to a default ‘interrupt handler’ – we’ll go over how to actually use these later.
  • A ‘boot code’ file which will contain a reset handler to copy information to RAM and then jump into the ‘main’ method.
  • A ‘main.c’ file which will contain our actual program logic.
  • A ‘Makefile’ which will let GNU Make build the project for us, so we don’t have to copy/paste GCC commands into a console like last time.

Hopefully you’ll come out of this post with a decent starting point for STM32F0 projects, and a general understanding of what is required to create your own projects for other chips.

Linker Script ‘Sections’

In the last post, we defined a very basic linker script with a ‘MEMORY’ block defining how much program memory and RAM was available on the chip.

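Most C compilers will automatically split programs into a handful of common memory ‘sections’, which are groups of data with similar properties. These default memory sections are not specific to microcontrollers; they are common among C programs on any platform. If you aren’t familiar with how a C program’s memory is normally organized, the short version is that ‘text’ holds the compiled code, ‘rodata’ holds constants, ‘data’ holds variables with a non-zero starting value, ‘bss’ holds variables which start at zero, and the ‘heap’ and ‘stack’ share whatever RAM is left over for dynamic allocations and function calls.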
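
As a rough illustration of these common sections, here is a small C sketch annotated with where a GCC-style toolchain would typically place each object. The names (lookup_table, boot_count, error_flags, sum_table) are made up for this example, and the exact placement can vary with the compiler and its flags.

```c
#include <stdint.h>

/* Read-only constant: typically placed in .rodata (Flash). */
const uint32_t lookup_table[4] = { 1, 2, 4, 8 };

/* Global with a non-zero initial value: typically placed in .data.
 * Its starting value is stored in Flash and copied to RAM at boot. */
uint32_t boot_count = 42;

/* Global with no initializer: typically placed in .bss (or COMMON),
 * so it only starts at 0 if the startup code zeroes that memory. */
uint32_t error_flags;

/* The function's machine code goes in .text (Flash). */
uint32_t sum_table(void) {
  /* Local variables live on the stack (or in registers). */
  uint32_t total = 0;
  for (int i = 0; i < 4; ++i) {
    total += lookup_table[i];
  }
  return total;
}
```

On a desktop OS the loader fills in .data and .bss for you; on a bare microcontroller that job falls to the startup code, which is why the linker script needs to know where each section starts and ends.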

Here is an example linker script with the addition of a SECTIONS block after the MEMORY one. It lays out space for each section in the type of memory that should hold that section – nonvolatile Flash storage for the program and read-only data, RAM for everything else. An ENTRY property is also defined, which says where we expect our program to start.

/* Label for the program's entry point */
ENTRY(reset_handler)

/* End of RAM / Start of stack */
/* (4KB SRAM) */
_estack = 0x20001000;

/* Set minimum size for stack and dynamic memory. */
/* (The linker will generate an error if there is
 * less than this much RAM leftover.) */
/* (1KB) */
_Min_Leftover_RAM = 0x400;

MEMORY
{
  FLASH ( rx )      : ORIGIN = 0x08000000, LENGTH = 32K
  RAM ( rxw )       : ORIGIN = 0x20000000, LENGTH = 4K
}

SECTIONS
{
  /* The vector table goes at the start of flash. */
  .vector_table :
  {
    . = ALIGN(4);
    KEEP (*(.vector_table))
    . = ALIGN(4);
  } >FLASH

  /* The 'text' section contains the main program code. */
  .text :
  {
    . = ALIGN(4);
    *(.text)
    *(.text*)
    . = ALIGN(4);
  } >FLASH

  /* The 'rodata' section contains read-only data,
   * constants, strings, information that won't change. */
  .rodata :
  {
    . = ALIGN(4);
    *(.rodata)
    *(.rodata*)
    . = ALIGN(4);
  } >FLASH

  /* The 'data' section is space set aside in RAM for
   * things like variables, which can change. */
  _sidata = .;
  .data : AT(_sidata)
  {
    . = ALIGN(4);
    /* Mark start/end locations for the 'data' section. */
    _sdata = .;
    *(.data)
    *(.data*)
    _edata = .;
    . = ALIGN(4);
  } >RAM

  /* The 'bss' section is similar to the 'data' section,
   * but its space is initialized to all 0s at the
   * start of the program. */
  .bss :
  {
    . = ALIGN(4);
    /* Also mark the start/end of the BSS section. */
    _sbss = .;
    *(.bss)
    *(.bss*)
    *(COMMON)
    . = ALIGN(4);
    _ebss = .;
  } >RAM

  /* Space set aside for the application's heap/stack. */
  .dynamic_allocations :
  {
    . = ALIGN(4);
    _ssystem_ram = .;
    . = . + _Min_Leftover_RAM;
    . = ALIGN(4);
    _esystem_ram = .;
  } >RAM
}

The linker will generate an error if it tries to add a section and there isn’t space. The .dynamic_allocations section is there to prevent a case where there is not enough RAM for the program to run properly; it represents a minimum combined space for our program’s ‘Heap’ and ‘Stack’ sections. As an example, if the data and bss sections used 4092 bytes total and we had 4096 bytes of RAM, the program would compile and upload but it might not work correctly because 4 bytes is not enough space to push a function’s stack frame. If we add space for an imaginary ‘section’ at the end of RAM, then the program described above would cause a clear ‘out of memory’ linker error when the program compiled, instead of causing a confusing failure in the device’s real-world behavior.
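
The check that the linker ends up performing can be sketched as plain arithmetic. This is only a model of the idea, not what the linker literally runs: the function name ram_layout_fits is hypothetical, the constants are taken from the linker script above, and alignment padding is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_SIZE_BYTES      4096u  /* 4KB SRAM, as in the MEMORY block. */
#define MIN_LEFTOVER_BYTES  1024u  /* _Min_Leftover_RAM = 0x400.        */

/* Returns true if .data + .bss + the reserved heap/stack space all
 * fit in RAM. Alignment padding is ignored to keep the sketch short. */
bool ram_layout_fits(uint32_t data_bytes, uint32_t bss_bytes) {
  uint32_t needed = data_bytes + bss_bytes + MIN_LEFTOVER_BYTES;
  return needed <= RAM_SIZE_BYTES;
}
```

With the example from the paragraph above, 4092 bytes of data and bss plus the 1024 reserved bytes comes to 5116 bytes, which does not fit in 4096, so the build would stop at the link step.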

There are other semi-common and compiler-specific memory sections which I’ve left out, but this should work for simple applications compiled by the GCC toolchain. If you’re curious about compatibility, you can get some ideas from the ‘.ld’ files generated for projects by a GCC-based IDE like SW4STM32. The “CubeMX” code generation utility provided by ST is also a useful resource.

The (Complete) Vector Table

We went over the basics of what a ‘Vector Table’ is in the last post – it defines which code the chip should run when certain hardware events happen. Last time we only filled in the first two entries, which define the starting stack pointer value (typically the end of RAM) and the ‘reset handler’ which tells the chip which code to run when a core ‘system reset’ event happens.

But now that we’re writing a “complete” program, we should define all of the available entries for our chip. The STM32F031 line of chips has a fairly limited set of peripherals compared to its bigger siblings, but it is still sort of overwhelming when you see them all listed out at once. My advice is, don’t worry about individual entries until you have a reason to learn more about them. They are all disabled by default (besides the first few ‘system’ interrupts), so you really don’t have to think about them until you decide to turn them on in your program.

To find the vector table entries for your chip, you can refer to the documentation provided by ST. But they also provide example ‘startup’ assembly files for each chip in their ‘Cube’ HAL packages, as well as the (now outdated) ‘Standard Peripheral Libraries’. You can download those, and look at what is listed for your chip. For example, in the CubeMX F0 firmware package, check the file:

Drivers/CMSIS/Device/ST/STM32F0xx/Source/Templates/gcc/startup_stm32f031x6.s

It will contain a full vector table for the chip, as well as some basic ‘boot’ code similar to what we’ll go over in the next section. These vendor-provided files can be a good way to start getting a handle on how to structure your projects.

So with all of that in mind, here’s a full ‘vector table’ file for the STM32F031K6 chip:

.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb

.global vtable
.global default_interrupt_handler

/*
 * The vector table.
 */
.type vtable, %object
.section .vector_table,"a",%progbits
vtable:
  // 0-15
  .word _estack
  .word reset_handler
  .word NMI_handler
  .word hard_fault_handler
  .word 0
  .word 0
  .word 0
  .word 0
  .word 0
  .word 0
  .word 0
  .word SVC_handler
  .word 0
  .word 0
  .word pending_SV_handler
  .word SysTick_handler
  // 16-31
  .word window_watchdog_IRQ_handler
  .word PVD_IRQ_handler
  .word RTC_IRQ_handler
  .word flash_IRQ_handler
  .word RCC_IRQ_handler
  .word EXTI0_1_IRQ_handler
  .word EXTI2_3_IRQ_handler
  .word EXTI4_15_IRQ_handler
  .word 0
  .word DMA1_chan1_IRQ_handler
  .word DMA1_chan2_3_IRQ_handler
  .word DMA1_chan4_5_IRQ_handler
  .word ADC1_IRQ_handler
  .word TIM1_break_IRQ_handler
  .word TIM1_CC_IRQ_handler
  .word TIM2_IRQ_handler
  // 32-47
  .word TIM3_IRQ_handler
  .word 0
  .word 0
  .word TIM14_IRQ_handler
  .word 0
  .word TIM16_IRQ_handler
  .word TIM17_IRQ_handler
  .word I2C1_IRQ_handler
  .word 0
  .word SPI1_IRQ_handler
  .word 0
  .word USART1_IRQ_handler
  .word 0
  .word 0
  .word 0
  .word 0
  // 48
  // (Location to boot from for RAM startup)
  #define boot_ram_base  0xF108F85F
  .word boot_ram_base

  /*
   * Setup weak aliases for each exception handler to the
   * default one. These can be updated later, or just
   * overridden since they're weak refs.
   * The reset_handler is set up separately.
   */
  .weak NMI_handler
  .thumb_set NMI_handler,default_interrupt_handler
  .weak hard_fault_handler
  .thumb_set hard_fault_handler,default_interrupt_handler
  .weak SVC_handler
  .thumb_set SVC_handler,default_interrupt_handler
  .weak pending_SV_handler
  .thumb_set pending_SV_handler,default_interrupt_handler
  .weak SysTick_handler
  .thumb_set SysTick_handler,default_interrupt_handler
  .weak window_watchdog_IRQ_handler
  .thumb_set window_watchdog_IRQ_handler,default_interrupt_handler
  .weak PVD_IRQ_handler
  .thumb_set PVD_IRQ_handler,default_interrupt_handler
  .weak RTC_IRQ_handler
  .thumb_set RTC_IRQ_handler,default_interrupt_handler
  .weak flash_IRQ_handler
  .thumb_set flash_IRQ_handler,default_interrupt_handler
  .weak RCC_IRQ_handler
  .thumb_set RCC_IRQ_handler,default_interrupt_handler
  .weak EXTI0_1_IRQ_handler
  .thumb_set EXTI0_1_IRQ_handler,default_interrupt_handler
  .weak EXTI2_3_IRQ_handler
  .thumb_set EXTI2_3_IRQ_handler,default_interrupt_handler
  .weak EXTI4_15_IRQ_handler
  .thumb_set EXTI4_15_IRQ_handler,default_interrupt_handler
  .weak DMA1_chan1_IRQ_handler
  .thumb_set DMA1_chan1_IRQ_handler,default_interrupt_handler
  .weak DMA1_chan2_3_IRQ_handler
  .thumb_set DMA1_chan2_3_IRQ_handler,default_interrupt_handler
  .weak DMA1_chan4_5_IRQ_handler
  .thumb_set DMA1_chan4_5_IRQ_handler,default_interrupt_handler
  .weak ADC1_IRQ_handler
  .thumb_set ADC1_IRQ_handler,default_interrupt_handler
  .weak TIM1_break_IRQ_handler
  .thumb_set TIM1_break_IRQ_handler,default_interrupt_handler
  .weak TIM1_CC_IRQ_handler
  .thumb_set TIM1_CC_IRQ_handler,default_interrupt_handler
  .weak TIM2_IRQ_handler
  .thumb_set TIM2_IRQ_handler,default_interrupt_handler
  .weak TIM3_IRQ_handler
  .thumb_set TIM3_IRQ_handler,default_interrupt_handler
  .weak TIM14_IRQ_handler
  .thumb_set TIM14_IRQ_handler,default_interrupt_handler
  .weak TIM16_IRQ_handler
  .thumb_set TIM16_IRQ_handler,default_interrupt_handler
  .weak TIM17_IRQ_handler
  .thumb_set TIM17_IRQ_handler,default_interrupt_handler
  .weak I2C1_IRQ_handler
  .thumb_set I2C1_IRQ_handler,default_interrupt_handler
  .weak SPI1_IRQ_handler
  .thumb_set SPI1_IRQ_handler,default_interrupt_handler
  .weak USART1_IRQ_handler
  .thumb_set USART1_IRQ_handler,default_interrupt_handler
.size vtable, .-vtable

/*
 * A 'Default' interrupt handler. This is where interrupts
 * which are not otherwise configured will go.
 * It is an infinite loop, because...well, we weren't
 * expecting the interrupt, so what can we do?
 */
.section .text.default_interrupt_handler,"ax",%progbits
default_interrupt_handler:
  default_interrupt_loop:
    B default_interrupt_loop
.size default_interrupt_handler, .-default_interrupt_handler
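
Since every entry in the table is one 4-byte word, you can work out where a given handler's address is stored. Here is a small C sketch of that arithmetic; the helper names are hypothetical, and it assumes the table sits at the start of Flash (0x08000000) with the 16 core entries first, as in the listing above.

```c
#include <stdint.h>

#define FLASH_BASE_ADDR   0x08000000u
#define NUM_CORE_ENTRIES  16u  /* Initial stack pointer + 15 core slots. */
#define BYTES_PER_ENTRY   4u   /* Each entry is one 32-bit word.         */

/* Position of a peripheral interrupt in the table, counting from 0. */
uint32_t irq_table_index(uint32_t irq_number) {
  return NUM_CORE_ENTRIES + irq_number;
}

/* Address in Flash where that interrupt's handler address is stored. */
uint32_t irq_vector_address(uint32_t irq_number) {
  return FLASH_BASE_ADDR + (irq_table_index(irq_number) * BYTES_PER_ENTRY);
}
```

For example, USART1_IRQ_handler is entry 43 in the table above, which makes it peripheral interrupt number 27, and its slot would be at 0x08000000 + (43 * 4) = 0x080000AC.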

 

‘Boot’ Logic

Finally, we should write some basic ‘boot’ logic to get the chip into a predictable starting state. The very basics of that process are:

  • Copy pre-initialized data into the .data RAM section.
  • Set the .bss RAM section to all 0s.
  • Jump to the ‘main’ method (finally!)

One good place to put this logic is right in the ‘reset_handler’, in a core.S file like the one in the last post. We’ll omit the vector table (which now has its own file), but the first several lines are mostly the same:

.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb

// Global values.
.global reset_handler

/*
 * The Reset handler. Called on reset.
 */
.type reset_handler, %function
reset_handler:
  // Set the stack pointer to the end of the stack.
  // The '_estack' value is defined in our linker script.
  LDR  r0, =_estack
  MOV  sp, r0

Copying the ‘initialized data’ section into RAM and zero-ing out the ‘uninitialized data’ section is pretty simple. We just need to load data from one place, and store it in another:

// Copy data from flash to RAM data init section.
// R0 will store our progress along the sidata section.
MOVS r0, #0
// Load the start/end addresses of the data section,
// and the start of the data init section.
LDR  r1, =_sdata
LDR  r2, =_edata
LDR  r3, =_sidata
B    copy_sidata_loop

copy_sidata:
  // Offset the data init section by our copy progress.
  LDR  r4, [r3, r0]
  // Copy the current word into data, and increment.
  STR  r4, [r1, r0]
  ADDS r0, r0, #4

copy_sidata_loop:
  // Unless we've copied the whole data section, copy the
  // next word from sidata->data.
  ADDS r4, r0, r1
  CMP  r4, r2
  BCC  copy_sidata

// Once we are done copying the data section into RAM,
// move on to filling the BSS section with 0s.
MOVS r0, #0
LDR  r1, =_sbss
LDR  r2, =_ebss
B    reset_bss_loop

// Fill the BSS segment with '0's.
reset_bss:
  // Store a 0 and increment by a word.
  STR  r0, [r1]
  ADDS r1, r1, #4

reset_bss_loop:
  // We'll use R1 to count progress here; if we aren't
  // done, reset the next word and increment.
  CMP  r1, r2
  BCC  reset_bss
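
If the assembly is hard to follow, the same two loops can be sketched in C. This is only an equivalent model for reading along, not a replacement for the code above; the function names copy_data_words and zero_words are hypothetical, and the pointers stand in for the _sdata/_edata/_sidata and _sbss/_ebss symbols.

```c
#include <stdint.h>

/* Copy words from the Flash image (src) into RAM, filling the
 * range [dst_start, dst_end) one 32-bit word at a time. */
void copy_data_words(uint32_t *dst_start, const uint32_t *dst_end,
                     const uint32_t *src) {
  uint32_t *dst = dst_start;
  while (dst < dst_end) {
    *dst++ = *src++;
  }
}

/* Write 0 to every 32-bit word in the range [start, end). */
void zero_words(uint32_t *start, const uint32_t *end) {
  uint32_t *p = start;
  while (p < end) {
    *p++ = 0;
  }
}
```

On the chip itself, a reset handler written in C could declare the linker script's symbols as extern uint32_t and pass their addresses, along the lines of copy_data_words(&_sdata, &_edata, &_sidata). That usage is not part of this post's code, so treat it as an untested suggestion.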

When it starts up, the STM32 resets most of its hardware peripherals to an ‘off’ state and automatically selects an 8MHz internal oscillator as its clock source. So we won’t actually have to worry much about the chip’s clock speed until a later tutorial. We’re done booting the chip now, so the only remaining step is to jump to the ‘main’ method that we are about to write, and mark the end of the reset handler:

  // Branch to the 'main' method.
  B    main
.size reset_handler, .-reset_handler

Finally – ‘main.c’

With a linker script defined for our chip, a vector table set up at the start of program memory, and a reset handler set up to start the chip, we can finally write a ‘main’ C method, and treat that method like a normal computer program (or one for an Arduino).

Here’s a simple example – since the code will be running on a microcontroller for as long as power is applied, we should use an infinite loop and it doesn’t matter that there is no return statement:

/* Main program. */
int main(void) {
  int val = 0;
  while (1) {
    val += 1;
  }
}
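
One caveat: this works as written when the program is built without optimizations. If you turn optimizations on later, the compiler is allowed to drop a local variable that is never read, and a debugger may not be able to show it. A common workaround is to mark the counter ‘volatile’. Here is a minimal sketch of that idea, with the loop body split into a hypothetical count_once helper so that it can be called a fixed number of times; main’s loop would simply call it forever.

```c
#include <stdint.h>

/* 'volatile' tells the compiler that every read and write of this
 * variable must really happen, even if the value looks unused. */
static volatile uint32_t val = 0;

/* One pass of the loop body: increment the counter. */
void count_once(void) {
  val += 1;
}

/* Read the counter back, e.g. from a test or another module. */
uint32_t current_count(void) {
  return val;
}
```

Because this version of ‘val’ is a zero-initialized static rather than a local, it would normally be placed in the bss section instead of on the stack.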

Building with Make

In the last post, we compiled our test project with a couple of fairly long GCC commands:

arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall -fmessage-length=0 core.S -o core.o

arm-none-eabi-gcc core.o -mcpu=cortex-m0 -mthumb -Wall --specs=nosys.specs -nostdlib -lgcc -T./STM32F031K6T6.ld -o main.elf

That was simple and easy, but also a little bit annoying. And we have several different files in this project – using both C and assembly – so it’ll be even worse.

If you aren’t already familiar with GNU Make, it is a classic solution to the problem of repetitively compiling software projects. Instead of copy/pasting a lot of instructions, we can just write a ‘Makefile’ with recipes for our build commands. Here is an example ‘Makefile’ for this project:

TARGET = main

# Define the linker script location and chip architecture.
LD_SCRIPT = STM32F031K6T6.ld
MCU_SPEC  = cortex-m0

# Toolchain definitions (ARM bare metal defaults)
TOOLCHAIN = /usr
CC = $(TOOLCHAIN)/bin/arm-none-eabi-gcc
AS = $(TOOLCHAIN)/bin/arm-none-eabi-as
LD = $(TOOLCHAIN)/bin/arm-none-eabi-ld
OC = $(TOOLCHAIN)/bin/arm-none-eabi-objcopy
OD = $(TOOLCHAIN)/bin/arm-none-eabi-objdump
OS = $(TOOLCHAIN)/bin/arm-none-eabi-size

# Assembly directives.
ASFLAGS += -c
ASFLAGS += -O0
ASFLAGS += -mcpu=$(MCU_SPEC)
ASFLAGS += -mthumb
ASFLAGS += -Wall
# (Set error messages to appear on a single line.)
ASFLAGS += -fmessage-length=0

# C compilation directives
CFLAGS += -mcpu=$(MCU_SPEC)
CFLAGS += -mthumb
CFLAGS += -Wall
CFLAGS += -g
# (Set error messages to appear on a single line.)
CFLAGS += -fmessage-length=0
# (Set system to ignore semihosted junk)
CFLAGS += --specs=nosys.specs

# Linker directives.
LSCRIPT = ./$(LD_SCRIPT)
LFLAGS += -mcpu=$(MCU_SPEC)
LFLAGS += -mthumb
LFLAGS += -Wall
LFLAGS += --specs=nosys.specs
LFLAGS += -nostdlib
LFLAGS += -lgcc
LFLAGS += -T$(LSCRIPT)

VECT_TBL = ./vector_table.S
AS_SRC   = ./core.S
C_SRC    = ./main.c

OBJS =  $(VECT_TBL:.S=.o)
OBJS += $(AS_SRC:.S=.o)
OBJS += $(C_SRC:.c=.o)

.PHONY: all
all: $(TARGET).bin

%.o: %.S
  $(CC) -x assembler-with-cpp $(ASFLAGS) $< -o $@

%.o: %.c
  $(CC) -c $(CFLAGS) $(INCLUDE) $< -o $@

$(TARGET).elf: $(OBJS)
  $(CC) $^ $(LFLAGS) -o $@

$(TARGET).bin: $(TARGET).elf
  $(OC) -S -O binary $< $@
  $(OS) $<

.PHONY: clean
clean:
  rm -f $(OBJS)
  rm -f $(TARGET).elf
  rm -f $(TARGET).bin

This is not a post about Make and I wouldn’t know enough to write one, but basically this file just defines the same programs and arguments that we used previously, and sets them to run in order when we run make all (or just make). There’s also a make clean recipe which removes the files generated by the build process. One thing to watch out for if you copy this file: Make requires each recipe line to start with a tab character, not spaces. Running make prints each command that gets run in order, resulting in something like this:

/usr/bin/arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall -fmessage-length=0 core.S -o core.o
/usr/bin/arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall -fmessage-length=0 vector_table.S -o vector_table.o
/usr/bin/arm-none-eabi-gcc -c -mcpu=cortex-m0 -mthumb -Wall -g -fmessage-length=0 --specs=nosys.specs  main.c -o main.o
/usr/bin/arm-none-eabi-gcc core.o vector_table.o main.o -mcpu=cortex-m0 -mthumb -Wall --specs=nosys.specs -nostdlib -lgcc -T./STM32F031K6T6.ld -o main.elf
/usr/bin/arm-none-eabi-objcopy -S -O binary main.elf main.bin
/usr/bin/arm-none-eabi-size main.elf
   text    data     bss     dec     hex filename
    376       0    1024    1400     578 main.elf

Interestingly, it looks like the size program counted our extra stack/heap space as ‘BSS’ uninitialized data, presumably because the .dynamic_allocations section reserves RAM without storing any contents in the file. Anyways, with the project built, you can upload and debug main.elf using st-util and arm-none-eabi-gdb with the same steps as in the previous post. This time, instead of checking the chip’s register values with i r, we can use the p val GDB command to check the value of our ‘val’ variable and verify that it counts upwards as the program runs.
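
The columns in the size output above can be related back to the chip’s two memories with some simple sums. The helper names below are hypothetical, and this is just a sketch of how I would read those numbers.

```c
#include <stdint.h>

/* Bytes stored in Flash: everything counted as 'text', plus the
 * initial values of 'data' which get copied into RAM at boot. */
uint32_t flash_bytes_used(uint32_t text, uint32_t data) {
  return text + data;
}

/* Bytes statically claimed in RAM: 'data' plus whatever is
 * counted as 'bss'. */
uint32_t ram_bytes_used(uint32_t data, uint32_t bss) {
  return data + bss;
}

/* The 'dec' column is the sum of the three columns before it. */
uint32_t size_total(uint32_t text, uint32_t data, uint32_t bss) {
  return text + data + bss;
}
```

Plugging in the numbers from the build output, that is 376 bytes of the 32KB of Flash and 1024 bytes of the 4KB of RAM, for a total of 1400 bytes (0x578 in hex).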

Also like in the last post, we can use arm-none-eabi-nm main.elf to double-check where everything will be placed in memory – again, we should make sure that the vector table is located at 0x08000000. This time, we can also see the labels from our ‘boot’ code, along with the numerous vector table entries, which should all point to the same ‘default interrupt handler’:

08000174 W ADC1_IRQ_handler
080000d2 t copy_sidata
080000d8 t copy_sidata_loop
08000174 T default_interrupt_handler
08000174 t default_interrupt_loop
08000174 W DMA1_chan1_IRQ_handler
08000174 W DMA1_chan2_3_IRQ_handler
08000174 W DMA1_chan4_5_IRQ_handler
20000000 B _ebss
20000000 D _edata
[ ... ]
08000000 R vtable

There’s nothing in the bss or data sections – you can tell because the _ebss and _edata symbols are both located at the RAM’s starting address of 0x20000000. That is because in C, local variables like ‘val’ are allocated on the stack at run time (or simply kept in a register), so they don’t take up space in either section. If you define the ‘val’ variable in ‘main.c’ with static int instead of int, you should see 4 bytes between _sbss and _ebss if you set it to 0, or between _sdata and _edata if you set it to a non-zero starting value.
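
When reading nm output like the listing above, it can help to sort addresses by which memory they fall into. Here is a small C sketch that does so using the ORIGIN and LENGTH values from the linker script’s MEMORY block; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { REGION_FLASH, REGION_RAM, REGION_OTHER } mem_region_t;

#define FLASH_ORIGIN  0x08000000u
#define FLASH_LENGTH  (32u * 1024u)  /* LENGTH = 32K */
#define RAM_ORIGIN    0x20000000u
#define RAM_LENGTH    (4u * 1024u)   /* LENGTH = 4K  */

/* Classify an address according to the MEMORY block's ranges. */
mem_region_t region_of(uint32_t addr) {
  if (addr >= FLASH_ORIGIN && addr < FLASH_ORIGIN + FLASH_LENGTH) {
    return REGION_FLASH;
  }
  if (addr >= RAM_ORIGIN && addr < RAM_ORIGIN + RAM_LENGTH) {
    return REGION_RAM;
  }
  return REGION_OTHER;
}
```

Note that _estack (0x20001000) lands just past the last byte of RAM under this check. That is expected, since the stack pointer is decremented before each value is pushed.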

Conclusions

I’m sorry, I know this seems like a lot of information to cover without much to show for it; our main program still only increments a variable! But this should serve as a decent foundation for more complex topics like communicating with things like sensors and displays. Here is a repository containing the code discussed in this post:

STM32F0_Minimal_C

I can compile it and observe the ‘val’ variable counting up on an STM32F031K6 Nucleo board, but please let me know if anything seems broken, looks wrong, or just doesn’t make sense.

In the next post, we will learn about the STM32’s GPIO (‘General-Purpose Input/Output’) pins, and how to turn an LED on/off when a button is pressed. After that, I’m hoping to go over basic hardware interrupts and/or the peripherals for common communication protocols such as I2C and SPI. Thanks for reading!

Comments (6):

  1. Malcolm Loudon

    March 5, 2019 at 2:09 pm

    Great tutorials, thanks! I have been using the Arduino core + HAL for a while and have found your tutorials very informative for understanding what goes on underneath. Thanks again for your hard work and sharing!

    • Vivonomicon

      March 30, 2019 at 12:36 pm

      Thanks, I’m glad to hear it’s helpful!

  2. Lluc

    April 20, 2019 at 1:15 am

    This is really great content! I had been looking for something like this for a while, and this helps a lot. Great to avoid reading the full documentation at the beginning, and great to avoid using higher level libraries and tools that abstract too much.

  3. tubo j

    May 21, 2019 at 6:42 am

    hello,

    good job. I am working on how to send bits from one stm32 to another and sending from the received stm32. can you help?

    • Vivonomicon

      May 22, 2019 at 7:36 am

      There are a lot of ways that you could do that, but one popular approach is to use a UART peripheral. You connect the ‘Ground’ pins on both boards, then connect ‘TX’ (‘Transmit’) pin on the first board to the ‘RX’ (‘Receive’) pin on the second board and vice-versa. I don’t think I’ve written about using the STM32 UART peripherals, but this looks like a solid introduction:

      http://www.micromouseonline.com/2009/12/31/stm32-usart-basics/

  4. mr_woggle

    July 9, 2019 at 2:15 pm

    Great article. I think this is really ‘bare metal’ since it doesn’t use any
    libraries at all.

    I have an i386 background, so I’m often confused about the terminology.
    I think a vector table entry is kind of like an interrupt descriptor
    table (IDT) entry on x86, right? So that would mean it includes both exceptions and
    interrupts? Aren’t these mostly empty and programmable? Looking at your vtable,
    for instance, does the watchdog IRQ have to be there, or is that kinda commonly
    used now?

    I have a lot of questions, hopefully you have some time to elaborate or give
    some useful links!

    * How can you relocate this vector table? Let’s say, put it in RAM?
    * How does an interrupt work, like what happens? What get pushed on the
    stack?
    * What if I divide by 0, how do I handle that? What kind of fault is that?
    I read somewhere that’s in a different register?

