April 20, 2018

“Bare Metal” STM32 Programming (Part 2): Making it to ‘Main’

In a previous post, I tried to walk through some very minimal code to get an STM32 chip to boot into a simple “Hello, World” assembly program. But that quick introduction left out some important concepts, and usually people don’t want to write an entire program in an assembly language.

So in this tutorial, we’re going to build on that ‘absolute minimum’ example, and write some more complete ‘startup’ code which will run a familiar C program’s “main” method when it finishes.

We’ll use the STM32F031K6 chip as an example again; it is one of ST’s simpler ARM chips, and you can buy a pre-made ‘Nucleo’ board for just a little over $10.

What Will We Write?

This example project will consist of a few different files, but there’s still a good chance you can count them on one hand:

A more complete ‘Linker Script’ to map our C program’s individual sections of memory onto the chip.
A ‘Vector Table’ file which will point every possible hardware interrupt to a default ‘interrupt handler’ – we’ll go over how to actually use these later.
A ‘boot code’ file which will contain a reset handler to copy information to RAM and then jump into the ‘main’ method.
A ‘main.c’ file which will contain our actual program logic.
A ‘Makefile’ which will let GNU Make build the project for us, so we don’t have to copy/paste GCC commands into a console like last time.

Hopefully you’ll come out of this post with a decent starting point for STM32F0 projects, and a general understanding of what is required to create your own projects for other chips.

Linker Script ‘Sections’

In the last post, we defined a very basic linker script with a ‘MEMORY’ block defining how much program memory and RAM was available on the chip.

Most C compilers will automatically split programs into a handful of common memory ‘sections’, which are groups of data with similar properties. These default memory sections are not specific to microcontrollers; they are common among C programs on any platform. If you aren’t familiar with how a C program’s memory is normally organized, here is a brief crash-course.
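As a rough illustration (not one of the project's files), here is how a few ordinary C declarations usually map onto those sections when built with GCC. The variable and function names are made up for this example, and the exact placement can vary with compiler version and optimization flags.

```c
#include <stdint.h>

/* Initialized, writable global: usually placed in '.data'.
 * Its starting value has to be stored somewhere non-volatile. */
uint32_t boot_count = 42;

/* Zero/uninitialized global: usually placed in '.bss'.
 * C promises that it starts out as 0. */
uint32_t error_flags;

/* Read-only data: usually placed in '.rodata'. */
const char greeting[] = "Hello, World";

/* Function code: usually placed in '.text'. */
uint32_t sections_demo(void) {
  /* Local variable: lives on the stack (or in a register),
   * not in any of the sections above. */
  uint32_t local = boot_count + error_flags;
  return local + (uint32_t)greeting[0];
}
```

On a desktop OS the program loader fills in '.data' and clears '.bss' before 'main' runs. On a bare microcontroller nobody does that for us, which is why the startup code later in this post has to.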

Here is an example linker script with the addition of a SECTIONS block after the MEMORY one. It lays out space for each section in the type of memory that should hold that section – nonvolatile Flash storage for the program and read-only data, RAM for everything else. An ENTRY property is also defined, which says where we expect our program to start.

/* Label for the program's entry point */
ENTRY(reset_handler)

/* End of RAM / Start of stack */
/* (4KB SRAM) */
_estack = 0x20001000;

/* Set minimum size for stack and dynamic memory. */
/* (The linker will generate an error if there is
 * less than this much RAM leftover.) */
/* (1KB) */
_Min_Leftover_RAM = 0x400;

MEMORY
{
  FLASH ( rx )      : ORIGIN = 0x08000000, LENGTH = 32K
  RAM ( rxw )       : ORIGIN = 0x20000000, LENGTH = 4K
}

SECTIONS
{
  /* The vector table goes at the start of flash. */
  .vector_table :
  {
    . = ALIGN(4);
    KEEP (*(.vector_table))
    . = ALIGN(4);
  } >FLASH

  /* The 'text' section contains the main program code. */
  .text :
  {
    . = ALIGN(4);
    *(.text)
    *(.text*)
    . = ALIGN(4);
  } >FLASH

  /* The 'rodata' section contains read-only data,
   * constants, strings, information that won't change. */
  .rodata :
  {
    . = ALIGN(4);
    *(.rodata)
    *(.rodata*)
    . = ALIGN(4);
  } >FLASH

  /* The 'data' section is space set aside in RAM for
   * things like variables, which can change. */
  _sidata = .;
  .data : AT(_sidata)
  {
    . = ALIGN(4);
    /* Mark start/end locations for the 'data' section. */
    _sdata = .;
    *(.data)
    *(.data*)
    _edata = .;
    . = ALIGN(4);
  } >RAM

  /* The 'bss' section is similar to the 'data' section,
   * but its space is initialized to all 0s at the
   * start of the program. */
  .bss :
  {
    . = ALIGN(4);
    /* Also mark the start/end of the BSS section. */
    _sbss = .;
    *(.bss)
    *(.bss*)
    *(COMMON)
    . = ALIGN(4);
    _ebss = .;
  } >RAM

  /* Space set aside for the application's heap/stack. */
  .dynamic_allocations :
  {
    . = ALIGN(4);
    _ssystem_ram = .;
    . = . + _Min_Leftover_RAM;
    . = ALIGN(4);
    _esystem_ram = .;
  } >RAM
}

The linker will generate an error if it tries to add a section and there isn’t space. The .dynamic_allocations section is there to prevent a case where there is not enough RAM for the program to run properly; it represents a minimum combined space for our program’s ‘Heap’ and ‘Stack’ sections. As an example, if the data and bss sections used 4092 bytes total and we had 4096 bytes of RAM, the program would compile and upload but it might not work correctly because 4 bytes is not enough space to push a function’s stack frame. If we add space for an imaginary ‘section’ at the end of RAM, then the program described above would cause a clear ‘out of memory’ linker error when the program compiled, instead of causing a confusing failure in the device’s real-world behavior.
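The arithmetic behind that check can be sketched in a few lines of C. This is only an illustration of what the linker works out for us: the helper names are hypothetical, the constants are copied from the script above, and alignment padding is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the linker script above. */
#define RAM_ORIGIN       0x20000000u
#define RAM_LENGTH       0x1000u  /* 4K */
#define MIN_LEFTOVER_RAM 0x400u   /* 1K, like _Min_Leftover_RAM */

/* Address just past the last byte of RAM; this matches _estack. */
uint32_t ram_end(void) {
  return RAM_ORIGIN + RAM_LENGTH;
}

/* True if '.data' + '.bss' + the reserved stack/heap space fit in RAM.
 * (Hypothetical helper; the linker performs this check itself.) */
bool ram_fits(uint32_t data_bytes, uint32_t bss_bytes) {
  return data_bytes + bss_bytes + MIN_LEFTOVER_RAM <= RAM_LENGTH;
}
```

With the 4092-byte example from the paragraph above, 4092 + 1024 is more than 4096, so ram_fits reports a failure, which corresponds to the linker refusing to place .dynamic_allocations.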

There are other semi-common and compiler-specific memory sections which I’ve left out, but this should work for simple applications compiled by the GCC toolchain. If you’re curious about compatibility, you can get some ideas from the ‘.ld’ files generated for projects by an IDE like Keil or SW4STM32. The “CubeMX” code generation utility provided by ST is also a useful resource.

The (Complete) Vector Table

We went over the basics of what a ‘Vector Table’ is in the last post – it defines which code the chip should run when certain hardware events happen. Last time we only filled in the first two entries, which define the starting stack pointer value (typically the end of RAM) and the ‘reset handler’ which tells the chip which code to run when a core ‘system reset’ event happens.

But now that we’re writing a “complete” program, we should define all of the available entries for our chip. The STM32F031 line of chips has a fairly limited set of peripherals compared to its bigger siblings, but it is still sort of overwhelming when you see them all listed out at once. My advice is, don’t worry about individual entries until you have a reason to learn more about them. They are all disabled by default (besides the first few ‘system’ interrupts), so you really don’t have to think about them until you decide to turn them on in your program.

To find the vector table entries for your chip, you can refer to the documentation provided by ST. But they also provide example ‘startup’ assembly files for each chip in their ‘Cube’ HAL packages, as well as the (now outdated) ‘Standard Peripheral Libraries’. You can download those, and look at what is listed for your chip. For example, in the CubeMX F0 firmware package, check the file:

Drivers/CMSIS/Device/ST/STM32F0xx/Source/Templates/gcc/startup_stm32f031x6.s

It will contain a full vector table for the chip, as well as some basic ‘boot’ code similar to what we’ll go over in the next section. These vendor-provided files can be a good way to start getting a handle on how to structure your projects.

So with all of that in mind, here’s a full ‘vector table’ file for the STM32F031K6 chip:

.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb

.global vtable
.global default_interrupt_handler

/*
 * The vector table.
 */
.type vtable, %object
.section .vector_table,"a",%progbits
vtable:
  // 0-15
  .word _estack
  .word reset_handler
  .word NMI_handler
  .word hard_fault_handler
  .word 0
  .word 0
  .word 0
  .word 0
  .word 0
  .word 0
  .word 0
  .word SVC_handler
  .word 0
  .word 0
  .word pending_SV_handler
  .word SysTick_handler
  // 16-31
  .word window_watchdog_IRQ_handler
  .word PVD_IRQ_handler
  .word RTC_IRQ_handler
  .word flash_IRQ_handler
  .word RCC_IRQ_handler
  .word EXTI0_1_IRQ_handler
  .word EXTI2_3_IRQ_handler
  .word EXTI4_15_IRQ_handler
  .word 0
  .word DMA1_chan1_IRQ_handler
  .word DMA1_chan2_3_IRQ_handler
  .word DMA1_chan4_5_IRQ_handler
  .word ADC1_IRQ_handler
  .word TIM1_break_IRQ_handler
  .word TIM1_CC_IRQ_handler
  .word TIM2_IRQ_handler
  // 32-47
  .word TIM3_IRQ_handler
  .word 0
  .word 0
  .word TIM14_IRQ_handler
  .word 0
  .word TIM16_IRQ_handler
  .word TIM17_IRQ_handler
  .word I2C1_IRQ_handler
  .word 0
  .word SPI1_IRQ_handler
  .word 0
  .word USART1_IRQ_handler
  .word 0
  .word 0
  .word 0
  .word 0
  // 48
  // (Location to boot from for RAM startup)
  #define boot_ram_base  0xF108F85F
  .word boot_ram_base

  /*
   * Setup weak aliases for each exception handler to the
   * default one. These can be updated later, or just
   * overridden since they're weak refs.
   * The reset_handler is set up separately.
   */
  .weak NMI_handler
  .thumb_set NMI_handler,default_interrupt_handler
  .weak hard_fault_handler
  .thumb_set hard_fault_handler,default_interrupt_handler
  .weak SVC_handler
  .thumb_set SVC_handler,default_interrupt_handler
  .weak pending_SV_handler
  .thumb_set pending_SV_handler,default_interrupt_handler
  .weak SysTick_handler
  .thumb_set SysTick_handler,default_interrupt_handler
  .weak window_watchdog_IRQ_handler
  .thumb_set window_watchdog_IRQ_handler,default_interrupt_handler
  .weak PVD_IRQ_handler
  .thumb_set PVD_IRQ_handler,default_interrupt_handler
  .weak RTC_IRQ_handler
  .thumb_set RTC_IRQ_handler,default_interrupt_handler
  .weak flash_IRQ_handler
  .thumb_set flash_IRQ_handler,default_interrupt_handler
  .weak RCC_IRQ_handler
  .thumb_set RCC_IRQ_handler,default_interrupt_handler
  .weak EXTI0_1_IRQ_handler
  .thumb_set EXTI0_1_IRQ_handler,default_interrupt_handler
  .weak EXTI2_3_IRQ_handler
  .thumb_set EXTI2_3_IRQ_handler,default_interrupt_handler
  .weak EXTI4_15_IRQ_handler
  .thumb_set EXTI4_15_IRQ_handler,default_interrupt_handler
  .weak DMA1_chan1_IRQ_handler
  .thumb_set DMA1_chan1_IRQ_handler,default_interrupt_handler
  .weak DMA1_chan2_3_IRQ_handler
  .thumb_set DMA1_chan2_3_IRQ_handler,default_interrupt_handler
  .weak DMA1_chan4_5_IRQ_handler
  .thumb_set DMA1_chan4_5_IRQ_handler,default_interrupt_handler
  .weak ADC1_IRQ_handler
  .thumb_set ADC1_IRQ_handler,default_interrupt_handler
  .weak TIM1_break_IRQ_handler
  .thumb_set TIM1_break_IRQ_handler,default_interrupt_handler
  .weak TIM1_CC_IRQ_handler
  .thumb_set TIM1_CC_IRQ_handler,default_interrupt_handler
  .weak TIM2_IRQ_handler
  .thumb_set TIM2_IRQ_handler,default_interrupt_handler
  .weak TIM3_IRQ_handler
  .thumb_set TIM3_IRQ_handler,default_interrupt_handler
  .weak TIM14_IRQ_handler
  .thumb_set TIM14_IRQ_handler,default_interrupt_handler
  .weak TIM16_IRQ_handler
  .thumb_set TIM16_IRQ_handler,default_interrupt_handler
  .weak TIM17_IRQ_handler
  .thumb_set TIM17_IRQ_handler,default_interrupt_handler
  .weak I2C1_IRQ_handler
  .thumb_set I2C1_IRQ_handler,default_interrupt_handler
  .weak SPI1_IRQ_handler
  .thumb_set SPI1_IRQ_handler,default_interrupt_handler
  .weak USART1_IRQ_handler
  .thumb_set USART1_IRQ_handler,default_interrupt_handler
.size vtable, .-vtable

/*
 * A 'Default' interrupt handler. This is where interrupts
 * which are not otherwise configured will go.
 * It is an infinite loop, because...well, we weren't
 * expecting the interrupt, so what can we do?
 */
.section .text.default_interrupt_handler,"ax",%progbits
default_interrupt_handler:
  default_interrupt_loop:
    B default_interrupt_loop
.size default_interrupt_handler, .-default_interrupt_handler
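
If the weak aliases feel abstract, it may help to picture the table as an array of function pointers where every slot starts out pointing at the default handler, and a slot only changes when you provide your own handler. The sketch below simulates that idea in plain C on a desktop machine. Everything in it is made up for illustration: the real table is laid out by the assembler and linker, slot 0 holds the initial stack pointer rather than a handler, and the real default handler loops forever instead of counting.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_VECTORS    48u
#define SYSTICK_VECTOR 15u

typedef void (*handler_t)(void);

/* Counters so that the example has something observable. */
static uint32_t default_hits;
static uint32_t systick_hits;

/* Stand-in for default_interrupt_handler (counts instead of looping). */
static void default_handler(void) { default_hits++; }

/* A handler which 'overrides' the default for one entry. */
static void systick_handler(void) { systick_hits++; }

/* One slot per table entry. */
static handler_t vectors[NUM_VECTORS];

/* Point every slot at the default handler, then override one of them. */
void vectors_init(void) {
  for (uint32_t i = 0; i < NUM_VECTORS; i++) {
    vectors[i] = default_handler;
  }
  vectors[SYSTICK_VECTOR] = systick_handler;
}

/* Roughly what the hardware does for exception number 'n':
 * look up slot n and jump to the address stored there. */
void dispatch(uint32_t n) {
  if (n < NUM_VECTORS && vectors[n] != NULL) {
    vectors[n]();
  }
}

uint32_t get_default_hits(void) { return default_hits; }
uint32_t get_systick_hits(void) { return systick_hits; }
```

In the real project there is no vectors_init step. Defining an ordinary (non-weak) function with the same name as a table entry, for example SysTick_handler, should be enough for the linker to use it in place of the weak alias.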

‘Boot’ Logic

Finally, we should write some basic ‘boot’ logic to get the chip into a predictable starting state. The basic steps of that process are:

Copy pre-initialized data into the .data RAM section.
Set the .bss RAM section to all 0s.
Jump to the ‘main’ method (finally!)

One good place to put this logic is right in the ‘reset_handler’, in a core.S file like the one in the last post. We’ll omit the vector table (which now has its own file), but the first several lines are mostly the same:

.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb

// Global values.
.global reset_handler

/*
 * The Reset handler. Called on reset.
 */
.type reset_handler, %function
reset_handler:
  // Set the stack pointer to the end of the stack.
  // The '_estack' value is defined in our linker script.
  LDR  r0, =_estack
  MOV  sp, r0

Copying the ‘initialized data’ section into RAM and zero-ing out the ‘uninitialized data’ section is pretty simple. We just need to load data from one place, and store it in another:

// Copy data from flash to RAM data init section.
// R0 will store our progress along the sidata section.
MOVS r0, #0
// Load the start/end addresses of the data section,
// and the start of the data init section.
LDR  r1, =_sdata
LDR  r2, =_edata
LDR  r3, =_sidata
B    copy_sidata_loop

copy_sidata:
  // Offset the data init section by our copy progress.
  LDR  r4, [r3, r0]
  // Copy the current word into data, and increment.
  STR  r4, [r1, r0]
  ADDS r0, r0, #4

copy_sidata_loop:
  // Unless we've copied the whole data section, copy the
  // next word from sidata->data.
  ADDS r4, r0, r1
  CMP  r4, r2
  BCC  copy_sidata

// Once we are done copying the data section into RAM,
// move on to filling the BSS section with 0s.
MOVS r0, #0
LDR  r1, =_sbss
LDR  r2, =_ebss
B    reset_bss_loop

// Fill the BSS segment with '0's.
reset_bss:
  // Store a 0 and increment by a word.
  STR  r0, [r1]
  ADDS r1, r1, #4

reset_bss_loop:
  // We'll use R1 to count progress here; if we aren't
  // done, reset the next word and increment.
  CMP  r1, r2
  BCC  reset_bss
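
For readers who find C easier to follow than assembly, the two loops above do roughly the following. This is only a sketch for illustration: the function names copy_data and zero_bss are made up here, and the project itself keeps this logic in assembly.

```c
#include <stdint.h>

/* Copy words from 'src' (the Flash image at _sidata) into
 * the range [dst, dst_end) (the RAM range _sdata.._edata). */
void copy_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end) {
  while (dst < dst_end) {
    *dst++ = *src++;
  }
}

/* Store 0 into every word in [start, end) (the range _sbss.._ebss). */
void zero_bss(uint32_t *start, const uint32_t *end) {
  while (start < end) {
    *start++ = 0;
  }
}
```

On the chip, the arguments would be the addresses of the _sidata, _sdata, _edata, _sbss and _ebss labels from the linker script. On a desktop machine you can try the functions out on ordinary arrays.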

When it starts up, the STM32 resets most of its hardware peripherals to an ‘off’ state and automatically selects an 8MHz internal oscillator as its clock source. So we won’t actually have to worry much about the chip’s clock speed until a later tutorial. We’re done booting the chip now, so the only remaining step is to jump to the ‘main’ method that we are about to write, and mark the end of the reset handler:

  // Branch to the 'main' method.
  B    main
.size reset_handler, .-reset_handler

Finally – ‘main.c’

With a linker script defined for our chip, a vector table set up at the start of program memory, and a reset handler set up to start the chip, we can finally write a ‘main’ C method, and treat that method like a normal computer program (or one for an Arduino).

Here’s a simple example – since the code will be running on a microcontroller for as long as power is applied, we should use an infinite loop and it doesn’t matter that there is no return statement:

/* Main program. */
int main(void) {
  int val = 0;
  while (1) {
    val += 1;
  }
}

Building with Make

In the last post, we compiled our test project with a couple of fairly long GCC commands:

arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall -fmessage-length=0 core.S -o core.o

arm-none-eabi-gcc core.o -mcpu=cortex-m0 -mthumb -Wall --specs=nosys.specs -nostdlib -lgcc -T./STM32F031K6T6.ld -o main.elf

That was simple and easy, but also a little bit annoying. And we have several different files in this project – using both C and assembly – so it’ll be even worse.

If you aren’t already familiar with GNU Make, it is a classic solution to the problem of repetitively compiling software projects. Instead of copy/pasting a lot of instructions, we can just write a ‘Makefile’ with recipes for our build commands. Here is an example ‘Makefile’ for this project:

TARGET = main

# Define the linker script location and chip architecture.
LD_SCRIPT = STM32F031K6T6.ld
MCU_SPEC  = cortex-m0

# Toolchain definitions (ARM bare metal defaults)
TOOLCHAIN = /usr
CC = $(TOOLCHAIN)/bin/arm-none-eabi-gcc
AS = $(TOOLCHAIN)/bin/arm-none-eabi-as
LD = $(TOOLCHAIN)/bin/arm-none-eabi-ld
OC = $(TOOLCHAIN)/bin/arm-none-eabi-objcopy
OD = $(TOOLCHAIN)/bin/arm-none-eabi-objdump
OS = $(TOOLCHAIN)/bin/arm-none-eabi-size

# Assembly directives.
ASFLAGS += -c
ASFLAGS += -O0
ASFLAGS += -mcpu=$(MCU_SPEC)
ASFLAGS += -mthumb
ASFLAGS += -Wall
# (Set error messages to appear on a single line.)
ASFLAGS += -fmessage-length=0

# C compilation directives
CFLAGS += -mcpu=$(MCU_SPEC)
CFLAGS += -mthumb
CFLAGS += -Wall
CFLAGS += -g
# (Set error messages to appear on a single line.)
CFLAGS += -fmessage-length=0
# (Set system to ignore semihosted junk)
CFLAGS += --specs=nosys.specs

# Linker directives.
LSCRIPT = ./$(LD_SCRIPT)
LFLAGS += -mcpu=$(MCU_SPEC)
LFLAGS += -mthumb
LFLAGS += -Wall
LFLAGS += --specs=nosys.specs
LFLAGS += -nostdlib
LFLAGS += -lgcc
LFLAGS += -T$(LSCRIPT)

VECT_TBL = ./vector_table.S
AS_SRC   = ./core.S
C_SRC    = ./main.c

OBJS =  $(VECT_TBL:.S=.o)
OBJS += $(AS_SRC:.S=.o)
OBJS += $(C_SRC:.c=.o)

.PHONY: all
all: $(TARGET).bin

# (Recipe lines must start with a tab character, not spaces.)
%.o: %.S
	$(CC) -x assembler-with-cpp $(ASFLAGS) $< -o $@

%.o: %.c
	$(CC) -c $(CFLAGS) $(INCLUDE) $< -o $@

$(TARGET).elf: $(OBJS)
	$(CC) $^ $(LFLAGS) -o $@

$(TARGET).bin: $(TARGET).elf
	$(OC) -S -O binary $< $@
	$(OS) $<

.PHONY: clean
clean:
	rm -f $(OBJS)
	rm -f $(TARGET).elf
	rm -f $(TARGET).bin

This is not a post about Make and I wouldn’t know enough to write one, but basically this file just defines the same programs and arguments that we used previously, and sets them to run in order when we run make all (or just make). There’s also a make clean recipe which removes the files generated by the build process. Running make prints each command as it runs, resulting in something like this:

/usr/bin/arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall -fmessage-length=0 core.S -o core.o
/usr/bin/arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall -fmessage-length=0 vector_table.S -o vector_table.o
/usr/bin/arm-none-eabi-gcc -c -mcpu=cortex-m0 -mthumb -Wall -g -fmessage-length=0 --specs=nosys.specs  main.c -o main.o
/usr/bin/arm-none-eabi-gcc core.o vector_table.o main.o -mcpu=cortex-m0 -mthumb -Wall --specs=nosys.specs -nostdlib -lgcc -T./STM32F031K6T6.ld -o main.elf
/usr/bin/arm-none-eabi-objcopy -S -O binary main.elf main.bin
/usr/bin/arm-none-eabi-size main.elf
   text    data     bss     dec     hex filename
    376       0    1024    1400     578 main.elf

Interestingly, it looks like the size program interpreted our extra stack/heap space as ‘BSS’ uninitialized data. Anyway, once the project is built you can upload and debug main.elf using st-util and arm-none-eabi-gdb with the same steps as in the previous post. This time, instead of checking the chip’s register values with i r, we can use the p val GDB command to check the value of our ‘val’ variable and verify that it counts upwards as the program runs.

Also like in the last post, we can use arm-none-eabi-nm main.elf to double-check where everything will be placed in memory – again, we should make sure that the vector table is located at 0x08000000. This time, we can also see the labels from our ‘boot’ code, along with the numerous vector table entries, which should all point to the same ‘default interrupt handler’:

08000174 W ADC1_IRQ_handler
080000d2 t copy_sidata
080000d8 t copy_sidata_loop
08000174 T default_interrupt_handler
08000174 t default_interrupt_loop
08000174 W DMA1_chan1_IRQ_handler
08000174 W DMA1_chan2_3_IRQ_handler
08000174 W DMA1_chan4_5_IRQ_handler
20000000 B _ebss
20000000 D _edata
[ ... ]
08000000 R vtable

There’s nothing in the bss or data sections – you can tell because the _ebss and _edata tags are both located at the RAM’s starting address of 0x20000000. That is because in C, local variables are allocated on the stack at run time, so they don’t take up space in either section. If you define the ‘val’ variable in ‘main.c’ with static int instead of int, you should see 4 bytes between _sbss and _ebss if you set it to 0, or between _sdata and _edata if you set it to a non-zero starting value.
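
Here is a small sketch of that difference, kept separate from ‘main.c’ so that it can be tried on its own. The names are invented for this example, and I am assuming an unoptimized build like the one in the Makefile; with optimization turned on, the compiler may fold a never-modified value like ‘step’ into the code and drop it from the data section.

```c
#include <stdint.h>

/* Zero-initialized static: space is reserved in '.bss',
 * and the boot code clears it before 'main' runs. */
static uint32_t ticks;

/* Non-zero static: space is reserved in '.data', with its
 * starting value stored in Flash and copied to RAM at boot. */
static uint32_t step = 2;

/* Each call advances the counter. The value survives between
 * calls because it does not live on the stack. */
uint32_t advance(void) {
  ticks += step;
  return ticks;
}
```

If you add something like this to the project and run arm-none-eabi-nm again, I would expect _edata and _ebss to each move up by 4 bytes.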

Conclusions

I’m sorry, I know this seems like a lot of information to cover without much to show for it; our main program still only increments a variable! But this should serve as a decent foundation for more complex topics like communicating with things like sensors and displays. Here is a repository containing the code discussed in this post:

STM32F0_Minimal_C

I can compile it and observe the ‘val’ variable counting up on an STM32F031K6 Nucleo board, but please let me know if anything seems broken, looks wrong, or just doesn’t make sense.

In the next post, we will learn about the STM32’s GPIO (‘General-Purpose Input/Output’) pins, and how to turn an LED on/off when a button is pressed. After that, I’m hoping to go over basic hardware interrupts and/or the peripherals for common communication protocols such as I2C and SPI. Thanks for reading!

Comments (16):

Malcolm Loudon

March 5, 2019 at 2:09 pm

Great tutorials, thanks! I have been using the Arduino core + HAL for a while and have found your tutorials very informative for understanding what goes on underneath. Thanks again for your hard work and sharing!
- Vivonomicon
  
  March 30, 2019 at 12:36 pm
  
  Thanks, I’m glad to hear it’s helpful!
Lluc

April 20, 2019 at 1:15 am

This is really great content! I had been looking for something like this for a while, and this helps a lot. Great to avoid reading the full documentation at the beginning, and great to avoid using higher level libraries and tools that abstract too much.
tubo j

May 21, 2019 at 6:42 am

hello,

Good job. I am working on how to send bits from one STM32 to another, and then send them on from the receiving STM32. Can you help?
- Vivonomicon
  
  May 22, 2019 at 7:36 am
  
  There are a lot of ways that you could do that, but one popular approach is to use a UART peripheral. You connect the ‘Ground’ pins on both boards, then connect the ‘TX’ (‘Transmit’) pin on the first board to the ‘RX’ (‘Receive’) pin on the second board and vice-versa. I don’t think I’ve written about using the STM32 UART peripherals, but this looks like a solid introduction:
  
  http://www.micromouseonline.com/2009/12/31/stm32-usart-basics/
mr_woggle

July 9, 2019 at 2:15 pm

Great article. I think this is really ‘bare metal’ since it doesn’t use any
libraries at all.

I have an i386 background, so I’m often confused about the terminology.
I think a vector table entry is kind of like an interrupt description
entry (IDE) on x86, right? So that would mean it includes both exceptions and
interrupts? Aren’t these mostly empty and programmable? Looking at your vtable,
for instance, does the watchdog IRQ have to be there, or is that kinda commonly
used now?

I have a lot of questions, hopefully you have some time to elaborate or give
some useful links!

* How can you relocate this vector table? Let’s say, put it in RAM?
* How does an interrupt work, like what happens? What get pushed on the
stack?
* What if I divide by 0, how do I handle that? What kind of fault is that?
I read somewhere that’s in a different register?
Bas

September 6, 2019 at 2:10 am

Hey Vivonomicon,

My god what a brilliant blog you have. As others already said, priceless for those just starting out with microcontrollers, like me :).

One thing doesn’t make much sense to me yet: the _estack pointer. In part 1 you explain the name, that it points to the END of the stack. Yet the comment here states that it’s the END of RAM, and the beginning of the stack. Where does it point to? The beginning or the end of the stack?

Maybe I learn the answer in the next parts of your blogs? Hope you can find the time to help this noob on its journey :).

Again, thanks so much, it’s really taking away a lot of mist in the air!
- Vivonomicon
  
  September 11, 2019 at 9:10 am
  
  Oh, I’m sorry that I wrote that in a confusing way. The ‘stack’ in C gets its name from the data structure that it uses, which acts like a ‘stack’ of things. It is a first-in / last-out queue, so when you “pop” an item off the top of the stack you get the last item that was “pushed” onto it. So the phrasing can be a little confusing; when I write about it, sometimes I forget whether the ‘start’ is the first value that was placed onto the stack, or the first value that will be retrieved from it.
  
  To answer your question, the ‘end’ of the stack usually refers to the address of the ‘last’ element that it contains, and the C language’s stack ‘grows’ from high memory addresses towards low ones. When the program starts, the stack is empty, so we place the ‘end of stack’ value at the highest memory address available. Usually statically- and dynamically-allocated memory gets placed near the lowest RAM addresses, which is how we end up with the heap and stack “growing towards each other” in RAM.
  
  And since data like function frames and local variables are placed on the stack, if you get into an unusual situation such as a very deeply-nested function call, you can end up with a “stack-heap collision”. When that happens, you get undefined behavior because the stack can overwrite previously-allocated heap memory (and vice-versa), possibly without causing errors in your program. Those kinds of bugs can be hard to track down, so many operating systems and languages keep track of their stack space and give you errors when that sort of thing happens – that’s the infamous “stack overflow” error.
  
  And speaking of, there’s a little more information in this stack overflow question: https://stackoverflow.com/questions/1334055/what-happens-when-stack-and-heap-collide
  
  Also, if you use an RTOS (Real-Time Operating System) to write a concurrent program, each thread will usually need its own stack space, so you’ll have to pick stack sizes which are large enough to avoid overflowing while being small enough to avoid running out of memory when you have a lot of threads running at once. That is because each thread will be running different functions with different data, and you can’t quickly pluck data out from the middle of a stack. In your phone or laptop’s operating system, things are even more abstracted – to manage the very large memory space and prevent applications from snooping on each other’s data, every application gets its own stack and heap space with ‘virtual’ (fake) memory addresses that the OS needs to translate before reading/writing data.
  
  But those are more complex topics; if you can find a class on operating systems, it should cover that sort of thing in more detail. I hope that helps to clear the basics up, though – good luck!
rjg

July 22, 2020 at 7:32 pm

I had been looking for info about linker scripts and vector tables for a while. Today, while searching for a seemingly unrelated topic, I came across your website and I’m very glad I did. I have an engineering background and would like to pursue a career in embedded systems. I’m just curious to know what your background is, how you got started, and if you think a solid computer science background is required. Thanks in advance.
- Vivonomicon
  
  July 26, 2020 at 12:52 pm
  
  Oh cool, I hope you find these articles helpful! I do have a CS background, but while I think that some courses on topics like systems programming and operating systems helped a bit by providing context, I’m pretty self-taught when it comes to embedded devices.
  
  So I don’t think that a computer science background is required at all, but it would be a good idea to learn a bit about the C programming language and the basics of how processors interpret machine code. Courses like nand2tetris or MIT’s computation structures can help with that if you want to learn from square one.
  
  Many vendors also run community forums which can help fill in the gaps in your understanding and correct misconceptions. For STM32 chips, you can check out https://community.st.com/. You can also learn a lot by searching for error messages that you encounter and reading about similar problems that other people have encountered. ST’s datasheets and reference manuals are also excellent, once you’re ready to start programming the chip’s peripherals.
  
  And one nice thing about embedded development is that dynamic memory allocation is usually discouraged because of resource constraints. Dynamic allocation is one of the most confusing things for new C programmers to learn, so I actually think that embedded platforms have a lot of promise for teaching computer science. If you want to learn or teach low-level computing, it seems like there’s a natural progression from bare-metal code, to real-time operating systems, to full-fledged OS kernels.
  
  Good luck! I’m sure you’ll be fine if you come from a real engineering background; software “engineering” is not a very rigorous discipline yet 🙂
Vladimir

August 5, 2020 at 8:28 am

Thanks for the article!

However, it doesn’t work in my case. After a successful “load” command, when I try to step any further, the program gets stuck with the message “0xfffffffe in ?? ()” and “i r” shows 0xffffffff in pc and sp. I fail to see why.
- Vivonomicon
  
  August 8, 2020 at 11:03 am
  
  Oh, I’m sorry to hear that it doesn’t work for you. Sometimes I see that sort of error when I accidentally use the wrong linker script; what type of chip are you using, and have you compared your code to the reference repository on github?
  - Vladimir
    
    August 13, 2020 at 3:54 am
    
    I’m using an F030F4 chip, so I corrected only the “MEMORY” part of the linker file. I also tried using the Cube linker and startup files – still no luck. I get .bin and .elf files, but cannot burn them onto the chip with the gdb tool.
    - Vivonomicon
      
      August 15, 2020 at 9:55 am
      
      Huh, interesting – I actually have a board with an F030F4 chip here, and it seems to work alright with this code (replacing 32KB with 16KB of Flash in the linker script):
      
      https://github.com/WRansohoff/STM32F0_minimal_C
      
      Did you open a debugging connection to the chip and use GDB’s “target extended-remote” command first? If you did, what sort of F030F4 board are you using?
      
      I ask because a year or two ago, I ordered a handful of very cheap STM32F030F4 boards from China which looked similar to “blue pill” STM32F103C8 boards (blue silkscreen / yellow headers), and they didn’t seem to work. I could open a connection with st-link or openocd, but I couldn’t flash new programs onto them. I suspect that they might have been faulty or had readout protection enabled, but they were ~$0.50 boards so I didn’t investigate too much.
      - Vladimir
        
        August 20, 2020 at 4:49 am
        
        Thanks for your time!
        
        Chip says STM32F030F4P6, board looks like this (https://aliexpress.ru/item/32363879391.html)
        
        Here’s what I did:
        – downloaded and unzipped git archive
        – corrected path to arm-none-eabi toolchain in makefile, corrected memory to 16 KB
        – run “make” within that STM32F0_minimal_C-master folder
        – got
        text data bss dec hex filename
        284 0 1024 1308 51c main.elf
        and output files
        – then run “st-util”
        – then in new shell window: “arm-none-eabi-gdb main.elf”
        – (gdb) target extended-remote :4242
        – (gdb) load
        – (gdb) continue
        – after stopping with ctrl+c I try to see where am I.
        – “p val” gives
        No symbol “val” in current context.
        – “where” gives
        #0 0xfffffffe in ?? ()
        #1 0x20000ff0 in ?? ()
        Backtrace stopped: previous frame identical to this frame (corrupt stack?)
        – stepping with “si” gives
        0xfffffffe in ?? ()
        forever
        As I understood program fails to get to main.
        I’ll try to burn following lessons with STM32CubeProgrammer tool, sadly without any debugging.
        
        Thanks anyway!
      - Vivonomicon
        
        September 6, 2020 at 2:53 pm
        
        Ah, sorry to hear that it doesn’t work. That actually does look like the sort of board which I ordered a while ago and couldn’t get working, but hopefully you have better luck. Maybe it needs to be powered by USB with the +3V3 debugging pin disconnected? I don’t think I tried that.
        
        Anyways, good luck.