Blog for my various projects, experiments, and learnings

“Bare Metal” STM32 Programming (Part 1): Hello, ARM!

The STM32 line of ARM Cortex-M microcontrollers is a fun way to get started with embedded programming. The nice thing about these chips is that they don’t require much setup, so you can learn about them bit by bit, starting with almost no code. They are also much more capable than the 8-bit processors used in many ‘Arduino’-type boards – some can run at over 400MHz, and they can have advanced peripherals up to and including simple graphics accelerators.

But in this tutorial, we will just learn the absolute minimum required to get a program running on one of the simpler STM32 chips. We’ll cover how to support multiple chips in a later post, but this example will use the STM32F031K6. ST makes an affordable ‘Nucleo’ development board with this chip, which costs just over $10 from a distributor like Digikey or Mouser.

This guide will assume some familiarity with C programming and the popular GCC compiler + GDB debugger, but I will try to explain all of the parts specific to coding for microcontrollers. I’d also like to make these posts more accessible, and would welcome feedback if anything is unclear or could be better explained.

On the bright side, the very low-level starting code demonstrated in these first few examples is something that you won’t have to worry about once it is set up. If you want to skip these examples, there are tools such as ST’s CubeMX which can generate these sorts of empty starting projects. But it’s nice to have some idea of what goes on inside of the chip, so let’s get started! You can view the entire minimal example project described in this post in this GitHub repository.

The Toolchain: ‘arm-none-eabi-gcc’

Before we start writing a program for the chip, we need to have a toolchain for compiling and debugging. Fortunately, this is very easy – the same GCC that you know and love is available for the various ARM Cortex-M platforms, and so are all of its accompanying programs like GDB. That’s all that we’ll need for now.

Your package manager should have the gcc-arm-none-eabi and gdb-arm-none-eabi packages – just install them, and you’re good to go! If you need to build it from source or want to download a pre-built version, it is available for download here: https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads

A couple of other useful and recommended packages for more complicated applications are libnewlib-arm-none-eabi and libstdc++-arm-none-eabi.

The ‘Linker Script’:

There are a lot of different kinds of ARM Cortex-M chips, with a lot of different capabilities. The compiler’s linker needs to know, at the very least, how much space the chip has for a program and how much RAM will be available. Without that information, it can’t tell if your program will fit on the chip, if there will be enough space for all the variables you want to define, or stuff like that.

In most cases we will want to define a bit more information to do things like copy variables’ initial values into RAM, but that will be the subject of a future post. For now, we’ll just say how much program memory and RAM there is: 32KB and 4KB respectively, on an STM32F031K6. This is the closest we’ll come to ‘ignore the magic code behind the curtain’, but it’s still fairly simple:

/* Define the end of RAM and limit of stack memory */
/* (4KB SRAM on the STM32F031x6 line, 4096 = 0x1000) */
/* (RAM starts at address 0x20000000) */
_estack = 0x20001000;

MEMORY
{
    FLASH ( rx )      : ORIGIN = 0x08000000, LENGTH = 32K
    RAM ( rxw )       : ORIGIN = 0x20000000, LENGTH = 4K
}

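The first non-comment line defines a value called _estack, which represents the end (hence the ‘e’) of the program’s stack. We set this value to point to the very end of the chip’s RAM. The STM32 chips map their on-chip RAM to 0x20000000 in memory, and this chip has 4096 bytes of RAM (0x1000), so 0x20001000 is the first address just past the end of RAM. That is a safe starting value because the Cortex-M stack grows downwards, and a push subtracts 4 from the stack pointer before it writes anything; the first pushed word lands at 0x20000FFC, the last word of RAM.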
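To see why an initial stack pointer of 0x20001000 never causes a write outside of RAM, here is a small host-side model in Python (it runs on your computer, not on the chip). It assumes the ‘full descending’ stack behavior of Cortex-M cores, where a push subtracts 4 before storing; the push function and the dictionary standing in for RAM are made up for illustration.

```python
RAM_START = 0x20000000
RAM_END = 0x20001000  # one past the last RAM byte (the _estack value)

def push(sp: int, memory: dict, value: int) -> int:
    """Model a single-register push: pre-decrement SP, then store a word."""
    sp -= 4
    if not (RAM_START <= sp and sp + 4 <= RAM_END):
        raise ValueError(f"write outside RAM at 0x{sp:08X}")
    memory[sp] = value & 0xFFFFFFFF
    return sp

memory = {}
sp = push(RAM_END, memory, 0xDEADBEEF)
print(hex(sp))  # 0x20000ffc
```

The first word lands in the last four bytes of RAM, so no space is wasted and nothing is written past the end.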

Then we define the ‘MEMORY’ block, which tells the linker how much memory the chip has. We mark ‘flash’ memory as readable and executable but not writable (‘rx’), since that is where the program lives. STM32 chips map their flash memory to start at 0x08000000, and we have 32KB of flash memory available. We mark the RAM as readable, writable, and executable (‘rxw’), and as mentioned above it is 4KB long starting at address 0x20000000.
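If you want to double-check the address arithmetic for a different chip, a rough Python sketch like the one below can help. It only mirrors the ORIGIN and LENGTH values from the MEMORY block; the end_of_region helper is a hypothetical name, not something the linker provides.

```python
def end_of_region(origin: int, length: int) -> int:
    """Return the first address past a memory region (origin + length)."""
    return origin + length

FLASH_ORIGIN, FLASH_LENGTH = 0x08000000, 32 * 1024  # 32K of flash
RAM_ORIGIN, RAM_LENGTH = 0x20000000, 4 * 1024       # 4K of RAM

flash_end = end_of_region(FLASH_ORIGIN, FLASH_LENGTH)
ram_end = end_of_region(RAM_ORIGIN, RAM_LENGTH)
print(hex(flash_end), hex(ram_end))  # 0x8008000 0x20001000
```

The RAM result matches the _estack value at the top of the linker script.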

In the next tutorial we will create a ‘SECTIONS’ block which gives the linker some more specific information, but we can ignore that for now. Also, linker scripts usually have a .ld file extension.

The ‘Vector Table’:

One thing that makes microcontrollers so cool is that they have ‘hardware interrupts’. When certain conditions are met, they can immediately jump to an ‘interrupt’ function. And when that function returns, the chip goes back to what it was doing before the interrupt triggered.

That ability comes with a small cost – we need to write a ‘vector table’ to define the locations in memory that the chip should jump to when each specific interrupt triggers. But since most interrupts are disabled by default, we can just ignore them for now. The ‘reset’ handler is the only one that we care about to start with – it defines the function that is run when the system resets or powers on. So let’s start a new file to hold the vector table and reset handler – I’ll call it core.S. Just like .c and .h are for C source and header files, the .S file extension is often used for assembly files.

If you aren’t familiar with assembly, the ‘Thumb’ instruction set used by these chips is fairly simple and contains only a handful of basic commands. Here’s a quick reference if assembly code doesn’t look familiar to you: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0432c/CHDCICDF.html

A core.S assembly file with a very basic vector table could look something like this:

// These instructions define attributes of our chip and
// the assembly language we'll use:
.syntax unified
.cpu cortex-m0
.fpu softvfp
.thumb

// Global memory locations.
.global vtable
.global reset_handler

/*
 * The actual vector table.
 * Only the initial stack pointer (the end of RAM) and
 * the 'reset' handler are included, for simplicity.
 */
.type vtable, %object
vtable:
    .word _estack
    .word reset_handler
.size vtable, .-vtable

The first few lines just tell the assembler what sort of syntax it should expect, and what sorts of machine instructions it can generate. The Cortex-M0 line has no floating-point hardware, so we use .fpu softvfp to make sure that floating-point calculations are done in software. And as mentioned above, ‘Thumb’ is the name of the instruction set which ARM Cortex-M cores execute; the .thumb line tells the assembler to generate it.

The .global lines ensure that the labels we use are available to other files, although we’ll only have this one file for now.

Then we just define the vector table using the arbitrary label, vtable. Only two entries are populated – the first entry holds the ‘end of stack’ address we defined earlier, and the second holds the ‘reset handler’ address. The .word command places a 4-byte value in the program; the assembler and linker will replace our labels with the addresses in memory that they correspond to.
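To picture what those two .word entries turn into, here is a host-side Python sketch of the first eight bytes of flash. The reset handler address of 0x08000008 is an assumption (it is simply the first address after a two-word table), and build_vector_table is a made-up helper. One detail to be aware of: addresses of Thumb functions are stored with their lowest bit set, which the toolchain handles for us because reset_handler is marked as a %function.

```python
import struct

ESTACK = 0x20001000         # initial stack pointer (from the linker script)
RESET_HANDLER = 0x08000008  # assumed address of reset_handler in flash

def build_vector_table(initial_sp: int, reset_addr: int) -> bytes:
    """Pack the two table entries as little-endian 32-bit words."""
    # Function addresses in the table have bit 0 set to mark Thumb code.
    return struct.pack("<II", initial_sp, reset_addr | 1)

table = build_vector_table(ESTACK, RESET_HANDLER)
print(table.hex())  # 0010002009000008
```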

The ‘Hello, World’ Program:

We can write a minimal program in either C or the ‘Thumb’ assembly language used by these microcontrollers – all we have to do is define the main ‘reset handler’ method which we pointed to from the vector table. This code can go after the vector table in the core.S assembly file.

For now, we’ll just load a recognizable hex value (0xDEADBEEF) into the r7 register, and then count up from 0 on the r0 register forever. In ‘Thumb’ assembly, that looks like this:

/*
 * The Reset handler. Called on reset.
 */
.type reset_handler, %function
reset_handler:
  // Set the stack pointer to the end of the stack.
  // The '_estack' value is defined in our linker script.
  LDR  r0, =_estack
  MOV  sp, r0

  // Set some dummy values. When we see these values
  // in our debugger, we'll know that our program
  // is loaded on the chip and working.
  LDR  r7, =0xDEADBEEF
  MOVS r0, #0
  main_loop:
    // Add 1 to register 'r0'.
    ADDS r0, r0, #1
    // Loop back.
    B    main_loop
.size reset_handler, .-reset_handler

Some details about the assembly commands, if you aren’t familiar:

The MOV and MOVS commands move a value from one register to another. We can also use # to set a register to a constant number, but only with ‘immediate’ numbers which can generally only be in the range of 0-255. For larger values, we need to use the LDR command, which loads an entire 4-byte word from memory into a register. The = symbol in front of the value is shorthand which tells the assembler to place the given word nearby in memory, then load the word stored at that address into the register.
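The choice between MOVS and LDR can be summed up in a few lines. This is only a host-side Python illustration of the rule of thumb described above (an 8-bit immediate for MOVS, the = shorthand otherwise); the helper names are hypothetical, and a real assembler has more cases than this.

```python
def fits_movs_immediate(value: int) -> bool:
    """True if `value` fits the 8-bit immediate of a MOVS instruction."""
    return 0 <= value <= 0xFF

def load_instruction(reg: str, value: int) -> str:
    """Pick MOVS for small constants, or the LDR '=' shorthand otherwise."""
    if fits_movs_immediate(value):
        return f"MOVS {reg}, #{value}"
    return f"LDR  {reg}, =0x{value:08X}"

print(load_instruction("r0", 0))           # MOVS r0, #0
print(load_instruction("r7", 0xDEADBEEF))  # LDR  r7, =0xDEADBEEF
```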

The ADDS command is simple addition; you can think of the command, ADDS a, b, c as the equation, a = b + c. So here, we just set r0 = r0 + 1.

The B command is short for ‘Branch’. It tells the program to jump somewhere else. Here, we set a main_loop label before the addition command, and then jump back to it afterwards to make an infinite loop.

Compiling the Code:

Okay, we’re done writing code now – our minimal program will simply count a number up forever. When the number gets to 0xFFFFFFFF in hexadecimal, adding 1 more ‘overflows’ the number and it goes back to 0.
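Since Python integers don’t overflow on their own, the wrap-around has to be modeled by hand. This small sketch masks the result to 32 bits to imitate what the r0 register does; adds32 is just an illustrative name.

```python
MASK32 = 0xFFFFFFFF  # registers on these chips are 32 bits wide

def adds32(a: int, b: int) -> int:
    """Add two values the way a 32-bit register would, discarding the carry."""
    return (a + b) & MASK32

print(hex(adds32(0xFFFFFFFE, 1)))  # 0xffffffff
print(hex(adds32(0xFFFFFFFF, 1)))  # 0x0
```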

So now, we have to compile and link the program to make a file which we can upload to the microcontroller. We’ll use the arm-none-eabi-gcc toolchain that you downloaded earlier. The STM32F0 line of chips uses an “ARM Cortex-M0” architecture, so the following command should produce a usable object file from the core.S file that we created:

arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0 -mthumb -Wall core.S -o core.o

That will create an ‘object file’ called core.o, which we can use to produce a final program. Later, when we have more complex logic across multiple files, we will produce a different object file for each one and combine them together in this final step:

arm-none-eabi-gcc core.o -mcpu=cortex-m0 -mthumb -Wall --specs=nosys.specs -nostdlib -lgcc -T./STM32F031K6T6.ld -o main.elf

This will create a main.elf file. ELF stands for ‘Executable and Linkable Format’, and it is basically a file which we can upload to our chip. Since we gave GCC the correct options (like -mcpu=cortex-m0) and the linker script we wrote earlier (the -T option), it should be set up to work with our specific chip.

We can see a rough outline of what will get written to the chip with the nm command – try running the command, arm-none-eabi-nm main.elf – the output should look like this:

20001000 A _estack
08000010 t main_loop
08000008 T reset_handler
08000000 T vtable

We can see that the vector table – vtable – is at address 0x08000000, which is the very beginning of program memory. If that is not the case, the chip will read the wrong values for its initial stack pointer and reset handler address when it starts up, and our code probably won’t run.
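If you would rather not check that by eye, something like this Python sketch could do it. It parses the sample nm output shown above, pasted in as a string; in a real build script you would capture the output of arm-none-eabi-nm instead. The parse_nm helper is a hypothetical name.

```python
NM_OUTPUT = """\
20001000 A _estack
08000010 t main_loop
08000008 T reset_handler
08000000 T vtable
"""

FLASH_ORIGIN = 0x08000000

def parse_nm(text: str) -> dict:
    """Map each symbol name to its address from 'address type name' lines."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            address, _kind, name = parts
            symbols[name] = int(address, 16)
    return symbols

symbols = parse_nm(NM_OUTPUT)
vtable_ok = symbols["vtable"] == FLASH_ORIGIN
print(vtable_ok)  # True
```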

Uploading, Running, and Debugging:

Finally, we just have to upload our code to the chip and verify that it works. We don’t have an LED blinking quite yet, but we can still check that the program is running correctly by using GDB to read the r0 and r7 registers which are set by our test program. If you are completely unfamiliar with using GDB to debug a C program, this guide looks like a pretty good crash course.

You can use the programs provided by ST for flashing code, but I usually use the open-source ‘STLink’ tooling written by Texane. With that project, you can simply plug a USB cable into the Nucleo board and enter st-util on a command line; it will open a debugging port on localhost:4242 and print some basic information about the chip. It also works just as well with a USB ‘STLink/V2’-style debugger if you use cheap boards (or your own designs) which only expose the CLK/IO programming pins.

Anyways, once you have the chip plugged in and connected, you can run:

arm-none-eabi-gdb main.elf

…which will start the debugger using our program. Once the debugger is open, we can connect it to the chip with the command, target extended-remote <port>:

(gdb) target extended-remote :4242
Remote debugging using :4242

If GDB cannot connect to the chip at this step, double-check the output of your st-util program and make sure that it is listening for a debugger on port 4242. If it’s a different port, use that number instead.

Once you’ve connected to the chip, load the program using the load command – here’s a sample output:

(gdb) load
Loading section .text, size 0x1c lma 0x8000000
Start address 0x8000000, load size 28
Transfer rate: 70 bytes/sec, 28 bytes/write.

If GDB doesn’t know what to load, make sure that your main.elf file exists and that you passed it in as an argument to arm-none-eabi-gdb. You could also try load main.elf if you are running GDB from the directory where the program was compiled.
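The ‘load size 28’ figure in the sample output can be sanity-checked by hand. The breakdown below is an assumption about how the assembler lays out this particular program: the two-word vector table, six 16-bit Thumb instructions, and two literal words for the values loaded with the = shorthand.

```python
VECTOR_TABLE_BYTES = 2 * 4  # the _estack and reset_handler words
INSTRUCTION_BYTES = 6 * 2   # six 16-bit Thumb instructions
LITERAL_POOL_BYTES = 2 * 4  # words for =_estack and =0xDEADBEEF

load_size = VECTOR_TABLE_BYTES + INSTRUCTION_BYTES + LITERAL_POOL_BYTES
print(load_size, hex(load_size))  # 28 0x1c
```

That matches both the ‘size 0x1c’ and ‘load size 28’ values which GDB reports.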

With the program successfully loaded, we can use the debugger normally. If you are familiar with debugging on GDB already, there aren’t many differences between using ‘regular GDB’ and ‘bare-metal GDB’. It’s good to have a basic familiarity with some sort of debugger for finding problems, but that is a large topic on its own. For now, you can just type continue, wait a few seconds, and then hit Control+C a few times. If it asks, ‘Give up waiting?’, enter y for ‘yes’. After the program has run for a bit and then stopped, you can enter the info registers command, and you should see the values that our program sets in registers r0 and r7:

(gdb) continue
Continuing.
^C^CInterrupted while waiting for the program.
Give up waiting? (y or n) y
Quit
(gdb) info registers
r0             0x189ff2 1613810
r1             0x8000400        134218752
r2             0x0      0
r3             0x0      0
r4             0x40022000       1073881088
r5             0x1      1
r6             0x4      4
r7             0xdeadbeef       3735928559
r8             0xffffffff       4294967295
r9             0xffffffff       4294967295
r10            0xffffffff       4294967295
r11            0xffffffff       4294967295
r12            0xffffffff       4294967295
sp             0x20001000       0x20001000
lr             0xffffffff       4294967295
pc             0x8000010        0x8000010 <reset_handler+8>
cpsr           0x1000000        16777216
(gdb)

We can see that the r0 register holds a number which counts up if we step through the program, and r7 has the recognizable value 0xDEADBEEF. Registers r1 through r6 hold values which we didn’t define, so they could be anything. And with that, you have a basic assembly program up and running on the chip!

Conclusions

So in this post, we covered the bare minimum amount of code required to upload and run a program on an STM32 chip. You can find the full code with a Makefile on Github here: https://github.com/WRansohoff/STM32F0_minimal

In the next post, we will talk about the different ‘sections’ of memory which most programs use, and extend the linker script to account for them. We will also write some simple ‘boot code’ for copying important data into RAM when the chip starts up, and write a Makefile to compile the project for us. After that, we will finally get around to writing a C program to blink an LED.

I hope this was helpful or informative, and please feel free to let me know if any of the information presented here is inaccurate or could be explained more clearly.

I should also mention the ‘STM32CubeMX’ tooling provided by ST – it can auto-generate initialization and peripheral code for you, and it has a lot of useful examples. But it’s also nice to learn about how these chips work at a low level, for debugging and writing performant code.

Comments (32):

  1. Eli

    October 19, 2018 at 9:36 am

    Hi,

    I have a question but first, great article.

    this post has inspired me to buy an STM32 MCU, because I’ve been working with the TI-RSLK (robotic kit) that uses the MSP432 controller , with Code Composer Studio, but the course and projects, etc.. all have to do with C and I really want to learn as much about the lowest layers as possible. I’ve been researching how to create an assembly project from bare bones, and stumbled on your article.

    Hopefully what I learn here will help me when I get back to the MSP432.

    My question: in the “core.s” file, the directive “.cpu cortex-m0”.. the Nucleo I bought is the STM32F303RE and what I found says that it has the Cortex-M4. So I’m not sure about this part.

    • Vivonomicon

      November 12, 2018 at 10:21 am

      Oh, cool – I hope you find these introductions helpful. I am hoping to come back and update some of the toolchain and assembly instructions with some more complete information.

      ARM’s Cortex-M (‘microcontroller’) cores come in a few different varieties. The STM32F0 chips use a Cortex-M0 core, which is designed to be cheap and simple. The STM32F3 and MSP432 chips both use a Cortex-M4F core which is faster, can do floating-point math much more quickly, etc.

      Usually you can just change the ‘cpu’ or ‘mcpu’ option to the type of core used in your chip, which would be ‘cortex-m4’ in this case. But the more advanced Cortex-M cores have special hardware for floating-point math, so you might also need a few more options in your build script. I’m still not 100% clear on how that works, but here is an example of which GCC settings worked for me when switching to an STM32F3 core from an STM32F0.

      You might also be able to adapt this GCC/Make build system to the MSP432 since it can use the same ‘arm-none-eabi-gcc’ compiler, you’d just have to figure out the linker script/vector table/register macros/etc. I hope that helps – good luck!

  2. Fatih

    April 27, 2019 at 3:21 am

    Hi,

    great article,

    thanks.

  3. Hoang Duong

    May 1, 2019 at 6:23 pm

    This is clear tutorial i ever read about ARM bare meter on internet. Thanks so much.

  4. tubo

    June 30, 2019 at 1:59 am

    thank you very much for this information. can i contact you privately?

    • Vivonomicon

      July 6, 2019 at 2:45 pm

      Feel free to reach out to vivonomicon @ (gmail). Sorry if I don’t respond in a timely manner, sometimes life happens and I have to put this blog and these projects on the backburner for a little bit.

  5. Ngô Hùng Cường

    July 9, 2019 at 8:02 am

    Many Thanks.

  6. Aditya

    August 4, 2019 at 7:10 am

    Hi! I am planning to do baremetal programming on FRDM K82F unlike your CortexM0 is has M4 core! can you please guide me how can I upload it to the board since STM softwares wont work? I can manage rest of the things like updating linker scripts based on K82F, thanks

    • Vivonomicon

      August 22, 2019 at 10:32 am

      Different types of microcontrollers usually require different tooling for debugging and uploading code – one reason why I like the STM32 chips is that they have a pretty good set of open-source tools to do that. Also, the Cortex-M0 / M4 / etc core is only a small part of the microcontroller – there are STM32 chips with Cortex-M4 cores, and there are Kinetis chips with Cortex-M0(+) cores.

      I haven’t used NXP’s ARM cores, so I don’t know much about the software ecosystem which supports those chips. You’ll probably need to either use whatever software NXP provides, or search for open-source alternatives if any exist. Good luck!

  7. Pat

    September 22, 2019 at 7:49 am

    Small error that happened on my setup that no-one has commented but in the ‘The ‘Linker Script’:’ section there is an error on line 3 where there should be an end of comment ‘*/’ but there is not.

    This causes an error with the ld.exe for me – a simple addition of the end of comment fixes it.

    • Vivonomicon

      October 15, 2019 at 2:17 pm

      Oh, you’re right – thank you! I’ve updated the post to close that comment.

  8. Manu Prakash

    October 31, 2019 at 8:25 am

    I have tried stm32cubeMX and stm32cubeIDE for bluepill (stm32f103) , but none has worked for LED toggling. In Keil I have successfully toggled the LED.
    I am not sure where I go wrong. Should I try gcc in linux environment; help me please.

    • Vivonomicon

      December 4, 2019 at 6:40 am

      It’s hard to say what would fit your needs best, sorry. Using a minimal GCC environment is a good way to learn how the chips work at a low level, but IDEs like CubeMX and Keil are usually more user-friendly and better for getting an application working quickly. ST and Keil also have community forums which are good places to ask specific questions about using their IDEs, for example:

      https://community.st.com/s/topic/0TO0X000000BTr8WAG/stm32cubemx

      I’m also sorry that these first couple of introductions are sort of rough around the edges – I’ve learned some slightly simpler ways to set things up in the meantime and I keep meaning to update these posts.

    • thegi

      December 25, 2019 at 2:08 am

      Are you the “Manu Prakash” from Stanford? I’m a big fan of you.

  9. dfirmansyah

    January 24, 2020 at 8:16 am

    Hi, thanks for a really good article.

    I also doing a bit of microcontroller programming (using c/c++), but never touch the lower layer. Pretty much just setting up the development environment using tools the vendor like ST provide.

    This kind of writing is really inspiring for me.
    Even though I can’t do asm programming, but your explanation is so easy to follow and understand. In fact, I got new knowledges and better understanding about mcu inner working.

    Btw, I stumbled here when searching about how to developing esp32 without esp-idf. And your article made me itch to set aside my ESP and back to ST again 😀

    • Vivonomicon

      February 11, 2020 at 3:39 pm

      Oh, thanks! I’m glad that you found it helpful. It is kind of fun to learn more about how these things work at a low level, even if it’s not exactly the fastest way to write an application from scratch.

      I keep meaning to get back to the ESP32, especially since I didn’t get very far in that first example post. Those chips are surprisingly complicated, and it’s sort of odd how they don’t have internal rewrite-able program memory, but people have done some really impressive things with them.

      There’s too much to learn, huh? Good luck with your projects and thanks for the kind words!

  10. Charles Miller

    March 16, 2020 at 10:15 am

    V,
    Excellent set of articles.

    There is one tip you might want to include when setting up the stack. If _estack is 0x20001000, and one pushes or calls in the Reset_Handler, then one gets an exception, because writing at 0x20001000 is illegal. I always define _estack to be _esram – 4 (where _esram = 0x20001000). The MSP ends up being 0x2000FFFC (where one can write), and you only waste 4 bytes if you don’t need it.

    Safety first!

    Cheers.

    • Vivonomicon

      March 16, 2020 at 1:36 pm

      Oh, thank you for the suggestion! I wrote these first couple of posts as I learned about these Cortex-M cores, and I keep meaning to go back and fix them up a bit. I’ll work this in when I get a chance to.

      Thanks again, it’s always useful to hear about these sorts of infrequent hazards.

      • Ralph Doncaster

        March 17, 2020 at 8:35 am

        Actually, you are not “wasting” any space. With SP set to 0x2000FFFC, the first push will write a size_t (4-byte in this case) register to that address, and decrement the stack pointer by 4 bytes. In terms of byte addresses, RAM goes from 0x2000000 to 0x2000FFFF.
        I also think you can remove the SP initialization code because it gets loaded from the vector table on reset; SP gets loaded from 0x00000000 and PC gets loaded from 0x00000004.

        • Vivonomicon

          April 14, 2020 at 11:46 am

          Good to know, thank you – it can be hard to find information about some of these low-level details.

    • Ralph Doncaster

      March 18, 2020 at 7:58 am

      I looked at the STM32 docs, and I think the 0x20001000 is actually correct. While on most other MCUs use a post-decrementing stack pointer, the ARM cortex-M uses a pre-decrement. This means a push subtracts 4 from the SP, then writes the register contents to that location in RAM.

  11. Charles Miller

    March 16, 2020 at 10:17 am

    Correction: The msp ends up being 0x20000FFC.

  12. Fred Wex

    April 8, 2020 at 1:39 pm

    Getting an error when trying to compile the file in ubuntu:
    arm-none-eabi-gcc: error: =: No such file or directory
    arm-none-eabi-gcc: error: cortex-m0: No such file or directory
    arm-none-eabi-gcc: error: unrecognized command line option ‘-mcpu’; did you mean ‘-Wcpp’?

    • Vivonomicon

      April 14, 2020 at 11:21 am

      Sorry to hear that you’re having trouble; what is the exact command that you’re trying to run? There shouldn’t be any spaces between the ‘=’ symbol and the flags / values when you call GCC.

  13. Srikanth

    July 3, 2020 at 4:56 am

    Hi Vivonomicon,
    Thank you very much for your series on bare metal programming. Your explanation is simple and very easy to understand.

    I have started with STM32G071RB.

    While compiling the core.s file, I used the command
    “C:\Users\XXXX\Desktop\STM32 BARE METAL PROGRAMMING>arm-none-eabi-gcc -x assembler-with-cpp -c -O0 -mcpu=cortex-m0plus -mthumb -Wall core.s -o core.o”

    But i am getting error:
    “core.s: Assembler messages:
    core.s:42: Error: invalid offset, value too big (0xFFFFFFFC)”

    My configuration of stmf32g071rb.ld file are (Copied from Stm32cubeIDE for understanding my stm32g0 chip)

    _estack = ORIGIN(RAM) + LENGTH(RAM);
    MEMORY
    {
    FLASH (rx) : ORIGIN = 0x8000000, LENGTH =128K
    RAM (rxw) : ORIGIN = 0x20000000, LENGTH = 36K
    }
    Please guide what may be wrong. I double checked size of FLASH & RAM as per datasheets, they are correct.
    I compiled my program from the GCC compiler through command line and not through CUBE IDE.

    • Vivonomicon

      July 3, 2020 at 10:59 am

      Sorry to hear that you’re having trouble. I just tried compiling this code with the command you listed above, and it compiled without errors.

      Just a guess, but did you omit the “=” sign in the `LDR r7, =0xDEADBEEF` command? If I change that line to `LDR r7, 0xDEADBEEF`, I get a similar assembly error.

      The “=” symbol tells the assembler that it is loading a literal value into the register, instead of referring to a label or memory offset. There is no “load 32-bit value” machine code instruction, so this syntax is actually a “pseudo-operation”. The assembler will place the “0xDEADBEEF” value somewhere nearby in memory, then translate the load instruction to fetch the data from that memory address using a relative offset. You can see that by running `objdump` on the object file once it compiles:


      >arm-none-eabi-objdump -d core.o
      [...]
      00000008 <reset_handler>:
      8: 4802 ldr r0, [pc, #8] ; (14 <main_loop+0x4>)
      a: 4685 mov sp, r0
      c: 4f02 ldr r7, [pc, #8] ; (18 <main_loop+0x8>)
      e: 2000 movs r0, #0

      00000010 <main_loop>:
      10: 3001 adds r0, #1
      12: e7fd b.n 10 <main_loop>

      14: 00000000 .word 0x00000000
      18: deadbeef .word 0xdeadbeef

      The assembler put the 32-bit value at the very end of the program, and pointed the `LDR` machine code instruction to that address. I hope that helps – good luck! If you still have trouble, try comparing your ‘core.s’ file to the example on GitHub:

      https://github.com/WRansohoff/STM32F0_minimal/blob/master/core.S

  14. R Swanson

    July 11, 2020 at 2:28 pm

    Hello, thanks for the great tutorial. It’s a light shining in a very dark jungle. Just got started on STM MCU’s earlier this week and I was quite discouraged the first couple of days at how steep the learning curve is for these things, and I write software for a living. After a few days I realized I wasn’t going to learn anything auto-generating code, fiddling with graphical pins, and cobbling together source code from the net trying to get this working. On top of that I jumped into the deep-end and got a H755ZI-Q so I have to make two of everything, lol. I figure I’ll just keep the M4 asleep for now. Anyway, enough chat, I have a question.

    So I’m following your tutorial without any issues, but I’m on Windows 10 and too damn lazy to setup my Ubuntu VM to write makefiles and build this thing like someone who isn’t a noob. So I’m using the STMCubeIDE to do the build part and it seems to build everything just fine; however, when I nm on the M7 ELF file I get some different symbol names than the ones in your tutorial. This is what nm produces for me:

    $ nm bare-metal-stm32-1_CM7.elf
    24080000 A _estack
    08000005 T _fini
    08000001 T _init
    08000000 t $t
    08000004 t $t

    STMCubeIDE Debug Defaults:
    GCC Assembler: gcc -mcpu=cortex-m7 -g3 -c -x assembler-with-cpp --specs=nano.specs -mfpu=fpv5-d16 -mfloat-abi=hard -mthumb
    GCC Compiler: gcc -mcpu=cortex-m7 -std=gnu11 -g3 -DUSE_HAL_DRIVER -DCORE_CM7 -DDEBUG -DSTM32H755xx -c -I../Core/Inc -O0 -ffunction-sections -fdata-sections -Wall -fstack-usage --specs=nano.specs -fpu=fpv5-d16 -mfloat-abi=hard -mthumb
    GCC Linker: -mcpu=cortex-m7 -T"C:\Users\—-\STM32CubeIDE\workspace_1.3.0\bare-metal-stm32-1\CM7\STM32H755ZITX-M7.ld" --specs=nosys.specs -Wl,-Map="${ProjName}.map" -Wl,--gc-sections -static --specs=nano.specs -mfpu=fpv5-d16 -mfloat-abi=hard -mthumb -Wl,--start-group -lc -lm -Wl,--end-group

    Are those symbols functionally correct? If so why do they come out like that?

    Bonus Question: I tried to run the debugger on the IDE and it built and downloaded just fine. After a few seconds however it times out with the message below. Just curious why this happens? Anyway to resolve the issue? If not I can always debug it on my VM.

    (Read)Failed determine breakpoint type
    Error! Failed to read target status
    Debugger connection lost.
    Shutting down…

    Thanks for the tutorials and any help you can provide

    • Vivonomicon

      July 26, 2020 at 1:33 pm

      Aw, thanks for the kind words. Yeah, the STM32Cube IDE is great for getting started quickly, but it can sort of obscure how things work at a low level. The H7 line is a bit more complex than the Cortex-M0/M0+ devices that I wrote these first posts about, but the basic concepts should still apply.

      In this case, it looks like your vector table is not getting placed at the start of Flash memory (0x08000000). The second entry in the vector table tells the chip where the reset handler is located, so that could explain why your debugger might have trouble stepping through your application after it gets loaded.

      This example uses an over-simplified linker script which has no “SECTIONS” attributes; the next post talks about how those work. And it looks like your IDE is using one of ST’s standard linker scripts (…/STM32H755ZITX-M7.ld), which is significantly more complex. If you open that linker script, you can see what they expect the vector table to be called by looking at the first section in Flash memory; it’s probably something like “.isr_vector”. If you add this line right above the ‘vtable:’ definition, that might make the linker put it in the right place:

      .section .isr_vector,"a",%progbits

      You might also need to omit the “USE_HAL_DRIVER” option. But I’m not completely sure; I haven’t actually used the STM32Cube IDE to flash or debug code before. For these very low-level examples, it would probably be easiest to use a raw GCC toolchain. You should be able to use the C code in later posts with the official IDE and build scripts (maybe after renaming some things like interrupt handlers), but these early posts were sort of written to explain what those IDEs do in the background to load/run/debug your code.

      Good luck! Sorry that I can’t provide more insight.

  15. sajeev sankaran

    August 4, 2020 at 9:39 pm

    Master V,
    I am newbie to programming field. Newfound hobby. I started out with Atmel Attiny13, self learned little bit of AVR assembly and successfully did some programming such as software I2C bitbanging etc. I would love to learn ARM assembly for some hobby projects . I have no idea of any high level languages and it didnt go into my head when i tried. Assembly seems to be OK for me to understand. Is there any standalone assembler for ARM chips ? User friendly IDE like ATMEL STUDIO you can suggest for ARM? Is C necessary for ARM or Assembly can do? Your article here is good and inspiring. Is there a tutorial you can direct me like “ARM assembly for MORONS’.
    Yuor reply in this regard will be highly appreciated.
    Sajeev

    • Vivonomicon

      August 8, 2020 at 11:11 am

      Oh, welcome – I hope you’re enjoying the hobby! I try to use C when I can, so I don’t know a ton about ARM assemblers, but the GCC toolchain does include an assembler (“as”) and its C compiler can also act as an assembler if you pass it the “-x assembler-with-cpp” option.

      You might look at a book series called “The Definitive Guide to Cortex-M[x] Processors” by Joseph Yiu. There are two versions, one for Cortex-M3 / -M4 chips and one for Cortex-M0 / -M0+ chips. They cover the ARMv7-M and ARMv6-M instruction sets and hardware pretty well, but they don’t contain many specific assembly examples.

      As for IDEs, most vendors distribute their own (like STM32Cube), but they’re usually forks of Eclipse which only work with chips made by that company. If you want an IDE that could work with many types of ARM chips, maybe look into the PlatformIO extension for VSCode?

  16. MC

    September 9, 2020 at 10:46 am

    I just wanted to thank you for your series of articles. I’m new to MCUs in general, and have mostly got by on the Arduino IDE. Bare metal is, indeed, daunting. so I appreciate the guidance. I tried the Cube thing not all that long ago, and found it a confused, bloated, mess. Mbed online seems slightly better, but it does seem an amalgam of current and deprecated modules.

    • Vivonomicon

      September 10, 2020 at 11:18 pm

      Thanks, I’m glad you’re finding them helpful!

      It does seem like a lot of embedded tools are sort of bloated, which is why I like learning about minimal bare-metal development. The official and widely-used tools are good for collaboration, but it’s also nice to learn how things work at a low level.

      I definitely don’t envy the people who maintain cross-platform embedded development platforms and drivers. You try to come up with nice universal APIs, but you end up needing to plaster on all kinds of exceptions and special cases to get it to work with real hardware.

