Get Started with ARM Assembly on the Pi Pico

When I got my first microcomputer, I already knew Basic programming. My machine had a different Basic dialect from the one I’d learned at school, and there was a stack of graphics and sound functionality to get to grips with too, but it wasn’t long before I felt I’d mastered the high-level stuff and that it was time to move on to machine code. That’s how I’ve come to feel about the Raspberry Pi Pico’s RP2040 chip. The time’s right to learn ARM assembly programming on the Pico.

The Raspberry Pi RP2040 chip

I’ve tried to condense what I’ve learned into a series of posts intended to help anyone with a little C/C++ programming experience learn how to add assembly code to existing and new software projects. It’s not a tutorial, more a ‘what you need to know’ guide.

Inside the RP2040

The RP2040 is based on the Cortex-M0+ core design (it has two of them on board), which is well documented by ARM. The M0+ uses the ARMv6-M Thumb instruction set, which comprises memory-saving 16-bit encodings of a subset of the standard 32-bit ARM ISA. The M0+ is still a 32-bit processor, and has a handful of full 32-bit instructions too. However, all but one of these, BL, are for advanced tasks you probably won’t need to worry about yet.

Note You’ll want to grab the RP2040 datasheet early on, as it has all the device details you’ll need. You might also want to download the Cortex-M0+ Technical Reference Manual from ARM.

The M0+ contains 16 32-bit registers named R0 through R15 and one called PSR. The latter is short for ‘Program Status Register’ and it’s what we used to call the condition code register: the place where the CPU records the effect of the most recently executed instruction. The R registers are a mix of general purpose (R0-12) and special purpose (R13-15) stores. The latter are:

  • R13 — aka the Stack Pointer (SP)
  • R14 — aka the Link Register (LR)
  • R15 — the Program Counter (PC)

Registers R0-3 are also, by convention, used to hold parameter values to be passed to a subroutine or function. Functions with more than four parameters receive the extra values on the stack, or use one of the parameter registers to hold the address of a data structure in memory.

The calling convention says that the called function can read and modify R0-3, but must preserve any of the other general-purpose registers it uses (R4-11 under ARM’s procedure call standard) by pushing them onto the stack and then popping them back once its work is done. Before returning, it puts its return value into R0 for the calling code to read. The caller shouldn’t assume that R0-3 have not been touched by the called function.

The Thumb instruction set’s 16-bit format typically combines the instruction’s opcode and values to identify which registers the operation will be performed on. Usually there are one or two source registers (Rm and Rn, in ARM parlance) to specify, and a third, destination register (Rd) which will receive the result. There are usually only three bits available to identify a given register, limiting some ops to only the first eight registers, R0-7. For this reason, we’ll most commonly use those ‘low’ registers, rather than the ‘high’ registers, R8-12.
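To make the three-bit limit concrete, here is a minimal sketch in C, runnable on a desktop compiler, that packs register numbers into the 16-bit word for the register form of ADDS. The function name encode_adds is invented for illustration; the bit layout is the one ARM's ARMv6-M reference manual gives for that instruction.

```c
#include <stdint.h>

/* Encode the 16-bit Thumb instruction ADDS Rd, Rn, Rm (register form).
 * Layout: bits 15-9 = 0001100, bits 8-6 = Rm, bits 5-3 = Rn, bits 2-0 = Rd.
 * Returns 0 if any register number does not fit in three bits
 * (0 is not a valid ADDS encoding, so it is safe as an error marker here). */
uint16_t encode_adds(unsigned rd, unsigned rn, unsigned rm)
{
    if (rd > 7 || rn > 7 || rm > 7) {
        return 0; /* high registers cannot be named in this encoding */
    }
    return (uint16_t)(0x1800u | (rm << 6) | (rn << 3) | rd);
}
```

encode_adds(0, 1, 2) gives 0x1888, the encoding of adds r0, r1, r2, while asking for R8 fails because the number 8 needs a fourth bit.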

The Build System

The good news is that turning assembly language programs into executable code works with the same build system that you use for C/C++ development. This means a project folder containing the usual CMakeLists.txt and pico_sdk_import.cmake files, and your source code, saved as either a .s or a .S file. You’ll have installed the Pico SDK already and have the environment variable, PICO_SDK_PATH, pointing to it. The one change you need to make to your CMakeLists.txt file is to add ASM to your initial project() command, after the C and CXX entries:

project(my_project              # placeholder: keep your own project name
        LANGUAGES C CXX ASM
        DESCRIPTION "RP2040 application in assembly"
)

Those changes made, you can prep the assembly process with the usual

cmake -S . -B build 

command issued at your project’s root directory and then

cmake --build build 

to assemble the code, compile any additional C or C++, link all the parts together, and write the program out as a .UF2 file ready for copying to a Pico.

The First Example

What’s the difference between .s and .S? The former indicates a pure assembly file, the latter an assembly file that is passed through the C preprocessor first, so it may contain preprocessor directives such as a Pico SDK library #include. As you’ll see from the source files in my pico-asm repo on GitHub, which contains the example code from this series, you’re likely to be making heavy use of the SDK, so you should get used to using the .S file extension.

In fact, call up the first example’s main.S file now to check out how Pico assembly code is formatted.

As you can see, multiline comments use the usual /* and */ delimiters. Single-line comments are customarily prefixed with @, but you can use // if you prefer.

Note I’ve used capitals here for assembler mnemonics (the text codes that represent processor instructions) and register names, but I’ve used lower case in the code itself. The assembler is happy with either, so use the format you prefer.

Items prefixed with a full stop are assembler directives or macros. Of the former, .EQU is used to set a constant (the assembler equivalent of #define), .ALIGN to specify a byte alignment (I’ll talk about this in a future post), while .GLOBAL indicates the scope of the identifier that follows it, here main. There are others used too: .WORD and .ASCIZ, which indicate data types.

.thumb_func is a macro which tells the assembler we’re working with the Thumb instruction set: it marks the label that follows as the entry point of a Thumb function. This is very important. Many ARM chips can switch between the 32-bit ISA and Thumb because they support both. The RP2040 only supports Thumb, so we need to prep the assembly process to ensure that branches between our own source and external library code won’t switch to the unsupported mode. This macro does that.

You mark out units of code with labels, which are declared by suffixing them with a colon, as I’ve done with main:, loop: and info:. Declare a label by placing it alongside an instruction that your code may branch to. The label is a proxy for that instruction’s address. It’s just like naming a function and then later calling it.

Let’s look at the code, starting at main:. The first line’s BL instruction stands for ‘branch with link’ and transfers execution to the code at the label stdio_init_all. This should be familiar as the name of a Pico SDK function. Most, if not all, SDK functions are also defined as assembler labels so we can use them in code. I do so again a few lines down to call the sleep_ms function.

How do I pass the number of milliseconds to sleep to that function? Recall the use of registers R0-3 as function parameters? That’s how. sleep_ms reads R0 to get the sleep period. The second line moves the value 0x80 (128) into R0 and the line after logically shifts that value four bits to the left, multiplying it by 16 to get 2048. sleep_ms then halts execution for just over two seconds (2048ms). The # tells the assembler the 0x80 and the 0x04 on the next line are literal values.

Why not move the value 2000 into R0? Because you can’t. The MOV instruction can be used to move the contents of one register, or an 8-bit value, into another register. That it will only accept an 8-bit immediate value is a result of the Thumb’s 16-bit format: once the op and the target register have been specified, there are no more bits left for a number greater than 255. So we have to set R0 with a small value and then multiply it up to the figure we want.
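If you want to check which constants can be reached with an 8-bit MOV followed by a left shift, the search is easy to model in C on a desktop machine. This is only a sketch: split_imm8_shift is a made-up helper, not part of the SDK or the example code, and it simply looks for the smallest shift that leaves a value of 255 or less with no bits lost.

```c
#include <stdbool.h>
#include <stdint.h>

/* Try to express `value` as imm8 << shift: the form you can build with a MOV
 * of an 8-bit immediate followed by a logical shift left.
 * On success, stores the two parts and returns true. */
bool split_imm8_shift(uint32_t value, uint8_t *imm8, uint8_t *shift)
{
    for (uint8_t s = 0; s < 32; s++) {
        uint32_t candidate = value >> s;
        /* The candidate must fit in 8 bits and shifting it back must
         * reproduce the value exactly (no low bits thrown away). */
        if (candidate <= 0xFFu && (candidate << s) == value) {
            *imm8 = (uint8_t)candidate;
            *shift = s;
            return true;
        }
    }
    return false;
}
```

For 2048 this finds 0x80 shifted by 4, the pair used in the example. It also finds 250 shifted by 3 for exactly 2000, whereas an odd value such as 2001 has no such pair and would need to be loaded from memory instead.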

There is an alternative. Rather than use the MOV instruction, we could load the register (LDR) with the contents of four bytes of memory, having first put the sleep period into that memory. This is what we do at line 52: load R0 with the 32-bit value stored in the bytes labelled DELAY_TIME_MS. I’ve included both approaches to show them. The first method uses four bytes for two instructions; the second requires six bytes: two for the instruction, four to hold the value that’ll be loaded. For a little program like this, those two ‘wasted’ bytes don’t matter, but when your code is pushing the limits of the RP2040’s RAM, you may need to save bytes wherever you can.

Toggle GPIOs

Back to the code; you can probably work out the rest. It sets R0 to the GPIO pin number the Pico’s LED is attached to and calls SDK functions to initialise the pin and then set its direction. The loop: code uses another SDK function to toggle the pin. Delays ensure we can see the flash, and we just branch back to the start of the loop (the B instruction is ‘branch always’) to keep it flashing.

The gpio_get_state and gpio_set_direction functions have two parameters, so both, as per convention, also require a value to be placed in R1: respectively, the pin state and the desired direction.

You might notice something else about those functions: they are not SDK functions. How come? And what’s the sdk_inlines.c file for? The SDK functions we want to use are gpio_set_dir() and gpio_put(). These are declared as inline C functions. In other words, the compiler is told to drop in the function’s code wherever it’s called, and not to emit it as a separate block of code that the program jumps to, which is what takes place with a regular function. This approach is convenient for C, but not assembly, so we have to embed the SDK calls in functions that will be implemented as code we can jump to. Those functions are defined in sdk_inlines.c to wrap the inline compiled SDK code.
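The wrapping idea can be sketched in a few lines of C. To keep the sketch compilable on a desktop without the Pico SDK, the two SDK inline functions are replaced here by stand-in stubs that record pin state in hypothetical variables; in a real project you would delete the stubs and #include "pico/stdlib.h" instead. The wrapper name gpio_put_wrapper is a placeholder of mine, and the actual sdk_inlines.c in the repo may well differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Desktop stand-ins for the SDK's inline functions (assumption: same names
 * and argument order as the SDK's gpio_set_dir and gpio_put). */
static uint32_t fake_dir_bits;   /* hypothetical: one bit per pin, 1 = output */
static uint32_t fake_out_bits;   /* hypothetical: one bit per pin, 1 = high   */

static inline void gpio_set_dir(unsigned gpio, bool out)
{
    if (out) {
        fake_dir_bits |= (1u << gpio);
    } else {
        fake_dir_bits &= ~(1u << gpio);
    }
}

static inline void gpio_put(unsigned gpio, bool value)
{
    if (value) {
        fake_out_bits |= (1u << gpio);
    } else {
        fake_out_bits &= ~(1u << gpio);
    }
}

/* Ordinary, non-inline functions: the compiler emits each as a block of code
 * with a linker-visible symbol, which assembly can then reach with BL.
 * Arguments arrive in R0 (pin) and R1 (direction or state). */
void gpio_set_direction(int pin, int dir)
{
    gpio_set_dir((unsigned)pin, dir != 0);
}

void gpio_put_wrapper(int pin, int state)   /* placeholder name */
{
    gpio_put((unsigned)pin, state != 0);
}
```

Because the wrappers are not marked static or inline, their names survive into the object file, which is what lets the assembler resolve a bl to them at link time.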

The info: section, which code jumps to and from before running the LED flash loop, uses one of the M0+’s other registers, CPUID. This is one of a large number of registers that have a specific role and are accessed as memory locations. A glance at ARM’s ARMv6-M reference manual shows CPUID’s address is 0xE000ED00, so we store that in four bytes of memory at the tail of the program, under the label CPUID:, to make accessing it easy. The .WORD directive tells the assembler to reserve four bytes for it.

Also stored ahead of time are four ASCII text strings. Look closely and you’ll see they are printf() format strings. The stanzas within info: each make use of these and the value read from the CPUID register to extract data about the CPU and print it via the SDK’s output mechanism. If you assemble the code and run it, you’ll need a program like minicom to view the STDIO readout.

I put the address of the CPUID register into R5 and then read the contents of that address into the same register. The square brackets in line 68 tell the assembler that we want the contents of the memory pointed to by the address in R5, rather than the value in R5 itself. Why use R5? Recall the calling convention: function parameters go into R0-3, so we can’t assume those registers’ contents won’t be changed by functions we call. But if the functions are well behaved, we know they won’t affect R5. In each stanza we copy the value in R5 into R1 so it can be fed into the printf function.

printf expects R0 to contain the address of the format string, and R1 to contain a value to be interpolated into that string. In each stanza we have to do some bit manipulation on the value in R1 so it only contains the field we’re interested in. R0 is loaded with the string addresses — the equals sign indicates we want the address of the label, not the bytes at that address.
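The bit manipulation in each stanza amounts to a shift and a mask, which may be easier to follow in C first. The sketch below decodes the CPUID fields using the layout in ARM's ARMv6-M reference manual; decode_cpuid and cpuid_fields_t are names invented here, and which fields the example code actually prints is not something this sketch assumes.

```c
#include <stdint.h>

/* The five fields of the ARMv6-M CPUID register. */
typedef struct {
    uint8_t  implementer;   /* bits 31-24 */
    uint8_t  variant;       /* bits 23-20 */
    uint8_t  architecture;  /* bits 19-16 */
    uint16_t part_number;   /* bits 15-4  */
    uint8_t  revision;      /* bits 3-0   */
} cpuid_fields_t;

/* Shift each field down to bit 0, then mask off everything above it:
 * the same job the assembly does with its shift instructions. */
cpuid_fields_t decode_cpuid(uint32_t cpuid)
{
    cpuid_fields_t f;
    f.implementer  = (uint8_t)((cpuid >> 24) & 0xFFu);
    f.variant      = (uint8_t)((cpuid >> 20) & 0xFu);
    f.architecture = (uint8_t)((cpuid >> 16) & 0xFu);
    f.part_number  = (uint16_t)((cpuid >> 4) & 0xFFFu);
    f.revision     = (uint8_t)(cpuid & 0xFu);
    return f;
}
```

As a worked case, take the sample value 0x410CC601 (an assumption about what a Cortex-M0+ might report, not a reading taken from a Pico): it splits into implementer 0x41, variant 0x0, architecture 0xC, part number 0xC60 and revision 0x1.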

That’s plenty for now. Next time: more mnemonics…
